Last updated: May 29, 2026
Application No. 18/318,380
METHOD AND SYSTEM FOR ADVERSARIAL MULTI-ARCHITECTURE BASED DELAY PREDICTION IN SCHEDULED TRANSPORTATION NETWORKS

Non-Final OA §103
Filed: May 16, 2023
Priority: Jun 13, 2022 (IN 202221033795)
Examiner: RYLANDER, BART I
Art Unit: 2124
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Tata Consultancy Services Limited
OA Round: 1 (Non-Final)
Interview Optional

Interview lift (+13.6%) is below the 15.0% threshold. A written response is recommended.
Based on 117 resolved cases, 2023–2026
Examiner Intelligence

RYLANDER, BART I
Career Allowance Rate: 65% (above average); 76 granted / 117 resolved; +10.0% vs TC avg
Interview Lift: +13.6% (moderate), comparing resolved cases with and without an interview
Avg Prosecution (typical timeline): 3y 11m; 15 currently pending
Total Applications (career history): 142, across all art units
Statute-Specific Performance

§101: 2.0% (-38.0% vs TC avg)
§103: 95.1% (+55.1% vs TC avg)
§102: 1.5% (-38.5% vs TC avg)
§112: 0.7% (-39.3% vs TC avg)
Based on career data from 117 resolved cases
Office Action

§103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is in response to submission of application on 5/16/2023.
Claims 1-16 are presented for examination.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-16 are rejected under 35 U.S.C. § 103 as being unpatentable over Haggiag et al. (WO2017/098494 A1, Robust Dynamic Time Scheduling and Planning, herein Haggiag), Laifa et al. (Predicting Trains Delays using a Two-level Machine Learning Approach, herein Laifa), and Saxena et al. (D-GAN: Deep Generative Adversarial Nets for Spatio-Temporal Prediction, herein Saxena).
Regarding claim 1,
	Haggiag teaches a processor implemented method (Haggiag, FIG. 1, and, Claim 1, “A robust dynamic scheduling and planning comprising: (a) a client interface; (b) an offline and/or real time data processor for creating a prediction…”

[Image media_image1.png: figure cited from Haggiag]


In other words, robust dynamic scheduling and planning is the method, and the data processor is the processor.), the method comprising:
	receiving, by one or more hardware processors, a user query with respect to an expected delay of a target vehicle pertaining to a scheduled transportation network, in at least one target station  (Haggiag, page 17, paragraph 1, line 1 “Preferably, client interface 12 displays alternatives 44 to be selected by a dispatcher 50, controller 22 or equivalent thereof. Preferably, dispatcher 50 chooses whether to accept one or none of alternatives 44 according to the expected impact 42 and/or nature of the delay and initiates execution of new schedule 46 selected.” And, page 17, paragraph 4, line 1 “It is envisaged that predictions 36 should be with high probability of substantially above 50% way in advance to have time notifying all the relevant controllers 22 and transportation means 15 about their changes due to new schedule 46.” In other words, client is user, expected impact…of the delay is expected delay, schedule is schedule, and dispatcher chooses…accept….alternative is receiving a user query with respect to an expected delay to a scheduled transportation.) , wherein 
	the scheduled transportation networks include [a plurality of stations] and a plurality of vehicles (Haggiag, page 2, paragraph 3, line 1 “Invariably, such events bring about an undesired result of at least one vehicle and/or driver are not to being able to meet with a planned schedule for that vehicle of a fleet of vehicles.” In other words, planned schedule for that vehicle is scheduled transportation, and fleet of vehicles is plurality of vehicles.);
	obtaining, by the one or more hardware processors, a real time data associated with the user query in a predefined horizon using a real time data acquisition system (Haggiag, page 8, paragraph 2, line 1 “Robust dynamic scheduling and planning 10 preferably also includes a real-time data listener 24 and a prediction engine 26.” And, page 10, paragraph 4, line 1 “Preferably, prediction engine 26 predicts in high accuracy the probabilities for a trip to be completed within a certain time frame.”  In other words, real-time data is real time data, real time data listener is real time data acquisition system, and a certain time frame is predefined horizon.) ;
	extracting, by the one or more hardware processors, a plurality of spatial features based on the real time data, the [plurality of stations] in the scheduled transportation networks, the plurality of vehicles in the scheduled transportation networks, and the predefined horizon, using a feature extraction technique (Haggiag, page 11, paragraph 2, line 1 “By way of an example only, feature extraction module 62 extracts at least one of the following features from each measurement: a temporal feature such as day of week, month, season, holiday, time, hour, time period (morning/evening/afternoon), speed, a spatial features such as route identification, route sign, route direction, origin location, destination location, major locations in-route, a demand feature such as expected passenger load for the day, expected passenger load for the route, expected passenger load for the time, expected passenger load for the trip, a vehicle feature such as vehicle type, average vehicle speed, max vehicle speed, vehicle passenger capacity, average vehicle capacity and a relative feature such as time of previous trip of the same route and direction, time in other direction, speed of routes of the same vehicle type and speed of routes with the same expected passenger capacity.” In other words, spatial features is spatial features, from prior mapping real time data is real time data, temporal feature such as a day or week, month, etc., is predefined horizon, and, feature extraction module is using a feature extraction technique.);
	extracting, by the one or more hardware processors, a plurality of temporal features based on the real time data, the [plurality of stations], and the predefined horizon, using the feature extraction technique, wherein the plurality of temporal features are segmented based on a plurality of vehicle categories (Haggiag, see above mapping.  In other words, temporal feature is temporal features, feature extraction module is feature extraction technique, and vehicle feature such as vehicle type, etc. is plurality of vehicle categories.) , wherein the plurality of vehicle categories comprises 
	a route based vehicle, a trip based vehicle, and a current vehicle (Haggiag, FIG. 1, and See above mapping. In other words, vehicle is vehicle, spatial features such as route is route based, previous trip is trip based, and vehicle type is current vehicle.) ;
	extracting, by the one or more hardware processors, a plurality of spatiotemporal features based on the [plurality of stations], the plurality of vehicles, and the real time data, using the feature extraction technique (Haggiag, See prior mapping. And, page 11, paragraph 2, line 1 “By way of an example only, feature extraction module 62 extracts at least one of the following features from each measurement: a temporal feature such as day of week, month, season, holiday, time, hour, time period (morning/evening/afternoon), speed, a spatial features such as route identification, route sign, route direction, origin location, destination location, major locations in-route, a demand feature such as expected passenger load for the day, expected passenger load for the route, expected passenger load for the time, expected passenger load for the trip, a vehicle feature such as vehicle type, average vehicle speed, max vehicle speed, vehicle passenger capacity, average vehicle capacity and a relative feature such as time of previous trip of the same route and direction, time in other direction, speed of routes of the same vehicle type and speed of routes with the same expected passenger capacity.” In other words, vehicle speed is spatiotemporal features, vehicle type is plurality of vehicles, real time data is real time data, and feature extraction module is feature extraction technique.); and
	predicting, by the one or more hardware processors, the expected delay of the target vehicle in at least one target station based on the spatial features, the temporal features and the spatiotemporal features using a [trained adversarial regression model] (Haggiag, See prior mapping, and, page 15, paragraph 4, line 1 “Since the actual travel time depends on predictions, there is some uncertainty in the cost. In addition, constraints uses the travel time as parameters. By way of example only, a transportation means 15 and/or controller 22 cannot perform the trip that starts at 9:00 after a trip that ends at 9: 10, and due to uncertainty in travel time there is uncertainty in the feasibility of the schedule. Thus, optimization engine 46 uses prediction engine 26 to incorporate the uncertainty in travel time to the schedule, in the following way: Each trip is scored using prediction engine 26 to get probability for several travel time buckets; Occasioning on evaluating each pair of trips connection feasibility, a penalty is added based on the following formula: 
Penalty(trip1, trip2) = Probability(trip2 starts before trip1) × Penalty(delay)
wherein penalty (delay) is a function that receives the expected delay between the trips, and penalizes connections with higher delay. Exponential function is used to penalize in higher ratios higher delays. When the delay crosses off a predefined threshold the connection is allowed only if the probability of the delay is low enough.” In other words, predict is predict, expected delay is expected delay, and from prior mapping, spatial features and temporal features are spatial features and temporal features.) , wherein 
	[the trained adversarial regression model comprises a critic network and a regressor network, wherein the regressor network is trained using a plurality of regression architectures, wherein  the plurality of regression architectures are trained simultaneously and, wherein one architecture with a minimum Mean Absolute Error (MAE) is selected from among the plurality of regression architectures is selected for the expected delay prediction].  
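As a rough illustration only, the connection penalty quoted from Haggiag above could be sketched as follows. The function names, the exponential rate, the delay threshold, and the probability cut-off are assumptions made for this sketch; the quoted passage states only that an exponential function is used and that a connection above a delay threshold is allowed when the probability of the delay is low enough.

```python
import math


def delay_penalty(expected_delay_min: float, rate: float = 0.1) -> float:
    """Exponential penalty on the expected delay.

    `rate` is a hypothetical tuning value; the quoted text gives no constant.
    """
    return math.exp(rate * max(expected_delay_min, 0.0)) - 1.0


def connection_penalty(p_conflict: float, expected_delay_min: float) -> float:
    """Penalty(trip1, trip2) = Probability(conflict) x Penalty(delay).

    `p_conflict` stands for the probability term in the quoted formula;
    how it is estimated is outside this sketch.
    """
    if not 0.0 <= p_conflict <= 1.0:
        raise ValueError("p_conflict must be a probability in [0, 1]")
    return p_conflict * delay_penalty(expected_delay_min)


def connection_allowed(
    p_delay: float,
    expected_delay_min: float,
    delay_threshold_min: float = 15.0,  # hypothetical threshold
    max_probability: float = 0.05,      # hypothetical cut-off
) -> bool:
    """Above the delay threshold, allow a connection only if the delay is unlikely."""
    if expected_delay_min <= delay_threshold_min:
        return True
    return p_delay <= max_probability
```

With these assumed values, a 20-minute expected delay is penalized more heavily than a 10-minute one at the same probability, and a connection with a 20-minute expected delay is rejected unless its delay probability is at most 5%.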
Thus far, Haggiag does not explicitly teach plurality of stations.
Laifa teaches plurality of stations (Laifa, page 739, column 2, paragraph 1, line 21 “The first stage predicts the total buffer time of delayed trains in sections and stations, and the second stage predicts the recovery time of primary delay based on the first stage results.” In other words, predicts …time of delayed trains in …stations is plurality of stations.)
Both Haggiag and Laifa are directed to using machine learning to predict transportation delays, among other things. Haggiag teaches a processor implemented method, the method comprising receiving, by one or more hardware processors, a user query with respect to an expected delay of a target vehicle pertaining to a scheduled transportation network, wherein  the scheduled transportation networks include a plurality of vehicles; but does not explicitly teach a plurality of stations.  Laifa teaches predicting delays to include a plurality of stations. 
In view of the teaching of Haggiag, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Laifa into Haggiag.  This would result in a processor implemented method, the method comprising  receiving, by one or more hardware processors, a user query with respect to an expected delay of a target vehicle pertaining to a scheduled transportation network, in at least one target station, wherein the scheduled transportation networks include a plurality of stations and a plurality of vehicles.
One of ordinary skill in the art would be motivated to do this in order to more accurately predict delays, thus saving time and money. (Laifa, page 737, column 2, paragraph 3, line 1 “A late train is likely to propagate its delay with other trains. Thus, managing these delays (rescheduling) allows traffic managers to change the direction of trains to use the rail network appropriately. In this context, delay prediction is one of the most significant challenges to improving traffic management and dispatch.”)
Thus far, the combination of Haggiag and Laifa does not explicitly teach a trained adversarial regression model or the trained adversarial regression model comprises a critic network and a regressor network, wherein the regressor network is trained using a plurality of regression architectures, wherein the plurality of regression architectures are trained simultaneously and, wherein  one architecture with a minimum Mean Absolute Error (MAE) is selected from among the plurality of regression architectures is selected for the expected delay prediction.  
Saxena teaches a trained adversarial regression model and the trained adversarial regression model comprises a critic network and a regressor network, wherein the regressor network is trained using a plurality of regression architectures, wherein the plurality of regression architectures are trained simultaneously and, wherein  one architecture with a minimum Mean Absolute Error (MAE) is selected from among the plurality of regression architectures is selected for the expected delay prediction (Saxena, Fig. 2, Equations (8) and (9), and, page 2, column  2, paragraph 2, subparagraph 1 “We propose a novel deep generative adversarial network based model (named, D-GAN) to deeply capture the underlying ST data distribution implicitly for more accurate ST prediction. Different from the existing deep models for ST prediction, D-GAN combines GAN and VAE and jointly learns generation and variational inference of ST data in an unsupervised manner. This, to the best of our knowledge, has not been done in prior ST prediction research.” and, page 2, column 2, paragraph 3, line 1 “Our D-GAN model uses the GAN framework to generate the next ST map from a sequence of given ST maps. GANs use the concept of a non-cooperative game in which two networks, a generator (G) and a discriminator (D), are trained to play against each other.” And, page 4, column 2, paragraph 6, line 1 “We formulate the demand prediction as a ST prediction problem in which both the input and the prediction target are ST sequences. The main aim of this task is to learn an accurate model to predict the total number of requests for a particular service in each grid of ST map during each time slot where a time slot can be an hour, or a day, or a week. We use two largescale datasets collected at NYC city: Yellow Taxi dataset (Taxi) [15] which contains requests from 01/01/2016 to 30/06/2016 and CitiBike trip dataset (Bike) [16] which contains requests from 01/01/2016 to 31/01/2016 for the demand prediction. 
We represent a city as ST map where l_s = <lat_s, long_s> and l_e = <lat_e, long_e> denote the start s and end e location coordinates of a city, respectively. The square area clustered into 9×9 non-overlapping regions which represents as a ST map.” And, page 5, column 1, paragraph 2, line 1 “We use both Rooted Mean Square Error (RMSE) and Mean Absolute Error (MAE) as the evaluation metrics to evaluate the performance of D-GAN:

[Equation image media_image2.png: RMSE and MAE definitions]

Where 
[Symbol image media_image3.png: the prediction value]
  and 𝑥i are the prediction value and real value of ith ST map, and 𝑚𝑛 is total number of regions in a ST map.” And, page 5, column 2, paragraph 5, line 5 “Regression methods, such as Linear Regression (LR), Random Forest (RF), XGBoost, and MLP achieve better performance than previous ones but still these methods are not able to capture ST correlations in the ST data. We further extend our model comparison with the well-known deep learning methods, such as LSTM and CNN. LSTM cannot model spatial correlations while CNN cannot capture temporal dynamics.” Examiner notes that the term “critic” is typically used in an adversarial model that is used for regression, whereas the term “discriminator” is typically used if the adversarial model is being used for classification. Both are implemented as neural networks.  In this instance, Saxena uses the phrase “discriminator” even though the overall algorithm is being used to make predictions instead of classifying. Since Saxena is being used to make predictions, and the claims are directed to making predictions, examiner is interpreting that “discriminator” performs the same function as “critic”.

[Image media_image4.png: figure cited from Saxena]

In other words, D-GAN is adversarial regression model, trained is trained, discriminator is critic, from Fig. 2, MLPs is plurality of trained regression architectures, and we use mean absolute error (MAE) as the evaluation metrics is minimum mean absolute error.).
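For reference, the textbook forms of the two metrics named in the Saxena quotation are given below. This is a sketch that assumes Saxena's definitions follow the usual convention; the exact notation of the cited equation image is not reproduced here. The symbol $\hat{x}_i$ denotes a predicted value, $x_i$ the corresponding real value, and $N$ (a symbol introduced for this sketch) the number of values compared.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{x}_i - x_i\right)^{2}},
\qquad
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{x}_i - x_i\right|
```

Both are evaluation metrics computed after prediction; neither formula by itself describes a selection among several models.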
Both Saxena and the combination of Haggiag and Laifa are directed to using machine learning to predict transportation delays, among other things. The combination of Haggiag and Laifa teaches a processor implemented method, the method comprising receiving, by one or more hardware processors, a user query with respect to an expected delay of a target vehicle pertaining to a scheduled transportation network, in at least one target station, wherein 
the scheduled transportation networks include a plurality of stations and a plurality of vehicles, obtaining, by the one or more hardware processors, a real time data associated with the user query in a predefined horizon using a real time data acquisition system, extracting, by the one or more hardware processors, a plurality of spatial features based on the real time data, the plurality of stations in the scheduled transportation networks, the plurality of vehicles in the scheduled transportation networks, and the predefined horizon, using a feature extraction technique, extracting, by the one or more hardware processors, a plurality of temporal features based on the real time data, the plurality of stations, and the predefined horizon, using the feature extraction technique, wherein the plurality of temporal features are segmented based on a plurality of vehicle categories, wherein the plurality of vehicle categories comprises a route based vehicle, a trip based vehicle, and a current vehicle, extracting, by the one or more hardware processors, a plurality of spatiotemporal features based on the plurality of stations, the plurality of vehicles, and the real time data, using the feature extraction technique, and predicting, by the one or more hardware processors, the expected delay of the target vehicle in at least one target station based on the spatial features, the temporal features and the spatiotemporal features; but does not explicitly teach using a trained adversarial regression model, wherein the trained adversarial regression model comprises a critic network and a regressor network, wherein the regressor network is trained using a plurality of regression architectures, wherein the plurality of regression architectures are trained simultaneously and, wherein  one architecture with a minimum Mean Absolute Error (MAE) is selected from among the plurality of regression architectures is selected for the expected delay prediction.  
Saxena teaches using a trained adversarial regression model, wherein the trained adversarial regression model comprises a critic network and a regressor network, wherein the regressor network is trained using a plurality of regression architectures, wherein the plurality of regression architectures are trained simultaneously and, wherein  one architecture with a minimum Mean Absolute Error (MAE) is selected from among the plurality of regression architectures is selected for the expected delay prediction.
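The selection step recited in the claim (choosing, from several regression architectures, the one with the minimum MAE) can be illustrated with a minimal sketch. The candidate "architectures" here are hypothetical stand-in functions, and the feature and delay values are made up for illustration; the sketch covers only the selection, not the simultaneous training, and is not taken from the application or from Saxena.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Regressor = Callable[[Sequence[float]], float]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def select_min_mae(
    candidates: Dict[str, Regressor],
    features: List[Sequence[float]],
    delays: Sequence[float],
) -> Tuple[str, float]:
    """Score every candidate on the same data and return the lowest-MAE one."""
    scores = {
        name: mean_absolute_error(delays, [model(x) for x in features])
        for name, model in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical, already-trained stand-ins for regression architectures.
candidates = {
    "arch_a": lambda x: 2.0 * x[0],
    "arch_b": lambda x: x[0] + x[1],
    "arch_c": lambda x: 5.0,
}
features = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0)]  # illustrative feature vectors
delays = [3.0, 3.0, 7.0]                          # illustrative actual delays

best_name, best_mae = select_min_mae(candidates, features, delays)
```

On this toy data the MAE values are 1.0, 0.0, and 2.0, so `arch_b` is selected. Reporting MAE for a single model, as an evaluation metric, would not involve the comparison performed in `select_min_mae`.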
	In view of the teaching of the combination of Haggiag and Laifa, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Saxena into the combination of Haggiag and Laifa. This would result in a processor implemented method, the method comprising receiving, by one or more hardware processors, a user query with respect to an expected delay of a target vehicle pertaining to a scheduled transportation network, in at least one target station, wherein the scheduled transportation networks include a plurality of stations and a plurality of vehicles, obtaining, by the one or more hardware processors, a real time data associated with the user query in a predefined horizon using a real time data acquisition system, extracting, by the one or more hardware processors, a plurality of spatial features based on the real time data, the plurality of stations in the scheduled transportation networks, the plurality of vehicles in the scheduled transportation networks, and the predefined horizon, using a feature extraction technique, extracting, by the one or more hardware processors, a plurality of temporal features based on the real time data, the plurality of stations, and the predefined horizon, using the feature extraction technique, wherein the plurality of temporal features are segmented based on a plurality of vehicle categories, wherein the plurality of vehicle categories comprises a route based vehicle, a trip based vehicle, and a current vehicle, extracting, by the one or more hardware processors, a plurality of spatiotemporal features based on the plurality of stations, the plurality of vehicles, and the real time data, using the feature extraction technique, and predicting, by the one or more hardware processors, the expected delay of the target vehicle in at least one target station based on the spatial features, the temporal features and the spatiotemporal features using a trained 
adversarial regression model, wherein the trained adversarial regression model comprises a critic network and a regressor network, wherein the regressor network is trained using a plurality of regression architectures, wherein the plurality of regression architectures are trained simultaneously and, wherein one architecture with a minimum Mean Absolute Error (MAE) is selected from among the plurality of regression architectures is selected for the expected delay prediction.
	One of ordinary skill in the art would be motivated to do this in order to process spatiotemporal data for applications that are inherently difficult to predict. (Saxena, abstract, line 1 “Spatio-temporal (ST) data for urban applications, such as taxi demand, traffic flow, regional rainfall is inherently stochastic and unpredictable. Recently, deep learning based ST prediction models are proposed to learn the ST characteristics of data. However, it is still very challenging (1) to adequately learn the complex and non-linear ST relationships; (2) to model the high variations in the ST data volumes as it is inherently dynamic, changing over time (i.e., irregular) and highly influenced by many external factors, such as adverse weather, accidents, traffic control, PoI, etc.; and (3) as there can be many complicated external factors that can affect the accuracy and it is impossible to list them explicitly.”)
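As a small illustration of the temporal feature category quoted from Haggiag in the claim 1 mapping (day of week, month, hour, time period), a timestamp could be turned into features along these lines. The function names and the hour boundaries for each time period are assumptions made for this sketch and are not taken from any cited reference.

```python
from datetime import datetime

DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def time_period(hour: int) -> str:
    """Map an hour of day to a coarse period (boundaries are hypothetical)."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


def temporal_features(ts: datetime) -> dict:
    """Extract a few calendar and clock features from one timestamp."""
    return {
        "day_of_week": DAY_NAMES[ts.weekday()],  # weekday(): Monday is 0
        "month": ts.month,
        "hour": ts.hour,
        "time_period": time_period(ts.hour),
    }
```

For example, `temporal_features(datetime(2023, 5, 16, 9, 30))` yields a Tuesday morning in month 5.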
Regarding claim 2,
	The combination of Haggiag, Laifa, and Saxena teaches the processor implemented method of claim 1, wherein
	the critic network comprises two feed forward layers, wherein the regressor network comprises a plurality of blocks, wherein the plurality of blocks comprises a spatial block for processing the spatial feature vector, a temporal block for processing the temporal feature vector and a spatiotemporal block for processing the spatiotemporal feature vector, wherein the spatial block comprises a plurality of Fully Connected Neural Networks (FCNNs), wherein the temporal block comprises a plurality of Long Short-Term Memory (LSTMs), and wherein the spatiotemporal block comprises a plurality of 3 Dimensional Convolutional Neural Network (Saxena, Fig. 2, Equation 2, and, page 4, column 1, paragraph 2, line 1 “In D-GAN, encoder first processes the real data by multiple stack of ConvLSTM and 3DConvNet, and a multi-layer perceptron (MLP) to produce a condensed feature vector 𝐹𝑉;, i.e.,

[Equation image media_image5.png: equation cited from Saxena]

where 𝑥Real is a real data space, 𝐹𝑉; is the extracted feature vector of 𝑥Real, and L is the number of ConvLSTM layers.” In other words, ConvLSTM in the Discriminator is the critic network comprises two feed forward layers, MLPs are regressors, there are plurality of MLP blocks for processing, feature vector is feature vector, MLPs and ConvLSTMs are neural networks that have fully connected layers which is fully connected neural networks, LSTM is LSTM, and from Fig. 2, 3DConvNet is 3 Dimensional Convolutional Neural Network.).  
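To make the claim language "the critic network comprises two feed forward layers" concrete, a minimal forward pass is sketched below in plain Python. The layer sizes, the ReLU and sigmoid activations, the random initialization, and the choice to feed the critic a feature vector concatenated with a delay value are all assumptions made for this sketch; neither the claim as quoted nor Saxena specifies them here.

```python
import math
import random
from typing import List, Sequence


def dense(inputs: Sequence[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One feed-forward (fully connected) layer: weights @ inputs + biases."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class Critic:
    """Two feed-forward layers mapping (feature vector, delay) to a score in (0, 1)."""

    def __init__(self, n_features: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_inputs = n_features + 1  # features plus one delay value
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(self.n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]]
        self.b2 = [0.0]

    def score(self, feature_vector: Sequence[float], delay: float) -> float:
        inputs = list(feature_vector) + [delay]
        if len(inputs) != self.n_inputs:
            raise ValueError("unexpected feature vector length")
        hidden = relu(dense(inputs, self.w1, self.b1))    # first feed-forward layer
        return sigmoid(dense(hidden, self.w2, self.b2)[0])  # second feed-forward layer
```

A ConvLSTM stack, by contrast, is a recurrent convolutional structure, so whether it corresponds to two feed-forward layers is a point the sketch leaves open.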
Regarding claim 3,
	The combination of Haggiag, Laifa, and Saxena teaches the processor implemented method of claim 1, wherein the method of training the adversarial regression model comprises:
	receiving a training dataset further comprising a plurality of samples (Haggiag, page 8, paragraph 5, line 1 “Preferably, dataset 18 including historical data is fed into prediction engine 26, wherein preferably, a multiplicity of data points that are including for readily improving prediction accuracy.” And, page 10, paragraph 4, line 2 “Preferably, dataset 18 including historical data is fed into prediction engine 26, wherein preferably, a multiplicity of data points that are including for readily improving prediction accuracy.” In other words, training is training, and dataset is fed is receiving a training dataset.) ; 
	generating a plurality of features corresponding to each of the plurality of samples associated with the training dataset (Saxena, Fig. 2, In other words, feature restoration of the generator is generating a plurality of features corresponding to each of the plurality of samples in the dataset.), wherein 
the plurality of features comprises the plurality of spatial features, the plurality of temporal features, and the plurality of spatiotemporal features (Saxena, Fig. 2, and abstract, line 13 “…in this paper, we propose a novel deep generative adversarial network based model (named, D-GAN) for more accurate ST prediction by implicitly learning ST feature representations in an unsupervised manner.” In other words, ST feature representations is spatiotemporal features, temporal features, and spatial features.);
	generating a plurality of actual training samples based on the plurality of features and a corresponding actual delay, wherein the actual delay is a historical delay; and  repeatedly performing until a minimum regression loss is obtained (Saxena, equation 1, and, page 2, column 1, paragraph 4, line 8 “D-GAN combines the variational inference with GAN into an unsupervised generative model to model stochastic behavior in the ST data. GAN does not use elementwise similarity measure by construction which gives GANs an ability to predict more accurately. In training time, an inference network estimates the distribution of latent variables, while a generative network learns to reconstruct input space from the latent variables.” And, page 2, column 2, paragraph 4, line 19 “A minimax objective is used to train both G and D models jointly via solving: 

[Equation image media_image6.png: Saxena, Equation (1), minimax objective]

In other words, the generator generates a plurality of training samples based on the plurality of features, and equation (1) shows repeating until a minimum regression loss is obtained. Examiner notes that actual delay and historical delay is previously mapped in claim 1.):
	training the critic network associated with the adversarial regression model with the plurality of actual training samples until the critic network is trained to identify a positive sample, wherein the positive sample is associated with a critic score greater than a predefined threshold  (Saxena, Equation (1), page 2, column 2, paragraph 4, line 1 “The goal of G is to generate samples resembling to samples generated from true data distribution, while D’s purpose is to distinguish between samples drawn from G and samples drawn from the true data distribution. D assigns higher probabilities to real data samples and lower probabilities to samples generated by G, where GAN training simultaneously keeps trying to move the generated samples towards the real data manifolds using the gradient information provided by the D.” and, page 3, column 1, paragraph 1, line 8  “A GAN model is well trained when equilibrium is achieved between 𝐷 and 𝐺 , and 𝐷 cannot distinguish whether a sample is generated by the 𝐺 or generated from the real data distribution.” In other words, GAN training simultaneously is training the critic network associated with the adversarial regression model, and D is to distinguish true samples is until the critic is able to identify a positive sample, and when equilibrium is reached is a predefined threshold.);
	computing a predicted delay corresponding to each of the plurality of samples based on a corresponding feature vector (Haggiag, page 17, paragraph 1, line 1 “Wherein penalty (delay) is a function that receives the expected delay between the trips, and penalizes connections with higher delay. Exponential function is used to penalize in higher ratios higher delays. When the delay crosses off a predefined threshold the connection is allowed only if the probability of the delay is low enough.” In other words, expected delay is predicted delay.),
using the regressor network associated with the adversarial regression model (Saxena, Fig. 2, In other words, the MLPs are regressor networks, and D-GAN is adversarial regression model.);
	generating a plurality of estimated training samples based on the corresponding feature vector and a corresponding predicted delay (Saxena, Fig. 2, and page 2, column 2, paragraph 4, line 1 “The goal of G is to generate samples resembling to samples generated from true data distribution, while D’s purpose is to distinguish between samples drawn from G and samples drawn from the true data distribution. D assigns higher probabilities to real data samples and lower probabilities to samples generated by G, where GAN training simultaneously keeps trying to move the generated samples towards the real data manifolds using the gradient information provided by the D.” In other words, G generates samples is generating estimated training samples based on the feature vector.);
training the critic network with the plurality of estimated samples until the critic network is trained to identify a negative sample, wherein the negative sample is associated with the critic score less than the predefined threshold (Saxena, Fig. 2, Eq. 1, and page 2, column 2, paragraph 4, line 1 “The goal of G is to generate samples resembling to samples generated from true data distribution, while D’s purpose is to distinguish between samples drawn from G and samples drawn from the true data distribution. D assigns higher probabilities to real data samples and lower probabilities to samples generated by G, where GAN training simultaneously keeps trying to move the generated samples towards the real data manifolds using the gradient information provided by the D.”

    [Image: media_image7.png]

In other words, D is being trained as part of the adversarial process, and D(G(z)) predicts wrong is a negative sample which provides a score to the parameter Θ as an update. ); and 
updating the regressor network based on a gradient associated with the negative sample such that each subsequent negative sample is associated with the critic score greater than the predefined threshold (Saxena, Fig. 2, and, page 2, column 2, paragraph 4, line 4 “D assigns higher probabilities to real data samples and lower probabilities to samples generated by G, where GAN training simultaneously keeps trying to move the generated samples towards the real data manifolds using the gradient information provided by the D.” In other words, D is critic, and move generated samples towards the real data manifolds using the gradient information is updating based on a gradient associated with the negative sample.) .  
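For illustration only, the adversarial steps recited above (predict a delay with the regressor, train the critic on estimated samples, then update the regressor from the critic's gradient on negative samples) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application or of Saxena: the linear regressor, the logistic critic, the toy data, the learning rates, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean(values):
    return sum(values) / len(values)

def critic_score(u, x, y):
    """Critic score in (0, 1) for a (feature, delay) pair; u = [bias, w_x, w_y]."""
    return sigmoid(u[0] + u[1] * x + u[2] * y)

def train_critic(u, xs, y_real, y_est, lr=0.1, steps=200):
    """Gradient ascent on the log-likelihood: real pairs -> 1, estimated pairs -> 0."""
    u = list(u)
    for _ in range(steps):
        s_real = [critic_score(u, x, y) for x, y in zip(xs, y_real)]
        s_est = [critic_score(u, x, y) for x, y in zip(xs, y_est)]
        grad = [
            mean([1 - s for s in s_real]) - mean(s_est),
            mean([(1 - s) * x for s, x in zip(s_real, xs)])
            - mean([s * x for s, x in zip(s_est, xs)]),
            mean([(1 - s) * y for s, y in zip(s_real, y_real)])
            - mean([s * y for s, y in zip(s_est, y_est)]),
        ]
        u = [ui + lr * gi for ui, gi in zip(u, grad)]
    return u

def update_regressor(w, b, u, xs, threshold=0.5, lr=0.5):
    """One ascent step on the critic score of negative samples (score < threshold)."""
    negatives = []
    for x in xs:
        s = critic_score(u, x, w * x + b)
        if s < threshold:
            # d(score)/d(predicted delay) = s * (1 - s) * u[2]
            negatives.append((x, s * (1 - s) * u[2]))
    if not negatives:
        return w, b, 0
    # Chain rule through the assumed linear regressor y = w * x + b.
    w += lr * mean([g * x for x, g in negatives])
    b += lr * mean([g for _, g in negatives])
    return w, b, len(negatives)

# Hypothetical toy data: one feature per sample, observed delay = 2 * x + 1.
xs = [i / 19 for i in range(20)]
y_real = [2.0 * x + 1.0 for x in xs]
w, b = 0.0, 0.0  # untrained regressor predicts zero delay
u = train_critic([0.0, 0.0, 0.0], xs, y_real, [w * x + b for x in xs])
w, b, n_negative = update_regressor(w, b, u, xs)
```

A single call to `update_regressor` only raises the critic score of the negative samples. Repeating the critic and regressor steps in alternation is what would, in this sketch, eventually move the scores above the threshold.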
Regarding claim 4,
	The combination of Haggiag, Laifa, and Saxena teaches the processor implemented method of claim 1, wherein the plurality of spatial features comprises 
	geospatial information about the at least one target station and a vehicle capacity associated with each of a plurality of stations, wherein the geospatial information comprises 
distance between each of the plurality of stations and the at least one target station (Laifa, Table 1. (Top on left, bottom on right), and, page 740, column 1, paragraph 1, line 1 “The used database is collected from the National Tunisian Railway Company (SNCFT). It consists of 12350 travel samples, from 1.1.2019 to 31/12/2019, including 55 passenger trains and 4 main destinations (Tunis-Nabeul, Tunis-Sousse, Tunis-Tozeur, Tunis-Sfax).
Table 1: Data features summary.

    [Images: media_image8.png, media_image9.png]

In other words, target station is target station, unique code of train is vehicle, and distance is distance between each of the plurality of stations.) .  
Regarding claim 5,
	The combination of Haggiag, Laifa, and Saxena teaches the processor implemented method of claim 1, wherein the plurality of temporal features associated with each of the plurality of vehicles in each of the plurality of stations comprises, 
a scheduled travel time between a previous station and a current station, an actual travel time between the previous station and the current station, a scheduled dwell time, an actual dwell time, an actual interval with a preceding vehicle in the predefined horizon and a scheduled interval with the preceding vehicle in the predefined horizon (Laifa, Table 1. In other words, arrival time minus departure time is scheduled travel time, scheduled arrival time minus departure time plus arrival delay is actual travel time, arrival time minus departure delay is dwell time, and unique code for each train is vehicle, and stations is plurality of stations.) .
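As a worked illustration of the temporal features recited in claim 5, the sketch below derives them from scheduled times and delays of the kind the examiner reads from Laifa, Table 1. It is one conventional derivation under assumed definitions, not the examiner's exact arithmetic and not the applicant's definition. The `StopEvent` schema, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StopEvent:
    """One train at one station; all times in minutes (hypothetical schema)."""
    sched_arrival: float
    sched_departure: float
    arrival_delay: float
    departure_delay: float

    @property
    def actual_arrival(self):
        return self.sched_arrival + self.arrival_delay

    @property
    def actual_departure(self):
        return self.sched_departure + self.departure_delay

def temporal_features(prev, cur, preceding):
    """prev/cur: the same train at the previous/current station.
    preceding: the preceding train at the current station."""
    return {
        "scheduled_travel_time": cur.sched_arrival - prev.sched_departure,
        "actual_travel_time": cur.actual_arrival - prev.actual_departure,
        "scheduled_dwell_time": cur.sched_departure - cur.sched_arrival,
        "actual_dwell_time": cur.actual_departure - cur.actual_arrival,
        "scheduled_interval": cur.sched_arrival - preceding.sched_arrival,
        "actual_interval": cur.actual_arrival - preceding.actual_arrival,
    }

# Hypothetical values: (sched arrival, sched departure, arrival delay, departure delay).
prev = StopEvent(0, 2, 0, 1)
cur = StopEvent(30, 33, 4, 3)
preceding = StopEvent(15, 18, 1, 1)
features = temporal_features(prev, cur, preceding)
```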
Regarding claim 6,
	The combination of Haggiag, Laifa and Saxena teaches the processor implemented method of claim 1, wherein the plurality of spatiotemporal features comprises 
	an arrival delay and a departure delay associated with each of the plurality of vehicles in each of the plurality of stations (Laifa, Table 1, In other words, from the table, arrival delay is arrival delay, departure delay is departure delay, trains is plurality of vehicles, and stations is plurality of stations.) .  
Regarding claim 7,
	The combination of Haggiag, Laifa, and Saxena teaches the processor implemented method of claim 1, wherein 
the plurality of blocks are connected in one of a) a serial fashion and b) a parallel fashion (Saxena, Fig. 2, Examiner notes, the specification of the instant application recites “The regressor network includes a plurality of blocks.” (Specification, paragraph [0049], line 1. ) Based on this, examiner is interpreting “blocks” as steps of a method or algorithm.  In other words, the architecture depicted in Fig. 2 is serial which is one of a serial fashion and a parallel fashion.). 
Regarding claim 8, 
	The combination of Haggiag, Laifa and Saxena teaches the processor implemented method of claim 7, wherein 
the plurality of blocks connected in the serial fashion comprises the plurality of regression architectures obtained using permutation (Saxena, Fig. 2. Examiner notes the specification of the instant application recites “The plurality of blocks connected in the serial fashion includes a plurality of regression architectures obtained using permutation as shown in FIGS 3A to 3G.” (Specification, paragraph [0049], page 26, line 12.) There is no other mention of permutation in the specification. Therefore, examiner is interpreting that “permutation” is merely identifying an order of functions. However, the Figures identify different orders of functions. For example, FIG. 3D starts with a “spatial block”, whereas FIGs 3F and 3E start with a “temporal block”. This seems a contradiction. The order of execution can’t simultaneously start with a “temporal block” and a “spatial block”. Therefore, examiner is interpreting that the limitation means any particular order of functions. In other words, Fig. 2 depicts functions in a serial order.).
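For illustration only, the sketch below shows one possible reading of "obtained using permutation": enumerating the orderings of a set of block types, each ordering being a separate serial arrangement. The "spatial" and "temporal" block names come from the figure descriptions quoted above. The third "spatiotemporal" block type and the function name are assumptions, and nothing here states what the specification actually discloses.

```python
from itertools import permutations

# "spatial" and "temporal" are named in the quoted figure descriptions;
# "spatiotemporal" is an assumed third block type for this sketch.
BLOCK_TYPES = ("spatial", "temporal", "spatiotemporal")

def serial_architectures(block_types=BLOCK_TYPES):
    """Return every serial ordering of the given block types."""
    return list(permutations(block_types))

architectures = serial_architectures()
for order in architectures:
    print(" -> ".join(order))
```

Under this reading, some orderings begin with the spatial block and others with the temporal block, because each ordering is a different arrangement.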
Claims 9-16 are system claims corresponding to processor implemented method claims 1-8, respectively.  Otherwise, they are not patentably distinct.  The combination of Haggiag, Laifa, and Saxena teaches a system  (Laifa, page 742, column 2, paragraph 1, line 4 “All the experiments were conducted on an I7 3.2 GHz 8-core CPU and 16 GB of memory.” In other words, I7 with CPU and memory, is a system.). Therefore, claims 9-16 are rejected for the same reasons as claims 1-8, respectively.
Claims 17-20 are non-transitory machine readable information storage medium claims corresponding to processor implemented method claims 1, and 3-5, respectively.  Otherwise, they are not patentably distinct.  The combination of Haggiag, Laifa, and Saxena teaches a non-transitory machine readable information storage medium (Laifa, page 742, column 2, paragraph 1, line 4 “All the experiments were conducted on an I7 3.2 GHz 8-core CPU and 16 GB of memory.” In other words, 16 GB of memory is a non-transitory machine readable information storage medium.).  Therefore, claims 17-20 are rejected for the same reasons as claims 1, and 3-5, respectively.
The prior art made of record and not used is considered pertinent to applicant’s disclosure:
Artan, et al “Exploring patterns of Train Delay Evolution and Timetable Robustness” discloses using homogeneous and non-homogeneous Markov models to effectively predict train delays and to evaluate timetable robustness.
Cai, et al “Real-time crash prediction on expressways using deep generative models” discloses a deep learning method called deep convolutional generative adversarial network (DCGAN) model to fully understand the traffic data leading to crashes.
Noursalehi, et al “Dynamic Origin-Destination Prediction in Urban Rail Systems: A Multi-Resolution Spatio-Temporal Deep Learning Approach” discloses three modules: a multi-resolution spatial feature extraction module for capturing the local spatial dependencies with a channel-wise attention block, an auxiliary information encoding module (AIE) for encoding the exogenous information, and a module for capturing the temporal evolution of demand.
Saxena, D., et al “Multimodal Spatio-Temporal Prediction with Stochastic Adversarial Networks” discloses two components (1) a spatio-temporal correlation network which models spatio-temporal joint distribution of pixels and supports a stochastic sampling of latent variables for multiple plausible futures, and (2) a stochastic adversarial network to jointly learn generation and variational inference of data through implicit distribution modeling.

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Bart I Rylander/
Examiner, Art Unit 2124
Prosecution Timeline

May 16, 2023
Application Filed
Apr 29, 2026
Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/617,994, Patent 12608632: ERROR DETECTION DEVICE, ERROR DETECTION METHOD, AND ERROR DETECTION PROGRAM (4y 4m to grant; granted Apr 21, 2026)
17/509,582, Patent 12555002: RULE GENERATION FOR MACHINE-LEARNING MODEL DISCRIMINATORY REGIONS (4y 3m to grant; granted Feb 17, 2026)
17/204,188, Patent 12530572: Method for Configuring a Neural Network Model (4y 10m to grant; granted Jan 20, 2026)
18/072,677, Patent 12530622: GENERATING NEW DATA BASED ON CLASS-SPECIFIC UNCERTAINTY INFORMATION USING MACHINE LEARNING (3y 1m to grant; granted Jan 20, 2026)
17/956,120, Patent 12493826: AUTOMATIC MACHINE LEARNING FEATURE BACKWARD STRIPPING (3y 2m to grant; granted Dec 09, 2025)
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 65%
With Interview (+13.6%): 79%
Median Time to Grant: 3y 11m (~10m remaining)
PTA Risk: Low
Based on 117 resolved cases by this examiner. Grant probability derived from career allowance rate.