DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This action is in response to the submission filed 13 August 2025 for application 16/176,903. Claims 1, 10, and 19 are amended. Claims 1-20 are currently pending and have been examined.
The objection to claim 1 has been withdrawn in view of the amendments made.
Response to Arguments
Regarding Applicant’s arguments filed 13 August 2025 (pages 8-10) with respect to the 35 U.S.C. § 103 rejection: Applicant argues that claims 1, 10, and 19 have been amended to recite "wherein the adjusting and the applying the training information to obtain the action are performed in a parallel operation, in which the applying the training information occurs for first set of newly generated weights and the adjusting occurs for a set of weights generated immediately prior to the first set of newly generated weights," that support for the amendments can be found at least at paragraph [0042], and that the cited references do not teach or suggest this feature.
Page 6 of the Office Action indicates that Babaeizadeh does not teach that the adjusting and the applying are performed in parallel. Page 7 cites to Nair as teaching some features, but not the "in parallel" feature described above. Applicants have searched Nair and submit that Nair does not teach this feature. Page 8 of the Office Action cites to Dean as teaching the "in parallel" feature. Applicants submit, however, that Dean does not teach the amended version of this feature, which includes additional detail.
In particular, the operations described as being performed "in parallel" in Dean are very generalized. Page 3, section 3, paragraph 1 states that "DistBelief," the software framework, allows for "distributed computation in neural networks...." User-defined computations take place in each node, and each node can be distributed to a different machine. The operations are generally described as being "parallelized." Importantly, however, Dean does not disclose the specific operations recited in claims 1, 10, and 19 provided above; specifically, that "the adjusting and the applying the training information to obtain the action are performed in a parallel operation, in which the applying the training information occurs for first set of newly generated weights and the adjusting occurs for a set of weights generated immediately prior to the first set of newly generated weights." For the foregoing reasons, Applicant submits that the cited references do not teach each and every feature of claims 1, 10, and 19. Thus, Applicant requests withdrawal of the rejections of claims 1, 10, and 19 and all claims dependent thereon.
Examiner’s Response: Applicant’s arguments with respect to the feature “wherein the adjusting and the applying the training information to obtain the action are performed in a parallel operation, in which the applying the training information occurs for first set of newly generated weights and the adjusting occurs for a set of weights generated immediately prior to the first set of newly generated weights,” as recited in independent claim 1 (and similarly in independent claims 10 and 19), have been fully considered but are moot because the new ground of rejection (citing new reference Atiya et al. (New Results on Recurrent Network Training: Unifying the Algorithms and Accelerating Convergence, 2000) for teaching the new limitation) does not rely on any reference combination applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Objections
Claim 1 is objected to because of the following informalities: The phrase “… and and …” (lines 11 and 13) is redundant. Appropriate correction is required.
Claim 1 is objected to because of the following informalities: The phrase “… for first set of newly generated weights …” (line 18) is awkward. Appropriate correction is required.
Claims 1, 10, and 19 are objected to because of the following informalities: The phrase “… the applying the training information …” (last limitation) is awkward. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Babaeizadeh et al. (REINFORCEMENT LEARNING THROUGH ASYNCHRONOUS ADVANTAGE ACTOR-CRITIC ON A GPU, 2017) in view of Nair et al. (Massively Parallel Methods for Deep Reinforcement Learning, 2015) and further in view of Atiya et al. (New Results on Recurrent Network Training: Unifying the Algorithms and Accelerating Convergence, 2000).
Regarding claim 1
Babaeizadeh teaches: A method for training a neural network, the method comprising ([Page 1, Section 1, Paragraph 2] The DNN model is constantly queried to guide the actions of agents whose gameplay in turn feeds DNN training):
applying training information, by one or more training cores to a target neural network having weights stored in a target network weight memory ([Page 3, Section 3.2, Paragraph 1] the central server propagates new weights to the agents. [Page 10, Paragraph 3] Each time a trainer updates the DNN weights. Note: Also see Figure 1(b) where bottom right green square corresponds to weights stored in a target network weight memory and top left green square corresponds to training cores);
applying the training information, by one or more inference cores that have a different architecture than the one or more training cores, to a prediction neural network having weights stored in a prediction network weights memory to obtain an action to be performed in a simulated environment ([Page 3, Section 3.2, Paragraph 1] the central server propagates new weights to the agents. [Page 10, Paragraph 3] Each time a trainer updates the DNN weights. Note: Also see Figure 1(b) where top right green square corresponds to weights stored in a prediction network weights memory, the GPU on the right (pink rectangle) corresponds to inference cores, top left green square corresponds to training cores, and bottom right green square corresponds to target network weights memory. Bottom left square shows obtaining an action to be performed in a simulator corresponding to a simulated environment);
applying the action to the simulated environment to obtain resulting information from the simulated environment ([Page 4] Figure 1(b). Note: Bottom left square shows applying the action to a simulator corresponding to a simulated environment to obtain resulting information);
However, Babaeizadeh is not relied upon to teach: adjusting, by the one or more training cores, the weights of the prediction neural network based on the resulting information and the action, wherein the adjusting and the applying the training information to obtain the action are performed in a parallel operation, in which the applying the training information occurs for first set of newly generated weights and the adjusting occurs for a set of weights generated immediately prior to the first set of newly generated weights.
Nair teaches, in an analogous system: adjusting, by the one or more training cores, the weights of the prediction neural network based on the resulting information and the action ([Page 3, Column 1, Paragraph 1] In the reinforcement learning (RL) paradigm, the agent interacts sequentially with an environment, with the goal of maximising cumulative rewards. At each step t the agent observes state st, selects an action at, and receives a reward rt. The agent’s policy π(a|s) maps states to actions and defines its behavior. [Page 3, Column 1, Last Paragraph] One of the core ideas behind reinforcement learning is to represent the action-value function using a function approximator such as a neural network, Q(s, a) ≈ Q(s, a; θ). The parameters θ of the so-called Q-network are optimized so as to approximately solve the Bellman equation. For example, the Q-learning algorithm iteratively updates the action-value function Q(s, a; θ) towards a sample of the Bellman target, r + γ max a′ Q(s′, a′; θ⁻). [Page 3, Section 3.3, Paragraph 2] Second, DQN maintains two separate Q-networks Q(s, a; θ) and Q(s, a; θ⁻) with current parameters θ and old parameters θ⁻, respectively. The current parameters θ may be updated many times per time-step, and are copied into the old parameters θ⁻ after N iterations. At every update iteration i the current parameters θ are updated so as to minimise the mean-squared Bellman error with respect to the old parameters θ⁻, by optimizing the following loss function (DQN Loss). [Page 3, Section 3.3, Paragraph 3] Specifically, θ is adjusted. Note: See Algorithm 1. Also see Figure 2 where Target Q network corresponds to the target neural network. Q network in the Actor corresponds to the prediction neural network).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for training a prediction neural network of Babaeizadeh to incorporate the teachings of Nair to adjust, by the one or more training cores, the weights of the prediction neural network based on the resulting information and the action. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
Atiya teaches, in an analogous system: wherein the adjusting and the applying the training information to obtain the action are performed in a parallel operation, in which the applying the training information occurs for first set of newly generated weights and the adjusting occurs for a set of weights generated immediately prior to the first set of newly generated weights ([Page 705, Column 1, Section VI, Paragraphs 1 and 2] the following two methods: 1) the BTT approach, which is the fastest of the existing gradient-descent-based algorithms, and 2) the BTT approach [21], which is a very efficient accelerated technique for recurrent networks. The BTT(h, h') is summarized as follows. 1) Run the network for h steps. 2) Propagate backwards for h' steps (h' > h), and update the weights. 3) Run the network for the next h steps, then propagate backwards h' steps, and update the weights. Continue in a similar manner till the end of the data, and then repeat another cycle. [Page 705, Column 2, Paragraph 3] In the first trial we tune the parameter values for each of the methods (all runs start from the same initial weights). The way we have done the comparison is to perform the following for each method. Several runs each with different parameter values (learning rate, etc.) are performed. We then choose the parameter values that lead to fastest convergence. We then fix the parameters on these values, and run ten more trials each with different initial weights. For a particular trial we fix the initial weight configuration across the five methods to make the comparison fair. We record the number of iterations needed to reach particular error levels, and obtain the average for each method. We note that for recurrent networks it is always better to start with small weights, because if we have long sequences for large initial weights the states tend to wander off into the saturation region.
We have generated the initial weights always in the range from −0.2 to 0.2. We have trained all methods for a maximum of 10 000 iterations. If the method did not reach the prespecified error levels by then, then we declare that it failed to converge on this particular trial. Note: Generating the initial weights always in the range from −0.2 to 0.2 corresponds to adjusting the weights generated immediately prior to the first set of newly generated weights. Running the network for the next h steps, then propagating backwards h' steps, and updating the weights corresponds to the applying the training information occurring for the first set of newly generated weights. Running ten more trials each with different initial weights shows the adjusting of the weights generated immediately prior to the first set of newly generated weights in each trial, where each trial corresponds to applying the training information, thereby performing the applying and the adjusting in a parallel operation.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Babaeizadeh and Nair to incorporate the teachings of Atiya wherein the adjusting and the applying the training information to obtain the action are performed in a parallel operation, in which the applying the training information occurs for first set of newly generated weights and the adjusting occurs for a set of weights generated immediately prior to the first set of newly generated weights. One would have been motivated to do this modification because doing so would give the benefit of two methods which are fast and efficient as taught by Atiya [Page 705, Column 1, Section VI, Paragraphs 1 and 2].
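For illustration only (not part of the rejection, and not drawn from Babaeizadeh, Nair, or Atiya), the claimed parallel operation — inference running on the newest weights while training adjusts the immediately prior generation of weights — can be sketched as follows. All function and variable names are the editor's invention:

```python
import threading

def pipeline_step(new_weights, prior_weights, apply_fn, adjust_fn):
    """Run inference on the newest weights while, concurrently, the
    trainer adjusts the weights generated immediately before them."""
    results = {}

    def infer():
        # "the applying the training information occurs for [the] first
        # set of newly generated weights"
        results["action"] = apply_fn(new_weights)

    def train():
        # "the adjusting occurs for a set of weights generated
        # immediately prior to the first set of newly generated weights"
        results["adjusted"] = adjust_fn(prior_weights)

    t1 = threading.Thread(target=infer)
    t2 = threading.Thread(target=train)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results["action"], results["adjusted"]
```

In this sketch the inference and training workloads run on separate threads; on the hypothetical hardware of the claims they would instead run on inference cores and training cores of different architectures.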
Regarding claim 2
The system of Babaeizadeh, Nair, and Atiya teaches: The method of claim 1 (as shown above).
Nair further teaches: wherein adjusting the weights of the prediction neural network includes: sampling, by one or more training cores, one or more tuples from a replay memory, where each tuple includes a state sj, an action aj, a reward for the action rj, and a subsequent state sj+1 ([Page 4, Column 1, Last Paragraph] For each learner update k, a minibatch of experience tuples e = (s, a, r, s') is sampled from either a local or global experience replay memory D (see above)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Babaeizadeh and Atiya to incorporate the teachings of Nair wherein adjusting the weights of the prediction neural network includes: sampling, by one or more training cores, one or more tuples from a replay memory, where each tuple includes a state sj, an action aj, a reward for the action rj, and a subsequent state sj+1. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
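For illustration only, the claimed replay-memory sampling of (state, action, reward, next-state) tuples can be sketched as below. This is an editor's hypothetical, not code from Nair; the class and method names are invented:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity store of (sj, aj, rj, sj+1) experience tuples."""

    def __init__(self, capacity, seed=0):
        # deque(maxlen=...) discards the oldest tuples once full
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # A trainer samples a minibatch of tuples for each update,
        # without replacement within the minibatch.
        return self.rng.sample(list(self.buffer), batch_size)
```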
Regarding claim 3
The system of Babaeizadeh, Nair, and Atiya teaches: The method of claim 2 (as shown above).
Nair further teaches: wherein adjusting the weights of the prediction neural network further includes: applying, by the one or more training cores, state sj+1 to the target neural network and obtaining a highest action score output from the target artificial neural network ([Page 3, Column 1, Paragraph 2] The action-value function Qπ(s, a) is the expected return after observing state st and taking an action under a policy π, Qπ(s, a) = E[Rt | st = s, at = a, π], and the optimal action-value function is the maximum possible value that can be achieved by any policy, Q*(s, a) = max π Qπ(s, a). [Page 5, Column 1, Algorithm 1] Execute the action in the environment and observe the reward rt and the next state st+1. Note: Maximum possible value corresponds to highest action score. Also, note in Algorithm 1 the 'for loop' from t=1 to T).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Babaeizadeh and Atiya to incorporate the teachings of Nair wherein adjusting the weights of the prediction neural network further includes: applying, by the one or more training cores, state sj+1 to the target neural network and obtaining a highest action score output from the target artificial neural network. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
Regarding claim 4
The system of Babaeizadeh, Nair, and Atiya teaches: The method of claim 3 (as shown above).
Nair further teaches: wherein adjusting the weights of the prediction neural network further includes: applying, by the one or more training cores, state sj to the prediction artificial neural network to obtain an action score for action aj ([Page 3, Column 1, Paragraph 2] The action-value function Qπ(s, a) is the expected return after observing state st and taking an action under a policy π, Qπ(s, a) = E[Rt | st = s, at = a, π]. [Page 5, Column 1, Algorithm 1] Initialise the training network for the action-value function Q(s, a; θ) with weights θ and target network Q(s, a; θ⁻) with weights θ⁻ = θ. Note: Action value corresponds to action score).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Babaeizadeh and Atiya to incorporate the teachings of Nair wherein adjusting the weights of the prediction neural network further includes: applying, by the one or more training cores, state sj to the prediction artificial neural network to obtain an action score for action aj. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
Regarding claim 5
The system of Babaeizadeh, Nair, and Atiya teaches: The method of claim 4 (as shown above).
Nair further teaches: wherein adjusting the weights of the prediction neural network further includes: determining, by the one or more training cores, a loss function based on the highest action score output by the target neural network for state sj+1, the action score for action aj output by the prediction neural network, and a reward score rj ([Page 5, Column 1, Algorithm 1] With probability ε take a random action at, or else at = argmax a Q(st, a; θ). Calculate the loss Lt = (yt − Q(st, at; θ))². Note: Also, see Figure 2. DQN Loss corresponds to loss function).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Babaeizadeh and Atiya to incorporate the teachings of Nair wherein adjusting the weights of the prediction neural network further includes: determining, by the one or more training cores, a loss function based on the highest action score output by the target neural network for state sj+1, the action score for action aj output by the prediction neural network, and a reward score rj. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
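For illustration only, the loss determination described in claim 5 — a Bellman target built from the target network's highest action score for sj+1, the reward rj, and the prediction network's score for action aj — can be sketched as follows. The function name and signature are the editor's invention:

```python
def dqn_loss(reward, gamma, target_scores, pred_scores, action, terminal=False):
    """Squared Bellman error for one transition.

    target_scores: action scores output by the target network for s_{j+1}
    pred_scores:   action scores output by the prediction network for s_j
    action:        index of the action a_j actually taken
    """
    # Bellman target y_j: reward plus discounted highest action score
    # from the target network (reward alone if the episode ended).
    y = reward if terminal else reward + gamma * max(target_scores)
    # Loss compares y_j against the prediction network's score for a_j.
    return (y - pred_scores[action]) ** 2
```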
Regarding claim 6
The system of Babaeizadeh, Nair, and Atiya teaches: The method of claim 5 (as shown above).
Nair further teaches: wherein adjusting the weights of the prediction neural network further includes: performing, by the one or more training cores, a gradient descent operation on the loss function with respect to the weights of the prediction neural network ([Page 3, Column 2, Section 3.3, Paragraph 3] For each sample (or minibatch), the current parameters θ are updated by a stochastic gradient descent algorithm. Specifically, θ is adjusted in the direction of the sample gradient gi of the loss with respect to θ).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Babaeizadeh and Atiya to incorporate the teachings of Nair wherein adjusting the weights of the prediction neural network further includes: performing, by the one or more training cores, a gradient descent operation on the loss function with respect to the weights of the prediction neural network. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
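For illustration only, a gradient descent operation on a loss with respect to the prediction-network weights can be sketched as below, using a finite-difference gradient so the example stays self-contained. Both functions are the editor's hypothetical, not code from any cited reference:

```python
def numerical_grad(loss_fn, theta, eps=1e-6):
    """Central-difference estimate of d(loss)/d(theta) per weight."""
    grads = []
    for i in range(len(theta)):
        up = theta[:]; up[i] += eps
        down = theta[:]; down[i] -= eps
        grads.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grads

def sgd_step(theta, loss_fn, lr=0.1):
    """Adjust the weights opposite the sample gradient of the loss."""
    g = numerical_grad(loss_fn, theta)
    return [t - lr * gi for t, gi in zip(theta, g)]
```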
Regarding claim 7
The system of Babaeizadeh, Nair, and Atiya teaches: The method of claim 1 (as shown above).
Nair further teaches: further comprising: periodically updating the weights of the target neural network via a copy engine by copying the weights of the prediction neural network into the target artificial neural network memory ([Page 3, section 3.2] Note: Figure 1 shows "copy every N updates". [Page 4, Figure 4] Note: Figure 4 shows "sync every global N steps" corresponding to copying).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Babaeizadeh and Atiya to incorporate the teachings of Nair to periodically update the weights of the target neural network via a copy engine by copying the weights of the prediction neural network into the target artificial neural network memory. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
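For illustration only, the periodic "copy every N updates" behavior mapped to claim 7 can be sketched as below; the loop, the stand-in weight update, and all names are the editor's invention:

```python
def train_loop(steps, sync_every):
    """Update prediction weights each step; copy them into the target
    network memory every sync_every updates ('copy every N updates')."""
    prediction, target = [0], [0]
    sync_points = []
    for step in range(1, steps + 1):
        prediction = [w + 1 for w in prediction]  # stand-in for an update
        if step % sync_every == 0:
            target = list(prediction)  # copy engine: prediction -> target
            sync_points.append(step)
    return target, sync_points
```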
Regarding claim 8
The system of Babaeizadeh, Nair, and Atiya teaches: The method of claim 1 (as shown above).
Nair further teaches: further comprising: repeating the applying steps and the adjusting step for each step of an episode of training ([Page 5, Column 1, Algorithm 1] for episode = 1 to M do … for t=1 to T. Note: ‘for’ shows repeating steps for each step t of episodes 1 to M and ‘do’ shows the adjusting).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Babaeizadeh and Atiya to incorporate the teachings of Nair to repeat the applying steps and the adjusting step for each step of an episode of training. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
Regarding claim 9
The system of Babaeizadeh, Nair, and Atiya teaches: The method of claim 8 (as shown above).
Nair further teaches: further comprising: performing multiple episodes of training to train the prediction neural network ([Page 5, Column 1, Algorithm 1] for episode = 1 to M (corresponds to multiple episodes)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Babaeizadeh and Atiya to incorporate the teachings of Nair to perform multiple episodes of training to train the prediction neural network. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
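For illustration only, the nested episode/step structure mapped to claims 8 and 9 (the "for episode = 1 to M … for t = 1 to T" loops of Algorithm 1) can be sketched as follows; the function and its step_fn callback are the editor's invention:

```python
def run_training(num_episodes, steps_per_episode, step_fn):
    """Repeat the applying and adjusting steps for each step t of each
    episode, across multiple episodes of training."""
    history = []
    for episode in range(1, num_episodes + 1):        # for episode = 1 to M
        for t in range(1, steps_per_episode + 1):     # for t = 1 to T
            history.append(step_fn(episode, t))       # apply/adjust per step
    return history
```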
Regarding claim 10
Babaeizadeh teaches: A machine learning device for training a neural network, the machine learning device comprising ([Page 1, Section 1, Paragraph 2] The DNN model is constantly queried to guide the actions of agents whose gameplay in turn feeds DNN training):
a set of memories including a prediction network weights memory, and a target network weight memory (Note: Also see Figure 1(b) where bottom right green square corresponds to target network weight memory, top right green square corresponds to prediction network weight memory and both these squares are separate from each other);
one or more training cores configured to apply training information to a target neural network having weights stored in the target network weight memory ([Page 3, Section 3.2, Paragraph 1] the central server propagates new weights to the agents. [Page 10, Paragraph 3] Each time a trainer updates the DNN weights. Note: Also see Figure 1(b) where bottom right green square corresponds to weights stored in a target network weight memory and top left green square corresponds to training cores);
one or more inference cores that have a different architecture than the one or more training cores, the one or more inference cores configured to apply the training information to a prediction neural network having weights stored in a prediction network weights memory to obtain an action to be performed in a simulated environment ([Page 3, Section 3.2, Paragraph 1] the central server propagates new weights to the agents. [Page 10, Paragraph 3] Each time a trainer updates the DNN weights. Note: Also see Figure 1(b) where top right green square corresponds to weights stored in a prediction network weight memory, the GPU on the right (pink rectangle) corresponds to inference cores, top left green square corresponds to training cores. Bottom left square shows obtaining an action to be performed in a simulator corresponding to a simulated environment),
a control core configured to apply the action to the simulated environment to obtain resulting information from the simulated environment ([Page 4] Figure 1(b). Note: Bottom left square shows a CPU core applying the action corresponding to a control core configured to apply the action to a simulator corresponding to a simulated environment to obtain resulting information);
However, Babaeizadeh is not relied upon to teach: a replay memory, and wherein the one or more training cores are configured to adjust the weights of the prediction neural network based on the resulting information and the action, and wherein the adjusting and the applying the training information to obtain the action are performed in a parallel operation, in which the applying the training information occurs for first set of newly generated weights and the adjusting occurs for a set of weights generated immediately prior to the first set of newly generated weights.
Nair teaches, in an analogous system: a replay memory ([Page 4] Note: Figure 2 shows replay memory),
wherein the one or more training cores are configured to adjust the weights of the prediction neural network based on the resulting information and the action ([Page 3, Column 1, Paragraph 1] In the reinforcement learning (RL) paradigm, the agent interacts sequentially with an environment, with the goal of maximising cumulative rewards. At each step t the agent observes state st, selects an action at, and receives a reward rt. The agent’s policy π(a|s) maps states to actions and defines its behavior. [Page 3, Column 1, Last Paragraph] One of the core ideas behind reinforcement learning is to represent the action-value function using a function approximator such as a neural network, Q(s, a) ≈ Q(s, a; θ). The parameters θ of the so-called Q-network are optimized so as to approximately solve the Bellman equation. For example, the Q-learning algorithm iteratively updates the action-value function Q(s, a; θ) towards a sample of the Bellman target, r + γ max a′ Q(s′, a′; θ⁻). [Page 3, Section 3.3, Paragraph 2] Second, DQN maintains two separate Q-networks Q(s, a; θ) and Q(s, a; θ⁻) with current parameters θ and old parameters θ⁻, respectively. The current parameters θ may be updated many times per time-step, and are copied into the old parameters θ⁻ after N iterations. At every update iteration i the current parameters θ are updated so as to minimise the mean-squared Bellman error with respect to the old parameters θ⁻, by optimizing the following loss function (DQN Loss). [Page 3, Section 3.3, Paragraph 3] Specifically, θ is adjusted. Note: See Algorithm 1. Also see Figure 2 where Target Q network corresponds to the target artificial neural network. Learner server has its own processors corresponding to the training cores and also has its own memory corresponding to the target network weight memory. Q network in the Actor corresponds to the prediction artificial neural network. Actor server also has its own memory corresponding to the prediction network weight memory).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for training a prediction neural network of Babaeizadeh to incorporate the teachings of Nair wherein the one or more training cores are configured to adjust the weights of the prediction neural network based on the resulting information and the action. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
Atiya teaches, in an analogous system: wherein the adjusting and the applying the training information to obtain the action are performed in a parallel operation, in which the applying the training information occurs for first set of newly generated weights and the adjusting occurs for a set of weights generated immediately prior to the first set of newly generated weights ([Page 705, Column 1, Section VI, Paragraphs 1 and 2] the following two methods: 1) the BTT approach, which is the fastest of the existing gradient-descent- based algorithms and 2) the BTT approach [21], which is a very efficient accelerated technique for recurrent networks. The BTT (h, h') is summarized as follows. 1) Run h the network for steps. 2) Propagate backwards for h' steps (h' > h), and update the weights. 3) Run the network for the next h steps, then propagate backwards h' steps, and update the weights. Continue in a similar manner till the end of the data, and then repeat another cycle. [Page 705, Column 2, Paragraph 3] In the first trial we tune the parameter values and for each of the methods (all runs start from the same initial weights). The way we have done the comparison is to perform the following for each method. Several runs each with different parameter values (learning rate, etc.) are performed. We then choose the parameter values that lead to fastest convergence. We then fix the parameters on these values, and run ten more trials each with different initial weights. For a particular trial we fix the initial weight configuration across the five methods to make the comparison fair. We record the number of iterations needed to reach particular error levels, and obtain the average for each method. We note that for recurrent networks it is always better to start with small weights, because if we have long sequences for large initial weights the states tend to wander off into the saturation region. 
We have generated the initial weights always in the range from −0.2 to 0.2. We have trained all methods for a maximum of 10 000 iterations. If the method did not reach the prespecified error levels by then, then we declare that it failed to converge on this particular trial. Note: Generating the initial weights always in the range from −0.2 to 0.2 corresponds to adjusting the weights generated immediately prior to the first set of newly generated weights. Run the network for the next h steps, then propagate backwards h' steps, and update the weights corresponds to the applying the training information occurring for a first set of newly generated weights. Run ten more trials each with different initial weights shows the adjusting of weights generated immediately prior to the first set of newly generated weights in each trial, where each trial corresponds to applying the training information, thereby performing the training and adjusting in a parallel operation.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Babaeizadeh and Nair to incorporate the teachings of Atiya wherein the adjusting and the applying the training information to obtain the action are performed in a parallel operation, in which the applying the training information occurs for first set of newly generated weights and the adjusting occurs for a set of weights generated immediately prior to the first set of newly generated weights. One would have been motivated to do this modification because doing so would give the benefit of two methods which are fast and efficient as taught by Atiya [Page 705, Column 1, Section VI, Paragraphs 1 and 2].
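For illustration only (hypothetical code, not drawn from any cited reference; the weight-update rule and action threshold are assumptions), the claimed pipelined relationship may be sketched as follows: at each step, the applying of the training information uses the newly generated weight set while the adjusting operates on the set generated immediately prior.

```python
# Hypothetical sketch of the claimed pipelined operation: at step k, the
# inference core applies the newly generated weights w[k] to obtain an
# action, while the training core adjusts the weights w[k-1] generated
# immediately prior (shown sequentially here; in hardware the two would
# run concurrently). The update rule (+0.1) and threshold are illustrative.

def run_pipeline(steps):
    weights = [0.0]          # w[0]: initial weight set
    log = []
    for k in range(1, steps + 1):
        # Training core: adjust the immediately prior weight set w[k-1].
        adjusted = weights[k - 1] + 0.1
        weights.append(adjusted)        # becomes the newly generated set w[k]
        # Inference core: apply the newly generated weights w[k].
        action = 1 if weights[k] > 0.25 else 0
        log.append((k, round(weights[k], 1), action))
    return log

print(run_pipeline(3))  # three apply/adjust pipeline steps
```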
Regarding claim 11
The system of Babaeizadeh, Nair, and Atiya teaches: The machine learning device of claim 10 (as shown above).
Nair further teaches: wherein adjusting the weights of the prediction neural network includes: sampling, by one or more training cores, one or more tuples from the replay memory, where each tuple includes a state sj, an action aj, a reward for the action rj, and a subsequent state sj+1 ([Page 4, Column 1, Last Paragraph] For each learner update k, a minibatch of experience tuples e = (s, a, r, s') is sampled from either a local or global experience replay memory D (see above)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning device for training a neural network of Babaeizadeh to incorporate the teachings of Nair wherein adjusting the weights of the prediction neural network includes: sampling, by one or more training cores, one or more tuples from the replay memory, where each tuple includes a state sj, an action aj, a reward for the action rj, and a subsequent state sj+1. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
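For illustration only (hypothetical code, not the claimed device; the class name, capacity, and stored values are assumptions), the experience-replay scheme Nair describes — storing tuples (sj, aj, rj, sj+1) and sampling minibatches — may be sketched as:

```python
import random

# Illustrative replay memory holding (state, action, reward, next_state)
# tuples, with minibatch sampling as in an experience-replay scheme.

class ReplayMemory:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tuples = []

    def store(self, state, action, reward, next_state):
        if len(self.tuples) >= self.capacity:
            self.tuples.pop(0)           # discard the oldest experience
        self.tuples.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # uniform random minibatch of experience tuples
        return random.sample(self.tuples, batch_size)

memory = ReplayMemory(capacity=100)
for j in range(5):
    memory.store(j, j % 2, 1.0, j + 1)   # (sj, aj, rj, sj+1)
batch = memory.sample(3)
print(len(batch))  # 3
```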
Regarding claim 12
The system of Babaeizadeh, Nair, and Atiya teaches: The machine learning device of claim 11 (as shown above).
Nair further teaches: wherein adjusting the weights of the prediction neural network further includes: applying, by the one or more training cores, state sj+1 to the target neural network and obtaining a highest action score output from the target neural network ([Page 3, Column 1, Paragraph 2] The action-value function Qπ(s, a) is the expected return after observing state st and taking an action at under a policy π, Qπ(s, a) = E[Rt | st = s, at = a, π], and the optimal action-value function is the maximum possible value that can be achieved by any policy, Q*(s, a) = maxπ Qπ(s, a). [Page 5, Column 1, Algorithm 1] Execute the action in the environment and observe the reward rt and the next state st+1. Note: Maximum possible value corresponds to highest action score. Also, note in algorithm 1 the 'for loop' from t=1 to T).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning device for training a neural network of Babaeizadeh to incorporate the teachings of Nair wherein adjusting the weights of the prediction neural network further includes: applying, by the one or more training cores, state sj+1 to a target neural network having weights stored in a target network weight memory and obtaining a highest action score output from the target neural network. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
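For illustration only (hypothetical code; the linear scoring function and weight values are assumptions, not taken from any cited reference), taking the highest action score output by a target network for a given state may be sketched as:

```python
# Illustrative "target network": one score per action for a given state,
# with the maximum over actions taken as the highest action score.

def target_q_scores(state, weights):
    # simple linear per-action scoring, purely for illustration
    return [w * state for w in weights]

def highest_action_score(state, weights):
    # max over actions, analogous to max_a Q_target(s, a)
    return max(target_q_scores(state, weights))

print(highest_action_score(2.0, [0.5, -0.25, 1.5]))  # 3.0
```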
Regarding claim 13
The system of Babaeizadeh, Nair, and Atiya teaches: The machine learning device of claim 12 (as shown above).
Nair further teaches: wherein adjusting the weights of the prediction neural network further includes: applying, by the one or more training cores, state sj to the prediction neural network to obtain an action score for action aj ([Page 3, Column 1, Paragraph 2] The action-value function Qπ(s, a) is the expected return after observing state st and taking an action at under a policy π, Qπ(s, a) = E[Rt | st = s, at = a, π]. [Page 5, Column 1, Algorithm 1] Initialise the training network for the action-value function Q(s, a; θ) with weights θ and target network Q(s, a; θ⁻) with weights θ⁻ = θ. Note: Action value corresponds to action score).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning device for training a neural network of Babaeizadeh to incorporate the teachings of Nair wherein adjusting the weights of the prediction neural network further includes: applying, by the one or more training cores, state sj to the prediction neural network to obtain an action score for action aj. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
Regarding claim 14
The system of Babaeizadeh, Nair, and Atiya teaches: The machine learning device of claim 13 (as shown above).
Nair further teaches: wherein adjusting the weights of the prediction neural network further includes: determining, by the one or more training cores, a loss function based on the highest action score output by the target neural network for state sj+1, the action score for action aj output by the prediction neural network, and a reward score rj ([Page 5, Column 1, Algorithm 1] With probability ε take a random action at, or else at = argmaxa Q(st, a; θ). Calculate the loss Lt = (yt − Q(st, at; θ))². Note: Also, see Figure 2. DQN Loss corresponds to loss function).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning device for training a neural network of Babaeizadeh to incorporate the teachings of Nair wherein adjusting the weights of the prediction neural network further includes: determining, by the one or more training cores, a loss function based on the highest action score output by the target neural network for state sj+1, the action score for action aj output by the prediction neural network, and a reward score rj. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
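For illustration only (hypothetical code; the discount factor and numeric values are assumptions), a DQN-style loss of the kind Nair describes — a bootstrapped target y = r + γ·max_a Q_target(s', a) compared against the prediction network's score — may be sketched as:

```python
# Illustrative DQN-style loss: combines the reward r_j, the highest action
# score from the target network for s_j+1, and the prediction network's
# score for action a_j, as L = (y - Q(s_j, a_j))^2.

def dqn_loss(reward, max_target_q, predicted_q, gamma=0.99):
    y = reward + gamma * max_target_q   # bootstrapped target value
    return (y - predicted_q) ** 2       # squared temporal-difference error

print(dqn_loss(reward=1.0, max_target_q=2.0, predicted_q=2.5, gamma=0.5))  # 0.25
```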
Regarding claim 15
The system of Babaeizadeh, Nair, and Atiya teaches: The machine learning device of claim 14 (as shown above).
Nair further teaches: wherein adjusting the weights of the prediction neural network further includes: performing, by the one or more training cores, a gradient descent operation on the loss function with respect to the weights of the prediction neural network ([Page 3, Column 2, Section 3.3, Paragraph 3] For each sample (or minibatch), the current parameters θ are updated by a stochastic gradient descent algorithm. Specifically, θ is adjusted in the direction of the sample gradient gi of the loss with respect to θ).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning device for training a neural network of Babaeizadeh to incorporate the teachings of Nair wherein adjusting the weights of the prediction neural network further includes: performing, by the one or more training cores, a gradient descent operation on the loss function with respect to the weights of the prediction neural network. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
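For illustration only (hypothetical code; the one-parameter model, learning rate, and targets are assumptions), a gradient-descent step on a squared loss with respect to a weight may be sketched as:

```python
# Illustrative gradient descent on a squared loss L = (y - w*x)^2, whose
# gradient with respect to the weight w is dL/dw = -2*x*(y - w*x).

def sgd_step(w, x, y, lr):
    grad = -2.0 * x * (y - w * x)   # gradient of the loss w.r.t. weight w
    return w - lr * grad            # adjust the weight against the gradient

w = 0.0
for _ in range(100):
    w = sgd_step(w, x=1.0, y=2.0, lr=0.1)
print(round(w, 3))  # converges toward y/x = 2.0
```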
Regarding claim 16
The system of Babaeizadeh, Nair, and Atiya teaches: The machine learning device of claim 10 (as shown above).
Nair further teaches: further comprising: a copy engine configured to periodically update the weights of the target neural network by copying the weights of the prediction neural network into the target artificial neural network memory ([Page 3, section 3.2] Note: Figure 1 shows "copy every N updates". [Page 4, Figure 4] Note: Figure 4 shows "sync every global N steps" corresponding to copying).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning device for training a neural network of Babaeizadeh to incorporate the teachings of Nair to use a copy engine configured to periodically update the weights of the target neural network by copying the weights of the prediction neural network into the target artificial neural network memory. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
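For illustration only (hypothetical code; the sync interval and weight values are assumptions), the "copy every N updates" behavior shown in Nair's Figure 1 — a copy engine periodically overwriting the target weights with the prediction weights — may be sketched as:

```python
# Illustrative periodic copy: every `sync_every` updates, the target
# network's weights are overwritten with the prediction network's weights.

def train_with_sync(updates, sync_every):
    prediction = 0.0
    target = 0.0
    history = []
    for step in range(1, updates + 1):
        prediction += 1.0                # stand-in for one weight adjustment
        if step % sync_every == 0:       # copy engine fires every N updates
            target = prediction
        history.append(target)
    return history

print(train_with_sync(updates=6, sync_every=3))
```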
Regarding claim 17
The system of Babaeizadeh, Nair, and Atiya teaches: The machine learning device of claim 10 (as shown above).
Nair further teaches: wherein the one or more inference cores, the action selection processor, the tuple storing processor, and the one or more training cores are further configured to: repeat the applying steps and the adjusting for each step of an episode of training ([Page 5, Column 1, Algorithm 1] for episode = 1 to M do … for t=1 to T Note: Shows repeating steps for each step t of episodes 1 to M).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning device for training a neural network of Babaeizadeh to incorporate the teachings of Nair wherein the one or more inference cores, the action selection processor, the tuple storing processor, and the one or more training cores are further configured to: repeat the applying steps and the adjusting for each step of an episode of training. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
Regarding claim 18
The system of Babaeizadeh, Nair, and Atiya teaches: The machine learning device of claim 17 (as shown above).
Nair further teaches: wherein the one or more inference cores, the action selection processor, the tuple storing processor, and the one or more training cores are further configured to: performing multiple episodes of training to train the prediction neural network ([Page 5, Column 1, Algorithm 1] for episode = 1 to M (corresponds to multiple episodes)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning device for training a neural network of Babaeizadeh to incorporate the teachings of Nair wherein the one or more inference cores, the action selection processor, the tuple storing processor, and the one or more training cores are further configured to: performing multiple episodes of training to train the prediction neural network. One would have been motivated to do this modification because doing so would give the benefit of current parameters being updated many times per time-step as taught by Nair [Page 3, Section 3.3, Paragraph 2].
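For illustration only (hypothetical code; the episode and step counts are assumptions), the nested "for episode = 1 to M ... for t = 1 to T" loops of Nair's Algorithm 1 — repeating the applying and the adjusting for each step of each of multiple episodes — may be sketched as:

```python
# Illustrative nested training loops: one apply-and-adjust cycle per step t
# of each episode, repeated over multiple episodes.

def train_episodes(num_episodes, steps_per_episode):
    updates = 0
    for episode in range(1, num_episodes + 1):      # for episode = 1 to M
        for t in range(1, steps_per_episode + 1):   # for t = 1 to T
            updates += 1     # one apply-and-adjust cycle per step
    return updates

print(train_episodes(num_episodes=3, steps_per_episode=4))  # 12 cycles
```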
Regarding claim 19
Babaeizadeh teaches: A computing device for training a prediction neural network, the computing device comprising ([Page 1, Section 1, Paragraph 2] The DNN model is constantly queried to guide the actions of agents whose gameplay in turn feeds DNN training):
a central processor configured to interface with an environment by applying actions to the environment and observing resulting information including states and rewards output by the environment ([Page 2, Section 3.1, Paragraph 1] In standard RL, an agent interacts with an environment over a number of discrete time steps. At each time step t, the agent observes a state st and, in the discrete case, selects an action at from the set of valid actions. An agent is guided by policy π, a function mapping from states st to actions at. After each action, the agent observes the next state st+1 and receives feedback in the form of a reward rt);
and a machine learning device for training the prediction neural network, the machine learning device comprising: a set of memories including a prediction network weights memory, and a target network weight memory (Note: Also see Figure 1(b) where bottom right green square corresponds to target network weight memory, top right green square corresponds to prediction network weight memory and both the green squares are separate from each other);
one or more training cores configured to apply training information to a target neural network having weights stored in the target network weight memory ([Page 3, Section 3.2, Paragraph 1] the central server propagates new weights to the agents. [Page 10, Paragraph 3] Each time a trainer updates the DNN weights. Note: Also see Figure 1(b) where bottom right green square corresponds to weights stored in a target network weight memory and top left green square corresponds to training cores);
one or more inference cores that have a different architecture than the one or more training cores, the one or more inference cores configured to apply the training information to a prediction neural network having weights stored in the prediction network weights memory to obtain an action to be performed in the environment ([Page 3, Section 3.2, Paragraph 1] the central server propagates new weights to the agents. [Page 10, Paragraph 3] Each time a trainer updates the DNN weights. Note: Also see Figure 1(b) where top right green square corresponds to weights stored in a prediction network weight memory, the GPU on the right (pink rectangle) corresponds to inference cores, top left green square corresponds to training cores. Bottom left square shows obtaining an action to be performed in a simulator corresponding to a simulated environment),
wherein the central processor is configured to apply the action to the environment to obtain resulting information from the environment ([Page 4] Figure 1(b). Note: Bottom left square shows applying the action to a simulator corresponding to a simulated environment to obtain resulting information);
However, Babaeizadeh does not teach: a replay memory, and wherein the one or more training cores are configured to adjust the weights of the prediction neural network based on the resulting information and the action, wherein the adjusting and the applying the training information to obtain the action are performed in a parallel operation, in which the applying the training information occurs for first set of newly generated weights and the adjusting occurs for a set of weights generated immediately prior to the first set of newly generated weights.
Nair teaches, in an analogous system: a replay memory ([Page 4] Note: Figure 2 shows replay memory);
wherein the one or more training cores are configured to adjust the weights of the prediction neural network based on the resulting information and the action ([Page 3, Column 1, Paragraph 1] In the reinforcement learning (RL) paradigm, the agent interacts sequentially with an environment, with the goal of maximising cumulative rewards. At each step t the agent observes state st, selects an action at, and receives a reward rt. The agent’s policy (ajs) maps states to actions and defines its behavior. [Page 3, Column 1, Last Paragraph] One of the core ideas behind reinforcement learning is to represent the action-value function using a function approximator such as a neural network, Q(s; a) = Q(s; a; ). The parameters of the so-called Q-network are optimized so as to approximately solve the Bellman equation. For example, the Q-learning algorithm iteratively updates the act