Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Status of Claims
This action is in reply to the amendments filed on January 5, 2026.
The present application is a continuation of application 15/619,393 (now Patent No. 11,049,008), filed June 9, 2017, which is a continuation of application 14/097,862 (now Patent No. 9,679,258), filed December 5, 2013, which claims priority to provisional application 61/888,247, filed October 8, 2013.
Claims 2-21 are currently pending.
Claims 2-9, 11-18, 20, and 21 have been amended.
Claim Rejections - 35 USC § 112
The previous rejections of claims 2-21 regarding the second neural network and of claims 3, 12, and 21 regarding the term “likely” are withdrawn in view of Applicant’s amendments.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 2-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
When considering subject matter eligibility under 35 U.S.C. § 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong One), it is determined whether the claims recite a judicial exception (e.g., mathematical concepts, mental processes, or certain methods of organizing human activity). If it is determined in Step 2A, Prong One that the claims recite a judicial exception, the analysis proceeds to the second prong (Step 2A, Prong Two), where it is determined whether the claims integrate the judicial exception into a practical application. If it is determined in Step 2A, Prong Two that the claims do not integrate the judicial exception into a practical application, the analysis proceeds to determining whether the claim is a patent-eligible application of the exception (Step 2B). If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim integrates the judicial exception into a practical application, or else amounts to significantly more than the abstract idea itself.
According to Step 1 of the analysis, in the instant case claims 2-10 are directed to a method, claims 11-19 are directed to a system, and claims 20 and 21 are directed to one or more non-transitory computer-readable storage media. Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Considering independent claim 2 and Step 2A, Prong One, the limitations including “generating a transition as a result of an interaction with an environment” and “sampling a transition from the replay memory in accordance with a sampling strategy that prioritizes different transitions in the replay memory differently” cover processes that can practically be performed in the human mind. That is, nothing in these claim elements precludes the steps from practically being performed in the mind.
MPEP 2106.04(a)(2)(III) notes that “the ‘mental processes’ abstract idea grouping is defined as concepts performed in the human mind and examples of mental processes include observations, evaluations, judgments, and opinions.” The “generating” step is an evaluation/judgment, and thus a mental step. Further, the “sampling” step in claim 2 is an evaluation, judgment, and/or opinion, and thus a mental step. Therefore, the claim contains abstract elements and the analysis continues.
Considering Step 2A, Prong Two, the judicial exception in claim 2 is not integrated into a practical application. Claim 2 includes the additional elements: “maintaining a replay memory that stores a plurality of transitions, each transition comprising respective starting state data defining a respective starting state of an environment, respective action data defining a respective action from a set of actions, and respective next state data defining a respective next state of the environment resulting from the respective action being performed in the environment when the environment is in the respective starting state;” and “training a neural network on at least the sampled transition, wherein the neural network is a deep neural network that is configured to receive an input comprising input state data and to generate as output a respective action-value parameter for each action in the set of actions.” The replay memory is recited at a high level of generality and amounts to mere instructions to implement the abstract idea on a computer; see MPEP 2106.05(f). Additionally, the recited transition details do not integrate the abstract idea into a practical application because they are insignificant extra-solution activity, mere data gathering; see MPEP 2106.05(g). Further, the training of the neural network is insignificant post-solution activity; see MPEP 2106.05(g).
Considering Step 2B, the additional elements do not amount to significantly more. The replay memory is recited at a high level of generality and amounts to a generic computer component that is well-understood, routine, and conventional, i.e., storing and retrieving information in memory, as detailed in MPEP 2106.05(d)(II). Additionally, the recited transition details are insignificant extra-solution activity, mere data gathering, and do not amount to significantly more; see MPEP 2106.05(g). Further, the neural network training is insignificant post-solution activity; see MPEP 2106.05(g).
Therefore, claim 2 is ineligible in view of 35 U.S.C. 101.
Considering claim 3, dependent on claim 2, and Step 2A, Prong One, the limitation “the sampling strategy prioritizes transitions based on how much the neural network will learn from being trained on the transitions” covers a process that can practically be performed in the human mind. MPEP 2106.04(a)(2)(III) notes that “the ‘mental processes’ abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions.” Claim 3’s sampling is an evaluation, judgment, and/or opinion.
Claim 3 does not contain any new additional elements and, therefore, is not integrated into a practical application or amount to significantly more under Step 2A, Prong Two, and Step 2B. Therefore, claim 3 is ineligible in view of 35 U.S.C. 101.
Considering claim 4, dependent on claim 2, and Step 2A, Prong One, the limitations including “processing an input comprising the respective starting state data in the sampled transition … to generate an action-value parameter for the respective action defined by the respective action data in the sampled transition;”, “generating, from at least the respective next state data in the sampled transition, a target value for the neural network”, and “a loss function that depends on a difference between the target value and the action-value parameter for the respective action defined by the respective action data in the sampled transition” cover processes that can practically be performed in the human mind. MPEP 2106.04(a)(2)(III) notes that “the ‘mental processes’ abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions.” The input processing, action-value generation, target value generation, and loss function optimization are observations, evaluations, and/or judgments.
Claim 4 recites first and second neural networks, but the recitation is at a high level of generality, amounts to insignificant post-solution activity, and does not integrate the abstract idea into a practical application; see MPEP 2106.05(g). Further, the neural network aspects are insignificant post-solution activity and do not amount to significantly more than the abstract idea; see MPEP 2106.05(g). Therefore, claim 4 is ineligible in view of 35 U.S.C. 101.
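For illustration only, and not as a characterization of the record, the loss recited in claim 4 can be sketched as a function of the difference between a target value and the network’s action-value output for the action taken; the squared form and all names below are hypothetical, chosen only to show the structure of such a loss:

```python
# Illustrative sketch (hypothetical names, not drawn from the claims):
# a loss that depends on the difference between a target value and the
# action-value parameter for the action defined in a sampled transition.
def td_loss(target_value, action_value):
    diff = target_value - action_value
    return diff * diff  # squared error on the difference

print(td_loss(1.5, 1.0))  # 0.25
```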
Considering claim 5, dependent on claim 4, the claim does not recite any additional abstract idea. Claim 5 includes only additional elements that have been previously discussed in the rejection above. Therefore, claim 5 is ineligible in view of 35 U.S.C. 101.
Considering claim 6, dependent on claim 4, and Step 2A, Prong One, the limitation “generating, from at least the respective next state data in the sampled transition, a target value for the second neural network” covers a process that can practically be performed in the human mind. MPEP 2106.04(a)(2)(III) notes that “the ‘mental processes’ abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions.” Claim 6’s generating is an evaluation, judgment, and/or opinion.
Claim 6 does not contain any new additional elements and, therefore, is not integrated into a practical application or amount to significantly more under Step 2A, Prong Two, and Step 2B. Therefore, claim 6 is ineligible in view of 35 U.S.C. 101.
Considering claim 7, dependent on claim 6, and Step 2A, Prong One, the limitation “identifying a maximum next action-value parameter from the respective next action-value parameters” covers a process that can practically be performed in the human mind. MPEP 2106.04(a)(2)(III) notes that “the ‘mental processes’ abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions.” Claim 7’s identifying is an evaluation, judgment, and/or opinion.
Claim 7 does not contain any new additional elements and, therefore, is not integrated into a practical application or amount to significantly more under Step 2A, Prong Two, and Step 2B. Therefore, claim 7 is ineligible in view of 35 U.S.C. 101.
Considering claim 8, dependent on claim 7, and Step 2A, Prong One, the limitation “determining the target value from the reward value and the identified maximum next action-value parameter” covers a process that can practically be performed in the human mind. MPEP 2106.04(a)(2)(III) notes that “the ‘mental processes’ abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions.” Claim 8’s determining is an evaluation, judgment, and/or opinion.
Claim 8 does not contain any new additional elements and, therefore, is not integrated into a practical application or amount to significantly more under Step 2A, Prong Two, and Step 2B. Therefore, claim 8 is ineligible in view of 35 U.S.C. 101.
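As a purely illustrative sketch of the determinations recited in claims 7 and 8 (all names are hypothetical, and the discount factor is an assumption of this sketch, not a recited claim feature), the target-value computation has the following structure:

```python
# Illustrative sketch (hypothetical names; discount factor is an
# assumption of this sketch): claim 7 identifies the maximum next
# action-value parameter; claim 8 determines the target value from
# the reward value and that identified maximum.
def target_value(reward, next_action_values, discount=0.99):
    max_next_q = max(next_action_values)   # claim 7: identify the maximum
    return reward + discount * max_next_q  # claim 8: combine with the reward

t = target_value(1.0, [0.2, 0.5, 0.1])  # reward 1.0, maximum next value 0.5
```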
Considering claim 9, dependent on claim 2, the claim does not recite any additional abstract idea. Claim 9 includes only additional elements that have been previously discussed in the rejection above. Therefore, claim 9 is ineligible in view of 35 U.S.C. 101.
Considering claim 10, dependent on claim 9, the claim does not recite any additional abstract idea. Claim 10 includes only additional elements that have been previously discussed in the rejection above. Therefore, claim 10 is ineligible in view of 35 U.S.C. 101.
Claim 11 is similar to claim 2 above and rejected for the same reasons as claim 2.
Claim 12 is similar to claim 3 above and rejected for the same reasons as claim 3.
Claim 13 is similar to claim 4 above and rejected for the same reasons as claim 4.
Claim 14 is similar to claim 5 above and rejected for the same reasons as claim 5.
Claim 15 is similar to claim 6 above and rejected for the same reasons as claim 6.
Claim 16 is similar to claim 7 above and rejected for the same reasons as claim 7.
Claim 17 is similar to claim 8 above and rejected for the same reasons as claim 8.
Claim 18 is similar to claim 9 above and rejected for the same reasons as claim 9.
Claim 19 is similar to claim 10 above and rejected for the same reasons as claim 10.
Claim 20 is similar to claim 2 above and rejected for the same reasons as claim 2. Claim 20 includes the new additional element “non-transitory computer-readable storage media,” but this element does not integrate the abstract idea into a practical application (the media is recited at a high level of generality and amounts to mere instructions to implement the abstract idea on a computer; see MPEP 2106.05(f)), nor does it amount to significantly more (it amounts to a generic computer component that is well-understood, routine, and conventional, i.e., storing and retrieving information in memory, as detailed in MPEP 2106.05(d)(II)).
Claim 21 is similar to claim 3 above and rejected for the same reasons as claim 3.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 2, 3, 9-12, and 18-21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gabel et al., “Improved Neural Fitted Q Iteration Applied to a Novel Computer Gaming and Learning Benchmark” (“Gabel”).
With respect to independent claim 2, Gabel teaches:
A method performed by one or more computers, the method comprising:
generating a transition as a result of an interaction with an environment (Gabel teaches that sampling realizes interaction with the environment and creates a set of transition tuples; see the second paragraph in the left column on page 2.);
storing the transition in a replay memory that stores a plurality of transitions, each transition comprising respective starting state data defining a respective starting state of an environment, respective action data defining a respective action from a set of actions, and respective next state data defining a respective next state of the environment resulting from the respective action being performed in the environment when the environment is in the respective starting state (Gabel teaches a set of transition tuples, each consisting of a state (i.e., starting state), an action taken in that state (i.e., action data), the immediate reward received, as well as the successor state entered (i.e., next state); see left column on page 2.);
sampling a transition from the plurality of transitions in the replay memory in accordance with a sampling strategy that prioritizes different transitions in the replay memory differently (Gabel teaches the step of sampling experience that creates a set of transition tuples in figure 1 and the left column on page 2. Gabel further teaches sampling methods including greedy or exploring policy in the left column on page 2. At least a greedy sampling method prioritizes different data differently.); and
training a neural network on at least the sampled transition, wherein the neural network is a deep neural network that is configured to receive an input comprising input state data and to generate as output a respective action-value parameter for each action in the set of actions (Gabel teaches training phases in section II and training a neural network in II.B. Further, Gabel teaches a neural fitted Q iteration to produce Q values; see section III.).
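As a purely illustrative sketch of the claimed arrangement in claim 2 (all names, priority values, and the proportional-selection rule below are hypothetical, not drawn from the claims or from Gabel), a replay memory with a sampling strategy that prioritizes different transitions differently has the following structure:

```python
import random
from collections import namedtuple

# Illustrative sketch (hypothetical names): a replay memory storing
# transitions generated by interaction with an environment, and a
# sampling strategy that prioritizes different transitions differently.
Transition = namedtuple("Transition", "start_state action next_state")

replay_memory = []
priorities = []

def store(transition, priority):
    replay_memory.append(transition)
    priorities.append(priority)

def sample_prioritized():
    # Selection probability proportional to priority, so transitions
    # with higher priority are drawn more often than others.
    return random.choices(replay_memory, weights=priorities, k=1)[0]

store(Transition(start_state=0, action="a", next_state=1), priority=2.0)
store(Transition(start_state=1, action="b", next_state=2), priority=0.5)
sampled = sample_prioritized()  # sampled transition used for training
```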
With respect to claim 3 the rejection of claim 2 is incorporated. Further, Gabel teaches:
wherein the sampling strategy prioritizes transitions based on how much the neural network will learn from being trained on the transitions (Gabel teaches sampling methods including greedy or exploring policy in the left column on page 2. A greedy sampling method prioritizes different data differently.).
With respect to claim 9 the rejection of claim 2 is incorporated. Further, Gabel teaches:
generating a new transition using the neural network (Gabel teaches generating new state transitions using the NFQ iterations in section III.A.); and
storing the new transition in the replay memory (Gabel teaches a computer implemented system that would include a memory; see abstract.).
With respect to claim 10 the rejection of claim 9 is incorporated. Further Gabel teaches:
after storing the new transition in the replay memory, discarding one or more of the plurality of transitions from the replay memory (Gabel teaches filtering out transitions that comprise improper policies in figures 5 and 6.).
With respect to independent claim 11 Gabel teaches:
A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers (Gabel teaches a computer-implemented system; see abstract.) cause the one or more computers to perform operations comprising:
generating a transition as a result of an interaction with an environment (Gabel teaches that sampling realizes interaction with the environment and creates a set of transition tuples; see the second paragraph in the left column on page 2.);
storing the transition in a replay memory that stores a plurality of transitions, each transition comprising respective starting state data defining a respective starting state of the environment, respective action data defining a respective action from a set of actions, and respective next state data defining a respective next state of the environment resulting from the respective action being performed in the environment when the environment is in the respective starting state (Gabel teaches a set of transition tuples, each consisting of a state (i.e., starting state), an action taken in that state (i.e., action data), the immediate reward received, as well as the successor state entered (i.e., next state); see left column on page 2.);
sampling a transition from the plurality of transitions in the replay memory in accordance with a sampling strategy that prioritizes different transitions in the replay memory differently (Gabel teaches the step of sampling experience that creates a set of transition tuples in figure 1 and the left column on page 2. Gabel further teaches sampling methods including greedy or exploring policy in the left column on page 2. At least a greedy sampling method prioritizes different data differently.); and
training a neural network on at least the sampled transition, wherein the neural network is a deep neural network that is configured to receive an input comprising input state data and to generate as output a respective action-value parameter for each action in the set of actions (Gabel teaches training phases in section II and training a neural network in II.B. Further, Gabel teaches a neural fitted Q iteration to produce Q values; see section III.).
With respect to claim 12 the rejection of claim 11 is incorporated. Further Gabel teaches:
wherein the sampling strategy prioritizes transitions based on how much the neural network will learn from being trained on the transitions (Gabel teaches sampling methods including greedy or exploring policy in the left column on page 2. A greedy sampling method prioritizes different data differently.).
With respect to claim 18 the rejection of claim 11 is incorporated. Further Gabel teaches:
generating a new transition using the neural network (Gabel teaches generating new state transitions using the NFQ iterations in section III.A.); and
storing the new transition in the replay memory (Gabel teaches a computer implemented system that would include a memory; see abstract.).
With respect to claim 19 the rejection of claim 18 is incorporated. Further Gabel teaches:
after storing the new transition in the replay memory, discarding one or more of the plurality of transitions from the replay memory (Gabel teaches filtering out transitions that comprise improper policies in figures 5 and 6.).
With respect to independent claim 20 Gabel teaches:
One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers (Gabel teaches a computer-implemented system; see abstract.) cause the one or more computers to perform operations comprising:
generating a transition as a result of an interaction with an environment (Gabel teaches that sampling realizes interaction with the environment and creates a set of transition tuples; see the second paragraph in the left column on page 2.);
storing the transition in a replay memory that stores a plurality of transitions, each transition comprising respective starting state data defining a respective starting state of the environment, respective action data defining a respective action from a set of actions, and respective next state data defining a respective next state of the environment resulting from the respective action being performed in the environment when the environment is in the respective starting state (Gabel teaches a set of transition tuples, each consisting of a state (i.e., starting state), an action taken in that state (i.e., action data), the immediate reward received, as well as the successor state entered (i.e., next state); see left column on page 2.);
sampling a transition from the plurality of transitions in the replay memory in accordance with a sampling strategy that prioritizes different transitions in the replay memory differently (Gabel teaches the step of sampling experience that creates a set of transition tuples in figure 1 and the left column on page 2. Gabel further teaches sampling methods including greedy or exploring policy in the left column on page 2. At least a greedy sampling method prioritizes different data differently.); and
training a neural network on at least the sampled transition, wherein the neural network is a deep neural network that is configured to receive an input comprising input state data and to generate as output a respective action-value parameter for each action in the set of actions (Gabel teaches training phases in section II and training a neural network in II.B. Further, Gabel teaches a neural fitted Q iteration to produce Q values; see section III.).
With respect to claim 21 the rejection of claim 20 is incorporated. Further Gabel teaches:
wherein the sampling strategy prioritizes transitions based on how much the neural network will learn from being trained on the transitions (Gabel teaches sampling methods including greedy or exploring policy in the left column on page 2. A greedy sampling method prioritizes different data differently.).
Allowable Subject Matter
Claims 4-8 and 13-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the other rejections are overcome. The prior art does not teach the features of claims 4 and 13.
Response to Arguments
Applicant's arguments filed January 5, 2026 have been fully considered but they are not persuasive.
Beginning on page 8 of the remarks, Applicant argues that the claims are eligible because they solve the problem of how to effectively apply reinforcement learning to large data sets. MPEP 2106.05(a) indicates that “[a]n indication that the claimed invention provides an improvement can include a discussion in the specification that identifies a technical problem and explains the details of an unconventional technical solution expressed in the claim,” and “[a]n important consideration in determining whether a claim improves technology is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome.” The claims do not detail a particular solution, as the quantity or type of data and the training of the neural network are not recited in particular detail. As such, the details are not expressed in the claims and the arguments are not persuasive.
Beginning on page 9 of the remarks, Applicant argues that the amended claims overcome the previously cited art, Gabel. Applicant argues that Gabel’s teaching of greedy sampling impacts how the tuples are created and not how transitions are sampled from the replay memory. Gabel teaches greedy sampling, which prioritizes samples according to some measure. As claimed, the sampling broadly recites prioritizing transition samples, and greedy sampling is a prioritization scheme. Gabel therefore reads on the claimed sampling features in claim 2. Additionally, Gabel teaches the newly amended features in claim 2, as shown above. Applicant’s arguments are not persuasive.
The previous rejections of claims under 35 U.S.C. 112(b) are withdrawn in view of Applicant’s amendments and arguments on page 9.
Conclusion
Claims 2-21 are rejected.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL T PELLETT whose telephone number is (571)270-7156. The examiner can normally be reached on Monday - Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DANIEL T PELLETT/Primary Examiner, Art Unit 2121