Prosecution Insights
Last updated: April 19, 2026
Application No. 18/206,644

LEARNING DEVICE AND LEARNING METHOD

Non-Final OA: §101, §102
Filed
Jun 07, 2023
Examiner
ABOU EL SEOUD, MOHAMED
Art Unit
2148
Tech Center
2100 — Computer Architecture & Software
Assignee
Honda Motor Co. Ltd.
OA Round
1 (Non-Final)
Grant Probability: 38% (At Risk)
Expected OA Rounds: 1-2
Time to Grant: 4y 2m
Grant Probability With Interview: 77%

Examiner Intelligence

Career Allow Rate: 38% (80 granted / 208 resolved; -16.5% vs Tech Center average)
Interview Lift: +38.7% among resolved cases with an interview
Avg Prosecution: 4y 2m (46 applications currently pending)
Total Applications: 254 across all art units
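To make the arithmetic behind these cards concrete, here is a minimal sketch of how the figures could be derived. The counts (80 granted / 208 resolved) and the +38.7% lift come from this page; the with/without-interview split in the sketch is hypothetical, since only the aggregate lift is reported.

```python
# Illustrative sketch, not the vendor's actual model. Counts come from the
# page above; the with/without-interview split is hypothetical, because the
# source reports only the aggregate +38.7% lift.

granted, resolved = 80, 208
career_allow_rate = granted / resolved                      # ~38.5% -> "38%"

# Lift = allow rate with interview minus allow rate without (hypothetical split).
with_iv = (35, 52)                                          # hypothetical (granted, resolved)
without_iv = (45, 156)                                      # hypothetical (granted, resolved)
lift = with_iv[0] / with_iv[1] - without_iv[0] / without_iv[1]

# The "77% With Interview" figure reads as base rate plus lift in
# percentage points: 38% + 38.7pt ~= 77%.
with_interview = career_allow_rate + 0.387

print(f"career allow rate: {career_allow_rate:.1%}")        # 38.5%
print(f"interview lift (hypothetical split): {lift:+.1%}")  # +38.5%
print(f"projected with interview: {with_interview:.1%}")    # 77.2%
```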

Statute-Specific Performance

§101: 16.1% (-23.9% vs TC avg)
§103: 48.2% (+8.2% vs TC avg)
§102: 15.1% (-24.9% vs TC avg)
§112: 14.7% (-25.3% vs TC avg)
Tech Center averages are estimates. Based on career data from 208 resolved cases.

Office Action

§101, §102
DETAILED ACTION

This office action is responsive to the above-identified application filed 6/7/2023. The application contains claims 1-5, all examined and rejected.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement

The Information Disclosure Statements with references submitted 9/30/2025 and 6/7/2023 have been considered and entered into the file.

Specification

The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Claim limitations in claim 1 have been interpreted under 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA), sixth paragraph, because the claim uses a non-structural term ("unit") coupled with functional language without reciting sufficient structure to achieve the function. Furthermore, the non-structural term is not preceded by a structural modifier. Claim 1 recites the limitations "acquisition unit configured to", "estimation unit configured to", "learning unit configured to", "estimation unit configured to", and "identification unit configured to" coupled with functional language without reciting sufficient structure to achieve the function. Since these claim limitations invoke 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA), sixth paragraph, claim 1 is interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof.

A review of the specification and drawings shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA), sixth paragraph limitations: Fig. 3, elements 1, 11, 13, 14, 15, 16, and pages 8-9, which state, "As shown in FIG. 3, the learning device 1 includes, for example, an acquisition unit 11, a storage unit 12, a discrete latent variable estimation unit 13, an optimal action learning unit 14, a value function estimation unit 15, an identification unit 16, and a processing unit 17". Based on the guidelines announced in Federal Register Vol. 76, No. 27, this has been interpreted as encompassing a hardware or hardware-in-combination-with-software implementation of the unit, but not a pure software implementation.
If applicant wishes to provide further explanation or dispute the examiner's interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action. The claimed units also trigger interpretation of the claim language under 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA), sixth paragraph, since they are considered placeholders for a corresponding structure in the specification.

If applicant does not wish to have the claim limitations treated under 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA), sixth paragraph, applicant may amend the claim so that it will clearly not invoke 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA), sixth paragraph, or present a sufficient showing that the claim recites sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA), sixth paragraph. For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance with 35 U.S.C. § 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-5 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter: the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. While independent claims 1 and 2 are each directed to a statutory category, each recites a series of steps that appears to be directed to an abstract idea (mental process, mathematical concept). Specifically, the claims are directed toward at least one judicial exception without reciting additional elements that amount to significantly more than the judicial exception. The rationale for this determination is in accordance with the guidelines of the USPTO, applies to all statutory categories, and is explained in detail below.

When considering subject matter eligibility under 35 U.S.C. 101, (1) it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter. If the claim does fall within one of the statutory categories, (2a) it must then be determined whether the claim is directed to a judicial exception (i.e., law of nature, natural phenomenon, or abstract idea), and if so, (2b) it must additionally be determined whether the claim is a patent-eligible application of the exception. If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim amounts to significantly more than the abstract idea itself. Examples of abstract ideas include certain methods of organizing human activity, mental processes, and mathematical concepts (2019 PEG).

STEP 1.
Per Step 1, the claims are determined to include a machine and a process, as in independent claims 1 and 2 and the claims depending therefrom. Therefore, the claims are directed to a statutory eligibility category.

At Step 2A, prong 1: the invention is directed to identifying features within received data that could be an indication of the probability of occurrence of a machine failure based on analyzed historic data, which is akin to a mental process (see Alice). As such, the claims include an abstract idea. When considering the limitations individually and as a whole, the limitations directed to the abstract idea are:

Claim 1: "estimate a discrete latent variable representing characteristics of features from the state information and the action information", "learn an optimal action using the state information and the discrete latent variable", "learn an action value from the state information and the action information", "identify the discrete latent variable that maximizes the action value using a result" (mental process: observation, evaluation, and judgment).

Claim 2: "estimation step of estimating a discrete latent variable representing characteristics of features of the dataset from the state information and the action information included in the dataset", "a first learning step of learning an optimal action using the state information and the estimated discrete latent variable", "a second learning step of learning an action value from the state information and the action information", "an identification step of identifying the discrete latent variable that maximizes the action value using a result of learning of the first learning step and a result of learning of the second learning step" (mental process: observation, evaluation, and judgment).

The claims recite additional elements. Claim 1: "A learning device comprising: a dataset acquisition unit", "a discrete latent variable estimation unit", "an optimal action learning unit", "a value function estimation unit", "an identification unit" ("using a computer as a tool to perform a mental process", MPEP 2106.04(a)(2)(III)(C)); "acquire a dataset including state information and action information on which a policy is to be learned" (insignificant extra-solution activity, MPEP 2106.05(g)). Claim 2: "an acquisition step of acquiring a dataset including state information and action information on which a policy is to be learned" (insignificant extra-solution activity, MPEP 2106.05(g)).

This judicial exception is not integrated into a practical application. The elements are recited at a high level of generality, i.e., a generic computing system performing generic functions, including generic processing of data. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore the claims are directed to an abstract idea (2019 Revised Patent Subject Matter Eligibility Guidance, "2019 PEG"). Thus, under Step 2A of the Mayo framework, the Examiner holds that the claims are directed to concepts identified as abstract.

STEP 2B. Because the claims include one or more abstract ideas, the examiner now proceeds to Step 2B of the analysis, in which the examiner considers whether the claims include, individually or as an ordered combination, limitations that are "significantly more" than the abstract idea itself.
This includes analysis as to whether there is an improvement to the "computer itself," "another technology," or the "technical field," or significantly more than what is "well-understood, routine, or conventional" (WURC) in the related arts. The instant application includes, in claim 1, additional steps beyond those deemed to be abstract idea(s). Taken individually, these steps are:

Claim 1: "A learning device comprising: a dataset acquisition unit", "a discrete latent variable estimation unit", "an optimal action learning unit", "a value function estimation unit", "an identification unit" ("using a computer as a tool to perform a mental process", MPEP 2106.05(f)(2)); "acquire a dataset including state information and action information on which a policy is to be learned" (well-understood, routine, conventional activity; sending, receiving, displaying, and processing data are common and basic functions in computer technology, MPEP 2106.05(d)(II)(i)).

Claim 2: "an acquisition step of acquiring a dataset including state information and action information on which a policy is to be learned" (well-understood, routine, conventional activity; sending, receiving, displaying, and processing data are common and basic functions in computer technology, MPEP 2106.05(d)(II)(i)).

In the instant case, claim 1 is directed to the above-mentioned abstract idea. Technical functions such as receiving and extracting are common and basic functions in computer technology. The individual limitations are recited at a high level and do not provide any specific technology or techniques to perform the functions claimed. In addition, when the claims are taken as a whole, as an ordered combination, the combination of steps does not add "significantly more" by virtue of considering the steps as a whole. The additional steps only supplement the abstract ideas with well-understood and conventional functions, and the claims do not show improved ways of operating (for example, unconventional, non-routine functions for analyzing model operations or updating the model) that could be pointed to as being "significantly more" than the abstract ideas themselves. Moreover, the Examiner was not able to identify any "unconventional" steps which, when considered in ordered combination with the other steps, could have transformed the nature of the abstract idea previously identified. The instant application, therefore, still appears only to implement the abstract ideas in particular technological environments using what is well-understood, routine, and conventional (WURC) in the related arts. Further, note that the limitations in the instant claims are performed by generically recited computing devices. The limitations are merely instructions to implement the abstract idea on a computing device that is recited at an abstract level and require no more than generic computing devices performing generic functions.
CONCLUSION

It is therefore determined that the instant application not only represents an abstract idea identified as such based on criteria defined by the Courts and on USPTO examination guidelines, but also lacks the capability to bring about "improvements to another technology or technical field" (Alice), bring about "improvements to the functioning of the computer itself" (Alice), "apply the judicial exception with, or by use of, a particular machine" (Bilski), "effect a transformation or reduction of a particular article to a different state or thing" (Diehr), "add a specific limitation other than what is well-understood, routine and conventional in the field" (Mayo), "add unconventional steps that confine the claim to a particular useful application" (Mayo), or contain "other meaningful limitations beyond generally linking the use of the judicial exception to a particular technological environment" (Alice); nor does it transform a traditionally subjective process performed by humans into a mathematically automated process executed on computers (McRO), or recite limitations directed to improvements in computer-related technology, including claims directed to software (Enfish).

The dependent claims, when considered individually and as a whole, likewise do not provide "significantly more" than the abstract idea, for reasons similar to those given for the independent claims.

Claim 3 discloses "a value function update step of putting the identified discrete latent variable into the second learning step to update the value function", "a latent variable action update step of putting the updated value function into the estimation step and the first learning step to update the discrete latent variable and the optimal action", and "a third learning step of repeating the value function update step and the latent variable action update step to learn the discrete latent variable and the optimal action" (mathematical concept, mental process). It does not integrate the abstract idea into a practical application and does not add significantly more to the abstract idea.

Claim 4 discloses "the learned policy is executed, not all the first learning steps are activated, the discrete latent variable is estimated according to a situation, and a lower policy corresponding to the estimated discrete latent variable is sequentially selected and activated" (mental process). It does not integrate the abstract idea into a practical application and does not add significantly more to the abstract idea.
Claim 5 discloses: wherein, when z is the discrete latent variable, z′ is a next discrete latent variable, s is a state, s′ is a next state, Qw is an estimate of a Q value parameterized by a vector w, y is a target value, r is a reward in learning, γ is a discount factor, θ is a vector representing parameters of a policy, ϕ is a vector representing parameters of a model of a posterior distribution, z̃′ is the next discrete latent variable that has been estimated, fπ is a function that quantifies performance of a policy π, lcvae is a variational lower bound, and a is an action, the estimation step includes calculating the latent variable using [equation image: media_image1.png], the value function update step includes calculating the target value y using [equation image: media_image2.png], the value function update step includes updating an action value function by updating a critic that minimizes [equation image: media_image3.png], and the latent variable action update step includes updating a first model by updating an actor and a posterior distribution to maximize [equation image: media_image4.png] (mathematical concept). It does not integrate the abstract idea into a practical application and does not add significantly more to the abstract idea.

The dependent claims, which impose additional limitations, also fail to claim patent-eligible subject matter because the limitations cannot be considered statutory. The dependent claims have been examined individually and in combination with the preceding claims; however, they do not cure the deficiencies of claims 1 and 2. Where all claims are directed to the same abstract idea, "addressing each claim of the asserted patents [is] unnecessary." Content Extraction & Transmission LLC v. Wells Fargo Bank, Nat'l Ass'n, 776 F.3d 1343, 1348 (Fed. Cir. 2014). If applicant believes the dependent claims are directed towards patent-eligible subject matter, applicant is invited to point out the specific limitations in the claims that are directed towards patent-eligible subject matter. Claims in the other statutory classes are similarly analyzed. For at least these reasons, each of claims 1-5 is directed, directly or indirectly, to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more and is rejected under 35 USC 101.

Claim Rejections - 35 USC § 102

Claims 1-4 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by HIERARCHICAL REINFORCEMENT LEARNING VIA ADVANTAGE-WEIGHTED INFORMATION MAXIMIZATION, published March 2019 [hereinafter D1].

With regard to claim 1, D1 teaches a learning device comprising: a dataset acquisition unit configured to acquire a dataset including state information and action information on which a policy is to be learned (P.6, Algorithm 1, "Record a data sample (s, a, r, s')", "Aggregate the data in DR and Don", "Initialize: Replay buffer DR, on-policy buffer Don"; P.5, Sec. 4, "Instead of stochastic option policies, we consider deterministic option policies and model them using separate neural networks"; P.6, Sec. 4, "Two neural networks trained …");
a discrete latent variable estimation unit configured to estimate a discrete latent variable representing characteristics of features from the state information and the action information (Abstract, "we propose an HRL method that learns a latent variable of a hierarchical policy …", "Our approach can be interpreted as a way to learn a discrete and latent representation of the state-action space"; P.2, Sec. 2.1, "hierarchical policy π(a|s) = Σ_{o∈O} π(o|s)π(a|s, o), where o is the latent variable and O is the set of possible values of o", "In general, latent variable o can be discrete … or continuous", "we propose the learning of the latent variable by maximizing MI between latent variables and state-action pairs"; P.4, Sec. 3.1, "we consider a neural network that estimates p(o|s, a; η) parameterized with vector η, which we refer to as the option network"; the discrete variable o is estimated from state-action pairs (s, a));

an optimal action learning unit configured to learn an optimal action using the state information and the discrete latent variable (P.5, Sec. 4, "π(a|s, o) = µ_θ^o(s)", deterministic option policies parameterized by vector θ; P.6, "Differentiating the objective function in Equation (14), we obtain the deterministic policy gradient of our option-policy µ_θ^o(s) given by ∇θJ(w, θ) = ... (19)"; Algorithm 1, "Update the option policy networks µ_θ^o(s) for o = 1, ..., O with Equation (19)");

a value function estimation unit configured to learn an action value from the state information and the action information (P.5, Sec. 4, "Q^π(s, a; w) is an approximated Q-function parameterized using vector w"; P.6, Sec. 4, "Two neural networks (Q^π_w1, Q^π_w2) are trained to estimate the Q-function"; learning the action value Q(s, a) from state and action data); and

an identification unit configured to identify the discrete latent variable that maximizes the action value using a result from the optimal action learning unit and a result from the value function estimation unit (Abstract, "the gating policy learns to select option policies based on an option-value function"; P.5, Sec. 4, "In HRL, the goal of the gating policy is to generate a value of o that maximizes the conditional expectation of the return … Eq. 15", "option-value function for deterministic option policies is given by … Eq. 16 … Eq. 17"; Algorithm 1, "Draw an option for a given s by following Equation 17: o ∼ π(o|s)"; the identified option o is chosen by a gating rule that depends on the learned option action and the learned Q value).
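For readers unfamiliar with D1's mechanism, the gating step the examiner cites (D1, Eq. 17) can be rendered in a few lines. This is an illustrative toy, not code from D1: the stub option policies, the toy Q-function, and the temperature tau are all assumptions.

```python
import numpy as np

# Toy sketch of D1's softmax gating as characterized above: each option o has
# a deterministic option policy mu_theta^o(s); the gate scores each option by
# Q(s, mu_theta^o(s)) and samples o from a softmax over those scores.

def draw_option(s, option_policies, q_fn, tau=1.0, rng=np.random.default_rng(0)):
    """Draw o ~ pi(o|s) = softmax_o( Q(s, mu^o(s)) / tau )."""
    scores = np.array([q_fn(s, mu(s)) for mu in option_policies])
    logits = scores / tau
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(option_policies), p=probs), probs

# Toy stand-ins for the trained option-policy and Q networks (assumptions).
option_policies = [lambda s, k=k: s + 0.1 * k for k in range(4)]
q_fn = lambda s, a: -(a - 0.25) ** 2

o, probs = draw_option(0.0, option_policies, q_fn)
print(f"selected option {o}; gate probabilities {probs.round(3)}")
```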
With regard to claim 2, D1 teaches a learning method comprising: an acquisition step of acquiring a dataset including state information and action information on which a policy is to be learned (P.6, Algorithm 1, "Record a data sample (s, a, r, s')", "Aggregate the data in DR and Don", "Initialize: Replay buffer DR, on-policy buffer Don");

an estimation step of estimating a discrete latent variable representing characteristics of features of the dataset from the state information and the action information included in the dataset (P.6, Algorithm 1, "Update the option network by minimizing Equation (7) for samples in Don"; P.4, "We formulate the learning of the latent variable o as minimizing … Eq. 7"; P.4, Sec. 3.1, "we consider a neural network that estimates p(o|s, a; η) parameterized with vector η, which we refer to as the option network"; P.2, Sec. 2.1, ¶2, "latent variable o can be discrete");

a first learning step of learning an optimal action using the state information and the estimated discrete latent variable (P.5, Sec. 4, "π(a|s, o) = µ_θ^o(s)", deterministic option policies for o = 1, ..., O; P.6, Algorithm 1, "Update the option policy networks µ_θ^o(s) ... with Equation (19)");

a second learning step of learning an action value from the state information and the action information (P.6, Algorithm 1, "Update the Q network parameter w"; P.6, Sec. 4, "Two neural networks (Q^π_w1, Q^π_w2) are trained to estimate the Q-function, and the target value of the Q-function is computed as yi = ri + γ min_{1,2} Q(si, ai) for sample (si, ai, a′i, ri) in a batch sampled from a replay buffer …"); and

an identification step of identifying the discrete latent variable that maximizes the action value using a result of learning of the first learning step and a result of learning of the second learning step (P.6, Algorithm 1, "Draw an option for a given s by following Equation 17: o ∼ π(o|s)"; "which we can estimate using the deterministic option policy µ_θ^o(s) and the approximated action value function Q^π(s, a; w). In this work we employ the softmax gating policy of the form … Eq. (17)"; P.5, Sec. 4, "In HRL, the goal of the gating policy is to generate a value of o that maximizes the conditional expectation of the return"; the option action output (from the first learning step) and Q(s, a; w) (from the second learning step) are used to select the latent o based on the option value).

With regard to claim 3, D1 teaches the learning method according to claim 2, further comprising: a value function update step of putting the identified discrete latent variable into the second learning step to update the value function (P.5, Sec. 4, "where Q^π(s, a; w) is an approximated Q-function parameterized using vector w"; P.6, "In this study, the Q-function is trained in a manner proposed by Fujimoto et al. (2018).
Two neural networks (Q^π_w1, Q^π_w2) are trained to estimate the Q-function, and the target value of the Q-function is computed as yi = ri + γ min_{1,2} Q(si, ai) for sample (si, ai, a′i, ri) in a batch sampled from a replay buffer …"; P.6, Algorithm 1, "Sample a batch Dbatch ∈ DR", "Update the Q network parameter w");

a latent variable action update step of putting the updated value function into the estimation step and the first learning step to update the discrete latent variable and the optimal action (P.6, Algorithm 1, "Draw an option for a given s by following Equation 17: o ∼ π(o|s)", "Draw an action a ∼ β(a|s, o) = µ_θ^o(s) + ε", "Record a data sample (s, a, r, s′)", "Aggregate the data in DR and Don", "Sample a batch Dbatch ∈ DR", "Update the Q network parameter w"); and

a third learning step of repeating the value function update step and the latent variable action update step to learn the discrete latent variable and the optimal action (P.4, ¶3, "we consider a neural network that estimates p(o|s, a; η) parameterized with vector η, which we refer to as the option network. We formulate the learning of the latent variable o as minimizing Loption(η) = ℓ(η) − λI(o, (s, a); η) (7)"; ¶2, "where A^π(s, a) = Q^π(s, a) − V^π(s) is the advantage function"; P.6, ¶3, "The state value function is given as V^π(s) = Σ_{o∈O} π(o|s) Q^π(s, µ_θ^o(s); w) (18), which can be computed using Equation (17). We use this state-value function when computing the advantage-weighted importance as A(s, a) = Q(s, a) − V(s)"; P.5, ¶3, "We call this importance weight W the advantage-weighted importance and employ it to compute the objective function used to estimate the latent variable"; Algorithm 1, exterior loop "repeat … until the convergence", within which: "Update the option network by minimizing Equation (7) for samples", "Update the Q network parameter w", "Estimate p(o|si, ai) for (si, ai) ∈ Dbatch using the option network", "Assign samples (si, ai) ∈ Dbatch to the option o* = arg max p(o|si, ai)", "Update the option policy networks µ_θ^o(s) for o = 1, ..., O with Equation (19)").

With regard to claim 4, D1 teaches the learning method according to claim 2, wherein, when the learned policy is executed, not all the first learning steps are activated, the discrete latent variable is estimated according to a situation, and a lower policy corresponding to the estimated discrete latent variable is sequentially selected and activated (P.2, Sec. 2.1, "we consider hierarchical policy π(a|s) = Σ_{o∈O} π(o|s)π(a|s, o), where o is the latent variable and O is the set of possible values of o"; P.6, Sec. 4, "When performing a rollout, o is drawn by following the gating policy in Equation (17), and an action is generated by the selected option-policy network"; P.6, ¶3, "In this study, the gating policy determines the option once every N time steps").
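The claim 3 mapping leans on the alternating structure of D1's Algorithm 1. A compressed sketch of that loop, with every update stubbed out, shows the ordering the examiner is pointing at; all function names here are placeholders, not D1's code, and only the outer "repeat ... until the convergence" structure and step ordering are taken from the cited passages.

```python
# Placeholder-stub sketch of the alternating loop in D1's Algorithm 1.

def act(s, o):
    return s + 0.1 * o                       # stub for a ~ mu_theta^o(s) + eps

def update_q(replay):
    pass                                     # "Update the Q network parameter w"

def update_option_network(on_policy):
    pass                                     # "Update the option network ... Equation (7)"

def update_option_policies(replay):
    pass                                     # "Update the option policy networks ... Equation (19)"

def train(num_iterations=3):
    replay, on_policy = [], []
    for _ in range(num_iterations):          # stands in for "repeat ... until the convergence"
        s, o = 0.0, 0                        # rollout stubs: state and drawn option
        a, r, s_next = act(s, o), 0.0, 0.0
        replay.append((s, a, r, s_next))
        on_policy.append((s, a))
        update_q(replay)                     # value function update step
        update_option_network(on_policy)     # latent variable (option) estimation step
        update_option_policies(replay)       # optimal-action (option policy) update step

train()
```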
Examiner notes

Claim 5 is rejected under 35 USC 101. Currently, no art rejection is applied: upon review of the evidence at hand, it is hereby concluded that the evidence obtained and made of record, alone or in combination, neither anticipates, reasonably teaches, nor renders obvious the below-noted features of applicant's invention, as the noted features amount to more than a predictable use of elements in the prior art.

The features include: "The learning method according to claim 3, wherein, when z is the discrete latent variable, z′ is a next discrete latent variable, s is a state, s′ is a next state, Qw is an estimate of a Q value parameterized by a vector w, y is a target value, r is a reward in learning, γ is a discount factor, θ is a vector representing parameters of a policy, ϕ is a vector representing parameters of a model of a posterior distribution, z̃′ is the next discrete latent variable that has been estimated, fπ is a function that quantifies performance of a policy π, lcvae is a variational lower bound, and a is an action, the estimation step includes calculating the latent variable using [equation image: media_image1.png], the value function update step includes calculating the target value y using [equation image: media_image2.png], the value function update step includes updating an action value function by updating a critic that minimizes [equation image: media_image3.png], and the latent variable action update step includes updating a first model by updating an actor and a posterior distribution to maximize [equation image: media_image4.png]".

A remarkable art in this area, HIERARCHICAL REINFORCEMENT LEARNING VIA ADVANTAGE-WEIGHTED INFORMATION MAXIMIZATION (hereinafter D1), teaches hierarchical RL with a discrete latent option and a gating policy that selects option policies using an option value determined from Q values. It also teaches an advantage-weighted importance weighting scheme used to estimate a mutual-information objective for learning a latent representation based on state and action. However, it does not teach the claim 5 requirement that z′ be selected as the argmax over candidate discrete latent variables at the next state (D1 uses a softmax gating policy). D1 also does not teach a CVAE-style variational lower bound objective lcvae(s, a; θ, ϕ) optimized with an actor, as required by claim 5.

Another remarkable teaching, "Goal-Conditioned Variational Autoencoder Trajectory Primitives with Continuous and Discrete Latent Codes", discloses a variational autoencoder (VAE) framework for modeling and generating robot trajectories from demonstrations, including training with a variational lower bound and discrete latent variables. However, it does not teach reinforcement learning, computing a Q-learning target y, using a double-critic minimum, selecting the next discrete latent variable using an argmax over a Q function, or weighting updates by policy performance.

Another remarkable teaching, "Categorical Reparameterization with Gumbel-Softmax" [hereinafter D3], teaches training with discrete latent variables. However, D3 does not teach a Q-learning target y, a double-critic minimum, argmax selection of the next latent variable, or policy-performance weighting.

In addition to the above, the Examiner emphasizes the interrelation of the above distinguishing elements with the remainder of each respective claim, and further notes that it is this interrelation that truly distinguishes applicant's invention from the evidence at hand. However, claim 5 remains rejected under 35 USC 101, and further evaluation will be provided upon receiving the applicant's response.
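The four claim 5 equations survive only as the image placeholders above, so any rendering of them is necessarily speculative. Based on the claim's variable definitions and the examiner's characterization (argmax next-latent selection, double-critic minimum target, critic regression, performance-weighted CVAE lower bound), the update rules might look roughly as follows; none of this should be read as the actual claim language.

```python
# Speculative reconstruction of the claim 5 update rules. Every line is an
# assumption inferred from the variable definitions and the examiner's
# comparison with D1, not the claim text itself.

def estimate_next_latent(q_w, s_next, latents):
    # Estimation step (per the examiner): z~' = argmax_{z'} Qw(s', z').
    return max(latents, key=lambda z: q_w(s_next, z))

def target_value(r, gamma, q_w1, q_w2, s_next, z_next):
    # Value function update step: y = r + gamma * min(Qw1, Qw2)(s', z~'),
    # assuming the "double critic minimum" the examiner contrasts with D1.
    return r + gamma * min(q_w1(s_next, z_next), q_w2(s_next, z_next))

def critic_loss(q_w, batch):
    # Critic update: minimize the squared error (y - Qw(s, z))^2 over a batch.
    return sum((y - q_w(s, z)) ** 2 for s, z, y in batch) / len(batch)

def actor_posterior_objective(f_pi, l_cvae, batch):
    # Latent variable action update step: maximize a policy-performance-
    # weighted variational lower bound, roughly E[f_pi(s, a) * l_cvae(s, a)].
    return sum(f_pi(s, a) * l_cvae(s, a) for s, a in batch) / len(batch)
```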
Conclusion

The prior art made of record and not relied upon is considered pertinent to the applicant's disclosure. Examiner has pointed out particular references contained in the prior art of record in the body of this action for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. It is respectfully requested that the applicant, in preparing the response, fully consider the entire references as potentially teaching all or part of the claimed invention, as well as the context of the passages as taught by the prior art or disclosed by the examiner. It is noted that any citation to specific pages, columns, figures, or lines in the prior art references, and any interpretation of the references, should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMED ABOU EL SEOUD, whose telephone number is (303) 297-4285. The examiner can normally be reached Monday-Thursday, 9:00am-6:00pm MT. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michelle Bechtold, can be reached at (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MOHAMED ABOU EL SEOUD/
Primary Examiner, Art Unit 2148

Prosecution Timeline

Jun 07, 2023
Application Filed
Feb 07, 2026
Non-Final Rejection — §101, §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602602
SYSTEMS AND METHODS FOR VALIDATING FORECASTING MACHINE LEARNING MODELS
2y 5m to grant • Granted Apr 14, 2026
Patent 12578719
PREDICTION OF REMAINING USEFUL LIFE OF AN ASSET USING CONFORMAL MATHEMATICAL FILTERING
2y 5m to grant • Granted Mar 17, 2026
Patent 12561565
MODEL DEPLOYMENT AND OPTIMIZATION BASED ON MODEL SIMILARITY MEASUREMENTS
2y 5m to grant • Granted Feb 24, 2026
Patent 12461702
METHODS AND SYSTEMS FOR PROPAGATING USER INPUTS TO DIFFERENT DISPLAYS
2y 5m to grant • Granted Nov 04, 2025
Patent 12405722
USER INTERFACE DEVICE FOR INDUSTRIAL VEHICLE
2y 5m to grant • Granted Sep 02, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 38%
With Interview: 77% (+38.7%)
Median Time to Grant: 4y 2m
PTA Risk: Low
Based on 208 resolved cases by this examiner. Grant probability derived from career allow rate.
