DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is responsive to pending claims 1-20 filed 6/1/2023.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim(s) 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The 35 U.S.C. 101 subject matter eligibility analysis first asks whether the claim is directed to one of the four statutory categories (Step 1). It next asks whether the claim is directed to an abstract idea (Step 2A), via Prong 1, whether an abstract idea (e.g., mathematical concept, mental process, certain methods of organizing human activity) is recited, and Prong 2, whether it is integrated into a practical application. It finally asks whether the claim as a whole includes additional elements that amount to significantly more than the judicial exception (Step 2B). See MPEP 2106.
STEP 1: The claims fall within one of the four statutory categories:
As all the claims are directed to hardware systems, tangible computer program products, and methods, the claims fall within the statutory categories.
STEP 2A PRONG 1: The claims recite a judicial exception:
The claims are directed to generating policy solutions to Markov Decision Processes, such as via a reinforcement learning (RL) agent, by performing iterative tuning. As such, the claims are directed to a mental process, namely that of using judgment and observation to adjust a process in order to generate a strategy for arriving at a solution. Furthermore, the machine learning model itself is trained via a mathematical process. In the outline below, additional elements are underlined for further analysis. In particular:
For claim 1: A system, comprising:
one or more processors; and
a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to:
iteratively, until one or more Markov Decision Process (MDP) solutions are identified:
receive input data comprising one or more first states and one or more first actions (Receiving of state data for a Markov process may be performed in the mind);
identify, via a machine learning model (MLM), a subset of the input data (identifying subsets of the state data may be performed in the mind);
formulate, via the MLM, a search space based on the subset of the input data, the search space comprising one or more second states and one or more second actions (Formulating and planning a target subspace and target actions in order to identify policy may be performed in the mind);
conduct, via the MLM, hyperparameter tuning of the search space (conducting and generating hyperparameter tuning, such as adjusting hyperparameters, may be performed in the mind);
generate, via the MLM, an MDP instance based on the hyperparameter tuning (generating an MDP instance, i.e., performing training and tuning based on hyperparameters, is a mathematical optimization algorithm, hence, a mathematical concept); and
determine, via the MLM, whether the generated MDP instance comprises a first MDP solution (determining suitability of an MDP instance is an instance of judgment and observation performed in the mind).
For claim 2: The system of claim 1, wherein the input data comprises one or more annotated states, one or more first actions, one or more binning strategies, or combinations thereof (Training parameters may be identified and selected with the mind).
For claim 4: The system of claim 3, wherein the MLM is trained to accept a transformed state search space via feature transformation (Transformed feature spaces, e.g., removing states, etc. based on target policy, may be performed in the mind).
For claim 5: The system of claim 1, wherein identifying the subset of the input data comprises: ranking the one or more first states; and selecting the one or more second states based on the ranked one or more first states (Ranking states via a heuristic may be performed in the mind).
For claim 6: The system of claim 1, wherein the search space further comprises one or more bins configured to reduce a total number of the one or more second states and the one or more second actions thereby reducing an overall search space area (Determining binning policy may be performed in the mind).
For claim 7: The system of claim 1, wherein the instructions are further configured to cause the system to: receive, via a graphical user interface (GUI), a user selection of a specific criteria, wherein determining whether the generated MDP instance comprises the first MDP solution is based on the specific criteria (Determining criteria and evaluation based on criteria is a mental process).
For claim 8: The system of claim 1, wherein the input data is received via a web-based interface (receiving input data may be performed in the mind).
For claim 9: The system of claim 1, wherein conducting the hyperparameter tuning comprises: iteratively, via a search algorithm: generating one or more hyperparameter combinations (generating hyperparameter settings may be performed in the mind); and training the MLM to utilize the one or more hyperparameter combinations (Training an MLM via an optimization algorithm is a mathematical concept).
For claim 10: The system of claim 9, wherein generating the one or more hyperparameter combinations is based on a metric produced by performing a Fitted Q Evaluation of the trained MLM (Generating a metric via FQE is a mathematical process; furthermore, adjusting hyperparameter combinations based on performance metrics is a mental process).
For claim 18: The computer-implemented method of claim 16, wherein the input data is continuously received via a graphical user interface (GUI), and wherein the identifying, formulating, conducting, and generating are conducted dynamically based on the continuously received input data (repeatedly receiving input data for identifying / observing, formulating, conducting may be performed in the mind; the iterative performing of the generating operation is a repeated application of the mathematical concept).
The remaining claims contain analogous limitations and hence are similarly analyzed.
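For illustration only, and without altering the claim construction set forth above, the sequence of steps recited in claim 1 may be sketched as follows. This is a minimal sketch under stated assumptions: the function `find_mdp_solution`, the callables `score` and `evaluate`, the dictionary layout, and the threshold test are all hypothetical and appear in neither the claims nor the cited art.

```python
def find_mdp_solution(input_batches, score, evaluate, hyperparameter_grid,
                      threshold, k=2):
    """Hypothetical sketch of the iterative loop recited in claim 1."""
    for states, actions in input_batches:            # receive input data
        # Identify a subset of the input data (here: the top-k ranked states).
        subset = sorted(states, key=score, reverse=True)[:k]
        # Formulate a search space from that subset.
        search_space = {"states": subset, "actions": list(actions)}
        # Conduct hyperparameter tuning over the search space (exhaustive grid).
        best_hp = max(hyperparameter_grid,
                      key=lambda hp: evaluate(search_space, hp))
        # Generate an MDP instance based on the tuning result.
        instance = {
            "search_space": search_space,
            "hyperparameters": best_hp,
            "score": evaluate(search_space, best_hp),
        }
        # Determine whether the instance qualifies as a solution.
        if instance["score"] >= threshold:
            return instance
    return None  # no batch produced a qualifying instance


# Toy usage with assumed scoring rules.
batches = [([1, 2, 3], ["up", "down"]), ([4, 5, 6], ["up", "down"])]
grid = [{"lr": 0.1}, {"lr": 1.0}]
result = find_mdp_solution(
    batches,
    score=lambda s: s,
    evaluate=lambda space, hp: sum(space["states"]) * hp["lr"],
    hyperparameter_grid=grid,
    threshold=10,
)
```

In this toy run the first batch fails the assumed threshold and the loop proceeds to the second batch, which mirrors the "iteratively, until one or more MDP solutions are identified" language of the claim.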
STEP 2A PRONG 2: The claims do not integrate the exception into a practical application:
The additional elements do not integrate the iterative mental tuning and adjustments of training parameters into a practical application. They are not directed to a particular technical problem, but rather comprise improvements on the mental process itself.
For claims 1, 4, and 19, the additional elements comprise one or more processors; a memory in communication with the one or more processors and storing instructions that are executed by the one or more processors; and performance of the mental steps via an MLM. However, these are mere instructions to implement an abstract idea on a computer or via machine learning algorithms and hence do not constitute an integration into a practical application.
For claim 3, the additional element further recites that the MLM comprises a predictive model. However, this is a mere instruction to implement the mental process on a predictive MLM and hence does not constitute an integration into a practical application.
For claims 7-8 and 17-18, the additional elements further recite that the receiving occurs via a GUI and via a web-based interface. However, these are mere instructions to implement an abstract idea on a computer or networking interface and hence do not constitute an integration into a practical application.
For claim 11, the additional elements include: the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor. However, these are mere instructions to implement an abstract idea on a computer and hence do not constitute an integration into a practical application.
For claim 16, the additional elements include implementation via a computer. However, these are mere instructions to implement an abstract idea on a computer and hence do not constitute an integration into a practical application.
For claim 20, the additional elements include: wherein the input data is received via a JSON file format. However, this is a general linking of the abstract idea to a field of use or particular technology and hence does not serve to meaningfully limit the abstract idea. Hence, it does not constitute an integration into a practical application.
STEP 2B: The claims as a whole do not include additional elements that amount to significantly more than the abstract idea:
For claims 1, 4, and 19, the additional elements comprise one or more processors; a memory in communication with the one or more processors and storing instructions that are executed by the one or more processors; and performance of the mental steps via an MLM. However, these are general purpose computing and machine learning elements that are well-understood, routine and conventional (WURC) and hence do not constitute significantly more.
For claim 3, the additional element further recites that the MLM comprises a predictive model. However, predictive machine learning models are well-understood, routine and conventional (WURC) and hence do not constitute significantly more.
For claims 7-8 and 17-18, the additional elements further recite that the receiving occurs via a GUI and via a web-based interface. However, these are general purpose computing and networking interfaces that are well-understood, routine and conventional (WURC) and hence do not constitute significantly more.
For claim 11, the additional elements include: the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor. However, these are general purpose computing elements that are well-understood, routine and conventional (WURC) and hence do not constitute significantly more.
For claim 16, the additional elements include implementation via a computer. However, these are general purpose computing and networking interfaces that are well-understood, routine and conventional (WURC) and hence do not constitute significantly more.
For claim 20, the additional elements include: wherein the input data is received via a JSON file format. However, JSON is a widely known file format for storing data that is well-understood, routine and conventional (WURC) and hence does not constitute significantly more.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-9, 11-13, 15-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Nair (US 20230281504 A1).
For claim 1, Nair discloses: a system, comprising:
one or more processors (fig.9:910); and
a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to (fig.9:915, 935, 940, 950):
iteratively, until one or more Markov Decision Process (MDP) solutions are identified (fig.2 gives overview of the invention, including executing and evaluating learned MDP policies, see fig.2:260, 0089; 0080-81, with 0089, 0017-18 contemplating an iterative process of patching rules, training agents, etc., such as conducted via the user interface fig.2:210, see also fig.7, 0166-167 disclosing iterative GUI workflow, such as via interface of fig.4-5, 0108):
receive input data comprising one or more first states and one or more first actions (fig.2:230, 0059, 0066-69: action spaces are defined for the RL agent and scenario-specific state spaces are defined);
identify, via a machine learning model (MLM), a subset of the input data (0066-69: based on scenario being tested, a state space is defined, such as a subset or a reduced state space (0067), for the RL model to traverse);
formulate, via the MLM, a search space based on the subset of the input data, the search space comprising one or more second states and one or more second actions (ibid: the RL agent is defined over the search space via step functions, see 0070-71, and initial states, see 0076, in order to learn policies);
conduct, via the MLM, hyperparameter tuning of the search space (0077-78);
generate, via the MLM, an MDP instance based on the hyperparameter tuning (0078); and
determine, via the MLM, whether the generated MDP instance comprises a first MDP solution (0083-87 gives overviews of RL learning of an MDP policy based on a scenario, including maximized reward values, goals achieved, etc., see 0088, hence, solutions).
For claim 2, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein the input data comprises one or more annotated states (0068: states annotated by, e.g., account balance, credit to debt ratio), one or more first actions (0059-62), one or more binning strategies, or combinations thereof.
For claim 3, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein the MLM comprises a predictive model (0084: inferred expected benefit values, such as stored via a neural net or a data structure).
For claim 4, Nair discloses the system of claim 3, as described above. Nair further discloses: wherein the MLM is trained to accept a transformed state search space via feature transformation (0084: RL agents operating on transition values indicating expected future benefit that are encoded as a neural net constitutes a transformed state space, i.e., a state space transformed by the neural net weights to yield outputs).
For claim 5, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein identifying the subset of the input data comprises:
ranking the one or more first states (fig.7:710, 0167 contemplates evaluating overall system strength via a ranking of scenarios, with each scenario representing a subset of state spaces, see 0066-69, hence, state spaces are ranked); and
selecting the one or more second states based on the ranked one or more first states (fig.7:710-725, 0168-: based on the ranking, various scenarios may be selected for tuning).
For claim 6, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein the search space further comprises one or more bins configured to reduce a total number of the one or more second states and the one or more second actions thereby reducing an overall search space area (0062: enforcing transaction constraints, by setting 1000 step increments, constitutes a binning of the search space that reduces total searched states).
For claim 7, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein the instructions are further configured to cause the system to:
receive, via a graphical user interface (GUI), a user selection of a specific criteria (fig.7: 0167-168: user scenario tuning via GUI),
wherein determining whether the generated MDP instance comprises the first MDP solution is based on the specific criteria (fig.7, 0167-168: determine overall strength of scenarios, hence, whether MDP is a solution to the scenario and whether scenario needs to be strengthened based on scenario criteria).
For claim 8, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein the input data is received via a web-based interface (fig.2: REST API is a web-based interface, see 0052; see also 0051: Flask web framework).
For claim 9, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein conducting the hyperparameter tuning comprises:
iteratively, via a search algorithm: generating one or more hyperparameter combinations (fig.7, 0167: iteratively generating hyperparameters via workflow, including interface of fig.4 and hyperparameter selection fig.2:250); and
training the MLM to utilize the one or more hyperparameter combinations (fig.2:250-255).
For claim 12, Nair discloses the computer program product of claim 11, as described above. Nair further discloses: wherein the program instructions further cause the processor to:
identify, by the processor and via the MLM, a subset of the input data by:
ranking the one or more first states (fig.7:710, 0167 contemplates evaluating overall system strength via a ranking of scenarios, with each scenario representing a subset of state spaces, see 0066-69, hence, state spaces are ranked); and
selecting the one or more second states based on the ranked one or more first states (fig.7:710-725, 0168-: based on the ranking, various scenarios may be selected for tuning),
wherein the search space is based on the subset of the input data (ibid: further tuning and evaluation based on ranking, hence, search space is formulated based on scenario).
For claim 18, Nair discloses the method of claim 16, as described above. Nair further discloses: wherein the input data is continuously received via a graphical user interface (GUI) (fig.7, 0166-168 contemplates a continuous and iterative process of refining the scenarios for training, see fig.4 for GUI example and fig.2:250-260 for scenario and training workflow), and wherein the identifying, formulating, conducting, and generating are conducted dynamically based on the continuously received input data (fig.7, fig.2:250-260, 0167-168: iteratively formulating, conducting, and generating the RL space and hyperparameters via custom user-defined scenarios).
For claim 19, Nair discloses the method of claim 16, as described above. Nair further discloses: wherein the MLM comprises a predictive model and is trained to accept a transformed state search space via feature transformation (0084: RL agents operating on transition values indicating expected future benefit that are encoded as a neural net constitutes a transformed state space, i.e., a state space transformed by the neural net weights to yield outputs).
For claim 20, Nair discloses the method of claim 16, as described above. Nair further discloses: wherein the input data is received via a JSON file format (0031).
Claim(s) 11, 13, 15-17 recite computer program products and methods corresponding to the above systems and are hence rejected under the same rationale.
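For illustration of the binning rationale applied to claim 6 above, restricting values to 1000 step increments can be sketched as a simple quantization. This is a minimal sketch under stated assumptions: the helper names `bin_amount` and `binned_state_space` and the sample amounts are hypothetical and are not taken from Nair or from the claims.

```python
def bin_amount(amount, step=1000):
    """Map a non-negative amount to the lower edge of its bin (assumed rule)."""
    return (amount // step) * step


def binned_state_space(amounts, step=1000):
    """Return the distinct binned values, i.e., the reduced state space."""
    return sorted({bin_amount(a, step) for a in amounts})


# Seven raw values collapse into four bins under 1000 step increments.
raw_amounts = [0, 250, 999, 1000, 1500, 2999, 3000]
reduced = binned_state_space(raw_amounts)
```

The reduced list is shorter than the raw list, which is the sense in which fixed increments reduce the total number of searched states.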
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 10, 14 are rejected under 35 U.S.C. 103 as being unpatentable over Nair (US 20230281504 A1) in view of Paine ("Offline Hyperparameter Selection for Offline Reinforcement Learning", published 2020).
For claim 10, Nair discloses the system of claim 9, as described above. Nair does not disclose: wherein generating the one or more hyperparameter combinations is based on a metric produced by performing a Fitted Q Evaluation of the trained MLM.
Paine discloses: wherein generating the one or more hyperparameter combinations is based on a metric produced by performing a Fitted Q Evaluation of the trained MLM (§2 gives overview of the process including hyperparameter selection in ¶2-3 and evaluation via critics trained by FQE, see p.3 ¶1-2, §2.1 ¶1).
It would have been obvious before the effective filing date to a person of ordinary skill in the art to modify the system of Nair by incorporating the FQE based hyperparameter selection technique of Paine. Both concern the art of reinforcement learning, and the incorporation would have, according to Paine, allowed better hyperparameter selection for RL policies, especially in an offline setting (§1 ¶1-2).
Claim 14 recites a computer program product corresponding to the above system and is hence rejected under the same rationale.
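For illustration of how a Fitted Q Evaluation metric can drive hyperparameter selection, as discussed with respect to claim 10, a tabular sketch follows. It is a simplified stand-in under stated assumptions, not Paine's implementation: the toy dataset, the candidate policies (each assumed to result from one hyperparameter combination), and the names `fitted_q_evaluation` and `select_by_fqe` are hypothetical.

```python
from collections import defaultdict


def fitted_q_evaluation(dataset, policy, gamma=0.9, iterations=50):
    """Tabular FQE: repeatedly regress Q(s, a) onto r + gamma * Q(s', pi(s'))."""
    q = defaultdict(float)
    for _ in range(iterations):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next, done in dataset:
            # Bootstrapped target uses the evaluated policy's action at s_next.
            target = r if done else r + gamma * q[(s_next, policy[s_next])]
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # In the tabular case the least-squares fit is the per-pair mean target.
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


def select_by_fqe(candidates, dataset, start_state, gamma=0.9):
    """Return the candidate name with the highest FQE value at start_state."""
    def value(name):
        policy = candidates[name]
        q = fitted_q_evaluation(dataset, policy, gamma)
        return q[(start_state, policy[start_state])]
    return max(candidates, key=value)


# Offline transitions: (state, action, reward, next_state, done).
dataset = [
    (0, "a", 0.0, 1, False),
    (0, "b", 1.0, 0, True),
    (1, "a", 5.0, 1, True),
    (1, "b", 0.0, 1, True),
]
# Each candidate policy is assumed to come from one hyperparameter combination.
candidates = {
    "x": {0: "a", 1: "a"},
    "y": {0: "b", 1: "b"},
    "z": {0: "a", 1: "b"},
}
best = select_by_fqe(candidates, dataset, start_state=0)
```

Ranking candidates by their FQE estimate requires no further interaction with the environment, which is why such a metric suits the offline setting referenced above.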
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Gioioso (US 20220084802 A1) discloses an iterative reinforcement learning tuning technique for tuning medical devices.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LIANG LI whose telephone number is (303)297-4263. The examiner can normally be reached Mon-Fri 9-12p, 3-11p MT (11-2p, 5-1a ET).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. The examiner is available for interviews Mon-Fri 6-11a, 2-7p MT (8-1p, 4-9p ET).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Jennifer Welch can be reached on (571)272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from Patent Center and the Private Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from Patent Center or Private PAIR. Status information for unpublished applications is available through Patent Center or Private PAIR to authorized users only. Should you have questions about access to Patent Center or the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/LIANG LI/
Primary examiner AU 2143