Last updated: May 29, 2026

Application No. 18/327,156

SYSTEMS AND METHODS FOR IDENTIFYING MARKOV DECISION PROCESS SOLUTIONS

Non-Final OA §101 §102 §103

Filed: Jun 01, 2023
Examiner: LI, LIANG Y
Art Unit: 2143
Tech Center: 2100 (Computer Architecture & Software)
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)

This examiner grants 61% of cases after interview (+69.2% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 274 resolved cases, 2023–2026

Examiner Intelligence

LI, LIANG Y

Career Allowance Rate: grants 61% of resolved cases (168 granted / 274 resolved), +6.3% vs TC avg

Strong +69% interview lift (+69.2%, resolved cases with interview vs. without)

Typical timeline: 3y 3m average prosecution; 17 currently pending

Career history: 303 total applications across all art units

Statute-Specific Performance

§101: 0.7% (-39.3% vs TC avg)
§103: 89.6% (+49.6% vs TC avg)
§102: 9.4% (-30.6% vs TC avg)
§112: 0.2% (-39.8% vs TC avg)

Based on career data from 274 resolved cases

Office Action

§101 §102 §103

DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is responsive to pending claims 1-20 filed 6/1/2023.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim(s) 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The 35 U.S.C. 101 subject matter eligibility analysis first asks whether the claim is directed to one of the four statutory categories (Step 1). It next asks whether the claim is directed to an abstract idea (Step 2A): under Prong 1, whether an abstract idea (e.g., mathematical concept, mental process, certain methods of organizing human activity) is recited, and under Prong 2, whether it is integrated into a practical application. It finally asks whether the claim as a whole includes additional elements that amount to significantly more than the judicial exception (Step 2B). See MPEP 2106.

STEP 1: The claims fall within one of the four statutory categories:
	As all the claims are directed to hardware systems, tangible computer program products, and methods, the claims fall within the statutory categories.

	STEP 2A PRONG 1: The claims recite a judicial exception:
	The claims are directed to generating policy solutions to Markov Decision Processes, such as via a reinforcement learning (RL) agent, by performing iterative tuning. As such, the claims are directed to a mental process, that of using judgment and observation to adjust a process in order to generate a strategy that yields a solution. Furthermore, the machine learning model itself is trained via a mathematical process. In the outline below, additional elements are underlined for further analysis. In particular:

	For claim 1: A system, comprising:
	one or more processors; and
	a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to:
	iteratively, until one or more Markov Decision Process (MDP) solutions are identified:
		receive input data comprising one or more first states and one or more first actions (Receiving of state data for a Markov process may be performed in the mind);
		identify, via a machine learning model (MLM), a subset of the input data (identifying subsets of the state data may be performed in the mind);
		formulate, via the MLM, a search space based on the subset of the input data, the search space comprising one or more second states and one or more second actions (Formulating and planning a target subspace and target actions in order to identify policy may be performed in the mind);
		conduct, via the MLM, hyperparameter tuning of the search space (conducting and generating hyperparameter tuning, such as adjusting hyperparameters, may be performed in the mind);
		generate, via the MLM, an MDP instance based on the hyperparameter tuning (generating an MDP instance, i.e., performing training and tuning based on hyperparameters, is a mathematical optimization algorithm, hence, a mathematical concept); and
		determine, via the MLM, whether the generated MDP instance comprises a first MDP solution (determining suitability of an MDP instance is an instance of judgment and observation performed in the mind).
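	The iterative structure recited in claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `MDPInstance` container, the value-based ranking heuristic, and the toy score are hypothetical and are not taken from the application or from the Office Action, and no machine learning model is actually involved. It only shows the control flow: receive states and actions, pick a subset, form a search space, tune a hyperparameter, build an instance, and test whether it qualifies as a solution.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class MDPInstance:
    states: list
    actions: list
    gamma: float
    score: float


def identify_subset(states, keep):
    # Hypothetical heuristic: rank states by value and keep the top `keep`.
    return sorted(states, reverse=True)[:keep]


def formulate_search_space(states, actions):
    # Search space = every (state, action) pair over the reduced state set.
    return list(product(states, actions))


def toy_score(space, gamma):
    # Stand-in objective, NOT a real RL return: grows with space size and gamma.
    return gamma * len(space)


def tune(space, gammas):
    # Hyperparameter tuning reduced to picking the best discount factor.
    return max(gammas, key=lambda g: toy_score(space, g))


def find_solution(states, actions, gammas, threshold, max_rounds=5):
    # Iterate until an instance meets the (assumed) solution criterion.
    for round_no in range(1, max_rounds + 1):
        subset = identify_subset(states, keep=round_no)
        space = formulate_search_space(subset, actions)
        gamma = tune(space, gammas)
        instance = MDPInstance(subset, actions, gamma, toy_score(space, gamma))
        if instance.score >= threshold:
            return round_no, instance
    return None, None


round_no, instance = find_solution(
    [3, 1, 2], ["hold", "buy"], [0.5, 0.9], threshold=3.0
)
print(round_no, instance.states, instance.gamma)
```

	In this toy run the first round (one state, two actions) falls below the threshold, so the loop widens the subset and stops on the second round.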

	For claim 2: The system of claim 1, wherein the input data comprises one or more annotated states, one or more first actions, one or more binning strategies, or combinations thereof (Training parameters may be identified and selected with the mind).

	For claim 4: The system of claim 3, wherein the MLM is trained to accept a transformed state search space via feature transformation (Transformed feature spaces, e.g., removing states, etc. based on target policy, may be performed in the mind).

	For claim 5: The system of claim 1, wherein identifying the subset of the input data comprises: ranking the one or more first states; and selecting the one or more second states based on the ranked one or more first states (Ranking states via a heuristic may be performed in the mind).

	For claim 6: The system of claim 1, wherein the search space further comprises one or more bins configured to reduce a total number of the one or more second states and the one or more second actions thereby reducing an overall search space area (Determining binning policy may be performed in the mind).

	For claim 7: The system of claim 1, wherein the instructions are further configured to cause the system to: receive, via a graphical user interface (GUI), a user selection of a specific criteria, wherein determining whether the generated MDP instance comprises the first MDP solution is based on the specific criteria (Determining criteria and evaluation based on criteria is a mental process).

	For claim 8: The system of claim 1, wherein the input data is received via a web-based interface (receiving input data may be performed in the mind).

	For claim 9: The system of claim 1, wherein conducting the hyperparameter tuning comprises: iteratively, via a search algorithm: generating one or more hyperparameter combinations (generating hyperparameter settings may be performed in the mind); and training the MLM to utilize the one or more hyperparameter combinations (Training an MLM via an optimization algorithm is a mathematical concept).

	For claim 10: The system of claim 9, wherein generating the one or more hyperparameter combinations is based on a metric produced by performing a Fitted Q Evaluation of the trained MLM (Generating a metric via FQE is a mathematical process; furthermore, adjusting hyperparameter combinations based on performance metrics is a mental process).

	For claim 18: The computer-implemented method of claim 16, wherein the input data is continuously received via a graphical user interface (GUI), and wherein the identifying, formulating, conducting, and generating are conducted dynamically based on the continuously received input data (repeatedly receiving input data for identifying / observing, formulating, conducting may be performed in the mind; the iterative performing of the generating operation is a repeated application of the mathematical concept).

	The remaining claims contain analogous limitations and hence are similarly analyzed.

		
	STEP 2A PRONG 2: The claims do not integrate the exception into a practical application:
	The additional elements do not integrate the iterative mental tuning and adjustments of training parameters into a practical application. They are not directed to a particular technical problem, but rather comprise improvements on the mental process itself.

	For claims 1, 4, and 19, the additional elements comprise one or more processors; a memory in communication with the one or more processors and storing instructions that are executed by the one or more processors; and performance of the mental steps via an MLM. However, these are mere instructions to implement an abstract idea on a computer or via machine learning algorithms and hence do not constitute an integration into a practical application.

	For claim 3, the additional elements further recite that the MLM comprises a predictive model. However, this is a mere instruction to implement the mental process on a predictive MLM and hence does not constitute an integration into a practical application.

	For claims 7-8 and 17-18, the additional elements further recite that the receiving occurs via a GUI and via a web-based interface. However, these are mere instructions to implement an abstract idea on a computer or networking interface and hence do not constitute an integration into a practical application.

	For claim 11, the additional elements include: the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor. However, these are mere instructions to implement an abstract idea on a computer and hence do not constitute an integration into practical application.

	For claim 16, the additional elements include implementation via a computer. However, these are mere instructions to implement an abstract idea on a computer and hence do not constitute an integration into practical application.

	For claim 20, the additional elements include: wherein the input data is received via a JSON file format. However, this is a general linking of the abstract idea to a field of use or particular technology and hence does not serve to meaningfully limit the abstract idea. Hence, it does not constitute an integration into a practical application.

	STEP 2B: The claims as a whole do not include additional elements that amount to significantly more than the abstract idea:
	
	For claims 1, 4, and 19, the additional elements comprise one or more processors; a memory in communication with the one or more processors and storing instructions that are executed by the one or more processors; and performance of the mental steps via an MLM. However, these are general purpose computing and machine learning elements and are well-understood, routine and conventional (WURC) and hence do not constitute significantly more.

	For claim 3, the additional elements further recite that the MLM comprises a predictive model. However, predictive machine learning elements are well-understood, routine and conventional (WURC) and hence do not constitute significantly more.

	For claims 7-8 and 17-18, the additional elements further recite that the receiving occurs via a GUI and via a web-based interface. However, these are general purpose computing and networking interfaces and are well-understood, routine and conventional (WURC) and hence do not constitute significantly more.

	For claim 11, the additional elements include: the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor. However, these are general purpose computing elements and are well-understood, routine and conventional (WURC) and hence do not constitute significantly more.

	For claim 16, the additional elements include implementation via a computer. However, these are general purpose computing and networking interfaces and are well-understood, routine and conventional (WURC) and hence do not constitute significantly more.

	For claim 20, the additional elements include: wherein the input data is received via a JSON file format. However, JSON is a widely known file format for storing data and is well-understood, routine and conventional (WURC) and hence does not constitute significantly more.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
	Claim(s) 1-9, 11-13, 15-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Nair (US 20230281504 A1).
	For claim 1, Nair discloses: a system, comprising:
	one or more processors (fig.9:910); and
	a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to (fig.9:915, 935, 940, 950):
	iteratively, until one or more Markov Decision Process (MDP) solutions are identified (fig.2 gives overview of the invention, including executing and evaluating learned MDP policies, see fig.2:260, 0089; 0080-81, with 0089, 0017-18 contemplating an iterative process of patching rules, training agents, etc., such as conducted via the user interface fig.2:210, see also fig.7, 0166-167 disclosing iterative GUI workflow, such as via interface of fig.4-5, 0108):
		receive input data comprising one or more first states and one or more first actions (fig.2:230, 0059, 0066-69: action spaces are defined for the RL agent and scenario-specific state spaces are defined);
		identify, via a machine learning model (MLM), a subset of the input data (0066-69: based on scenario being tested, a state space is defined, such as a subset or a reduced state space (0067), for the RL model to traverse);
		formulate, via the MLM, a search space based on the subset of the input data, the search space comprising one or more second states and one or more second actions (ibid: the RL agent is defined over the search space via step functions, see 0070-71, and initial states, see 0076, in order to learn policies);
		conduct, via the MLM, hyperparameter tuning of the search space (0077-78);
		generate, via the MLM, an MDP instance based on the hyperparameter tuning (0078); and	
		determine, via the MLM, whether the generated MDP instance comprises a first MDP solution (0083-87 gives overviews of RL learning of an MDP policy based on a scenario, including maximized reward values, goals achieved, etc., see 0088, hence, solutions).

	For claim 2, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein the input data comprises one or more annotated states (0068: states annotated by, e.g., account balance, credit to debt ratio), one or more first actions (0059-62), one or more binning strategies, or combinations thereof.

	For claim 3, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein the MLM comprises a predictive model (0084: inferred expected benefit values, such as stored via a neural net or a data structure).

	For claim 4, Nair discloses the system of claim 3, as described above. Nair further discloses: wherein the MLM is trained to accept a transformed state search space via feature transformation (0084: RL agents operating on transition values indicating expected future benefit that are encoded as a neural net constitutes a transformed state space, i.e., a state space transformed by the neural net weights to yield outputs).

	For claim 5, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein identifying the subset of the input data comprises:
	ranking the one or more first states (fig.7:710, 0167 contemplates evaluating overall system strength via a ranking of scenarios, with each scenario representing a subset of state spaces, see 0066-69, hence, state spaces are ranked); and
	selecting the one or more second states based on the ranked one or more first states (fig.7:710-725, 0168-: based on the ranking, various scenarios may be selected for tuning).

	For claim 6, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein the search space further comprises one or more bins configured to reduce a total number of the one or more second states and the one or more second actions thereby reducing an overall search space area (0062: enforcing transaction constraints, by setting 1000 step increments, constitutes a binning of the search space that reduces total searched states).
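	To make the binning argument concrete, the following Python sketch shows how snapping amounts to fixed 1000-step increments shrinks the set of distinct values a search must cover. The sample amounts, the function name `bin_amount`, and the choice of rounding down are assumptions for illustration; the Office Action does not say how Nair implements its increments.

```python
def bin_amount(amount, step=1000):
    # Assumed binning rule: round each amount down to the nearest multiple of `step`.
    return (amount // step) * step


# Hypothetical raw transaction amounts (7 distinct values).
amounts = [120, 950, 1000, 1875, 2300, 2999, 3000]

# After binning, amounts that fall in the same increment collapse into one state.
binned = sorted({bin_amount(a) for a in amounts})

print(len(set(amounts)), "raw values ->", len(binned), "binned values:", binned)
```

	Here seven raw values collapse into four bins, which is the sense in which binning reduces the total number of searched states.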

	For claim 7, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein the instructions are further configured to cause the system to:
	receive, via a graphical user interface (GUI), a user selection of a specific criteria (fig.7: 0167-168: user scenario tuning via GUI),
	wherein determining whether the generated MDP instance comprises the first MDP solution is based on the specific criteria (fig.7, 0167-168: determine overall strength of scenarios, hence, whether MDP is a solution to the scenario and whether scenario needs to be strengthened based on scenario criteria).

	For claim 8, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein the input data is received via a web-based interface (fig.2: REST API is a web-based interface, see 0052; see also 0051: Flask web framework).

	For claim 9, Nair discloses the system of claim 1, as described above. Nair further discloses: wherein conducting the hyperparameter tuning comprises:
	iteratively, via a search algorithm: generating one or more hyperparameter combinations (fig.7, 0167: iteratively generating hyperparameters via workflow, including interface of fig.4 and hyperparameter selection fig.2:250); and
	training the MLM to utilize the one or more hyperparameter combinations (fig.2:250-255).
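	The two steps mapped for claim 9 (generate hyperparameter combinations via a search algorithm, then train with each combination) can be sketched as an exhaustive grid search. This is only one possible search algorithm, chosen here for brevity. The grid values, the parameter names `learning_rate` and `gamma`, and the `toy_train` scoring function are hypothetical stand-ins and do not come from the application or from Nair.

```python
from itertools import product


def hyperparameter_combinations(grid):
    # Enumerate every combination of the listed values (grid search).
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def toy_train(params):
    # Stand-in for "train the model and report a validation score".
    # Assumed to peak at learning_rate=0.1 and gamma=0.9.
    return -abs(params["learning_rate"] - 0.1) - abs(params["gamma"] - 0.9)


grid = {"learning_rate": [0.01, 0.1, 1.0], "gamma": [0.5, 0.9]}

combos = list(hyperparameter_combinations(grid))
best = max(combos, key=toy_train)

print(len(combos), "combinations tried; best:", best)
```

	A random or Bayesian search would replace only `hyperparameter_combinations`; the train-and-score step stays the same.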

	For claim 12, Nair discloses the computer program product of claim 11, as described above. Nair further discloses: wherein the program instructions further cause the processor to:
	identify, by the processor and via the MLM, a subset of the input data by:
		ranking the one or more first states (fig.7:710, 0167 contemplates evaluating overall system strength via a ranking of scenarios, with each scenario representing a subset of state spaces, see 0066-69, hence, state spaces are ranked); and
		selecting the one or more second states based on the ranked one or more first states (fig.7:710-725, 0168-: based on the ranking, various scenarios may be selected for tuning),
			wherein the search space is based on the subset of the input data (ibid: further tuning and evaluation based on ranking, hence, search space is formulated based on scenario).

	For claim 18, Nair discloses the method of claim 16, as described above. Nair further discloses: wherein the input data is continuously received via a graphical user interface (GUI) (fig.7, 0166-168 contemplates a continuous and iterative process of refining the scenarios for training, see fig.4 for GUI example and fig.2:250-260 for scenario and training workflow), and wherein the identifying, formulating, conducting, and generating are conducted dynamically based on the continuously received input data (fig.7, fig.2:250-260, 0167-168: conducting iterative formulating, conducting, and generating of RL space and hyperparameters via custom user-defined scenarios).

	For claim 19, Nair discloses the method of claim 16, as described above. Nair further discloses: wherein the MLM comprises a predictive model and is trained to accept a transformed state search space via feature transformation (0084: RL agents operating on transition values indicating expected future benefit that are encoded as a neural net constitutes a transformed state space, i.e., a state space transformed by the neural net weights to yield outputs).

	For claim 20, Nair discloses the method of claim 16, as described above. Nair further discloses: wherein the input data is received via a JSON file format (0031).
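	As an illustration of what JSON-formatted input of states and actions might look like, the sketch below parses a small payload with Python's standard `json` module. The schema (keys `states`, `actions`, `id`, `annotations`) is entirely hypothetical and is not taken from the application or from Nair; only the `account_balance` annotation echoes the example cited for claim 2.

```python
import json

# Hypothetical payload: annotated states plus a list of available actions.
raw = """
{
  "states": [
    {"id": "s0", "annotations": {"account_balance": 1200}},
    {"id": "s1", "annotations": {"account_balance": 300}}
  ],
  "actions": ["hold", "transfer"]
}
"""


def load_input(text):
    # Parse the JSON text and check that both required keys are present.
    data = json.loads(text)
    missing = {"states", "actions"} - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data["states"], data["actions"]


states, actions = load_input(raw)
print([s["id"] for s in states], actions)
```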

	Claim(s) 11, 13-17 recite computer program products and methods corresponding to the above systems and are hence rejected under the same rationale.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	Claim(s) 10, 14 are rejected under 35 U.S.C. 103 as being unpatentable over Nair (US 20230281504 A1) in view of Paine ("Offline Hyperparameter Selection for Offline Reinforcement Learning", published 2020).

	For claim 10, Nair discloses the system of claim 9, as described above. Nair does not disclose: wherein generating the one or more hyperparameter combinations is based on a metric produced by performing a Fitted Q Evaluation of the trained MLM.
	Paine discloses: wherein generating the one or more hyperparameter combinations is based on a metric produced by performing a Fitted Q Evaluation of the trained MLM (§2 gives overview of the process including hyperparameter selection in ¶2-3 and evaluation via critics trained by FQE, see p.3 ¶1-2, §2.1 ¶1).
It would have been obvious before the effective filing date to a person of ordinary skill in the art to modify the system of Nair by incorporating the FQE based hyperparameter selection technique of Paine. Both concern the art of reinforcement learning, and the incorporation would have, according to Paine, allowed better hyperparameter selection for RL policies, especially in an offline setting (§1 ¶1-2).
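For readers unfamiliar with Fitted Q Evaluation, the sketch below shows a tabular simplification of the idea: from a fixed dataset of logged transitions, repeatedly fit Q(s, a) to the target r + gamma * Q(s', pi(s')) for the policy pi being evaluated, then use the resulting value estimate as the metric for comparing candidates. This is not Paine's implementation, which the rejection describes as using trained critics. The two-state dataset, the candidate policies (imagined as the outcomes of two hyperparameter settings), and all names are assumptions for illustration.

```python
from collections import defaultdict


def fitted_q_evaluation(dataset, policy, gamma=0.9, iterations=50):
    """Tabular FQE: repeatedly regress Q(s, a) onto r + gamma * Q(s', policy(s'))."""
    q = defaultdict(float)
    for _ in range(iterations):
        targets = defaultdict(list)
        for s, a, r, s_next, done in dataset:
            bootstrap = 0.0 if done else gamma * q[(s_next, policy[s_next])]
            targets[(s, a)].append(r + bootstrap)
        # In the tabular case the least-squares fit is the per-pair mean target.
        q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
    return q


# Hypothetical logged transitions: (state, action, reward, next_state, done).
DATASET = [
    ("s0", "go", 0.0, "s1", False),
    ("s1", "go", 1.0, "end", True),
    ("s0", "wait", 0.0, "s0", False),
    ("s1", "wait", 0.0, "s1", False),
]

# Hypothetical candidate policies, e.g. produced by two hyperparameter settings.
CANDIDATES = {
    "go": {"s0": "go", "s1": "go"},
    "wait": {"s0": "wait", "s1": "wait"},
}


def fqe_metric(policy, start="s0"):
    # Metric = estimated value of following `policy` from the start state.
    q = fitted_q_evaluation(DATASET, policy)
    return q[(start, policy[start])]


scores = {name: fqe_metric(policy) for name, policy in CANDIDATES.items()}
value_go, value_wait = scores["go"], scores["wait"]
best_name = max(scores, key=scores.get)
print(scores, "-> selected:", best_name)
```

The point of the sketch is that the selection step never runs a candidate in the live environment; it scores candidates from logged data alone, which is why the technique suits the offline setting mentioned in the rejection.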

Claim 14 recites a computer program product corresponding to the above system and is hence rejected under the same rationale.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Gioioso (US 20220084802 A1) discloses an iterative reinforcement learning tuning technique for tuning medical devices.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LIANG LI whose telephone number is (303)297-4263.  The examiner can normally be reached Mon-Fri 9-12p, 3-11p MT (11-2p, 5-1a ET).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. The examiner is available for interviews Mon-Fri 6-11a, 2-7p MT (8-1p, 4-9p ET).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Jennifer Welch can be reached on (571)272-7212.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from Patent Center and the Private Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from Patent Center or Private PAIR. Status information for unpublished applications is available through Patent Center or Private PAIR to authorized users only. Should you have questions about access to Patent Center or the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/LIANG LI/
Primary examiner AU 2143

Prosecution Timeline

Jun 01, 2023: Application Filed
Mar 26, 2026: Non-Final Rejection mailed (§101, §102, §103; current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/811,444 (Patent 12625602): Method for user interaction for data manipulation in a CAE/CAD system. 3y 10m to grant; granted May 12, 2026.
18/535,832 (Patent 12596463): METHOD AND APPARATUS FOR IMAGE-BASED NAVIGATION. 2y 3m to grant; granted Apr 07, 2026.
17/445,905 (Patent 12585716): INTELLIGENT RECOMMENDATION METHOD AND APPARATUS, MODEL TRAINING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM. 4y 7m to grant; granted Mar 24, 2026.
17/887,322 (Patent 12585375): GENERATING SNAPPING GUIDE LINES FROM OBJECTS IN A DESIGNATED REGION. 3y 7m to grant; granted Mar 24, 2026.
18/152,328 (Patent 12580000): MULTITRACK EFFECT VISUALIZATION AND INTERACTION FOR TEXT-BASED VIDEO EDITING. 3y 2m to grant; granted Mar 17, 2026.

Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 61%
With Interview (+69.2%): 99%
Median Time to Grant: 3y 3m (~3m remaining)
PTA Risk: Low

Based on 274 resolved cases by this examiner. Grant probability derived from career allowance rate.