Last updated: May 29, 2026
Application No. 18/410,369
MEDICAL LEARNING SYSTEM, MEDICAL LEARNING METHOD, AND STORAGE MEDIUM

Non-Final OA §101 §103 §112
Filed
Jan 11, 2024
Priority
Jan 18, 2023 — JP 2023-006029
Examiner
GEDRA, OLIVIA ROSE
Art Unit
3681
Tech Center
3600 — Transportation & Electronic Commerce
Assignee
Canon Kabushiki Kaisha
OA Round
3 (Non-Final)
Interview Optional

+0.0% interview lift. An interview has already been conducted in this application's prosecution history. This examiner has a 0% grant rate with +0.0% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 14 resolved cases, 2023–2026
Examiner Intelligence

GEDRA, OLIVIA ROSE
Grants only 0% of cases
Career Allowance Rate
0 granted / 14 resolved
-52.0% vs TC avg
Interview Lift: +0.0% (minimal), comparing resolved cases without and with an interview
Typical timeline
2y 8m
Avg Prosecution
26 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
3.0%
-37.0% vs TC avg
§103
95.1%
+55.1% vs TC avg
§102
2.0%
-38.0% vs TC avg
Tech Center averages are estimates. Based on career data from 14 resolved cases.
Office Action

§101 §103 §112
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/09/2026 has been entered.
Status of Claims 
This action is in reply to the communication filed on 02/09/2026. 
Claims 1 and 19-20 have been amended.
Claims 1-20 are currently pending and have been examined.
Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claims 1, 19, and 20 recite a “first inference model that is a policy model” and then later recite “training a policy model using the first inference model as an initial value”. As the first inference model is in and of itself a policy model, it is unclear how a policy model is trained using the policy model. For the purposes of compact prosecution, the Examiner will interpret this limitation as “training a second policy model using the first inference model as an initial value”. Claims 2-18 are further rejected as being dependent on a rejected claim. Appropriate correction is required.
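For illustration only, the interpretation adopted above ("training a second policy model using the first inference model as an initial value") can be sketched in Python. This is a minimal sketch under stated assumptions, not the Applicant's implementation or the method of any cited reference: the tabular softmax policy, the names `fine_tune`, `first_policy`, and `second_policy`, the example states and actions, and the one-step reward-weighted policy-gradient update are all hypothetical.

```python
import copy
import math


def softmax(prefs):
    """Turn a dict of action preferences into action probabilities."""
    m = max(prefs.values())
    exps = {a: math.exp(v - m) for a, v in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def fine_tune(first_policy, samples, lr=0.5):
    """Return a second policy that starts from first_policy's parameters.

    Assumption: each sample is (state, action, next_state, reward) and the
    update is a simple reward-weighted policy-gradient step on the
    immediate reward only.
    """
    # The first model is used only as the initial value; it is not mutated.
    second_policy = copy.deepcopy(first_policy)
    for state, action, _next_state, reward in samples:
        probs = softmax(second_policy[state])
        for b in second_policy[state]:
            # Gradient of log pi(action | state) w.r.t. the preference of b.
            grad = (1.0 if b == action else 0.0) - probs[b]
            second_policy[state][b] += lr * reward * grad
    return second_policy


# Hypothetical first model: prefers drug_a for a patient in state "fever".
first_policy = {"fever": {"drug_a": 1.0, "drug_b": 0.0}}
# Hypothetical progress data in which drug_b was followed by a positive reward.
samples = [("fever", "drug_b", "recovered", 1.0)] * 5
second_policy = fine_tune(first_policy, samples)
```

In this sketch the two models are distinct objects: the second starts as a copy of the first and then diverges as rewards are observed, which is the distinction the interpretation relies on.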
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 USC § 101 as being directed to a judicial exception (i.e. a law of nature, a natural phenomenon, or an abstract idea) without significantly more. 
Step 1 Analysis:
Independent Claims 1, 19, and 20 are within the four statutory categories. Claims 1, 19, and 20 are directed toward a system, method, and non-transitory computer-readable storage medium (i.e. machine) respectively. Dependent Claims 2-18 are directed toward a system and therefore also fall into one of the four statutory categories. 
Step 2A Analysis- Prong One:
Claim 1, which is indicative of the inventive concept, recites the following: 
A medical learning system comprising processing circuitry configured to:
 acquire a first inference model that is a policy model that infers a treatment action of a target medical care provider based on a state of a discretionarily selected patient;
acquire treatment progress data relating to a target patient, from a first computer that stores treatment progress data for each combination of a plurality of patients and a plurality of doctors, 
the treatment progress data being sequential data of samples including a state of a patient at a point in time, a doctor’s treatment action taken for the patient in the state, a state of the patient at a next time point after the patient receives the treatment action, and a reward denoting a treatment effect in the patient with respect to the treatment action; 
generate a second inference model that receives an input of a state of a target patient at a certain time point and that outputs a treatment action that should be taken by a target doctor for the target patient who is in this state,
by training a policy model using the first inference model as the initial value through reinforcement learning based on the acquired treatment progress data; 
store the second inference model in a second computer, wherein the second model is associated with an identifier of the target medical care provider and an identifier of the target patient;
and update the second inference model based on the treatment progress data at a time point following a time point to which the treatment progress data used in a generation of the second inference model belongs.
The limitations as shown in underline above, given the broadest reasonable interpretation, cover the abstract idea of certain methods of organizing human activity because they recite managing personal behavior or relationships or interactions between people (i.e. social activities, teachings, and following rules or instructions- in this case, acquiring progress data and a model generating a second model based on updating the first model and the progress data, and updating the second inference model based on treatment progress data at a specific time point), e.g., see MPEP 2106.04(a)(2). Any limitations not identified above as part of the abstract idea are deemed “additional elements” and will be discussed in further detail below.
Dependent Claims 2-3, 5-6, 8-14, and 16-18 include other limitations directed toward the abstract idea. For example, Claim 2 recites the first inference model is generated based on the treatment action data, Claim 3 recites the details of what entails the treatment action data, Claim 5 recites the treatment progress data is measured relating to the target patient, Claim 6 recites acquiring a third model that infers progress of the patient and acquiring data, Claim 8 recites treatment progress data is factual data, Claim 9 recites acquiring a third inference model that infers treatment progress and acquires counterfactual data, Claim 10 recites searching a plurality of models and providers to find an optimal combination, Claim 11 recites the use of common layers, Claim 12 recites acquiring a third model and data which includes a second common layer, Claim 13 recites searching for an optimal first layer, a specific second layer or a plurality of second layers, Claim 14 recites updating the second model based on treatment progress data, Claim 16 recites the second inference model and the treatment progress data are associated with each other, Claim 17 recites updating the second model based on the treatment progress data, Claim 18 recites the target medical care provider or target patient is a specific individual. These limitations only serve to further narrow the abstract idea, and a claim may not preempt abstract ideas, even if the judicial exception is narrow, e.g. see MPEP 2106.04. Additionally, any limitations in the dependent claims not addressed above are part of the additional elements and will be further addressed below. Hence, dependent Claims 2-4, 6, 8-14, and 16-18 are nonetheless directed toward fundamentally the same abstract idea as the independent claims. 
Step 2A Analysis – Prong Two:
Claims 1, 19, and 20 are not integrated into practical application because the additional elements (i.e. the non-underlined limitations above- in this case, the processing circuitry and computer of Claim 1, the computer of Claim 19, and the non-transitory computer-readable storage medium and computer of Claim 20) are recited at a high level of generality (i.e. as a generic processor performing generic computer functions) such that they amount to no more than mere instructions to apply an exception using generic computer parts. For example, Applicant’s specification explains that the processing circuitry 51 includes processors such as a CPU (central processing unit) and a GPU (graphics processing unit). The processing circuitry 51 executes a medical learning program to realize a model acquisition function 511, a data acquisition function 512, a first model generation function 513, a second model generation function 514, a third model generation function 515, and a display control function 516 [0030]. The treatment progress storage device 3 is a computer that includes a storage device for storing treatment progress data D $(s_t^i, a_t^{(i,j)}, s_{t+1}^i, r_t^i)$ relating to combinations of a patient i and a doctor j [0025]. Accordingly, these additional elements, when considered separately and as an ordered combination, do not integrate the abstract idea into practical application because they do not impose any meaningful limits on the abstract idea. Therefore, independent Claims 1, 19, and 20 are directed to an abstract idea without practical application. 
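For illustration only, the treatment progress data cited from the specification above (a state, a treatment action, a next state, and a reward, stored per combination of patient and doctor) can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the Applicant's storage device: the names `Sample` and `ProgressStore`, the string-valued states and identifiers, and the rule that each new sample must start from the previous sample's next state are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    state: str       # patient state at time t
    action: str      # doctor's treatment action taken in that state
    next_state: str  # patient state at time t+1, after the action
    reward: float    # treatment effect attributed to the action


class ProgressStore:
    """Keeps one ordered sequence of samples per (patient, doctor) pair."""

    def __init__(self):
        self._data = defaultdict(list)

    def append(self, patient_id, doctor_id, sample):
        seq = self._data[(patient_id, doctor_id)]
        # Assumption: samples are sequential, so a new sample must start
        # from the state in which the previous sample ended.
        if seq and seq[-1].next_state != sample.state:
            raise ValueError("sample does not continue the stored sequence")
        seq.append(sample)

    def get(self, patient_id, doctor_id):
        # Return a copy so callers cannot modify the stored sequence.
        return list(self._data.get((patient_id, doctor_id), []))


store = ProgressStore()
store.append("patient-1", "doctor-7", Sample("fever", "drug_a", "mild_fever", 0.5))
store.append("patient-1", "doctor-7", Sample("mild_fever", "rest", "recovered", 1.0))
episode = store.get("patient-1", "doctor-7")
```

Keying the store by the (patient, doctor) pair mirrors the "each combination of a plurality of patients and a plurality of doctors" language of Claim 1; any other indexing scheme would serve equally well for the sketch.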
Dependent Claims 4, 6-7, 9-10, and 12-17 recite additional elements. Claims 6, 9-10, 12-14, and 16-17 recite the previously recited additional element of processing circuitry. Claim 4 recites a new additional element of a policy model, behavior cloning, and imitation learning and specifies the inference model is a policy model generated through behavior cloning or imitation learning. Claim 7 recites previously recited additional elements of processing circuitry and a policy model and specifies the circuitry is configured to generate the second inference model by training a policy model using the first model and progress data. Claim 9 recites the previously recited processing circuitry and specifies the circuitry is configured to acquire a third inference model that infers treatment progress of the target patient and acquires counterfactual data. Claim 10 recites the previously recited processing circuitry and specifies the circuitry is configured to search a plurality of first inference models. Claim 12 recites the previously recited processing circuitry and specifies the circuitry is configured to acquire a third inference model and data inferred by the third model as the treatment progress data. Claim 13 recites the previously recited processing circuitry and specifies the circuitry is configured to search among the plurality of first individual layers for an optimal first layer. Claim 14 recites the previously recited processing circuitry and specifies the circuitry is configured to update the second inference model based on the treatment progress data at a time point. Claim 15 recites a new additional element of a block chain and previously recited processing circuitry. Claim 16 recites the previously recited processing circuitry and specifies the circuitry is configured to add the second inference model to a block. 
Claim 17 recites the previously recited processing circuitry and specifies the circuitry is configured to update the second inference model based on the treatment progress data relating to a time point following a timepoint to which the treatment progress data belongs. However, these additional elements are used in their expected fashion, so they do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on the abstract idea. These limitations amount to no more than mere instructions to apply an exception, and hence, do not integrate the aforementioned abstract idea into practical application.
Step 2B Analysis: 
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of the processing circuitry and computer of Claim 1, computer of Claim 19, and the non-transitory computer-readable storage medium and computer of Claim 20 amount to no more than mere instructions to apply an exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (“significantly more”). MPEP 2106.05(I)(A) indicates that merely stating “apply it” or equivalent to the abstract idea cannot provide an inventive concept (“significantly more”). Accordingly, even in combination, these additional elements do not provide significantly more. As such, Claims 1, 19, and 20 are not patent eligible.
Dependent Claims 2-3, 5, 8, 11, and 18 do not recite any additional elements and only narrow the abstract idea. Claim 2 narrows the abstract idea by specifying the first inference model is generated based on the treatment action data, Claim 3 narrows the abstract idea by specifying the details of what entails the treatment action data, Claim 5 narrows the abstract idea by specifying the treatment progress data is measured relating to the target patient, Claim 8 narrows the abstract idea by specifying treatment progress data is factual data, Claim 11 narrows the abstract idea by specifying the target medical care provider includes a plurality of providers, the first inference model includes a first common layer that is common between the providers and, the common layer receives an input state and outputs a feature amount, Claim 18 narrows the abstract idea by specifying the target medical care provider or the target patient is a specific individual. 
Dependent Claims 4 and 15 recite new additional elements. Claim 4 recites new additional elements of a policy model, behavior cloning, and imitation learning. Claim 15 recites a new additional element of a block chain.
Dependent Claims 6-7, 9-10, 12-14, and 16-17 narrow the previously recited additional element of the processing circuitry. Claim 6 narrows the processing circuitry by specifying it is configured to acquire a third inference model that infers treatment progress and acquires data, Claim 7 narrows the processing circuitry by specifying it generates a second model by training a policy model using the first model, Claim 9 narrows the processing circuitry by specifying it acquires a third inference model and acquires counterfactual data, Claim 10 narrows the processing circuitry by specifying it searches a plurality of first inference models to find an optimal combination, Claim 12 narrows the processing circuitry by specifying it acquires a third model and data which includes a second common layer, Claim 13 narrows the processing circuitry by specifying it searches for an optimal first layer, a specific second layer or a plurality of second layers, Claim 14 narrows the processing circuitry by specifying it updates the second model based on treatment progress data, Claim 16 narrows the processing circuitry by specifying it adds the second inference model and the treatment progress data to a block, Claim 17 narrows the processing circuitry by specifying it updates the second inference model based on the treatment progress data relating to a time point. Hence, Claims 2-18 do not include any additional elements that amount to “significantly more” than the judicial exception. 
Thus, taken alone, the additional elements do not amount to significantly more than the abstract idea identified above. Furthermore, looking at the limitations as an ordered combination does not add anything that is not already present when looking at the elements taken individually, and there is no indication that the combination of elements improves the functioning of the computer implementation.
Therefore, whether taken individually or as an ordered combination, Claims 1-20 are nonetheless rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, 14, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Osogami et al. (US 20200303068 A1) in view of Bostic et al. (US 20200303047 A1) and Goecks et al. (Goecks et al. "Integrating behavior cloning and reinforcement learning for improved performance in dense and sparse reward environments." arXiv preprint arXiv:1910.04281 (2019)).
Regarding Claim 1, Osogami discloses the following:
A medical learning system comprising processing circuitry configured to: (Osogami discloses the processing system 800 includes at least one processor (CPU) 804 operatively coupled to other components via a system bus 802 [0103].)
acquire a first inference model that infers a treatment action of a target medical care provider based on a state of a discretionarily selected patient; (Osogami discloses upon predicting a treatment action, the treatment prediction agent 100 can provide the action to a healthcare professional, such as a doctor or nurse, at the care center 140 and/or via the user access terminal 150 [0055].)
acquire treatment progress data relating to a target patient; (Osogami discloses the treatment prediction agent 100 can then use the current state to evaluate each action of a set of actions to predict an appropriate action to treat the adverse conditions according to the evaluation. Reinforcement learning can be incorporated into the evaluation mechanism to update parameters of the treatment prediction agent 100 based on changes to the health of the patient [0056]. FIG. 2 is a diagram showing a treatment agent that interacts with a condition monitor to learn patient treatment procedures according to objectives for achieving a goal [0009, see Fig. 2].)
from a first computer that stores treatment progress data for each combination of a plurality of patients and a plurality of doctors, (Osogami discloses the data can include, e.g., any patient health data useable for determining a diagnosis and treatment such as, blood pressure, heart rate, age, height, weight, injuries, white blood cell count, red blood cell count, blood oxygen levels, calorie intake, fitness level, sleep patterns, among other biomarkers and health data and histories thereof. The data can be collected at a care center 140 and provided directly to the treatment prediction agent 100, or stored in the database 120 for later retrieval by the treatment prediction agents 100 [0053].  The condition monitor 202 can assess the patient for changes to biomarkers and health indicia as a result of the action. The changes can be used to make a state determination of the adverse condition of the patient [0063]. The Examiner interprets the patient data as patient specific treatment data and the action as the action taken by the target physician.)
the treatment progress data being sequential data of samples including a state of a patient at a time point, (Osogami discloses the treatment agent 200 can then be adjusted to take into account the effectiveness or ineffectiveness of the action by, e.g., updating parameters corresponding to a state representation model and a value model… the treatment agent 200 also determines a value for each possible action to take at a next step in response to the current measured state of the patient [0065]. The treatment pathway is progressively formed through action generation, such as, discrete actions to treat the adverse condition, or a treatment protocol for a given period of time. The new state resulting from the actions can then be measured after, e.g., the discrete action or the period of time for the protocol [0056]. The Examiner interprets the time period being sequential data. The condition monitor 202 can assess the patient for changes to biomarkers and health indicia as a result of the action. The changes can be used to make a state determination of the adverse condition of the patient [0063]. The Examiner interprets the patient data as patient specific treatment data and the action as the action taken by the target physician.)
a doctor’s treatment action taken for the patient in the state, (Osogami discloses the treatment prediction agent 100 can provide the action to a healthcare professional, such as a doctor or nurse, at the care center 140 and/or via the user access terminal 150… the treatment prediction agent 100 can provide treatment actions directly to a patient via the user access terminal 150 in the form of, e.g., exercise advice, diet advice, among other healthcare advice to meet health goals of an individual [0055]. The representation model 340 can also incorporate a past state and a past action to provide more information to determine the representation for the current state, thus improving accuracy [0070].)
a state of the patient at a next time point after the patient receives the treatment action, (Osogami discloses the treatment agent 300 predicts an action at a current time frame and analyzes a change to a state of the patient as a result of that action. The treatment pathway is progressively formed through action generation, such as, discrete actions to treat the adverse condition, or a treatment protocol for a given period of time. The new state resulting from the actions can then be measured after, e.g., the discrete action or the period of time for the protocol [0067].)
and a reward denoting a treatment effect in the patient with respect to the treatment action; (Osogami discloses a previous action can be used to provide positive reinforcement through reinforcement learning for the use of an action that resulted in an objective being attained. Reinforcement, as well as an evaluation of actions in the set of actions can be performed concurrently [0057]. The Examiner interprets the result of the action as being a reward.)
generate a second inference model that receives an input of a state of a target patient at a certain time point and…training…through reinforcement learning based on the acquired treatment progress data; (Osogami discloses the measured state in response to the treatment action is interpreted as the progress data. The model parameters can, therefore, be updated based on the success or lack thereof of the previous action [0080]. The updated model based on the previous action is interpreted as the second model. An optimization module 456 uses cumulative temporal difference error to update the parameters θ of the state representation model 440. Thus, the state representation model 440 can be updated and trained according to the changing states resulting from prediction actions [0084]. Each action can be assessed according to a change in state of, e.g., the patient. The value model 450 can be trained to recognize higher value actions at each state of the patient by updating the model parameters θ, and determining the value of each action of the set of candidate actions according to each value head for each objective and the goal. The training of the value model 450 improves the accuracy and efficiency through reinforcement learning that takes into account sub-objectives corresponding to achieving a goal [0085].)
…that outputs a treatment action that should be taken by a target doctor for the target patient who is in this state (Osogami discloses the treatment agent 200 also determines a value for each possible action to take at a next step in response to the current measured state of the patient. According to the values for each action, a next action can be determined and suggested to a user [0065]. The treatment prediction agent 100 can provide the action to a healthcare professional, such as a doctor or nurse, at the care center 140 and/or via the user access terminal 150 [0055].)
and store the second inference model (Osogami discloses cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage,…) [0034]. The treatment prediction agent 1096 can include, e.g., a state representation model and value model that interacts with patient monitoring systems via processing in the virtualization layer 1070. Thus, data, such as, e.g., patient conditions, can be input into a virtual machine managed in the virtualization layer 1070 according to, e.g., a SLA at the service level management 1084, and stored in the virtual storage 1072 [0116].)
and update the second inference model based on the treatment progress data at a time point following a time point to which the treatment progress data used in a generation of the second inference model belongs. (Osogami discloses the treatment agent 200 can then be adjusted to take into account the effectiveness…of the action by, e.g., updating parameters corresponding to a state representation model… the treatment agent 200 also determines a value for each possible action to take at a next step in response to the current measured state of the patient. According to the values for each action, a next action can be determined and suggested to a user [0065]. To generate the pathway, the treatment agent 300 predicts an action at a current time frame and analyzes a change to a state of the patient as a result of that action. The treatment pathway is progressively formed through action generation, such as, discrete actions to treat the adverse condition, or a treatment protocol for a given period of time. The new state resulting from the actions can then be measured after, e.g., the discrete action or the period of time for the protocol [0067].)
Osogami does not disclose the stored information being in association with an identifier of the provider and the patient which is met by Bostic:
and store [information]… in a second computer, is associated with an identifier of the target medical care provider and an identifier of the target patient. (Bostic teaches a healthcare system 130 may include an EMR data store 132. An EMR data store 132 may include one or more databases that store and/or index electronic medical records. A respective electronic medical record may store or reference patient data of a respective patient of the healthcare organization. An electronic medical record may include a patient identifier, one or more physician identifiers [0060].)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the system for acquiring treatment progress data related to a patient, training and updating an inference model, and storing the model and data to a computer as disclosed by Osogami to incorporate the stored data being associated with a patient and a provider identifier as taught by Bostic. This modification would create a system capable of providing improved simulation of patient medical and diagnostic states and improvements to those states based on presented contingencies and options in care and health of the patient (see Bostic, ¶ 0005).
Osogami and Bostic do not teach the models being policy models or training the policy model using the model as an initial value which is met by Goecks:
… a policy model… (Goecks teaches our approach uses an actor-critic architecture to learn both a policy and value function from the human demonstration data, which we show, speeds up learning (p. 2, ¶ 0003).)
…by training a policy model using the first…model as an initial value through reinforcement learning… (Goecks teaches how to effectively update a policy initially trained with BC using RL as these approaches are inherently optimizing different objective functions. [B]y combining BC [behavior cloning] with subsequent RL [reinforcement learning], it is possible to address the drawbacks of either approach, initializing a significantly more capable and safer agent than with random initialization, while also allowing for further self-improvement without needing to collect additional data from a human demonstrator (p. 2, ¶ 0002-3). The Examiner interprets the initially trained policy as the first model which is the input to train the policy model.)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the system for acquiring treatment progress data related to a patient, training and updating an inference model, and storing the model and data to a computer as disclosed by Osogami to incorporate the use of a policy model which is trained using a first model as an initial training value as taught by Goecks. This modification would create a system capable of providing a higher level of performance than if the initialized input was randomized (see Goecks, p. 1, ¶ 0002).
Regarding Claim 19, this claim recites limitations substantially similar to those recited in Claim 1 above; thus, the same rejection applies. Osogami further discloses:
a medical learning method (Osogami discloses a method for determining a treatment action is presented. The method includes recording batches of data…[0003].)
Regarding Claim 20, this claim recites limitations substantially similar to those recited in Claim 1 above; thus, the same rejection applies. Osogami further discloses:
a non-transitory computer readable storage medium storing a program causing a computer to implement: (Osogami discloses the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device…a computer readable storage medium… is not to be construed as being transitory signals per se [0024].)
Regarding Claim 2, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 1 above. Osogami discloses the following:
wherein the first inference model is generated based on the treatment action data of the target medical care provider. (Osogami discloses the treatment agent 100 can suggest a treatment action according to an evaluation of the set of actions. The suggested treatment action as well as a measured state in response to the suggested treatment action can be provided back to the treatment agent 100 [0058].)
Regarding Claim 3, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 2 above. Osogami discloses the following:
wherein the treatment action data includes data relating to a treatment action taken by the target medical care provider for a predetermined state of the patient. (Osogami discloses a reward is also generated according to whether the previous action met the goal according to a previously encoded state. The goal value head 452, therefore, can incorporate a predicted value of each action according to the present state as well as the success of the previous action to determine a value of each action according to the present model parameters. The model parameters can, therefore, be updated based on the success or lack thereof of the previous action [0080].)

Regarding Claim 4, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 2 above. Osogami discloses the following:
…to which the state is input and which outputs the treatment action, …based on state data of the patient and the treatment action data of the target medical care provider. (Osogami discloses a treatment agent that utilizes states and actions with a state representation model and value model to predict treatment procedure [0010, see also Fig. 3-4]. The treatment agent 200 can suggest an action to take to treat an adverse condition of the patient. The condition monitor 202 can implement the action, or record biological effects upon the implementation of the action by a healthcare professional [0062].)
Osogami and Bostic do not teach the model being a policy model which is met by Goecks:
wherein the first inference model is a policy model …the policy model being generated through behavior cloning or imitation learning (Goecks teaches we focus on extending the Cycle-of-Learning framework to tackle the known issue of transitioning BC [behavior cloning] policies to RL [reinforcement learning] by utilizing an actor-critic architecture with a combined BC+RL loss function and pre-training phase for continuous state-action spaces, that can learn in both dense- and sparse-reward environments. The main advantage of our method is the use of an off-policy, actor critic architecture to pre-train both a policy and value function, as well as continued re-use of demonstration data during agent training (p. 3, ¶ 0006).)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the system for acquiring treatment progress data related to a patient, training and updating an inference model, and storing the model and data to a computer as disclosed by Osogami to incorporate the use of a policy model generated with behavior cloning as taught by Goecks. This modification would create a system capable of providing a higher level of performance than if the initialized input was randomized (see Goecks, p. 1, ¶ 0002).
Regarding Claim 5, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 1 above. Osogami discloses the following:
wherein the treatment progress data is actually measured data relating to the target patient. (Osogami discloses the treatment prediction agent 100 can provide treatment actions directly to a patient via the user access terminal 150… to meet the health goals of an individual [0055]. The treatment agent 200 also determines a value for each possible action to take at a next step in response to the current measured state of the patient [0065].)

Regarding Claim 7, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 1 above. Osogami further discloses the following:
wherein the processing circuitry is further configured to generate the second inference model by …using the first inference model as an initial value …based on the treatment progress data. (Osogami discloses the suggested treatment action as well as a measured state in response to the suggested treatment action can be provided back to the treatment agent 100. The degree of success of the suggested treatment action can be evaluated while also evaluating the set of actions to suggest a new treatment action in light of the measured state. The degree of success of the suggested treatment action is used to provide reinforcement to the treatment agent 100 [0058]. The measured state in response to the treatment action is interpreted as the progress data.)
Osogami and Bostic do not teach the model being a policy model which is met by Goecks:
…training a policy model …through reinforcement learning (Goecks teaches how to effectively update a policy initially trained with BC using RL as these approaches are inherently optimizing different objective functions. [B]y combining BC [behavior cloning] with subsequent RL [reinforcement learning], it is possible to address the drawbacks of either approach, initializing a significantly more capable and safer agent than with random initialization, while also allowing for further self-improvement without needing to collect additional data from a human demonstrator (p. 2, ¶ 0002-3). The Examiner interprets the initially trained policy as the first model which is the input to train the policy model.)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the system for acquiring treatment progress data related to a patient, training and updating an inference model, and storing the model and data to a computer as disclosed by Osogami to incorporate the use of a policy model which is trained with reinforcement learning as taught by Goecks. This modification would create a system capable of providing a higher level of performance than if the initialized input was randomized (see Goecks, p. 1, ¶ 0002).
Regarding Claim 8, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 7 above. Osogami further discloses the following:
wherein the treatment progress data is factual data relating to the target patient. (Osogami discloses each action can be assessed against a newly measured state using the value model 350. The newly measured state can be provided by, e.g., a condition monitoring device such as, e.g., the condition monitor 202 described above….The value corresponds to a quantitative measurement of the action's contribution towards achieving the goal and the objectives [0071].)
Regarding Claim 14, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 1 above. Osogami discloses the following:
wherein the processing circuitry is further configured to update the second inference model based on the treatment progress data at a time point following a time point to which the treatment progress data used in a generation of the second inference model belong. (Osogami discloses reinforcement learning can be incorporated into the evaluation mechanism to update parameters of the treatment prediction agent 100 based on changes to the health of the patient [0056].)

Regarding Claim 18, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 1 above. Osogami discloses the following:
wherein at least one of the target medical care provider or the target patient is a specific individual. (Osogami discloses the treatment prediction agent 100 can provide treatment actions directly to a patient via the user access terminal 150 in the form of, e.g., exercise advice, diet advice, among other healthcare advice to meet health goals of an individual [0055]. This is interpreted as the patient being a specific individual.)

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Osogami, Bostic, and Goecks in view of Steck et al. (US 20080033894 A1).
Regarding Claim 6, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 1 above. Osogami discloses: 
…the treatment progress data…(Osogami discloses the treatment prediction agent 100 can provide treatment actions directly to a patient via the user access terminal 150… to meet the health goals of an individual [0055]. The treatment agent 200 also determines a value for each possible action to take at a next step in response to the current measured state of the patient [0065].)
Osogami and Bostic do not teach the following limitations met by Steck:
wherein the processing circuitry is further configured to: acquire a third inference model that infers a treatment progress of the target patient; (Steck teaches inference is the process of taking all the factoids and/or elements that are available about a patient and producing a composite view of the patient's progress through disease states [0088].)
and acquire data inferred by the third inference model as the treatment progress data. (Steck teaches the inference component 356 deals with the combination of these factoids, at the same point in time and/or at different points in time, to produce a coherent and concise picture of the progression of the patient's state over time [0087].)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the use of a model which infers the progress of the patient as taught by Steck. This modification would create a system and methods capable of effectively predicting treatment outcome (see Steck, ¶ 0003).
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Osogami, Bostic, and Goecks in view of Steck, further in view of Nair et al. (US 11568205 B1).
Regarding Claim 9, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 7 above. Osogami discloses:
…the treatment progress data…(Osogami discloses the treatment prediction agent 100 can provide treatment actions directly to a patient via the user access terminal 150… to meet the health goals of an individual [0055]. The treatment agent 200 also determines a value for each possible action to take at a next step in response to the current measured state of the patient [0065].)
Osogami, Bostic, and Goecks do not teach the following limitations met by Steck: 
wherein the processing circuitry is further configured to: acquire a third inference model that infers treatment progress of the target patient; (Steck teaches inference is the process of taking all the factoids and/or elements that are available about a patient and producing a composite view of the patient's progress through disease states [0088].) 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the use of a model which infers the progress of the patient as taught by Steck. This modification would create a system and methods capable of effectively predicting treatment outcome (see Steck, ¶ 0003).
Osogami, Bostic, Goecks, and Steck do not teach the following limitations met by Nair: 
and acquire counterfactual data …(Nair teaches regression models may be used to generate synthetic twins by estimating the output of a system in the absence of exposure to the treatment variable (e.g., “counter-factuals”). The difference between actual (possibly de-noised) and synthetic, counter-factual output determines the impact estimate of the treatment variable on the output of the system. In various examples counter-factual data may be a representation of a person and/or system that has not been exposed to the treatment variable (col. 2, lines 38-46).)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the use of counterfactual data as taught by Nair. This modification would create a system and methods which minimize selection bias in a machine learning model (see Nair, col. 2, lines 14-23).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Osogami, Bostic, Goecks, and Steck in view of Brobst et al. (US 11791038 B1).

Regarding Claim 10, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 1 above. Osogami discloses the following:
wherein the processing circuitry …first inference models respectively… (Osogami discloses the treatment prediction agent 100 can provide the action to a healthcare professional, such as a doctor or nurse, at the care center 140 and/or via the user access terminal 150 [0055].)
Osogami, Bostic, and Goecks do not teach the following limitations met by Steck: 
…of third inference models respectively… (Steck teaches inference is the process of taking all the factoids and/or elements that are available about a patient and producing a composite view of the patient's progress through disease states [0088].)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the use of a model which infers the progress of the patient as taught by Steck. This modification would create a system and methods capable of effectively predicting treatment outcome (see Steck, ¶ 0003).
Osogami, Bostic, Goecks, and Steck do not teach the searching of a plurality of medical care providers and patients which is met by Brobst:
… is further configured to search among a plurality of… corresponding to a plurality of medical care providers (Brobst teaches a second model is provided for modeling the preferences of a plurality of MPs for select patients and trained on a second training data set comprised of MPs and patients (col. 2, lines 48-50). The medical provider data 108, by comparison, is information regarding a plurality of medical providers (col. 4, lines 32-34).)
and a plurality …corresponding to a plurality of patients for an optimal combination. (Brobst teaches a first model is provided for modeling the preferences of a plurality of patients for select MPs and trained on a first training data set comprised of patients and MPs (col. 2, lines 34-37). The final rank list model 716 utilizes a combination of patient desires and medical provider desires in order to provide a list that provides an optimal ranking of available medical providers to a patient that are most likely to meet the needs of all parties involved and provide a best medical provider list for selection by the patient (col. 11, lines 40-46, see also Fig. 5).)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the searching of a plurality of medical care providers and patients as taught by Brobst. This modification would create a system and methods capable of selecting an optimal combination of provider to patient (see Brobst, col. 1, lines 20-29).

Claims 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Osogami, Bostic, Goecks, Brobst and Steck in view of Saini et al. (US 20190272553 A1).

Regarding Claim 11, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 1 above. Osogami further discloses: 
the first inference model includes… medical care providers, (Osogami discloses the treatment prediction agent 100 can provide the action to a healthcare professional, such as a doctor or nurse, at the care center 140 and/or via the user access terminal 150 [0055]. The treatment agent 100 can suggest a treatment action according to an evaluation of the set of actions. The suggested treatment action as well as a measured state in response to the suggested treatment action can be provided back to the treatment agent 100 [0058].)
outputs a…action of the corresponding medical care provider (Osogami discloses a treatment agent that utilizes states and actions with a state representation model and value model to predict treatment procedure [0010, see also Fig. 3-4]. The treatment agent 200 can suggest an action to take to treat an adverse condition of the patient. The condition monitor 202 can implement the action, or record biological effects upon the implementation of the action by a healthcare professional [0062].)
Osogami, Bostic, and Goecks do not teach the use of a plurality of providers which is met by Brobst:
wherein the target medical care provider includes a plurality of medical care providers,…corresponding to the plurality of medical care providers, (Brobst teaches a second model is provided for modeling the preferences of a plurality of MPs for select patients and trained on a second training data set comprised of MPs and patients (col. 2, lines 48-50). The medical provider data 108, by comparison, is information regarding a plurality of medical providers (col. 4, lines 32-34).)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the searching of a plurality of medical care providers and patients as taught by Brobst. This modification would create a system and methods capable of selecting an optimal combination of provider to patient (see Brobst, col. 1, lines 20-29).
Osogami, Bostic, Goecks, and Brobst do not teach the following limitations met by Steck:
…thus outputs a feature amount, (Steck teaches information may include, for example, computed tomography (CT) images, X-ray images, laboratory test results, doctor progress notes, details about medical procedures, prescription drug information, radiological reports, other specialist reports, demographic information, family history, patient information, and billing (financial) information [0069].)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the outputting of feature data as taught by Steck. This modification would create a system and methods capable of effectively predicting treatment outcome (see Steck, ¶ 0003).
Osogami, Bostic, Goecks, Brobst, and Steck do not teach the following limitations met by Saini:
a first common layer that is common between the plurality of… (Saini teaches the dense vector entity representation is generated at a common layer of the multi-task neural network, where the common layer connects various subnets corresponding to the propensity models. The model development system extracts the dense vector entity representation from the common layer [0019].)
and each of the plurality of first individual layers to which a …is input thus (Saini teaches the independent loss model 500 includes an input layer 512 [0063, see Fig. 5]. Figure 5 displays a plurality of inputs (tasks) for the input layers.) 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the use of common layers as taught by Saini. This modification would create a system and methods capable of incorporating a plurality of models for an accurate picture of a consumer’s intentions (see Saini, ¶ 0005).

Regarding Claim 12, Osogami, Bostic, Goecks, Steck, Brobst, and Saini teach the limitations as seen in the rejection of Claim 11 above. Osogami further discloses: 
the state and a diagnosis and treatment action are input …outputs a treatment progress of the patient. (Osogami discloses the replay buffer 310 can receive an action selected on the basis of the outputs from the value model 350, and a new state from an environment such as…the condition monitor 202… a batch of data in the replay buffer can include,… where s is a previous state, a is a previous action, s’ is a new state [0068].)
Osogami, Bostic, and Goecks do not teach the following limitations met by Steck:
acquire a third inference model that infers treatment progress of the target patient; (Steck teaches inference is the process of taking all the factoids and/or elements that are available about a patient and producing a composite view of the patient’s progress through disease states [0088].) 
and acquire data inferred by the third inference model as the treatment progress data, (Steck teaches the inference component 356 deals with the combination of these factoids, at the same point in time and/or at different points in time, to produce a coherent and concise picture of the progression of the patient's state over time [0087].)
…thus, outputs a feature amount, (Steck teaches information may include, for example, computed tomography (CT) images, X-ray images, laboratory test results, doctor progress notes, details about medical procedures, prescription drug information, radiological reports, other specialist reports, demographic information, family history, patient information, and billing (financial) information [0069].)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the use of a model which infers the progress of the patient and utilizes feature amount data as taught by Steck. This modification would create a system and methods capable of effectively predicting treatment outcome (see Steck, ¶ 0003).
Osogami, Bostic, Goecks, and Steck do not teach the following limitations met by Brobst:
a plurality of patients, (Brobst teaches a first model is provided for modeling the preferences of a plurality of patients for select MPs and trained on a first training data set comprised of patients and MPs (col. 2, lines 34-37). The final rank list model 716 utilizes a combination of patient desires and medical provider desires in order to provide a list that provides an optimal ranking of available medical providers to a patient that are most likely to meet the needs of all parties involved and provide a best medical provider list for selection by the patient (col. 11, lines 40-46, see also Fig. 5).)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the searching of a plurality of patients as taught by Brobst. This modification would create a system and methods capable of selecting an optimal combination of provider to patient (see Brobst, col. 1, lines 20-29).
Osogami, Bostic, Goecks, Steck, and Brobst do not teach the following limitations met by Saini:
includes a second common layer that is common between the plurality of…, and a plurality of second individual layers (Saini teaches with a neural network based consumer reaction model 100, sharing the propensity model weights translates to an architecture where the input data (e.g., the training data 118) passes through a series of common shared layers. Thereafter, a general architecture with action of interest specific layers specializes the shared information for a given action of interest [0058].)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the use of common layers as taught by Saini. This modification would create a system and methods capable of incorporating a plurality of models for an accurate picture of a consumer’s intentions (see Saini, ¶ 0005).
Regarding Claim 13, Osogami, Bostic, Goecks, Brobst, and Saini teach the limitations as seen in the rejection of Claim 12 above. Osogami, Bostic, Goecks, Steck, and Brobst do not teach the following limitations met by Saini:
wherein the processing circuitry is further configured to search, among the plurality of first individual layers for an optimal first individual layer, (Saini teaches the training process for building each propensity model 524 (e.g., identifying an optimal set of weights for a neural network) uses the common layer 528 to capture cross-task signals between the propensity models 524 [0065]. At block 606, the process 600 involves identifying similar users or groups of users of the consumer reaction model 100 using the predictive model 134. In the predictive model 134, the dense vector entity representations received at block 604 are compared to segment users or groups of users into similar groupings [0071].)
for a specific second individual layer of the plurality of second individual layers, or the plurality of second individual layers for a second individual layer optimal for a specific first individual layer of the plurality of first individual layers. (Saini teaches the predictive model 134 is thus able to identify groups of users that include the specified key traits 708 of the identified user [0074, see also Fig. 7].)
Although Saini does not explicitly disclose the identification of an optimal layer, it teaches the identification of groups with the specified necessary traits, and the substitution to searching for specific layers would have been obvious. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the use of determining the optimal common layers as taught by Saini. This modification would create a system and methods capable of incorporating a plurality of models for an accurate picture of a consumer’s intentions (see Saini, ¶ 0005). 
Claims 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Osogami, Bostic, and Goecks in view of Rangarajan et al. (US 20210058235 A1).
Regarding Claim 15, Osogami, Bostic, and Goecks teach the limitations as seen in the rejection of Claim 1 above. Osogami further discloses: 
wherein the processing circuitry is further configured to manage the second inference model … (Osogami discloses the suggested treatment action as well as a measured state in response to the suggested treatment action can be provided back to the treatment agent 100 [0058]. The measured state in response to the treatment action is interpreted as the progress data. The model parameters can, therefore, be updated based on the success or lack thereof of the previous action [0080]. The updated model based on the previous action is interpreted as the second model. These computer readable program instructions may be provided to a processor …to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart [0028].)
Osogami, Bostic, and Goecks do not teach the use of a blockchain which is met by Rangarajan: 
…in a block chain. (Rangarajan teaches site operator can manage operations …based on a plurality of variables and variable types procured from various sources, data models, Machine Learning (ML) and Artificial Intelligence (AI) algorithmic models using operation control variables procured therefrom. Provenance and security of the variables, such as the operation control variables, are preserved in the blockchain 12. Other variables can also be stored in the blockchain 12 [0017, Fig. 2].)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the model being managed on a blockchain as taught by Rangarajan. This modification would create a system and methods capable of automating performances and providing information in the form of analysis for decision making purposes (see Rangarajan, ¶ 0002).
Regarding Claim 16, Osogami, Bostic, Goecks, and Rangarajan teach the limitations as seen in the rejection of Claim 15 above. Osogami further discloses: 
wherein at a time of inference using the second inference model, the processing circuitry is further configured …the second inference model used in the inference and the treatment progress data…with the second inference model and the treatment progress data being associated with each other. (Osogami discloses the suggested treatment action as well as a measured state in response to the suggested treatment action can be provided back to the treatment agent 100. The degree of success of the suggested treatment action can be evaluated while also evaluating the set of actions to suggest a new treatment action in light of the measured state. The degree of success of the suggested treatment action is used to provide reinforcement to the treatment agent 100 [0058]. The measured state in response to the treatment action is interpreted as the progress data.)
Osogami, Bostic, and Goecks do not teach the use of a blockchain which is met by Rangarajan:
… to add…to a block, (Rangarajan teaches communications …can include identifiers identifying an AI/ML algorithmic model or models, data models, trained algorithmic models, sensor variables, and ROS automation and configuration variables…The ROS nodes 34 can store the control variables … and communicate the control variable to the blockchain 12 through the distributed network 14 in order to create an entry in the blockchain 12 [0023].)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate the model being managed on a blockchain as taught by Rangarajan. This modification would create a system and methods capable of automating performances and providing information in the form of analysis for decision making purposes (see Rangarajan, ¶ 0002).
Regarding Claim 17, Osogami, Bostic, Goecks, and Rangarajan teach the limitations as seen in the rejection of Claim 15 above. Osogami further discloses: 
wherein the processing circuitry is further configured to: update the second inference model based on the treatment progress data relating to a time point following a time point to which the treatment progress data used in a generation of the second inference model belong…. at a time of updating the second inference model, (Osogami discloses the suggested treatment action as well as a measured state in response to the suggested treatment action can be provided back to the treatment agent 100. The degree of success of the suggested treatment action can be evaluated while also evaluating the set of actions to suggest a new treatment action in light of the measured state. The degree of success of the suggested treatment action is used to provide reinforcement to the treatment agent 100 [0058]. The measured state in response to the treatment action is interpreted as the progress data. The treatment prediction agent 100 predicts treatment actions for an episode of treatment, such as, e.g., a specified time-period [0056].)
Osogami, Bostic, and Goecks do not teach the limitation of adding a model to the blockchain, which is met by Rangarajan:
add, the…model to the blockchain (Rangarajan teaches communications …can include identifiers identifying an AI/ML algorithmic model or models, data models, trained algorithmic models, sensor variables, and ROS automation and configuration variables…The ROS nodes 34 can store the control variables … and communicate the control variable to the blockchain 12 through the distributed network 14 in order to create an entry in the blockchain 12 [0023].)
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the systems and methods for inferring a treatment action based on the state of a patient and updating the model based upon patient treatment progress as disclosed by Osogami to incorporate adding the model to a blockchain on which the model is managed, as taught by Rangarajan. This modification would create a system and methods capable of automating performances and providing information in the form of analysis for decision making purposes (see Rangarajan, ¶ 0002).
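For orientation only, the limitation discussed for Claims 15 and 17 (adding a model to a block with the model and the treatment progress data associated with each other) can be pictured with a minimal hash-chained ledger. This sketch is not taken from Rangarajan, Osogami, or the application; the names `Block`, `Ledger`, `digest`, and the field names are hypothetical, and storing digests rather than the model itself is an assumption made to keep the example short.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def digest(obj) -> str:
    """SHA-256 digest of a JSON-serializable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class Block:
    index: int
    prev_hash: str
    model_id: str
    model_digest: str      # fingerprint of the inference model parameters
    progress_digest: str   # fingerprint of the associated treatment progress data

    def hash(self) -> str:
        return digest(asdict(self))


class Ledger:
    def __init__(self):
        self.blocks = []

    def add(self, model_id, model_params, progress_data) -> Block:
        """Append a block that associates a model with its progress data."""
        prev = self.blocks[-1].hash() if self.blocks else GENESIS_HASH
        block = Block(len(self.blocks), prev, model_id,
                      digest(model_params), digest(progress_data))
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Check that every block points at the hash of its predecessor."""
        prev = GENESIS_HASH
        for block in self.blocks:
            if block.prev_hash != prev:
                return False
            prev = block.hash()
        return True
```

Because each block records the hash of its predecessor, altering an earlier model or progress-data digest causes `verify()` to fail for the chain, which is the property that makes the association between the model and its data auditable in this sketch.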

Response to Arguments
Regarding rejections to Claims 1-20 under 35 USC 112(b), Applicant’s amendments have been considered and the rejection has been withdrawn.
Regarding rejections to Claims 1-20 under 35 USC 101, Applicant’s arguments have been considered but are not persuasive. The rejection has been updated in light of the amendments above. 
Applicant argues (a) that the claims expressly recite a specific improvement to the operation of a machine-learning system itself. Desjardins does not hold that only claims addressing catastrophic forgetting improve machine-learning technology. Rather, Desjardins confirms that claims reciting specific mechanisms for training and updating machine-learning models constitute technological improvements when those mechanisms improve how the learning system operates. The Appeals Review Panel in Desjardins credited improvements such as reduced system complexity, preservation of learned behavior, and improved operational efficiency as improvements to machine-learning technology, based on claim limitations directed to particular parameter-updating techniques. Here, the claims likewise recite a specific training architecture that improves machine-learning system operation. By initializing a patient-specific policy model from a previously trained doctor-specific policy model and retraining that model through reinforcement learning, the claimed invention constrains learning behavior, stabilizes training, and reduces training complexity relative to training from scratch. These improvements arise from the claimed training mechanism itself and are directed to how the machine-learning system is trained and updated, not to any medical decision-making outcome. The fact that the present invention addresses personalization of inference models rather than catastrophic forgetting does not negate that the claimed subject matter improves machine-learning technology in the same doctrinal manner recognized in Desjardins (see p. 9-10 of Applicant’s Remarks).
Regarding (a), Examiner respectfully disagrees. Examiner initially notes that, as shown above, the limitations pertaining to the training of the model are an additional element. However, the training of the model is recited at a high level of generality such that it amounts to no more than "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, as discussed in MPEP § 2106.05(f). In Desjardins, the claimed invention trains an existing model in such a way that it improves said model. However, in the instant claims, an existing model is not being trained, and instead a “new” policy model is being trained, so there is no improvement to an existing model. 
Further, Examiner notes that the claims and the specification do not specify how the use of the first model as the initial value through a reinforcement learning process improves a policy model or addresses any technical problem. If it is asserted that the invention improves upon conventional functioning of a computer, or upon conventional technology or technological processes, a technical explanation as to how to implement the invention should be present in the specification. That is, the disclosure must provide sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art. Conversely, if the specification explicitly sets forth an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), the examiner should not determine the claim improves technology. An indication that the claimed invention provides an improvement can include a discussion in the specification that identifies a technical problem and explains the details of an unconventional technical solution expressed in the claim, or identifies technical improvements realized by the claim over the prior art (MPEP § 2106.05(a)). The instant claims are not analogous to Desjardins because the instant claims do not provide an apparent improvement to the training of machine-learning models.
Applicant argues (b) that the independent claims now recite "updating the second inference model based on the treatment progress data at a time point following a time point to which the treatment progress data used in a generation of the second inference model belongs." The effect of this feature constitutes an improvement to machine-learning technology as follows: "Preferably, the treatment progress data DI and/or DI' used in the updating process includes only new treatment progress data DI and/or DI' that was not used in a previous updating process. By performing the updating process using only the new treatment progress data DI and/or DI', it is possible to exclude past insights and adopt the latest insights into the improved doctor model; therefore, the accuracy of the output of the improved doctor model is expected to improve. Since the accuracy of the treatment progress data DI' is expected to improve every time it is repeatedly generated, the past treatment progress data DI' can be discarded, the updating process can be therefore performed using only the new treatment progress data DI', and the accuracy of the output of the improved doctor model is thus expected to improve." See Specification, ¶ 0063 (p. 11).
Regarding (b), Examiner respectfully disagrees. The specific operations of training the model are performed by a computer that is recited at a high level of generality (i.e., circuitry) and hence is considered as a tool to perform the generic computer function of receiving data (i.e., inputs). The limitation of updating the model using additional information is identified as part of the abstract idea. An improvement to the abstract idea of the type of data used in a model does not amount to an improvement to a technology or a technical field (see MPEP § 2106.05(a)(III), stating “it is important to keep in mind that an improvement in the abstract idea itself (e.g. a recited fundamental economic concept) is not an improvement in technology”).
An indication that the claimed invention provides an improvement can include a discussion in the specification that identifies a technical problem and explains the details of an unconventional technical solution expressed in the claim, or identifies technical improvements realized by the claim over the prior art (MPEP § 2106.05(a)). Additionally, an important consideration in determining whether a claim improves technology is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. McRO, 837 F.3d at 1314-15, 120 USPQ2d at 1102-03 (MPEP § 2106.05(a)(II)). The instant claims seem analogous to MPEP § 2106.05(a)(II) examples that the courts have indicated may not be sufficient to show an improvement to technology, example iii. Gathering and analyzing information using conventional techniques and displaying the result, TLI Communications, 823 F.3d at 612-13, 118 USPQ2d at 1747-48. The only identified improvement by the Applicant is the type of data used to update the model, but the model itself analyzes data in a generic manner. 

Regarding rejections to Claims 1-20 under 35 USC 103, Applicant’s arguments have been considered and are persuasive. Therefore the rejection has been withdrawn. However, upon further consideration, a new rejection has been made in light of the amendments, rejecting the independent claims over Osogami in view of Bostic and Goecks. 



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLIVIA R GEDRA whose telephone number is (571)270-0944. The examiner can normally be reached Monday - Friday 8:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Peter H Choi, can be reached at (469)295-9171. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/OLIVIA R. GEDRA/Examiner, Art Unit 3681

/PETER H CHOI/Supervisory Patent Examiner, Art Unit 3681
Prosecution Timeline

Jun 16, 2025: Non-Final Rejection mailed (§101, §103, §112)
Sep 16, 2025: Response Filed
Oct 14, 2025: Final Rejection mailed (§101, §103, §112)
Jan 29, 2026: Applicant Interview (Telephonic)
Jan 29, 2026: Examiner Interview Summary
Feb 09, 2026: Request for Continued Examination
Feb 11, 2026: Response after Non-Final Action
Apr 09, 2026: Non-Final Rejection mailed (§101, §103, §112) (current)
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability
With Interview (+0.0%)
Median Time to Grant: 2y 8m (~3m remaining)
PTA Risk: High
Based on 14 resolved cases by this examiner. Grant probability derived from career allowance rate.