DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-17 are presented for examination.

Claim Objections

Claim 6 is objected to because of the following informalities: It is unclear if “environment information” is meant to refer to the environment information recited in claim 1; the examiner suggests amending the limitation to read “the environment information.” Appropriate correction is required.

Specification

The disclosure is objected to because of the following informalities:
[0001]: "Untied States" should read "United States," and “both of which are incorporated by reference in their entireties., incorporated herein by reference in its entirety” should read “both of which are incorporated by reference in their entireties”
[0016]: "a sequence" should read "and a sequence"
[0020]: "body temperature, blood sugar levels" should read "body temperature, and blood sugar levels"
[0029]: "output a decoder network" should read "output of decoder network"
[0031]: “In general, the differences between two policies (…) by comparing the trajectories they generate” is an incomplete sentence
[0033]: "repsents" should read "represents"
[0035]: "maximize the probability of the data that is generated by πθ is positive" should read "maximize the probability that the data that is generated by πθ is positive"
Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C.
112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1 recites “iteratively training… the dynamic response regime,” which has insufficient antecedent basis in the claim.

Claim 2 recites “the historical trajectories,” and it is unclear if this is meant to refer to “historical trajectories that resulted in a positive outcome,” “historical trajectories that resulted in a negative outcome,” or both. For examination purposes, the examiner will assume it refers to both.

Claims 1 and 9 recite “similar” and “dissimilar,” which are relative terms that render the claims indefinite. The terms “similar” and “dissimilar” are not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.

The remainder of the claims are rejected due to dependency on a rejected base claim.

The following is a quotation of 35 U.S.C. 112(d):

(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:

Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C.
112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claim 13 is rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Claim 13 fails to further limit the subject matter of claim 9. Applicant may cancel the claim, amend the claim to place the claim in proper dependent form, rewrite the claim in independent form, or present a sufficient showing that the dependent claim complies with the statutory requirements.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 9-17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because they are directed to a system comprising a machine learning model, a model trainer, and a response interface, which encompasses software per se, as all of these components could be implemented with software.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C.
102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C.
102(a)(2) prior art against the later invention.

Claims 1-4, 8-12, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Komorowski et al. (NPL: “The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care”) (“Komorowski”) in view of Nguyen et al. (NPL: “Dual Discriminator Generative Adversarial Nets”) (“Nguyen”), Daianu et al. (US 11650996) (“Daianu”), and Osogami (US 20200303068).

Regarding claim 1, Komorowski discloses “A method for responding to changing conditions, comprising: training a model, using a processor, including trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome (Komorowski, page 1716, paragraph 5: “A Markov decision process (MDP) was used to model the patient environment and trajectories. The various elements of the model were defined using patient data time series from the training set (a random sample of 80% of MIMIC-III; Fig. 1)” and page 1716, paragraph 4: “In both datasets, we extracted a set of 48 variables, including demographics, Elixhauser premorbid status, vital signs, laboratory values, fluids and vasopressors received (Supplementary Table 2). Patients’ data were coded as multidimensional discrete time series with 4-h time steps, and for each patient, we included up to 72 h of measurements taken around the estimated time of onset of sepsis. The total volume of intravenous fluids and maximum dose of vasopressors administered over each 4-h period defined the medical treatments of interest.
The model aims at optimizing patient mortality, so a reward was associated to survival and a penalty to death”; the examiner notes that the patient time series data that led to survival corresponds to “trajectories that resulted in a positive outcome” and the patient time series data that led to death corresponds to “trajectories that resulted in a negative outcome”), by using… [reinforcement learning] to train the model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome (Komorowski, Methods, Building the computational model, paragraph 5: “The sequences of successive states and actions are referred to as patients’ trajectories. In our models, we used either hospital mortality or 90-d mortality as the sole defining factor for the system-defined penalty and reward. When a patient survived, a positive reward was released at the end of each patient’s trajectory (a ‘reward’ of +100)” and paragraph 6: “As such, the resulting AI policy suggests the best possible treatment among all the options chosen (relatively frequently) by clinicians”; the examiner notes that the patient trajectories generated by the trained AI policy would be similar to historical trajectories that resulted in survival, as trajectories leading to survival were rewarded during the training process), and by using… [reinforcement learning] to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome (Komorowski, Methods, Building the computational model, paragraph 5: “… a negative reward (a ‘penalty’ of –100) was issued if the patient died” and paragraph 6: “As such, the resulting AI policy suggests the best possible treatment among all the options chosen (relatively frequently) by clinicians”; the examiner notes that the patient trajectories generated by the trained AI policy would be dissimilar to historical trajectories that resulted in death, as trajectories
leading to death were penalized during the training process), and… generating a dynamic response regime using the trained model and environment information” (Komorowski, page 1716, paragraph 2: “We developed the AI Clinician, a computational model using reinforcement learning, which is able to dynamically suggest optimal treatments for adult patients with sepsis in the intensive care unit (ICU),” and page 1716, paragraph 5: “We deployed the AI Clinician to solve the MDP and predict outcomes of treatment strategies” and page 1719, paragraph 5: “We envision that this system would be used in real-time, with patient data obtained from different streams being fed into electronic health record software fitted with our algorithm, which would suggest a course of action”; the examiner notes that the AI Clinician corresponds to a “dynamic response regime” and patient data corresponds to “environment information”).

Komorowski does not appear to explicitly disclose the further limitations of the claim. However, Nguyen discloses an “adversarial discriminator” and a “cooperative discriminator” (Nguyen, Section 3: “Our intuition is based on GAN, but we formulate a three-player game that consists of two different discriminators D1 and D2, and one generator G. Given a sample x in data space, D1(x) rewards a high score if x is drawn from the data distribution Pdata, and gives a low score if generated from the model distribution PG.
In contrast, D2(x) returns a high score for x generated from PG whilst giving a low score for a sample drawn from Pdata”; the examiner notes that discriminator D1(x) is “adversarial” to the generator because it gives a high score to the real data, and discriminator D2(x) is “cooperative” with the generator because it gives a high score to the generated data), and “iteratively training the adversarial discriminator, the cooperative discriminator, and… [a generator] using a three-party optimization” (Nguyen, Section 3: “More formally, D1, D2 and G now play the following three-player minimax optimization game: [see eq (1)]”).

Nguyen and the instant application both relate to machine learning and are analogous. It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified Komorowski with the teachings of Nguyen to include using an adversarial discriminator to train the model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome, and including iteratively training the adversarial discriminator, the cooperative discriminator, and the dynamic response regime using a three-party optimization, and one would have been motivated to do so for the purpose of avoiding mode collapse and efficiently scaling up to very large datasets (see Nguyen, Section 1, paragraph 5).

Neither Komorowski nor Nguyen appears to explicitly disclose the further limitations of the claim.
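For illustration, the three-player game quoted from Nguyen above can be sketched in a minimal one-dimensional form. The shift-parameter generator, the fixed logistic discriminators, and all numeric choices below are assumptions made for illustration only and are not taken from Nguyen; in Nguyen's game all three players are updated jointly, whereas here the two discriminators are held fixed at idealized forms so that only the generator's side of the optimization is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Real data (P_data) is assumed to sit around +2; the one-parameter toy
# generator G(z) = z + theta starts far away at theta = -2.
noise = rng.normal(0.0, 1.0, 4000)
theta = -2.0

# Idealized fixed discriminators, playing the roles quoted from Nguyen:
#   D1 ("adversarial") scores high where the real data lies (large x),
#   D2 ("cooperative") scores high where the generated data starts out (small x).
def D1(x):
    return sigmoid(x)

def D2(x):
    return sigmoid(-x)

# Generator's share of a three-player objective, to be minimized by G:
#   J_G(theta) = -E[D1(G(z))] + E[log D2(G(z))]
# Decreasing J_G raises D1's score of generated samples and lowers D2's,
# pushing the generated distribution toward the real one.
for _ in range(200):
    s = D1(noise + theta)                   # D1 scores of generated samples
    # Analytic gradient of J_G w.r.t. theta:
    #   d/dtheta -E[sigmoid(fake)]     = -E[s(1-s)]
    #   d/dtheta  E[log sigmoid(-fake)] = -E[s]
    grad = np.mean(-s * (1.0 - s) - s)
    theta -= 0.1 * grad                     # gradient descent step for G

print(theta)  # theta has moved from -2 toward the real data at +2
```

Because the discriminators are held fixed here, the generator simply climbs toward the region D1 rewards; in the actual three-player optimization, D1 and D2 are re-trained against the generator each round.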
However, Daianu discloses “iteratively training… [a model] until improvement from one iteration to the next has fallen below a predetermined threshold” (Daianu, Col 10, lines 36-40: “Training a machine learning model may be iterative, and may be performed many times until output 322 produced by intent model 132 exceeds an accuracy threshold, or until the improvement in output 322 falls below a threshold for iterative improvement”).

It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski and Nguyen so that the training is performed “until improvement from one iteration to the next has fallen below a predetermined threshold,” as disclosed by Daianu, and one would have been motivated to do so for the purpose of improving resource utilization by stopping training when it is no longer beneficial.

Komorowski, Nguyen, and Daianu do not appear to explicitly disclose the further limitations of the claim. However, Osogami discloses “responding to changing environment conditions in accordance with… [a] dynamic response regime” (Osogami, [0062]: “The treatment agent 200 can suggest an action to take to treat an adverse condition of the patient. The condition monitor 202 can implement the action, or record biological effects upon the implementation of the action by a healthcare professional” and [0063]: “The condition monitor 202 can assess the patient for changes to biomarkers and health indicia as a result of the action. The changes can be used to make a state determination of the adverse condition of the patient. The state determination can be performed by the condition monitor 202 and then provided to the treatment agent 200” and [0065]: “The treatment agent 200 can then be adjusted to take into account the effectiveness or ineffectiveness of the action by, e.g., updating parameters corresponding to a state representation model and a value model.
Additionally, the treatment agent 200 also determines a value for each possible action to take at a next step in response to the current measured state of the patient. According to the values for each action, a next action can be determined and suggested to a user. The treatment agent 200 can continue generating actions until a state corresponding to a resolution of the adverse condition is reached”; the examiner notes that changes to biomarkers and health indicia of the patient correspond to “changing environment conditions,” the treatment agent corresponds to “a dynamic response regime,” and the action being suggested to a user or implemented by the condition monitor is the response to changing environment conditions).

Osogami and the instant application both relate to machine learning and are analogous. It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski, Nguyen, and Daianu to include “responding to changing environment conditions in accordance with the dynamic response regime” as disclosed by Osogami, and one would have been motivated to do so for the purpose of achieving positive patient health outcomes (see Osogami, [0020]).

Regarding claim 2, the rejection of claim 1 is incorporated. Komorowski as modified by Nguyen, Daianu, and Osogami further discloses “wherein the historical trajectories include patient treatment trajectories” (Komorowski, page 1716, paragraph 4: “In both datasets, we extracted a set of 48 variables, including demographics, Elixhauser premorbid status, vital signs, laboratory values, fluids and vasopressors received (Supplementary Table 2). Patients’ data were coded as multidimensional discrete time series with 4-h time steps, and for each patient, we included up to 72 h of measurements taken around the estimated time of onset of sepsis.
The total volume of intravenous fluids and maximum dose of vasopressors administered over each 4-h period defined the medical treatments of interest”).

Regarding claim 3, the rejection of claim 2 is incorporated. Komorowski as modified by Nguyen, Daianu, and Osogami further discloses “wherein the positive outcomes are positive patient health outcomes, and the negative outcomes are negative patient health outcomes” (Komorowski, page 1716, paragraph 4: “The model aims at optimizing patient mortality, so a reward was associated to survival and a penalty to death”).

Regarding claim 4, the rejection of claim 2 is incorporated. Komorowski further discloses “wherein the environment information… reflect[s] information about a patient being treated” (Komorowski, page 1719, paragraph 5: “We envision that this system would be used in real-time, with patient data obtained from different streams being fed into electronic health record software fitted with our algorithm, which would suggest a course of action”). Osogami further discloses “wherein… the environment conditions reflect information about a patient being treated” (Osogami, [0063]: “The condition monitor 202 can assess the patient for changes to biomarkers and health indicia as a result of the action. The changes can be used to make a state determination of the adverse condition of the patient”). Osogami and the instant application both relate to machine learning and are analogous. It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski, Nguyen, and Daianu to include “wherein the environment conditions reflect information about a patient being treated” as disclosed by Osogami, and one would have been motivated to do so for the purpose of achieving positive patient health outcomes (see Osogami, [0020]).

Regarding claim 8, the rejection of claim 1 is incorporated.
Osogami further discloses “wherein responding to changing environment conditions comprises automatically performing a responsive action to correct a negative condition” (Osogami, [0062]: “The treatment agent 200 can suggest an action to take to treat an adverse condition of the patient. The condition monitor 202 can implement the action, or record biological effects upon the implementation of the action by a healthcare professional” and [0063]: “The condition monitor 202 can assess the patient for changes to biomarkers and health indicia as a result of the action. The changes can be used to make a state determination of the adverse condition of the patient. The state determination can be performed by the condition monitor 202 and then provided to the treatment agent 200” and [0065]: “The treatment agent 200 can then be adjusted to take into account the effectiveness or ineffectiveness of the action by, e.g., updating parameters corresponding to a state representation model and a value model. Additionally, the treatment agent 200 also determines a value for each possible action to take at a next step in response to the current measured state of the patient. According to the values for each action, a next action can be determined and suggested to a user. The treatment agent 200 can continue generating actions until a state corresponding to a resolution of the adverse condition is reached”; the examiner notes that the condition monitor implementing the action to treat an adverse condition of the patient corresponds to “automatically performing a responsive action to correct a negative condition”). Osogami and the instant application both relate to machine learning and are analogous.
It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski, Nguyen, and Daianu to include “wherein responding to changing environment conditions comprises automatically performing a responsive action to correct a negative condition” as disclosed by Osogami, and one would have been motivated to do so for the purpose of achieving positive patient health outcomes (see Osogami, [0020]).

Regarding claim 9, Komorowski discloses “A system for responding to changing conditions, comprising: a machine learning model, configured to generate a dynamic response regime for using environment information (Komorowski, page 1716, paragraph 2: “We developed the AI Clinician, a computational model using reinforcement learning, which is able to dynamically suggest optimal treatments for adult patients with sepsis in the intensive care unit (ICU),” and paragraph 4: “We deployed the AI Clinician to solve the MDP and predict outcomes of treatment strategies” and page 1719, paragraph 5: “We envision that this system would be used in real-time, with patient data obtained from different streams being fed into electronic health record software fitted with our algorithm, which would suggest a course of action”; the examiner notes that the MDP corresponds to “a machine learning model,” the AI Clinician corresponds to a “dynamic response regime,” and patient data corresponds to “environment information”); a model trainer, configured to train the machine learning model, including trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome (Komorowski, page 1716, paragraph 5: “A Markov decision process (MDP) was used to model the patient environment and trajectories. The various elements of the model were defined using patient data time series from the training set (a random sample of 80% of MIMIC-III; Fig.
1)” and page 1716, paragraph 4: “In both datasets, we extracted a set of 48 variables, including demographics, Elixhauser premorbid status, vital signs, laboratory values, fluids and vasopressors received (Supplementary Table 2). Patients’ data were coded as multidimensional discrete time series with 4-h time steps, and for each patient, we included up to 72 h of measurements taken around the estimated time of onset of sepsis. The total volume of intravenous fluids and maximum dose of vasopressors administered over each 4-h period defined the medical treatments of interest. The model aims at optimizing patient mortality, so a reward was associated to survival and a penalty to death”; the examiner notes that the patient time series data that led to survival corresponds to “trajectories that resulted in a positive outcome” and the patient time series data that led to death corresponds to “trajectories that resulted in a negative outcome”), by using… [reinforcement learning] to train the machine learning model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome (Komorowski, Methods, Building the computational model, paragraph 5: “The sequences of successive states and actions are referred to as patients’ trajectories. In our models, we used either hospital mortality or 90-d mortality as the sole defining factor for the system-defined penalty and reward.
When a patient survived, a positive reward was released at the end of each patient’s trajectory (a ‘reward’ of +100)” and paragraph 6: “As such, the resulting AI policy suggests the best possible treatment among all the options chosen (relatively frequently) by clinicians”; the examiner notes that the patient trajectories generated by the trained AI policy would be similar to historical trajectories that resulted in survival, as trajectories leading to survival were rewarded during the training process), and by using… [reinforcement learning] to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome…” (Komorowski, Methods, Building the computational model, paragraph 5: “… a negative reward (a ‘penalty’ of –100) was issued if the patient died” and paragraph 6: “As such, the resulting AI policy suggests the best possible treatment among all the options chosen (relatively frequently) by clinicians”; the examiner notes that the patient trajectories generated by the trained AI policy would be dissimilar to historical trajectories that resulted in death, as trajectories leading to death were penalized during the training process).

Komorowski does not appear to explicitly disclose the further limitations of the claim. However, Nguyen discloses an “adversarial discriminator” and a “cooperative discriminator” (Nguyen, Section 3: “Our intuition is based on GAN, but we formulate a three-player game that consists of two different discriminators D1 and D2, and one generator G. Given a sample x in data space, D1(x) rewards a high score if x is drawn from the data distribution Pdata, and gives a low score if generated from the model distribution PG.
In contrast, D2(x) returns a high score for x generated from PG whilst giving a low score for a sample drawn from Pdata”; the examiner notes that discriminator D1(x) is “adversarial” to the generator because it gives a high score to the real data, and discriminator D2(x) is “cooperative” with the generator because it gives a high score to the generated data), and “…to iteratively train the adversarial discriminator, the cooperative discriminator, and… [a generator] using a three-party optimization” (Nguyen, Section 3: “More formally, D1, D2 and G now play the following three-player minimax optimization game: [see eq (1)]”).

Nguyen and the instant application both relate to machine learning and are analogous. It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified Komorowski with the teachings of Nguyen to include using an adversarial discriminator to train the model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome, and to iteratively train the adversarial discriminator, the cooperative discriminator, and the dynamic response regime using a three-party optimization, and one would have been motivated to do so for the purpose of avoiding mode collapse and efficiently scaling up to very large datasets (see Nguyen, Section 1, paragraph 5).

Neither Komorowski nor Nguyen appears to explicitly disclose the further limitations of the claim.
However, Daianu discloses “iteratively training… [a model] until improvement from one iteration to the next has fallen below a predetermined threshold” (Daianu, Col 10, lines 36-40: “Training a machine learning model may be iterative, and may be performed many times until output 322 produced by intent model 132 exceeds an accuracy threshold, or until the improvement in output 322 falls below a threshold for iterative improvement”).

It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski and Nguyen so that the training is performed “until improvement from one iteration to the next has fallen below a predetermined threshold,” as disclosed by Daianu, and one would have been motivated to do so for the purpose of improving resource utilization by stopping training when it is no longer beneficial.

Komorowski, Nguyen, and Daianu do not appear to explicitly disclose the further limitations of the claim. However, Osogami discloses “a response interface, configured to trigger a response to changing environment conditions in accordance with… [a] dynamic response regime” (Osogami, [0062]: “The treatment agent 200 can suggest an action to take to treat an adverse condition of the patient. The condition monitor 202 can implement the action, or record biological effects upon the implementation of the action by a healthcare professional” and [0063]: “The condition monitor 202 can assess the patient for changes to biomarkers and health indicia as a result of the action. The changes can be used to make a state determination of the adverse condition of the patient.
The state determination can be performed by the condition monitor 202 and then provided to the treatment agent 200” and [0065]: “The treatment agent 200 can then be adjusted to take into account the effectiveness or ineffectiveness of the action by, e.g., updating parameters corresponding to a state representation model and a value model. Additionally, the treatment agent 200 also determines a value for each possible action to take at a next step in response to the current measured state of the patient. According to the values for each action, a next action can be determined and suggested to a user. The treatment agent 200 can continue generating actions until a state corresponding to a resolution of the adverse condition is reached”; the examiner notes that the “condition monitor” corresponds to a “response interface” because it communicates a state determination to the treatment agent, which triggers an action being determined; changes to biomarkers and health indicia of the patient correspond to “changing environment conditions”; the treatment agent corresponds to “a dynamic response regime”; and the action being suggested to a user or implemented by the condition monitor is the response to changing environment conditions).

Osogami and the instant application both relate to machine learning and are analogous. It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski, Nguyen, and Daianu to include “a response interface, configured to trigger a response to changing environment conditions in accordance with the dynamic response regime” as disclosed by Osogami, and one would have been motivated to do so for the purpose of achieving positive patient health outcomes (see Osogami, [0020]).

Regarding claim 10, the rejection of claim 9 is incorporated.
Claim 10 is a system claim corresponding to method claim 2, and the rejection follows the same rationale as that of claim 2 above. Regarding claim 11, the rejection of claim 10 is incorporated. Claim 11 is a system claim corresponding to method claim 3, and the rejection follows the same rationale as that of claim 3 above. Regarding claim 12, the rejection of claim 9 is incorporated. Claim 12 is a system claim corresponding to method claim 4, and the rejection follows the same rationale as that of claim 4 above. Regarding claim 17, the rejection of claim 9 is incorporated. Claim 17 is a system claim corresponding to method claim 8, and the rejection follows the same rationale as that of claim 8 above. Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Komorowski, Nguyen, Daianu, and Osogami, and further in view of Chatzimichail et al. (NPL: “Predicting Asthma Outcome Using Partial Least Square Regression and Artificial Neural Networks”) (“Chatzimichail”). Regarding claim 5, the rejection of claim 1 is incorporated. Komorowski as modified by Nguyen, Daianu, and Osogami discloses “the adversarial discriminator,” “the cooperative discriminator,” and “the dynamic response regime,” but does not appear to explicitly disclose the further limitations of the claim. However, Chatzimichail discloses implementing an asthma outcome prediction model with a multiple-layer perceptron (Chatzimichail, Section 3.1: “The prediction algorithm which has been employed in this study consists of two stages: the feature reduction through partial least square regression and the classification stage by MLP [multilayer perceptron] and PNN classifiers”). Chatzimichail and the instant application both relate to machine learning and are analogous.
It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski, Nguyen, Daianu, and Osogami with the teachings of Chatzimichail to implement the adversarial discriminator, the cooperative discriminator, and the dynamic response regime as multiple-layer perceptrons, and one would have been motivated to do so for the purpose of achieving accurate modeling of complex data patterns, and because MLPs have a clear architecture and simple algorithm compared to other types of neural networks (see Chatzimichail, Section 1). Regarding claim 14, the rejection of claim 9 is incorporated. Komorowski as modified by Nguyen, Daianu, and Osogami discloses “the adversarial discriminator,” “the cooperative discriminator,” “the dynamic response regime,” and “the machine learning model,” but does not appear to explicitly disclose the further limitations of the claim. However, Chatzimichail discloses implementing an asthma outcome prediction model with a multiple-layer perceptron (Chatzimichail, Section 3.1: “The prediction algorithm which has been employed in this study consists of two stages: the feature reduction through partial least square regression and the classification stage by MLP [multilayer perceptron] and PNN classifiers”). Chatzimichail and the instant application both relate to machine learning and are analogous.
It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski, Nguyen, Daianu, and Osogami with the teachings of Chatzimichail to implement the adversarial discriminator, the cooperative discriminator, and the dynamic response regime as multiple-layer perceptrons in the machine learning model, and one would have been motivated to do so for the purpose of achieving accurate modeling of complex data patterns, and because MLPs have a clear architecture and simple algorithm compared to other types of neural networks (see Chatzimichail, Section 1). Claims 6, 7, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Komorowski, Nguyen, Daianu, and Osogami, and further in view of Rampasek et al. (NPL: “Dr. VAE: Drug Response Variational Autoencoder”) (“Rampasek”). Regarding claim 6, the rejection of claim 1 is incorporated. Komorowski, Nguyen, Daianu, and Osogami do not appear to explicitly disclose the further limitations of the claim. However, Rampasek discloses “training an environment model that encodes environment information as a vector in a latent space” (Rampasek, Section 2.1: “Perturbation Variational Autoencoder (PertVAE) is an unsupervised model for drug-induced gene expression perturbations, that embeds the data space (gene expression) in a lower dimensional latent space. In the latent space we model the drug-induced effect as a linear function, which is trained jointly with the embedding encoder and decoder. We fit PertVAE on “perturbation pairs” [x1; x2] of pre-treatment and post-treatment gene expression with shared stochastic embedding encoder q and decoder p. The original dimension of each vector x is 903 genes.
Additionally we use unpaired pre-treatment data (with no known post-treatment state) to improve learning of the latent representation”; the examiner notes that “PertVAE” corresponds to an “environment model” and “drug-induced gene expression” corresponds to “environment information”). Rampasek and the instant application both relate to machine learning and are analogous. It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski, Nguyen, Daianu, and Osogami with the teachings of Rampasek to include “wherein training the model comprises training an environment model that encodes environment information as a vector in a latent space,” and one would have been motivated to do so for the purpose of capturing the essence of the observed environment information that is most useful for prediction (see Rampasek, Section 5). Regarding claim 7, the rejection of claim 1 is incorporated. Komorowski, Nguyen, Daianu, and Osogami do not appear to explicitly disclose the further limitations of the claim. However, Rampasek discloses “wherein… [a] model is implemented as a variational auto-encoder network” (Rampasek, Section 2.1: “Perturbation Variational Autoencoder (PertVAE) is an unsupervised model for drug-induced gene expression perturbations, that embeds the data space (gene expression) in a lower dimensional latent space”). Rampasek and the instant application both relate to machine learning and are analogous.
It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski, Nguyen, Daianu, and Osogami to implement the model as a variational auto-encoder network, as disclosed by Rampasek, and one would have been motivated to do so for the purpose of capturing the essence of the observed environment information that is most useful for prediction (see Rampasek, Section 5). Regarding claim 15, the rejection of claim 9 is incorporated. Claim 15 is a system claim corresponding to method claim 6 and is rejected using the same rationale as claim 6 above. Regarding claim 16, the rejection of claim 15 is incorporated. Rampasek further discloses “wherein the environment model is implemented as a variational auto-encoder network…” (Rampasek, Section 2.1: “Perturbation Variational Autoencoder (PertVAE) is an unsupervised model for drug-induced gene expression perturbations, that embeds the data space (gene expression) in a lower dimensional latent space”). Rampasek and the instant application both relate to machine learning and are analogous. It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the combination of Komorowski, Nguyen, Daianu, and Osogami with the teachings of Rampasek to implement the environment model as a variational auto-encoder network in the machine learning model, and one would have been motivated to do so for the purpose of capturing the essence of the observed environment information that is most useful for prediction (see Rampasek, Section 5).
Double Patenting
Claims 1-12 and 14-17 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-16 of U.S. Patent No. 11783189 (“reference”) in view of Daianu.
Instant claims 1-12 and 14-17 are identical to reference claims 1-16, respectively, except insofar as the instant claims additionally recite “…until improvement from one iteration to the next has fallen below a predetermined threshold.” Daianu discloses “iteratively training… [a model] until improvement from one iteration to the next has fallen below a predetermined threshold” (Daianu, Col. 10, lines 36-40: “Training a machine learning model may be iterative, and may be performed many times until output 322 produced by intent model 132 exceeds an accuracy threshold, or until the improvement in output 322 falls below a threshold for iterative improvement”). It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the reference claims to include that the training is performed “until improvement from one iteration to the next has fallen below a predetermined threshold,” as disclosed by Daianu, and one would have been motivated to do so for the purpose of improving resource utilization by stopping training when it is no longer beneficial. A claim comparison chart is provided below.

Instant Application / US Patent No. 11783189

Instant claim 1: A method for responding to changing conditions, comprising: training a model, using a processor, including trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome, by using an adversarial discriminator to train the model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and by using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome, and including iteratively training the adversarial discriminator, the cooperative discriminator, and the dynamic response regime using a three-party optimization until improvement from one iteration to the next has fallen below a predetermined threshold; generating a dynamic response regime using the trained model and environment information; and responding to changing environment conditions in accordance with the dynamic response regime.
Reference claim 1: A method for responding to changing conditions, comprising: training a model, using a processor, including trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome, by using an adversarial discriminator to train the model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and by using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome, and including iteratively training the adversarial discriminator, the cooperative discriminator, and the dynamic response regime using a three-party optimization; generating a dynamic response regime using the trained model and environment information; and responding to changing environment conditions in accordance with the dynamic response regime.

Instant claim 2: The method of claim 1, wherein the historical trajectories include patient treatment trajectories.
Reference claim 2: The method of claim 1, wherein the historical trajectories that resulted in a positive outcome and the historical trajectories that resulted in a negative outcome include patient treatment trajectories.

Instant claim 3: The method of claim 2, wherein the positive outcomes are positive patient health outcomes, and the negative outcomes are negative patient health outcomes.
Reference claim 3: The method of claim 2, wherein the positive outcomes are positive patient health outcomes, and the negative outcomes are negative patient health outcomes.

Instant claim 4: The method of claim 2, wherein the environment information and the environment conditions reflect information about a patient being treated.
Reference claim 4: The method of claim 2, wherein the environment information and the environment conditions reflect information about a patient being treated.

Instant claim 5: The method of claim 1, wherein the adversarial discriminator, the cooperative discriminator, and the dynamic response regime are implemented as multiple-layer perceptrons.
Reference claim 5: The method of claim 1, wherein the adversarial discriminator, the cooperative discriminator, and the dynamic response regime are implemented as multiple-layer perceptrons.

Instant claim 6: The method of claim 1, wherein training the model comprises training an environment model that encodes environment information as a vector in a latent space.
Reference claim 6: The method of claim 1, wherein training the model comprises training an environment model that encodes environment information as a vector in a latent space.

Instant claim 7: The method of claim 1, wherein the model is implemented as a variational auto-encoder network.
Reference claim 7: The method of claim 6, wherein the model is implemented as a variational auto-encoder network.

Instant claim 8: The method of claim 1, wherein responding to changing environment conditions comprises automatically performing a responsive action to correct a negative condition.
Reference claim 8: The method of claim 1, wherein responding to changing environment conditions comprises automatically performing a responsive action to correct a negative condition.

Instant claim 9: A system for responding to changing conditions, comprising: a machine learning model, configured to generate a dynamic response regime for using environment information; a model trainer, configured to train the machine learning model, including trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome, by using an adversarial discriminator to train the machine learning model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and by using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome, and to iteratively train the adversarial discriminator, the cooperative discriminator, and the dynamic response regime using a three-party optimization until improvement from one iteration to the next has fallen below a predetermined threshold; and a response interface, configured to trigger a response to changing environment conditions in accordance with the dynamic response regime.
Reference claim 9: A system for responding to changing conditions, comprising: a machine learning model, configured to generate a dynamic response regime for using environment information; a model trainer, configured to train the machine learning model, including trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome, by using an adversarial discriminator to train the machine learning model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and by using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome, and to iteratively train the adversarial discriminator, the cooperative discriminator, and the dynamic response regime using a three-party optimization; and a response interface, configured to trigger a response to changing environment conditions in accordance with the dynamic response regime.

Instant claim 10: The system of claim 9, wherein the historical trajectories that resulted in a positive outcome and the historical trajectories that resulted in a negative outcome include patient treatment trajectories.
Reference claim 10: The system of claim 9, wherein the historical trajectories that resulted in a positive outcome and the historical trajectories that resulted in a negative outcome include patient treatment trajectories.

Instant claim 11: The system of claim 10, wherein the positive outcomes are positive patient health outcomes, and the negative outcomes are negative patient health outcomes.
Reference claim 11: The system of claim 10, wherein the positive outcomes are positive patient health outcomes, and the negative outcomes are negative patient health outcomes.

Instant claim 12: The system of claim 9, wherein the environment information and the environment conditions reflect information about a patient being treated.
Reference claim 12: The system of claim 9, wherein the environment information and the environment conditions reflect information about a patient being treated.

Instant claim 13: The system of claim 9, wherein the model trainer is further configured to iteratively train the adversarial discriminator, the cooperative discriminator, and the dynamic response regime using a three-party optimization. (No corresponding reference claim.)

Instant claim 14: The system of claim 9, wherein the adversarial discriminator, the cooperative discriminator, and the dynamic response regime are implemented as multiple-layer perceptrons in the machine learning model.
Reference claim 13: The system of claim 9, wherein the adversarial discriminator, the cooperative discriminator, and the dynamic response regime are implemented as multiple-layer perceptrons in the machine learning model.

Instant claim 15: The system of claim 9, wherein the model trainer is further configured to train an environment model that encodes the environment information as a vector in a latent space.
Reference claim 14: The system of claim 9, wherein the model trainer is further configured to train an environmen