Last updated: May 29, 2026
Application No. 18/083,490
REINFORCEMENT LEARNING BASED CLOSED-LOOP NEUROMODULATION SYSTEM

Non-Final OA §103 §112
Filed: Dec 17, 2022
Priority: Dec 17, 2021 (provisional 63/290,993)
Examiner: HUSSAINI, ATTIYA SAYYADA
Art Unit: 3792
Tech Center: 3700 (Mechanical Engineering & Manufacturing)
Assignee: Purdue Research Foundation
OA Round: 3 (Non-Final)
Interview: Optional

Interview lift (+14.5%) is below the 15.0% threshold; a written response is recommended.
Based on 35 resolved cases, 2023–2026
Examiner Intelligence

HUSSAINI, ATTIYA SAYYADA
Career allowance rate: 57% of resolved cases (20 granted / 35 resolved; -12.9% vs. TC avg)
Interview lift: +14.5% (moderate) for resolved cases with an interview
Typical timeline: 3y 2m average prosecution; 24 currently pending
Statute-Specific Performance

§101: 0.7% (-39.3% vs. TC avg)
§103: 91.4% (+51.4% vs. TC avg)
§102: 2.2% (-37.8% vs. TC avg)
§112: 5.0% (-35.0% vs. TC avg)
Tech Center averages are estimates. Based on career data from 35 resolved cases.
Office Action

§103 §112
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 07 January 2026 has been entered.

Response to Amendment
	This Office Action is responsive to the RCE filed 07 January 2026. As directed by the amendments, claims 1, 7, and 17-18 have been amended and no claims have been cancelled or added. Thus, claims 1-20 are presently pending and under consideration.

Response to Arguments
Response to Arguments Regarding 35 USC § 102/103
	Applicant has amended independent claim 1 to recite the limitation of “wherein the reward function is dynamically updated based on patient-specific neural biomarkers…wherein the processor is configured to implement a patient-specific stimulation policy using an actor-critic reinforcement learning architecture”, independent claim 17 to recite the limitation of “an actor-critic reinforcement learning algorithm to derive a stimulation policy, wherein the actor generates stimulation instructions and the critic evaluates neural state transitions…the policy is refined based on mean-square error between a target and observed neural response, updated continuously during operation”, and independent claim 18 to recite the limitation “the reinforcement learning algorithm that models neural dynamics and refines stimulation policies based on real-time reward signals…wherein the reward function is dynamically updated based on patient-specific neural biomarkers” (emphasis added). 
Independent claim 1
Applicant has amended independent claim 1 to recite the limitation of “wherein the reward function is dynamically updated based on patient-specific neural biomarkers…wherein the processor is configured to implement a patient-specific stimulation policy using an actor-critic reinforcement learning architecture”, and further argues that this limitation is not taught or suggested by DiLorenzo, Rao, or Milosevic (see pg. 8 of Remarks). Examiner agrees and has instead used Paydarfar et al. (US 2022/0143412 A1, previously cited) to teach the recited limitations, as described in detail below. 
Additionally, in response to applicant's argument that the examiner's conclusion of obviousness is based upon improper hindsight reasoning (see pg. 8 of Remarks), it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning.  But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper.  See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).
Therefore, claims 1-6 and 8-16 are rejected under 35 USC 103, as shown in detail below. 
Independent claim 18
Applicant has amended independent claim 18 to recite the limitation “the reinforcement learning algorithm that models neural dynamics and refines stimulation policies based on real-time reward signals…wherein the reward function is dynamically updated based on patient-specific neural biomarkers” and further argues that this limitation is not taught or suggested (see pg. 9 of Remarks). Examiner agrees and has used Rao to teach “the reinforcement learning algorithm that models neural dynamics and refines stimulation policies based on real-time reward signals” and Paydarfar to teach the limitation of “the reward function is dynamically updated based on patient-specific neural biomarkers”. 
Applicant additionally argues that Hulvershorn teaches mapping but not reinforcement learning-based corrective stimulation or personalized reward functions. However, Examiner notes that Hulvershorn has not been relied upon to teach reinforcement learning-based corrective stimulation or personalized reward functions; rather, Rao and Paydarfar have been used for those limitations.  
In response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art.  See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007).  In this case, Applicant argues there is no motivation to combine with respect to Hulvershorn, which has been used to teach the limitations of dependent claims 19 and 20. Examiner does not necessarily agree, as Hulvershorn, the other cited prior art references, and the instant application are all directed to neural stimulation and its adjustment. One would be motivated to combine them because Hulvershorn teaches that this mapping can be used as an indicator of increased brain normalcy and, in some cases, favorable recovery ([0065]), and that augmentation in response to the mapping improves or reestablishes therapeutic efficacy ([0051]).
	Additionally, in response to the argument that DiLorenzo emphasizes threshold-based control and teaches away from complex reinforcement learning architectures (see pg. 9 of Remarks), Examiner notes that “the nature of the teachings is highly relevant and must be weighed in substance. A known or obvious composition does not become patentable simply because it has been described as somewhat inferior to some other product for the same use.” In re Gurley, 27 F.3d 551, 553, 31 USPQ2d 1130, 1132 (Fed. Cir. 1994). Further, “the prior art’s mere disclosure of more than one alternative does not constitute a teaching away from any of these alternatives because such disclosure does not criticize, discredit, or otherwise discourage the solution claimed.” In re Fulton, 391 F.3d 1195, 1201, 73 USPQ2d 1141, 1146 (Fed. Cir. 2004). See also UCB, Inc. v. Actavis Labs. UT, Inc., 65 F.4th 679, 692, 2023 USPQ2d 448 (Fed. Cir. 2023) ("a reference does not teach away if it merely expresses a general preference for an alternative invention but does not criticize, discredit or otherwise discourage investigation into the invention claimed.") (internal quotations omitted) (quoting DePuy Spine, Inc. v. Medtronic Sofamor Danek, Inc., 567 F.3d 1314, 1327 (Fed. Cir. 2009)); and Schwendimann v. Neenah, Inc., 82 F.4th 1371, 1381, 2023 USPQ2d 1173 (Fed. Cir. 2023) (see MPEP 2145(X)(D)(1) and MPEP 2143.01(I)). In this case, although DiLorenzo focuses on threshold-based control, it does not criticize, discredit, or otherwise discourage alternative methods, specifically reinforcement learning architectures. Additionally, one skilled in the art would recognize that reinforcement learning architectures have advantages in closed-loop stimulation applications such as DiLorenzo’s. They are more robust to external noise in systems, are highly adaptable because they constantly explore and learn from their environments, and can be personalized to a patient (Paydarfar [0140]).
Therefore, claims 18-20 are rejected under 35 USC 103, as described in detail below. 
No additional specific arguments were presented with previous 35 U.S.C. 103 rejections of dependent claim 5, nor specifically with respect to the previously cited Chugh reference. 
Therefore, claim 5 remains rejected as described below under 35 U.S.C. 103. 

Claim Objections
Claim 5 is objected to because of the following informalities: 
Missing punctuation at the end of the RMSE function. Claim 5 should read “wherein the mean-square error loss is provided as a function of:

$$\mathrm{RMSE} = \frac{1}{n}\sum_{i}^{n}\left(x_{\mathrm{target}} - x_{\mathrm{observed}}\right)^{2}.$$”

  Appropriate correction is required.
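For illustration only, the expression proposed for claim 5 can be evaluated numerically. The Python sketch below is not part of the claim or the objection. The function names `claim5_expression` and `rmse` and the sample values are hypothetical.

As written, the expression has no square root, so it evaluates the mean-square error. The conventional root-mean-square error is its square root, which the sketch computes separately for comparison.

```python
import math
from typing import Sequence


def claim5_expression(x_target: Sequence[float], x_observed: Sequence[float]) -> float:
    """(1/n) * sum over i of (x_target[i] - x_observed[i]) ** 2, as written in claim 5.

    With no square root, this is the mean-square error (MSE).
    """
    if len(x_target) != len(x_observed) or not x_target:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x_target)
    return sum((t - o) ** 2 for t, o in zip(x_target, x_observed)) / n


def rmse(x_target: Sequence[float], x_observed: Sequence[float]) -> float:
    """Conventional root-mean-square error: the square root of the MSE above."""
    return math.sqrt(claim5_expression(x_target, x_observed))


# Hypothetical target and observed neural-response samples.
target = [1.0, 2.0, 3.0]
observed = [1.0, 1.0, 1.0]
print(claim5_expression(target, observed))  # (0 + 1 + 4) / 3
print(rmse(target, observed))
```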

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 14, and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claims 1 and 14, it is not clear what is meant by “abhorrent neural activity”. Abhorrent is defined as “causing or deserving strong dislike or hatred”; it is not clear what neural activity would be classified as abhorrent. Examiner believes applicant may have meant “aberrant”, which is defined as “abnormal”. For examination purposes, Examiner will interpret the claim limitation to mean neural activity that is abnormal. 
Claim 17 recites the limitation "the policy" in lines 10 and 11.  There is insufficient antecedent basis for this limitation in the claim. Applicant is asked to amend lines 10 and 11 to recite “the stimulation policy”, as recited previously in lines 7-8 of claim 17.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-4, 6, 8-15, and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over DiLorenzo (US Patent 7,231,254 B2, previously cited), hereinafter DiLorenzo in view of Rao (US 2023/0137595 A1, previously cited), hereinafter Rao in view of Milosevic et al. (US 2022/0152396 A1, previously cited), hereinafter Milosevic, further in view of Paydarfar et al. (US 2022/0143412 A1, previously cited), hereinafter Paydarfar.
Regarding claim 1, DiLorenzo discloses a neuromodulation system configured to stimulate and control a nervous system (Abstract: “A neurological control system for modulating activity of any component or structure comprising the entirety or portion of the nervous system”), comprising: 
a sensor that is configured to monitor the nervous system (Column 4, lines 22-27: “at least one sensor, each constructed and arranged to sense at least one parameter, including but not limited to physiologic values and neural signals, which is indicative of at least one of disease state, magnitude of symptoms, and response to therapy”); 
a recording amplifier that is electrically coupled to the sensor (Figure 2: 57-63 amplifier, Column 18, lines 52-54: “amplifiers 57-63 may be affixed to or situated proximate to their associated electrode arrays 38, 50-54”), the recording amplifier configured to read and process stimuli detected by the sensor (Column 18, lines 34-43: “signal conditioning circuit 76 includes an EMG amplifier 59 and filter 66, each constructed and arranged to amplify and filter, respectively, the EMG signals received from EMG electrode array 50. Similarly, signal conditioning circuit 76 also includes an EEG amplifier 60 and filter 67, accelerometer (ACC) amplifier 61 and filter 68, acoustic (ACO) amplifier 62 and filter 69, peripheral nerve electrode (PNE) amplifier 63 and filter 70 and intracranial (IC) recording electrode (ICRE) amplifier 58 and filter 65.”), and output a signal (Column 19, lines 6-7: “generate conditioned sensed signals 84, 83 and 78-82, respectively”); 
a processor communicatively coupled to the recording amplifier (Column 19, lines 8-10: “Signal processor 71 processes the conditioned sensed neural response signals 78-84 generated by signal conditioning circuit 76”, Figure 2: signal processor 71), the processor executing steps to monitor the signal provided by the recording amplifier and output an instruction (Column 19, lines 21-25: “Signal processor 71 extracts relevant information from the sensed condition signals, and control circuit 72 uses this extracted information in the calculation of an output neuromodulation signal (NMS) 998.”); and 
a stimulator communicatively coupled to the processor (Figure 2, Column 21, lines 36-28: “Output stage circuit 77 includes a pulse generator 73, an output amplifier 74 and a multiplexor 75. Pulse generator 73 generates one or more stimulus waveforms”, intracranial (IC) stimulating electrode array 37), the stimulator is configured to provide a non-binary stimulation based on the instruction provided by the processor (Column 19, lines 25-27: “Neuromodulation signal 998 subsequently travels along stimulator output path 111 to IC stimulating electrode array 37.”, Column 21, lines 45-52: “As noted, the stimulus waveforms comprising the neuromodulation signal (NMS) generated by output stage circuit 77 are applied to patient through intracranial (IC) stimulating electrode array 37. Pulse generator 73 generates a single waveform when single channel stimulation is to be used, and a plurality of waveforms when multiple channel stimulation is to be used. It may generate monophasic or biphasic waveforms.”); 
wherein the processor is a closed loop system (see Figure 2, Column 52, line 8: “closed-loop system”), the processor continuously measures and searches for an abhorrent neural activity (Column 16, line 56 – Column 17, line 6: “tremor are quantified and monitored by any sensors over time as indicators of disease state…Changes in these and other parameters are compared to current levels of, and changes in, treatment parameters. These changes are then used by aggregate disease state estimator 195 to estimate the response to therapy as functions of various electrical stimulation treatment parameters. Electrical stimulation treatment parameters are adjusted by control circuit 72 in real-time to provide optimal control of disease state”, Column 4, lines 44-50: “signal processing means for processing said conditioned sensed neural response signals to determine neural system states, including but not limited to a single or plurality of physiologic states and a single or plurality of disease states; and controller means for adjusting neural modulation signal in response to sensed neural response to signal.”, Column 23, lines 49-50: “periodic sampling of neural activity in tissue being stimulated”), and autonomously delivers the instruction to the stimulator to apply the non-binary stimulation when an abhorrent neural activity is detected (Column 4, lines 60-63: “The disease state is monitored as treatment parameters are automatically varied, and the local or absolute minimum in disease state is achieved as the optimal set of stimulation parameters is converged upon”, Column 3, lines 53-57: “neurological control system generates neural modulation signals delivered to a nervous system component through one or more intracranial (IC) stimulating electrodes in accordance with treatment parameters”, Column 74, lines 46-50: “present invention can continuously monitor and maintain a desired level of therapy, controlling desired neural states to remain within stable regions and out of regions in which neurological signs and symptoms may develop”).
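DiLorenzo describes a closed loop that monitors a disease-state estimate while treatment parameters are automatically varied until an optimal set of parameters "is converged upon" (Column 4, lines 60-63). That loop can be pictured with a minimal hill-climbing sketch in Python.

This is an illustrative assumption, not DiLorenzo's control law:

- a single stimulation amplitude is tuned;
- `measure_disease_state` is a hypothetical stand-in for the sensed and estimated disease state;
- the `lower` and `upper` bounds are placeholder safety limits.

```python
from typing import Callable


def closed_loop_tune(
    measure_disease_state: Callable[[float], float],
    amplitude: float,
    step: float = 0.5,
    lower: float = 0.0,
    upper: float = 5.0,
    min_step: float = 0.01,
    max_iters: int = 200,
) -> float:
    """Hypothetical closed loop: perturb the amplitude, keep changes that lower the
    measured disease state, and halve the step when neither direction helps."""
    current = measure_disease_state(amplitude)
    for _ in range(max_iters):
        if step < min_step:
            break  # converged: further changes are below the resolution of interest
        improved = False
        for raw in (amplitude + step, amplitude - step):
            candidate = min(max(raw, lower), upper)  # placeholder safety bounds
            value = measure_disease_state(candidate)
            if value < current:
                amplitude, current = candidate, value
                improved = True
                break
        if not improved:
            step /= 2
    return amplitude


# Toy disease-state model (an assumption) with its minimum at an amplitude of 2.0.
print(closed_loop_tune(lambda a: (a - 2.0) ** 2, amplitude=0.0))  # 2.0
```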
DiLorenzo fails to disclose the processor executing steps to implement a reinforcement learning algorithm to determine an optimal stimulation strategy based on a reward function, and output an instruction based on a policy derived from the reinforcement learning algorithm, the policy is refined based on reward signals derived from mean-square error between target and observed neural response, wherein the reward function is dynamically updated based on patient-specific neural biomarkers, wherein the processor is configured to implement a patient-specific stimulation policy using an actor-critic reinforcement learning architecture. 
However, Rao teaches a reinforcement learning algorithm to determine an optimal stimulation strategy based on a reward function, and output an instruction based on a policy derived from the reinforcement learning algorithm ([0018] “the present technology uses model-based or model-free reinforcement learning within a co-processor to learn a mapping ("policy") from input recordings to output stimulation patterns in a manner that optimizes a reward or cost function to achieve a desired outcome in augmentation or restoration of neural function… the present technology uses model-based planning to plan a sequence of stimulation patterns and select the best next stimulation pattern(s) for optimizing the reward/cost function or reaching goal states…Such technologies include, but are not limited to, electrical, optical, magnetic, and ultrasound-based recording and stimulation methods.”, Figure 2), the policy is refined based on reward signals derived from an error between target and observed neural response ([0015] “In U.S. Patent No. 11,083,895…a method is described…The induced behavioral output to the predicted behavioral output is compared to generate an error signal. Parameters of the first artificial network can be adjusted using the error signal and the second artificial network to optimize the stimulation patterns and other output signals to achieve restoration and/or augmentation goals.”). Examiner also notes that US Patent 11,083,895 was published on 08/10/2021, which is before the effective filing date of the claimed invention. 
It would have been prima facie obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified DiLorenzo to incorporate the teachings of Rao to use a reinforcement learning algorithm to determine an optimal stimulation strategy based on a reward function and to output an instruction based on a policy derived from the reinforcement learning algorithm, as these prior art references are directed to optimizing stimulation applied to the brain. One would be motivated to do this because the method trained by reinforcement learning can process the neural inputs and transform each input to an optical output stimulation pattern intended to maximize total future expected reward, and these stimulation patterns can cause a desired response such as movement or speech, a sensory percept, or even abstract thoughts, memories, or feelings, as recognized by Rao ([0068]). 
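Rao [0018] describes model-based planning that selects "the best next stimulation pattern(s)" for optimizing a reward or cost function. A minimal, single-step version of that idea is sketched below in Python. It illustrates the idea under assumptions and is not Rao's implementation:

- `forward_model` is a hypothetical learned predictor of the next neural state;
- the candidate patterns are a fixed list;
- the reward is the negative squared distance to a hypothetical target state.

Rao also describes planning over a sequence of patterns, which this greedy one-step version does not attempt.

```python
from typing import Callable, List, Sequence, Tuple

Pattern = Tuple[float, ...]


def plan_next_pattern(
    state: Sequence[float],
    candidates: Sequence[Pattern],
    forward_model: Callable[[Sequence[float], Pattern], List[float]],
    reward: Callable[[Sequence[float]], float],
) -> Pattern:
    """Greedy one-step planning: predict the next neural state for each candidate
    stimulation pattern and return the pattern whose prediction scores highest."""
    return max(candidates, key=lambda pattern: reward(forward_model(state, pattern)))


# Hypothetical toy model: each channel's activity shifts by the applied stimulus.
def toy_forward_model(state: Sequence[float], pattern: Pattern) -> List[float]:
    return [s + p for s, p in zip(state, pattern)]


TARGET = [1.0, 1.0]  # hypothetical desired neural state


def negative_squared_error(predicted: Sequence[float]) -> float:
    return -sum((p - t) ** 2 for p, t in zip(predicted, TARGET))


best = plan_next_pattern(
    state=[0.0, 0.5],
    candidates=[(0.0, 0.0), (1.0, 0.5), (2.0, 2.0)],
    forward_model=toy_forward_model,
    reward=negative_squared_error,
)
print(best)  # (1.0, 0.5)
```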
DiLorenzo and Rao, alone or in combination, fail to teach that the policy is refined based on reward signals derived from mean-square error between a target and an observed neural response. 
However, Milosevic teaches a system and method for delivering deep brain stimulation of a target structure wherein “the device 101 is being operated in closed loop mode, then the method 200 proceeds to step 220 where it is determined whether the measured neuronal output is similar to a desired neuronal output indicating that effective treatment is being performed… The similarity between the measured neuronal output and the desired neuronal output may be determined by obtaining an error signal from the difference between the measured neuronal output and the desired neuronal output. A measure of the error signal, such as the mean square average error may then be compared to a threshold and if it is larger than the threshold then one or more of the stimulus parameters may be adjusted at step 222 so that the next neuronal response that is generated in response to the next DBS stimulus that is generated and applied at step 212.” ([0116]). 
It would have been prima facie obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified DiLorenzo and Rao to incorporate the teachings of Milosevic so that the policy is refined based on reward signals derived from mean-square error between a target and an observed neural response, as these prior art references and the instant application are directed to stimulating the brain. One would be motivated to do this because the mean-square error is an accurate measure of the difference between the desired neuronal output and the measured output, which can be used to adjust the stimulation to be more effective, as recognized by Milosevic ([0116]). 
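Milosevic [0116] describes a closed-loop check in three steps: form an error signal between measured and desired neuronal output, take its mean square, and adjust stimulus parameters when it exceeds a threshold. The Python sketch below illustrates that check under stated assumptions. Milosevic does not specify how the parameters are adjusted, so the proportional update on a single amplitude (`gain`) is a hypothetical choice, as are all names and values.

```python
from typing import Sequence


def mean_square_error(measured: Sequence[float], desired: Sequence[float]) -> float:
    """Mean square of the error signal between measured and desired neuronal output."""
    if len(measured) != len(desired) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((m - d) ** 2 for m, d in zip(measured, desired)) / len(measured)


def next_amplitude(
    measured: Sequence[float],
    desired: Sequence[float],
    amplitude: float,
    threshold: float,
    gain: float,
) -> float:
    """Return the amplitude for the next stimulus in a threshold-gated closed loop."""
    if mean_square_error(measured, desired) <= threshold:
        return amplitude  # response is close enough to the desired output
    # Hypothetical adjustment rule: step in proportion to the mean signed error.
    mean_error = sum(d - m for m, d in zip(measured, desired)) / len(measured)
    return amplitude + gain * mean_error


print(next_amplitude([0.5, 0.5], [1.0, 1.0], amplitude=2.0, threshold=0.1, gain=1.0))  # 2.5
```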
DiLorenzo, Rao, and Milosevic, alone or in combination, fail to teach that the reward function is dynamically updated based on patient-specific neural biomarkers, or that the processor is configured to implement a patient-specific stimulation policy using an actor-critic reinforcement learning architecture. 
However, Paydarfar teaches methods and systems for phase-agnostic stimuli including waveforms generated via a programmable arbitrary waveform generator ([0008]) which generates stimulation signals based on a reinforcement learning algorithm to determine an optimal stimulation strategy based on a reward function and output an instruction based on a policy derived from the reinforcement learning algorithm ([0017] “In particular embodiments, the adjusting comprises applying a reinforcement learning process where a reward is based on the feedback to adjust the electrical stimulation.”, [0022] “adjusting the electrical stimulus based on the recorded responses comprises searching for a waveform optimization using reinforcement learning”, [0024] “the second stimulation signal is generated using a reinforcement learning algorithm.”), wherein the reward function is dynamically updated based on patient-specific neural biomarkers ([0139] “Based on the response of the environment (how close did the stimulus get to generating an action potential, while also considering the energy of the stimulus), the environment returns a reward to the agent, as well as information about its new state as a result of the application of the stimulus. The agent then uses this reward and new state information to inform its next action.”, [0140] “Clinicians and researchers can “personalize” the reinforcement learning agent to account for characteristics seen in individual patients. The idea of personalization is possible in reinforcement learning because of the adaptability discussed earlier. Because of their ability to adapt to changing environments, reinforcement learning agents that are trained on a general model can easily be applied to specific cases after sufficient training. This is a major advantage over current systems, which are more rigid. 
Clinicians must personalize stimulation to patients by working with the patient and understanding the important characteristics that affect stimulus parameters. This personalized reinforcement learning agent could be implemented with a two part training process: (1) train agent initially on computational models to familiarize it with general system dynamics and (2) improve the agent performance on a specific patient based on personal characteristics related to the condition of that patient.”, [0142], emphasis added), wherein the processor is configured to implement a patient-specific stimulation policy using an actor-critic reinforcement learning architecture ([0017]: “In certain embodiments the deep reinforcement learning algorithm is a deep deterministic policy gradients (DDPG) algorithm comprising an actor network and a critic network”). 
It would have been prima facie obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified DiLorenzo, Rao, and Milosevic to incorporate the teachings of Paydarfar such that the reward function is dynamically updated based on patient-specific neural biomarkers and the processor is configured to implement a patient-specific stimulation policy using an actor-critic reinforcement learning architecture, as these prior art references are directed to delivering stimulation to the brain with learning algorithms. One would be motivated to do this because this type of model is robust to external noise in the system and is adaptable enough to personalize the agent for an individual patient, as recognized by Paydarfar ([0140]). 
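Paydarfar [0017] names DDPG, which uses separate actor and critic neural networks. The Python sketch below is a much smaller illustration of the actor-critic idea; it is neither DDPG nor Paydarfar's implementation. It pairs two components:

- an actor: a Gaussian policy over one stimulation amplitude;
- a critic: a scalar estimate of expected reward, whose error signal scales the actor's policy-gradient step.

The reward loosely follows Paydarfar [0139]: it scores how close the response gets to a target and penalizes stimulus energy. The following are all assumptions:

- the toy plant (response equals amplitude);
- the `target` parameter, standing in for a patient-specific biomarker target;
- every constant.

```python
import random


def reward(amplitude: float, target: float = 1.0, energy_weight: float = 0.1) -> float:
    """Hypothetical reward: distance of a toy neural response from a patient-specific
    target, plus a penalty on stimulus energy."""
    response = amplitude  # toy plant (assumption): response equals amplitude
    return -((response - target) ** 2) - energy_weight * amplitude ** 2


def train_actor_critic(
    target: float = 1.0,
    steps: int = 4000,
    sigma: float = 0.2,
    actor_lr: float = 0.005,
    critic_lr: float = 0.05,
    seed: int = 0,
) -> float:
    """One-step actor-critic on a single-state task; returns the learned mean amplitude."""
    rng = random.Random(seed)
    mu = 0.0     # actor: mean of a Gaussian policy over stimulus amplitude
    value = 0.0  # critic: running estimate of expected reward
    for _ in range(steps):
        amplitude = rng.gauss(mu, sigma)      # actor samples a stimulus
        r = reward(amplitude, target)
        td_error = r - value                  # one-step episode: no bootstrapped next state
        value += critic_lr * td_error         # critic update
        # Actor update: gradient of log N(amplitude; mu, sigma) w.r.t. mu, scaled by td_error.
        mu += actor_lr * td_error * (amplitude - mu) / sigma ** 2
    return mu


# With the default energy_weight, the best mean amplitude is target / 1.1 (about 0.909 here).
print(round(train_actor_critic(), 2))
```

Passing a different `target` retargets the toy reward, a loose stand-in for updating the reward from patient-specific measurements.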
Regarding claim 2, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 1 (as shown above). DiLorenzo further discloses wherein the non-binary stimulation includes variable stimulation parameters having three or more states (Column 5, lines 3-9: “This optimization includes selection of electrode polarities, electrode configurations stimulating parameter waveforms, temporal profile of stimulation magnitude, stimulation duty cycles, baseline stimulation magnitude, intermittent stimulation magnitude and timing, and other stimulation parameters.”, Column 15, lines 43-47: “modification of actual stimulation parameters and allowable ranges thereof, including but not limited to pulse width, pulse amplitude, interpulse interval, pulse frequency, number of pulses per burst frequency.”, Column 21, lines 49-52: “Pulse generator 73 generates a single waveform when single channel stimulation is to be used, and a plurality of waveforms when multiple channel stimulation is to be used. It may generate monophasic or biphasic waveforms.”).  
Regarding claim 3, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 2 (as shown above). DiLorenzo further discloses wherein the variable stimulation parameters include at least one of a stimulation amplitude, a number of pulse stimuli, and a duration of stimuli (Column 5, lines 3-9: “This optimization includes selection of electrode polarities, electrode configurations stimulating parameter waveforms, temporal profile of stimulation magnitude, stimulation duty cycles, baseline stimulation magnitude, intermittent stimulation magnitude and timing, and other stimulation parameters.”, Column 15, lines 43-47: “modification of actual stimulation parameters and allowable ranges thereof, including but not limited to pulse width, pulse amplitude, interpulse interval, pulse frequency, number of pulses per burst frequency.”).  
Regarding claim 4, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 1 (as shown above). DiLorenzo further teaches wherein the sensor is a non-invasive device (Figure 47, Column 7, lines 37-38: “a set of noninvasive…sensors and neuromodulators in a human patient”).  
Regarding claim 6, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 1 (as shown above). DiLorenzo further discloses wherein the neuromodulation system is provided as a single device that is configured to be one of partially and completely implantable subcutaneously (Column 14, lines 21-43: “The neurological control system 999 includes one or more implantable components 249 including a plurality of sensors… patient interface module 55 and supervisory module 56 remain external to the body of the patient”, Figure 2).  
Regarding claim 8, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 1 (as shown above). DiLorenzo further discloses wherein the processor outputs a quantified metric of an environmental response from the signal (Column 62, lines 10-13: “These measures of neural chaos and the correlation between neural chaos measurements may be alternatively calculated within signal processor 71 as a neural state or disease state estimate”, Claim 1: “signal processor performs a neural state estimation”, Column 4, lines 57-67: “By sensing and quantifying the magnitude and frequency of tremor activity in the patient, a quantitative representation of the level or "state" of the disease is determined. The disease state is monitored as treatment parameters are automatically varied… The disease state may be represented as a single value or a vector or matrix of values; in the latter two cases, a multi variable optimization algorithm is employed with appropriate weighting factors.”).
Regarding claim 9, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 8 (as shown above). DiLorenzo further discloses wherein the quantified metrics include statistics of at least one of overstimulation and aberrant stimulation (Column 16, line 63 – Column 17, line 3: “the sensed tremor characteristics include, but are not limited to, magnitude, frequency, duration and frequency of occurrence of tremors. Changes in these and other parameters are compared to current levels of, and changes in, treatment parameters. These changes are then used by aggregate disease state estimator 195 to estimate the response to therapy as functions of various electrical stimulation treatment parameters”, Column 19, lines 49-54: “control law error history information…,battery voltage history information, and power consumption history information”, Column 41, lines 61-65: “(1) undertreatment, i.e. tremor amplitude exceeds desirable level or (2) overtreatment or excess stimulation, in which more electrical energy is delivered than is actually needed. In the overtreatment case, battery life is unnecessarily reduced.”).
Regarding claim 10, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 8 (as shown above). DiLorenzo further discloses wherein the quantified metrics include a record of parameters measured by the sensor and/or the recording amplifier (Column 20, lines 21-34: “Patient information module 55 queries signal processor 71 for present and time histories of monitored values. Time histories of selected variables in signal processor 71 and control circuit 72 are stored in memory module 240 for subsequent retrieval by patient interface module 55 and supervisory module 56. Selected variables include but are not limited to disease state, tremor frequency, tremor magnitude, EMG magnitude, EMG frequency spectra (EMG magnitude within frequency ranges), and acceleration of limb, head, mandible, or torso. Selected variables may also include disease state, frequency spectra of limb, torso, and head movements, as determined by EMG and accelerometer signals.”, Column 15, lines 40-43: “Such monitoring includes observation of time history of disease state, stimulation parameters, response to therapy, and control law parameters, including time-varying adaptive controller parameters”, Column 19, lines 47-55: “Control circuit 72 provides stimulation waveform parameter history information, disease state history information, control law state variable history information, control law error history information, control law input variable history information, control law output variable history information, stimulating electrode impedance history information, sensory input history information, battery voltage history information, and power consumption history information to patient interface module 55 and supervisory module 56.”).  
Regarding claim 11, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 1 (as shown above). DiLorenzo further discloses wherein the processor includes system driven capabilities to enable the neuromodulation system to autonomously select at least one of the strength of the non-binary stimulation (Column 4, lines 55-57: “the present invention is that it performs automated determination of the optimum magnitude of treatment.”), and the desired target of the non-binary stimulation (Column 24, line 67 – Column 25, line 10: “Such reference values include but are not limited to target disease state levels, target symptom levels, including target tremor level, and threshold levels. Threshold levels include but are not limited to disease and symptom levels, including tremor threshold levels. Neural modulation amplitude may be increased when at least one of disease state and symptom level exceed the corresponding threshold. Similarly neural modulation amplitude may be decreased or reduced to zero when either the disease state or symptom level falls below the corresponding threshold”).
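The threshold behavior quoted from DiLorenzo (Column 24, line 67 – Column 25, line 10) can be pictured with the minimal sketch below. It is an illustration only: the function name `adjust_amplitude`, the step size, the amplitude ceiling, and the reading that amplitude is decreased only when both measures are at or below their thresholds are assumptions, not details taken from DiLorenzo.

```python
def adjust_amplitude(amplitude: float, disease_state: float, symptom_level: float,
                     disease_threshold: float, symptom_threshold: float,
                     step: float = 0.1, max_amplitude: float = 5.0) -> float:
    """Hypothetical threshold rule paraphrasing the quoted DiLorenzo passage."""
    if disease_state > disease_threshold or symptom_level > symptom_threshold:
        # At least one measure exceeds its threshold: increase, capped at the max.
        return min(amplitude + step, max_amplitude)
    # Both measures at or below threshold: decrease, never below zero.
    return max(amplitude - step, 0.0)


# Usage: disease state above its threshold, so the amplitude steps up.
print(adjust_amplitude(1.0, disease_state=2.0, symptom_level=0.5,
                       disease_threshold=1.0, symptom_threshold=1.0))
```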
Regarding claim 12, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 1 (as shown above). DiLorenzo further discloses wherein the processor autonomously recalibrates the neuromodulation system (Column 46, lines 1-9: “the present invention is that it performs automated determination of the optimum magnitude of treatment--by sensing and quantifying the magnitude and frequency of tremor activity in the patient, a quantitative representation of the level or "state" of the disease is determined. The disease state is monitored as treatment parameters are automatically varied, and the local or absolute minimum in disease state is achieved as the optimal set of stimulation parameters is converged upon.”, Column 40, lines 57-61: “Control law circuit block 231 has an autocalibration mode in which multivariable sweeps through stimulation parameters and stimulating electrode configurations are performed to automate and expedite parameter and configuration optimization.”).
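The “multivariable sweeps through stimulation parameters and stimulating electrode configurations” quoted from DiLorenzo (Column 40, lines 57-61) can be sketched as an exhaustive grid search that keeps the combination with the lowest disease-state estimate. This is a hypothetical illustration; `disease_state_fn`, the parameter grids, and the toy objective `toy_disease_state` are assumptions, and the quoted passage does not specify a search procedure at this level of detail.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Params = Tuple[float, float, str]


def autocalibrate(disease_state_fn: Callable[[float, float, str], float],
                  amplitudes: Sequence[float],
                  frequencies: Sequence[float],
                  configurations: Sequence[str]) -> Params:
    """Sweep every (amplitude, frequency, configuration) combination and
    return the one with the lowest estimated disease state."""
    candidates = product(amplitudes, frequencies, configurations)
    return min(candidates, key=lambda p: disease_state_fn(*p))


# Usage with a hypothetical disease-state estimate (lower is better).
def toy_disease_state(amp: float, freq: float, config: str) -> float:
    penalty = 0.0 if config == "bipolar" else 1.0
    return (amp - 2.0) ** 2 + (freq - 130.0) ** 2 / 1000.0 + penalty


best = autocalibrate(toy_disease_state, [1.0, 2.0, 3.0],
                     [100.0, 130.0, 160.0], ["monopolar", "bipolar"])
print(best)  # (2.0, 130.0, 'bipolar')
```

An exhaustive sweep is the simplest reading of the quoted language; with many parameters the grid grows multiplicatively, which is one reason adaptive search methods are often preferred in practice.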
Regarding claim 13, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 12 (as shown above). DiLorenzo further discloses wherein the processor continuously recalibrates the neuromodulation system (Column 40, lines 57-67: “an autocalibration mode in which multivariable sweeps through stimulation parameters and stimulating electrode configurations are performed to automate and expedite parameter and configuration optimization…this autocalibration feature permits real-time adjustment and optimization of stimulation parameters and electrode configuration”, Column 28, lines 10-13: “adapted in real-time by an algorithm which sweeps the threshold through a range of values to search for values at which action potential spikes are consistently recorded”, Column 74, lines 46-50: “present invention can continuously monitor and maintain a desired level of therapy, controlling desired neural states to remain within stable regions and out of regions in which neurological signs and symptoms may develop”).  
Regarding claim 14, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 1 (as shown above). DiLorenzo further discloses wherein the non-binary stimulation is applied in real time as the abhorrent neural activity is detected (Column 16, line 56 – Column 17, line 6: “tremor are quantified and monitored by any sensors over time as indicators of disease state… Electrical stimulation treatment parameters are adjusted by control circuit 72 in real-time to provide optimal control of disease state”, Column 3, lines 53-57: “neurological control system generates neural modulation signals delivered to a nervous system component through one or more intracranial (IC) stimulating electrodes in accordance with treatment parameters”, Column 23, lines 40-43: “Multiplexor 75 allows delivery of neural modulation signals to neural tissue concurrent with monitoring of activity of same neural tissue; this facilitates real-time monitoring of disease state and response to treatment.”).
Regarding claim 15, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 1 (as shown above). DiLorenzo further discloses wherein at least one of the sensor, the recording amplifier, and the stimulator wirelessly communicate with the processor (Column 11, lines 35-46: “A set of electrical wires provides the means for communication between the intracranial and extracranial components; however, it should be understood that alternate systems and techniques such as radiofrequency links, optical (including infrared) links with transcranial optical windows, magnetic links, and electrical links using the body components as conductors, may be used without departing from the present invention. Specifically, in the illustrative embodiment, connecting cable 8 provides electrical connection between intracranial components 246 and stimulating and recording circuit 26”, Column 11, lines 56-58: “connecting cable 8 provides electrical connection between intracranial electrodes 246 and stimulating and recording circuit 26”).
Regarding claim 16, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of claim 1 (as shown above). DiLorenzo, Rao, and Milosevic, alone or in combination, fail to teach wherein the stimulator includes a plurality of stimulators, and the processor is configured to train responses across the plurality of stimulators to one of a single reward function and a unique reward function across spatially disparate stimulators.
However, Paydarfar teaches apparatus and methods for applying stimuli (Abstract), wherein the stimulus can be a directional deep brain stimulus ([0022]), including applying “a therapeutic treatment to a subject with multiple electrical stimulations, comprising: operatively connecting multiple electrodes to a subject where an electrical stimulation can be applied to a number of electrodes and a response can be received from a number of electrodes; applying an electrical stimulus to a plurality of the electrodes where the applied stimulus comprises a different waveform applied to two or more electrodes; recording a plurality of responses received from a plurality of electrodes responsive to the applied electrical stimulus; adjusting the electrical stimulus based on the recorded responses to resolve a new electrical stimulus comprising a matrix of outputs; and applying the new electrical stimulus to a number of electrodes.” ([0021]), and wherein “the adjustment in output waveform comprises one or more of a change to pulse, amplitude, timing, duration, shape. In certain embodiments the different waveforms applied to two or more electrodes are independent…adjusting the electrical stimulus based on the recorded responses comprises searching for a waveform optimization using reinforcement learning.”
Although Paydarfar does not explicitly recite a single reward function or a unique reward function, it would have been obvious to one skilled in the art to interpret Paydarfar's independent waveforms delivered to two or more electrodes, where the waveform optimization is performed using reinforcement learning that produces reward functions (Figure 8 of PAYDARFAR), as training responses across the plurality of stimulators to one of a single reward function and a unique reward function across spatially disparate stimulators. 
It would have been prima facie obvious for one of ordinary skill in the art to have modified the system of DiLorenzo, Rao, and Milosevic to incorporate the teachings of Paydarfar such that the stimulator includes a plurality of stimulators, and the processor is configured to train responses across the plurality of stimulators to one of a single reward function and a unique reward function across spatially disparate stimulators, as these prior art references are directed to brain stimulation. One would be motivated to do this because it allows current stimulation of target areas to be optimized to produce a desired outcome while limiting the amount of current sent to non-target areas of the brain, and allows different voltage values to be sent to different contacts on the electrode, resulting in an asymmetrical stimulus focused on a specific area within the target system, as recognized by PAYDARFAR ([0130]). 
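To make the distinction between “a single reward function” and “a unique reward function” across spatially disparate stimulators concrete, the following hypothetical sketch computes either one shared reward for all electrodes or one reward per electrode from the target and observed responses at each site. The squared-error form and the function names are assumptions for illustration; neither Paydarfar nor the claim language quoted here specifies them.

```python
from typing import List, Sequence


def shared_reward(targets: Sequence[float], observed: Sequence[float]) -> float:
    """Single reward function: one scalar (negative mean squared error over
    all sites) credited to every stimulator."""
    if len(targets) != len(observed) or not targets:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [(t - o) ** 2 for t, o in zip(targets, observed)]
    return -sum(errors) / len(errors)


def per_site_rewards(targets: Sequence[float],
                     observed: Sequence[float]) -> List[float]:
    """Unique reward functions: each stimulator is scored on its own site."""
    if len(targets) != len(observed):
        raise ValueError("need equal-length sequences")
    return [-(t - o) ** 2 for t, o in zip(targets, observed)]


targets = [1.0, 2.0]   # hypothetical target responses at two sites
observed = [0.5, 1.0]  # hypothetical observed responses
print(shared_reward(targets, observed))     # -0.625
print(per_site_rewards(targets, observed))  # [-0.25, -1.0]
```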

Regarding claim 18, DiLorenzo discloses a method of using a neuromodulation system configured to stimulate and control a nervous system (Abstract: “A neurological control system for modulating activity of any component or structure comprising the entirety or portion of the nervous system”, Column 1, lines 32-36: “The present invention relates generally to neurological disease and, more particularly, to intracranial stimulation for optimal control of movement disorders and other neurological disease.”), the method comprising the steps of: 
providing the neuromodulation system (neurological control system 999) having a sensor (sensory input modalities 247), a recording amplifier (amplifiers 57-63), a processor (Signal processor 71), and a stimulator (Column 21, lines 36-28: “Output stage circuit 77 includes a pulse generator 73, an output amplifier 74 and a multiplexor 75. Pulse generator 73 generates one or more stimulus waveforms”, intracranial (IC) stimulating electrode array 37), the sensor is configured to monitor the nervous system (Column 4, lines 22-27: “at least one sensor, each constructed and arranged to sense at least one parameter, including but not limited to physiologic values and neural signals, which is indicative of at least one of disease state, magnitude of symptoms, and response to therapy”); the recording amplifier is electrically coupled to the sensor (Figure 2: 57-63 amplifier, Column 18, lines 52-54: “amplifiers 57-63 may be affixed to or situated proximate to their associated electrode arrays 38, 50-54”), the recording amplifier is configured to read and process stimuli detected by the sensor (Column 18, lines 34-43: “signal conditioning circuit 76 includes an EMG amplifier 59 and filter 66, each constructed and arranged to amplify and filter, respectively, the EMG signals received from EMG electrode array 50. 
Similarly, signal conditioning circuit 76 also includes an EEG amplifier 60 and filter 67, accelerometer (ACC) amplifier 61 and filter 68, acoustic (ACO) amplifier 62 and filter 69, peripheral nerve electrode (PNE) amplifier 63 and filter 70 and intracranial (IC) recording electrode (ICRE) amplifier 58 and filter 65.”), and output a signal (Column 19, lines 6-7: “generate conditioned sensed signals 84, 83 and 78-82, respectively”); the processor is communicatively coupled to the recording amplifier (Column 19, lines 8-10: “Signal processor 71 processes the conditioned sensed neural response signals 78-84 generated by signal conditioning circuit 76”, Figure 2: signal processor 71), the processor executing steps to monitor the signal provided by the recording amplifier and output an instruction (Column 19, lines 21-25: “Signal processor 71 extracts relevant information from the sensed condition signals, and control circuit 72 uses this extracted information in the calculation of an output neuromodulation signal (NMS) 998.”); and the stimulator is communicatively coupled to the processor (Figure 2, Column 21, lines 36-28: “Output stage circuit 77 includes a pulse generator 73, an output amplifier 74 and a multiplexor 75. Pulse generator 73 generates one or more stimulus waveforms”, intracranial (IC) stimulating electrode array 37), the stimulator is configured to provide a non-binary stimulation based on the instruction provided by the processor (Column 19, lines 25-27: “Neuromodulation signal 998 subsequently travels along stimulator output path 111 to IC stimulating electrode array 37.”, Column 21, lines 45-52: “As noted, the stimulus waveforms comprising the neuromodulation signal (NMS) generated by output stage circuit 77 are applied to patient through intracranial (IC) stimulating electrode array 37. 
Pulse generator 73 generates a single waveform when single channel stimulation is to be used, and a plurality of waveforms when multiple channel stimulation is to be used. It may generate monophasic or biphasic waveforms.”);
monitoring neural stimuli of the nervous system using the sensor (Column 4, lines 22-27: “at least one sensor, each constructed and arranged to sense at least one parameter, including but not limited to physiologic values and neural signals, which is indicative of at least one of disease state, magnitude of symptoms, and response to therapy”, Column 28, lines 33-35: “characterize the behavior of the individual and groups of neurons, the activity of which is sensed by intracranial recording electrode array 38…”); 
measuring the neural stimuli of the nervous system by using the recording amplifier (Column 18, lines 34-43: “signal conditioning circuit 76 includes an EMG amplifier 59 and filter 66, each constructed and arranged to amplify and filter, respectively, the EMG signals received from EMG electrode array 50. Similarly, signal conditioning circuit 76 also includes an EEG amplifier 60 and filter 67, accelerometer (ACC) amplifier 61 and filter 68, acoustic (ACO) amplifier 62 and filter 69, peripheral nerve electrode (PNE) amplifier 63 and filter 70 and intracranial (IC) recording electrode (ICRE) amplifier 58 and filter 65.”, Column 19, lines 6-7: “generate conditioned sensed signals 84, 83 and 78-82, respectively”);
quantifying, via the processor, a neural dynamics of the nervous system (Column 4, lines 57-60: “By sensing and quantifying the magnitude and frequency of tremor activity in the patient, a quantitative representation of the level or "state" of the disease is determined.”, Column 12, lines 2-7: “Sensory input modalities 247 provide information to stimulating and recording unit 26. As will be described in greater detail below, such information is processed by stimulating and recording unit 26 to deduce the disease state and progression and its response to therapy.”); and 
applying the non-binary stimulation to the nervous system (Column 4, lines 44-50: “signal processing means for processing said conditioned sensed neural response signals to determine neural system states, including but not limited to a single or plurality of physiologic states and a single or plurality of disease states; and controller means for adjusting neural modulation signal in response to sensed neural response to signal”, Column 17, lines 4-11: “Electrical stimulation treatment parameters are adjusted by control circuit 72 in real-time to provide optimal control of disease state. Modulation parameters are optimized to achieve at least one of minimization of disease state, minimization of symptoms of disease, minimization of stimulation magnitude, minimization of side effects, and any constant or time-varying weighted combination of these.”).  
DiLorenzo fails to disclose the processor executing steps to implement a reinforcement learning algorithm to determine an optimal stimulation strategy based on a reward function, and output an instruction based on a policy derived from the reinforcement learning algorithm that models neural dynamics and refines stimulation policies based on real-time reward signals, the policy is refined based on reward signals derived from mean-square error between target and observed neural response, wherein the reward function is dynamically updated based on patient-specific neural biomarkers. 
However, Rao teaches the processor executing steps to implement a reinforcement learning algorithm to determine an optimal stimulation strategy based on a reward function, and output an instruction based on a policy derived from the reinforcement learning algorithm that models neural dynamics ([0018] “the present technology uses model-based or model-free reinforcement learning within a co-processor to learn a mapping ("policy") from input recordings to output stimulation patterns in a manner that optimizes a reward or cost function to achieve a desired outcome in augmentation or restoration of neural function… the present technology uses model-based planning to plan a sequence of stimulation patterns and select the best next stimulation pattern(s) for optimizing the reward/cost function or reaching goal states…Such technologies include, but are not limited to, electrical, optical, magnetic, and ultrasound- based recording and stimulation methods.”, Figure 2: 140, [0043] “the method includes training an emulator network to model an appropriate neurological region(s) of the subject using reinforcement learning or other training technique. In some embodiments, supervised learning may be used to train the emulator network. The emulator network may be trained with data including stimulation patterns as inputs and resulting neurological states as outputs. 
The stimulation patterns may be based on stimulations provided to one or more neural regions, and the neurological states may be states measured at one or more neural regions, which may be the same or different than the stimulated regions.”), and refines stimulation policies based on real-time reward signals ([0018] “the present technology uses model-based or model-free reinforcement learning within a co-processor to learn a mapping (“policy”) from input recordings to output stimulation patterns in a manner that optimizes a reward or cost function to achieve a desired outcome in augmentation or restoration of neural function”, [0057]-[0058], Claims 1-2), and wherein the policy is refined based on reward signals derived from an error between target and observed neural response ([0015] “In U.S. Patent No. 11,083,895…a method is described…The induced behavioral output to the predicted behavioral output is compared to generate an error signal. Parameters of the first artificial network can be adjusted using the error signal and the second artificial network to optimize the stimulation patterns and other output signals to achieve restoration and/or augmentation goals.”; the examiner notes that U.S. Patent No. 11,083,895 was published on 08/10/2021, which is before the effective filing date of the claimed invention). 
It would have been prima facie obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified DiLorenzo to incorporate the teachings of Rao to have a reinforcement learning algorithm to determine an optimal stimulation strategy based on a reward function, and output an instruction based on a policy derived from the reinforcement learning algorithm that models neural dynamics and refines stimulation policies based on real-time reward signals, as these prior art references are directed to optimizing stimulation applied to the brain. One would be motivated to do this because the method trained by reinforcement learning can process the neural inputs and transform each input to an optical output stimulation pattern intended to maximize total future expected reward, and these stimulation patterns can cause a desired response such as movement or speech, a sensory percept, or even abstract thoughts, memories, or feelings, as recognized by Rao [0068]. 
DiLorenzo and Rao, alone or in combination, fail to teach that the policy is refined based on reward signals derived from mean-square error between a target and an observed neural response. 
However, Milosevic teaches a system and method for delivering deep brain stimulation of a target structure wherein “the device 101 is being operated in closed loop mode, then the method 200 proceeds to step 220 where it is determined whether the measured neuronal output is similar to a desired neuronal output indicating that effective treatment is being performed… The similarity between the measured neuronal output and the desired neuronal output may be determined by obtaining an error signal from the difference between the measured neuronal output and the desired neuronal output. A measure of the error signal, such as the mean square average error may then be compared to a threshold and if it is larger than the threshold then one or more of the stimulus parameters may be adjusted at step 222 so that the next neuronal response that is generated in response to the next DBS stimulus that is generated and applied at step 212.” ([0116]). 
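The closed-loop check described in Milosevic [0116] (compute an error signal as the difference between the measured and desired neuronal output, take its mean square, and adjust stimulus parameters when it exceeds a threshold) can be sketched as follows. The parameter dictionary, the fixed amplitude step, and the rule for the direction of the adjustment are hypothetical assumptions; the quoted passage does not say how the parameters are adjusted.

```python
from typing import Dict, Sequence, Tuple


def mean_square_error(measured: Sequence[float], desired: Sequence[float]) -> float:
    """Mean of the squared difference between desired and measured samples."""
    if len(measured) != len(desired) or not measured:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((d - m) ** 2 for m, d in zip(measured, desired)) / len(measured)


def closed_loop_step(params: Dict[str, float], measured: Sequence[float],
                     desired: Sequence[float], threshold: float,
                     amplitude_step: float = 0.1) -> Tuple[Dict[str, float], float]:
    """Return (possibly adjusted) stimulus parameters and the MSE.

    Hypothetical rule: if the MSE exceeds the threshold, nudge the amplitude
    up when the response is on average too small, down when it is too large.
    """
    mse = mean_square_error(measured, desired)
    if mse <= threshold:
        return dict(params), mse  # response close enough: keep parameters
    mean_error = sum(d - m for m, d in zip(measured, desired)) / len(measured)
    direction = 1.0 if mean_error > 0 else -1.0
    new_params = dict(params)
    new_params["amplitude"] = max(0.0, params["amplitude"] + direction * amplitude_step)
    return new_params, mse


# Usage: the measured response falls short of the desired one.
new_params, error = closed_loop_step({"amplitude": 1.0}, [0.0, 0.5], [1.0, 1.0],
                                     threshold=0.1)
print(new_params, error)
```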
It would have been prima facie obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified DiLorenzo and Rao to incorporate the teachings of Milosevic such that the policy is refined based on reward signals derived from mean-square error between a target and an observed neural response, as these prior art references and the instant application are directed to stimulating the brain. One would be motivated to do this because this is an accurate measurement to determine a difference between the desired neuronal output and the measured output to adjust the stimulation to be more effective, as recognized by Milosevic ([0116]).
DiLorenzo, Rao, and Milosevic, alone or in combination, fail to teach wherein the reward function is dynamically updated based on patient-specific neural biomarkers. 
However, Paydarfar teaches methods and systems for phase-agnostic stimuli including waveforms generated via a programmable arbitrary waveform generator ([0008]) which generates stimulation signals based on a reinforcement learning algorithm to determine an optimal stimulation strategy based on a reward function and output an instruction based on a policy derived from the reinforcement learning algorithm ([0017] “In particular embodiments, the adjusting comprises applying a reinforcement learning process where a reward is based on the feedback to adjust the electrical stimulation.”, [0022] “adjusting the electrical stimulus based on the recorded responses comprises searching for a waveform optimization using reinforcement learning”, [0024] “the second stimulation signal is generated using a reinforcement learning algorithm.”), wherein the reward function is dynamically updated based on patient-specific neural biomarkers ([0139] “Based on the response of the environment (how close did the stimulus get to generating an action potential, while also considering the energy of the stimulus), the environment returns a reward to the agent, as well as information about its new state as a result of the application of the stimulus. The agent then uses this reward and new state information to inform its next action.”, [0140] “Clinicians and researchers can “personalize” the reinforcement learning agent to account for characteristics seen in individual patients. The idea of personalization is possible in reinforcement learning because of the adaptability discussed earlier. Because of their ability to adapt to changing environments, reinforcement learning agents that are trained on a general model can easily be applied to specific cases after sufficient training. This is a major advantage over current systems, which are more rigid. 
Clinicians must personalize stimulation to patients by working with the patient and understanding the important characteristics that affect stimulus parameters. This personalized reinforcement learning agent could be implemented with a two part training process: (1) train agent initially on computational models to familiarize it with general system dynamics and (2) improve the agent performance on a specific patient based on personal characteristics related to the condition of that patient.”, [0142], emphasis added). 
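Paydarfar's description in [0139] (a reward reflecting how close the stimulus came to generating an action potential, while also considering stimulus energy) and the two-part personalization in [0140] could be sketched as below. The functional form of the reward, the energy weight, and representing a patient-specific biomarker as a per-patient spike threshold (`PatientProfile`) are assumptions made for illustration only; the quoted paragraphs do not give a formula.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PatientProfile:
    """Hypothetical patient-specific biomarker used to parameterize the reward."""
    spike_threshold: float  # response level treated as an action potential


def stimulus_energy(waveform: Sequence[float], dt: float) -> float:
    """Discrete approximation of stimulus energy: sum of squared samples times dt."""
    return sum(v * v for v in waveform) * dt


def reward(peak_response: float, waveform: Sequence[float], dt: float,
           profile: PatientProfile, energy_weight: float = 0.1) -> float:
    """Higher is better: penalize shortfall from the spike threshold and energy."""
    shortfall = max(0.0, profile.spike_threshold - peak_response)
    return -shortfall - energy_weight * stimulus_energy(waveform, dt)


# Stage 1 (general model) vs. stage 2 (personalized): only the profile changes.
general = PatientProfile(spike_threshold=1.0)
patient = PatientProfile(spike_threshold=0.8)
pulse = [1.0, -1.0]  # hypothetical biphasic pulse samples
print(reward(0.8, pulse, 0.5, general))
print(reward(0.8, pulse, 0.5, patient))
```

Under this assumption, updating the reward for a specific patient amounts to swapping in that patient's profile, so the same response earns a different reward once the patient-specific threshold is known.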

Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar as applied to claim 1 above, and further in view of Chugh, Akshita. "MAE, MSE, RMSE, coefficient of determination, adjusted R squared—which metric is better?" Medium (2020), hereinafter Chugh.
Regarding claim 5, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the neuromodulation system of Claim 1 (as shown above). DiLorenzo, Rao, and Milosevic, alone or in combination, fail to explicitly teach wherein the mean-square error loss is provided as a function of:
$$R_{\mathrm{MSE}} = \frac{1}{n} \sum_{i}^{n} \left( x_{\mathrm{target}} - x_{\mathrm{observed}} \right)^{2}$$
Milosevic, however, does teach that the error signal is a “difference between the measured neuronal output and the desired neuronal output…measure of the error signal, such as the mean square average error,” and Chugh teaches methods of evaluating the accuracy of machine learning models, including the mean squared error (pg. 2), where “the mean squared error represented the average of the squared difference between the original and predicted values in the data set,” with the following function:

[Image: media_image1.png]

Mean Squared Error Function (Chugh)
	It would have been obvious to one of ordinary skill in the art that the mean square average error recited by Milosevic is calculated using the function recited in Chugh, as this function is commonly used to measure the accuracy of machine learning models.
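For illustration only: the function Chugh describes, the average of the squared differences between paired values, fits in a few lines of Python. This is a minimal sketch that assumes the target and observed neural responses are available as equal-length sequences of numbers. The parameter names `x_target` and `x_observed` mirror the claim terms. The input types and error handling are assumptions and are not recited by Milosevic or Chugh.

```python
from typing import Sequence


def mean_squared_error(x_target: Sequence[float], x_observed: Sequence[float]) -> float:
    """Average of the squared differences between target and observed values."""
    if len(x_target) != len(x_observed):
        raise ValueError("x_target and x_observed must have the same length")
    if not x_target:
        raise ValueError("at least one sample is required")
    n = len(x_target)
    # Sum the squared per-sample differences, then divide by the sample count.
    return sum((t - o) ** 2 for t, o in zip(x_target, x_observed)) / n
```

For example, `mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])` averages the squared differences 0, 0.25, and 1, giving about 0.4167.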


Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar as applied to claim 18 above, and further in view of Hulvershorn (US 2013/0066137 A1), hereinafter Hulvershorn.
Regarding claim 19, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar teaches the method of Claim 18 (as shown above). DiLorenzo, Rao, Milosevic, and Paydarfar, alone or in combination, fail to explicitly teach the method further comprising a step of mapping the neural dynamics of the nervous system in response to the corrective stimulation.  
However, Hulvershorn teaches methods and systems for establishing, adjusting, and/or modulating parameters for neural stimulation based on functional and/or structural measurements (Abstract), wherein the method further comprises a step of mapping the neural dynamics of the nervous system in response to the non-binary stimulation (Figure 7B: brain map 750; [0069] “the patients in the investigational group received targeted electrical and/or magnetic stimulation (e.g., targeted subthreshold cortical stimulation) in addition to the above-described physical therapy. As shown by the second brain map 750 in FIG. 7B, the investigational group patients had a significantly reduced volume of functional activation after treatment. This consolidation in the investigational group resembles events seen during spontaneous recovery from stroke. As discussed above, such consolidation can be an indication of a patient's response to therapy.”).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of DiLorenzo, Rao, Milosevic, and Paydarfar to incorporate the teachings of Hulvershorn so that the method further comprises a step of mapping the neural dynamics of the nervous system in response to the non-binary stimulation, as both prior art references are directed to neural stimulation and its adjustment. One would be motivated to do this in order to use the mapping as an indicator of increased brain normalcy and, in some cases, favorable recovery, as recognized by Hulvershorn ([0065]).
Regarding claim 20, DiLorenzo in view of Rao in view of Milosevic in view of Paydarfar, further in view of Hulvershorn, teaches the method of Claim 19 (as shown above). DiLorenzo, Rao, Milosevic, and Paydarfar, alone or in combination, fail to teach the method further comprising a step of augmenting the corrective stimulation in response to the neural response mapping.
However, Hulvershorn teaches the method further comprising a step of augmenting the non-binary stimulation in response to mapping the neural dynamics of the nervous system ([0051] “if an area of activation in a patient's brain in which consolidation is generally not expected (e.g., an area of hyperactivity in a tinnitus patient) becomes undesirably more or less active over time after stimulation during one or more therapy sequences, then the stimulation parameters could be adjusted to decrease or increase activation/consolidation in order to improve or reestablish therapeutic efficacy. This habituation, and the corresponding reduction or alleviation in response to modification of neural stimulation parameters, could be monitored using some measurement of neural activity”).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of DiLorenzo, Rao, Milosevic, and Paydarfar to incorporate the teachings of Hulvershorn so that the method further comprises a step of augmenting the non-binary stimulation in response to mapping the neural dynamics of the nervous system, as both prior art references are directed to neural stimulation and its adjustment. One would be motivated to do this to improve or reestablish therapeutic efficacy, as recognized by Hulvershorn ([0051]).
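The adjustment Hulvershorn describes in [0051] can be pictured as a simple feedback rule:

- measure the mapped activation;
- compare it with a reference; and
- move the stimulation parameter in the direction that restores the desired activation.

The sketch below is a hypothetical illustration that assumes:

- the map is a flat list of activation values;
- "volume" is the count of values above a threshold; and
- activation increases with stimulation amplitude.

The names `activation_volume`, `adjust_amplitude`, and `gain`, and the amplitude limits, are invented for this example. They are not taken from Hulvershorn or the claims.

```python
from typing import Sequence


def activation_volume(activation_map: Sequence[float], threshold: float) -> int:
    """Count map elements (e.g. voxels) whose activation exceeds the threshold."""
    return sum(1 for value in activation_map if value > threshold)


def adjust_amplitude(
    amplitude: float,
    measured_volume: int,
    reference_volume: int,
    gain: float = 0.01,
    min_amplitude: float = 0.0,
    max_amplitude: float = 10.0,
) -> float:
    """Proportionally adjust amplitude toward the reference activation volume.

    Assumes activation grows with amplitude: if the mapped volume is above the
    reference the amplitude is lowered, and if it is below the amplitude is
    raised. The result is clamped to [min_amplitude, max_amplitude].
    """
    error = measured_volume - reference_volume
    new_amplitude = amplitude - gain * error
    return max(min_amplitude, min(max_amplitude, new_amplitude))
```

Consider a map `[0.1, 0.6, 0.8, 0.9, 0.2]` with a threshold of 0.5: the volume is 3. Against a reference of 1 with `gain=0.1`, an amplitude of 2.0 is lowered to 1.8.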

Allowable Subject Matter
Claim 7 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter:  
Regarding claim 7, the closest prior art of record is Rao (US 2023/0137595 A1).
The prior art of record and the other cited prior art references do not disclose, teach, or suggest that the actor and critic module is trained using a neural dynamics model that adapts to patient-specific responses over time (emphasis).
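One generic way for a neural dynamics model to adapt to patient-specific responses over time is online system identification. This may help make the distinguishing limitation concrete. The sketch below is hypothetical and makes two assumptions:

- a one-dimensional linear model in which the next neural state depends on the current state `x` and the stimulation `u`; and
- a normalized least-mean-squares (NLMS) update that refines the two coefficients after every observed transition.

The class name, coefficients, and step size are illustrative assumptions. They are not the applicant's disclosed model or anything in Rao.

```python
class AdaptiveResponseModel:
    """Hypothetical patient-specific linear response model x_next ~ a * x + b * u.

    The coefficients a and b start at zero and are refined online with a
    normalized LMS update, so the model tracks the patient's response as it
    changes over time.
    """

    def __init__(self, step_size: float = 0.5, eps: float = 1e-8) -> None:
        if not 0.0 < step_size < 2.0:
            raise ValueError("normalized LMS requires 0 < step_size < 2")
        self.a = 0.0  # hypothetical state-persistence coefficient
        self.b = 0.0  # hypothetical stimulation-gain coefficient
        self.step_size = step_size
        self.eps = eps  # small regularizer that avoids division by zero

    def predict(self, x: float, u: float) -> float:
        """Predicted next neural state for current state x and stimulation u."""
        return self.a * x + self.b * u

    def update(self, x: float, u: float, x_next: float) -> float:
        """Refine (a, b) from one observed transition and return the prediction error."""
        error = x_next - self.predict(x, u)
        norm = self.eps + x * x + u * u
        self.a += self.step_size * error * x / norm
        self.b += self.step_size * error * u / norm
        return error
```

Calling `update` once per stimulation cycle lets the coefficients converge toward the current patient's values. If the patient's response later changes, subsequent updates pull the estimates toward the new values.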
Claim 17 would be allowable if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action. The following is an examiner’s statement of reasons for allowance:
Regarding claim 17, the closest prior art of record is Rao (US 2023/0137595 A1).
The prior art of record and the other cited prior art references do not disclose, teach, or suggest using an actor-critic reinforcement learning algorithm to derive a stimulation policy, wherein the actor generates stimulation instructions, the critic evaluates neural state transitions, and the policy is refined based on the mean-square error between a target and an observed neural response and is updated continuously during operation (emphasis).
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
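The actor-critic arrangement recited for claim 17 can be illustrated with a deliberately simplified, single-state actor-critic sketch:

- a Gaussian policy (the actor) proposes a stimulation amplitude;
- the reward is the negative mean-square error between a target and an observed response; and
- a scalar critic serves as a baseline, and its one-step error drives both updates on every iteration.

The two-channel linear plant, its gains, the learning rates, and all function names are hypothetical assumptions. In particular, this critic does not model neural state transitions. The sketch therefore shows only the MSE-driven, continuously updated structure; it is not Rao's or the applicant's implementation.

```python
import random
from typing import List, Sequence, Tuple


def mse(x_target: Sequence[float], x_observed: Sequence[float]) -> float:
    """Mean-square error between target and observed responses."""
    return sum((t - o) ** 2 for t, o in zip(x_target, x_observed)) / len(x_target)


def observe_response(amplitude: float, gains: Sequence[float]) -> List[float]:
    """Hypothetical plant: each channel responds linearly to the stimulation amplitude."""
    return [g * amplitude for g in gains]


def run_actor_critic(
    x_target: Sequence[float],
    gains: Sequence[float],
    steps: int = 5000,
    sigma: float = 0.3,
    actor_lr: float = 0.01,
    critic_lr: float = 0.2,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return the learned policy mean and the critic's value estimate."""
    rng = random.Random(seed)
    mu = 0.0     # actor: mean of a Gaussian policy over stimulation amplitude
    value = 0.0  # critic: estimate of expected reward (single state, so a baseline)
    for _ in range(steps):
        amplitude = rng.gauss(mu, sigma)                 # actor proposes a stimulation
        x_observed = observe_response(amplitude, gains)  # plant responds
        reward = -mse(x_target, x_observed)              # lower MSE means higher reward
        td_error = reward - value                        # one-step error, no bootstrapping
        value += critic_lr * td_error                    # critic update
        # Actor update: score-function (policy-gradient) step for a Gaussian mean.
        mu += actor_lr * td_error * (amplitude - mu) / sigma ** 2
    return mu, value
```

Take `x_target = [1.0, 2.0]` with hypothetical gains `[0.5, 1.0]`. The error is zero at an amplitude of 2.0, and the policy mean `mu` drifts toward that value as updates accumulate.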

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Pineau J, Guez A, Vincent R, Panuccio G, Avoli M. Treating epilepsy via adaptive neurostimulation: a reinforcement learning approach. Int J Neural Syst. 2009 Aug;19(4):227-40. doi: 10.1142/S0129065709001987. PMID: 19731397; PMCID: PMC4884089. This reference discloses treating epilepsy via adaptive neurostimulation using a reinforcement learning approach (Section 3, Adaptive Control Algorithm).
Lee D, Seo H, Jung MW. Neural basis of reinforcement learning and decision making. Annu Rev Neurosci. 2012;35:287-308. doi: 10.1146/annurev-neuro-062111-150512. Epub 2012 Mar 29. PMID: 22462543; PMCID: PMC3490621. This reference discloses a neural basis of reinforcement learning wherein value functions can be adjusted based on the current environment (Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATTIYA SAYYADA HUSSAINI whose telephone number is (703)756-5921. The examiner can normally be reached Monday-Friday 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Niketa Patel, can be reached at (571)272-4156. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/ATTIYA SAYYADA HUSSAINI/Examiner, Art Unit 3792                                                                                                                                                                                                        
/NIKETA PATEL/Supervisory Patent Examiner, Art Unit 3792
Prosecution Timeline

Dec 17, 2022
Application Filed
Feb 27, 2025
Non-Final Rejection mailed — §103, §112
Jul 25, 2025
Response Filed
Sep 12, 2025
Final Rejection mailed — §103, §112
Nov 11, 2025
Response after Non-Final Action
Jan 07, 2026
Request for Continued Examination
Feb 17, 2026
Response after Non-Final Action
Mar 19, 2026
Non-Final Rejection mailed — §103, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/719,005
Patent 12636493
NEUROSTIMULATION RESPONSE AND CONTROL
4y 1m to grant Granted May 26, 2026
18/128,200
Patent 12629515
SYSTEM FOR PLANNING TUMOR-TREATING ELECTRIC FIELDS BASED ON TEMPERATURE CONTROL AND ABSORBED ENERGY IN BODY AND SYSTEM FOR PERFORMING ELECTRIC FIELD THERAPY INCLUDING THE SAME
3y 1m to grant Granted May 19, 2026
17/927,956
Patent 12616836
METHOD FOR TREATING BACTERIAL AND VIRAL DISEASES USING ELECTRICAL STIMULATION
3y 5m to grant Granted May 05, 2026
17/542,317
Patent 12609198
Medical Diagnostic Kit
4y 4m to grant Granted Apr 21, 2026
17/811,860
Patent 12582315
ELECTROCARDIOGRAM ANALYSIS APPARATUS, ELECTROCARDIOGRAM ANALYZING METHOD, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM
3y 8m to grant Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

3-4
Expected OA Rounds
57%
Grant Probability
72%
With Interview (+14.5%)
3y 2m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 35 resolved cases by this examiner. Grant probability derived from career allowance rate.