Prosecution Insights
Last updated: April 19, 2026
Application No. 17/200,106

MACHINE LEARNING FRAMEWORK FOR CONTROL OF AUTONOMOUS AGENT OPERATING IN DYNAMIC ENVIRONMENT

Status: Non-Final OA — §102, §103
Filed: Mar 12, 2021
Examiner: REDA, MATTHEW J
Art Unit: 3665
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Andro Computational Solutions, LLC
OA Round: 3 (Non-Final)
Grant Probability: 54% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 2m
With Interview: 83%

Examiner Intelligence

Career Allow Rate: 54% (126 granted / 231 resolved; +2.5% vs TC avg)
Interview Lift: +28.5% (strong)
Avg Prosecution: 3y 2m (46 applications currently pending)
Total Applications: 277 across all art units
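The allow-rate figure above is a simple ratio of the reported counts; a minimal sketch of the arithmetic (the function name is ours, not the tool's):

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

# 126 granted out of 231 resolved cases
print(round(allow_rate(126, 231), 1))  # → 54.5, displayed rounded to 54%
```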

Statute-Specific Performance

§101: 8.5% (-31.5% vs TC avg)
§103: 51.1% (+11.1% vs TC avg)
§102: 20.8% (-19.2% vs TC avg)
§112: 15.0% (-25.0% vs TC avg)
Tech Center averages are estimates • Based on career data from 231 resolved cases

Office Action

Grounds of rejection: §102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1, 3, 5-8, 10, 12-15, 17, and 19-23 are pending and examined below. This action is in response to the claims filed 11/25/25.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/25/25 has been entered.

Response to Amendment

Applicant’s arguments regarding the claim objections, see Applicant Remarks filed on 11/25/25, are moot in view of the amendments filed 11/25/25. The claim objections are withdrawn. Applicant did not address the 35 U.S.C. § 112(f) interpretations in the arguments filed 11/25/25; therefore, the 35 U.S.C. § 112(f) interpretations are maintained and reiterated below. Applicant’s arguments regarding the 35 USC § 102 rejections, see Applicant Remarks filed on 11/25/25, are persuasive in view of the amendments filed 11/25/25. However, upon further consideration, new grounds of rejection are made in view of Dupray et al. (US 2020/0265726) below.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination.
– An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. The claim limitation is “function approximator” in claims 1, 8, and 15, which is defined as “any mathematical or algorithmic object capable of estimating an unknown function” in Specification ¶30.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 3, 5-7, 15, 17, 19-21, and 23 are rejected under 35 U.S.C. 102(a)(1) and (a)(2) as being anticipated by Faust et al. (US 10,254,759).
Regarding claims 1 and 15, Faust discloses an autonomous agent control system including a system/non-transitory computer readable storage medium for control of an autonomous agent, the system comprising (Abstract and Col 8:39-58): a sensor communicatively coupled to the autonomous agent and configured for receiving a set of inputs, the sensor including an environmental sensor and a non-environmental sensor (Col 2:38-47 and Col 10:41-67 – sensor subsystem including camera corresponding to the recited environmental sensor receiving inputs and communication network capable of receiving remote inputs corresponding to the recited non-environmental sensor); at least one actuator for causing the autonomous agent to perform an action (Col 3:31-44 – control subsystem corresponding to the recited at least one actuator to cause the agent to perform an action); and a controller communicatively coupled to the sensor and the at least one actuator, the controller including a function approximator, wherein the controller is configured to perform actions including (Col 3:31-49 – policy engine and control subsystem corresponding to the recited controller including a function approximator which received observations corresponding to the recited coupled to the sensor and outputting actions corresponding to the recited coupled to the actuator): causing the at least one actuator to perform the action based on the set of inputs and an operative policy (Col 3:31-49 – mapping observations to a corresponding action then controlling the agent based on the selected action); determining whether the set of inputs indicates termination of the operative policy; evaluating a value function for each of a plurality of candidate policies via the function approximator and based on: the set of inputs, a library of training data corresponding to a different autonomous agent in the same environment, and past instances of selecting one of the plurality of candidate policies in response to the set of inputs indicating termination of the operative policy (Col 4:20-25, Col 5:53-6:9, and Col 9:35-48 – cumulative action score corresponding to the recited evaluating a value function is generated for each of a plurality of candidate policies based on the candidate action being performed corresponding to the recited determination that indicates termination of the operative policy for selecting one of a plurality of candidate policies where training data can be stored in a library and the training data is collected from actual autonomous vehicle experiences in the real world in the same environment corresponding to the recited set of inputs from a library of training data from past instances of different autonomous agents in the same environment); creating an additional candidate policy for the autonomous agent via an offline reinforcement learning algorithm of the function approximator by converting a non-viable candidate policy into the additional candidate policy using a constraint projection that guarantees compliance with at least one predefined safety constraint while operating offline (Col 4:66-5:7, Col 8:25-38, and Col 9:6-24 – additional evaluations corresponding to the recited additional candidate policy can be created when a trained policy is not available for a particular environment or driving context corresponding to the recited a non-viable candidate policy the trained reinforcement model is used to augment a baseline nominal policy that uses human-engineered heuristics to determine driving decisions corresponding to the recited converting a non-viable candidate policy into an additional candidate policy for the autonomous agent via trained reinforcement model where an offline reinforcement learning algorithm is interpreted utilizing BRI as an already trained reinforcement model since the trained model is just a processing algorithm and processed utilizing an onboard policy engine which can be a single on site standalone program where the additional evaluations are filtered utilizing other factors, e.g., comfort or legality where legality corresponding to the recited compliance with predefined safety constraints); and selecting one of the plurality of candidate policies or the additional candidate policy as a new operative policy (Col 3:50-55 – process is repeated iteratively to provide real time control corresponding to the recited selecting a new operative policy).

Regarding claims 3 and 17, Faust further discloses the library of training data includes data corresponding to a different autonomous agent, another device, or an operator-controlled system operating in the same environment (Col 3:19-30 – training data collected from real-world or simulated driving interactions corresponding to the recited a different autonomous agent or another device, the “or” element only requires one of the group to be present to disclose the invention as claimed).

Regarding claim 5, Faust further discloses the offline reinforcement learning algorithm performs actions including: designating the additional candidate policy as part of the plurality of candidate policies (Col 4:20-47, Col 8:25-32, and Col 11:49-53 – onboard policy engine corresponding to the recited offline reinforcement learning algorithm updates weights of the state function corresponding to the recited updating policies based on the library/training inputs to generate the cumulative action score for the candidate action corresponding to the recited designating the modified candidate policy as part of the plurality of candidate policies where additional evaluations can be performed prior to the selection of the policy).
Regarding claims 6, 7, and 20, Faust further discloses the non-environmental sensor includes a transceiver configured to receive a direct user input, or data from a different autonomous agent (Col 10:41-67 – communication network capable of receiving remote inputs corresponding to the recited non-environmental sensor) and the environmental sensor includes a camera configured for visually monitoring an environment (Col 2:38-47 – sensor subsystem including camera corresponding to the recited environmental sensor receiving inputs).

Regarding claim 19, Faust further discloses the offline reinforcement learning algorithm performs actions including: designating the modified candidate policy as part of the plurality of candidate policies (Col 4:20-47 and Col 11:49-53 – onboard policy engine corresponding to the recited offline reinforcement learning algorithm updates weights of the state function corresponding to the recited updating policies based on the library/training inputs to generate the cumulative action score for the candidate action corresponding to the recited designating the modified candidate policy as part of the plurality of candidate policies).

Regarding claims 21 and 23, Faust further discloses wherein termination of the operative policy includes at least one of: data connections from the autonomous agent being unavailable for at least a threshold time interval, coordinates of the autonomous agent being outside particular boundaries, or a predetermined environmental cue being identified (Col 5:16-38 – detection of a change in an initial observation corresponding to the recited a predetermined environmental cue being identified which indicates the termination of a previous operative policy. The claim element “at least one of” only requires one of the following to be present to disclose the element as claimed).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 8, 10, 12-14, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Faust et al. (US 10,254,759) in view of Dupray et al. (US 2020/0265726).

Regarding claim 8, Faust discloses a method for control of an autonomous agent, the method comprising (Abstract, Col 1:6-11, and Col 8:39-58): causing the autonomous agent to sense a set of inputs via an environmental sensor and a non-environmental sensor (Col 2:38-47 and Col 10:41-67 – sensor subsystem including camera corresponding to the recited environmental sensor receiving inputs and communication network capable of receiving remote inputs corresponding to the recited non-environmental sensor); causing at least one actuator of the autonomous agent to perform an action, based on the set of inputs and an operative policy implemented to achieve a mission, wherein the mission includes causing the UAV to travel to a destination along a predefined path and perform a predefined task at the destination (Col 3:31-49 and Col 6:49-7:21 – mapping observations to a corresponding action then controlling the agent based on the selected action corresponding to the recited perform the action based on the set of inputs and an operative policy where the observations include an appropriate driving context corresponding to the recited operative policy to travel along a particular route corresponding to the recited travel to a destination along a predefined path including determining actions along the route based on the operative policy corresponding to the recited perform a predefined task at the destination), and wherein the actuator is communicatively coupled to a controller including a function approximator (Col 3:31-49 – policy engine and control subsystem corresponding to the recited controller including a function approximator which received observations corresponding to the recited coupled to the sensor and outputting actions corresponding to the recited coupled to the actuator); evaluating a value function for each of a plurality of candidate policies via the function approximator and based on: the set of inputs, a library of training data corresponding to a different autonomous agent in the same environment, and past instances of selecting one of the plurality of candidate policies in response to the set of inputs indicating termination of the operative policy (Col 4:20-25, Col 5:53-6:9, and Col 9:35-48 – cumulative action score corresponding to the recited evaluating a value function is generated for each of a plurality of candidate policies based on the candidate action being performed corresponding to the recited determination that indicates termination of the operative policy for selecting one of a plurality of candidate policies where training data can be stored in a library and the training data is collected from actual autonomous vehicle experiences in the real world in the same environment corresponding to the recited set of inputs from a library of training data from past instances of different autonomous agents in the same environment); creating an additional candidate policy for the autonomous agent via an offline reinforcement learning algorithm of the function approximator by converting a non-viable candidate policy into the additional candidate policy using a constraint projection that guarantees compliance with at least one predefined safety constraint while operating offline (Col 4:66-5:7, Col 8:25-38, and Col 9:6-24 – additional evaluations corresponding to the recited additional candidate policy can be created when a trained policy is not available for a particular environment or driving context corresponding to the recited a non-viable candidate policy the trained reinforcement model is used to augment a baseline nominal policy that uses human-engineered heuristics to determine driving decisions corresponding to the recited converting a non-viable candidate policy into an additional candidate policy for the autonomous agent via trained reinforcement model where an offline reinforcement learning algorithm is interpreted utilizing BRI as an already trained reinforcement model since the trained model is just a processing algorithm and processed utilizing an onboard policy engine which can be a single on site standalone program where the additional evaluations are filtered utilizing other factors, e.g., comfort or legality where legality corresponding to the recited compliance with predefined safety constraints); and selecting one of the plurality of candidate policies or the additional candidate policy as a new operative policy (Col 3:50-55 – process is repeated iteratively to provide real time control corresponding to the recited selecting a new operative policy).

While Faust does disclose controls for an autonomous agent as well as defining autonomous vehicles as including aircraft, it does not explicitly disclose a UAV or detecting a communication malfunction.
However, Dupray discloses a UAV system including wherein the mission includes causing the UAV to travel to a destination along a predefined path and perform a predefined task at the destination (¶268 – UAV may automatically carry a payload to a destination and automatically unload the payload at the destination corresponding to the recited the mission includes causing the UAV to travel to a destination along a predefined path and perform a predefined task at the destination), and determining whether the set of inputs indicates a communication malfunction with the UAV and, in response to the communication malfunction, terminating the operative policy (¶272 – UAV is out of range of the communications hub corresponding to the recited communication malfunction causing the UAV to stay at its location until it regains communications with a hub corresponding to the recited terminating the operative policy). The combination of the autonomous agent control system of Faust with the UAV-based mission/communications response of Dupray fully discloses the elements as claimed. It would have been obvious to one of ordinary skill in the art before the filing date to have combined the autonomous agent control system of Faust with the UAV-based mission/communications response of Dupray in order to maintain infrastructural operations without the need for proper long range communication equipment while avoiding communications malfunctions (Dupray - ¶386-388).

Regarding claim 10, Faust further discloses the library of training data includes data corresponding to a different autonomous agent, another device, or an operator-controlled system operating in the same environment (Col 3:19-30 – training data collected from real-world or simulated driving interactions corresponding to the recited a different autonomous agent or another device, the “or” element only requires one of the group to be present to disclose the invention as claimed).
Regarding claim 12, Faust further discloses causing the offline reinforcement learning algorithm to perform actions including: designating the modified candidate policy as part of the plurality of candidate policies (Col 4:20-47 and Col 11:49-53 – onboard policy engine corresponding to the recited offline reinforcement learning algorithm updates weights of the state function corresponding to the recited updating policies based on the library/training inputs to generate the cumulative action score for the candidate action corresponding to the recited designating the modified candidate policy as part of the plurality of candidate policies).

Regarding claim 13, Faust further discloses causing the autonomous agent to sense the set of inputs includes causing a transceiver to receive a direct user input, or data from a different autonomous agent (Col 10:41-67 – communication network capable of receiving remote inputs corresponding to the recited non-environmental sensor).

Regarding claim 14, Faust further discloses causing the autonomous agent to sense the set of inputs includes causing a camera to visually monitor an environment (Col 2:38-47 – sensor subsystem including camera corresponding to the recited environmental sensor receiving inputs).

Regarding claim 22, while Faust does disclose controls for an autonomous agent as well as defining autonomous vehicles as including aircraft, it does not explicitly disclose a UAV or detecting a communication malfunction.
However, Dupray further discloses the termination of the operative policy includes maintaining at least one process of a terminated operative policy until the new operative policy is implemented, wherein the at least one process includes maintaining a position of the UAV (¶272 – UAV is out of range of the communications hub corresponding to the recited communication malfunction causing the UAV to stay at its location until it regains communications with a hub corresponding to the recited terminating the operative policy until new operative policy is implemented via a new hub). The combination of the autonomous agent control system of Faust with the UAV-based mission/communications response of Dupray fully discloses the elements as claimed. It would have been obvious to one of ordinary skill in the art before the filing date to have combined the autonomous agent control system of Faust with the UAV-based mission/communications response of Dupray in order to maintain infrastructural operations without the need for proper long range communication equipment while avoiding communications malfunctions (Dupray - ¶386-388).

Additional References Cited

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Dupray et al. (US 11,145,212) discloses various systems and methods for unmanned aerial vehicles (UAVs). In one aspect, UAV operation in an area may be managed and organized by UAV corridors, which are defined ways for the operation and movement of UAVs. UAV corridors may be supported by infrastructures and/or systems supporting UAV operations. Support infrastructures may include support systems such as resupply stations and landing pads. Support systems may include communication UAVs and/or stations for providing communications and/or other services, such as aerial traffic services, to UAVs with limited communication capabilities. Further support systems may include flight management services for guiding UAVs with limited navigation capabilities as well as tracking and/or supporting unknown or malfunctioning UAVs. (Abstract)

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Matthew J Reda, whose telephone number is (408) 918-7573. The examiner can normally be reached Monday - Friday, 7-4 ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hunter Lonsberry, can be reached at (571) 272-7298. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MATTHEW J. REDA/Primary Examiner, Art Unit 3665

Prosecution Timeline

Mar 12, 2021
Application Filed
Apr 03, 2025
Non-Final Rejection — §102, §103
Jun 04, 2025
Interview Requested
Jun 24, 2025
Examiner Interview Summary
Jun 24, 2025
Applicant Interview (Telephonic)
Jul 08, 2025
Response Filed
Jul 18, 2025
Final Rejection — §102, §103
Nov 24, 2025
Request for Continued Examination
Dec 05, 2025
Response after Non-Final Action
Jan 08, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12573248
AN ELECTRONIC CONTROL UNIT FOR A VEHICLE CAPABLE OF CONTROLLING MULTIPLE ELECTRICAL LOADS
Granted Mar 10, 2026 • 2y 5m to grant
Patent 12570509
INDUSTRIAL TRUCK WITH DETECTION DEVICES ON THE FORKS
Granted Mar 10, 2026 • 2y 5m to grant
Patent 12533065
METHOD AND APPARATUS FOR CLASSIFYING SUBJECT INDEPENDENT DRIVER STATE USING BIO-SIGNAL
Granted Jan 27, 2026 • 2y 5m to grant
Patent 12530029
SYSTEM AND METHOD OF ADAPTIVE, REAL-TIME VEHICLE SYSTEM IDENTIFICATION FOR AUTONOMOUS DRIVING
Granted Jan 20, 2026 • 2y 5m to grant
Patent 12525071
METHOD FOR ASSISTED OPERATING SUPPORT OF A GROUND COMPACTION MACHINE AND GROUND COMPACTION MACHINE
Granted Jan 13, 2026 • 2y 5m to grant
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 54%
With Interview: 83% (+28.5%)
Median Time to Grant: 3y 2m
PTA Risk: High
Based on 231 resolved cases by this examiner. Grant probability derived from career allow rate.
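The with-interview figure is consistent with adding the interview lift to the unrounded career allow rate (126/231 ≈ 54.5%). A hedged sketch of that arithmetic; the 100% cap and the function name are our assumptions, not the tool's documented behavior:

```python
def with_interview(base_pct: float, lift_pct: float) -> float:
    """Grant probability after applying the interview lift, capped at 100%."""
    return min(base_pct + lift_pct, 100.0)

base = 100.0 * 126 / 231                  # ≈ 54.5% career allow rate
print(round(with_interview(base, 28.5)))  # → 83
```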
