Prosecution Insights
Last updated: April 19, 2026
Application No. 18/091,520

METHOD AND APPARATUS FOR SYNCHRONIZING ACTIONS OF LEARNING DEVICES BETWEEN SIMULATED WORLD AND REAL WORLD

Non-Final OA: §102, §103, §112
Filed: Dec 30, 2022
Examiner: TANG, BRYANT
Art Unit: 3658
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
OA Round: 1 (Non-Final)
Grant Probability: 90% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 6m
With Interview: 87%

Examiner Intelligence

Career Allow Rate: 90% — above average (55 granted / 61 resolved; +38.2% vs TC avg)
Interview Lift: -3.4% (minimal; measured on resolved cases that included an interview)
Typical Timeline: 2y 6m average prosecution; 25 applications currently pending
Career History: 86 total applications across all art units

Statute-Specific Performance

§101:  8.2% (-31.8% vs TC avg)
§102: 29.6% (-10.4% vs TC avg)
§103: 44.9% (+4.9% vs TC avg)
§112: 14.4% (-25.6% vs TC avg)

Deltas are against an estimated Tech Center average; based on career data from 61 resolved cases.
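If the deltas are read as simple differences against the Tech Center average estimate (an assumption based on the original chart legend, not a documented formula of the dashboard), the implied TC baseline can be recovered, e.g. for §103:

```python
# Recovering the implied Tech Center average from the §103 row above,
# assuming "vs TC avg" is a plain difference (an assumption).
sec_103_rate = 0.449   # examiner's §103 figure
delta_vs_tc = 0.049    # "+4.9% vs TC avg"
tc_avg_estimate = sec_103_rate - delta_vs_tc
print(f"Implied TC §103 average: {tc_avg_estimate:.1%}")  # -> 40.0%
```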

Office Action

Grounds of rejection: §102, §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Joint Inventors

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on December 30, 2022 and October 29, 2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and

(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “learning device” and “learning devices” in claims 1-16; “learning apparatus” in claims 9-16.

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

The terms “learning device” and “learning devices” in claims 1-16, “synchronizing” in claims 1-2 and 9-10, and “short delay time” in claims 3 and 11 are relative terms which render the claims indefinite. The terms “learning device”, “learning devices”, “synchronizing” and “short delay time” are not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree or structure, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.

Claims 6-7 and 14-15 recite the limitation "joint". There is insufficient antecedent basis for this limitation in the claim, as there is no mention of a joint or any physical component in the claims. Therefore, it is unclear to the examiner what is being referred to in this limitation.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3, 5, 7-11, 13 and 15-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sandha et al. (“Sim2Real Transfer for Deep Reinforcement Learning with Stochastic State Transition Delays”), published in 2020, herein “Sandha”.

Regarding Claims 1 and 9, Sandha discloses a method and reinforcement learning apparatus for performing a method of synchronizing actions of learning devices (See Abstract, “Deep Reinforcement Learning (RL) […] for a wide variety of robotics applications.”), the method and reinforcement learning apparatus comprising:

inputting an action command to a learning device of a simulated world and a learning device of a real world, so that the learning device of the simulated world and the learning device of the real world reach a target state (See Abstract, “[…] to train Deep RL policies in a simulator and then deploy to the real world, a process called Sim2Real transfer.” See also Section 1, “RL agents make sequential decisions in a Markov Decision Process (MDP) in discrete time steps, where the input to the agent is the current state s_t of the environment, where t is the current time step, and output is the action a_t.
The environment transitions to the next state s_{t+1} once the action is executed, and in turn used as the input for the next action […]” Examiner notes reinforcement learning is executed in both simulated and real-world domains, with clear description of parallel training and deployment including input actions driving transitions to new states);

determining whether the learning device of the simulated world and the learning device of the real world reach the target state after one unit time (See Section 1, “[…] the states s_t and s_{t+1} were captured at world clock time τ and τ′ respectively, the timing delay between state transitions is defined as Δτ = τ′ − τ. Note that t is a discrete time step in simulation while Δτ represents the actual passage of time on a robot.” Examiner notes the RL process describes state transitions with a discrete unit of time between sampled states to determine transitions);

when the learning device of the simulated world and the learning device of the real world reach the target state, determining a first delay time, which is a time until the learning device of the simulated world reaches the target state, and a second delay time, which is a time until the learning device of the real world reaches the target state (See Section 1 as referenced above. See also Fig. 1 shown below and Section 2, “[…] the agent takes action a_t based on state s_t, and the environment returns with scalar reward r_t and next state s_{t+1}. We consider episodic MDPs, where the environment is initialized with state s_0 and the interaction with the agent continues until the environment reaches the terminal state s_T […] changes to the real state transition time Δτ – as opposed to fixed discrete time steps in t – directly impacts the state transition […]” Examiner notes t represents the first delay time because it denotes a discrete time step in the simulation, and Δτ represents the second delay time because it is the change in actual time before the physical agent reaches a terminal state. Furthermore, Fig. 1 below shows parallel processing and coordination between the current state and action of both simulation and actual environment);

[Sandha Fig. 1: agent-environment interaction loop across the simulated and real environments; image reproduced in the original office action]

and performing a correction between a state of the learning device of the simulated world and a state of the learning device of the real world in reinforcement learning that performs learning in conjunction with the learning device of the real world, based on the first delay time and the second delay time (See Section 6, “[…] deep reinforcement learning approach that incorporates sampling interval and execution latency into its state space. By utilizing domain randomization with time in the state, TSRL’s policies are robust against varying execution latencies and sampling rates for both Sim2Sim and Sim2Real transfer […] staying within the desired reward budget […] evaluation of time in state policies show that the policies are able to maintain higher rewards across a range of timing characteristics and, thus, can be used in presence of deployment uncertainties impacting the timing characteristics at runtime.” See also Appendix A, “To address variations in the state transition delay, we propose augmenting the agent state with execution time Δτ_η and sampling interval Δτ_σ measurements […] enables the agent to distinguish between state transitions introduced by variations in delays.”).
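A minimal sketch of the claims 1/9 loop as the examiner maps it: the same action is issued to both devices, a first delay time (simulated world) and a second delay time (real world) are measured against the same target, and the difference drives a correction. The SimDevice/RealDevice interfaces and the advance/hold calls are assumptions for illustration; this is not the applicant's code or Sandha's, and timeout handling is omitted to keep it minimal.

```python
import time

def synchronize_step(sim, real, action, target):
    """One unit-time step of the claimed sim/real synchronization
    (illustrative sketch; `sim`/`real` are hypothetical device objects)."""
    # 1. Input the same action command to both learning devices.
    sim.apply(action)
    real.apply(action)

    # 2. Time each device until it reports the target state: the first
    #    delay time (simulated world) and the second delay time (real world).
    t0 = time.monotonic()
    first_delay = second_delay = None
    while first_delay is None or second_delay is None:
        elapsed = time.monotonic() - t0
        if first_delay is None and sim.at_target(target):
            first_delay = elapsed
        if second_delay is None and real.at_target(target):
            second_delay = elapsed

    # 3. Correct between the two states based on the delay difference.
    lag = second_delay - first_delay
    if lag > 0:
        sim.advance(lag)   # sim finished first: replay its dynamics for `lag`
    elif lag < 0:
        real.hold(-lag)    # real finished first: let it idle for the gap
    return first_delay, second_delay
```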
Regarding Claims 2 and 10, Sandha further discloses the method of claim 1 and reinforcement apparatus of claim 9, wherein the performing of the correction comprises: receiving a next state of the simulated world according to N number of an amount of movement per unit time by as much as a difference between the first delay time and the second delay time (See Sections 1-2 as referenced above. See also Abstract, “[…] Time-in-State RL (TSRL) approach, which includes delays and sampling rate as additional agent observations at training time […]”); and synchronizing the state of the learning device of the simulated world with the state of the learning device of the real world based on the next state of the simulated world (See Sections 1-2 as referenced above. See also Section 5, “[…] mechanisms to reduce state transition delays […] by compensating for delays using damping components.”).

Regarding Claims 3 and 11, Sandha further discloses the method of claim 1 and reinforcement apparatus of claim 9, wherein the performing of the correction comprises adding a dummy time by as much as a difference between the first delay time and the second delay time to a learning device having a short delay time among the learning device of the simulated world and the learning device of the real world (See Section 2, “[…] variability in the real world as determined by the sampling interval of sensors Δτ_σ, the execution latency Δτ_η, and communication delays Δτ_m […]” Examiner notes reference 2.2 in Section 2 gives various delay variations and their impacts on the reinforcement learning policies, including adjusting state transitions based on the “type” of delay. Furthermore, artificial time augmentation is not an inventive concept, but a simple design choice).

Regarding Claims 5 and 13, Sandha further discloses the method of claim 1 and reinforcement apparatus of claim 9, wherein the determining of whether the learning device of the simulated world and the learning device of the real world reach the target state comprises determining whether the learning devices reach the target state, based on a simulated world state, which is a state after the learning device of the simulated world moves for one unit time, and a real world state, which is a state after the learning device of the real world moves for one unit time (See Sections 1-2 as referenced above. Examiner notes explicit definitions for state transition timing at discrete time steps support comparing states after unit time intervals).

Regarding Claims 7 and 15, Sandha further discloses the method of claim 1 and reinforcement apparatus of claim 9, wherein the learning device of the real world reaches the target state while reducing an error according to a physical state of the real world acting on each joint (See Sections 1-2 as referenced above, along with, “[…] policies trained using domain randomization (DR) of timing characteristics. Our results demonstrate that the TSRL policies are robust to the varying state transition delays and, as a result, transfer better across simulations and to real-world environments than the DR policies.” See also Appendix A, “[…] To reduce variance, the advantage function A(s_t, a_t) is used for the gradient update […] advantage estimates the relative benefit of taking action a_t compared to other possible actions in state s_t. The value network is trained with a mean squared error loss function […]”).
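The "dummy time" limitation of claims 3/11 above reduces, as claimed, to padding whichever device has the shorter delay by the delay difference. A sketch under the same assumptions as the previous snippet (illustrative only, not from the record):

```python
def dummy_time_padding(first_delay: float, second_delay: float):
    """Return which device should idle and for how long
    (claims 3/11 style compensation; illustrative only)."""
    gap = abs(first_delay - second_delay)
    shorter = "simulated" if first_delay < second_delay else "real"
    return shorter, gap

# e.g. first_delay=0.08 s (sim), second_delay=0.12 s (real) ->
# ("simulated", ~0.04): the simulated-world device receives ~0.04 s of
# dummy time before the next action command is issued.
```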
Regarding Claims 8 and 16, Sandha further discloses the method of claim 7 and reinforcement apparatus of claim 15, wherein the learning device of the real world reduces the error according to the physical state of the real world, based on a proportional control, a differential control, or an integral control (See Section 5, “[…] model finite-dimensional systems that may require linearization. The goal in these contexts is to develop robust controllers by approximating the worst case time-delays [33] and sampling variation [34], or by compensating for delays using damping components.” Examiner notes controllers requiring linearization are proportional controllers).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 4, 6, 12 and 14 are rejected under 35 U.S.C. 103 as being obvious over Sandha et al. (“Sim2Real Transfer for Deep Reinforcement Learning with Stochastic State Transition Delays”) in view of Mahmood et al. (US Patent Pub. No. 2020/0074241 A1), herein “Mahmood”.

Regarding Claims 4 and 12, Sandha discloses the method of claim 1 and reinforcement apparatus of claim 9, wherein the determining of whether the learning device of the simulated world and the learning device of the real world reach the target state comprises: when the learning device of the simulated world or the learning device of the real world does not reach the target state, adding time to reach the target state (See Sections 1-2 as referenced above. Examiner notes variability in timing and stochastic delays necessitates iterative checks and adjustments until state transition occurs). But Sandha does not explicitly disclose repeating the adding of the time until the learning device of the simulated world or the learning device of the real world reaches the target state.

Mahmood, in a similar field of endeavor, teaches repeating the adding of the time until the learning device of the simulated world or the learning device of the real world reaches the target state (See 0063, “[…] reinforcement learning agent 502 operates in the active state for a shorter time period than the first example iteration to ensure that the step times S for each iteration is approximately equal to the desired step time S of 100 ms.” See also 0084, “[…] may be an iterative process in which portions of the method are repeatedly performed.”).
In view of Mahmood’s teachings, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to include, with the delay-aware state comparison between simulated and real agents as disclosed by Sandha, explicit iterations of delay compensation until a target state is reached, with a reasonable expectation of success, since both address state transition delays in reinforcement learning systems, and the combination renders obvious iterative delay addition until state alignment.

Regarding Claims 6 and 14, Sandha does not explicitly disclose the method of claim 1 and reinforcement apparatus of claim 9, wherein the learning device of the simulated world reaches the target state by causing a learning device in a monitor of the simulated world to change each joint by as much as an amount of movement per unit time, wherein the amount of movement per unit time is determined based on a physical state of the simulated world acting on each joint.

Mahmood, in a similar field of endeavor, teaches the learning device of the simulated world reaches the target state by causing a learning device in a monitor of the simulated world to change each joint by as much as an amount of movement per unit time (See 0005-0006, “[…] task manager executes on a second processor of the computer system, obtains the state data stored […] stores the joint state vector in a second buffer. The task manager processes the state data and generates a joint state vector based on the state data […] the joint state vector may be generated based on the defined objective to be achieved in the real-world environment. The reinforcement learning agent transitions from a suspended state to an active state and, in the active state, obtains the joint state vector from the second buffer. The reinforcement learning agent generates, based on the joint state vector, a joint action vector indicating actions to be performed by the plurality of devices. The joint action vector may be generated by applying a reinforcement learning policy to information of the joint state vector […] transitions back to the suspended state for a defined period of time. The task manager obtains the joint action vector and parses the joint action vector into a plurality of actuation commands respectively corresponding to actions to be performed by the plurality of devices operating in the real-world environment.”), wherein the amount of movement per unit time is determined based on a physical state of the simulated world acting on each joint (See 0005-0006 as referenced above).

In view of Mahmood’s teachings, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to include, with the delay-aware state comparison between simulated and real agents as disclosed by Sandha, changing a joint vector of the physical agent to reach a target state, with a reasonable expectation of success, since merging joint action buffers with delay-aware state comparisons enables more effective action timing in the corrective process of reinforcement learning.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Bryant Tang, whose telephone number is (571) 270-0145. The examiner can normally be reached M-F 8-5 CST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thomas Worden, can be reached at (571) 272-4876. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRYANT TANG/
Examiner, Art Unit 3658

/JASON HOLLOWAY/
Primary Examiner, Art Unit 3658
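For reference alongside the claims 8/16 mapping above, which invokes proportional, differential, and integral control for per-joint error reduction: a textbook discrete PID update takes the form below. The gains, timestep, and per-joint usage are assumptions for illustration, not the applicant's disclosed controller or anything taken from Sandha.

```python
class PID:
    """Textbook discrete PID controller (illustrative reference only)."""
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float) -> float:
        self.integral += error * self.dt                   # integral control
        derivative = (error - self.prev_error) / self.dt   # differential control
        self.prev_error = error
        return (self.kp * error                            # proportional control
                + self.ki * self.integral
                + self.kd * derivative)

# e.g. one controller per joint, driving each joint's position error to zero:
# controllers = [PID(kp=2.0, ki=0.1, kd=0.05, dt=0.01) for _ in range(n_joints)]
```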

Prosecution Timeline

Dec 30, 2022: Application Filed
Feb 12, 2026: Non-Final Rejection under §102, §103, and §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12594942: Method and Apparatus for Detecting Complexity of Traveling Scenario of Vehicle
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12594967: METHOD AND SYSTEM FOR ADDRESSING FAILURE IN AN AUTONOMOUS AGENT
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12583115: ENHANCED VISUAL FEEDBACK SYSTEMS, ENHANCED SKILL LIBRARIES, AND ENHANCED FUNGIBLE TOKENS FOR THE OPERATION OF ROBOTIC SYSTEMS
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12558964: VEHICLE PROVIDING NOTIFICATION INFORMATION FOR SAFETY OF A USER
Granted Feb 24, 2026 (2y 5m to grant)

Patent 12548450: VEHICLE CONTROL DEVICE, VEHICLE CONTROL METHOD, AND NON-TRANSITORY STORAGE MEDIUM STORING VEHICLE CONTROL PROGRAM
Granted Feb 10, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 90% (87% with interview; lift -3.4%)
Median Time to Grant: 2y 6m
PTA Risk: Low

Based on 61 resolved cases by this examiner. Grant probability is derived from the career allow rate.
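The note above says grant probability is derived from the career allow rate; the arithmetic below reproduces the rounded figures shown (an inference from the displayed numbers, not a documented formula of the dashboard):

```python
granted, resolved = 55, 61
career_allow_rate = granted / resolved       # 0.9016... -> the "90%" shown
interview_lift = -0.034                      # reported lift for interviewed cases
with_interview = career_allow_rate + interview_lift
print(f"{career_allow_rate:.0%} baseline, {with_interview:.0%} with interview")
# -> "90% baseline, 87% with interview", matching the projection card
```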
