Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Pursuant to communications filed on 12/17/2024, this is a First Action Non-Final Rejection on the merits. Claims 1-20 are currently pending in the instant application.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 04/08/2025 and 04/08/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements have been considered by the Examiner.
Examiner's Note
The Examiner has cited particular paragraphs, column/line numbers, or figures in the applied reference(s) for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. In preparing a response, the applicant is respectfully requested to consider each reference in its entirety as potentially teaching all or part of the claimed invention, as well as the context of the cited passages. Applicant is reminded that the Examiner is entitled to give the claim language its broadest reasonable interpretation. The Examiner has also cited references on form PTO-892 that are not relied upon but are relevant and pertinent to the applicant's disclosure and may also read (as anticipatory or obviousness references) on the claims and claimed limitations. Applicant is advised to consider these references in preparing any response or amendments in order to expedite prosecution.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (NPL, "Simulation and Retargeting of Complex Multi-Character Interactions," 2023), hereinafter "Zhang," in view of Lin et al. (NPL, "Multi-Task Feature Interaction Learning," 2016), hereinafter "Lin."
Regarding claims 1 and 11, Zhang discloses a system and the associated method for learning physics-based interactions from demonstration (e.g., via method for reproducing complex multi-character interactions for physically simulated humanoid characters using deep reinforcement learning), comprising:
[media_image1.png — greyscale image, 462 × 688]
a memory storing one or more instructions (although not shown, Zhang discloses the implementation of “storing or recording” motion data and interactions – These features imply the existence of a memory / storage device during the simulation); and
a processor (e.g., see section 4.1 disclosing all experiments are run using 640 CPUs) executing one or more of the instructions stored on the memory to perform:
learning a (e.g., see figure 2 below; see Abstract, disclosing "a novel reward formulation based on an interaction graph that measures distances between pairs of interaction landmarks. This reward encourages control policies to efficiently imitate the character's motion while preserving the spatial relationships of the interactions in the reference motion"; see section 1 "Introduction," disclosing "a novel learning-based method that provides a physics-based retargeting of complex interactions for multiple characters. More specifically, given reference motions that capture interactions between people, we learn control policies (a.k.a. controllers) of simulated characters via deep reinforcement learning that imitate not only the motion of the individuals but also the interactions between them"; see section 3.3 "Interaction Graph," disclosing that, to capture "the semantics of the interaction happening between agents (or between an agent and an object) during the motion, we define the notion of an Interaction Graph (IG), a graph-based spatial descriptor where the information on interactions is stored in its vertices and edges. ... The example interaction graph in Figure 2 includes both edges connecting nodes on a single character and edges connecting nodes on different characters. The edges within the character help maintain the motion quality of an individual character, while the edges between the characters act as guides for maintaining the relative position of the body parts of the two characters"; see section 3.4.3 "Positional Graph Similarity": "To compare the positional graph similarity between two graphs, we separately consider the similarity of the two graph edges connecting each individual character E_self (self-connections) and between characters E_cross (cross connections)."); and
[media_image2.png — greyscale image, 404 × 618]
[media_image3.png — greyscale image, 478 × 690]
training a policy for controlling interactions between the first character and the second character based on using the (see Abstract, disclosing "Our approach uses a novel reward formulation based on an interaction graph that measures distances between pairs of interaction landmarks. This reward encourages control policies to efficiently imitate the character's motion while preserving the spatial relationships of the interactions in the reference motion"; see section 4.1 "Experimental Setup," disclosing "To speed up the learning for all of the experiments below, we pre-train an imitation policy of a single character on sequences that can be performed without a partner (e.g. high five, greetings, and push-ups). When training an interaction-graph based policy, we reuse the pre-trained decoder and allow its weights to be updated during the training. The decoder is reusable because the latent dimensions are unchanged. The encoder trained simultaneously with the pre-trained decoder is not reusable due to differences in input dimensions. This design makes it easier for the policy to maintain balance at the initial phase of learning, and therefore results in faster training. The training time of a policy varies based on the difficulty of the sequence. For easier sequences, it takes about 300 million to 500 million samples to train one policy. For harder sequences, it could take more than 2 billion samples to train a policy.").
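For illustration only, the interaction-graph reward concept cited above (distances between pairs of landmarks, split into self- and cross-connections, compared against a reference motion) can be sketched as follows. This is not code from Zhang; the landmark layout, edge lists, and the exponential reward shape with scale `sigma` are assumptions chosen for the sketch.

```python
import numpy as np

def graph_distances(landmarks_a, landmarks_b, self_edges, cross_edges):
    """Edge lengths of an interaction graph over two characters' 3D
    landmarks, split into self-connections (within character A) and
    cross-connections (from character A to character B)."""
    d_self = np.array([np.linalg.norm(landmarks_a[i] - landmarks_a[j])
                       for i, j in self_edges])
    d_cross = np.array([np.linalg.norm(landmarks_a[i] - landmarks_b[j])
                        for i, j in cross_edges])
    return d_self, d_cross

def interaction_reward(sim_a, sim_b, ref_a, ref_b,
                       self_edges, cross_edges, sigma=0.5):
    """One plausible reward: exponentiated negative mean-squared mismatch
    between the simulated and reference interaction-graph edge lengths."""
    s_self, s_cross = graph_distances(sim_a, sim_b, self_edges, cross_edges)
    r_self, r_cross = graph_distances(ref_a, ref_b, self_edges, cross_edges)
    err = np.mean((s_self - r_self) ** 2) + np.mean((s_cross - r_cross) ** 2)
    return float(np.exp(-err / sigma))
```

A simulated pose that exactly matches the reference graph yields the maximum reward of 1.0; any drift in either self- or cross-edge lengths lowers it.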
Zhang teaches substantially all of the claimed invention regarding the "interaction graph," and merely recites sparse interactions, but it is silent as to the claimed "interaction graph" using the "sparse embedded" technique.
However, in the same field of endeavor or analogous art, Lin teaches the claimed features implemented in multi-task feature interaction learning. Lin further teaches implementing embedded interactions based on tensor sparsity for machine-learning tasks (see figure 1 and section 3, page 1738, right column, "Embedded Interaction Approach": "When the response from one task is related to complicated feature interactions, the patterns of such interactions may be captured by a low-dimensional space, resulting in a low-rank interaction matrix. When there are multiple related tasks, they could have a shared low-dimensional space, i.e., different interaction matrices may share the same set of rank-1 basis matrices, but have different weights associated with these basis matrices. When collectively represented by a tensor, we end up with a low-rank tensor. During the learning process, each task contributes their subspace information to facilitate learning of the shared low-dimensional subspace, which in turn, improves the feature space." See also section 4.3, page 1740, "Embedded Interaction Approach," disclosing further details.).
[media_image4.png — greyscale image, 714 × 584]
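The shared rank-1-basis structure quoted from Lin above can be illustrated with a minimal numerical sketch: several tasks' feature-interaction matrices are built from the same set of rank-1 basis matrices u_k v_k^T, differing only in per-task weights, so stacking them yields a low-rank tensor. This is an assumed toy construction for illustration, not Lin's implementation; the dimensions `d`, `K`, and `T` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 6, 3, 4          # feature dim, number of shared bases, number of tasks
U = rng.standard_normal((d, K))   # shared left factors u_k
V = rng.standard_normal((d, K))   # shared right factors v_k
S = rng.standard_normal((T, K))   # per-task weights over the shared bases

def task_interaction_matrix(t):
    """W_t = sum_k S[t, k] * u_k v_k^T — rank at most K."""
    return sum(S[t, k] * np.outer(U[:, k], V[:, k]) for k in range(K))

# Stacking every task's interaction matrix gives a d x d x T tensor whose
# first two modes have multilinear rank at most K (a low-rank tensor).
W = np.stack([task_interaction_matrix(t) for t in range(T)], axis=-1)
```

Each task's matrix is at most rank K even though d > K, which is the "shared low-dimensional subspace" the quoted passage describes.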
Therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the idea of sparse embedded interaction, as taught by Lin, for the benefit that, during the learning process, each task contributes its subspace information to facilitate learning of the shared low-dimensional subspace, which in turn improves the feature space.
Regarding claim 16, Zhang discloses a system for learning physics-based interactions from demonstration (e.g., via method for reproducing complex multi-character interactions for physically simulated humanoid characters using deep reinforcement learning), comprising:
a memory storing one or more instructions (although not shown, Zhang discloses the implementation of “storing or recording” motion data and interactions – These features imply the existence of a memory / storage device during the simulation); and
a processor (e.g., see section 4.1 disclosing all experiments are run using 640 CPUs) executing one or more of the instructions stored on the memory to perform:
learning a or a pose of the second character and a current interaction graph (e.g., see figure 2 below; see Abstract, disclosing "a novel reward formulation based on an interaction graph that measures distances between pairs of interaction landmarks. This reward encourages control policies to efficiently imitate the character's motion while preserving the spatial relationships of the interactions in the reference motion"; see section 1 "Introduction," disclosing "a novel learning-based method that provides a physics-based retargeting of complex interactions for multiple characters. More specifically, given reference motions that capture interactions between people, we learn control policies (a.k.a. controllers) of simulated characters via deep reinforcement learning that imitate not only the motion of the individuals but also the interactions between them"; see section 3.3 "Interaction Graph," disclosing that, to capture "the semantics of the interaction happening between agents (or between an agent and an object) during the motion, we define the notion of an Interaction Graph (IG), a graph-based spatial descriptor where the information on interactions is stored in its vertices and edges. ... The example interaction graph in Figure 2 includes both edges connecting nodes on a single character and edges connecting nodes on different characters. The edges within the character help maintain the motion quality of an individual character, while the edges between the characters act as guides for maintaining the relative position of the body parts of the two characters"; see section 3.4.3 "Positional Graph Similarity": "To compare the positional graph similarity between two graphs, we separately consider the similarity of the two graph edges connecting each individual character E_self (self-connections) and between characters E_cross (cross connections)."); and
[media_image2.png — greyscale image, 404 × 618]
[media_image3.png — greyscale image, 478 × 690]
training a policy for controlling interactions between the first character and the second character based on using the (see Abstract, disclosing "Our approach uses a novel reward formulation based on an interaction graph that measures distances between pairs of interaction landmarks. This reward encourages control policies to efficiently imitate the character's motion while preserving the spatial relationships of the interactions in the reference motion"; see section 4.1 "Experimental Setup," disclosing "To speed up the learning for all of the experiments below, we pre-train an imitation policy of a single character on sequences that can be performed without a partner (e.g. high five, greetings, and push-ups). When training an interaction-graph based policy, we reuse the pre-trained decoder and allow its weights to be updated during the training. The decoder is reusable because the latent dimensions are unchanged. The encoder trained simultaneously with the pre-trained decoder is not reusable due to differences in input dimensions. This design makes it easier for the policy to maintain balance at the initial phase of learning, and therefore results in faster training. The training time of a policy varies based on the difficulty of the sequence. For easier sequences, it takes about 300 million to 500 million samples to train one policy. For harder sequences, it could take more than 2 billion samples to train a policy."); and
implementing the policy to control an interaction between a first robot and a second robot (see figure 5 above showing the interaction between two Baxter robots…. See section 4.5 Non-Human Characters disclosing further details of robot’s interactions).
Zhang teaches substantially all of the claimed invention regarding the "interaction graph," and merely recites sparse interactions, but it is silent as to the claimed "interaction graph" using the "sparse embedded" technique.
However, in the same field of endeavor or analogous art, Lin teaches the claimed features implemented in multi-task feature interaction learning. Lin further teaches implementing embedded interactions based on tensor sparsity for machine-learning tasks (see figure 1 and section 3, page 1738, right column, "Embedded Interaction Approach": "When the response from one task is related to complicated feature interactions, the patterns of such interactions may be captured by a low-dimensional space, resulting in a low-rank interaction matrix. When there are multiple related tasks, they could have a shared low-dimensional space, i.e., different interaction matrices may share the same set of rank-1 basis matrices, but have different weights associated with these basis matrices. When collectively represented by a tensor, we end up with a low-rank tensor. During the learning process, each task contributes their subspace information to facilitate learning of the shared low-dimensional subspace, which in turn, improves the feature space." See also section 4.3, page 1740, "Embedded Interaction Approach," disclosing further details.).
[media_image4.png — greyscale image, 714 × 584]
Therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the idea of sparse embedded interaction, as taught by Lin, for the benefit that, during the learning process, each task contributes its subspace information to facilitate learning of the shared low-dimensional subspace, which in turn improves the feature space.
Regarding claims 2 and 12, Zhang in view of Lin discloses as discussed in claims 1 and 11. Zhang further discloses, wherein the processor implements the policy to control an interaction between a first robot and a second robot (see figure 5 above showing the interaction between two Baxter robots…. See section 4.5 Non-Human Characters disclosing further details of robot’s interactions).
Regarding claims 3, 13 and 17, Zhang in view of Lin discloses as discussed in claims 1, 11 and 16. Zhang further discloses, wherein the fully connected graph is indicative of a first pose associated with the first character and a second pose associated with the second character (see section 2.1 Multi-character Interactions for Kinematic Characters disclosing Our state representation and reward function for deep reinforcement learning are inspired by one of these descriptor-based approaches [Ho et al. 2010], where they construct an interaction graph by connecting edges among pre-specified markers on the body surface. By utilizing deep reinforcement learning and a novel formulation to measure interaction graph similarities, our method can be applied to dynamic characters having different body shapes instead of generating kinematic interaction motions… see section 3.3 Interaction Graph disclosing the example interaction graph in Figure 2 includes both edges connecting nodes on a single character and edges connecting nodes on different characters. The edges within the character help maintain the motion quality of an individual character, while the edges between the characters act as guides for maintaining the relative position of the body parts of the two characters…... see also section 3.4.3 Positional Graph Similarity for more connecting details).
Regarding claims 4, 14 and 18, Zhang in view of Lin discloses as discussed in claims 3, 13 and 17. Zhang further discloses wherein the cross attention is between the first pose of the first character or the second pose of the second character and the current interaction graph (see the rationale as set forth above in claim 16).
Regarding claims 5, 15 and 19, Zhang in view of Lin discloses as discussed in claims 1, 11 and 16. Zhang further discloses wherein the current interaction graph is derived from the fully connected graph (see section 4.1 "Experimental Setup," disclosing "the structure of our policy follows an encoder-decoder style as presented in [Won et al. 2021], where the encoder is a fully connected neural network with two hidden layers with 256 and 128 units respectively. The encoder takes the full observation and projects it onto a 32-dimensional latent vector z. The decoder is another fully connected network with two hidden layers with 256 units, and it takes as input the concatenated vector z_decoder = (O_sim,self, z) and outputs the action of the policy.").
Regarding claims 6 and 20, Zhang in view of Lin discloses as discussed in claims 1 and 16. Zhang further discloses wherein the processor generates a pose latent vector based on passing the (see section 4.1 "Experimental Setup," disclosing "the structure of our policy follows an encoder-decoder style as presented in [Won et al. 2021], where the encoder is a fully connected neural network with two hidden layers with 256 and 128 units respectively. The encoder takes the full observation and projects it onto a 32-dimensional latent vector z. The decoder is another fully connected network with two hidden layers with 256 units, and it takes as input the concatenated vector z_decoder = (O_sim,self, z) and outputs the action of the policy.").
In regards to the "sparse embedded" technique, please see the rationale as set forth in claims 1 and 16.
Regarding claim 7, Zhang in view of Lin discloses as discussed in claim 6. Zhang further discloses wherein the processor generates a future interaction state for the first character and the second character based on passing the pose latent vector and a first pose associated with the first character through a pose decoder and passing the pose latent vector and a second pose associated with the second character through a second pose decoder (see section 4.1 "Experimental Setup," disclosing "the structure of our policy follows an encoder-decoder style as presented in [Won et al. 2021], where the encoder is a fully connected neural network with two hidden layers with 256 and 128 units respectively. The encoder takes the full observation and projects it onto a 32-dimensional latent vector z. The decoder is another fully connected network with two hidden layers with 256 units, and it takes as input the concatenated vector z_decoder = (O_sim,self, z) and outputs the action of the policy.").
Regarding claim 8, Zhang in view of Lin discloses as discussed in claim 7. Zhang further discloses wherein the pose decoder is trained based on a pre-trained motion variational autoencoder (VAE) (see section 4.1 "Experimental Setup," disclosing "the structure of our policy follows an encoder-decoder style as presented in [Won et al. 2021], where the encoder is a fully connected neural network with two hidden layers with 256 and 128 units respectively. The encoder takes the full observation and projects it onto a 32-dimensional latent vector z. The decoder is another fully connected network with two hidden layers with 256 units, and it takes as input the concatenated vector z_decoder = (O_sim,self, z) and outputs the action of the policy. To speed up the learning for all of the experiments below, we pre-train an imitation policy of a single character on sequences that can be performed without a partner (e.g. high five, greetings, and push ups). When training an interaction-graph based policy, we reuse the pre-trained decoder and allow its weights to be updated during the training. The decoder is reusable because the latent dimensions are unchanged. The encoder trained simultaneously with the pre-trained decoder is not reusable due to differences in input dimensions.").
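The layer sizes quoted repeatedly above (encoder 256 → 128 → 32-dimensional latent z; decoder taking the concatenation of a self-observation with z through two 256-unit layers) can be sketched at the shape level as follows. The layer widths come from the quoted passage; the input/output dimensions, zero-initialized weights, and the ReLU nonlinearity are assumptions made only so the sketch runs end to end.

```python
import numpy as np

def mlp(dims):
    """A list of (weight, bias) pairs for a fully connected network
    with the given layer widths (zero-initialized placeholder weights)."""
    return [(np.zeros((m, n)), np.zeros(n)) for m, n in zip(dims, dims[1:])]

def forward(layers, x):
    """Apply the layers with ReLU on hidden layers, linear output."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

obs_dim, self_obs_dim, act_dim = 100, 40, 28   # assumed, for illustration

encoder = mlp([obs_dim, 256, 128, 32])          # full observation -> latent z
decoder = mlp([self_obs_dim + 32, 256, 256, act_dim])  # (O_sim,self, z) -> action

obs = np.zeros(obs_dim)
self_obs = np.zeros(self_obs_dim)
z = forward(encoder, obs)                        # 32-dimensional latent vector
action = forward(decoder, np.concatenate([self_obs, z]))
```

Because the decoder only sees the 32-dimensional z plus the self-observation, its weights can be reused across policies as long as those input dimensions are unchanged, which matches the decoder-reuse rationale quoted from section 4.1.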
Regarding claim 9, Zhang in view of Lin discloses as discussed in claim 1. Zhang further discloses wherein training the policy is based on a reinforcement learning approach (see section 5 "Discussion," disclosing "We demonstrated a method of simulating and retargeting complex multi-character interactions by using deep reinforcement learning where novel state and rewards that are character-agnostic are developed based on an Interaction Graph"; see Abstract, disclosing "a method for reproducing complex multi-character interactions for physically simulated humanoid characters using deep reinforcement learning. Our method learns control policies for characters that imitate not only individual motions, but also the interactions between characters, while maintaining balance and matching the complexity of reference data.").
Regarding claim 10, Zhang in view of Lin discloses as discussed in claim 1. Zhang further discloses wherein training the policy is based on a physics-based simulation (see Abstract, disclosing "a method for reproducing complex multi-character interactions for physically simulated humanoid characters using deep reinforcement learning. Our method learns control policies for characters that imitate not only individual motions, but also the interactions between characters, while maintaining balance and matching the complexity of reference data"; see section 1, disclosing "we are interested in transferring complex multicharacter interactions from reference motions to physically simulated characters.").
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See attached form PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jaime Figueroa whose telephone number is (571)270-7620. The examiner can normally be reached on Monday-Friday 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wade Miles can be reached on 571-270-7777. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JAIME FIGUEROA/Primary Patent Examiner, Art Unit 3656