Prosecution Insights
Last updated: April 19, 2026
Application No. 18/774,064

METHOD, DEVICE AND MEDIUM FOR OPERATING ROBOT ARM

Status: Final Rejection (§103)
Filed: Jul 16, 2024
Examiner: EL SAYAH, MOHAMAD O
Art Unit: 3658
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD.
OA Round: 2 (Final)
Grant Probability: 76% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 9m
Grant Probability with Interview: 82%

Examiner Intelligence

Career Allow Rate: 76% (166 granted / 218 resolved), +24.1% vs TC avg (grants above average)
Interview Lift: +5.4% (moderate), measured across resolved cases with interview
Avg Prosecution: 2y 9m typical timeline; 41 applications currently pending
Career History: 259 total applications across all art units
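A quick check of the headline figures (the rounding and the percentage-point convention are assumptions about how the dashboard computes these, not documented behavior):

granted, resolved = 166, 218
allow_rate = 100 * granted / resolved        # 76.1%, displayed as 76%
implied_tc_avg = allow_rate - 24.1           # ~52.0% implied Tech Center average
print(f"allow rate {allow_rate:.1f}%, implied TC avg {implied_tc_avg:.1f}%")

So the "+24.1% vs TC avg" line implies a Tech Center baseline of roughly 52% for overall allowance.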

Statute-Specific Performance

§101: 16.9% (-23.1% vs TC avg)
§103: 50.2% (+10.2% vs TC avg)
§102: 16.7% (-23.3% vs TC avg)
§112: 12.1% (-27.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 218 resolved cases.
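Each per-statute delta can be checked against the examiner rate the same way. Notably, all four rows imply the same ~40% Tech Center baseline, consistent with the single average line drawn in the original chart. A minimal sketch (the subtraction convention is an assumption):

# examiner rate minus the signed delta recovers the implied TC baseline
stats = {"§101": (16.9, -23.1), "§103": (50.2, 10.2),
         "§102": (16.7, -23.3), "§112": (12.1, -27.9)}
for statute, (rate, delta) in stats.items():
    print(f"{statute}: implied TC avg ~{rate - delta:.1f}%")  # 40.0% in every row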

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The amendment filed on 01/14/2026 has been entered. Claims 1-20 remain pending in the application.

Priority

Acknowledgement is made of applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d) and (f). The certified copy has been filed in parent application CN202311286388.1, filed on 09/28/2023.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 6, 13, 14, 15, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ito (US20240367314) in view of Hori (US20240288870).

Regarding claim 1, Ito teaches a method for operating a robotic arm, comprising: receiving a description for specifying a target implemented by the robotic arm ([0025], disclosing receiving a natural language description specifying that a robot hand is to pick up an object); obtaining a current state of the robotic arm ([0026], disclosing that the sensor data includes a torque value, joint angle, or captured image of the state of each drive unit of the robot; [0045]-[0046], disclosing bending the arms at predetermined joint angles, which are sensor values recorded at the current time during the teaching); and determining, according to an action model, an action to be performed by the robotic arm based on the description and the current state, wherein the model is pre-trained by reference data comprising related data of a character arm ([0059]-[0060], disclosing predicting, using a model, an operation to be performed by the robot based on newly measured sensor information, which is interpreted to include at least a state of the arm, and on a natural language instruction; see [0044]-[0049], disclosing that the training model, which includes the inference unit, is trained on time series data including robot sensor measurements and the instructions. Here the training uses the robot's own arms as the character arm for teaching purposes; the claim does not require the character arm to be different from a robot arm, and a character arm is interpreted as the demonstrating arm, which in this case is the same robot arm demonstrating the action time series.).

Ito does not teach wherein the reference data comprises a reference character video, the reference character video comprising a reference character action and a reference description, the reference description describing the reference character video, and wherein the model is obtained based on: pre-training the model with the reference character video and the reference description.
Hori teaches wherein the reference data comprises a reference character video, the reference character video comprising a reference character action and a reference description, the reference description describing the reference character video, and wherein the model is obtained based on: pre-training the model with the reference character video and the reference description ([0076]-[0085], disclosing the prediction of images based on decoding the language and images of a video demonstration of a character's actions; [0168], disclosing that the actions are based on the current state of the robot and on the video demonstration, i.e., images). The combination and/or substitution of Hori's character video and descriptions with Ito's model training is obvious and yields predictable results, aligning a textual input with a visual input and thus allowing the robot to accurately imitate a character demonstration video, as taught by Hori ([0077]-[0085]), thereby improving accuracy by imitating the sequence of movements in the demonstration.

Regarding claim 4, Ito as modified by Hori teaches the method of claim 1, wherein the current state of the robotic arm comprises at least one of: an image of the robotic arm, a pose of the robotic arm, and a state of a tool of the robot arm, the action relating to a change in the pose and the state of the tool (Ito [0026], disclosing that the sensor data includes a torque value, joint angle, or captured image of the state of each drive unit of the robot; [0045]-[0046], disclosing bending the arms at predetermined joint angles, which are sensor values recorded at the current time during the teaching, i.e., indicative of a pose).

Regarding claim 6, Ito as modified by Hori teaches the method of claim 5, wherein the action model further comprises an image decoder, and determining the image prediction comprises: determining, based on the representation and the state representation, the image prediction with the image decoder. Hori teaches these limitations ([0076]-[0085], disclosing the prediction of images based on decoding the language and images; [0168], disclosing that the actions are based on the current state of the robot and on the video demonstration, i.e., images). The combination of Hori with Ito yields predictable results, aligning a textual input with a visual input and thus allowing the robot to accurately imitate a human, as taught by Hori ([0077]-[0085]).

Regarding claim 13, Ito as modified by Hori teaches the method according to claim 1, wherein the action model matches an application environment of the robotic arm, and the application environment comprises at least one of the following: a virtual application environment and a reality application environment (Ito [0049]-[0070], disclosing that the model checks for the object to be picked in the real environment, which is interpreted as matching the environment of the robot).

Regarding claim 14, Ito as modified by Hori teaches the method of claim 1, further comprising: adjusting the action to determine an action instruction for driving the robotic arm (Ito [0053]-[0070], disclosing adjusting actions based on the predicted object to be picked, thereby determining the action for driving the robotic arm for grasping).

Claims 15 and 20 are rejected for similar reasons as claim 1.
Claim 18 is rejected for similar reasons as claim 4; see the above rejection.

Claims 2, 3, 5, 16, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ito (US20240367314) in view of Hori (US20240288870) and Paxton (US20230297074).

Regarding claim 2, Ito as modified by Hori teaches the method of claim 1, further comprising: determining, according to the model, a prediction of a scenario in which the robotic arm performs the action based on the description and the current state (Ito [0068]-[0080], disclosing time series generation that predicts the scenario of the object position in a time series, based on the position of the object and the sensor information of the robot measured at time t, and predicts the operation of the commands for the robot to accomplish these predictions). Ito as modified by Hori does not, however, teach image prediction. Paxton teaches image prediction ([0048]-[0049], disclosing the prediction of a discrete task, "indicative of a specific scenario such as moving an object," from an image, i.e., an image prediction of a scenario). Both Paxton and Ito as modified by Hori teach predicting next actions with a neural network, and it would have been obvious to substitute Paxton's image prediction for the prediction of Ito, yielding predictable results: "A system in accordance with at least one embodiment can move from language and vision inputs to a set of actions, in order to perform a set or sequence of tasks, partially or wholly in simulation as well as in a real environment. Intermediate computing blocks can perform respective high-level actions, and a structured input can be generated and provided to a task planner, which can execute the tasks, or cause the tasks to be performed. Such a system can combine symbolic planning, for example, with natural language understanding in order to accomplish long-horizon tasks" (Paxton [0030]).

Regarding claim 3, Ito as modified by Hori and Paxton teaches the method of claim 2. Paxton further teaches receiving positions and a number of steps, the positions and the number of steps specifying an action performed by the robot arm ([0030] and [0049], disclosing a series of discrete tasks, i.e., a "number of steps"; [0033] and [0040], disclosing a sequence of steps that moves the effector to grasp and move objects); and determining, according to the action model, an action matching the positions, the number of steps, and the image prediction ([0040] and [0049]-[0060], disclosing a series of steps matched to the image prediction of the sequence and positions). Both Paxton and Ito as modified by Hori teach predicting next actions with a neural network. It would have been obvious to one of ordinary skill in the art to have modified the teaching of Ito to incorporate the teaching of Paxton, yielding predictable results and addressing the control of robots via visual cues (see Paxton [0030], quoted above).
Regarding claim 5, Ito as modified by Hori and Paxton teaches the method of claim 2, wherein the action model comprises a language encoder, a state encoder, and an action decoder, and wherein determining the action comprises: determining a representation of the language description with the language encoder (Ito [0100], disclosing the instruction learning unit as the language encoder); determining a state representation of the current state with the state encoder (Ito [0048], disclosing a sensor data accumulation unit that determines the current state and accumulates it); and determining, based on the representation and the state representation, the action with the action decoder (Ito [0059]-[0060], disclosing predicting, using the inference unit ("action model"), an operation to be performed by the robot based on newly measured sensor information, which is interpreted to include at least a state of the arm, and on a natural language instruction; see [0044]-[0049], disclosing that the training model, which includes the inference unit, is trained on time series data including robot sensor measurements and the instructions; see also the interpretation of the character arm in the discussion of claim 1 above).

Claims 16, 17, and 19 are rejected for similar reasons as claims 2, 3, and 5, respectively.

Response to Arguments

Applicant's arguments filed on 01/14/2026 have been fully considered but are not persuasive. With respect to applicant's arguments regarding the §112(b) rejection, the rejection is withdrawn based on the amendment to claim 3. With respect to applicant's arguments regarding amended claim 1, that neither Ito, Paxton, nor Hori teaches "wherein the reference data comprises a reference character video ... pre-training the model with the reference character video and the reference description," the examiner respectfully disagrees. Hori teaches a model trained on a character video demonstration and a description of the video, and the combination of Hori and Ito is obvious in order to allow the robot to accurately imitate a character, thus improving motion control of robots toward the desired behavior and instructions.

Allowable Subject Matter

Claims 7-12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if rewritten to overcome any 35 U.S.C. 101 and 112(b) rejections of record. Claims 8-12 would be allowable for depending on claim 7.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art cited in the PTO-892 and not mentioned above discloses related devices and methods: US20230311335, disclosing natural language for robot control, and US20230398696, disclosing correlating a command with a video of an action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMAD O EL SAYAH, whose telephone number is (571) 270-7734. The examiner can normally be reached M-Th, 6:30-4:30. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ramon Mercado, can be reached at (571) 270-5744. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MOHAMAD O EL SAYAH/
Primary Examiner, Art Unit 3658B
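As background for the claim 5 and claim 6 mappings above (a language encoder, a state encoder, an action decoder, and an image decoder producing the image prediction), the following is an illustrative sketch of what such an action model can look like. This is a hypothetical reconstruction for orientation only: every class, name, and dimension below is invented, and it is not the applicant's implementation or anything disclosed by Ito, Hori, or Paxton.

import torch
import torch.nn as nn

class ActionModel(nn.Module):
    """Hypothetical action model matching the claim structure discussed
    above: language encoder, state encoder, action decoder, and an image
    decoder for scenario (image) prediction. All names are invented."""

    def __init__(self, vocab_size=1000, d=128, state_dim=7, action_dim=7):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.lang_enc = nn.LSTM(d, d, batch_first=True)  # language encoder
        self.state_enc = nn.Linear(state_dim, d)         # state encoder (pose, tool state)
        self.action_dec = nn.Linear(2 * d, action_dim)   # action decoder
        self.image_dec = nn.Linear(2 * d, 64 * 64)       # image decoder (flattened frame)

    def forward(self, tokens, state):
        _, (h, _) = self.lang_enc(self.embed(tokens))    # representation of the description
        z = torch.cat([h[-1], self.state_enc(state)], dim=-1)
        return self.action_dec(z), self.image_dec(z)     # action + image prediction

model = ActionModel()
tokens = torch.randint(0, 1000, (1, 12))   # tokenized description, e.g. "pick up the block"
state = torch.randn(1, 7)                  # current robot state (joint angles, tool state)
action, image_pred = model(tokens, state)  # claim 5's action; claim 6's image prediction

In the claims as rejected, such a model would additionally be pre-trained on reference character videos paired with reference descriptions; the sketch omits training entirely.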

Prosecution Timeline

Jul 16, 2024
Application Filed
Oct 09, 2025
Non-Final Rejection — §103
Jan 14, 2026
Response Filed
Mar 11, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this examiner involving similar technology

Patent 12600372
OPTIMIZATION OF VEHICLE PERFORMANCE TO SUPPORT VEHICLE CONTROL
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12576838
PROCESS AND APPARATUS FOR CONTROLLING THE FORWARD MOVEMENT OF A MOTOR VEHICLE AS A FUNCTION OF ROUTE PARAMETERS IN A DRIVING MODE WITH A SINGLE PEDAL
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12565239
AUTONOMOUS DRIVING PREDICTIVE DEFENSIVE DRIVING SYSTEM THROUGH INTERACTION BASED ON FORWARD VEHICLE DRIVING AND SITUATION JUDGEMENT INFORMATION
Granted Mar 03, 2026 (2y 5m to grant)
Patent 12554260
Iterative Feedback Motion Planning
Granted Feb 17, 2026 (2y 5m to grant)
Patent 12552364
VEHICLE TURNING CONTROL DEVICE
Granted Feb 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 76%
With Interview: 82% (+5.4%)
Median Time to Grant: 2y 9m
PTA Risk: Moderate
Based on 218 resolved cases by this examiner. Grant probability is derived from the career allow rate.
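The "with interview" figure is consistent with simply adding the 5.4-point lift to the career allow rate and rounding; whether the dashboard actually combines the two additively is an assumption:

base = 100 * 166 / 218            # 76.1% career allow rate, displayed as 76%
with_interview = base + 5.4       # 81.5%, displayed as 82%
print(f"{base:.0f}% -> {with_interview:.0f}% with interview")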
