Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The following claims have been rejected or allowed for the following reasons:
Claim(s) 1-20 is/are rejected under 35 U.S.C. § 103.
Priority
Acknowledgment is made of applicant’s claim for the benefit under 35 U.S.C. 119(e) of prior-filed provisional Application No. 63/646,930, filed on 5/13/24.
Information Disclosure Statement
The information disclosure statements (IDS) were filed on 11/14/24 and 1/14/25. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-4, 6-7, 10-14, and 16-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 20220032454 A1) in view of Chao (US 20230294276 A1), and in further view of Losey (NPL: “Physical interaction as communication: Learning robot objectives online from human corrections,” 2022).
Regarding claim 1 Yang teaches A system for learning perceived preferences in human-robot interactions (HRI), comprising: a sensor sensing a noisy action from a human associated with a human-robot interaction (HRI) with a robot; a memory storing one or more instructions; (Yang abstract reads “A robotic control system directs a robot to take an object from a human grasp by obtaining an image of a human hand holding an object, estimating the pose of the human hand and the object, and determining a grasp pose for the robot that will not interfere with the human hand.” And [0213] reads “In at least one embodiment, video image compositor may include enhanced temporal noise reduction for both spatial and temporal noise reduction. For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weight of information provided by adjacent frames.”);
a processor executing one or more of the instructions stored on the memory to perform: generating a feature associated with the human based on the noisy action (Yang [0082] reads “Instead of learning deep features with ConvNets on depth images, at least one embodiment adopts the PointNet++ on point clouds for human grasp classification. In at least one embodiment, the backbone network consists of four set-abstraction layers to learn point features and a three-layer perceptron with batch normalization, ReLu and Dropout for global feature learning and human grasp classification. Given a point cloud cropped around the hand, the network classifies it into one of the defined grasp categories, which may be used for further robot grasp planning.”);
Yang does not teach: an observation model; generating a belief based on the feature and a belief model; generating a robot action based on a reference trajectory, the belief, and one or more constraints; or a controller implementing the robot action for the HRI via a robot appendage of the robot and an actuator.
Chao, in analogous art, teaches generating a belief based on the feature and a belief model; (Chao [0043] reads “Motion models for the hand and the object can be updated for individual motions determined for individual sequences to generate at least one motion model for a human hand, and may also generate at least one model for an object held in a human hand, to provide for predictions of realistic motion and behavior of a hand and object during an interaction, such as an object handover.”);
and generating a robot action based on a reference trajectory, the belief, and one or more constraints; and a controller implementing the robot action for the HRI via a robot appendage of the robot and an actuator. (Chao [0047] reads “The motion predictor 378 can consider any constraints on the motion, and can use a motion planner or optimizer application, algorithm, or neural network, for example, to predict an optimal path between a current position and orientation of the robot and a current position and orientation of the object, as well as the hand holding the object. … As discussed herein the optimization can take other factors or constraints into account as well, such as to prefer smooth or straight-line motion, or to avoid camera obstructions. … The motion predictor 378 can then determine a sequence of motions, or a path of motion to be performed over a number of future time steps. The motion predictor 378 can then provide information for this sequence or future motions, a first subset of these future motions, or only a first predicted motion to a robot controller 372 of the control system 370. … The robot controller 372 can then determine instructions to cause the robot to perform an action corresponding to that motion, if not already provided by the motion predictor 378, and can send those instructions to the robot 354, where a drive system 360 or other mechanical control of the robot can cause the robot 354 to perform that motion, such as to perform a determined motion for a current time step to bring an end effector 356 of the robot to a target position relative to the object.” And [0044] reads “Once these motion models have been generated, they can be provided to a simulator 330 or stored to a motion model repository 340 accessible to a simulator. … In at least some embodiments, a simulator 334 can select appropriate models (e.g., deformable meshes) for a hand, object, and robot to be simulated in a virtual environment. 
… The simulator can use the motion models for the hand and object, generated by the motion modeling system 320, to simulate motion or behavior of the hand and the object during a handover operation.”);
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang with those of Chao to include a method for actuating and controlling a real-world robot. This would allow the system to better interact with real-world people who would be working alongside it. (Chao [0002] reads “Robots and other automated devices are increasingly being used to assist in the performance of various tasks. At least some of these tasks involve interacting with a human or other entity that requires a substantial degree of dexterity, such as to perform a handover action where the robot is to grasp and take an object from the hand of a person. The ability to exchange objects with humans seamlessly and safely is crucial for human-robot interactions (HRI). Despite increasing efforts in recent years, current development for on human-robot object handovers still faces various challenges. For example, development and testing often requires a real human in the loop, which makes the process expensive, potentially hazardous, and harder to reproduce. Further, different approaches adopt different experimental settings with different types of objects used for handovers, and evaluate with different metrics, which makes cross-study comparison and resulting optimization difficult.”);
Yang/Chao does not teach an observation model.
Losey, in analogous art, teaches an observation model; (Losey page 6 spanning paragraph reads “Assuming that human interactions are meaningful, the robot should leverage the human’s actions u_h to update its belief over θ. In order to associate the human interactions u_h with the objective parameter θ, the robot uses an observation model: P(u_h | x, u_r; θ). If we were to treat the human’s actions as random disturbances, then we would select a uniform probability distribution for P(u_h | x, u_r; θ). By contrast, here we model the human as intentionally interacting to correct the robot’s behavior; more specifically, let us model the human as correcting the robot to approximately maximize their reward”);
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang/Chao with those of Losey to include a method for the system to better understand the user’s intentions for a given task. This would better allow the robotic system to work with humans. (Losey Introduction reads “Physical interaction is a natural means for collaboration and communication between humans and robots. From compliant designs to reliable prediction algorithms, recent advances in robotics have enabled humans and robots to work in close physical proximity. Despite this progress, seamless physical interaction, where robots are as responsive, intelligent, and fluid as their human counterparts, remains an open problem”);
Regarding claim 2 Yang/Chao/Losey teaches The system for learning perceived preferences in HRIs of claim 1, wherein the observation model is based on Boltzmann rationality and maximum entropy. (Losey page 6 spanning paragraph reads “If we were to treat the human’s actions as random disturbances, then we would select a uniform probability distribution for P(u_h | x, u_r; θ). By contrast, here we model the human as intentionally interacting to correct the robot’s behavior; more specifically, let us model the human as correcting the robot to approximately maximize their reward. We assume the human selects an action u_h that, when combined with the robot’s action u_r, leads to a high Q value (state–action value) assuming the robot will behave optimally after the current timestep, i.e., assuming that the robot learns the true θ: P(u_h^t | x^t, u_r^t; θ) = exp(Q(x^t, u_r^t + u_h^t; θ)) / ∫ exp(Q(x^t, u_r^t + ũ_h; θ)) dũ_h (2). Our choice of Equation (2) stems from maximum entropy assumptions (Ziebart et al., 2008), as well as the Boltzmann distributions used in cognitive science models of human behavior (Baker et al., 2007).”);
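For illustration only, and not as part of any cited reference, the Boltzmann observation model quoted above may be sketched in Python. The quadratic Q-function and all identifiers are hypothetical stand-ins, and the integral over candidate human actions in the denominator of Equation (2) is approximated by a sum over a discrete grid:

```python
import numpy as np

def q_value(x, u_r, u_h, theta):
    """Hypothetical stand-in state-action value for the combined action u_r + u_h."""
    u = u_r + u_h
    return -theta * np.sum((x + u) ** 2)  # toy quadratic objective

def observation_likelihood(x, u_r, u_h, theta, u_h_grid):
    """P(u_h | x, u_r; theta): Boltzmann-rational likelihood of the human action,
    normalized over a discrete grid of candidate human actions."""
    num = np.exp(q_value(x, u_r, u_h, theta))
    den = sum(np.exp(q_value(x, u_r, uh, theta)) for uh in u_h_grid)
    return num / den
```

Because the normalization runs over the same grid of candidate actions, the resulting likelihoods sum to one, mirroring the denominator of Equation (2).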
Regarding claim 3 Yang/Chao/Losey teaches The system for learning perceived preferences in HRIs of claim 1, wherein the HRI is modeled as a Constrained Partially Observable Markov Decision Process (CPOMDP). (Losey page 2 spanning columns reads “We formalize reacting to pHRI as a dynamical system, where the robot optimizes an objective function with an unknown parameter θ, and human interventions serve as observations about the true value of θ. As posed, this problem is an instance of a partially observable Markov decision process (POMDP).”);
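As a hypothetical sketch (not drawn from the references), the POMDP belief over the unknown objective parameter θ can be maintained as a discrete probability vector and updated by Bayes’ rule after each observed human action:

```python
import numpy as np

def belief_update(belief, likelihoods):
    """Bayes-rule belief update over a discretized parameter set:
    b'(theta) is proportional to P(observation | theta) * b(theta)."""
    posterior = belief * likelihoods  # unnormalized posterior
    return posterior / posterior.sum()
```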
Regarding claim 4 Yang/Chao/Losey teaches The system for learning perceived preferences in HRIs of claim 3, wherein the HRI is a bi-lateral interaction including a second human associated with a second HRI with a second robot. (Yang abstract reads “A robotic control system directs a robot to take an object from a human grasp by obtaining an image of a human hand holding an object, estimating the pose of the human hand and the object, and determining a grasp pose for the robot that will not interfere with the human hand.” And [0213] reads “In at least one embodiment, video image compositor may include enhanced temporal noise reduction for both spatial and temporal noise reduction. For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weight of information provided by adjacent frames.” As discussed in MPEP § 2144.04(VI)(B), the mere duplication of parts has been held to be an obvious modification absent a new or unexpected result.);
Regarding claim 6 Yang/Chao/Losey teaches The system for learning perceived preferences in HRIs of claim 1, wherein the generating the belief is based on trajectory deformation of a current trajectory of the robot by replacing a waypoint of the current trajectory with a waypoint associated with the feature associated with the human based on the noisy action. (Losey figure 1 depicts a robotic system changing its trajectory based on interaction from a human operator);
Losey Figure 1 (media_image1.png, greyscale)
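The waypoint-replacement deformation recited in claim 6 may be illustrated by the following hypothetical sketch, in which one waypoint of the current trajectory is replaced by a human-derived waypoint and its neighbors are blended so the deformed trajectory stays continuous; the function name and the linear blending scheme are assumptions for illustration, not the applicant’s or Losey’s method:

```python
import numpy as np

def deform_trajectory(trajectory, index, human_waypoint, smooth_radius=2):
    """Replace one waypoint of a (T x D) trajectory with a human-derived
    waypoint and blend nearby waypoints toward it."""
    traj = trajectory.copy()
    traj[index] = human_waypoint  # replace the waypoint with the human-derived one
    # Blend neighbors toward the new waypoint so the change is not a discontinuity.
    for k in range(1, smooth_radius + 1):
        w = 1.0 - k / (smooth_radius + 1)  # linearly decaying blend weight
        for j in (index - k, index + k):
            if 0 <= j < len(traj):
                traj[j] = (1 - w) * trajectory[j] + w * human_waypoint
    return traj
```

Waypoints outside the smoothing radius, including the trajectory endpoints, are left unchanged.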
Regarding claim 7 Yang/Chao/Losey teaches The system for learning perceived preferences in HRIs of claim 1, wherein the generating the belief is based on a maximum a posteriori (MAP) estimation of the belief. (Losey page 6 paragraph 3 reads “First, we separate finding the optimal robot policy from estimating the human’s objective. Next, we simplify the observation model and use a maximum a posteriori (MAP) estimate of θ as opposed to the full belief over θ. Finally, when finding the optimal robot policy and estimating θ, we move from policies to trajectories. These approximations show how our solution is derived from the complete POMDP formalism outlined in the last section, but now enable the robot to learn and react in real-time with continuous state, action, and belief spaces.”);
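The MAP simplification quoted above, in which the full belief over θ is collapsed to a single point estimate, may be sketched hypothetically as follows; the discretized parameter set and all identifiers are stand-ins:

```python
import numpy as np

def map_estimate(thetas, prior, likelihoods):
    """Return the single theta maximizing prior * likelihood, rather than
    maintaining the full posterior belief over theta."""
    posterior = prior * likelihoods  # unnormalized posterior
    return thetas[int(np.argmax(posterior))]
```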
Regarding claim 10 Yang/Chao/Losey teaches The system for learning perceived preferences in HRIs of claim 1, wherein the generating the robot action is based on a hierarchical optimization of a first constraint of the one or more constraints and a second constraint of the one or more constraints. (Chao [0047] reads “The motion predictor 378 can consider any constraints on the motion, and can use a motion planner or optimizer application, algorithm, or neural network, for example, to predict an optimal path between a current position and orientation of the robot and a current position and orientation of the object, as well as the hand holding the object. This prediction can also take into account predicted behavior of the human hand, as well as the impact of that behavior on the object, such as by using at least a hand motion or behavior model generated using a process such as that described with respect to FIG. 3A. As discussed herein the optimization can take other factors or constraints into account as well, such as to prefer smooth or straight-line motion, or to avoid camera obstructions.”);
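A hierarchical treatment of constraints of the kind recited in claim 10 may be illustrated by the following hypothetical lexicographic sketch, where a hard first-priority constraint filters candidate robot actions before a soft second-priority cost ranks them; both constraint functions and all names are stand-ins, not a mapping of any cited reference:

```python
def hierarchical_select(candidates, safety_ok, smoothness_cost):
    """Lexicographic (hierarchical) optimization over two constraints."""
    # Priority 1: keep only actions satisfying the hard constraint.
    feasible = [a for a in candidates if safety_ok(a)]
    if not feasible:
        return None  # no action satisfies the first-priority constraint
    # Priority 2: among feasible actions, minimize the soft constraint.
    return min(feasible, key=smoothness_cost)
```

The higher-priority constraint is never traded off against the lower-priority one; the soft cost only breaks ties among actions that already satisfy the hard constraint.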
Regarding claim 11 Yang teaches A computer-implemented method for learning perceived preferences in human-robot interactions (HRI), comprising: sensing a noisy action from a human associated with a human-robot interaction (HRI) with a robot; (Yang abstract reads “A robotic control system directs a robot to take an object from a human grasp by obtaining an image of a human hand holding an object, estimating the pose of the human hand and the object, and determining a grasp pose for the robot that will not interfere with the human hand.” And [0213] reads “In at least one embodiment, video image compositor may include enhanced temporal noise reduction for both spatial and temporal noise reduction. For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weight of information provided by adjacent frames.”);
generating a feature associated with the human based on the noisy action (Yang [0082] reads “Instead of learning deep features with ConvNets on depth images, at least one embodiment adopts the PointNet++ on point clouds for human grasp classification. In at least one embodiment, the backbone network consists of four set-abstraction layers to learn point features and a three-layer perceptron with batch normalization, ReLu and Dropout for global feature learning and human grasp classification. Given a point cloud cropped around the hand, the network classifies it into one of the defined grasp categories, which may be used for further robot grasp planning.”);
Yang does not teach: an observation model; generating a belief based on the feature and a belief model; generating a robot action based on a reference trajectory, the belief, and one or more constraints; or implementing the robot action for the HRI via a robot appendage of the robot and an actuator.
Chao, in analogous art, teaches generating a belief based on the feature and a belief model; (Chao [0043] reads “Motion models for the hand and the object can be updated for individual motions determined for individual sequences to generate at least one motion model for a human hand, and may also generate at least one model for an object held in a human hand, to provide for predictions of realistic motion and behavior of a hand and object during an interaction, such as an object handover.”);
generating a robot action based on a reference trajectory, the belief, and one or more constraints; and implementing the robot action for the HRI via a robot appendage of the robot and an actuator. (Chao [0047] reads “The motion predictor 378 can consider any constraints on the motion, and can use a motion planner or optimizer application, algorithm, or neural network, for example, to predict an optimal path between a current position and orientation of the robot and a current position and orientation of the object, as well as the hand holding the object. … As discussed herein the optimization can take other factors or constraints into account as well, such as to prefer smooth or straight-line motion, or to avoid camera obstructions. … The motion predictor 378 can then determine a sequence of motions, or a path of motion to be performed over a number of future time steps. The motion predictor 378 can then provide information for this sequence or future motions, a first subset of these future motions, or only a first predicted motion to a robot controller 372 of the control system 370. … The robot controller 372 can then determine instructions to cause the robot to perform an action corresponding to that motion, if not already provided by the motion predictor 378, and can send those instructions to the robot 354, where a drive system 360 or other mechanical control of the robot can cause the robot 354 to perform that motion, such as to perform a determined motion for a current time step to bring an end effector 356 of the robot to a target position relative to the object.” And [0044] reads “Once these motion models have been generated, they can be provided to a simulator 330 or stored to a motion model repository 340 accessible to a simulator. … In at least some embodiments, a simulator 334 can select appropriate models (e.g., deformable meshes) for a hand, object, and robot to be simulated in a virtual environment. 
… The simulator can use the motion models for the hand and object, generated by the motion modeling system 320, to simulate motion or behavior of the hand and the object during a handover operation.”);
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang with those of Chao to include a method for actuating and controlling a real-world robot. This would allow the system to better interact with real-world people who would be working alongside it. (Chao [0002] reads “Robots and other automated devices are increasingly being used to assist in the performance of various tasks. At least some of these tasks involve interacting with a human or other entity that requires a substantial degree of dexterity, such as to perform a handover action where the robot is to grasp and take an object from the hand of a person. The ability to exchange objects with humans seamlessly and safely is crucial for human-robot interactions (HRI). Despite increasing efforts in recent years, current development for on human-robot object handovers still faces various challenges. For example, development and testing often requires a real human in the loop, which makes the process expensive, potentially hazardous, and harder to reproduce. Further, different approaches adopt different experimental settings with different types of objects used for handovers, and evaluate with different metrics, which makes cross-study comparison and resulting optimization difficult.”);
Yang/Chao does not teach an observation model.
Losey, in analogous art, teaches an observation model; (Losey page 6 spanning paragraph reads “Assuming that human interactions are meaningful, the robot should leverage the human’s actions u_h to update its belief over θ. In order to associate the human interactions u_h with the objective parameter θ, the robot uses an observation model: P(u_h | x, u_r; θ). If we were to treat the human’s actions as random disturbances, then we would select a uniform probability distribution for P(u_h | x, u_r; θ). By contrast, here we model the human as intentionally interacting to correct the robot’s behavior; more specifically, let us model the human as correcting the robot to approximately maximize their reward”);
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang/Chao with those of Losey to include a method for the system to better understand the user’s intentions for a given task. This would better allow the robotic system to work with humans. (Losey Introduction reads “Physical interaction is a natural means for collaboration and communication between humans and robots. From compliant designs to reliable prediction algorithms, recent advances in robotics have enabled humans and robots to work in close physical proximity. Despite this progress, seamless physical interaction, where robots are as responsive, intelligent, and fluid as their human counterparts, remains an open problem”);
Regarding claim 12 Yang/Chao/Losey teaches The computer-implemented method for learning perceived preferences in HRIs of claim 11, wherein the observation model is based on Boltzmann rationality and maximum entropy. (Losey page 6 spanning paragraph reads “If we were to treat the human’s actions as random disturbances, then we would select a uniform probability distribution for P(u_h | x, u_r; θ). By contrast, here we model the human as intentionally interacting to correct the robot’s behavior; more specifically, let us model the human as correcting the robot to approximately maximize their reward. We assume the human selects an action u_h that, when combined with the robot’s action u_r, leads to a high Q value (state–action value) assuming the robot will behave optimally after the current timestep, i.e., assuming that the robot learns the true θ: P(u_h^t | x^t, u_r^t; θ) = exp(Q(x^t, u_r^t + u_h^t; θ)) / ∫ exp(Q(x^t, u_r^t + ũ_h; θ)) dũ_h (2). Our choice of Equation (2) stems from maximum entropy assumptions (Ziebart et al., 2008), as well as the Boltzmann distributions used in cognitive science models of human behavior (Baker et al., 2007).”);
Regarding claim 13 Yang/Chao/Losey teaches The computer-implemented method for learning perceived preferences in HRIs of claim 11, wherein the HRI is modeled as a Constrained Partially Observable Markov Decision Process (CPOMDP). (Losey page 2 spanning columns reads “We formalize reacting to pHRI as a dynamical system, where the robot optimizes an objective function with an unknown parameter θ, and human interventions serve as observations about the true value of θ. As posed, this problem is an instance of a partially observable Markov decision process (POMDP).”);
Regarding claim 14 Yang/Chao/Losey teaches The computer-implemented method for learning perceived preferences in HRIs of claim 11, wherein the HRI is a bi-lateral interaction including a second human associated with a second HRI with a second robot. (Yang abstract reads “A robotic control system directs a robot to take an object from a human grasp by obtaining an image of a human hand holding an object, estimating the pose of the human hand and the object, and determining a grasp pose for the robot that will not interfere with the human hand.” And [0213] reads “In at least one embodiment, video image compositor may include enhanced temporal noise reduction for both spatial and temporal noise reduction. For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weight of information provided by adjacent frames.” As discussed in MPEP § 2144.04(VI)(B), the mere duplication of parts has been held to be an obvious modification absent a new or unexpected result.);
Regarding claim 16 Yang teaches A robot for learning perceived preferences in human-robot interactions (HRI), comprising: a sensor sensing a noisy action from a human associated with a human-robot interaction (HRI) with the robot; a memory storing one or more instructions; (Yang abstract reads “A robotic control system directs a robot to take an object from a human grasp by obtaining an image of a human hand holding an object, estimating the pose of the human hand and the object, and determining a grasp pose for the robot that will not interfere with the human hand.” And [0213] reads “In at least one embodiment, video image compositor may include enhanced temporal noise reduction for both spatial and temporal noise reduction. For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weight of information provided by adjacent frames.”);
a processor executing one or more of the instructions stored on the memory to perform: generating a feature associated with the human based on the noisy action (Yang [0082] reads “Instead of learning deep features with ConvNets on depth images, at least one embodiment adopts the PointNet++ on point clouds for human grasp classification. In at least one embodiment, the backbone network consists of four set-abstraction layers to learn point features and a three-layer perceptron with batch normalization, ReLu and Dropout for global feature learning and human grasp classification. Given a point cloud cropped around the hand, the network classifies it into one of the defined grasp categories, which may be used for further robot grasp planning.”);
Yang does not teach: an observation model; generating a belief based on the feature and a belief model; generating a robot action based on a reference trajectory, the belief, and one or more constraints; or a controller implementing the robot action for the HRI via a robot appendage of the robot and an actuator.
Chao in analogous art, teaches generating a belief based on the feature and a belief model; (Chao [0043] reads “Motion models for the hand and the object can be updated for individual motions determined for individual sequences to generate at least one motion model for a human hand, and may also generate at least one model for an object held in a human hand, to provide for predictions of realistic motion and behavior of a hand and object during an interaction, such as an object handover.”);
and generating a robot action based on a reference trajectory, the belief, and one or more constraints; and a controller implementing the robot action for the HRI via a robot appendage of the robot and an actuator. (Chao [0047] reads “The motion predictor 378 can consider any constraints on the motion, and can use a motion planner or optimizer application, algorithm, or neural network, for example, to predict an optimal path between a current position and orientation of the robot and a current position and orientation of the object, as well as the hand holding the object. … As discussed herein the optimization can take other factors or constraints into account as well, such as to prefer smooth or straight-line motion, or to avoid camera obstructions. … The motion predictor 378 can then determine a sequence of motions, or a path of motion to be performed over a number of future time steps. The motion predictor 378 can then provide information for this sequence or future motions, a first subset of these future motions, or only a first predicted motion to a robot controller 372 of the control system 370. … The robot controller 372 can then determine instructions to cause the robot to perform an action corresponding to that motion, if not already provided by the motion predictor 378, and can send those instructions to the robot 354, where a drive system 360 or other mechanical control of the robot can cause the robot 354 to perform that motion, such as to perform a determined motion for a current time step to bring an end effector 356 of the robot to a target position relative to the object.” And [0044] reads “Once these motion models have been generated, they can be provided to a simulator 330 or stored to a motion model repository 340 accessible to a simulator. … In at least some embodiments, a simulator 334 can select appropriate models (e.g., deformable meshes) for a hand, object, and robot to be simulated in a virtual environment. 
… The simulator can use the motion models for the hand and object, generated by the motion modeling system 320, to simulate motion or behavior of the hand and the object during a handover operation.”);
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang with those of Chao to include a method for actuating and controlling a real-world robot. This would allow the system to better interact with real-world people who would be working alongside it. (Chao [0002] reads “Robots and other automated devices are increasingly being used to assist in the performance of various tasks. At least some of these tasks involve interacting with a human or other entity that requires a substantial degree of dexterity, such as to perform a handover action where the robot is to grasp and take an object from the hand of a person. The ability to exchange objects with humans seamlessly and safely is crucial for human-robot interactions (HRI). Despite increasing efforts in recent years, current development for on human-robot object handovers still faces various challenges. For example, development and testing often requires a real human in the loop, which makes the process expensive, potentially hazardous, and harder to reproduce. Further, different approaches adopt different experimental settings with different types of objects used for handovers, and evaluate with different metrics, which makes cross-study comparison and resulting optimization difficult.”);
Yang/Chao does not teach and an observation model;
Losey, in analogous art, teaches and an observation model; (Losey page 6 spanning paragraph reads “Assuming that human interactions are meaningful, the robot should leverage the human’s actions u_h to update its belief over θ. In order to associate the human interactions u_h with the objective parameter θ, the robot uses an observation model: P(u_h | x, u_r; θ). If we were to treat the human’s actions as random disturbances, then we would select a uniform probability distribution for P(u_h | x, u_r; θ). By contrast, here we model the human as intentionally interacting to correct the robot’s behavior; more specifically, let us model the human as correcting the robot to approximately maximize their reward”);
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang/Chao with those of Losey to include a method for the system to better understand the user’s intentions for a given task. This would better allow the robotic system to work with humans. (Losey Introduction reads “Physical interaction is a natural means for collaboration and communication between humans and robots. From compliant designs to reliable prediction algorithms, recent advances in robotics have enabled humans and robots to work in close physical proximity. Despite this progress, seamless physical interaction, where robots are as responsive, intelligent, and fluid as their human counterparts, remains an open problem”);
Regarding claim 17, Yang/Chao/Losey teaches The robot for learning perceived preferences in HRIs of claim 16, wherein the observation model is based on Boltzmann rationality and maximum entropy. (Losey page 6 spanning paragraph reads “If we were to treat the human’s actions as random disturbances, then we would select a uniform probability distribution for P(u_h | x, u_r; θ). By contrast, here we model the human as intentionally interacting to correct the robot’s behavior; more specifically, let us model the human as correcting the robot to approximately maximize their reward. We assume the human selects an action u_h that, when combined with the robot’s action u_r, leads to a high Q value (state–action value) assuming the robot will behave optimally after the current timestep, i.e., assuming that the robot learns the true θ: P(u_h^t | x^t, u_r^t; θ) = e^{Q(x^t, u_r^t + u_h^t; θ)} / ∫ e^{Q(x^t, u_r^t + ũ_h; θ)} dũ_h (2). Our choice of Equation (2) stems from maximum entropy assumptions (Ziebart et al., 2008), as well as the Boltzmann distributions used in cognitive science models of human behavior (Baker et al., 2007).”);
Regarding claim 18, Yang/Chao/Losey teaches The robot for learning perceived preferences in HRIs of claim 16, wherein the HRI is modeled as a Constrained Partially Observable Markov Decision Process (CPOMDP). (Losey page 2 spanning columns reads “We formalize reacting to pHRI as a dynamical system, where the robot optimizes an objective function with an unknown parameter θ, and human interventions serve as observations about the true value of θ. As posed, this problem is an instance of a partially observable Markov decision process (POMDP).”);
Regarding claim 19, Yang/Chao/Losey teaches The robot for learning perceived preferences in HRIs of claim 18, wherein the HRI is a bi-lateral interaction including a second human associated with a second HRI with a second robot. (Yang abstract reads “A robotic control system directs a robot to take an object from a human grasp by obtaining an image of a human hand holding an object, estimating the pose of the human hand and the object, and determining a grasp pose for the robot that will not interfere with the human hand.” And [0213] reads “In at least one embodiment, video image compositor may include enhanced temporal noise reduction for both spatial and temporal noise reduction. For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weight of information provided by adjacent frames.” As discussed in MPEP § 2144.04, Section V(B), the mere duplication of parts is not a patentable distinction.);
Claim(s) 5, 9, 15, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yang/Chao/Losey as applied above, and further in view of Linder (US 20130345863 A1).
Regarding claim 5 Yang/Chao/Losey teaches The system for learning perceived preferences in HRIs of claim 4.
Yang/Chao/Losey does not teach wherein the second robot provides haptic feedback to the second human based on a human response to the robot action.
Linder, in analogous art, teaches wherein the second robot provides haptic feedback to the second human based on a human response to the robot action. (Linder [0011] reads “The robot may be programmed and configured such that, once the user has brought the arm into a desired position, she can direct the robot to perform a particular action (e.g., close the gripper around an object) with the simple push of a button. In addition to serving as an input device during training, the robot arm may, when guided by the user, also provide haptic feedback to the user. For example, to avoid self-collision, the robot may exert increasing resistive forces as the user pushes the arm in a direction that would result in potentially harmful contact with another robot part. The arm, when held by the user, may also exert time-variable force patterns (or “haptic signatures”) in response to certain conditions (such as, e.g., the proximity to a particular type of object) to thereby provide intuitive information to the user.”);
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang/Chao/Losey with those of Linder to include a method that provides the user with haptic feedback about the teachings they are trying to impart to the robotic system. This would allow the user to give the robot higher-quality training. (Linder [0005] reads “Programming conventional industrial robots generally demands a high level of technical expertise, and requires the user to think in terms understandable by the robot. For example, the programmer may use a six-dimensional vector to specify a reference point in three-dimensional space along with the orientation of the most distal link of the robot's arm. For a robot arm that has six or fewer degrees of freedom, that vector uniquely determines the settings for all the joints of the robot. If the robot arm has more than six degrees of freedom, further specification of the desired pose of the arm is required to remove any ambiguity.”);
Regarding claim 9 Yang/Chao/Losey teaches The system for learning perceived preferences in HRIs of claim 1.
Yang/Chao/Losey does not teach wherein one or more of the constraints includes a joint limit constraint, a force constraint, a velocity constraint, an acceleration constraint, a task space constraint, or a deviation constraint.
Linder, in analogous art, teaches wherein one or more of the constraints includes a joint limit constraint, a force constraint, a velocity constraint, an acceleration constraint, a task space constraint, or a deviation constraint. (Linder [0020]–[0021] reads “Another aspect relates to a robot with one or more user-guidable robot appendages for manipulating objects (each of the appendages including one or more movable joints) and a haptics module for generating forces at the joint(s). The haptics module, which may be implemented in hardware (including the hardware ordinarily used to apply forces to the joints) and/or software, is configured to at least partially resist user-guiding of the at least one appendage within a specified spatial zone around other parts of the robot so as to prevent collisions between the appendage and the other parts of the robot. The forces generated by the haptics module may depend (linearly or non-linearly) on the distance between the appendage and the other parts of the robot, and/or on the direction or speed of motion of the appendage. In certain embodiments, the forces increase as the appendage moves closer to the other parts of the robot. … The method involves, upon entry of the appendage into a specified spatial zone around other parts of the robot, at least partially resisting user-guiding of the appendage by generating a resistive force thereat so as to prevent collisions between the appendage and the other parts of the robot. The magnitude of the resistive force may depend on the distance of the appendage from the other parts of the robot and/or the direction or speed of motion of the appendage.”);
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang/Chao/Losey with those of Linder to include a method that provides the user with haptic feedback about the teachings they are trying to impart to the robotic system. This would allow the user to give the robot higher-quality training. (Linder [0005] reads “Programming conventional industrial robots generally demands a high level of technical expertise, and requires the user to think in terms understandable by the robot. For example, the programmer may use a six-dimensional vector to specify a reference point in three-dimensional space along with the orientation of the most distal link of the robot's arm. For a robot arm that has six or fewer degrees of freedom, that vector uniquely determines the settings for all the joints of the robot. If the robot arm has more than six degrees of freedom, further specification of the desired pose of the arm is required to remove any ambiguity.”);
Regarding claim 15 Yang/Chao/Losey teaches The computer-implemented method for learning perceived preferences in HRIs of claim 14.
Yang/Chao/Losey does not teach wherein the second robot provides haptic feedback to the second human based on a human response to the robot action.
Linder, in analogous art, teaches wherein the second robot provides haptic feedback to the second human based on a human response to the robot action. (Linder [0011] reads “The robot may be programmed and configured such that, once the user has brought the arm into a desired position, she can direct the robot to perform a particular action (e.g., close the gripper around an object) with the simple push of a button. In addition to serving as an input device during training, the robot arm may, when guided by the user, also provide haptic feedback to the user. For example, to avoid self-collision, the robot may exert increasing resistive forces as the user pushes the arm in a direction that would result in potentially harmful contact with another robot part. The arm, when held by the user, may also exert time-variable force patterns (or “haptic signatures”) in response to certain conditions (such as, e.g., the proximity to a particular type of object) to thereby provide intuitive information to the user.”);
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang/Chao/Losey with those of Linder to include a method that provides the user with haptic feedback about the teachings they are trying to impart to the robotic system. This would allow the user to give the robot higher-quality training. (Linder [0005] reads “Programming conventional industrial robots generally demands a high level of technical expertise, and requires the user to think in terms understandable by the robot. For example, the programmer may use a six-dimensional vector to specify a reference point in three-dimensional space along with the orientation of the most distal link of the robot's arm. For a robot arm that has six or fewer degrees of freedom, that vector uniquely determines the settings for all the joints of the robot. If the robot arm has more than six degrees of freedom, further specification of the desired pose of the arm is required to remove any ambiguity.”);
Regarding claim 20 Yang/Chao/Losey teaches The robot for learning perceived preferences in HRIs of claim 19.
Yang/Chao/Losey does not teach wherein the second robot provides haptic feedback to the second human based on a human response to the robot action.
Linder, in analogous art, teaches wherein the second robot provides haptic feedback to the second human based on a human response to the robot action. (Linder [0011] reads “The robot may be programmed and configured such that, once the user has brought the arm into a desired position, she can direct the robot to perform a particular action (e.g., close the gripper around an object) with the simple push of a button. In addition to serving as an input device during training, the robot arm may, when guided by the user, also provide haptic feedback to the user. For example, to avoid self-collision, the robot may exert increasing resistive forces as the user pushes the arm in a direction that would result in potentially harmful contact with another robot part. The arm, when held by the user, may also exert time-variable force patterns (or “haptic signatures”) in response to certain conditions (such as, e.g., the proximity to a particular type of object) to thereby provide intuitive information to the user.”);
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang/Chao/Losey with those of Linder to include a method that provides the user with haptic feedback about the teachings they are trying to impart to the robotic system. This would allow the user to give the robot higher-quality training. (Linder [0005] reads “Programming conventional industrial robots generally demands a high level of technical expertise, and requires the user to think in terms understandable by the robot. For example, the programmer may use a six-dimensional vector to specify a reference point in three-dimensional space along with the orientation of the most distal link of the robot's arm. For a robot arm that has six or fewer degrees of freedom, that vector uniquely determines the settings for all the joints of the robot. If the robot arm has more than six degrees of freedom, further specification of the desired pose of the arm is required to remove any ambiguity.”);
Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yang/Chao/Losey as applied above, and further in view of Laftchiev (US 20210173377 A1).
Regarding claim 8 Yang/Chao/Losey teaches The system for learning perceived preferences in HRIs of claim 1.
Yang/Chao/Losey does not teach wherein the generating the feature is based on a radial basis function (RBF).
Laftchiev, in analogous art, teaches wherein the generating the feature is based on a radial basis function (RBF). (Laftchiev [0124] reads “Nonparametric kernel: When there is no known structure of the process to be modeled, the kernel has to be chosen by the user accordingly to their understanding of the process to be modeled. A common option is the Radial Basis Function kernel (RBF): … where λ is a positive constant called the scaling factor, and Σ_RBF is a positive definite matrix that defines the norm over which the distance between x_k and x_j is computed. The scaling factor and the elements of Σ_RBF are unknown parameters called hyperparameters; for this reason, it is called a nonparametric (NP) kernel. Several options to parametrize Σ_RBF have been proposed e.g., a diagonal matrix or a full matrix defined by the Cholesky decomposition, namely Σ_RBF = LL^T. In this case, the hyperparameters of Σ_RBF are the elements of the lower triangular matrix L, where the elements along the diagonal are constrained to be positive. Notice that with this choice, all the positive definite matrices are parameterized.”);
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Yang/Chao/Losey with those of Laftchiev to include the use of a radial basis function. This would give the robotic system improved accuracy when manipulating objects. (Laftchiev [0005] reads “Accordingly, there is a need to develop advanced technologies for learning systems that learn to characterize a discrete manufacturing process. In particular, there is a need to develop learning systems geared to discrete manufacturing processes whose sub-steps may be executed by joint human-robot teams. Here the learning system can operate at two levels, the first level is learning a method to optimize the process at a human-robot collaboration level that adjusts help that a robot can provide to a human worker, subject to a condition of the human worker. The second level, is at the system level that learns to detect anomalies in the total discrete manufacturing process given that some steps are executed by a robots and humans.”);
Other References Not Cited
Throughout examination, other pertinent prior art references were found. Although these references were not relied upon in this examination, they could be applied in future examination and could read on the contents of the current disclosure. These references are: Hashimoto (US 20210003993 A1); Eppner (US 20210138655 A1); and Onuma (US 20190351547 A1).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN MARTIN O'MALLEY whose telephone number is (571)272-6228. The examiner can normally be reached Mon - Fri 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ramon Mercado, can be reached at (571) 270-5744. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JOHN MARTIN O'MALLEY/Examiner, Art Unit 3658
/Ramon A. Mercado/Supervisory Patent Examiner, Art Unit 3658