Prosecution Insights
Last updated: April 19, 2026
Application No. 18/209,045

REMOTE CONTROL SYSTEM, CONTROL METHOD FOR REMOTE CONTROL SYSTEM, AND REMOTE CONTROL PROGRAM

Status: Non-Final Office Action (§103)
Filed: Jun 13, 2023
Examiner: KASPER, BYRON XAVIER
Art Unit: 3657
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Kawasaki Jukogyo Kabushiki Kaisha
OA Round: 3 (Non-Final)

Grant Probability: 70% (Favorable)
Expected OA Rounds: 3-4
Expected Time to Grant: 3y 0m
Grant Probability with Interview: 88%

Examiner Intelligence

Career Allow Rate: 70% (72 granted / 103 resolved), above average and +17.9% vs the Tech Center average
Interview Lift: +18.4% higher allowance rate in resolved cases with an examiner interview (a strong lift)
Typical Timeline: 3y 0m average prosecution; 36 applications currently pending
Career History: 139 total applications across all art units

Statute-Specific Performance

Statute   Rate     vs TC avg
§101      10.9%    -29.1%
§103      56.3%    +16.3%
§102      11.9%    -28.1%
§112      16.4%    -23.6%

Tech Center average figures are estimates • Based on career data from 103 resolved cases
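
The headline figures above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the reported counts (72 granted of 103 resolved) and the stated +17.9-point delta; the implied Tech Center baseline is back-solved from that delta and is an estimate, not a reported value.

    # Illustrative check of the examiner metrics shown above (Python).
    # Only the 72 granted / 103 resolved counts are reported figures; the
    # Tech Center baseline is inferred from the "+17.9% vs TC avg" delta.
    granted, resolved = 72, 103

    career_allow_rate = granted / resolved        # ~0.699, reported as 70%
    tc_avg_estimate = career_allow_rate - 0.179   # ~0.52, implied TC average

    print(f"Career allow rate: {career_allow_rate:.1%}")   # 69.9%
    print(f"Implied TC average: {tc_avg_estimate:.1%}")    # 52.0%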

Office Action

§103
Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2. This communication is responsive to Application No. 18/209,045 and the amendments filed on 11/26/2025.

3. Claims 1, 3-5, and 7-15 are presented for examination.

Information Disclosure Statement

4. The information disclosure statement (IDS) submitted on 6/13/2023 has been fully considered by the Examiner.

Response to Arguments

5. Applicant’s arguments with respect to the rejection of claims 1 and 3-15 under 35 U.S.C. 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Regarding independent claim 1, the Examiner agrees that the combination of US 20180154518 A1 to Rossano and US 20130300846 A1 to Miller fails to teach all of the amended limitations of the claim. However, in light of the amendments and the Applicant’s remarks, an updated search was conducted, and a new ground of rejection has been applied to claim 1, as described below. Regarding independent claims 9, 10, and 11, because these claims contain limitations similar to those of claim 1, they remain rejected for reasons similar to those given for claim 1, as described below. Regarding dependent claims 3-5, 7-8, and 12-15, because these claims depend from claim 1, 9, 10, or 11, they remain rejected, as described below. Regarding dependent claim 6, this claim has been cancelled and is therefore withdrawn from further consideration.

Claim Rejections - 35 USC § 103

6. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

7. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

8. Claims 1 and 7-11 are rejected under 35 U.S.C. 103 as being unpatentable over Rossano et al. (US 20180154518 A1, hereinafter Rossano) in view of Miller et al. (US 20130300846 A1, hereinafter Miller) and Nishigori et al. (US 20210223867 A1, hereinafter Nishigori).

Regarding Claim 1, Rossano teaches a remote control system comprising: a master that is operated by a user ([0025] via “The operator station 104 is configured to, in connection with communications to/with the robot station 102, provide an operator of the operator station 104 direct remote control of the robot 106 and attached processes from the operator station 104.
… The at least one teleoperation member 122 can be structured to receive commands or other input information from the operator of the teleoperation member 122, ….”), (Note: The Examiner interprets the teleoperation member 122 as the master.); a slave that applies a treatment to an object in accordance with an action of the master ([0020] via “According to certain embodiments, the robot 106 can be operative to position and/or orient the end effector 108 at locations within reach of a work envelope or workspace of the robot 106 that can accommodate the robot 106 utilizing the end effector 108 to perform work on a workpiece.”), ([0025] via “The operator station 104 is configured to, in connection with communications to/with the robot station 102, provide an operator of the operator station 104 direct remote control of the robot 106 and attached processes from the operator station 104. According to the exemplary embodiment depicted in FIG. 1, the operator station 104 can include at least one teleoperation member 122, one or more displays 124, and the management system 122.”), (Note: The Examiner interprets the robot 106 as the slave.); a sensor that is disposed in the slave and detects an operating state of the slave ([0034] via “FIG. 2 illustrates a portion of a robot of a robot station performing exemplary work on a workpiece. In the depicted embodiment, a sensor 114 in the form of a force sensor 130 is positioned between the end effector 108 and a subpart 132 that the robot 106 is assembling to a main part 134. … Moreover, the force sensor 130 can be used to detect and/or determine forces being experienced by, and/or imparted onto, the end effector 108 and/or robot 106 and can directly or indirectly provide corresponding information or data to the controller 112, the computation member 116 of the robot system 102, and/or the management system 122 of the operator station 104.”); an imager that captures images of at least one of the slave and the object ([0048] via “Additionally, the computation member 116 and/or management system 120 may receive positional and/or orientation information for the robot 106 and/or the TCP 162 of the robot 106 from one or more of the sensors 114, such as, … and/or a captured image(s) through the use of a vision system, such, as, for example, a vision system having one or more cameras or imaging capturing devices, among other types of sensors. Such information may provide, or be used by the computation member 116 and/or management system 120 to determine, the position and/or orientation of the robot 106 and/or the TCP 162 of the robot 106.”); a display that displays the captured images captured by the imager and presents the captured images to the user operating the master ([0036] via “According to certain embodiments, the management system 122 can be configured to display on the display 124 a graphical user interface 136 (GUI) that can correspond to an orientation or position of one or more of the robot 106, end effector 108, and/or workpiece, such as the subpart 132 and the main part 134. 
… For example, according to certain embodiments, the GUI 136 may provide a digital representation or image of one or more of the robot 106, end effector 108, and/or workpiece, but not of the teleoperation member 122.”), ([0053] via “Additionally, according to certain embodiments, digital representations provided by the GUI 136, such as, but not limited to, the TCP 162 of the robot 106, … can overlay video images displayed on the display 124, such as, for example, video images captured by an image capturing device, such as a camera, in the robot station 102.”); a controller that performs action control of at least one of the master and the slave based on detection results of the sensor ([0034] via “According to the illustrated embodiment, the force sensor 130 can be integrated with haptic feedback capabilities of the teleoperation member 122. Moreover, the force sensor 130 can be used to detect and/or determine forces being experienced by, and/or imparted onto, the end effector 108 and/or robot 106 and can directly or indirectly provide corresponding information or data to the controller 112, the computation member 116 of the robot system 102, and/or the management system 122 of the operator station 104. The controller 112, computation member 116, and/or the management system 122 can utilize such data or related information to provide a signal to the teleoperation member 122 that is used to provide a visual, auditory, or physical signal to the operator of the teleoperation member 122 that may provide an indication of the relative degree of the force sensed by the force sensor 130. Moreover, the degree of haptic feedback provided by the teleoperation member 122, such as whether the force feedback is relatively large or small, may provide an indication of the extent of force sensed by the force sensor 130.”). Rossano is silent on wherein the controller delays the action control to reduce a lag between the action control and display timings of the captured images by the display, and the controller receives each detection result from the slave, receives each captured image from the display, and obtains a delay amount in delaying the action control based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result; and an image processor that receives the captured images from the imager and combines each captured image with each respective detection result to generate a combined image signal, wherein the image processor encodes the combined image signal as an encoded image signal and transmits the encoded image signal to the controller, and the controller decodes the encoded image signal and outputs the decoded image signal to the display such that the display displays the captured images combined with the respective detection results. However, Miller teaches wherein the controller delays the action control to reduce a lag between the action control and display timings of the captured images by the display ([0062] via “Abscissa F illustrates a timing at which image frames f0-f8 may be provided by video processor 325 to a video display such as, for example, video display 335 of FIG. 3, …. In particular, video processor 325 provides image frames for display to video display 335 based on the number of image frames accumulated at video processor 325, or stored at a video data buffer (not shown) accessible to video processor 325. 
Abscissa B illustrates a real-time latency of four image frames. In accordance with an exemplary embodiment of the present disclosure, video processor 325 provides image frames to video display 335 depending on how many image frames have been accumulated at video processor 325 and how that amount compares to a desired image-frame latency, which may be set, for example, to be around the real-time latency illustrated in Abscissa B.”), ([0064] via “At time unit t.sub.1 in FIG. 5, video processor 325 receives image frame f0. At time unit t.sub.3, with image frame f0 accumulated at video processor 325, video processor 325 receives image frame f1. Thus, in view of the accumulation of two image frames, video processor 325 provides image frame f0 to video display 335 at time unit t.sub.3.”), (Note: See Figure 5 of Miller as well.), and the controller receives each detection result from the slave, receives each captured image from the display, and obtains a delay amount in delaying the action control based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result ([0062] via “Abscissa F illustrates a timing at which image frames f0-f8 may be provided by video processor 325 to a video display such as, for example, video display 335 of FIG. 3, …. In particular, video processor 325 provides image frames for display to video display 335 based on the number of image frames accumulated at video processor 325, or stored at a video data buffer (not shown) accessible to video processor 325. Abscissa B illustrates a real-time latency of four image frames. In accordance with an exemplary embodiment of the present disclosure, video processor 325 provides image frames to video display 335 depending on how many image frames have been accumulated at video processor 325 and how that amount compares to a desired image-frame latency, which may be set, for example, to be around the real-time latency illustrated in Abscissa B.”), ([0064] via “At time unit t.sub.1 in FIG. 5, video processor 325 receives image frame f0. At time unit t.sub.3, with image frame f0 accumulated at video processor 325, video processor 325 receives image frame f1. Thus, in view of the accumulation of two image frames, video processor 325 provides image frame f0 to video display 335 at time unit t.sub.3.”), ([0066] via “At time unit t.sub.7, which would correspond to the display of the next image frame based on the current display rate, video processor 325 has yet to receive image frame f3. This may be due to, for example, bandwidth or other network issues. According to the exemplary embodiment, and because video processor 325 has one image frame, f2, accumulated, video processor 335 reduces its video display rate. For example, in the illustration of FIG. 5, video processor 335 reduces the video display rate from one image frame every two time units to one image frame every two and one half time units, and provides image frame f2 for display at the mid-point between time units t.sub.7 and t.sub.8.”), (Note: See Figure 5 of Miller as well.). 
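
Neither the application as characterized here nor the cited portions of Miller spell out how the claimed waveform matching would be implemented, but the limitation is easier to follow with a concrete picture. The following is a minimal illustrative sketch, assuming the delay amount is estimated by cross-correlating a per-frame image feature (for example, mean frame brightness) with the sensor trace; the function and signal names are hypothetical and are not drawn from the record.

    import numpy as np

    def estimate_delay_seconds(sensor_signal, image_signal, sample_period_s):
        """Estimate the lag between a sensor waveform and a per-frame image
        feature waveform by cross-correlation. Both inputs are 1-D arrays
        resampled to the same rate; a positive result means the image
        feature trails the sensor signal."""
        s = sensor_signal - np.mean(sensor_signal)
        v = image_signal - np.mean(image_signal)
        corr = np.correlate(v, s, mode="full")        # all possible alignments
        lag_samples = np.argmax(corr) - (len(s) - 1)  # offset of best match
        return lag_samples * sample_period_s

    # Hypothetical usage: force readings vs. mean brightness per frame,
    # both resampled to a common 10 ms grid before matching.
    # delay_s = estimate_delay_seconds(force_trace, brightness_trace, 0.010)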
Further, Nishigori teaches an image processor that receives the captured images from the imager and combines each captured image with each respective detection result to generate a combined image signal ([0093] via “The image signal processing unit 12 performs various image signal processing on the captured image signal based on the digital signal obtained by the imaging unit 11 and outputs the image signal to the coding unit 17.”), ([0094] via “A sound collection signal by the microphone 13 is converted into a digital signal by the A/D converter 15 via the amplifier 14, then subjected to predetermined audio signal processing by the audio signal processing unit 16 and input to the coding unit 17.”), ([0096] via “The coding unit 17 includes a DSP (Digital Signal Processor), and performs coding according to a predetermined data format on a captured image signal input via the image signal processing unit 12 and an audio signal input via the audio signal processing unit 16.”), (Note: See Figure 3 of Nishigori as well.), wherein the image processor encodes the combined image signal as an encoded image signal ([0096] via “The coding unit 17 includes a DSP (Digital Signal Processor), and performs coding according to a predetermined data format on a captured image signal input via the image signal processing unit 12 and an audio signal input via the audio signal processing unit 16.”) and transmits the encoded image signal to the controller ([0098] via “Although the details will be described later, the coding unit 17 of this example generates stream data including the captured image signal, the audio signal, and the tactile signal in the same stream.”), ([0099] via “The coding unit 17, the control unit 18, the storage unit 20, the display control unit 22, the media drive 23, and the communication unit 24 are communicably connected to each other via the bus 25.”), ([0110] via “The communication unit 24 performs data communication and network communication with an external device by wire or wirelessly. The above stream data can be transmitted to an external device via the communication unit 24.”), and the controller decodes the encoded image signal and outputs the decoded image signal to the display such that the display displays the captured images combined with the respective detection results ([0098] via “Although the details will be described later, the coding unit 17 of this example generates stream data including the captured image signal, the audio signal, and the tactile signal in the same stream.”), ([0134] via “The decoding unit 39 inputs the stream data read from the recording medium mounted on the media drive 31 or the stream data received from the imaging device 1 via the communication unit 32 via the bus 38, and decodes (reproduces) the captured image signal, audio signal, and tactile signal included in the stream data. Note that, the decoding unit 39 decodes the stream data in response to an instruction given by the control unit 33 on the basis of, for example, an operation input via the operation unit 34.”), ([0135] via “The decoded captured image signal and audio signal are output to the display device 5 in a predetermined transmission data format via the image/audio I/F 40.”), ([0137] via “Further, the decoded tactile signals are input to the signal processing unit 41 of the corresponding channels, respectively. 
Each signal processing unit 41 performs signal processing such as calibration of the tactile presentation device 3 or the tactile presentation device 4 and predetermined filter processing on the tactile signals of the corresponding channels as necessary.”), (Note: See Figure 4 of Nishigori as well.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Miller wherein the controller delays the action control to reduce a lag between the action control and display timings of the captured images by the display, and the controller receives each detection result from the slave, receives each captured image from the display, and obtains a delay amount in delaying the action control based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result. Doing so provides a smooth display to the user during real-time image rendering of the environment, as stated by Miller ([0074] via “As illustrated in FIG. 5, exemplary embodiments of the present disclosure control and vary the output of video frames from a buffer to the display in a manner that balances providing a smooth display in a real-time image rendering environment. In particular, under the exemplary timing for receiving image frames from a video capture device illustrated in Abscissa C of FIGS. 4 and 5, exemplary embodiments provide a relatively smooth and accurate display of the captured video via smaller freeze and/or jitter periods than those of conventional configurations, for example, as schematically illustrated in Abscissa D.”). In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Nishigori wherein the remote control system comprises: an image processor that receives the captured images from the imager and combines each captured image with each respective detection result to generate a combined image signal, wherein the image processor encodes the combined image signal as an encoded image signal and transmits the encoded image signal to the controller, and the controller decodes the encoded image signal and outputs the decoded image signal to the display such that the display displays the captured images combined with the respective detection results. Doing so enhances the realistic feeling the user experiences by reducing a time lag between the displayed captured images and the outputted respective detection results, as stated by Nishigori ([0253] via “By generating the tactile signal synchronized with the moving image data as described above, it is possible to prevent the occurrence of a time lag between the visual information and the tactile information, and it is possible to enhance the realistic feeling and sense of reality.”). Regarding Claim 7, modified reference Rossano teaches the remote control system according to claim 1, wherein as the action control, the controller controls an action of the slave and controls an action of the master such that a reaction force exerted on the slave is presented to the user, in accordance with operation of the master by the user and the detection results ([0034] via “According to the illustrated embodiment, the force sensor 130 can be integrated with haptic feedback capabilities of the teleoperation member 122. 
Moreover, the force sensor 130 can be used to detect and/or determine forces being experienced by, and/or imparted onto, the end effector 108 and/or robot 106 and can directly or indirectly provide corresponding information or data to the controller 112, the computation member 116 of the robot system 102, and/or the management system 122 of the operator station 104.”), ([0035] via “As the robot 106 is maneuvered by operation of the teleoperation member 122, and the subpart 132 is moved into contact with the main part 134, at least a portion of the forces generated by such contact between the subpart 132 and main part 134 are sensed by the force sensor 130. Force feedback information or data relating to, or indicative of, the forces sensed by the force sensor 130 can be communicated to the operator via the haptic capabilities of the teleoperation member 122, as previously discussed. … Such haptic feedback can provide the operator of the operator station 104 a sense or indication of what the robot 106 is experiencing during the operation of assembling the subpart 132 with the main part 134.”). Regarding Claim 8, modified reference Rossano teaches the remote control system according to claim 7, wherein the sensor is a force sensor ([0034] via “FIG. 2 illustrates a portion of a robot of a robot station performing exemplary work on a workpiece. In the depicted embodiment, a sensor 114 in the form of a force sensor 130 is positioned between the end effector 108 and a subpart 132 that the robot 106 is assembling to a main part 134. According to the illustrated embodiment, the force sensor 130 can be integrated with haptic feedback capabilities of the teleoperation member 122.”). Regarding Claim 9, Rossano teaches a remote control system comprising: a master that is operated by a user in order to move a slave that applies a treatment to an object ([0020] via “According to certain embodiments, the robot 106 can be operative to position and/or orient the end effector 108 at locations within reach of a work envelope or workspace of the robot 106 that can accommodate the robot 106 utilizing the end effector 108 to perform work on a workpiece.”), ([0025] via ““The operator station 104 is configured to, in connection with communications to/with the robot station 102, provide an operator of the operator station 104 direct remote control of the robot 106 and attached processes from the operator station 104. According to the exemplary embodiment depicted in FIG. 1, the operator station 104 can include at least one teleoperation member 122, one or more displays 124, and the management system 122. 
The at least one teleoperation member 122 can be structured to receive commands or other input information from the operator of the teleoperation member 122, including, but not limited to, commands provided through the physical engagement and/or movement of at least a portion of the teleoperation member 122 and/or operator.”), (Note: The Examiner interprets the teleoperation member 122 as the master and the robot 106 as the slave.); a display that displays captured images captured by an imager that captures images of at least one of the slave and the object, and presents the captured images to the user operating the master ([0036] via “According to certain embodiments, the management system 122 can be configured to display on the display 124 a graphical user interface 136 (GUI) that can correspond to an orientation or position of one or more of the robot 106, end effector 108, and/or workpiece, such as the subpart 132 and the main part 134. … For example, according to certain embodiments, the GUI 136 may provide a digital representation or image of one or more of the robot 106, end effector 108, and/or workpiece, but not of the teleoperation member 122.”), ([0053] via “Additionally, according to certain embodiments, digital representations provided by the GUI 136, such as, but not limited to, the TCP 162 of the robot 106, … can overlay video images displayed on the display 124, such as, for example, video images captured by an image capturing device, such as a camera, in the robot station 102.”); a controller that performs action control of at least one of the master and the slave based on detection results of a sensor that is disposed in the slave and detects an operating state of the slave ([0034] via “According to the illustrated embodiment, the force sensor 130 can be integrated with haptic feedback capabilities of the teleoperation member 122. Moreover, the force sensor 130 can be used to detect and/or determine forces being experienced by, and/or imparted onto, the end effector 108 and/or robot 106 and can directly or indirectly provide corresponding information or data to the controller 112, the computation member 116 of the robot system 102, and/or the management system 122 of the operator station 104. The controller 112, computation member 116, and/or the management system 122 can utilize such data or related information to provide a signal to the teleoperation member 122 that is used to provide a visual, auditory, or physical signal to the operator of the teleoperation member 122 that may provide an indication of the relative degree of the force sensed by the force sensor 130. Moreover, the degree of haptic feedback provided by the teleoperation member 122, such as whether the force feedback is relatively large or small, may provide an indication of the extent of force sensed by the force sensor 130.”). 
Rossano is silent on wherein the controller delays the action control to reduce a lag between the action control and display timings of the captured images by the display, and the controller receives each detection result from the slave, receives each captured image from the display, and obtains a delay amount in delaying the action control based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result; and an image processor that receives the captured images from the imager and combines each captured image with each respective detection result to generate a combined image signal, wherein the image processor encodes the combined image signal as an encoded image signal and transmits the encoded image signal to the controller, and the controller decodes the encoded image signal and outputs the decoded image signal to the display such that the display displays the captured images combined with the respective detection results. However, Miller teaches wherein the controller delays the action control to reduce a lag between the action control and display timings of the captured images by the display ([0062] via “Abscissa F illustrates a timing at which image frames f0-f8 may be provided by video processor 325 to a video display such as, for example, video display 335 of FIG. 3, …. In particular, video processor 325 provides image frames for display to video display 335 based on the number of image frames accumulated at video processor 325, or stored at a video data buffer (not shown) accessible to video processor 325. Abscissa B illustrates a real-time latency of four image frames. In accordance with an exemplary embodiment of the present disclosure, video processor 325 provides image frames to video display 335 depending on how many image frames have been accumulated at video processor 325 and how that amount compares to a desired image-frame latency, which may be set, for example, to be around the real-time latency illustrated in Abscissa B.”), ([0064] via “At time unit t.sub.1 in FIG. 5, video processor 325 receives image frame f0. At time unit t.sub.3, with image frame f0 accumulated at video processor 325, video processor 325 receives image frame f1. Thus, in view of the accumulation of two image frames, video processor 325 provides image frame f0 to video display 335 at time unit t.sub.3.”), (Note: See Figure 5 of Miller as well.), and the controller receives each detection result from the slave, receives each captured image from the display, and obtains a delay amount in delaying the action control based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result ([0062] via “Abscissa F illustrates a timing at which image frames f0-f8 may be provided by video processor 325 to a video display such as, for example, video display 335 of FIG. 3, …. In particular, video processor 325 provides image frames for display to video display 335 based on the number of image frames accumulated at video processor 325, or stored at a video data buffer (not shown) accessible to video processor 325. Abscissa B illustrates a real-time latency of four image frames. 
In accordance with an exemplary embodiment of the present disclosure, video processor 325 provides image frames to video display 335 depending on how many image frames have been accumulated at video processor 325 and how that amount compares to a desired image-frame latency, which may be set, for example, to be around the real-time latency illustrated in Abscissa B.”), ([0064] via “At time unit t.sub.1 in FIG. 5, video processor 325 receives image frame f0. At time unit t.sub.3, with image frame f0 accumulated at video processor 325, video processor 325 receives image frame f1. Thus, in view of the accumulation of two image frames, video processor 325 provides image frame f0 to video display 335 at time unit t.sub.3.”), ([0066] via “At time unit t.sub.7, which would correspond to the display of the next image frame based on the current display rate, video processor 325 has yet to receive image frame f3. This may be due to, for example, bandwidth or other network issues. According to the exemplary embodiment, and because video processor 325 has one image frame, f2, accumulated, video processor 335 reduces its video display rate. For example, in the illustration of FIG. 5, video processor 335 reduces the video display rate from one image frame every two time units to one image frame every two and one half time units, and provides image frame f2 for display at the mid-point between time units t.sub.7 and t.sub.8.”), (Note: See Figure 5 of Miller as well.). Further, Nishigori teaches an image processor that receives the captured images from the imager and combines each captured image with each respective detection result to generate a combined image signal ([0093] via “The image signal processing unit 12 performs various image signal processing on the captured image signal based on the digital signal obtained by the imaging unit 11 and outputs the image signal to the coding unit 17.”), ([0094] via “A sound collection signal by the microphone 13 is converted into a digital signal by the A/D converter 15 via the amplifier 14, then subjected to predetermined audio signal processing by the audio signal processing unit 16 and input to the coding unit 17.”), ([0096] via “The coding unit 17 includes a DSP (Digital Signal Processor), and performs coding according to a predetermined data format on a captured image signal input via the image signal processing unit 12 and an audio signal input via the audio signal processing unit 16.”), (Note: See Figure 3 of Nishigori as well.), wherein the image processor encodes the combined image signal as an encoded image signal ([0096] via “The coding unit 17 includes a DSP (Digital Signal Processor), and performs coding according to a predetermined data format on a captured image signal input via the image signal processing unit 12 and an audio signal input via the audio signal processing unit 16.”) and transmits the encoded image signal to the controller ([0098] via “Although the details will be described later, the coding unit 17 of this example generates stream data including the captured image signal, the audio signal, and the tactile signal in the same stream.”), ([0099] via “The coding unit 17, the control unit 18, the storage unit 20, the display control unit 22, the media drive 23, and the communication unit 24 are communicably connected to each other via the bus 25.”), ([0110] via “The communication unit 24 performs data communication and network communication with an external device by wire or wirelessly. 
The above stream data can be transmitted to an external device via the communication unit 24.”), and the controller decodes the encoded image signal and outputs the decoded image signal to the display such that the display displays the captured images combined with the respective detection results ([0098] via “Although the details will be described later, the coding unit 17 of this example generates stream data including the captured image signal, the audio signal, and the tactile signal in the same stream.”), ([0134] via “The decoding unit 39 inputs the stream data read from the recording medium mounted on the media drive 31 or the stream data received from the imaging device 1 via the communication unit 32 via the bus 38, and decodes (reproduces) the captured image signal, audio signal, and tactile signal included in the stream data. Note that, the decoding unit 39 decodes the stream data in response to an instruction given by the control unit 33 on the basis of, for example, an operation input via the operation unit 34.”), ([0135] via “The decoded captured image signal and audio signal are output to the display device 5 in a predetermined transmission data format via the image/audio I/F 40.”), ([0137] via “Further, the decoded tactile signals are input to the signal processing unit 41 of the corresponding channels, respectively. Each signal processing unit 41 performs signal processing such as calibration of the tactile presentation device 3 or the tactile presentation device 4 and predetermined filter processing on the tactile signals of the corresponding channels as necessary.”), (Note: See Figure 4 of Nishigori as well.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Miller wherein the controller delays the action control to reduce a lag between the action control and display timings of the captured images by the display, and the controller receives each detection result from the slave, receives each captured image from the display, and obtains a delay amount in delaying the action control based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result. Doing so provides a smooth display to the user during real-time image rendering of the environment, as stated by Miller ([0074] via “As illustrated in FIG. 5, exemplary embodiments of the present disclosure control and vary the output of video frames from a buffer to the display in a manner that balances providing a smooth display in a real-time image rendering environment. In particular, under the exemplary timing for receiving image frames from a video capture device illustrated in Abscissa C of FIGS. 4 and 5, exemplary embodiments provide a relatively smooth and accurate display of the captured video via smaller freeze and/or jitter periods than those of conventional configurations, for example, as schematically illustrated in Abscissa D.”). 
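
For readers less familiar with Miller's frame-accumulation scheme, the behavior described in paragraphs [0062]-[0066] can be pictured roughly as the loop below: buffer incoming frames, compare the number accumulated against a desired latency, and nudge the display interval up or down. This is a minimal sketch of that idea only; the class name, thresholds, and adjustment factors are illustrative and are not taken from Miller.

    from collections import deque

    class FramePacer:
        """Rough analogue of the pacing Miller describes: frames are
        buffered as they arrive, and the display interval is adjusted
        based on how many frames have accumulated relative to a
        desired latency (here, four frames)."""

        def __init__(self, desired_latency_frames=4, base_interval_s=1 / 30):
            self.buffer = deque()
            self.desired = desired_latency_frames
            self.interval_s = base_interval_s

        def on_frame_received(self, frame):
            self.buffer.append(frame)

        def next_frame_for_display(self):
            if not self.buffer:
                return None, self.interval_s   # nothing to show yet (freeze)
            if len(self.buffer) > self.desired:
                self.interval_s *= 0.8         # backlog: speed up display
            elif len(self.buffer) < self.desired:
                self.interval_s *= 1.25        # starved: slow down display
            return self.buffer.popleft(), self.interval_s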
In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Nishigori wherein the remote control system comprises: an image processor that receives the captured images from the imager and combines each captured image with each respective detection result to generate a combined image signal, wherein the image processor encodes the combined image signal as an encoded image signal and transmits the encoded image signal to the controller, and the controller decodes the encoded image signal and outputs the decoded image signal to the display such that the display displays the captured images combined with the respective detection results. Doing so enhances the realistic feeling the user experiences by reducing a time lag between the displayed captured images and the outputted respective detection results, as stated by Nishigori ([0253] via “By generating the tactile signal synchronized with the moving image data as described above, it is possible to prevent the occurrence of a time lag between the visual information and the tactile information, and it is possible to enhance the realistic feeling and sense of reality.”). Regarding Claim 10, Rossano teaches a control method for a remote control system including a master that is operated by a user ([0025] via “The operator station 104 is configured to, in connection with communications to/with the robot station 102, provide an operator of the operator station 104 direct remote control of the robot 106 and attached processes from the operator station 104. … The at least one teleoperation member 122 can be structured to receive commands or other input information from the operator of the teleoperation member 122, ….”), (Note: The Examiner interprets the teleoperation member 122 as the master.), a slave that applies a treatment to an object in accordance with an action of the master ([0020] via “According to certain embodiments, the robot 106 can be operative to position and/or orient the end effector 108 at locations within reach of a work envelope or workspace of the robot 106 that can accommodate the robot 106 utilizing the end effector 108 to perform work on a workpiece.”), ([0025] via “The operator station 104 is configured to, in connection with communications to/with the robot station 102, provide an operator of the operator station 104 direct remote control of the robot 106 and attached processes from the operator station 104. According to the exemplary embodiment depicted in FIG. 1, the operator station 104 can include at least one teleoperation member 122, one or more displays 124, and the management system 122.”), (Note: The Examiner interprets the robot 106 as the slave.), a sensor that is disposed in the slave and detects an operating state of the slave ([0034] via “FIG. 2 illustrates a portion of a robot of a robot station performing exemplary work on a workpiece. In the depicted embodiment, a sensor 114 in the form of a force sensor 130 is positioned between the end effector 108 and a subpart 132 that the robot 106 is assembling to a main part 134. 
… Moreover, the force sensor 130 can be used to detect and/or determine forces being experienced by, and/or imparted onto, the end effector 108 and/or robot 106 and can directly or indirectly provide corresponding information or data to the controller 112, the computation member 116 of the robot system 102, and/or the management system 122 of the operator station 104.”), an imager that captures images of at least one of the slave and the object ([0048] via “Additionally, the computation member 116 and/or management system 120 may receive positional and/or orientation information for the robot 106 and/or the TCP 162 of the robot 106 from one or more of the sensors 114, such as, for example, a position sensor, accelerometer, and/or a captured image(s) through the use of a vision system, such, as, for example, a vision system having one or more cameras or imaging capturing devices, among other types of sensors. Such information may provide, or be used by the computation member 116 and/or management system 120 to determine, the position and/or orientation of the robot 106 and/or the TCP 162 of the robot 106.”), and a display that displays the captured images captured by the imager and presents the captured images to the user operating the master ([0036] via “According to certain embodiments, the management system 122 can be configured to display on the display 124 a graphical user interface 136 (GUI) that can correspond to an orientation or position of one or more of the robot 106, end effector 108, and/or workpiece, such as the subpart 132 and the main part 134. … For example, according to certain embodiments, the GUI 136 may provide a digital representation or image of one or more of the robot 106, end effector 108, and/or workpiece, but not of the teleoperation member 122.”), ([0053] via “Additionally, according to certain embodiments, digital representations provided by the GUI 136, such as, but not limited to, the TCP 162 of the robot 106, … can overlay video images displayed on the display 124, such as, for example, video images captured by an image capturing device, such as a camera, in the robot station 102.”), the method comprising: performing action control of at least one of the master and the slave based on detection results of the sensor ([0034] via “According to the illustrated embodiment, the force sensor 130 can be integrated with haptic feedback capabilities of the teleoperation member 122. Moreover, the force sensor 130 can be used to detect and/or determine forces being experienced by, and/or imparted onto, the end effector 108 and/or robot 106 and can directly or indirectly provide corresponding information or data to the controller 112, the computation member 116 of the robot system 102, and/or the management system 122 of the operator station 104. The controller 112, computation member 116, and/or the management system 122 can utilize such data or related information to provide a signal to the teleoperation member 122 that is used to provide a visual, auditory, or physical signal to the operator of the teleoperation member 122 that may provide an indication of the relative degree of the force sensed by the force sensor 130. Moreover, the degree of haptic feedback provided by the teleoperation member 122, such as whether the force feedback is relatively large or small, may provide an indication of the extent of force sensed by the force sensor 130.”). 
Rossano is silent on delaying the action control to reduce a lag between the action control and display timings of the captured images by the display, wherein each detection result is received from the slave, each captured image is received from the display, and a delay amount in delaying the action control is obtained based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result; receiving the captured images from the imager and combining each captured image with each respective detection result to generate a combined image signal; encoding the combined image signal as an encoded image signal and transmitting the encoded image signal; and decoding the encoded image signal and outputting the decoded image signal to the display such that the display displays the captured images combined with the respective detection results. However, Miller teaches delaying the action control to reduce a lag between the action control and display timings of the captured images by the display ([0062] via “Abscissa F illustrates a timing at which image frames f0-f8 may be provided by video processor 325 to a video display such as, for example, video display 335 of FIG. 3, …. In particular, video processor 325 provides image frames for display to video display 335 based on the number of image frames accumulated at video processor 325, or stored at a video data buffer (not shown) accessible to video processor 325. Abscissa B illustrates a real-time latency of four image frames. In accordance with an exemplary embodiment of the present disclosure, video processor 325 provides image frames to video display 335 depending on how many image frames have been accumulated at video processor 325 and how that amount compares to a desired image-frame latency, which may be set, for example, to be around the real-time latency illustrated in Abscissa B.”), ([0064] via “At time unit t.sub.1 in FIG. 5, video processor 325 receives image frame f0. At time unit t.sub.3, with image frame f0 accumulated at video processor 325, video processor 325 receives image frame f1. Thus, in view of the accumulation of two image frames, video processor 325 provides image frame f0 to video display 335 at time unit t.sub.3.”), (Note: See Figure 5 of Miller as well.), wherein each detection result is received from the slave, each captured image is received from the display, and a delay amount in delaying the action control is obtained based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result ([0062] via “Abscissa F illustrates a timing at which image frames f0-f8 may be provided by video processor 325 to a video display such as, for example, video display 335 of FIG. 3, …. In particular, video processor 325 provides image frames for display to video display 335 based on the number of image frames accumulated at video processor 325, or stored at a video data buffer (not shown) accessible to video processor 325. Abscissa B illustrates a real-time latency of four image frames. 
In accordance with an exemplary embodiment of the present disclosure, video processor 325 provides image frames to video display 335 depending on how many image frames have been accumulated at video processor 325 and how that amount compares to a desired image-frame latency, which may be set, for example, to be around the real-time latency illustrated in Abscissa B.”), ([0064] via “At time unit t.sub.1 in FIG. 5, video processor 325 receives image frame f0. At time unit t.sub.3, with image frame f0 accumulated at video processor 325, video processor 325 receives image frame f1. Thus, in view of the accumulation of two image frames, video processor 325 provides image frame f0 to video display 335 at time unit t.sub.3.”), ([0066] via “At time unit t.sub.7, which would correspond to the display of the next image frame based on the current display rate, video processor 325 has yet to receive image frame f3. This may be due to, for example, bandwidth or other network issues. According to the exemplary embodiment, and because video processor 325 has one image frame, f2, accumulated, video processor 335 reduces its video display rate. For example, in the illustration of FIG. 5, video processor 335 reduces the video display rate from one image frame every two time units to one image frame every two and one half time units, and provides image frame f2 for display at the mid-point between time units t.sub.7 and t.sub.8.”), (Note: See Figure 5 of Miller as well.). Further, Nishigori teaches receiving the captured images from the imager and combining each captured image with each respective detection result to generate a combined image signal ([0093] via “The image signal processing unit 12 performs various image signal processing on the captured image signal based on the digital signal obtained by the imaging unit 11 and outputs the image signal to the coding unit 17.”), ([0094] via “A sound collection signal by the microphone 13 is converted into a digital signal by the A/D converter 15 via the amplifier 14, then subjected to predetermined audio signal processing by the audio signal processing unit 16 and input to the coding unit 17.”), ([0096] via “The coding unit 17 includes a DSP (Digital Signal Processor), and performs coding according to a predetermined data format on a captured image signal input via the image signal processing unit 12 and an audio signal input via the audio signal processing unit 16.”), (Note: See Figure 3 of Nishigori as well.); encoding the combined image signal as an encoded image signal ([0096] via “The coding unit 17 includes a DSP (Digital Signal Processor), and performs coding according to a predetermined data format on a captured image signal input via the image signal processing unit 12 and an audio signal input via the audio signal processing unit 16.”) and transmitting the encoded image signal ([0098] via “Although the details will be described later, the coding unit 17 of this example generates stream data including the captured image signal, the audio signal, and the tactile signal in the same stream.”), ([0099] via “The coding unit 17, the control unit 18, the storage unit 20, the display control unit 22, the media drive 23, and the communication unit 24 are communicably connected to each other via the bus 25.”), ([0110] via “The communication unit 24 performs data communication and network communication with an external device by wire or wirelessly. 
The above stream data can be transmitted to an external device via the communication unit 24.”); and decoding the encoded image signal and outputting the decoded image signal to the display such that the display displays the captured images combined with the respective detection results ([0098] via “Although the details will be described later, the coding unit 17 of this example generates stream data including the captured image signal, the audio signal, and the tactile signal in the same stream.”), ([0134] via “The decoding unit 39 inputs the stream data read from the recording medium mounted on the media drive 31 or the stream data received from the imaging device 1 via the communication unit 32 via the bus 38, and decodes (reproduces) the captured image signal, audio signal, and tactile signal included in the stream data. Note that, the decoding unit 39 decodes the stream data in response to an instruction given by the control unit 33 on the basis of, for example, an operation input via the operation unit 34.”), ([0135] via “The decoded captured image signal and audio signal are output to the display device 5 in a predetermined transmission data format via the image/audio I/F 40.”), ([0137] via “Further, the decoded tactile signals are input to the signal processing unit 41 of the corresponding channels, respectively. Each signal processing unit 41 performs signal processing such as calibration of the tactile presentation device 3 or the tactile presentation device 4 and predetermined filter processing on the tactile signals of the corresponding channels as necessary.”), (Note: See Figure 4 of Nishigori as well.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Miller wherein the method comprises: delaying the action control to reduce a lag between the action control and display timings of the captured images by the display, wherein each detection result is received from the slave, each captured image is received from the display, and a delay amount in delaying the action control is obtained based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result. Doing so provides a smooth display to the user during real-time image rendering of the environment, as stated by Miller ([0074] via “As illustrated in FIG. 5, exemplary embodiments of the present disclosure control and vary the output of video frames from a buffer to the display in a manner that balances providing a smooth display in a real-time image rendering environment. In particular, under the exemplary timing for receiving image frames from a video capture device illustrated in Abscissa C of FIGS. 4 and 5, exemplary embodiments provide a relatively smooth and accurate display of the captured video via smaller freeze and/or jitter periods than those of conventional configurations, for example, as schematically illustrated in Abscissa D.”). 
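
The Nishigori rationale that follows turns on carrying the captured image and the detection result in a single encoded stream so the two stay time-aligned end to end. A minimal container-style sketch of that idea is below; the record layout, field names, and the use of zlib as a stand-in codec are invented for illustration and do not reflect Nishigori's actual format.

    import json, struct, zlib

    def encode_combined_record(jpeg_bytes, detection, timestamp_s):
        """Pack one captured image with its detection result (e.g. a force
        reading) into a single compressed record so both travel together."""
        meta = json.dumps({"detection": detection, "t": timestamp_s}).encode()
        payload = struct.pack(">I", len(meta)) + meta + jpeg_bytes
        return zlib.compress(payload)          # stand-in for a real codec

    def decode_combined_record(record):
        payload = zlib.decompress(record)
        (meta_len,) = struct.unpack(">I", payload[:4])
        meta = json.loads(payload[4:4 + meta_len])
        return payload[4 + meta_len:], meta    # image bytes + detection result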
In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Nishigori wherein the method comprises: receiving the captured images from the imager and combining each captured image with each respective detection result to generate a combined image signal; encoding the combined image signal as an encoded image signal and transmitting the encoded image signal; and decoding the encoded image signal and outputting the decoded image signal to the display such that the display displays the captured images combined with the respective detection results. Doing so enhances the realistic feeling the user experiences by reducing a time lag between the displayed captured images and the outputted respective detection results, as stated by Nishigori ([0253] via “By generating the tactile signal synchronized with the moving image data as described above, it is possible to prevent the occurrence of a time lag between the visual information and the tactile information, and it is possible to enhance the realistic feeling and sense of reality.”). Regarding Claim 11, Rossano teaches a non-transitory recording medium recording a remote control program for causing a computer to execute the function of controlling a remote control system ([0022] via “The controller 112 can take a variety of different forms, and can be configured to execute program instructions to perform tasks associated with operating robot 106, and moreover, to operate the robot 106 to perform various functions, such as, for example, but not limited to, tasks described herein. In one form, the controller(s) 112 is/are microprocessor based and the program instructions are in the form of software stored in one or more memories. … Operations, instructions, and/or commands determined and/or transmitted from the controller 112 can be based on one or more models stored in non-transient computer readable media in a controller 112, other computer, and/or memory that is accessible or in electrical communication with the controller 112.”) including a master that is operated by a user ([0025] via “The operator station 104 is configured to, in connection with communications to/with the robot station 102, provide an operator of the operator station 104 direct remote control of the robot 106 and attached processes from the operator station 104. … The at least one teleoperation member 122 can be structured to receive commands or other input information from the operator of the teleoperation member 122, ….”), (Note: The Examiner interprets the teleoperation member 122 as the master.), a slave that applies a treatment to an object in accordance with an action of the master ([0020] via “According to certain embodiments, the robot 106 can be operative to position and/or orient the end effector 108 at locations within reach of a work envelope or workspace of the robot 106 that can accommodate the robot 106 utilizing the end effector 108 to perform work on a workpiece.”), ([0025] via “The operator station 104 is configured to, in connection with communications to/with the robot station 102, provide an operator of the operator station 104 direct remote control of the robot 106 and attached processes from the operator station 104. According to the exemplary embodiment depicted in FIG. 
1, the operator station 104 can include at least one teleoperation member 122, one or more displays 124, and the management system 122.”), (Note: The Examiner interprets the robot 106 as the slave.), a sensor that is disposed in the slave and detects an operating state of the slave ([0034] via “FIG. 2 illustrates a portion of a robot of a robot station performing exemplary work on a workpiece. In the depicted embodiment, a sensor 114 in the form of a force sensor 130 is positioned between the end effector 108 and a subpart 132 that the robot 106 is assembling to a main part 134. … Moreover, the force sensor 130 can be used to detect and/or determine forces being experienced by, and/or imparted onto, the end effector 108 and/or robot 106 and can directly or indirectly provide corresponding information or data to the controller 112, the computation member 116 of the robot system 102, and/or the management system 122 of the operator station 104.”), an imager that captures images of at least one of the slave and the object ([0048] via “Additionally, the computation member 116 and/or management system 120 may receive positional and/or orientation information for the robot 106 and/or the TCP 162 of the robot 106 from one or more of the sensors 114, such as, for example, a position sensor, accelerometer, and/or a captured image(s) through the use of a vision system, such, as, for example, a vision system having one or more cameras or imaging capturing devices, among other types of sensors. Such information may provide, or be used by the computation member 116 and/or management system 120 to determine, the position and/or orientation of the robot 106 and/or the TCP 162 of the robot 106.”), and a display that displays the captured images captured by the imager and presents the captured images to the user operating the master ([0036] via “According to certain embodiments, the management system 122 can be configured to display on the display 124 a graphical user interface 136 (GUI) that can correspond to an orientation or position of one or more of the robot 106, end effector 108, and/or workpiece, such as the subpart 132 and the main part 134. … For example, according to certain embodiments, the GUI 136 may provide a digital representation or image of one or more of the robot 106, end effector 108, and/or workpiece, but not of the teleoperation member 122.”), ([0053] via “Additionally, according to certain embodiments, digital representations provided by the GUI 136, such as, but not limited to, the TCP 162 of the robot 106, … can overlay video images displayed on the display 124, such as, for example, video images captured by an image capturing device, such as a camera, in the robot station 102.”), the program causing the computer to execute the functions of: performing action control of at least one of the master and the slave based on detection results of the sensor ([0034] via “According to the illustrated embodiment, the force sensor 130 can be integrated with haptic feedback capabilities of the teleoperation member 122. Moreover, the force sensor 130 can be used to detect and/or determine forces being experienced by, and/or imparted onto, the end effector 108 and/or robot 106 and can directly or indirectly provide corresponding information or data to the controller 112, the computation member 116 of the robot system 102, and/or the management system 122 of the operator station 104. 
The controller 112, computation member 116, and/or the management system 122 can utilize such data or related information to provide a signal to the teleoperation member 122 that is used to provide a visual, auditory, or physical signal to the operator of the teleoperation member 122 that may provide an indication of the relative degree of the force sensed by the force sensor 130. Moreover, the degree of haptic feedback provided by the teleoperation member 122, such as whether the force feedback is relatively large or small, may provide an indication of the extent of force sensed by the force sensor 130.”). Rossano is silent on delaying the action control to reduce a lag between the action control and display timings of the captured images by the display, wherein each detection result is received from the slave, each captured image is received from the display, and a delay amount in delaying the action control is obtained based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result; receiving the captured images from the imager and combining each captured image with each respective detection result to generate a combined image signal; encoding the combined image signal as an encoded image signal and transmitting the encoded image signal; and decoding the encoded image signal and outputting the decoded image signal to the display such that the display displays the captured images combined with the respective detection results. However, Miller teaches delaying the action control to reduce a lag between the action control and display timings of the captured images by the display ([0062] via “Abscissa F illustrates a timing at which image frames f0-f8 may be provided by video processor 325 to a video display such as, for example, video display 335 of FIG. 3, …. In particular, video processor 325 provides image frames for display to video display 335 based on the number of image frames accumulated at video processor 325, or stored at a video data buffer (not shown) accessible to video processor 325. Abscissa B illustrates a real-time latency of four image frames. In accordance with an exemplary embodiment of the present disclosure, video processor 325 provides image frames to video display 335 depending on how many image frames have been accumulated at video processor 325 and how that amount compares to a desired image-frame latency, which may be set, for example, to be around the real-time latency illustrated in Abscissa B.”), ([0064] via “At time unit t.sub.1 in FIG. 5, video processor 325 receives image frame f0. At time unit t.sub.3, with image frame f0 accumulated at video processor 325, video processor 325 receives image frame f1. Thus, in view of the accumulation of two image frames, video processor 325 provides image frame f0 to video display 335 at time unit t.sub.3.”), (Note: See Figure 5 of Miller as well.), wherein each detection result is received from the slave, each captured image is received from the display, and a delay amount in delaying the action control is obtained based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result ([0062] via “Abscissa F illustrates a timing at which image frames f0-f8 may be provided by video processor 325 to a video display such as, for example, video display 335 of FIG. 
3, …. In particular, video processor 325 provides image frames for display to video display 335 based on the number of image frames accumulated at video processor 325, or stored at a video data buffer (not shown) accessible to video processor 325. Abscissa B illustrates a real-time latency of four image frames. In accordance with an exemplary embodiment of the present disclosure, video processor 325 provides image frames to video display 335 depending on how many image frames have been accumulated at video processor 325 and how that amount compares to a desired image-frame latency, which may be set, for example, to be around the real-time latency illustrated in Abscissa B.”), ([0064] via “At time unit t.sub.1 in FIG. 5, video processor 325 receives image frame f0. At time unit t.sub.3, with image frame f0 accumulated at video processor 325, video processor 325 receives image frame f1. Thus, in view of the accumulation of two image frames, video processor 325 provides image frame f0 to video display 335 at time unit t.sub.3.”), ([0066] via “At time unit t.sub.7, which would correspond to the display of the next image frame based on the current display rate, video processor 325 has yet to receive image frame f3. This may be due to, for example, bandwidth or other network issues. According to the exemplary embodiment, and because video processor 325 has one image frame, f2, accumulated, video processor 335 reduces its video display rate. For example, in the illustration of FIG. 5, video processor 335 reduces the video display rate from one image frame every two time units to one image frame every two and one half time units, and provides image frame f2 for display at the mid-point between time units t.sub.7 and t.sub.8.”), (Note: See Figure 5 of Miller as well.). 
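For illustration only, the frame-accumulation behavior Miller describes (releasing buffered image frames to the display based on how many frames have accumulated relative to a desired latency, and stretching the display interval when the buffer runs low) can be sketched as follows. The class, parameter names, and numeric values are hypothetical assumptions made for this sketch; they are not drawn from Miller or from the instant application.

```python
from collections import deque

class FrameBuffer:
    """Illustrative sketch of latency-aware frame release (not Miller's actual implementation).

    Frames accumulate in a buffer; a frame is released to the display only when the
    number of buffered frames meets the desired latency, and the display interval is
    stretched when fewer frames than desired are available.
    """

    def __init__(self, desired_latency_frames=2, base_interval=2.0):
        self.buffer = deque()
        self.desired_latency = desired_latency_frames
        self.base_interval = base_interval  # time units between displayed frames

    def receive(self, frame):
        # Called whenever a new image frame arrives from the capture side.
        self.buffer.append(frame)

    def next_display(self):
        """Return (frame, interval); frame is None when nothing should be shown yet."""
        if len(self.buffer) >= self.desired_latency:
            return self.buffer.popleft(), self.base_interval
        if self.buffer:
            # Fewer frames than desired: show one, but slow the display rate.
            return self.buffer.popleft(), self.base_interval * 1.25
        return None, self.base_interval
```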
Further, Nishigori teaches receiving the captured images from the imager and combining each captured image with each respective detection result to generate a combined image signal ([0093] via “The image signal processing unit 12 performs various image signal processing on the captured image signal based on the digital signal obtained by the imaging unit 11 and outputs the image signal to the coding unit 17.”), ([0094] via “A sound collection signal by the microphone 13 is converted into a digital signal by the A/D converter 15 via the amplifier 14, then subjected to predetermined audio signal processing by the audio signal processing unit 16 and input to the coding unit 17.”), ([0096] via “The coding unit 17 includes a DSP (Digital Signal Processor), and performs coding according to a predetermined data format on a captured image signal input via the image signal processing unit 12 and an audio signal input via the audio signal processing unit 16.”), (Note: See Figure 3 of Nishigori as well.); encoding the combined image signal as an encoded image signal ([0096] via “The coding unit 17 includes a DSP (Digital Signal Processor), and performs coding according to a predetermined data format on a captured image signal input via the image signal processing unit 12 and an audio signal input via the audio signal processing unit 16.”) and transmitting the encoded image signal ([0098] via “Although the details will be described later, the coding unit 17 of this example generates stream data including the captured image signal, the audio signal, and the tactile signal in the same stream.”), ([0099] via “The coding unit 17, the control unit 18, the storage unit 20, the display control unit 22, the media drive 23, and the communication unit 24 are communicably connected to each other via the bus 25.”), ([0110] via “The communication unit 24 performs data communication and network communication with an external device by wire or wirelessly. The above stream data can be transmitted to an external device via the communication unit 24.”); and decoding the encoded image signal and outputting the decoded image signal to the display such that the display displays the captured images combined with the respective detection results ([0098] via “Although the details will be described later, the coding unit 17 of this example generates stream data including the captured image signal, the audio signal, and the tactile signal in the same stream.”), ([0134] via “The decoding unit 39 inputs the stream data read from the recording medium mounted on the media drive 31 or the stream data received from the imaging device 1 via the communication unit 32 via the bus 38, and decodes (reproduces) the captured image signal, audio signal, and tactile signal included in the stream data. Note that, the decoding unit 39 decodes the stream data in response to an instruction given by the control unit 33 on the basis of, for example, an operation input via the operation unit 34.”), ([0135] via “The decoded captured image signal and audio signal are output to the display device 5 in a predetermined transmission data format via the image/audio I/F 40.”), ([0137] via “Further, the decoded tactile signals are input to the signal processing unit 41 of the corresponding channels, respectively. 
Each signal processing unit 41 performs signal processing such as calibration of the tactile presentation device 3 or the tactile presentation device 4 and predetermined filter processing on the tactile signals of the corresponding channels as necessary.”), (Note: See Figure 4 of Nishigori as well.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Miller wherein the computer executes the functions of: delaying the action control to reduce a lag between the action control and display timings of the captured images by the display, wherein each detection result is received from the slave, each captured image is received from the display, and a delay amount in delaying the action control is obtained based on a time lag between each detection result and each corresponding captured image by matching a signal waveform of each captured image with a signal waveform of a corresponding detection result. Doing so provides a smooth display to the user during real-time image rendering of the environment, as stated by Miller ([0074] via “As illustrated in FIG. 5, exemplary embodiments of the present disclosure control and vary the output of video frames from a buffer to the display in a manner that balances providing a smooth display in a real-time image rendering environment. In particular, under the exemplary timing for receiving image frames from a video capture device illustrated in Abscissa C of FIGS. 4 and 5, exemplary embodiments provide a relatively smooth and accurate display of the captured video via smaller freeze and/or jitter periods than those of conventional configurations, for example, as schematically illustrated in Abscissa D.”). In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Nishigori wherein the computer executes the functions of: receiving the captured images from the imager and combining each captured image with each respective detection result to generate a combined image signal; encoding the combined image signal as an encoded image signal and transmitting the encoded image signal; and decoding the encoded image signal and outputting the decoded image signal to the display such that the display displays the captured images combined with the respective detection results. Doing so enhances the realistic feeling the user experiences by reducing a time lag between the displayed captured images and the outputted respective detection results, as stated by Nishigori ([0253] via “By generating the tactile signal synchronized with the moving image data as described above, it is possible to prevent the occurrence of a time lag between the visual information and the tactile information, and it is possible to enhance the realistic feeling and sense of reality.”). 9. Claim(s) 3, 4, and 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Rossano et al. (US 20180154518 A1 hereinafter Rossano) in view of Miller et al. (US 20130300846 A1 hereinafter Miller) and Nishigori et al. (US 20210223867 A1 hereinafter Nishigori), and further in view of Nocon (US 20190064924 A1 hereinafter Nocon). 
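For illustration only, the delay-amount determination recited in claim 1, in which the signal waveform derived from each captured image is matched against the waveform of the corresponding detection result to obtain the time lag, could be sketched with a simple cross-correlation. The function name, the use of NumPy, and the sample period are assumptions made for this sketch and do not come from the cited references or the instant specification.

```python
import numpy as np

def estimate_time_lag(image_waveform, detection_waveform, sample_period):
    """Estimate the time lag between two equally sampled 1-D signals by locating
    the peak of their cross-correlation (illustrative sketch only).

    A positive result means the detection waveform lags the image waveform.
    """
    img = np.asarray(image_waveform, dtype=float)
    det = np.asarray(detection_waveform, dtype=float)
    # Subtract the means so the correlation reflects waveform shape rather than offsets.
    img = img - img.mean()
    det = det - det.mean()
    corr = np.correlate(det, img, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(img) - 1)
    return lag_samples * sample_period

# The delay amount applied to the action control could then be, for example:
# delay_amount = max(0.0, estimate_time_lag(image_waveform, detection_waveform, 1 / 60))
```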
Regarding Claim 3, modified reference Rossano teaches the remote control system according to claim 1, but is silent on wherein in a case where the controller updates the delay amount to a new delay amount obtained based on the time lag, the controller updates the delay amount to the new delay amount stepwise. However, Nocon teaches wherein in a case where the controller updates the delay amount to a new delay amount obtained based on the time lag, the controller updates the delay amount to the new delay amount stepwise ([0046] via “By way of non-limiting illustration, based on comparison indicating the prior relative time delay is greater than the threshold time span and that the prior executions of the visual circuit control signals by presentation device 120 occurred relatively later than the prior executions of the haptic circuit control signals by the one or more haptic feedback devices 124, synchronization component 114 may be configured to instruct haptic control component 110 to delay the current transmissions of the haptic circuit control signals to one or more haptic feedback devices 124.”), ([0050] via “In some implementations, based on the comparison indicating that the first time delay in the current transmission is greater than the second time delay (e.g., the visual effects take longer), synchronization component 114 may be configured to instruct haptic control component 110 to delay the subsequent transmission of the haptic circuit control signals to one or more haptic feedback devices 124 by a first amount of time. The first amount of time may be determined so that the second time delay may be substantially the same as the first time delay.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Nocon wherein in a case where the controller updates the delay amount to a new delay amount obtained based on the time lag, the controller updates the delay amount to the new delay amount stepwise. Doing so presents both the visual and haptic data to the user at the same time, as stated by Nocon ([0050] via “Thus, the time delay associated with the control signals being transmitted to and executed by presentation device 120 and the time delay associated with control signals being transmitted to and executed by one or more haptic feedback devices 124 may be the same, or similar, time delay. This may cause the subsequent executions to occur simultaneously (or at least within the threshold time span).”), which creates a more realistic experience for the user of the system, improving the experience, as stated by Nocon ([0014] via “An interactive experience within an interactive space may include haptic feedback synchronized with visual effects to make the experiences more realistic. Current haptic feedback implementations may have an inherent delay which, when applied in an interactive space, may be noticeable and/or may degrade the experience. To address this requires precise synchronization to tell the system when to trigger the haptic feedback. Using an understanding of latencies (e.g., delays), system 100 may be configured to precisely align visual effects to haptic feedback.”). 
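For illustration only, the stepwise update recited in claim 3, in which the applied delay amount is moved toward a newly obtained delay amount in bounded increments rather than switched at once, might be sketched as below. The step size and function name are assumptions made for this sketch, not taken from Nocon or the instant application.

```python
def update_delay_stepwise(current_delay, new_delay, max_step=0.005):
    """Move the applied delay toward the newly measured delay amount in bounded steps.

    Limiting each update to max_step (seconds; 5 ms per control cycle here) avoids an
    abrupt change in the action control when the measured time lag jumps.
    """
    diff = new_delay - current_delay
    if abs(diff) <= max_step:
        return new_delay
    return current_delay + max_step if diff > 0 else current_delay - max_step

# Called once per control cycle until the applied delay converges on the new measurement:
# applied_delay = update_delay_stepwise(applied_delay, measured_delay)
```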
Regarding Claim 4, modified reference Rossano teaches the remote control system according to claim 1, but is silent on wherein association information is added to at least one of the detection result and the captured image acquired by the sensor and the imager, respectively, at corresponding timings, the association information indicating that the detection result and the captured image are associated with each other, and the controller distinguishes the detection result and the captured image acquired at the corresponding timings based on the association information from the detection results and the captured images received by the controller, and obtains the time lag. However, Nocon teaches wherein association information is added to at least one of the detection result and the captured image acquired by the sensor and the imager, respectively, at corresponding timings, the association information indicating that the detection result and the captured image are associated with each other ([0047] via “In some implementations, synchronization component 114 may be configured to compare a value of the visual circuit latency parameter with a value of the haptic circuit latency parameter.”), (Note: The Examiner interprets the comparison as determining the association information that the visual circuit latency and the haptic circuit latency parameters are related.), and the controller distinguishes the detection result and the captured image acquired at the corresponding timings based on the association information from the detection results and the captured images received by the controller, and obtains the time lag ([0048] via “By way of non-limiting illustration, latency component 112 may be configured to determine a value of the visual circuit latency parameter associated with a current transmission of the visual circuit control signals to presentation device 120 and a value of the haptic circuit latency parameter associated with a current transmission of the haptic circuit control signals to one or more haptic feedback devices 124. The value of the visual circuit latency parameter may specify a first time delay between points in time when a transmission of the visual circuit control signals is initiated and points in time when the visual circuit control signals are executed by presentation device 120. The value of the haptic circuit latency parameter may specify a second time delay between points in time when a transmission of the haptic circuit control signals is initiated and points in time when the haptic circuit control signals are executed by one or more haptic feedback devices 124.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Nocon wherein association information is added to at least one of the detection result and the captured image acquired by the sensor and the imager, respectively, at corresponding timings, the association information indicating that the detection result and the captured image are associated with each other, and the controller distinguishes the detection result and the captured image acquired at the corresponding timings based on the association information from the detection results and the captured images received by the controller, and obtains the time lag. 
Doing so allows the system to understand the respective latencies with the visual and haptic information such that it can properly synchronize the signals for the user, as stated by Nocon ([0047] via “In some implementations, synchronization component 114 may be configured to compare a value of the visual circuit latency parameter with a value of the haptic circuit latency parameter. The synchronization component 114 may be configured to, based on the comparison, instruct one or both of visual control component 108 and/or haptic control component 110 effectuate transmission of the visual circuit control signals to presentation device 120 and/or the haptic circuit control signals to one or more haptic feedback devices 124, respectively. The instruction by synchronization component 114 may be configured so that the points in time when the visual circuit control signals are executed by presentation device 120 and the points in time when the haptic circuit control signals are executed by one or more haptic feedback devices 124 may occur within a threshold time span.”). Regarding Claim 5, modified reference Rossano teaches the remote control system according to claim 4, but is silent on wherein the image processor receives the detection results from the slave, adds, as the association information, the detected result acquired at the timing corresponding to the captured image to the captured image to generate the combined image signal, and transmits the encoded image signal to the controller, and the controller obtains the time lag based on a comparison between the detected result added to the captured image and the detection result received from the slave. However, Nocon teaches wherein the image processor receives the detection results from the slave, adds, as the association information, the detected result acquired at the timing corresponding to the captured image to the captured image to generate the combined image signal, and transmits the encoded image signal to the controller, ([0049] via “For a subsequent transmission of the visual circuit control signals and the haptic circuit control signals: synchronization component 114 may be configured to compare the value of the visual circuit latency parameter with the value of the haptic circuit latency parameter. The comparisons may include determining whether the first time delay is greater than, less than, or equal to the second time delay.”), ([0050] via “In some implementations, based on the comparison indicating that the first time delay in the current transmission is greater than the second time delay (e.g., the visual effects take longer), synchronization component 114 may be configured to instruct haptic control component 110 to delay the subsequent transmission of the haptic circuit control signals to one or more haptic feedback devices 124 by a first amount of time. 
… Thus, the time delay associated with the control signals being transmitted to and executed by presentation device 120 and the time delay associated with control signals being transmitted to and executed by one or more haptic feedback devices 124 may be the same, or similar, time delay.”), (Note: The Examiner interprets the synchronized visual and haptic information to be the added association information to the image, as this concept of adding/combining is similarly described on pages 19 and 40 of the specification of the instant application.), and the controller obtains the time lag based on a comparison between the detected result added to the captured image and the detection result received from the slave ([0041] via “FIG. 2 showing visual graphic of latency in system 100. The system 100 in FIG. 3 may include one or more of one or more computing platforms 102, presentation platform 120, an input device 302, a haptic feedback device 301, and a user. Visual circuit latency may be indicated by time “T1” related to a time delay between initiating transmission of signals from one or more computing platforms 102 and execution of those signals at presentation 120. Haptic circuit latency may be indicated by time “T2” related to a time delay between initiating transmission of signals from one or more computing platforms 102 and execution of those signals at haptic feedback device 304.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Nocon wherein the image processor receives the detection results from the slave, adds, as the association information, the detected result acquired at the timing corresponding to the captured image to the captured image to generate the combined image signal, and transmits the encoded image signal to the controller, and the controller obtains the time lag based on a comparison between the detected result added to the captured image and the detection result received from the slave. Doing so allows the system to properly synchronize the visual and haptic information with each other when presenting them to the user, as stated above by Nocon in paragraph [0050]. 10. Claim(s) 12-15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Rossano et al. (US 20180154518 A1 hereinafter Rossano) in view of Miller et al. (US 20130300846 A1 hereinafter Miller) and Nishigori et al. (US 20210223867 A1 hereinafter Nishigori), and further in view of Numata et al. (US 20180357526 A1 hereinafter Numata). Regarding Claim 12, modified reference Rossano teaches the remote control system according to claim 1, but is silent on wherein the corresponding detection result is stored in a memory which the controller searches in order to match the signal waveforms. However, Numata teaches wherein the corresponding detection result is stored in a memory which the controller searches in order to match the signal waveforms ([0080] via “The storage unit 302 stores the programs for realizing the imitation information calculating module 101, the language information calculating module 102, the output information generating module 103, and the learning module 104. Further, the storage unit 302 stores … delay time defining information 313, ….”), ([0081] via “The delay time defining information 313 is the information for defining the delay times corresponding to the respective imitation actions. 
One example of the data structure of the delay time defining information 313 is described using FIG. 6.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Numata wherein the corresponding detection result is stored in a memory which the controller searches in order to match the signal waveforms. Doing so stores the appropriate delay time for each action for future retrieval, as stated by Numata ([0112] via “The delay time 603 is a field for storing a delay time corresponding to each imitation action. The initial delay time is to be set previously by an expert and the like. In the embodiment, a delay time is set with a point of generating the output information 316 defined as a starting point.”). Regarding Claim 13, modified reference Rossano teaches the remote control system according to claim 9, but is silent on wherein the corresponding detection result is stored in a memory which the controller searches in order to match the signal waveforms. However, Numata teaches wherein the corresponding detection result is stored in a memory which the controller searches in order to match the signal waveforms ([0080] via “The storage unit 302 stores the programs for realizing the imitation information calculating module 101, the language information calculating module 102, the output information generating module 103, and the learning module 104. Further, the storage unit 302 stores … delay time defining information 313, ….”), ([0081] via “The delay time defining information 313 is the information for defining the delay times corresponding to the respective imitation actions. One example of the data structure of the delay time defining information 313 is described using FIG. 6.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Numata wherein the corresponding detection result is stored in a memory which the controller searches in order to match the signal waveforms. Doing so stores the appropriate delay time for each action for future retrieval, as stated by Numata ([0112] via “The delay time 603 is a field for storing a delay time corresponding to each imitation action. The initial delay time is to be set previously by an expert and the like. In the embodiment, a delay time is set with a point of generating the output information 316 defined as a starting point.”). Regarding Claim 14, modified reference Rossano teaches the control method according to claim 10, but is silent on wherein the corresponding detection result is stored in a memory which is searched in order to match the signal waveforms. However, Numata teaches wherein the corresponding detection result is stored in a memory which is searched in order to match the signal waveforms ([0080] via “The storage unit 302 stores the programs for realizing the imitation information calculating module 101, the language information calculating module 102, the output information generating module 103, and the learning module 104. Further, the storage unit 302 stores … delay time defining information 313, ….”), ([0081] via “The delay time defining information 313 is the information for defining the delay times corresponding to the respective imitation actions. One example of the data structure of the delay time defining information 313 is described using FIG. 6.”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Numata wherein the corresponding detection result is stored in a memory which is searched in order to match the signal waveforms. Doing so stores the appropriate delay time for each action for future retrieval, as stated by Numata ([0112] via “The delay time 603 is a field for storing a delay time corresponding to each imitation action. The initial delay time is to be set previously by an expert and the like. In the embodiment, a delay time is set with a point of generating the output information 316 defined as a starting point.”). Regarding Claim 15, modified reference Rossano teaches the non-transitory recording medium according to claim 11, but is silent on wherein the corresponding detection result is stored in a memory which is searched in order to match the signal waveforms. However, Numata teaches wherein the corresponding detection result is stored in a memory which is searched in order to match the signal waveforms ([0080] via “The storage unit 302 stores the programs for realizing the imitation information calculating module 101, the language information calculating module 102, the output information generating module 103, and the learning module 104. Further, the storage unit 302 stores … delay time defining information 313, ….”), ([0081] via “The delay time defining information 313 is the information for defining the delay times corresponding to the respective imitation actions. One example of the data structure of the delay time defining information 313 is described using FIG. 6.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Numata wherein the corresponding detection result is stored in a memory which is searched in order to match the signal waveforms. Doing so stores the appropriate delay time for each action for future retrieval, as stated by Numata ([0112] via “The delay time 603 is a field for storing a delay time corresponding to each imitation action. The initial delay time is to be set previously by an expert and the like. In the embodiment, a delay time is set with a point of generating the output information 316 defined as a starting point.”). Examiner’s Note 11. The Examiner has cited particular paragraphs or columns and line numbers in the references applied to the claims above for the convenience of the Applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that the Applicant, in preparing responses, fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner. See MPEP 2141.02 [R-07.2015] VI. A prior art reference must be considered in its entirety, i.e., as a whole, including portions that would lead away from the claimed invention. W.L. Gore & Associates, Inc. v. Garlock, Inc., 721 F.2d 1540, 220 USPQ 303 (Fed. Cir. 1983), cert. denied, 469 U.S. 851 (1984). See also MPEP §2123. Conclusion 12. Any inquiry concerning this communication or earlier communications from the examiner should be directed to BYRON X KASPER whose telephone number is (571)272-3895. 
The examiner can normally be reached Monday - Friday 8 am - 5 pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Adam Mott can be reached on (571) 270-5376. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /BYRON XAVIER KASPER/Examiner, Art Unit 3657 /ADAM R MOTT/Supervisory Patent Examiner, Art Unit 3657

Prosecution Timeline

Jun 13, 2023
Application Filed
Apr 09, 2025
Non-Final Rejection — §103
Jul 18, 2025
Response Filed
Aug 21, 2025
Final Rejection — §103
Nov 24, 2025
Applicant Interview (Telephonic)
Nov 24, 2025
Examiner Interview Summary
Nov 26, 2025
Request for Continued Examination
Dec 10, 2025
Response after Non-Final Action
Feb 03, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12594964
METHOD OF AND SYSTEM FOR GENERATING REFERENCE PATH OF SELF DRIVING CAR (SDC)
2y 5m to grant · Granted Apr 07, 2026
Patent 12594137
HARD STOP PROTECTION SYSTEM AND METHOD
2y 5m to grant · Granted Apr 07, 2026
Patent 12583101
METHOD FOR OPERATING A MODULAR ROBOT, MODULAR ROBOT, COLLISION AVOIDANCE SYSTEM, AND COMPUTER PROGRAM PRODUCT
2y 5m to grant · Granted Mar 24, 2026
Patent 12576529
ROBOT SIMULATION DEVICE
2y 5m to grant · Granted Mar 17, 2026
Patent 12564962
ROBOT REMOTE OPERATION CONTROL DEVICE, ROBOT REMOTE OPERATION CONTROL SYSTEM, ROBOT REMOTE OPERATION CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM
2y 5m to grant · Granted Mar 03, 2026
Based on 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
70%
Grant Probability
88%
With Interview (+18.4%)
3y 0m
Median Time to Grant
High
PTA Risk
Based on 103 resolved cases by this examiner. Grant probability derived from career allow rate.
