Last updated: May 29, 2026
Application No. 18/260,208
INFORMATION PROCESSING METHOD, INFORMATION PROCESSING DEVICE, AND NON-VOLATILE STORAGE MEDIUM

Non-Final OA §103
Filed: Jun 30, 2023
Priority: Jan 28, 2021 (CN 202110118481.6 +1 more)
Examiner: HE, YINGCHUN
Art Unit: 2613
Tech Center: 2600 (Communications)
Assignee: Sony Semiconductor Solutions Corporation
OA Round: 4 (Non-Final)
Interview: Optional

Interview lift (+14.4%) is below the 15.0% threshold. A written response is recommended.
Based on 650 resolved cases, 2023–2026
Examiner Intelligence

HE, YINGCHUN
Career Allowance Rate: 82% (above average), 534 granted / 650 resolved, +20.2% vs TC avg
Interview Lift: +14.4% (moderate), comparing resolved cases with and without an interview
Typical timeline: 2y 4m avg prosecution, 18 currently pending
Career history: 673 total applications across all art units
Statute-Specific Performance

§101: 2.4% (-37.6% vs TC avg)
§103: 86.0% (+46.0% vs TC avg)
§102: 1.4% (-38.6% vs TC avg)
§112: 4.9% (-35.1% vs TC avg)
Tech Center average is an estimate. Based on career data from 650 resolved cases.
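As a quick sanity check on the figures above, the following minimal sketch recomputes the allowance rate and the Tech Center average implied by each statute row. It assumes that "vs TC avg" is a percentage-point difference, which the page does not state; the function and variable names are hypothetical.

```python
# Assumption: "vs TC avg" is a percentage-point difference (not stated on the page).
STATUTE_ROWS = {
    "§101": (2.4, -37.6),
    "§103": (86.0, 46.0),
    "§102": (1.4, -38.6),
    "§112": (4.9, -35.1),
}


def implied_tc_average(rate, delta):
    """Return the Tech Center average implied by a rate and its delta."""
    return round(rate - delta, 1)


# Implied Tech Center average for each statute row.
implied = {s: implied_tc_average(r, d) for s, (r, d) in STATUTE_ROWS.items()}

# Career allowance rate from the raw counts (534 granted of 650 resolved).
allowance_rate = round(100 * 534 / 650, 1)

print(implied)
print(allowance_rate)
```

Under that assumption every statute row implies the same 40.0% Tech Center estimate, and the raw counts give an allowance rate of 82.2%, consistent with the rounded 82% shown.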
Office Action

§103
DETAILED ACTION
*Note in the following document:
1. Texts in italic bold format are limitations quoted either directly or conceptually from claims/descriptions disclosed in the instant application.
2. Texts in regular italic format are quoted directly from cited reference or Applicant’s arguments.
3. Texts with underlining are added by the Examiner for emphasis.
4. Texts with 
5. Acronym “PHOSITA” stands for “Person Having Ordinary Skill In The Art”.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 6 February 2026 has been entered.

Status of Claims
This is in response to Applicant’s amendment/response filed on 6 February 2026, which has been entered and made of record. Claims 1-2 and 10 have been amended. Claims 15-17 have been added. Claims 3-5 have been cancelled. Claims 1-2 and 6-17 are pending in the application.
	
Response to Arguments
Applicant’s arguments, see p.7-11, filed on 6 February 2026, with respect to 35 U.S.C. §103 rejection to Claim 1/2/10 and their dependent claims have been fully considered but they are not persuasive.
Regarding independent Claim 1/2/10, Applicant’s main argument focuses on the relevance of the prior art references of Reisner-Kollmann, Held, Liu and Tian and the limitation of the integrated information-processing pipeline that relies on time-of-flight (ToF) depth data for two distinct but interconnected tasks: (1) occlusion processing based on per-object distances measured by the ToF sensor, and (2) gesture-trigger detection based on temporal depth maps, joint extraction, and motion analysis of a trigger object (p.8 last paragraph). The Examiner respectfully disagrees with Applicant’s conclusion.
Regarding reference of Reisner-Kollmann, Applicant argues Reisner-Kollmann is relied upon as an alleged foundation for depth sensing. Even if the reference generally mentions that a time-of-flight camera system may be used, there is no corresponding disclosure of any operative system in which ToF based pixel-wise depth measurements actually drive object processing. Reisner-Kollmann's depth handling is Simultaneous Localization and Mapping (SLAM) based and is geared toward reconstructing planar surfaces and objects from point clouds. It does not teach ToF based per-pixel depth information feeding real-time AR rendering decisions, as set forth in Applicant's claim 1 (p.9 second paragraph). The Examiner respectfully disagrees.
Reisner-Kollmann discloses Systems, methods, and devices are described for constructing a digital representation of a physical scene by obtaining information about the physical scene.  Based on the information, … generating a three-dimensional (3D) reconstructed object using the properties associated with the physical object, and representing the planar surface as an augmented reality (AR) plane in an augmented reality environment, wherein the AR plane in the AR environment is capable of supporting 3D reconstructed objects on top of it (Abstract).  In order to generate the 3D reconstructed object, Reisner-Kollmann discloses detecting distance information of a real object based on depth data acquired by a time of flight (ToF) sensor (Fig.2, [0066]: FIG. 2 illustrates an example physical scene 200, in accordance with certain embodiments of the present disclosure. As illustrated, the scene may include part of a table 210 (e.g., a flat surface). In addition, some physical objects 215 (e.g., books, cups, toys, etc.) are positioned on the top of the table 210 and [0099]: While the AR plane may ensure that a virtual object does not collide with the virtual object, the modeling information provided to AR plane implementation application 660 may allow for virtual objects to be hidden from view when on the AR plane behind a modeled physical object) and ensuring that a virtual object does not collide with the virtual object, the modeling information provided to AR plane implementation application 660 may allow for virtual objects to be hidden from view when on the AR plane behind a modeled physical object ([0099].  [0060]: In addition to the use of SLAM, some other form of three-dimensional mapping process may be used, such as by capturing images that include depth information, for example using time-of-flight analysis or a stereoscopic camera system. 
As described herein, although the SLAM process may be used for illustration purposes, other techniques may be used instead of SLAM without deviating from the scope of the invention).  Therefore Reisner-Kollmann teaches using the distance information based on depth data acquired by a ToF sensor to generate an AR scene which allows for virtual objects to be hidden from view when on the AR plane behind a modeled physical object ([0099]).  Although Reisner-Kollmann shows using SLAM as an example embodiment to detect the distance, Reisner-Kollmann clearly discloses, in multiple places, that in addition to the SLAM process, a ToF sensor can be used to generate distance information ([0060]: In addition to the use of SLAM, some other form of three-dimensional mapping process may be used, such as by capturing images that include depth information, for example using time-of-flight analysis or a stereoscopic camera system. As described herein, although the SLAM process may be used for illustration purposes, other techniques may be used instead of SLAM without deviating from the scope of the invention.  [0088]: As an alternate to SLAM process engine 610, some other form of three dimensional mapping process may be used to create a point cloud. For example, images that include depth information, for example images captured using a time-of-flight analysis or a stereoscopic camera system, may be used to generate a point cloud for the physical environment.  [0090]: While SLAM process engine 610 may be used to determine the point cloud (such as the point cloud of FIG. 7), other arrangements may be employed. For instance, a camera capable of measuring depth, such as a stereo camera system or time-of-flight camera system may be used in place of (or to augment) SLAM process engine 610).  Therefore, contrary to Applicant’s allegation, Reisner-Kollmann teaches ToF based per-pixel depth information feeding real-time AR rendering decisions, as set forth in Applicant’s claim 1.
Regarding reference of Held, Applicant argues Held's system performs no per-object ToF based depth ordering, nor any form of ToF-based front/back superimposition. The AR content in Held is not controlled by ToF depth but by the distance relationships inferred through spatial mapping of stereo images. Consequently, Held cannot reasonably be relied upon for disclosure of occlusion processing "according to per-object distances measured by the ToF sensor," as claimed by Applicant (p.9 third paragraph).  The Examiner respectfully disagrees.
Held and Reisner-Kollmann are in the same augmented reality application field. Reisner-Kollmann discloses allowing for virtual objects to be hidden from view when on the AR plane behind a modeled physical object ([0099]).  Held, before the effective filing date of the claimed invention, teaches occlusion processing on the real object and an AR object generated based on the distance information of the real object detected and displaying the result of the occlusion processing on a display (Fig.6/7 and [0055]-[0057]).  Since Reisner-Kollmann discloses that, in addition to SLAM, ToF can be used to generate distance information ([0060], [0088] and [0090]), Held when combined with Reisner-Kollmann teaches or suggests occlusion processing according to per-object distances measured by the ToF sensor.
Regarding reference of Liu, Applicant argues Nothing in Liu suggests using ToF-acquired depth frames, let alone reusing the same ToF data used for occlusion processing to generate multiple depth maps for a localized trigger object in an AR environment. The computational pipeline in Liu is fundamentally incompatible with the real-time needs of AR effect-triggering and in no way resembles the compact, trigger-gesture mechanism recited in Applicant's claim 1. Accordingly, Liu does not teach or suggest the required gesture-trigger elements, nor does it teach generating depth maps "from the depth data acquired by the ToF sensor," as claimed by Applicant (p.9 last paragraph to p.10 line 4).  The Examiner respectfully disagrees.
Reisner-Kollmann, Held and Liu all relate to augmented reality.  Held discloses that a PHOSITA before the effective filing date of the claimed invention already knew to use natural user input (NUI), and that the NUI componentry includes a depth camera for gesture recognition ([0089]). Liu, in the same field of endeavor, discloses the detail of gesture recognition using depth data via depth sensing devices (Fig.1a.  [0145]: Examples of NUI techniques may include those relying on …, gesture recognition both on a screen and adjacent to the screen, air gestures, head and eye tracking, …, touch, gestures, …. Example NUI technologies may include, but are not limited to, …, motion gesture detection using depth cameras (e.g., stereoscopic camera systems, infrared camera systems, RGB (red, green, blue) camera systems and combinations of these) … .  A skilled person would have known an infrared camera system is a ToF camera system).  Therefore it would have been obvious to a PHOSITA before the effective filing date to incorporate the teaching of Liu regarding gesture recognition into that of Reisner-Kollmann modified by Held in order to support gesture input in the AR application.
Lastly regarding reference Tian, Applicant argues Still further, the Action relies upon Tian for alleged occlusion features. Tian does use ToF data, but for a very different purpose. Tian performs depth-image densification, generating multi-layered depth structures and alpha maps to support pixel-level blending of real and virtual objects. Tian addresses error correction, morphological dilation of depth layers, and voxel-based collision detection. However, none of these corresponds to the per-object distance superimposition set forth in Applicant's claim 1. While Tian compares pixel depths to determine occlusion in an RGBD composition, it does not perform per-object depth ordering, and does not integrate gesture recognition at all. Also, Tian does not suggest that the same ToF depth data could simultaneously support multi-frame depth map generation for joint extraction and gesture detection (p.10 second paragraph).  The Examiner respectfully disagrees.
The combination of Reisner-Kollmann, Held and Liu discloses all limitations except wherein the occlusion processing comprises front/back superimposition of the AR object and the real object according to per-object distances measured by the ToF sensor.
Tian, in the same AR application field, discloses The ToF depth maps are processed to remove outliers and overcome sensor bias and errors. Then a densification algorithm is applied to up-sample the low-resolution depth map to a resolution of an RGB image. … Accordingly, embodiments of the present disclosure describe a system that exploits a depth sensor (e.g., a ToF camera) on a computer system for multiple AR applications with computational efficiency and very good visual performance. The computer system is configured for depth map processing that removes outlier, densifies the depth map, and enables blending at the occlusion boundaries between real objects and virtual objects ([0037]).  Tian discloses The AR module 116 can generate an a red, green, blue, and depth (RGBD) image from the RGB image and the depth map to detect an occlusion of the virtual object 124 by at least a portion of the real-world object representation 122 or vice versa ([0038]) and the depth values in the RGBD image 710 are compared to depth values of a virtual object 720 and, if occlusion is detected based on the depth comparison, an alpha map 730 can be used for edge smoothing ([0080]).  Therefore Tian teaches or suggests wherein the occlusion processing comprises front/back superimposition of the AR object and the real object according to per-object distances measured by the ToF sensor (note that Tian teaches that the occlusion is detected based on the comparison between the depth values of the RGBD image and the depth values of the virtual object, and that the depth values of the RGBD image are derived from depth values measured by the ToF sensor).  Tian’s invention is related to augmented reality applications.
Therefore it would have been obvious to a PHOSITA before the effective filing date to incorporate the teaching of Tian in order to use depth sensor to place a virtual object into an AR scene in real time as suggested by Tian ([0003]: When placing a virtual object into an AR scene, it is important that the placement is accurate and performed in real time).
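The depth comparison attributed to Tian ([0080]) can be illustrated with a minimal sketch. It is not Tian's implementation: the function name, the list-of-lists image layout, and the use of None for pixels the virtual object does not cover are all assumptions made for illustration, and the edge smoothing by alpha map is omitted.

```python
def composite(real_rgb, real_depth, virt_rgb, virt_depth):
    """Per-pixel front/back superimposition of a virtual object on a real image.

    real_depth: distance per pixel derived from the depth sensor (assumed metres).
    virt_depth: rendered distance per pixel, or None where the virtual
                object does not cover the pixel (an assumed convention).
    """
    out = []
    for row in range(len(real_rgb)):
        out_row = []
        for col in range(len(real_rgb[row])):
            vd = virt_depth[row][col]
            # The virtual pixel is shown only when it is closer than the real surface.
            if vd is not None and vd < real_depth[row][col]:
                out_row.append(virt_rgb[row][col])
            else:
                out_row.append(real_rgb[row][col])
        out.append(out_row)
    return out


# One-row example: the virtual object is absent, in front, then behind.
result = composite(
    real_rgb=[["R", "R", "R"]],
    real_depth=[[1.0, 2.0, 3.0]],
    virt_rgb=[["V", "V", "V"]],
    virt_depth=[[None, 1.5, 3.5]],
)
print(result)  # [['R', 'V', 'R']]
```

In the example, only the middle pixel shows the virtual object, because only there is its depth (1.5) smaller than the measured real depth (2.0).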
Based on the above rationale, the Examiner maintains the 35 U.S.C. 103 rejection of independent Claim 1/2/10.
Applicant’s arguments regarding the remaining dependent claims are based on their dependency on Claim 1/2/10 (p.11 third paragraph).  Therefore all 35 U.S.C. 103 rejections are maintained.

Claim Objections
Claim 2 is objected to because of the following informalities: Claim 2 is labeled as Currently Amended.  However, no sentence or word is shown as being amended.  Appropriate correction is required.  For examination purposes, the Examiner interprets Claim 2 as intended to carry the same form of amendment as independent Claim 1/10 (only the two “end” are to be deleted and “and” is to be added before the limitation of displaying a result of the occlusion processing on a display).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2 and 6-14 are rejected under 35 U.S.C. 103 as being unpatentable over Reisner-Kollmann et al. (US 2015/0062120 A1) in view of Held et al. (US 2019/0238818 A1), Liu et al. (US 2014/0169623 A1) and Tian et al. (US 2022/0230383 A1).
Regarding Claim 1, Reisner-Kollmann teaches or suggests an information processing method to be executed by a computer ([0006]: Systems, methods, and devices are described for constructing a digital representation of a physical scene by obtaining information about the physical scene), the method comprising:
detecting distance ([0084]: Generally, a depth sensor provides a range image where each pixel represents the distance to the camera) information of a real object (Fig.2 and [0066]: FIG. 2 illustrates an example physical scene 200, in accordance with certain embodiments of the present disclosure. As illustrated, the scene may include part of a table 210 (e.g., a flat surface). In addition, some physical objects 215 (e.g., books, cups, toys, etc.) are positioned on the top of the table 210) based on depth data acquired by a time of flight (ToF) sensor ([0090]: a camera capable of measuring depth, such as a stereo camera system or time-of-flight camera system may be used in place of (or to augment) SLAM process engine 610).
Reisner-Kollmann discloses Systems, methods, and devices are described for constructing a digital representation of a physical scene by obtaining information about the physical scene.  Based on the information, … generating a three-dimensional (3D) reconstructed object using the properties associated with the physical object, and representing the planar surface as an augmented reality (AR) plane in an augmented reality environment, wherein the AR plane in the AR environment is capable of supporting 3D reconstructed objects on top of it (Abstract).  In order to generate the 3D reconstructed object, Reisner-Kollmann discloses detecting distance information of a real object based on depth data acquired by a time of flight (ToF) sensor (Fig.2, [0066]: FIG. 2 illustrates an example physical scene 200, in accordance with certain embodiments of the present disclosure. As illustrated, the scene may include part of a table 210 (e.g., a flat surface). In addition, some physical objects 215 (e.g., books, cups, toys, etc.) are positioned on the top of the table 210 and [0099]: While the AR plane may ensure that a virtual object does not collide with the virtual object, the modeling information provided to AR plane implementation application 660 may allow for virtual objects to be hidden from view when on the AR plane behind a modeled physical object) and ensuring that a virtual object does not collide with the virtual object, the modeling information provided to AR plane implementation application 660 may allow for virtual objects to be hidden from view when on the AR plane behind a modeled physical object ([0099].  [0060]: In addition to the use of SLAM, some other form of three-dimensional mapping process may be used, such as by capturing images that include depth information, for example using time-of-flight analysis or a stereoscopic camera system. 
As described herein, although the SLAM process may be used for illustration purposes, other techniques may be used instead of SLAM without deviating from the scope of the invention).  Therefore Reisner-Kollmann teaches using the distance information based on depth data acquired by a ToF sensor to generate an AR scene which allows for virtual objects to be hidden from view when on the AR plane behind a modeled physical object ([0099]).  Although Reisner-Kollmann shows using SLAM as an example embodiment to detect the distance, Reisner-Kollmann clearly discloses, in multiple places, that in addition to the SLAM process, a ToF sensor can be used to generate distance information ([0060]: In addition to the use of SLAM, some other form of three-dimensional mapping process may be used, such as by capturing images that include depth information, for example using time-of-flight analysis or a stereoscopic camera system. As described herein, although the SLAM process may be used for illustration purposes, other techniques may be used instead of SLAM without deviating from the scope of the invention.  [0088]: As an alternate to SLAM process engine 610, some other form of three dimensional mapping process may be used to create a point cloud. For example, images that include depth information, for example images captured using a time-of-flight analysis or a stereoscopic camera system, may be used to generate a point cloud for the physical environment.  [0090]: While SLAM process engine 610 may be used to determine the point cloud (such as the point cloud of FIG. 7), other arrangements may be employed. For instance, a camera capable of measuring depth, such as a stereo camera system or time-of-flight camera system may be used in place of (or to augment) SLAM process engine 610).
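The step quoted from Reisner-Kollmann ([0088]), in which depth images are used to generate a point cloud, can be illustrated by a standard pinhole back-projection. This is a minimal sketch and not the reference's method: the intrinsics fx, fy, cx, cy, the function name, and the assumption that each depth value is a z-distance along the optical axis (rather than a radial range) are all hypothetical.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into 3D camera-frame points.

    depth: rows of z-distances; None or non-positive values mark invalid pixels.
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue  # skip pixels with no valid measurement
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# One-row image: the first pixel is invalid, the second is 2.0 units away.
cloud = depth_to_point_cloud([[0, 2.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(cloud)  # [(2.0, 0.0, 2.0)]
```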
Reisner-Kollmann further teaches or suggests displaying a result of  the occlusion processing on a display ([0099]: While the AR plane may ensure that a virtual object does not collide with the virtual object, the modeling information provided to AR plane implementation application 660 may allow for virtual objects to be hidden from view when on the AR plane behind a modeled physical object).
But Reisner-Kollmann fails to explicitly recite performing occlusion processing on the real object and an augmented reality (AR) object generated by computer graphics (CG) based on the distance information of the real object detected.
However Held, in the same field of endeavor, discloses performing occlusion processing on the real object and an augmented reality (AR) object generated by computer graphics (CG) based on the distance information of the real object detected; and displaying a result of  the occlusion processing on a display (Fig.6/7 and [0055]-[0057]: FIG. 6 illustrates one example of an occluding object in the form of real-world couch 604 that may interfere with the display of stereo visual content at a location behind the couch … Accordingly, and in one potential advantage of the present disclosure, the system may determine that the couch 604 is located between a location of the HMD device 104 in the virtual coordinate system and a location of the stereo visual content 216 at the modified display distance 616. Such determination may be performed, for example, using image data captured by HMD device 104 and spatial mapping techniques to establish the locations of the HMD device, couch 604 and other real-world and virtual objects within the field of view of the HMD device.  With reference also to FIG. 7, based on determining that couch 604 is located between the HMD device 104 and the stereoscopic 3D movie 216 at the modified display distance 616, the modified display distance may be shortened to a shortened modified display distance 704 that is between the occluding object 604 and the location of the HMD device 608. Using the shortened modified display distance 704 and as explained above, the size of the stereoscopic 3D movie 216 also may be scaled to account for the shortened modified display distance 704.  Note Held teaches Head-mounted display (HMD) devices may display visual content to a user via a virtual reality experience or an augmented reality experience, see [0016].  Notice the virtual visual content 216 is occluded by the couch in Fig.6 and the couch is occluded by the visual content 216 shown in Fig.7 due to the distance difference).  

[Figure: media_image1.png, greyscale]

[Figure: media_image2.png, greyscale]
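The behavior quoted from Held ([0055]-[0057]), shortening the display distance when a real object lies between the device and the content and scaling the content accordingly, can be sketched as follows. The function name, the keep_fraction parameter, and the choice to scale in proportion to distance (which preserves apparent angular size) are assumptions for illustration; Held states only that the size may be scaled to account for the shortened distance.

```python
def resolve_display_distance(display_distance, occluder_distance, keep_fraction=0.9):
    """Return (distance, scale) for content that may sit behind a real object.

    occluder_distance: distance from the device to the nearest real object on
                       the line of sight, or None when nothing is in the way.
    keep_fraction: hypothetical factor placing the content just in front of
                   the occluder.
    """
    if occluder_distance is None or occluder_distance >= display_distance:
        return display_distance, 1.0  # nothing between device and content
    shortened = occluder_distance * keep_fraction
    # Assumed rule: scale with distance so the apparent size stays the same.
    scale = shortened / display_distance
    return shortened, scale


distance, scale = resolve_display_distance(4.0, 2.0)
print(distance, scale)
```

With content intended at 4.0 m and a couch at 2.0 m, the sketch places the content at about 1.8 m and scales it to about 0.45 of its original size.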

Held and Reisner-Kollmann are in the same augmented reality application field. Reisner-Kollmann discloses allowing for virtual objects to be hidden from view when on AR plane behind a modeled physical object ([0099]). Held, before the effective filing date of the claimed invention, teaches occlusion processing on the real object and an AR object generated based on the distance information of the real object detected and displaying the result of the occlusion processing on a display (see above Fig.6/7 and [0055]-[0057]).  Therefore it would have been obvious to a person skilled in the art before the effective filing date to incorporate the teaching of Held into that of Reisner-Kollmann and to add the limitation of performing occlusion processing on the real object and an augmented reality (AR) object generated by computer graphics (CG) based on the distance information of the real object detected; and displaying a result of  the occlusion processing on a display in order to present the hidden effect based on the distance information as suggested by Reisner-Kollmann.
Reisner-Kollmann modified by Held further discloses: 
detecting a trigger for starting processing of an effect using the AR object based on the depth data; detecting a gesture of a trigger object as the trigger (Held [0089]: When included, input subsystem 1120 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity; and/or any other suitable sensor).
Reisner-Kollmann modified by Held fails to disclose generating a depth map of the trigger object at a plurality of time points using the depth data; detecting joint information of the trigger object at a plurality of time points based on the depth map at the plurality of time points; detecting a motion of the trigger object based on joint information at the plurality of time points; determining whether the motion of the trigger object corresponds to the gesture to be the trigger.
However Liu discloses
generating a depth map of the trigger object at a plurality of time points using the depth data ([0017]: Depth maps may be obtained via many different types of sensors. For example, a KINECT depth sensing device may be used, as well as various types of scanners, laser devices, and stereo camera systems.  [0055]: the depth map acquisition component 105 may be configured to obtain the plurality of depth maps 106 from a depth sensing device 132.  [0145]: Examples of NUI techniques may include those relying on …, gesture recognition both on a screen and adjacent to the screen, …. Example NUI technologies may include, but are not limited to, …, motion gesture detection using depth cameras (e.g., stereoscopic camera systems, infrared camera systems, RGB (red, green, blue) camera systems and combinations of these), …); 
detecting joint information of the trigger object (Fig.1a: 120 Observed Entity.  [0054]: For example, determining the activity recognition 126 may include determining a recognition of an activity 128 that is engaged in by a moving entity 120, in association with an object 130, temporally over the respective plurality of time frames 107) at a plurality of time points based on the depth map at the plurality of time points ([0038]: A skeleton acquisition component 117 may be configured to obtain a plurality of skeleton representations 118 respectively corresponding to the respective time frames 107. Each skeleton representation 118 may include at least one joint 119 associated with an observed entity 120.  [0090]: the skeleton representations 118 may represent temporal skeletal outlines, including joint positions, that are associated with the observed entity 120, during each respective time frame 107); 
detecting a motion of the trigger object based on joint information at the plurality of time points ([0046]-[0047]: A local feature descriptor determination component 121 may be configured to determine local feature descriptors 122 corresponding to the respective time frames 107, based on the depth maps 106 and the joints 119 associated with the skeleton representations 118.  In accordance with example techniques discussed herein, the interaction between the observed entity 120 and other environmental objects may be represented as the local feature descriptors 122 at each joint 119.  [0002]: The activity recognition engine may include a depth map acquisition component configured to obtain depth maps corresponding to respective depth measurements determined over respective time frames); 
determining whether the motion of the trigger object corresponds to the gesture to be the trigger ([0145]: Examples of NUI techniques may include those relying on speech recognition, touch and stylus recognition, gesture recognition both on a screen and adjacent to the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence. Example NUI technologies may include, but are not limited to, touch sensitive displays, voice and speech recognition, intention and goal understanding, motion gesture detection using depth cameras (e.g., stereoscopic camera systems, infrared camera systems, RGB (red, green, blue) camera systems and combinations of these), motion gesture detection using accelerometers/gyroscopes, facial recognition, 3D displays, head, eye, and gaze tracking, immersive augmented reality and virtual reality systems, all of which may provide a more natural interface, and technologies for sensing brain activity using electric field sensing electrodes (e.g., electroencephalography (EEG) and related techniques)).
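The last two limitations mapped above (detecting motion from joint information at a plurality of time points, then deciding whether the motion corresponds to the trigger gesture) can be illustrated with a minimal sketch. It is not Liu's method: joint extraction from the depth maps is assumed to have been done by some upstream estimator, and the joint name, the swipe gesture, and the min_travel threshold are hypothetical.

```python
def joint_motion(frames, joint):
    """Displacement of one joint between consecutive time points.

    frames: one dict per time point, mapping joint name -> (x, y, z) position.
    """
    track = [frame[joint] for frame in frames]
    return [tuple(b[i] - a[i] for i in range(3)) for a, b in zip(track, track[1:])]


def is_swipe_right(frames, joint="hand", min_travel=0.3):
    """Decide whether the joint's motion matches a (hypothetical) swipe-right trigger."""
    steps = joint_motion(frames, joint)
    if not steps:
        return False  # fewer than two time points: no motion to evaluate
    moving_right = all(dx > 0 for dx, _, _ in steps)
    travel = sum(dx for dx, _, _ in steps)
    return moving_right and travel >= min_travel


frames = [
    {"hand": (0.0, 1.0, 0.5)},
    {"hand": (0.2, 1.0, 0.5)},
    {"hand": (0.4, 1.0, 0.5)},
]
print(is_swipe_right(frames))                  # True
print(is_swipe_right(list(reversed(frames))))  # False
```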
Reisner-Kollmann, Held and Liu all relate to augmented reality.  Held discloses that a PHOSITA before the effective filing date of the claimed invention already knew to use natural user input (NUI), and that the NUI componentry includes a depth camera for gesture recognition ([0089]: In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; …). Liu, in the same field of endeavor, discloses the detail of gesture recognition using depth data via depth sensing devices (Fig.1a.  [0145]: Examples of NUI techniques may include those relying on …, gesture recognition both on a screen and adjacent to the screen, air gestures, head and eye tracking, …, touch, gestures, …. Example NUI technologies may include, but are not limited to, …, motion gesture detection using depth cameras (e.g., stereoscopic camera systems, infrared camera systems, RGB (red, green, blue) camera systems and combinations of these) … . A skilled person would have known an infrared camera system is a ToF camera system).  Liu’s reference can be used in gesture recognition in augmented reality (see [0145] above) using depth data via a depth sensing device (Fig.1a: 132 Depth Sensing Device).
Therefore, it would have been obvious to a PHOSITA before the effective filing date to incorporate the teaching of Liu into that of Reisner-Kollmann as modified by Held and to include the limitations of generating a depth map of the trigger object at a plurality of time points using the depth data; detecting joint information of the trigger object at a plurality of time points based on the depth map at the plurality of time points; detecting a motion of the trigger object based on joint information at the plurality of time points; and determining whether the motion of the trigger object corresponds to the gesture to be the trigger, in order to characterize both entity motion and entity-object interactions and to support natural user interface (NUI) gestures as taught by Liu ([0023]: Further example features that may be suitable for depth data are discussed herein, which are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both entity motion and entity-object interactions (e.g., human motion and human-object interactions)).
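For illustration only, the claimed sequence (joint information at a plurality of time points, then motion, then a gesture decision) can be sketched as follows. This is a minimal sketch and not the method of Liu or of the instant application: the joint positions are assumed to have already been extracted from the depth maps by an upstream step that is not shown, and the names `detect_motion` and `is_swipe_right` and the thresholds are hypothetical.

```python
from typing import List, Tuple

# One joint position per time point, (x, y, z) in metres.
# Assumption: an upstream step (not shown) extracted these from depth maps.
Joint = Tuple[float, float, float]


def detect_motion(joints: List[Joint]) -> Joint:
    """Net displacement of the joint between the first and last time point."""
    (x0, y0, z0), (x1, y1, z1) = joints[0], joints[-1]
    return (x1 - x0, y1 - y0, z1 - z0)


def is_swipe_right(joints: List[Joint],
                   min_dx: float = 0.20,
                   max_dy: float = 0.10) -> bool:
    """Decide whether the motion corresponds to a hypothetical 'swipe right'.

    The thresholds are illustrative values, not taken from any reference.
    """
    if len(joints) < 2:
        return False  # a motion needs at least two time points
    dx, dy, _ = detect_motion(joints)
    return dx >= min_dx and abs(dy) <= max_dy
```

A caller would pass the positions of one tracked joint (for example a wrist) over consecutive frames and use the boolean result as the trigger.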
The combination of Reisner-Kollmann, Held and Liu discloses all limitations but does not explicitly disclose wherein the occlusion processing comprises front/back superimposition of the AR object and the real object according to per-object distances measured by the ToF sensor and generating, from the depth data acquired by the ToF sensor, a depth map.
However, Tian discloses: The ToF depth maps are processed to remove outliers and overcome sensor bias and errors. Then a densification algorithm is applied to up-sample the low-resolution depth map to a resolution of an RGB image. … Accordingly, embodiments of the present disclosure describe a system that exploits a depth sensor (e.g., a ToF camera) on a computer system for multiple AR applications with computational efficiency and very good visual performance. The computer system is configured for depth map processing that removes outlier, densifies the depth map, and enables blending at the occlusion boundaries between real objects and virtual objects ([0037]).  Tian further discloses: The AR module 116 can generate an a red, green, blue, and depth (RGBD) image from the RGB image and the depth map to detect an occlusion of the virtual object 124 by at least a portion of the real-world object representation 122 or vice versa ([0038]), and: the depth values in the RGBD image 710 are compared to depth values of a virtual object 720 and, if occlusion is detected based on the depth comparison, an alpha map 730 can be used for edge smoothing ([0080]).
Tian, in the field of occlusion and collision detection in AR sessions (Abstract), teaches or suggests wherein the occlusion processing comprises front/back superimposition of the AR object and the real object according to per-object distances measured by the ToF sensor ([0037]: Embodiments of the present disclosure involve a processing pipeline that uses the RGB and ToF cameras on a computer system (e.g., a smartphone, a tablet, an AR headset, or the like) to compute visual occlusion and collision detection. The ToF depth maps are processed to remove outliers and overcome sensor bias and errors. Then a densification algorithm is applied to up-sample the low-resolution depth map to a resolution of an RGB image. An alpha map is also generated for blending between virtual objects and real objects along the occluding boundaries. A light-weighted voxelization representation of the real-world scene is also generated from the depth maps to enable fast collision detection. Accordingly, embodiments of the present disclosure describe a system that exploits a depth sensor (e.g., a ToF camera) on a computer system for multiple AR applications with computational efficiency and very good visual performance. The computer system is configured for depth map processing that removes outlier, densifies the depth map, and enables blending at the occlusion boundaries between real objects and virtual object.  [0043]: the embodiments similarly apply to occlusion between virtual objects and/or to collisions between virtual objects.  [0082]: When a pixel to be rendered in an AR image corresponds to a first pixel of the RGBD image 710 and to a second pixel of the virtual object 720 (the first and second pixels are the same in the rendering buffer), the depth of the two pixels are compared to determine whether the second pixel should be occluded in the rendering or not. The depth of the first pixel is determined from the RGBD image 710. 
The depth of the second pixel can be retrieved from a buffer and can be defined by an AR application. The blending operation 750 then compares this depth to the depth of the second pixel. If equal to or smaller than the depth of the second pixel, the first pixel occludes the second pixel) and generating, from the depth data acquired by the ToF sensor, a depth map ([0038]: The depth sensor 112 generates depth data about the real-world environment, where this data includes, for instance, a depth map that shows depth(s) of the real-world object 130 (e.g., distance(s) between the depth sensor 112 and the real-world object 130).  [0040]: The depth sensor 112 can be a ToF camera).
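The per-pixel comparison quoted from Tian [0082] (the real pixel occludes the virtual pixel when its depth is equal to or smaller than the virtual pixel's depth) can be sketched as below. This is a minimal sketch under stated assumptions, not Tian's implementation: images are plain nested lists, a virtual depth of `None` marks pixels where no virtual object is rendered, the function name `composite` is hypothetical, and the alpha-map edge blending that Tian describes is omitted.

```python
from typing import Any, List, Optional


def composite(real_rgb: List[List[Any]],
              real_depth: List[List[float]],
              virt_rgb: List[List[Any]],
              virt_depth: List[List[Optional[float]]]) -> List[List[Any]]:
    """Choose, per pixel, the real or the virtual colour by depth comparison."""
    out = []
    for r_row, rd_row, v_row, vd_row in zip(real_rgb, real_depth,
                                            virt_rgb, virt_depth):
        out_row = []
        for r, rd, v, vd in zip(r_row, rd_row, v_row, vd_row):
            if vd is None or rd <= vd:
                # No virtual content here, or the real surface is at the same
                # depth or closer: the real pixel occludes the virtual one.
                out_row.append(r)
            else:
                # The virtual object is closer: it hides the real pixel.
                out_row.append(v)
        out.append(out_row)
    return out
```

The `rd <= vd` test mirrors the "equal to or smaller than" wording of the quoted passage; a production renderer would typically perform the same test in a depth buffer.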
Therefore, it would have been obvious to a PHOSITA before the effective filing date to incorporate the teaching of Tian into that of Reisner-Kollmann as modified and to include the limitations that the occlusion processing comprises front/back superimposition of the AR object and the real object according to per-object distances measured by the ToF sensor, and of generating, from the depth data acquired by the ToF sensor, a depth map of the trigger object at a plurality of time points, in order to use a depth sensor to place a virtual object into an AR scene in real time as suggested by Tian ([0003]: When placing a virtual object into an AR scene, it is important that the placement is accurate and performed in real time).

Regarding Claim 2, Claim 2 is similar to Claim 1 except that it is in device format.  Therefore, the rejection(s) of Claim 1 also apply to Claim 2.

Regarding Claim 6, Reisner-Kollmann further teaches or suggests wherein the operations further comprise:
detecting an orientation of a camera ([0090]-[0091]: … a camera capable of measuring depth, such as a stereo camera system or time-of-flight camera system may be used in place of (or to augment) SLAM process engine 610. SLAM process engine 610 may allow for determining depth information without additional hardware being needed besides conventional hardware of the computing device 102, including a conventional two-dimensional camera and movement-sensing hardware, such as a gyroscope and/or accelerometer. In one instance, the gyroscope and/or the accelerometer may further aid in determining the change in distance/orientation of the camera with respect to the physical scene (or object in the scene)…. As such, AR objects placed on the ground plane are intended to appear as if sitting on top of the physical surface that is serving as the ground plane …); and
adjusting a position at which an effect using the AR object is applied to a video image obtained by the camera, based on the orientation of the camera ([0110]: At block 1470, the AR plane may be optionally provided to an application and used by the application for manipulating virtual objects in the AR environment. The AR plane may be output in the form of coordinates that define the cells of the AR plane. Therefore, the AR plane output may be based on the real-world, physical environment on which one or more virtual objects are to be positioned or moved in an AR environment without giving the appearance of collision with physical objects in the physical environment).

Regarding Claim 7, Reisner-Kollmann discloses wherein the operations further comprise:
generating point cloud data of a subject based on the depth data ([0094]: Returning to system 600 of FIG. 6, the point cloud output by SLAM process engine 610 may be analyzed by point cloud clustering engine 630. Point cloud clustering engine 630 may cluster reference points according to what object the reference points are determined to be part of. Reference points determined to be part of the ground plane may be excluded from analysis by point cloud clustering engine 630. Reference points determined to belong to a same object, such as based on location, depth, identified boundaries and/or color properties may be clustered together), and performs the occlusion processing based on the point cloud data ([0095]: Object modeling engine 640 may serve to model physical objects that were identified based on clustered reference points by point cloud clustering engine 630. Object modeling engine 640 may use simplified geometric shapes to model the objects identified based on clustered reference points by point cloud clustering engine 630.  [0099]: In addition to receiving coordinates of the cell of the AR plane, AR plane implementation application 660 may receive modeling information for the objects. This modeling information may be the geometric models for objects determined by object modeling engine 640. This modeling information may include height information for the objects. While the AR plane may ensure that a virtual object does not collide with the virtual object, the modeling information provided to AR plane implementation application 660 may allow for virtual objects to be hidden from view when on the AR plane behind a modeled physical object).
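As a hedged illustration of "generating point cloud data of a subject based on the depth data", one common way to turn a depth map into 3D points is pinhole back-projection. This is a swapped-in textbook technique, not the SLAM-based point cloud of Reisner-Kollmann; the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for the sketch.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(depth: List[List[Optional[float]]],
                         fx: float, fy: float,
                         cx: float, cy: float) -> List[Point]:
    """Back-project each valid depth pixel (u, v, z) to a 3D point.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), all in pixels; depth values are in metres.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue  # skip pixels with no valid depth measurement
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

The resulting points could then be clustered per object, which is the step the quoted passage assigns to the point cloud clustering engine.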

Regarding Claim 8, Reisner-Kollmann modified by Held further teaches or suggests wherein the operations further comprise: in a case where the real object is closer to the ToF sensor than the AR object is, superimposing the real object in front of the AR object so that the AR object is hidden by the real object, as the occlusion processing (Held Fig.6 and [0055]: In some examples, a suitable display distance for visual content also may be impacted by real-world objects 172 and/or virtual content 168 within a real-world or virtual environment. FIG. 6 illustrates one example of an occluding object in the form of real-world couch 604 that may interfere with the display of stereo visual content at a location behind the couch. In this example, the couch 604 is located between the HMD device 104 and the location of stereo visual content 216 at a modified display distance 616). The same reason to combine as for Claim 2 applies.

[Image: media_image1.png]


Regarding Claim 9, Reisner-Kollmann modified by Held further teaches or suggests wherein the operations further comprise: in a case where the AR object is closer to the ToF sensor than the real object is, superimposing the AR object in front of the real object so that the real object is hidden by the AR object, as the occlusion processing (Held Fig.7 and [0057]: With reference also to FIG. 7, based on determining that couch 604 is located between the HMD device 104 and the stereoscopic 3D movie 216 at the modified display distance 616, the modified display distance may be shortened to a shortened modified display distance 704 that is between the occluding object 604 and the location of the HMD device 608. Using the shortened modified display distance 704 and as explained above, the size of the stereoscopic 3D movie 216 also may be scaled to account for the shortened modified display distance 704). The same reason to combine as for Claim 2 applies.


[Image: media_image2.png]


Regarding Claims 10-14, Claims 10-14 are similar to Claims 2 and 6-9 except that they are in storage medium format.  Therefore, the rejections of Claims 2 and 6-9 also apply to Claims 10-14.

Claims 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Reisner-Kollmann et al. (US 2015/0062120 A1) in view of Held et al. (US 2019/0238818 A1), Liu et al. (US 2014/0169623 A1) and Tian et al. (US 2022/0230383 A1) as applied to Claims 1, 2 and 10 above, and further in view of Hildreth et al. (US 2017/0278304 A1).
Regarding Claims 15-17, Reisner-Kollmann as modified fails to explicitly disclose wherein the depth data acquired by the ToF sensor is used both for the occlusion processing and for generating the depth map of the trigger object used for gesture detection, such that a same per-pixel depth information from the ToF sensor is applied to both determining a front/back relationship and determining the gesture.
However Hildreth discloses including a time-of-flight camera to provide depth information to a control unit (Fig.1 and [0023]: HMD 100 also includes an optional depth camera 104. Depth camera 104 is configured to provide depth information 105 to the control unit 106. In some aspects, the depth camera 104 is a ranging camera, such as a time-of-flight (ToF) camera).  Hildreth discloses Depending on the VR simulation, the depth information provided by depth image 302 may be utilized such that the physical object (e.g., hand 210) may be occluded by close virtual objects in the virtual image 310. For example, VR engine 122 may be configured to compare a z-buffer of the rendered VR scene with the depth information provided by depth image 302 to determine whether one or more virtual objects should be presented in front of (i.e., occlude) the hand 210 in combined image 312 ([0040]).  Hildreth further discloses the depth information can be used to determine a gesture (Fig.4 and [0043]: Next, in process block 406, VR engine 122 determines a spatial relationship between a user of the HMD (e.g., user 202 of HMD 204) and one or more physical objects (e.g., desk/table 212, keyboard 214, and monitor 216) included in the physical environment 200 based on the depth information 105. As will be discussed in more detail below, determining the spatial relationship may be based, in part, on whether user 202 is touching a physical object, a distance between the user 202 and the physical object, a hand gesture of the user 202, and/or one or more past models of the physical environment 200).  
Therefore, it would have been obvious to a PHOSITA before the effective filing date to incorporate the teaching of Hildreth into that of Reisner-Kollmann as modified and to include the limitation that the depth data acquired by the ToF sensor is used both for the occlusion processing and for generating the depth map of the trigger object used for gesture detection, such that a same per-pixel depth information from the ToF sensor is applied to both determining a front/back relationship and determining the gesture, in order to display both the virtual image and the one or more physical objects that were captured by the visual camera 102 as suggested by Hildreth ([0044]).
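The limitation at issue in Claims 15-17, a single stream of per-pixel depth feeding both the front/back decision and the gesture decision, can be pictured with the sketch below. It is illustrative only and assumes a great deal: each frame is a small nested list of depths in metres, the virtual object sits at one constant depth, and the nearest pixel is used as a crude stand-in for a real hand or joint detector. None of the names come from Hildreth or the other references.

```python
from typing import List, Tuple

Depth = List[List[float]]


def occlusion_mask(depth: Depth, virtual_depth: float) -> List[List[bool]]:
    """True where the real surface is at or in front of the virtual object."""
    return [[z <= virtual_depth for z in row] for row in depth]


def nearest_pixel(depth: Depth) -> Tuple[int, int]:
    """(row, col) of the smallest depth; a crude stand-in for a hand detector."""
    best = min((z, r, c) for r, row in enumerate(depth) for c, z in enumerate(row))
    return best[1], best[2]


def process_frames(frames: List[Depth],
                   virtual_depth: float) -> Tuple[List[List[List[bool]]], bool]:
    """Use the same depth frames for occlusion masks and for a motion cue."""
    masks = [occlusion_mask(f, virtual_depth) for f in frames]
    track = [nearest_pixel(f) for f in frames]
    # Hypothetical gesture cue: the nearest point moved to a higher column.
    moved_right = track[-1][1] > track[0][1]
    return masks, moved_right
```

Both outputs are derived from the identical `frames` argument, which is the point of the sketch; the detection logic itself is deliberately simplistic.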

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YINGCHUN HE whose telephone number is (571)270-7218. The examiner can normally be reached M-F 8:00-5:00 MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao M Wu can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YINGCHUN HE/Primary Examiner, Art Unit 2613
Prosecution Timeline

Aug 20, 2025: Response Filed
Sep 23, 2025: Non-Final Rejection mailed, §103
Dec 11, 2025: Response Filed
Jan 02, 2026: Final Rejection mailed, §103
Feb 06, 2026: Response after Non-Final Action
Apr 02, 2026: Request for Continued Examination
Apr 07, 2026: Response after Non-Final Action
May 01, 2026: Non-Final Rejection mailed, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/783,204 | Patent 12639864 | IMAGE PROCESSING APPARATUS, METHOD FOR IMAGE PROCESSING, AND STORAGE MEDIUM | 1y 10m to grant | Granted May 26, 2026
18/530,009 | Patent 12633239 | PERSONAL IMMERSIVE DISPLAY DEVICE | 2y 5m to grant | Granted May 19, 2026
18/644,990 | Patent 12631880 | METHODS AND DEVICES RELATED TO EXTENDED REALITY | 2y 0m to grant | Granted May 19, 2026
18/738,681 | Patent 12633068 | Environment Model With Surfaces And Per-Surface Volumes | 1y 11m to grant | Granted May 19, 2026
18/807,325 | Patent 12629562 | WEIGHT TRAINING SYSTEM INTUITIVELY DISPLAYING QUANTITY OF MOTION | 1y 9m to grant | Granted May 19, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 82%
With Interview (+14.4%): 97%
Median Time to Grant: 2y 4m (~0m remaining)
PTA Risk: High
Based on 650 resolved cases by this examiner. Grant probability derived from career allowance rate.