Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to amended claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Alternatively, claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claims 1, 3, 13, 15 and 20, applicant has amended independent claims 1, 13 and 20 to recite, inter alia:
determining context of each of the one or more camera view zones by analyzing content displayed on a left display and a right display of the VST device;
classifying the one or more camera view zones for the determined contexts;
identifying a camera view zone, from among the classified one or more camera view zones in which a user gesture is recognized; and
performing an interaction corresponding to a context of the identified camera view zone based on the user gesture.
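For purposes of illustration only, the ordering recited by these limitations can be sketched in the following minimal pseudocode-style example. Every identifier below is hypothetical, is not drawn from the specification or the claims, and is not a representation of applicant's implementation; the sketch only mirrors the recited sequence (determine context, classify zones, identify the zone in which the gesture is recognized, perform the corresponding interaction).

# Illustrative sketch of the claimed ordering only; all names are hypothetical.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class CameraViewZone:
    zone_id: int
    bounds: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    left_content: List[str]                    # content shown on the left display
    right_content: List[str]                   # content shown on the right display
    context: str = ""
    classification: str = ""

def determine_context(zone: CameraViewZone) -> str:
    # Hypothetical context analysis of the content displayed in the zone.
    return "AR" if "real_world_overlay" in zone.right_content else "VR"

def zone_containing(zones: List[CameraViewZone], point: Tuple[float, float]) -> Optional[CameraViewZone]:
    x, y = point
    for z in zones:
        x0, y0, x1, y1 = z.bounds
        if x0 <= x <= x1 and y0 <= y <= y1:
            return z
    return None

def handle_gesture(zones: List[CameraViewZone], gesture_point: Tuple[float, float]) -> str:
    # 1. Determine the context of each camera view zone from the displayed content.
    for z in zones:
        z.context = determine_context(z)
    # 2. Classify the zones for the determined contexts.
    for z in zones:
        z.classification = f"{z.context}-zone"
    # 3. Identify the zone, among the classified zones, in which the gesture is recognized.
    target = zone_containing(zones, gesture_point)
    # 4. Perform an interaction corresponding to the context of the identified zone.
    return f"interact:{target.context}" if target else "no-op"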
Applicant further amended claims 3 and 15 to recite, inter alia:
wherein the determining the context comprises: analyzing, in each of the one or more camera view zones, the content displayed on the left display and the right display and a user see through view, wherein the user see through view corresponds to a real-world view through the left display and the right display of the VST device; and determining the context present in the content and the user see through view based on the analysis of the content and the user see through view in each of the one or more camera view zones.
Applicant cites fig. 11 and paragraphs 64, 68, 69, 117 and 118 as providing support for the amendment:
[0064] The gesture analysis module 111-1 may determine a camera view zone among the one or more camera view zones in which the user gesture is detected. The gesture analysis module 111-1 may determine a coordinate of the camera view zone and a location of the camera view zone with respect to the center position of the glass frame. The gesture analysis module 111-1 may detect a position of the detected user gesture relative to a user's line of sight. In an embodiment, the gesture analysis module 111-1 may analyze the relevancy of the detected user gesture for the VR content, the AR content, MR content, XR content, and the real-world view. In an embodiment, the gesture analysis module 111-1 may generate and output a gesture relevancy table. The gesture relevancy table includes the relevancy factor of the gesture for the VR content, the AR content, MR content, XR content, and the real-world view.
[0068] The target identification module 111-3 may receive information of the one or more objects present in each of the left frame and the right frame. In an embodiment, the target identification module 111-3 may detect the content displayed on the glass frame. In an embodiment, the target identification module 111-3 may detect the relative position of the AR objects and the VR objects in the glass frame. In an embodiment, the target identification module 111-3 may perform a contextual analysis of the content displayed on the left frame and the right frame.
[0069] In an embodiment, the target identification module 111-3 may also receive the output of the one or more cameras to detect the real-world view of the user. The target identification module 111-3 may determine a camera view zone corresponding to the real-world object. The target identification module 111-3 may perform the contextual analysis of the real-world object. In an embodiment, the target identification module 111-3 may determine the one or more contexts present in the displayed content and in the real-world object based on the contextual analysis.
[0117] At step 1125, the input acceptance boundary cue engine 113-3 may determine whether the user gesture is accepted as an input for at least one identified target among the one or more identified targets. The input acceptance boundary cue engine 113-3 may accept the user gesture as the input for the at least one identified target if the user gesture is within the input acceptance boundary allocated for the corresponding target. The flow of the method 1100 now proceeds to step 1127. If it is determined that the user gesture is not accepted for any of the identified targets, then the flow of the method 1100 proceeds to step 1129.
[0118] At step 1127, the interaction engine 115 may receive the information that the user gesture is accepted for the one of the identified one or more targets. The interaction engine 115 may apply a gesture command on the at least one identified target for which the user gesture is accepted. The interaction engine 115 may modify the content to be displayed on the left frame and the right frame based on the gesture command.
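Purely as an illustration of the acceptance logic quoted in paragraphs [0117]-[0118], the boundary check could take the following form. The function names, data structures, and rectangular boundary shape are assumptions introduced here for readability and do not appear in the specification.

# Illustrative sketch of the input-acceptance-boundary logic of paragraphs
# [0117]-[0118]; names and the rectangular boundary shape are assumptions.
from typing import List, Optional, Tuple

Boundary = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def within_boundary(point: Tuple[float, float], boundary: Boundary) -> bool:
    x, y = point
    x0, y0, x1, y1 = boundary
    return x0 <= x <= x1 and y0 <= y <= y1

def process_gesture(gesture_point: Tuple[float, float],
                    targets: List[Tuple[str, Boundary]],
                    command: str) -> Optional[str]:
    """targets: (identified target, input acceptance boundary allocated for it)."""
    for name, boundary in targets:
        # Step 1125: accept the gesture only if it lies within the boundary
        # allocated for the corresponding identified target.
        if within_boundary(gesture_point, boundary):
            # Step 1127: apply the gesture command to that target; the content
            # displayed on the left and right frames would then be modified.
            return f"apply {command} to {name}"
    # Step 1129: the gesture is not accepted for any identified target.
    return None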
Reviewing the cited portions, however, the specification discloses: determining relevancy of a gesture with a camera view zone, performing a contextual analysis of the displayed content, determining a camera view zone corresponding to a real-world object, determining a context present in the displayed content and the real-world object, determining an input acceptance boundary, and modifying the displayed content based on acceptance of the gesture.
It is noted that none of the above-identified portions of the specification directly addresses "determining context of each of the one or more camera view zones by analyzing content displayed on a left display and a right display of the VST device; classifying the one or more camera view zones for the determined contexts; and identifying a camera view zone, from among the classified one or more camera view zones in which a user gesture is recognized". In paragraph 81 of the specification, it is stated that "the zone classification module 113-1 may classify, for the one or more contexts, the one or more camera view zones into the plurality of groups of camera view zones based on the correlation of the one or more identified targets and the one or more contexts"; that is, the camera view zones are classified based on correlation with possible gesture targets.
It is further noted that the content displayed on the left/right display of the VST device may be a virtual object and not a real-world object, with specific emphasis on figs. 8, 9, and 14, wherein the left display content is VR content with no real-world object while the right display content is AR content with a real-world object, whereas the amended claims recite that the user see through view corresponds to a real-world view through the left display and the right display. It is further noted that fig. 11 specifies a sequence of steps wherein step 1105 "identify camera view zones" precedes step 1115 "identify possible targets for given gesture in scene"; that is, gestures are identified from camera view zones, rather than vice versa wherein a camera view zone is "identified" from a gesture, as claimed in "identifying a camera view zone, from among the classified one or more camera view zones in which a user gesture is recognized". Hence, the specification lacks a clear written description of, or is at least indefinite with regard to, what constitutes the scope of "determining context of each of the one or more camera view zones by analyzing content displayed on a left display and a right display of the VST device; classifying the one or more camera view zones for the determined contexts; identifying a camera view zone, from among the classified one or more camera view zones in which a user gesture is recognized".
Dependent claims 2, 4-12, 14, and 16-19 are rejected for dependency on the rejected independent claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 13, 15, 1, 3, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Desai et al., US 20230342677 A1 (hereinafter “Desai”), in view of Powderly et al., US 20220075458 A1 (hereinafter “Powderly”).
Regarding claims 13 and 15, Desai discloses (from claim 13) a Visual See Through (VST) device (fig. 3A, paragraph 77, augmented reality system including an eyewear device) comprising:
one or more cameras (paragraphs 51, “a head-mounted device comprising a display to display content to a user and one or more cameras to capture images of a visual field of the user wearing the head-mounted device … obtaining input data from the one or more cameras, the input data including video captured by the one or more cameras; detecting, from the input data” paragraph 65, “Client system 200 includes an extended reality system 205 (e.g., a HMD), a processing system 210, and one or more sensors 215. As shown, extended reality system 205 is typically worn by user 220 and comprises an electronic display (e.g., a transparent, translucent, or solid display), optional controllers, and optical assembly for presenting extended reality content 225 to the user 220. The one or more sensors 215 may include motion sensors (e.g., accelerometers) for tracking motion of the extended reality system 205 and may include one or more image capture devices (e.g., cameras, line scanners) for capturing image data of the surrounding physical environment.”);
memory configured to store instructions; and one or more processors (paragraph 51, “an extended reality system is provided that includes: a head-mounted device comprising a display to display content to a user and one or more cameras to capture images of a visual field of the user wearing the head-mounted device; one or more processors; and one or more memories accessible to the one or more processors, the one or more memories storing a plurality of instructions executable by the one or more processors),
wherein the instructions, when executed by the one or more processors, cause the VST device to:
identify one or more camera view zones based on fields of view (FOVs) of one or more cameras (paragraph 51, “The processing comprises: obtaining input data from the one or more cameras, the input data including video captured by the one or more cameras”, claim 1, “head-mounted device comprising a display to display content to a user and one or more cameras to capture images of a visual field of the user wearing the head-mounted device”);
determine context of each of the one or more camera view zones by analyzing content displayed on a left display and a right display of the VST device;
(paragraph 77: “As shown in FIG. 3A, augmented reality system 300 may include an eyewear device 305 with a frame 310 configured to hold a left display device 315(A) and a right display device 315(B) in front of a user's eyes. Display devices 315(A) and 315(B) may act together or independently to present an image or series of images to a user.”
paragraph 109, “The virtual assistant 500 incorporates elements of interactive responses (e.g., voice or text) and context awareness to assist, e.g., deliver information and services, users via one or more interactions.”,
paragraph 69, “During operation, the extended reality application constructs extended reality content 225 for display to user 220 by tracking and computing interaction information (e.g., tasks for completion) for a frame of reference, typically a viewing perspective of extended reality system 205. Using extended reality system 205 as a frame of reference and based on a current field of view as determined by a current estimated interaction of extended reality system 205, the extended reality application renders extended reality content 225 which, in some examples, may be overlaid, at least in part, upon the real-world, physical environment of the user 220 … Based on the sensed data, the extended reality application determines interaction information to be presented for the frame of reference of extended reality system 205 and, in accordance with the current context of the user 220, renders the extended reality content 225.”
paragraph 125, “The determined virtual content 595 may be generated and rendered by the virtual content module 592, as described in detail with respect to FIGS. 2A, 2B, 3A, 3B, 4A, 4B, and 4C. For example, the virtual content module 592 may trigger generation and rendering of virtual content 592 by the client system (including virtual assistant application 505 and I/O interfaces 520) based on a current field of view of user”) and
identify a camera view zone in which a user gesture is recognized, and
perform interaction corresponding to a context of the identified camera view zone based on the user gesture.
(Paragraph 73, “The particular virtual user interface elements 255 for virtual user interface 250 may be context-driven based on the current extended reality applications engaged by the user 220 or real-world actions/tasks being performed by the user 220. When a user performs a user interface gesture in the extended reality environment at a location that corresponds to one of the virtual user interface elements 255 of virtual user interface 250, the client system 200 detects the gesture relative to the virtual user interface elements 255 and performs an action associated with the gesture and the virtual user interface elements 255. For example, the user 220 may press their finger at a button element 255 location on the virtual user interface 250. The button element 255 and/or virtual user interface 250 location may or may not be overlaid on the user 220, the user's hand 230, physical objects 235, or other virtual content, e.g., correspond to a position in the physical environment such as on a light switch or controller at which the client system 200 renders the virtual user interface button. In this example, the client system 200 detects this virtual button press gesture and performs an action corresponding to the detected press of a virtual user interface button (e.g., turns the light on). The client system 200 may also, for instance, animate a press of the virtual user interface button along with the button press gesture”), and
(from claim 15) wherein, to determine the context, the instructions, when executed by the one or more processors, further cause the VST device to:
analyze, in each of the one or more camera view zones, the content displayed on the left display and the right display and a user see through view, wherein the user see through view corresponds to a real-world view through the left display and the right display of the VST device; and
(paragraph 77: “As shown in FIG. 3A, augmented reality system 300 may include an eyewear device 305 with a frame 310 configured to hold a left display device 315(A) and a right display device 315(B) in front of a user's eyes. Display devices 315(A) and 315(B) may act together or independently to present an image or series of images to a user. Paragraph 66: “the extended reality content 225 viewed through the extended reality system 205 comprises a mixture of real-world imagery (e.g., the user's hand 230 and physical objects 235) and virtual imagery (e.g., virtual content such as information or objects 240, 245 and virtual user interface 250) to produce mixed reality and/or augmented reality. In some examples, virtual information or objects 240, 245 may be mapped (e.g., pinned, locked, placed) to a particular position within extended reality content 225.” Paragraph 68, “The client system 200 may render one or more virtual content items in response to a determination that at least a portion of the location of virtual content items is in a field of view of the user 220. For example, client system 200 may render virtual user interface 250 only if a given physical object (e.g., a lamp) is within the field of view of the user 220”, paragraph 125, “the virtual content module 592 may trigger generation and rendering of virtual content 592 by the client system (including virtual assistant application 505 and I/O interfaces 520) based on a current field of view of user, as may be determined by real-time gaze tracking of the user, or other conditions. More specifically, image capture devices of the sensors capture image data representative of objects in the real world, physical environment that are within a field of view of image capture devices. During operation, the client system performs object recognition within image data captured by the image capture devices of HMD to identify objects in the physical environment such as the user, the user's hand, and/or physical objects. Further, the client system tracks the position, orientation, and configuration of the objects in the physical environment over a sliding window of time. Field of view typically corresponds with the viewing perspective of the HMD”)
determine the contexts present in the content and the user see through view based on the analysis of the content and the user see through view in each of the one or more camera view zones (paragraph 58, 59, “the virtual assistant application 130 passively listens to and watches interactions of the user in the real-world, and processes what it hears and sees (e.g., explicit input such as audio commands or interface commands, contextual awareness derived from audio or physical actions of the user, objects in the real-world, environmental triggers such as weather or time, and the like) in order to interact with the user in an intuitive manner… context concerning activity of a user in the physical world may be analyzed and determined to initiate an interaction for completing an immediate task or goal”, paragraph 73, “The particular virtual user interface elements 255 for virtual user interface 250 may be context-driven based on the current extended reality applications engaged by the user 220 or real-world actions/tasks being performed by the user 220”).
Desai does not explicitly disclose: classify the one or more camera view zones for the determined contexts; and identify a camera view zone, from among the classified one or more camera view zones, in which a user gesture is recognized.
In a similar field of endeavor, Powderly discloses a VST device (fig. 2, paragraphs 41-43, wearable system 200) recognizing user gesture input within the field of view of the VST device (paragraph 63, “The wearable system 400 can include an outward-facing imaging system 464 (e.g., a digital camera) that images a portion of the world 470. This portion of the world 470 may be referred to as the field of view (FOV) and the imaging system 464 is sometimes referred to as an FOV camera. The entire region available for viewing or imaging by a viewer may be referred to as the field of regard (FOR). The FOR may include 4π steradians of solid angle surrounding the wearable system 400 because the wearer can move his body, head, or eyes to perceive substantially any direction in space. In other contexts, the wearer's movements may be more constricted, and accordingly the wearer's FOR may subtend a smaller solid angle. Images obtained from the outward-facing imaging system 464 can be used to track gestures made by the user (e.g., hand or finger gestures), detect objects in the world 470 in front of the user, and so forth.”, paragraph 156, “A user can initiate an interaction event on an interactable object in his FOV after the user selects the interactable object. In some implementations, the virtual object may correspond to a physical object. As a result, when the user performs an interaction event on the virtual object, the virtual object may communicate to the physical object thereby allowing the user to interact with the physical object via the virtual user interface”), which classifies the one or more camera view zones for the determined contexts and identifies a camera view zone, from among the classified one or more camera view zones, in which a user gesture is recognized (see paragraphs 142-144, 151-155, “examples of user interaction based on contextual information”, paragraphs 184-192, “interacting with object based on contextual information”; in particular, for the contextual analysis, the wearable system analyzes the layout of the virtual objects displayed in the user’s FOV, wherein the different regions within the FOV correspond to the different claimed camera view zones; in case the density of virtual objects exceeds a certain threshold or the layout meets a certain pattern, the wearable system may provide an option for the user to switch the gesture input mode and further recognize the gesture. Herein, the contextual analysis of the different regions of the field of view constitutes the claimed classifying of the one or more camera view zones, by differentiating view zones exhibiting a particular layout pattern or density threshold from view zones with no such pattern).
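The reading of Powderly applied above (classification of FOV regions by virtual-object density or layout pattern, with a corresponding switch of gesture input mode) can be sketched, for illustration only, as follows. The threshold value and all identifiers are hypothetical and are not taken from Powderly.

# Illustrative sketch of density/layout-based classification of FOV regions;
# the threshold and all identifiers are hypothetical, not drawn from Powderly.
from typing import Dict, List

DENSITY_THRESHOLD = 5  # hypothetical count of virtual objects per region

def classify_regions(regions: Dict[str, List[str]]) -> Dict[str, str]:
    """regions: region id -> virtual objects laid out in that region of the FOV."""
    classified = {}
    for region_id, objects in regions.items():
        dense = len(objects) >= DENSITY_THRESHOLD
        # Regions meeting the density/layout criterion form one class of camera
        # view zone; the remaining regions form another class.
        classified[region_id] = "dense-layout" if dense else "sparse-layout"
    return classified

def input_mode_for_gesture(classified: Dict[str, str], gesture_region: str) -> str:
    # If the gesture falls within a dense-layout region, the system may offer an
    # alternative gesture input mode; otherwise the default mode is kept.
    if classified.get(gesture_region) == "dense-layout":
        return "alternate-gesture-input-mode"
    return "default-gesture-input-mode"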
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the concept of recognizing a user gesture based on different camera view zones, as disclosed by Powderly, into the context-aware user interaction system of Desai, to constitute a VST device with one or more cameras whose view zones are classified by having different fields of view and different virtual object patterns, wherein the VST device recognizes the user gesture as an input. The result would have been predictable, and would constitute classifying the one or more camera view zones for the determined contexts and identifying a camera view zone, from among the classified one or more camera view zones, in which a user gesture is recognized.
Regarding claim 1, this is a method claim counterpart of device claim 13, both reciting substantially similar subject matter. Accordingly, claim 1 is rejected for the same reasons as claim 13.
Regarding claim 3, this is a method claim counterpart of device claim 15, both reciting substantially similar subject matter. Accordingly, claim 3 is rejected for the same reasons as claim 15.
Regarding claim 20, this is a Beauregard claim (i.e., "non-transitory machine-readable medium") counterpart of device claim 13, both reciting substantially similar subject matter. Accordingly, claim 20 is rejected for the same reasons as claim 13.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PEIJIE SHEN whose telephone number is (571)272-5522. The examiner can normally be reached Monday - Friday 10AM - 6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Patrick Edouard, can be reached at 571-272-7603. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PEIJIE SHEN/Examiner, Art Unit 2622
/PATRICK N EDOUARD/Supervisory Patent Examiner, Art Unit 2622