Prosecution Insights
Last updated: April 19, 2026
Application No. 18/529,827

METHODS, APPARATUSES AND COMPUTER PROGRAM PRODUCTS FOR FACILITATING ACTIONS BASED ON TEXT CAPTURED BY HEAD MOUNTED DEVICES

Current OA: Non-Final — §103
Filed: Dec 05, 2023
Examiner: HE, YINGCHUN
Art Unit: 2613
Tech Center: 2600 — Communications
Assignee: Meta Platforms Inc.
OA Round: 3 (Non-Final)

Grant Probability: 82% (Favorable)
OA Rounds: 3-4
To Grant: 2y 5m
With Interview: 96%

Examiner Intelligence

Grants 82% — above average

Career Allow Rate: 82% (529 granted / 644 resolved; +20.1% vs TC avg)
Interview Lift: +14.4% (moderate lift, across resolved cases with interview)
Typical Timeline: 2y 5m average prosecution; 27 currently pending
Career History: 671 total applications across all art units
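As a rough sanity check, the headline probabilities above can be reproduced from the raw counts the page reports. Below is a minimal sketch, assuming the interview lift is additive in percentage points (an inference from the displayed 82% + 14.4% ≈ 96%; the page does not state how the two figures are combined). The function names are illustrative, not part of any real tool:

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

def with_interview(base_rate_pct: float, lift_pct: float) -> float:
    """Grant probability with interview, assuming the lift is an
    additive number of percentage points, capped at 100%."""
    return min(base_rate_pct + lift_pct, 100.0)

base = allow_rate(529, 644)                  # 82.14..., displayed as 82%
boosted = with_interview(round(base), 14.4)  # 82 + 14.4 = 96.4, displayed as 96%

print(f"Career allow rate: {base:.1f}%")  # Career allow rate: 82.1%
print(f"With interview: {boosted:.1f}%")  # With interview: 96.4%
```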

Statute-Specific Performance

§101: 8.4% (-31.6% vs TC avg)
§103: 54.0% (+14.0% vs TC avg)
§102: 5.4% (-34.6% vs TC avg)
§112: 17.9% (-22.1% vs TC avg)
Deltas are measured against a Tech Center average estimate • Based on career data from 644 resolved cases
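The per-statute rates and their "vs TC avg" deltas are internally consistent with a single Tech Center baseline. A short sketch, assuming each delta is a plain difference between the examiner's rate and the Tech Center average (the page does not define the underlying per-statute metric); recovering the implied baseline from every row yields the same 40.0%:

```python
# Examiner rate and delta vs TC average, in percentage points,
# copied from the table above.
rates = {
    "§101": (8.4, -31.6),
    "§103": (54.0, +14.0),
    "§102": (5.4, -34.6),
    "§112": (17.9, -22.1),
}

for statute, (rate, delta) in rates.items():
    implied_tc_avg = rate - delta  # baseline = examiner rate minus delta
    print(f"{statute}: implied TC average = {implied_tc_avg:.1f}%")

# Every row prints 40.0%, consistent with one shared
# "Tech Center average estimate" across all four statutes.
```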

Office Action

§103
DETAILED ACTION

*Note in the following document:
1. Texts in italic bold format are limitations quoted either directly or conceptually from claims/descriptions disclosed in the instant application.
2. Texts in regular italic format are quoted directly from a cited reference or Applicant's arguments.
3. Texts with underlining are added by the Examiner for emphasis.
4. Texts with
5. The acronym "PHOSITA" stands for "Person Having Ordinary Skill In The Art".

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 28 January 2026 has been entered.

Status of Claims

This is in response to Applicant's amendment/response filed on 28 January 2026, which has been entered and made of record. Claims 1-3, 11-12 and 19 have been amended. No claim has been added or cancelled. Claims 1-20 are pending in the application.

Response to Arguments

Applicant's arguments with respect to the objection to Claim 3 (see p. 7 of the response filed on 28 January 2026) have been fully considered and are persuasive. The previous objection to Claim 3 is withdrawn in view of the amendment to Claim 3.

Applicant's arguments (see pp. 7-8 of the response filed on 28 January 2026) with respect to the rejection of Claims 1-20 under 35 U.S.C. § 103 have been fully considered but are moot because the arguments do not apply to any of the references being used in the current rejection. Newly amended Claims 1, 11 and 19 are now rejected under 35 U.S.C. § 103 as being unpatentable over Browy et al. (US 2018/0075659 A1) in view of Andrade et al. (US 2012/0330646 A1). Andrade teaches or suggests the newly added limitation of presenting options, by a display device of the head mounted device, superimposed on the environment, associated with the text item at the position, to enable selection, by the display device, of the superimposed one or more actions. See the detailed rejections below.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

This application currently names joint inventors.
In considering patentability of the claims, the Examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the Examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-4, 7, 9-10, 11-14, 17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Browy et al. (US 2018/0075659 A1) in view of Andrade et al. (US 2012/0330646 A1).

Regarding Claim 1, Browy discloses a method ([0003]: Systems and methods disclosed herein address various challenges related to VR, AR and MR technology) comprising:

capturing, by a head mounted device ([0032]: An example wearable system can comprise a head-mounted display, various imaging sensors, and one or more hardware processors. The display can be a see-through display worn in front of the eye or eyes), an image or a video corresponding to an environment detected in a field of view of a camera of the head mounted device ([0038]: As further described herein, a wearable system can receive an image of the user's environment. The image may be acquired by the outward-facing imaging system of a wearable device or a totem associated with the wearable device), wherein the image or the video comprises one or more text items associated with the environment ([0037]: In addition to or in alternative to enhancing the user's interaction experience with other people, the sensory eyewear system can also improve the user's experience with the environment. As an example of improving user interactions with the environment, a wearable system implementing the sensory eyewear system can recognize text (e.g., text on signage such as, e.g., commercial or public display signs) in an environment, modify the display characteristics of the text (e.g., by increasing the size of the text) or modify the content of the text (e.g., by translating the text to another language), and render the modified text over the physical text in the environment. Also see Fig. 8: notice three text items, of which two are written in Chinese and one is in English);

determining whether a text item of the one or more text items is interesting ([0218]: As will be further described with reference to FIG. 18, in certain embodiments, the wearable system 200 can recognize the meaning of the text and convert the text from an original language to a target language. For example, the wearable system 200 can identify letters, symbols, or characters from a variety of languages, such as, for example, English, Chinese, Spanish, German, Arabic, Hindi, etc., and translate the text from the original, displayed language to another language. In some embodiments, such translation can occur automatically according to previously specified settings (such as, e.g., user's preference or user's demographic or geographic information). In some embodiments, the translation can be done in response to a command (e.g., verbal or gesture) from the user. Browy does not explicitly use the phrase determining whether a text item of the one or more text items is interesting. However, Browy discloses detecting whether the text items are from a variety of languages, such as, for example, English, Chinese, Spanish, German, Arabic, Hindi, etc., and translating the identified items from the original displayed language to another language. Therefore it would have been obvious to a PHOSITA that the foreign-language letters, symbols, or characters are items of interest, since the wearable system will translate those foreign letters etc. from the original language to another language);

extracting, by the head mounted device, the text item determined as being interesting and superimposing, by the head mounted device, the text item at a position of the environment in the image or the video ([0245]: As illustrated, both scenes 1800a and 1800b include a street 1802 and pedestrians 1804. The scene 1800a also shows street signs 1810a and 1820a which include Simplified Chinese characters. The sign 1820a also includes English characters. However, the user (not shown in FIG. 18) of the HMD may be an English speaker and may not understand the Chinese characters. Advantageously, in some embodiments, the wearable system can automatically recognize the text on the street signs 1810a and 1820b and convert the foreign language text portion of the street signs into a language that the user understands. The wearable system can also present the translated signage as a virtual image over the physical signs as shown in the scene 1800b. Accordingly, the user would not perceive the Chinese text in the signs 1810a, 1820a but instead would perceive the English text shown in the signs 1810b, 1820b, because the HMD displays virtual image (with the English text) with sufficient brightness that the underlying Chinese text is not perceived); and

triggering, based on the text item determined as being interesting, one or more actions capable of being performed by the head mounted device ([0246]: The HMD (e.g., the wearable system 200) can use similar techniques as described with reference to FIGS. 16A-17 to identify a sign in the user's environment and recognize the sign. In some situations, the wearable system 200 may be configured to translate only a portion of a sign. For example, the wearable system 200 translates only the portion of the sign 1820a having Chinese text but not the portion of the sign 1820a having English text ("GOLDSTAR"), because the English portion can be understood by the user (e.g., because it is in the target language of the user)).

Browy fails to disclose presenting options, by a display device of the head mounted device, superimposed on the environment, associated with the text item at the position, to enable selection, by the display device, of the superimposed one or more actions.
However Andrade, in the same field of endeavor, discloses presenting options, by a display device, superimposed on the environment, associated with the text item at the position, to enable selection, by the display device, of the superimposed one or more actions (Fig. 5A/B; [0047]: Attention will now be turned to FIGS. 5A and 5B, which shows how translated sign text according to the present invention can be used in conjunction with method 100 of the present invention and method 200 of the present invention. FIG. 5A shows a sign in Spanish. It is a sign for a store that sells decorative stones, or, "pedras decorativas" in the original Spanish. After the sign is translated into the user's native language (English language in this example), and as shown in FIG. 5B, the user is given a choice of data processing applications to do further processing on the recognized text. For example, as shown at the third line in the black box in FIG. 5B, the user can choose to run a searching application to search the phrase "decorative stones." If the user proceeds to choose this option, through the user interface on her smart phone, then this would be an example of the method 100 where the recognized text is used to choose a data application and then perform further processing based, at least in part, on the recognized text. As shown in the first and second lines in the black box of FIG. 5B, the telephone number has been recognized as a telephone number and the name "Stone Pedras decorativas" has been recognized as the name of an entity corresponding to the telephone number. Therefore, the user is presented with additional options to call the number or to add it to an address book as a contact. This is, again, an example of the method of FIG. 1 because further data processing choices are provided based on the recognized text. Notice the options of calling the number and adding the number to the contact list).

Therefore it would have been obvious to a PHOSITA before the effective filing date to incorporate the teaching of Andrade into that of Browy and to include the limitation of presenting options, by a display device of the head mounted device, superimposed on the environment, associated with the text item at the position, to enable selection, by the display device, of the superimposed one or more actions, so that the user does not have to perform the user inputs that would otherwise be required to call up her telephone dialer application and/or her address book application and does not have to enter text into these applications corresponding to the telephone number and/or the business name, as suggested by Andrade ([0048]).

Regarding Claim 2, Browy further teaches or suggests wherein the head mounted device comprises smart glasses, an augmented reality device, or a virtual reality device ([0032]: A wearable system which is configured to present AR/VR/MR content can implement a sensory eyewear system to enhance the user's interaction with other people or the environment. An example wearable system can comprise a head-mounted display, various imaging sensors, and one or more hardware processors. The display can be a see-through display worn in front of the eye or eyes. [0063]: In some embodiments, the display system preferably has latency of less than about 20 milliseconds for visual object alignment, less than about 0.1 degree of angular alignment, and about 1 arc minute of resolution, which, without being limited by theory, is believed to be approximately the limit of the human eye. The display system 220 may be integrated with a localization system, which may involve GPS elements, optical tracking, compass, accelerometers, or other data sources, to assist with position and pose determination; localization information may be utilized to facilitate accurate rendering in the user's view of the pertinent world (e.g., such information would facilitate the glasses to know where they are with respect to the real world)).

Regarding Claim 3, Browy further teaches or suggests that the determining whether the text item of the text items is interesting comprises determining that the text item corresponds to a predetermined content item of text content designated as interesting associated with training data of one or more machine learning models ([0038]: The wearable system may determine whether the image comprises letters or characters using a variety of techniques, such as, for example, machine learning algorithms or optical character recognition (OCR) algorithms).

Regarding Claim 4, Browy further teaches or suggests determining that the text item is interesting in response to determining, based on the image or the video, that a hand of a user holds, or points to, an object associated with the text item ([0234]: The wearable system 200 can be configured to identify or recognize text from an image automatically or in response to a user input. ... In embodiments where the text is identified in response to a user input, a user can use a variety of commands to initiate the identification or display of the text. For example, a command may be a verbal cue, a hand gesture, a head motion (e.g., nodding), an eye movement (e.g., blinking), etc.).

Regarding Claim 7, Browy discloses wherein an action of the one or more actions comprises translating the text item in a first language to a translated text item in a second language associated with the head mounted device ([0245]: see the passage quoted for Claim 1 above, describing automatic recognition and translation of the Chinese text on street signs 1810a and 1820a into English for an English-speaking user).

Regarding Claim 9, Browy teaches or suggests presenting, by the head mounted device, the translated text item in the second language superimposed within the image or the video (Fig. 18: 1800b).
Regarding Claim 10, Browy further teaches or suggests generating a prompt, by the head mounted device, enabling a user associated with the head mounted device to select an action of the one or more actions to enable the head mounted device to perform the action ([0287]: In a 29th aspect, the wearable system of any one of aspects 26-28, wherein to identify a target language based on contextual information associated with the user, the hardware processor is programmed to: set the target language as a language understood by the user based on at least one of: the user's speech as captured by the wearable system, the user's location, or an input from the user selecting the language as the target language).

Regarding Claims 11-14 and 17: these claims are similar to Claims 1-4 and 7 except that they are in apparatus format. Park further discloses one or more processors and at least one memory storing instructions (Claim 1: A smart glasses apparatus, comprising: a physical frame; an outward-facing camera assembly; a display; a processor; a non-transitory computer-readable medium comprising instructions that, when executed by the processor, cause the smart glasses apparatus to …). Therefore the same reasons for rejection applied to Claims 1-4 and 7 also apply to Claims 11-14 and 17.

Regarding Claims 19 and 20: these claims are similar to Claims 1 and 3 except that they are in non-transitory computer-readable medium format. Therefore the same reasons for rejection applied to Claims 1 and 3 also apply to Claims 19 and 20.

Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Browy et al. (US 2018/0075659 A1) in view of Andrade et al. (US 2012/0330646 A1) as applied to Claims 4, 1 and 14, 11 above, and further in view of Mittal et al. (US 2014/0225918 A1).

Regarding Claim 5, Browy discloses using a hand gesture to identify or recognize text from an image ([0234]). But Browy as modified fails to explicitly recite cropping out one or more regions or other text items captured in the image or the video, in response to determining the hand of the user holds, or points to, the object associated with the text item, to obtain a second image or a second video comprising the text item and excluding the one or more regions or the other text items. However Mittal, in the same field of endeavor, discloses using a hand gesture to reduce the size of the ROI (Fig. 5 and [0091]: FIG. 5 illustrates an example of the ROI 305 being reduced by the user, which results in less AR targets or POIs being shown on the display 140. As described in block 220, a user can reduce the size of the ROI 305 using hand gestures (e.g., by moving hands closer). Using the reduced-sized ROI, the display 140 in FIG. 5 now only has three POIs (Starbucks 505, Henry's Diner 510, Henry's Diner 515) inside the ROI. By reducing the size of the ROI, the number of targets inside the ROI has been reduced from five POIs in FIG. 4 to three POIs in FIG. 5. According to another embodiment, the user can further reduce the size of the ROI so that only one AR target 320 is inside the ROI 305).

Therefore it would have been obvious to a PHOSITA before the effective filing date to incorporate the teaching of Mittal into that of Browy as modified and to include the limitation of cropping out one or more regions or other text items captured in the image or the video, in response to determining the hand of the user holds, or points to, the object associated with the text item, to obtain a second image or a second video comprising the text item and excluding the one or more regions or the other text items, in order to identify the interesting text as a user desires.

Regarding Claim 6, Browy further teaches or suggests determining, based on analyzing the image or the video, a region of interest associated with the text item determined as being interesting and one or more background items or other items associated with a scene of the environment (Fig. 18: the sign 1810a can be interpreted as a region of interest associated with the text item determined as being interesting, and the English word "GOLDSTAR" can be interpreted as one of the background items). But Browy as modified fails to explicitly recite cropping out the one or more background items or the other items to obtain a second image or a second video comprising the region of interest and the text item. However Mittal, in the same field of endeavor, discloses using a hand gesture to reduce the size of the ROI (Fig. 5 and [0091], quoted for Claim 5 above). Therefore it would have been obvious to a PHOSITA before the effective filing date to incorporate the teaching of Mittal into that of Browy as modified and to include the limitation of cropping out the one or more background items or the other items to obtain a second image or a second video comprising the region of interest and the text item, in order to identify and focus on the interesting text as a user desires.

Regarding Claims 15-16: these claims are similar to Claims 5-6 except that they are in apparatus format. Therefore the same reasons for rejection applied to Claims 5-6 also apply to Claims 15-16.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Browy et al. (US 2018/0075659 A1) in view of Andrade et al. (US 2012/0330646 A1) as applied to Claims 1 and 11 above, and further in view of Wexter et al. (US 2023/0012272 A1).

Regarding Claim 8, Browy discloses using a hand gesture ([0234]: The wearable system 200 can be configured to identify or recognize text from an image automatically or in response to a user input. ... In embodiments where the text is identified in response to a user input, a user can use a variety of commands to initiate the identification or display of the text. For example, a command may be a verbal cue, a hand gesture, a head motion (e.g., nodding), an eye movement (e.g., blinking), etc.).
But Browy as modified fails to explicitly disclose the language "in response to detecting a finger of a user, associated with the head mounted device, pointing at or hovering over the text item in the environment." However, a hand gesture with a finger pointing at or hovering over a target object has been a widely used selection gesture. Wexter discloses: In some embodiments, wearable apparatus 110 may further be configured to detect objects within the environment of the user. For example, wearable apparatus 110 may be configured to detect a hand 1820 of the user. Hand 1820 (and/or other objects that may be relevant to the selective reading of text, such as a pointer, etc.) may be identified using various object and feature detection techniques, as described above. In some embodiments, hand 1820 may be used to provide requests for selectively reading text in menu 1810. For example, the user may point to particular portions of menu 1810, and wearable apparatus 110 may read structural elements within the vicinity of where the user is pointing. As another example, the user may make gestures with hand 1820 for selectively reading portions of the text. For example, the user may make a gesture associated with a particular structural element (e.g., the title), or may make a gesture to navigate the text (e.g., a gesture to move to the next paragraph, etc.) ([0119]).

Therefore it would have been obvious to a PHOSITA before the effective filing date to incorporate the teaching of Wexter into that of Browy as modified and to include the limitation of performing the translating of the text item in the first language to the translated text item in the second language in response to detecting a finger of a user, associated with the head mounted device, pointing at or hovering over the text item in the environment, since pointing at or hovering over a selection is a natural hand gesture.

Regarding Claim 18, Claim 18 is similar to Claim 8 except that it is in non-transitory computer-readable medium format. Therefore the same reason for rejection applied to Claim 8 is also applied to Claim 18.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YINGCHUN HE, whose telephone number is (571) 270-7218. The examiner can normally be reached M-F 8:00-5:00 MT.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Xiao M Wu, can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YINGCHUN HE/
Primary Examiner, Art Unit 2613

Prosecution Timeline

Dec 05, 2023: Application Filed
Jul 30, 2025: Non-Final Rejection — §103
Nov 03, 2025: Response Filed
Nov 25, 2025: Final Rejection — §103
Jan 28, 2026: Response after Non-Final Action
Mar 11, 2026: Request for Continued Examination
Mar 13, 2026: Response after Non-Final Action
Mar 17, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602886: LOW LATENCY HAND-TRACKING IN AUGMENTED REALITY SYSTEMS (granted Apr 14, 2026; 2y 5m to grant)
Patent 12588711: METHOD AND APPARATUS FOR OUTPUTTING IMAGE FOR VIRTUAL REALITY OR AUGMENTED REALITY (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586247: IMAGE DISTORTION CALIBRATION DEVICE, DISPLAY DEVICE AND DISTORTION CALIBRATION METHOD (granted Mar 24, 2026; 2y 5m to grant)
Patent 12586491: Display Device and Method for Driving the Same (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579949: IMAGE PROCESSING APPARATUS (granted Mar 17, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 82%
With Interview: 96% (+14.4%)
Median Time to Grant: 2y 5m
PTA Risk: High
Based on 644 resolved cases by this examiner. Grant probability derived from career allow rate.
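Applying the 2y 5m median to this application's Dec 05, 2023 filing date gives a rough projected grant window. A stdlib-only sketch; add_months is a hypothetical helper written here for illustration, and the 2y 5m figure is the examiner-level median above, not a prediction for this specific case:

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Shift a date by a whole number of months, clamping the day
    to the last valid day of the target month."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    days = [31, 29 if leap else 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return date(year, month, min(d.day, days[month - 1]))

filed = date(2023, 12, 5)
projected = add_months(filed, 2 * 12 + 5)  # 2y 5m examiner median
print(projected)  # 2026-05-05
```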

Free tier: 3 strategy analyses per month