Prosecution Insights
Last updated: May 29, 2026
Application No. 18/695,268

Methods and Systems for Storing Object Information With Contextual Information

Final Rejection §103
Filed: Mar 25, 2024
Priority: Sep 24, 2021 (provisional 63/247,971, +1 more)
Examiner: GIULIANI, GIUSEPPI J
Art Unit: 2153
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Apple Inc.
OA Round: 4 (Final)
Grant Probability: 58% (Moderate)
OA Rounds: 5-6
Est. Remaining: 1y 2m
With Interview: 65%

Examiner Intelligence

Grants 58% of resolved cases
Career Allowance Rate: 58% (166 granted / 284 resolved; +3.5% vs TC avg)
Interview Lift: +6.4% (moderate; resolved cases with interview)
Avg Prosecution: 3y 5m (typical timeline; 18 currently pending)
Total Applications: 310 (career history, across all art units)
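
The headline percentages can be checked against the raw counts. The sketch below recomputes the career allowance rate from 166 granted out of 284 resolved and then adds the interview lift. It assumes the "+6.4%" lift is additive percentage points, which the page does not state; the variable names are illustrative.

```python
# Figures copied from the examiner summary above.
granted = 166
resolved = 284
interview_lift_points = 6.4  # ASSUMPTION: "+6.4%" means percentage points

# Career allowance rate as a percentage of resolved cases.
allowance_rate = 100 * granted / resolved

# Projected rate when an interview is held, under the additive assumption.
with_interview = allowance_rate + interview_lift_points

print(f"Career allowance rate: {allowance_rate:.2f}%")
print(f"With interview:        {with_interview:.2f}%")
```

Rounded to whole percentages, these values agree with the 58% and 65% figures shown on the page.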

Statute-Specific Performance

§101: 0.7% (-39.3% vs TC avg)
§103: 85.8% (+45.8% vs TC avg)
§102: 7.1% (-32.9% vs TC avg)
§112: 4.2% (-35.8% vs TC avg)
The Tech Center average is an estimate. Based on career data from 284 resolved cases.
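
Subtracting each stated difference from the examiner's rate recovers the Tech Center baseline that the comparison implies. This is a minimal sketch, assuming the differences are percentage points; the dictionary layout is illustrative.

```python
# (examiner rate in %, stated difference vs TC average), copied from the list above.
# ASSUMPTION: the differences are percentage points, not relative changes.
stats = {
    "§101": (0.7, -39.3),
    "§103": (85.8, 45.8),
    "§102": (7.1, -32.9),
    "§112": (4.2, -35.8),
}

# Implied baseline = examiner rate minus the stated difference.
implied_baseline = {
    statute: round(rate - diff, 1) for statute, (rate, diff) in stats.items()
}

for statute, baseline in implied_baseline.items():
    print(f"{statute}: implied TC average {baseline}%")
```

Under that assumption all four statutes work out to the same 40.0% baseline, which suggests the comparison uses a single flat estimate rather than per-statute averages.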

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks

This action is in response to the applicant’s response filed 9 December 2025, which is in response to the USPTO office action mailed 15 September 2025. Claims 18, 31, 35 and 38-42 are amended. Claims 1-17, 25, 27, 28, 32, 33, 36 and 37 are cancelled. Claims 43 and 44 are added. Claims 18-24, 26, 29-31, 34, 35 and 38-44 are currently pending.

Response to Arguments

With respect to the 35 USC §103 rejections of claims 18-24, 26, 29-31, 34, 35 and 38-42, the applicant’s arguments are moot in view of new grounds of rejection, as necessitated by the applicant's amendments.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 18-24, 29-31, 34, 35 and 39-44 are rejected under 35 U.S.C. 103 as being unpatentable over Ju et al., US 2021/0201029 A1 (hereinafter “Ju”) in view of Poddar et al., US 2021/0118442 A1 (hereinafter “Poddar”).

Claim 18: Ju teaches a method comprising: at a device including an image sensor, one or more processors, and non-transitory memory: capturing, using the image sensor, an image of an environment (Ju, [Fig. 2A], [Fig. 
4], [0064] note At step 402 of flowchart 400, visual data of an environment with a real object is captured, where the visual data may correspond to images, video, and/or a scan of a real-world environment having the object. This may therefore be captured by a user's device, such as a mobile phone, wearable computing device, camera, or the like that includes an optical capture device); detecting a user engagement with an object in the environment based on detecting the object in the image of the environment; and in response to detecting the user engagement with the object (Ju, [0053] note a pointer selection 1014 may allow a user to utilize pointer 1010 to select bike 1006 for object identification): obtaining information regarding the object (Ju, [0058] note augmented reality objects may correspond to real-world objects, like bike 1006, that includes virtual data for an augmented reality experience. This may include offer A details 1112 that may display additional information for the offer associated with visual indicator 1105 and virtual graphic 1106); obtaining contextual information of the device at a time at which the image of the environment was captured (Ju, [0054] note Geolocation A 1018 may be used to add identifying data and characteristics to bike 1006 so that bike 1006 is associated with a geolocation. 
Similarly, a time 1020 may also be added to bike 1006 and/or the corresponding visual or augmented reality data so that time 1020 may be associated with geolocation A 1018 for bike 1006); and storing, in a database, an entry including the information regarding the object in association with the contextual information (Ju, [0039] note database 116 may store real-world images and/or virtual graphics or indicators for an augmented reality, as well as offer data for the virtual data of the augmented reality experience); Ju does not explicitly teach receiving a verbal query, wherein the verbal query includes a first set of one or more words relating to information regarding objects in entries of the database and a second set of one or more words relating to contextual information in the entries of the database; selecting one or more of the entries in the database based on the verbal query; and generating a response to the verbal query based on the selected entries. However, Poddar teaches this (Poddar, [0140] note a user may be moving about within a scene, perhaps while wearing smart glasses or while in view of a smart tablet's camera, while the assistant system 140 receives visual data such as a video comprising a plurality of images of the scene. As the images of this visual data are received, the CV module 504 of the assistant system 140 may continuously tag the images with detected objects and/or contexts and store this information as a visual state of the user's field of view, which may in turn be stored in the multimodal dialog state 337… Later, if the user issues a user query, such as “Hey Assistant, where did I leave my keys?”, the assistant system 140 may consult the multimodal dialog state 337 to determine the last image(s) in which the keys were tagged. 
In particular embodiments, the scene understanding engine 520 of the assistant system 140 may then be invoked by forwarding the visual data 503 and/or context 511 to the scene understanding engine 520 to perform heavier processing of the relevant images from the visual data to determine specific entities (e.g., the kitchen counter) and relational information (e.g., that the keys are on top of the counter). Finally, the assistant system 140 may send a response to this user request to the user (e.g., “You left them on the kitchen counter at 3:00 PM today”)). It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the object tracking of Ju with the multi-modal dialog state tracking process that may enable user assistance based on past data about items of Poddar according to known methods (i.e. responding to a user’s query regarding tagged objects). Motivation for doing so is that advantages may include the performance of action prediction that returns a particular set and type of relevant responses to a user query (Poddar, [0014]). Claim 19: Ju and Poddar teach the method of claim 18, wherein the information regarding the object includes machine-readable content associated with the object (Ju, [0058] note augmented reality objects may correspond to real-world objects, like bike 1006, that includes virtual data for an augmented reality experience. This may include offer A details 1112 that may display additional information for the offer associated with visual indicator 1105 and virtual graphic 1106). Claim 20: Ju and Poddar teach the method of claim 18, wherein the information regarding the object includes an object type of the object (Ju, [0016] note a machine learning model generated from one or more past travels of the object or similar object and types). 
Claim 21: Ju and Poddar teach the method of claim 18, wherein the contextual information includes a time at which the image of the environment was captured (Ju, [0054] note a time 1020 may also be added to bike 1006 and/or the corresponding visual or augmented reality data so that time 1020 may be associated with geolocation). Claim 22: Ju and Poddar teach the method of claim 18, wherein the contextual information includes a location of the device at a time at which the image of the environment was captured (Ju, [0054] note Geolocation A 1018 may be used to add identifying data and characteristics to bike 1006 so that bike 1006 is associated with a geolocation). Claim 23: Ju and Poddar teach the method of claim 18, wherein the contextual information includes an application executing on the device at a time at which the image of the environment was captured (Ju, [0034] note Augmented reality application 120, [0035] note Transaction application 112, [0038] note Other applications 114 may also include email, texting, voice and IM applications that allow a user to send and receive emails, calls, texts, and other notifications through network 150). Claim 24: Ju and Poddar teach the method of claim 18, wherein the contextual information includes an activity of a user of the device being performed at a time at which the image of the environment was captured (Ju, [0053] note a pointer selection 1014 may allow a user to utilize pointer 1010 to select bike 1006 for object identification). Claim 29: Ju and Poddar teach the method of claim 18, wherein generating the response includes generating a verbal response (Poddar, [0140] note the assistant system 140 may send a response to this user request to the user (e.g., “You left them on the kitchen counter at 3:00 PM today”)). Claim 30: Ju and Poddar teach the method of claim 18, wherein generating the response includes displaying a response window including at least a portion of the image of the environment (Poddar, [Fig. 
8] note 801, [0135] note As an example and not by way of limitation, in response to a user query, such as “When was the last time I went skiing?”, the assistant system 140 may provide a response in multiple modalities, such as an audio response (e.g., “It was Mar. 12, 2017 in Austria, with Mary.”) as well as a visual response (e.g., a photo 801 from the actual skiing trip)). Claim 31: Ju teaches a device comprising: an image sensor; a non-transitory memory; and one or more processors to: capture, using the image sensor, an image of an environment (Ju, [Fig. 2A], [Fig. 4], [0064] note At step 402 of flowchart 400, visual data of an environment with a real object is captured, where the visual data may correspond to images, video, and/or a scan of a real-world environment having the object. This may therefore be captured by a user's device, such as a mobile phone, wearable computing device, camera, or the like that includes an optical capture device); detect a user engagement with an object in the environment based on detecting the object in the image of the environment; and in response to detecting the user engagement with the object (Ju, [0053] note a pointer selection 1014 may allow a user to utilize pointer 1010 to select bike 1006 for object identification): obtain information regarding the object (Ju, [0058] note augmented reality objects may correspond to real-world objects, like bike 1006, that includes virtual data for an augmented reality experience. This may include offer A details 1112 that may display additional information for the offer associated with visual indicator 1105 and virtual graphic 1106); obtain contextual information of the device at a time at which the image of the environment was captured (Ju, [0054] note Geolocation A 1018 may be used to add identifying data and characteristics to bike 1006 so that bike 1006 is associated with a geolocation. 
Similarly, a time 1020 may also be added to bike 1006 and/or the corresponding visual or augmented reality data so that time 1020 may be associated with geolocation A 1018 for bike 1006); and store, in a database, an entry including the information regarding the object in association with the contextual information (Ju, [0039] note database 116 may store real-world images and/or virtual graphics or indicators for an augmented reality, as well as offer data for the virtual data of the augmented reality experience). Ju does not explicitly teach receiving a verbal query, wherein the verbal query includes a first set of one or more words relating to information regarding objects in entries of the database and a second set of one or more words relating to contextual information in the entries of the database; selecting one or more of the entries in the database based on the verbal query; and generating a response to the verbal query based on the selected entries. However, Poddar teaches this (Poddar, [0140] note a user may be moving about within a scene, perhaps while wearing smart glasses or while in view of a smart tablet's camera, while the assistant system 140 receives visual data such as a video comprising a plurality of images of the scene. As the images of this visual data are received, the CV module 504 of the assistant system 140 may continuously tag the images with detected objects and/or contexts and store this information as a visual state of the user's field of view, which may in turn be stored in the multimodal dialog state 337… Later, if the user issues a user query, such as “Hey Assistant, where did I leave my keys?”, the assistant system 140 may consult the multimodal dialog state 337 to determine the last image(s) in which the keys were tagged. 
In particular embodiments, the scene understanding engine 520 of the assistant system 140 may then be invoked by forwarding the visual data 503 and/or context 511 to the scene understanding engine 520 to perform heavier processing of the relevant images from the visual data to determine specific entities (e.g., the kitchen counter) and relational information (e.g., that the keys are on top of the counter). Finally, the assistant system 140 may send a response to this user request to the user (e.g., “You left them on the kitchen counter at 3:00 PM today”)). It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the object tracking of Ju with the multi-modal dialog state tracking process that may enable user assistance based on past data about items of Poddar according to known methods (i.e. responding to a user’s query regarding tagged objects). Motivation for doing so is that advantages may include the performance of action prediction that returns a particular set and type of relevant responses to a user query (Poddar, [0014]). Claim 34: Ju and Poddar teach the device of claim 31, wherein the one or more processors are to generate the response by generating a verbal response (Poddar, [0140] note the assistant system 140 may send a response to this user request to the user (e.g., “You left them on the kitchen counter at 3:00 PM today”)). Claim 35: Ju teaches a non-transitory computer-readable medium having instructions encoded thereon which, when executed by a device including a processor and an image sensor, causes the device to: capture, using the image sensor, an image of an environment (Ju, [Fig. 2A], [Fig. 4], [0064] note At step 402 of flowchart 400, visual data of an environment with a real object is captured, where the visual data may correspond to images, video, and/or a scan of a real-world environment having the object. 
This may therefore be captured by a user's device, such as a mobile phone, wearable computing device, camera, or the like that includes an optical capture device); detect a user engagement with an object in the environment based on detecting the object in the image of the environment; and in response to detecting the user engagement with the object (Ju, [0053] note a pointer selection 1014 may allow a user to utilize pointer 1010 to select bike 1006 for object identification): obtain information regarding the object (Ju, [0058] note augmented reality objects may correspond to real-world objects, like bike 1006, that includes virtual data for an augmented reality experience. This may include offer A details 1112 that may display additional information for the offer associated with visual indicator 1105 and virtual graphic 1106); obtain contextual information of the device at a time at which the image of the environment was captured (Ju, [0054] note Geolocation A 1018 may be used to add identifying data and characteristics to bike 1006 so that bike 1006 is associated with a geolocation. Similarly, a time 1020 may also be added to bike 1006 and/or the corresponding visual or augmented reality data so that time 1020 may be associated with geolocation A 1018 for bike 1006); and store, in a database, an entry including the information regarding the object in association with the contextual information (Ju, [0039] note database 116 may store real-world images and/or virtual graphics or indicators for an augmented reality, as well as offer data for the virtual data of the augmented reality experience). 
Ju does not explicitly teach receive a verbal query, wherein the verbal query includes a first set of one or more words relating to information regarding objects in entries of the database and a second set of one or more words relating to contextual information in the entries of the database; select one or more of the entries in the database based on the verbal query; and generate a response to the verbal query based on the selected entries. However, Poddar teaches this (Poddar, [0140] note a user may be moving about within a scene, perhaps while wearing smart glasses or while in view of a smart tablet's camera, while the assistant system 140 receives visual data such as a video comprising a plurality of images of the scene. As the images of this visual data are received, the CV module 504 of the assistant system 140 may continuously tag the images with detected objects and/or contexts and store this information as a visual state of the user's field of view, which may in turn be stored in the multimodal dialog state 337… Later, if the user issues a user query, such as “Hey Assistant, where did I leave my keys?”, the assistant system 140 may consult the multimodal dialog state 337 to determine the last image(s) in which the keys were tagged. In particular embodiments, the scene understanding engine 520 of the assistant system 140 may then be invoked by forwarding the visual data 503 and/or context 511 to the scene understanding engine 520 to perform heavier processing of the relevant images from the visual data to determine specific entities (e.g., the kitchen counter) and relational information (e.g., that the keys are on top of the counter). Finally, the assistant system 140 may send a response to this user request to the user (e.g., “You left them on the kitchen counter at 3:00 PM today”)). 
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the object tracking of Ju with the multi-modal dialog state tracking process that may enable user assistance based on past data about items of Poddar according to known methods (i.e. responding to a user’s query regarding tagged objects). Motivation for doing so is that advantages may include the performance of action prediction that returns a particular set and type of relevant responses to a user query (Poddar, [0014]). Claim 39: Ju and Poddar teach the method of claim 18, wherein the verbal query requests information regarding an object identified by the first set of one or more words detected during a context identified by second set of one or more words (Poddar, [0140] note the user issues a user query, such as “Hey Assistant, where did I leave my keys?”). Claim 40: Ju and Poddar teach the method of claim 18, wherein the verbal query introduces the first set of one or more words with the word 'what' and the second set of one or more words with the word 'when' (Poddar, [0140] note if the user issues a user query, such as “Hey Assistant, where did I leave my keys?”, the assistant system 140 may consult the multimodal dialog state 337 to determine the last image(s) in which the keys were tagged. In particular embodiments, the scene understanding engine 520 of the assistant system 140 may then be invoked by forwarding the visual data 503 and/or context 511 to the scene understanding engine 520 to perform heavier processing of the relevant images from the visual data to determine specific entities (e.g., the kitchen counter) and relational information (e.g., that the keys are on top of the counter). Finally, the assistant system 140 may send a response to this user request to the user (e.g., “You left them on the kitchen counter at 3:00 PM today”)). 
Claim 41: Ju and Poddar teach the method of claim 18, wherein selecting the one or more entries includes selecting the entry based on the first set of one or more words matching the information regarding the object and the second set of one or more words matching the contextual information (Poddar, [0140] note the assistant system 140 may consult the multimodal dialog state 337 to determine the last image(s) in which the keys were tagged. In particular embodiments, the scene understanding engine 520 of the assistant system 140 may then be invoked by forwarding the visual data 503 and/or context 511 to the scene understanding engine 520 to perform heavier processing of the relevant images from the visual data to determine specific entities (e.g., the kitchen counter) and relational information (e.g., that the keys are on top of the counter)). Claim 42: Ju and Poddar teach the method of claim 18, wherein selecting the one or more entries includes selecting at least one first entry matching the first set of one or more words and selecting at least one second entry matching the second set of one or more words (Poddar, [0140] note the assistant system 140 may consult the multimodal dialog state 337 to determine the last image(s) in which the keys were tagged. In particular embodiments, the scene understanding engine 520 of the assistant system 140 may then be invoked by forwarding the visual data 503 and/or context 511 to the scene understanding engine 520 to perform heavier processing of the relevant images from the visual data to determine specific entities (e.g., the kitchen counter) and relational information (e.g., that the keys are on top of the counter). Finally, the assistant system 140 may send a response to this user request to the user (e.g., “You left them on the kitchen counter at 3:00 PM today”)). 
Claim 43: Ju and Poddar teach the method of claim 18, wherein the verbal query is a vocal query (Poddar, [0052] note the assistant system 140 may support both audio input (verbal), [0140] note the user issues a user query, such as “Hey Assistant, where did I leave my keys?”). Claim 44: Ju and Poddar teach the method of claim 18, wherein the response includes information regarding objects in the selected entries (Poddar, [0140] note the assistant system 140 may send a response to this user request to the user (e.g., “You left them on the kitchen counter at 3:00 PM today”)). Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Ju and Poddar in further view of TOMIZUKA et al., US 2022/0222900 A1 (hereinafter “Tomizuka”). Claim 26: Ju and Poddar do not explicitly teach the method of claim 18, wherein detecting the user engagement with the object is further based on detecting that the user has physically interacted with the object. However, Tomizuka teaches this (Tomizuka, [0023] note the object 112 may be a real object that is located at the first location 116(1) and the XR environment 108 may represent a real-world environment at the first location 116(1). The first user may interact with the object 112 at the first location 116(1) via the first user device 102(1) executing the XR collaboration application 106. For instance, the first user may physically manipulate the object 112 by changing the position of the object 112 in a real-world environment that is represented as the XR environment 108). It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the object tracking of Ju and Poddar with the object interaction of Tomizuka according to known methods (i.e. determining a target object based on a user interaction). Motivation for doing so is that this provides better tracking and may more accurately render work being performed in real-time (Tomizuka, [0002]). 

Claim 38 is rejected under 35 U.S.C. 103 as being unpatentable over Ju and Poddar in further view of Keating et al., US 2014/0028712 A1 (hereinafter “Keating”).

Claim 38: Ju and Poddar do not explicitly teach the method of claim 18, wherein detecting the user engagement with the object is further based on detecting that the user has looked at the object for at least a threshold amount of time. However, Keating teaches this (Keating, [0005] note determining whether the object has been selected based at least in part on a set of selection criteria, [0074] note augmentation logic can be configured to overlay audiovisual content (referred to herein as "augmentation") over this view into the real world environment to provide a augmented reality view of the real-world environment. The augmentation logic can provide overlays over the background, foreground, and/or one or more tangible objects within the field of view of the ARD 14, [0047] note the control unit 120 of the ARD may be configured to determine whether the user is looking around. It may perform the following functions, including but not limited to… 3) start augmentation if the user initiates interaction with the object by a) stopping abruptly on the object, b) keeping the object in the camera view for a predetermined period of time, or c) any other direct or indirect means unrelated to velocity). It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the object tracking of Ju and Poddar with the object selection criteria of Keating according to known methods (i.e. determining a target object based on keeping objects in a camera view for a predetermined period of time). Motivation for doing so is that this can improve the conventional augmented reality applications (Keating, [0004]).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Giuseppi Giuliani whose telephone number is (571)270-7128. The examiner can normally be reached Monday-Friday.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley, can be reached at (571)272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/GIUSEPPI GIULIANI/
Primary Examiner, Art Unit 2153
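
The reply periods described in the Conclusion can be turned into nominal calendar dates. The sketch below is a rough illustration, assuming the mailing date is Dec 22, 2025 (the date the prosecution timeline gives for this final rejection) and counting whole calendar months. It does not model the advisory-action adjustment or any carry-over when a date lands on a weekend or federal holiday, and `add_months` is a hypothetical helper, not a USPTO tool.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


# ASSUMPTION: mailing date taken from the "Final Rejection mailed" timeline entry.
mailing_date = date(2025, 12, 22)

two_month_mark = add_months(mailing_date, 2)        # first-reply window in the notice
shortened_period_end = add_months(mailing_date, 3)  # THREE MONTHS shortened period
statutory_limit = add_months(mailing_date, 6)       # SIX MONTHS outer limit

print("Two-month mark:       ", two_month_mark.isoformat())
print("Shortened period ends:", shortened_period_end.isoformat())
print("Six-month limit:      ", statutory_limit.isoformat())
```

With those inputs the two-month mark is Feb 22, 2026, the shortened period nominally ends Mar 22, 2026, and the six-month limit is Jun 22, 2026.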

Prosecution Timeline

Aug 21, 2025: Examiner Interview Summary
Aug 22, 2025: Request for Continued Examination
Aug 26, 2025: Response after Non-Final Action
Sep 15, 2025: Non-Final Rejection mailed (§103)
Dec 05, 2025: Examiner Interview Summary
Dec 05, 2025: Applicant Interview (Telephonic)
Dec 09, 2025: Response Filed
Dec 22, 2025: Final Rejection mailed (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12639293
SYSTEMS AND METHODS FOR PAGINATING SEARCH RESULTS RETRIEVED FROM DATABASES THAT SUPPORT CURSOR-BASED PAGINATION
3y 4m to grant Granted May 26, 2026
Patent 12632499
SYSTEMS AND METHODS FOR USING GRAPH DATA STRUCTURES
1y 8m to grant Granted May 19, 2026
Patent 12613916
SYSTEMS AND METHODS TO INCREASE VIEWERSHIP OF ONLINE CONTENT
5y 0m to grant Granted Apr 28, 2026
Patent 12609824
PARTITIONING A BLOCKCHAIN NETWORK
2y 5m to grant Granted Apr 21, 2026
Patent 12602410
MULTIMODAL CONTEXT SELECTION FOR LARGE LANGUAGE MODEL BASED RESOLUTIONS ADDRESSING TECHNICAL ISSUES
2y 0m to grant Granted Apr 14, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 58%
With Interview: 65% (+6.4%)
Median Time to Grant: 3y 5m (~1y 2m remaining)
PTA Risk: High
Based on 284 resolved cases by this examiner. Grant probability derived from career allowance rate.
