Prosecution Insights
Last updated: April 19, 2026
Application No. 18/772,058

THREE-DIMENSIONAL REASONING USING MULTI-STAGE INFERENCE FOR AUTONOMOUS SYSTEMS AND APPLICATIONS

Non-Final OA: §103, §112
Filed: Jul 12, 2024
Examiner: GUO, XILIN
Art Unit: 2616
Tech Center: 2600 — Communications
Assignee: Nvidia Corporation
OA Round: 1 (Non-Final)
Grant Probability: 82% (Favorable)
OA Rounds: 1-2
To Grant: 2y 5m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 82% (374 granted / 456 resolved; +20.0% vs TC avg, above average)
Interview Lift: +17.4% (strong; allowance rate with vs. without an interview, across resolved cases with an interview)
Avg Prosecution (typical timeline): 2y 5m, with 18 applications currently pending
Total Applications (career history): 474 across all art units
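
These headline figures are simple ratios over the examiner's resolved cases. A minimal sketch of the arithmetic: only the 374/456 split comes from the data above; the with/without-interview split is a hypothetical placeholder (the report does not show per-bucket counts), chosen so the totals reconcile and the lift lands near the displayed +17.4%.

```python
# Career allowance rate, from the card above: granted / resolved.
granted, resolved = 374, 456
allow_rate = granted / resolved        # 0.8202... -> displayed as 82%

# Interview lift = allowance rate with an interview minus the rate without.
# Hypothetical split (NOT from the report), shown only to illustrate the
# computation; buckets sum to the real totals of 374 granted / 456 resolved.
g_with, r_with = 114, 120              # hypothetical: resolved with interview
g_without, r_without = 260, 336        # hypothetical: resolved without
lift = g_with / r_with - g_without / r_without
print(f"allow rate: {allow_rate:.1%}, interview lift: {lift:+.1%}")
```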

Statute-Specific Performance

§101: 7.6% (-32.4% vs TC avg)
§103: 56.3% (+16.3% vs TC avg)
§102: 12.8% (-27.2% vs TC avg)
§112: 19.0% (-21.0% vs TC avg)

Comparisons are against a Tech Center average estimate. Based on career data from 456 resolved cases.

Office Action

§103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(B) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 12-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA, the applicant) regards as the invention.

Claim 12 recites the limitation "3D". The term "3D" is a relative term which renders the claim indefinite. Therefore, the claim is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph. Dependent claims 13-18 are rejected because they depend upon independent claim 12.

Claim 13 depends upon independent claim 12 and recites the limitation "the one or more processors further to obtain, based at least on the application of the one or more images to the one or more machine learning models ...". However, neither claim 13 nor claim 12 earlier recites an "application" of the one or more images, leaving "the application" without antecedent basis. Therefore, the examiner deems the claim indefinite as it fails to particularly point out and distinctly claim what Applicant regards as the invention. Accordingly, the claim is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.

Claim 18 depends upon independent claim 12 and recites the limitation "AI operations". The term "AI" is a relative term which renders the claim indefinite. Therefore, the claim is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.

Claim 19 recites the limitation "3D". The term "3D" is a relative term which renders the claim indefinite. Therefore, the claim is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph. Dependent claim 20 is rejected because it depends upon independent claim 19.

Claim 20 depends upon independent claim 19 and recites the limitation "AI operations". The term "AI" is a relative term which renders the claim indefinite. Therefore, the claim is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-5, 8-9, 12, 14-15 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over KUMAR et al (U.S. Patent Application Publication 2024/0038488 A1) in view of Sholingar et al (U.S. Patent Application Publication 2020/0160070 A1).
Regarding claim 1, KUMAR discloses a method comprising: rendering one or more virtual images (FIG. 1; paragraph [0071], first images 112 captured at a first magnification level ... the first image montage comprising first images 112 may depict a cryogenic electron microscopy grid comprising a plurality of cryogenic electron microscopy grid units) from one or more perspectives using a magnified portion of a three-dimensional (3D) representation of an environment (Paragraph [0050], the user could obtain the model (e.g., the cryo-EM model described herein) and plug it into supported data acquisition software stacks; paragraph [0071], sample 130 being captured by microscope 110 may result in the images of a cryo-EM grid including holes within it. As described herein, a "grid" or "cryo-EM grid" refers to a substrate with which cryo-EM samples (e.g., samples 130) are placed for imaging ...; paragraph [0026], Cryo-EM's ability to elucidate 3D structures of previously uncharacterized, complex molecules has made it a powerful tool to define binding modes of novel compounds in protein targets), the magnified portion of the 3D representation (Paragraph [0071], first images 112 captured at a first magnification level) corresponding to one or more first predicted locations in the environment (Paragraph [0071], a location of where to focus to capture first images 112); obtaining, based at least on applying the one or more virtual images to one or more machine learning models (Paragraph [0072], computing system 120 may be configured to execute a first machine learning (ML) model 122), one or more second predicted locations in the environment (Paragraph [0072], the first image montage comprising first images 112 may be input to first ML model 122 ...; paragraph [0076], the first machine learning model (e.g., first ML model 122) may be configured to determine a predicted location of a center of each of the cryogenic electron microscopy grid units within a corresponding training image from the first plurality of training images); and performing one or more control operations associated with an object (Paragraph [0073], computing system 120 may be configured to detect, using first ML model 122, one or more cryogenic electron microscopy units (e.g., identified squares 132) that satisfy a first quality condition. In one or more examples, the cryo-EM grid units may use first ML model 122 to generate a first score indicating a quality of the cryogenic electron microscopy grid units).

It is noted that KUMAR describes "a location of where to focus to capture first images" and does not describe "one or more first predicted locations in the environment". However, KUMAR describes that the computing system uses machine learning models to determine a predicted location. Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system to predict a first location in the environment using a machine learning model. However, KUMAR does not specifically disclose a machine in the environment. In addition, Sholingar discloses a machine in the environment (Paragraph [0012], the computing device can detect and track traffic objects in an environment around a vehicle, where a traffic object is defined as a rigid or semi-rigid three-dimensional (3D) solid object occupying physical space in the real world surrounding a vehicle. Examples of traffic objects include vehicles and pedestrians, etc., as discussed below in relation to FIG. 2. Detecting and tracking traffic objects can include determining a plurality of estimates of the location of a traffic object with respect to the vehicle to determine motion and thereby predict future locations of traffic objects ... thereby permit a computing device to predict a future location for a traffic object based on a color video image of the vehicle's environment).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR to incorporate the teachings of Sholingar, applying the vehicle information system taught by Sholingar to the system for performing control operations associated with a machine in the environment based on the predicted location. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR according to the relied-upon teachings of Sholingar to obtain the invention as specified in the claim.

Regarding claim 2, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 1), and KUMAR further discloses applying, to the one or more machine learning models substantially contemporaneously with the one or more virtual images (FIG. 1; paragraph [0072], the first image montage comprising first images 112 may be input to first ML model 122; paragraph [0079], the second image montage comprising second images 114 may be input to second ML model 124 ...), one or more token embeddings corresponding to a structured language command (Paragraph [0076], the first machine learning model (e.g., first ML model 122) may be configured to determine a predicted boundary of each of the cryogenic electron microscopy grid units within a corresponding training image from the first plurality of training images. In some embodiments, the predicted boundary may comprise a bounding box enclosing a boundary of each square of the grid ...; paragraph [0079], second ML model 124 may be configured to detect one or more apertures 134 within the cryogenic electron microscopy grid unit that satisfy a second quality condition based on the second image montage. As described herein, a "hole" refers to an aperture within the squares of the cryo-EM grid), wherein the obtaining of the one or more second predicted locations is further based at least on the applying of the one or more token embeddings (Paragraph [0073], computing system 120 may be configured to detect, using first ML model 122, one or more cryogenic electron microscopy units (e.g., identified squares 132) that satisfy a first quality condition ...).
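
Practitioner note: for orientation, the claim 1-2 flow being mapped above (magnified virtual views rendered around coarse predicted locations, fed to a model, optionally alongside token embeddings of a structured language command, to obtain refined locations) can be sketched in a few lines. This is a minimal illustration; every function name, shape, and default below is an assumption, not the applicant's or KUMAR's implementation:

```python
import numpy as np

def refine_locations(scene_3d, first_locations, render_view, model,
                     command_tokens=None, zoom=4.0):
    """Second inference stage: re-render a magnified virtual view around
    each first (coarse) predicted location, then apply a learned model to
    obtain the second (refined) locations and their confidences."""
    refined, confidences = [], []
    for loc in first_locations:
        # Render a virtual image of the 3D representation, magnified
        # around the coarse prediction (claim 1's "magnified portion").
        view = render_view(scene_3d, center=loc, zoom=zoom)
        # Token embeddings for a structured language command may be
        # applied substantially contemporaneously (claim 2).
        out = model(view, tokens=command_tokens)
        refined.append(out["location"])        # second predicted location
        confidences.append(out["confidence"])  # expected to exceed stage 1
    return np.array(refined), np.array(confidences)
```

Claims 4 and 15, discussed next, then amount to asserting that the stage-two confidences exceed the stage-one ones, and claim 1's control operation consumes the refined locations.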
Regarding claim 4, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 1), and KUMAR further discloses wherein the one or more second predicted locations correspond to one or more refined versions of the one or more first predicted locations (FIG. 1; paragraph [0077], computing system 120 may be configured to generate a comparison score. The comparison score may indicate how similar the predicted location is to the predetermined location, how similar the predicted boundary is to the predetermined boundary, or both. In some embodiments, computing system 120 may be configured to update a first set of parameters of the first machine learning model based on the comparisons) such that one or more first confidence scores associated with the one or more first predicted locations (Paragraph [0073], computing system 120 may be configured to detect, using first ML model 122, one or more cryogenic electron microscopy units (e.g., identified squares 132) that satisfy a first quality condition. In one or more examples, the cryo-EM grid units may use first ML model 122 to generate a first score indicating a quality of the cryogenic electron microscopy grid units; paragraph [0078], the first magnification level may comprise 1×, 2×, 3×, etc.) are less than one or more second confidence scores (Paragraph [0080], second ML model 124 may evaluate second images 124 to determine a second score for each aperture. Example images of apertures within a square are illustrated in FIG. 3B. Computing system 120 may input second images 114 into second machine learning model 124 to obtain a second score indicating a likelihood that at least one of a biological structure suspended within each of a plurality of apertures depicted by second images 114; paragraph [0078], the second magnification level may comprise 10×, 20×, 30×, etc.) associated with the one or more second predicted locations (Paragraph [0079], second ML model 124 may be configured to detect one or more apertures 134 within the cryogenic electron microscopy grid unit that satisfy a second quality condition based on the second image montage. As described herein, a "hole" refers to an aperture within the squares of the cryo-EM grid in which vitreous ice and protein are suspended. Identified holes 134 may be provided to microscope 110 with instructions to capture third images 116 (e.g., micrographs) captured at a third magnification level).

Regarding claim 5, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 1), and KUMAR discloses wherein the one or more machine learning models include one or more convex upsampling layers to increase one or more spatial dimensions of one or more feature maps corresponding to the one or more virtual images (FIG. 1; paragraph [0086], computing system 120 may receive third images 116 depicting sample 130 captured at the third magnification level. In one or more examples, the third magnification may comprise 100×, 200×, 300×, etc., and the second magnification level may comprise 10×, 20×, 30×, etc.; FIGS. 3A-3C show images captured using electron microscopy techniques at various magnification levels).
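
Practitioner note: "convex upsampling" (claim 5) is a known decoder technique, popularized by RAFT for optical flow: each fine-grid value is a learned convex combination (softmaxed weights) of a 3x3 coarse neighborhood. A minimal NumPy sketch under assumed shapes, not drawn from the application:

```python
import numpy as np

def convex_upsample(feat, weights, factor=2):
    """Upsample a (H, W) feature map by `factor` using learned convex
    combinations of each 3x3 coarse neighborhood (RAFT-style sketch).

    weights: (H, W, factor, factor, 9) logits over the 3x3 neighborhood.
    """
    H, W = feat.shape
    w = np.exp(weights - weights.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)           # softmax -> convex weights
    padded = np.pad(feat, 1, mode="edge")
    # Gather each pixel's 3x3 neighborhood: (H, W, 9)
    neigh = np.stack([padded[i:i + H, j:j + W]
                      for i in range(3) for j in range(3)], axis=-1)
    up = (w * neigh[:, :, None, None, :]).sum(-1)   # (H, W, factor, factor)
    return up.transpose(0, 2, 1, 3).reshape(H * factor, W * factor)
```

Because the weights are non-negative and sum to one, the upsampled map stays within the range of its coarse neighborhood, which is the property the "convex" in the claim term refers to.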
Regarding claim 8, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 1), and KUMAR discloses wherein the one or more first predicted locations and the one or more second predicted locations correspond to at least one of: one or more objects in the environment (FIG. 1; paragraph [0072], first ML model 122 may be configured to detect one or more cryogenic electron microscopy grid units (e.g., squares) that satisfy a first quality condition based on the first image montage); or one or more positions associated with one or more key poses of the machine.

Regarding claim 9, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 1), and KUMAR further discloses: generating the 3D representation of the environment based at least on applying one or more images depicting the environment (FIG. 1; computing system 120 may be configured to generate a 3D representation of the biological structure (e.g., sample 130)) to a neural network (Paragraph [0071], sample 130 being captured by microscope 110 ...; paragraph [0075], the first ML model may be a convolutional neural network; paragraph [0084], the second ML model may be a convolutional neural network); and obtaining the one or more first predicted locations in the environment based at least on applying, to one or more second machine learning models (Paragraph [0076], the first machine learning model (e.g., first ML model 122) may be configured to determine a predicted location of a center of each of the cryogenic electron microscopy grid units ...), one or more second virtual images depicting the 3D representation of the environment from one or more second perspectives (Paragraph [0078], second images 114 may be captured by microscope 110 at a second magnification level).

Regarding claim 12, KUMAR discloses a system comprising: one or more processors (FIG. 1; paragraph [0040], computing system 120 may execute these machine learning models utilizing one or more processing devices (e.g., computing system 1500 discussed below with respect to FIG. 15) that may include hardware (e.g., a general purpose processor, a graphic processing unit (GPU))) to: generate an updated version of a 3D representation of an environment (Paragraph [0050], the user could obtain the model (e.g., the cryo-EM model described herein) and plug it into supported data acquisition software stacks; paragraph [0071], sample 130 being captured by microscope 110 may result in the images of a cryo-EM grid including holes within it. As described herein, a "grid" or "cryo-EM grid" refers to a substrate with which cryo-EM samples (e.g., samples 130) are placed for imaging ...; paragraph [0026], Cryo-EM's ability to elucidate 3D structures of previously uncharacterized, complex molecules has made it a powerful tool to define binding modes of novel compounds in protein targets; paragraph [0072], computing system 120 may be configured to execute a first machine learning (ML) model 122 ... the first image montage comprising first images 112 may be input to first ML model 122 ...; paragraph [0076], the first machine learning model (e.g., first ML model 122) may be configured to determine a predicted location of a center of each of the cryogenic electron microscopy grid units within a corresponding training image from the first plurality of training images), the updated version including a magnified portion of the 3D representation based at least on one or more first predictions (Paragraphs [0072]-[0073], the first image montage comprising first images 112 may be input to first ML model 122 ... computing system 120 may be configured to detect, using first ML model 122, one or more cryogenic electron microscopy units (e.g., identified squares 132). As shown in FIG. 1, the identified squares 132 is provided to microscope 110 with instructions to capture second images 114) associated with the magnified portion (Paragraph [0071], first images 112 captured at a first magnification level); apply, to one or more machine learning models (Paragraph [0079], computing system 120 may be configured to execute a second machine learning (ML) model 124), one or more images depicting the magnified portion of the 3D representation (Paragraph [0078], computing system 120 may be configured to, for each of the one or more cryogenic electron microscopy grid units, receive a second image montage comprising second images of the cryogenic electron microscopy grid unit captured at a second magnification level); and perform one or more operations associated with an object (Paragraph [0079], the second image montage comprising second images 114 may be input to second ML model 124. Second ML model 124 may be configured to detect one or more apertures 134 within the cryogenic electron microscopy grid unit that satisfy a second quality condition based on the second image montage. As described herein, a "hole" refers to an aperture within the squares of the cryo-EM grid in which vitreous ice and protein are suspended) based at least on one or more second predictions obtained using the one or more machine learning models (Paragraph [0084], the second machine learning model may be configured to determine a predicted location of a center of each of the apertures (e.g., holes) ...).

However, KUMAR does not specifically disclose a machine in the environment. In addition, Sholingar discloses a machine in the environment (Paragraph [0012], the computing device can detect and track traffic objects in an environment around a vehicle, where a traffic object is defined as a rigid or semi-rigid three-dimensional (3D) solid object occupying physical space in the real world surrounding a vehicle. Examples of traffic objects include vehicles and pedestrians, etc., as discussed below in relation to FIG. 2. Detecting and tracking traffic objects can include determining a plurality of estimates of the location of a traffic object with respect to the vehicle to determine motion and thereby predict future locations of traffic objects ... thereby permit a computing device to predict a future location for a traffic object based on a color video image of the vehicle's environment). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR to incorporate the teachings of Sholingar, applying the vehicle information system taught by Sholingar to the system for performing control operations associated with a machine in the environment based on the predicted location. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR according to the relied-upon teachings of Sholingar to obtain the invention as specified in the claim.
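
Practitioner note: claim 9 (mapped above) adds a stage zero ahead of the refinement loop: build the 3D representation itself from camera images with a neural network, render wide virtual views of it, and take the first (coarse) predicted locations from a second set of models. A sketch with all names assumed, as a companion to the refinement sketch earlier:

```python
def stage_zero(images, recon_net, render_view, coarse_model, n_views=4):
    """Generate the 3D representation from 2D images (claim 9), then
    predict the first (coarse) locations from wide virtual views."""
    scene_3d = recon_net(images)   # neural 3D reconstruction (assumed)
    first_locations = []
    for k in range(n_views):
        # Un-magnified views; claim 10 later requires the refinement
        # stage's zoom factor to exceed the zoom factor used here.
        view = render_view(scene_3d, perspective=k, zoom=1.0)
        first_locations.extend(coarse_model(view)["locations"])
    return scene_3d, first_locations
```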
Regarding claim 14, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 12), and KUMAR discloses wherein the one or more second predictions correspond to one or more refined versions of the one or more first predictions (FIG. 1; paragraph [0077], computing system 120 may be configured to generate a comparison score. The comparison score may indicate how similar the predicted location is to the predetermined location, how similar the predicted boundary is to the predetermined boundary, or both. In some embodiments, computing system 120 may be configured to update a first set of parameters of the first machine learning model based on the comparisons).

Regarding claim 15, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 14), and KUMAR discloses wherein the one or more second predictions are associated with one or more greater confidence scores (FIG. 1; paragraph [0080], second ML model 124 may evaluate second images 124 to determine a second score for each aperture. Example images of apertures within a square are illustrated in FIG. 3B. Computing system 120 may input second images 114 into second machine learning model 124 to obtain a second score indicating a likelihood that at least one of a biological structure suspended within each of a plurality of apertures depicted by second images 114; paragraph [0078], the second magnification level may comprise 10×, 20×, 30×, etc.) than the one or more first predictions (Paragraph [0073], computing system 120 may be configured to detect, using first ML model 122, one or more cryogenic electron microscopy units (e.g., identified squares 132) that satisfy a first quality condition. In one or more examples, the cryo-EM grid units may use first ML model 122 to generate a first score indicating a quality of the cryogenic electron microscopy grid units; paragraph [0078], the first magnification level may comprise 1×, 2×, 3×, etc.).

Regarding claim 18, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 12), and KUMAR discloses wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing one or more deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing one or more generative AI operations; a system for performing operations using a large language model; a system for performing operations using one or more vision language models (VLMs); a system for performing operations using one or more multi-modal language models; a system for performing one or more conversational AI operations (Paragraph [0053], underlying foundational concepts and terms of art relied upon may relate to one or more of the following: software development, Artificial Intelligence/Machine Learning (AI/ML); paragraph [0067], a process for attributing special importance to a subset of the labeled images used during the training of the micrograph evaluation convolutional neural network (CNN)); a system for generating synthetic data; a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

Regarding claim 19, KUMAR discloses at least one processor (FIG. 1; paragraph [0040], computing system 120 may execute these machine learning models utilizing one or more processing devices (e.g., computing system 1500 discussed below with respect to FIG. 15) that may include hardware (e.g., a general purpose processor, a graphic processing unit (GPU), an application-specific integrated circuit (ASIC), a system-on-chip (SoC))) comprising: processing circuitry (Paragraph [0040], an application-specific integrated circuit (ASIC), a system-on-chip (SoC)) to perform one or more operations associated with an object (Paragraph [0050], the user could obtain the model (e.g., the cryo-EM model described herein) and plug it into supported data acquisition software stacks; paragraph [0071], sample 130 being captured by microscope 110 may result in the images of a cryo-EM grid including holes within it. As described herein, a "grid" or "cryo-EM grid" refers to a substrate with which cryo-EM samples (e.g., samples 130) are placed for imaging ...; paragraph [0026], Cryo-EM's ability to elucidate 3D structures of previously uncharacterized, complex molecules has made it a powerful tool to define binding modes of novel compounds in protein targets; paragraph [0079], the second image montage comprising second images 114 may be input to second ML model 124. Second ML model 124 may be configured to detect one or more apertures 134 within the cryogenic electron microscopy grid unit that satisfy a second quality condition based on the second image montage. As described herein, a "hole" refers to an aperture within the squares of the cryo-EM grid in which vitreous ice and protein are suspended), the one or more updated predictions generated based at least on applying, to one or more machine learning models (Paragraph [0079], computing system 120 may be configured to execute a second machine learning (ML) model 124), one or more images depicting a magnified portion of a 3D representation of the environment (Paragraph [0078], computing system 120 may be configured to, for each of the one or more cryogenic electron microscopy grid units, receive a second image montage comprising second images of the cryogenic electron microscopy grid unit captured at a second magnification level), the magnified portion corresponding to one or more locations associated with one or more initial predictions (Paragraphs [0072]-[0073], the first image montage comprising first images 112 may be input to first ML model 122 ... computing system 120 may be configured to detect, using first ML model 122, one or more cryogenic electron microscopy units (e.g., identified squares 132). As shown in FIG. 1, the identified squares 132 is provided to microscope 110 with instructions to capture second images 114).

However, KUMAR does not specifically disclose a machine in the environment. In addition, Sholingar discloses a machine in the environment (Paragraph [0012], the computing device can detect and track traffic objects in an environment around a vehicle, where a traffic object is defined as a rigid or semi-rigid three-dimensional (3D) solid object occupying physical space in the real world surrounding a vehicle. Examples of traffic objects include vehicles and pedestrians, etc., as discussed below in relation to FIG. 2. Detecting and tracking traffic objects can include determining a plurality of estimates of the location of a traffic object with respect to the vehicle to determine motion and thereby predict future locations of traffic objects ... thereby permit a computing device to predict a future location for a traffic object based on a color video image of the vehicle's environment). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR to incorporate the teachings of Sholingar, applying the vehicle information system taught by Sholingar to the system for performing control operations associated with a machine in the environment based on the predicted location. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR according to the relied-upon teachings of Sholingar to obtain the invention as specified in the claim.

Regarding claim 20, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 19), and KUMAR discloses wherein the processor is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing one or more deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing one or more generative AI operations; a system for performing operations using a large language model; a system for performing operations using one or more vision language models (VLMs); a system for performing operations using one or more multi-modal language models; a system for performing one or more conversational AI operations (Paragraph [0053], underlying foundational concepts and terms of art relied upon may relate to one or more of the following: software development, Artificial Intelligence/Machine Learning (AI/ML); paragraph [0067], a process for attributing special importance to a subset of the labeled images used during the training of the micrograph evaluation convolutional neural network (CNN)); a system for generating synthetic data; a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

Claims 3, 11 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over KUMAR et al (U.S. Patent Application Publication 2024/0038488 A1) in view of Sholingar et al (U.S. Patent Application Publication 2020/0160070 A1), and further in view of Chatzikalymnios et al (U.S. Patent Application Publication 2025/0068228 A1).
Regarding claim 3, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 1). However, KUMAR does not specifically disclose further comprising obtaining, based at least on the applying of the one or more virtual images to the one or more machine learning models, one or more heatmaps indicative of the one or more second predicted locations. In addition, Chatzikalymnios discloses (Paragraph [0029], An XR device can receive a sequence of images and track one or more objects depicted in the images in a 3D space. The XR device may utilize various parameters to track an object ... and predictive information (e.g., using a machine learning model to predict object motion)) further comprising obtaining, based at least on the applying of the one or more virtual images to the one or more machine learning models (FIG. 2; paragraph [0064], The multi-camera object tracking system 216 may implement two phases of object tracking: ... Various algorithms, including algorithms implemented by object tracking machine learning models), one or more heatmaps indicative of the one or more second predicted locations (FIG. 3; paragraph [0086], The inference component 314 may be used to generate tracking estimates or predictions, e.g., to predict the location or pose of a tracked object. As mentioned, the XR device 110 may utilize one or more object tracking machine learning models for this purpose ... The machine learning model may, in some examples, be known as a core tracker. A core tracker is used in computer vision systems to track the movement of an object in a sequence of images or videos ...; paragraph [0137], the multi-camera object tracking system may be configured to generate a heatmap across the relevant image that returns a score for each projected location).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR in view of Sholingar to incorporate the teachings of Chatzikalymnios, applying the multi-camera object tracking system taught by Chatzikalymnios to the system, using the machine learning model to generate the heatmaps based on the predicted location. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR in view of Sholingar according to the relied-upon teachings of Chatzikalymnios to obtain the invention as specified in the claim.

Regarding claim 11, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 1). However, KUMAR does not specifically disclose wherein the one or more second predicted locations include one or more two-dimensional (2D) space predictions corresponding to virtual images of the one or more virtual images, the method further comprising: mapping the 2D space predictions into a 3D space; generating, based at least on the mapping, one or more 3D space predictions; and performing one or more second control operations associated with the machine based at least on the one or more 3D space predictions. In addition, Chatzikalymnios discloses (Paragraph [0029], An XR device can receive a sequence of images and track one or more objects depicted in the images in a 3D space. The XR device may utilize various parameters to track an object ... and predictive information (e.g., using a machine learning model to predict object motion)) wherein the one or more second predicted locations include one or more two-dimensional (2D) space predictions (FIG. 2; paragraph [0118], the multi-camera object tracking system 216 then uses the predicted location of the object to execute a distance algorithm for determining which subset of the image sensors 208 ... the multi-camera object tracking system 216 may translate the predicted location of the object in the real world onto a camera view image, or onto an image plane associated with the particular camera, to obtain the predicted location as it would appear on an image captured by the particular camera. This may be referred to as a 2D projected location) corresponding to virtual images of the one or more virtual images (FIG. 3; paragraph [0086], The inference component 314 may be used to generate tracking estimates or predictions, e.g., to predict the location or pose of a tracked object. As mentioned, the XR device 110 may utilize one or more object tracking machine learning models for this purpose ... The machine learning model may, in some examples, be known as a core tracker. A core tracker is used in computer vision systems to track the movement of an object in a sequence of images or videos), the method further comprising: mapping the 2D space predictions into a 3D space (Paragraph [0118], the multi-camera object tracking system 216 may utilize the pose of the XR device 110 to calculate where a particular 3D point should appear on the 2D plane of a camera. The 2D projected location may be obtained by applying, for example, a projection matrix to the 3D coordinates of the relevant object); generating, based at least on the mapping, one or more 3D space predictions (Paragraph [0132], it will be appreciated that predetermined calibrations and/or transformations stored for each respective camera may be used by the multi-camera object tracking system to generate or update these 2D projections, together with other data such as the pose of the XR device. In some examples, the predicted 3D location of the physical object 108 in the real-world environment 102 is a 3D position in a defined coordinate system that can be related in a specific manner to each respective camera view based on predefined or predetermined calibrations, e.g., during setup of the cameras of the multi-camera object tracking system or in some offline calibration step); and performing one or more second control operations associated with the machine based at least on the one or more 3D space predictions (FIG. 1; paragraph [0131], the 3D location of the physical object 108 in the real-world environment 102, as predicted by the multi-camera object tracking system for the current frame, is then projected onto a left camera view image 902 associated with the left camera and a right camera view image 904 associated with the right camera).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR in view of Sholingar to incorporate the teachings of Chatzikalymnios, applying the multi-camera object tracking system taught by Chatzikalymnios to the system, using the machine learning model to map the 2D space predictions into a 3D space and project the predicted location of the object onto the image. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR in view of Sholingar according to the relied-upon teachings of Chatzikalymnios to obtain the invention as specified in the claim.
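
Practitioner note: claims 3 and 13 recite heatmap outputs, and claim 11 maps the resulting 2D predictions back into 3D space. Both ideas fit in a few lines; the pinhole intrinsics and camera pose inputs below are assumptions for illustration, not something recited in the application or the cited art:

```python
import numpy as np

def heatmap_to_3d(heatmap, depth, K, cam_to_world):
    """Take the peak of a predicted location heatmap (claims 3/13) and
    unproject it to a 3D point (claim 11). K is a 3x3 intrinsic matrix,
    cam_to_world a 4x4 pose; depth is the assumed distance along the ray."""
    v, u = np.unravel_index(np.argmax(heatmap), heatmap.shape)  # 2D peak
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])              # pixel -> ray
    p_cam = ray * depth                                         # scale by depth
    p_world = cam_to_world @ np.append(p_cam, 1.0)              # camera -> world
    return p_world[:3]
```

The second control operation of claim 11 would then consume the returned world-space point rather than the raw 2D peak.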
Regarding claim 13, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 12). However, KUMAR does not specifically disclose the one or more processors further to obtain, based at least on the application of the one or more images to the one or more machine learning models, one or more heatmaps indicative of one or more locations corresponding to the one or more second predictions. In addition, Chatzikalymnios discloses (Paragraph [0029], An XR device can receive a sequence of images and track one or more objects depicted in the images in a 3D space. The XR device may utilize various parameters to track an object ... and predictive information (e.g., using a machine learning model to predict object motion)) the one or more processors further to obtain, based at least on the application of the one or more images to the one or more machine learning models (FIG. 2; paragraph [0064], The multi-camera object tracking system 216 may implement two phases of object tracking: ... Various algorithms, including algorithms implemented by object tracking machine learning models), one or more heatmaps indicative of one or more locations corresponding to the one or more second predictions (FIG. 3; paragraph [0086], The inference component 314 may be used to generate tracking estimates or predictions, e.g., to predict the location or pose of a tracked object. As mentioned, the XR device 110 may utilize one or more object tracking machine learning models for this purpose ... The machine learning model may, in some examples, be known as a core tracker. A core tracker is used in computer vision systems to track the movement of an object in a sequence of images or videos ...; paragraph [0137], the multi-camera object tracking system may be configured to generate a heatmap across the relevant image that returns a score for each projected location).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR in view of Sholingar to incorporate the teachings of Chatzikalymnios, applying the multi-camera object tracking system taught by Chatzikalymnios to the system, using the machine learning model to generate the heatmaps based on the predicted location. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR in view of Sholingar according to the relied-upon teachings of Chatzikalymnios to obtain the invention as specified in the claim.

Claims 6-7 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over KUMAR et al (U.S. Patent Application Publication 2024/0038488 A1) in view of Sholingar et al (U.S. Patent Application Publication 2020/0160070 A1), and further in view of Carroll et al (U.S. Patent Application Publication 2020/0117952 A1).

Regarding claim 6, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 1). However, KUMAR does not specifically disclose wherein the one or more virtual images are rendered such that one or more sizes associated with the one or more virtual images are rationally divisible by one or more patch sizes associated with the one or more machine learning models. In addition, Carroll discloses (Abstract, a computer-implemented method for target object position prediction includes receiving, via an RGB camera, a plurality of images depicting one or more persons positioned on a floor ...) wherein the one or more virtual images are rendered (Paragraph [0034], the bottom portion of FIG. 3B shows how the annotated images 380 are created. The input images 305 are pre-processed by the preprocessing module 310, as described above with respect to FIG. 3A) such that one or more sizes associated with the one or more virtual images are rationally divisible by one or more patch sizes (Paragraphs [0026]-[0029], FIG. 3A illustrates an example technique for training a FP CNN ... In addition to the input images 303 and the person location labels 355, a set of camera coordinates 340 is supplied that describes the position of the cameras 302 in real world space. For example, in some embodiments the camera coordinates 340 include latitude and longitude values of the camera, as well as directional information ... the directional information also includes spherical coordinates (i.e., declination and degree of rotation relative to world coordinates) ... A preprocessing module 310 performs any pre-processing necessary to prepare the images for further processing. This pre-processing may include, for example, cropping the input images 303 to a preferred size or to focus on target objects, denoising images, or converting the input images 303 from color to black and white (or vice versa) ...) associated with the one or more machine learning models (Paragraph [0021], upon determining that one or more images comprise the person 110, the one or more input images are fed into the PE model. For the purposes of this discussion, it is assumed that the PE model is a convolutional neural network (CNN)).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR in view of Sholingar to incorporate the teachings of Carroll, applying the method for target object position prediction taught by Carroll to the system, using the machine learning model to render the image based on the degree of rotation and a preferred size. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR in view of Sholingar according to the relied-upon teachings of Carroll to obtain the invention as specified in the claim.
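
Practitioner note: claims 6 and 16 require the rendered image sizes to be divisible by the model's patch size, the standard constraint for patch-based (e.g., ViT-style) encoders. A minimal sketch of snapping a requested render size onto the patch grid; the 14-pixel patch is an assumed example, not a value from the application:

```python
def snap_to_patch_grid(height, width, patch=14):
    """Round a requested render size up to the nearest multiple of the
    model's patch size so the image tiles evenly into patches (sketch)."""
    rounded_h = -(-height // patch) * patch   # ceiling division, rescaled
    rounded_w = -(-width // patch) * patch
    return rounded_h, rounded_w

# e.g., a 500x640 request under a 14-pixel patch becomes 504x644,
# which splits into exactly (504 / 14) * (644 / 14) = 36 * 46 patches.
print(snap_to_patch_grid(500, 640))   # -> (504, 644)
```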
Regarding claim 7, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 1), and the combination of KUMAR in view of Sholingar discloses a machine in the environment (see claim 1). However, KUMAR does not specifically disclose further comprising: determining, based at least on one or more local features corresponding to the one or more second predicted locations, a degree of rotation associated with manipulating an end-effector of the machine; and wherein the one or more control operations include rotating the end-effector of the machine based at least on the degree of rotation. In addition, Carroll discloses (Abstract, a computer-implemented method for target object position prediction includes receiving, via an RGB camera, a plurality of images depicting one or more persons positioned on a floor ...) further comprising: determining, based at least on one or more local features corresponding to the one or more second predicted locations (Paragraph [0005], techniques disclosed herein include identifying and calculating body part key points of a person included in an input image and predicting a position of the person's feet using the image and the calculated body part key points), a degree of rotation (Paragraphs [0026]-[0029], FIG. 3A illustrates an example technique for training a FP CNN ... In addition to the input images 303 and the person location labels 355, a set of camera coordinates 340 is supplied that describes the position of the cameras 302 in real world space. For example, in some embodiments the camera coordinates 340 include latitude and longitude values of the camera, as well as directional information ... the directional information also includes spherical coordinates (i.e., declination and degree of rotation relative to world coordinates) ... A preprocessing module 310 performs any pre-processing necessary to prepare the images for further processing. This pre-processing may include, for example, cropping the input images 303 to a preferred size or to focus on target objects, denoising images, or converting the input images 303 from color to black and white (or vice versa) ...) associated with manipulating an end-effector of the machine (Paragraph [0021], upon determining that one or more images comprise the person 110, the one or more input images are fed into the PE model. For the purposes of this discussion, it is assumed that the PE model is a convolutional neural network (CNN)); and wherein the one or more control operations include rotating the end-effector of the machine (Paragraph [0034], the bottom portion of FIG. 3B shows how the annotated images 380 are created. The input images 305 are pre-processed by the preprocessing module 310, as described above with respect to FIG. 3A).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR in view of Sholingar to incorporate the teachings of Carroll, applying the method for target object position prediction taught by Carroll to the system, using the machine learning model to render the image based on the degree of rotation and a preferred size. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR in view of Sholingar according to the relied-upon teachings of Carroll to obtain the invention as specified in the claim.

Regarding claim 16, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 12). However, KUMAR does not specifically disclose wherein the one or more images are generated such that one or more sizes associated with the one or more images are rationally divisible by one or more patch sizes associated with the one or more machine learning models. In addition, Carroll discloses (Abstract, a computer-implemented method for target object position prediction includes receiving, via an RGB camera, a plurality of images depicting one or more persons positioned on a floor ...) wherein the one or more images are generated (Paragraph [0034], the bottom portion of FIG. 3B shows how the annotated images 380 are created. The input images 305 are pre-processed by the preprocessing module 310, as described above with respect to FIG. 3A) such that one or more sizes associated with the one or more images are rationally divisible by one or more patch sizes (Paragraphs [0026]-[0029], FIG. 3A illustrates an example technique for training a FP CNN ... In addition to the input images 303 and the person location labels 355, a set of camera coordinates 340 is supplied that describes the position of the cameras 302 in real world space. For example, in some embodiments the camera coordinates 340 include latitude and longitude values of the camera, as well as directional information ... the directional information also includes spherical coordinates (i.e., declination and degree of rotation relative to world coordinates) ... A preprocessing module 310 performs any pre-processing necessary to prepare the images for further processing. This pre-processing may include, for example, cropping the input images 303 to a preferred size or to focus on target objects, denoising images, or converting the input images 303 from color to black and white (or vice versa) ...) associated with the one or more machine learning models (Paragraph [0021], upon determining that one or more images comprise the person 110, the one or more input images are fed into the PE model. For the purposes of this discussion, it is assumed that the PE model is a convolutional neural network (CNN)).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR in view of Sholingar to incorporate the teachings of Carroll, applying the method for target object position prediction taught by Carroll to the system, using the machine learning model to render the image based on the degree of rotation and a preferred size. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR in view of Sholingar according to the relied-upon teachings of Carroll to obtain the invention as specified in the claim.

Regarding claim 17, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 12), and the combination of KUMAR in view of Sholingar discloses a machine in the environment (see claim 12). However, KUMAR does not specifically disclose the one or more processors further to: determine, based at least on one or more local features corresponding to the one or more second predictions, a degree of rotation associated with manipulating an end-effector of the machine; and wherein the one or more operations include rotating the end-effector of the machine based at least on the degree of rotation. In addition, Carroll discloses (Abstract, a computer-implemented method for target object position prediction includes receiving, via an RGB camera, a plurality of images depicting one or more persons positioned on a floor ...) the one or more processors further to: determine, based at least on one or more local features corresponding to the one or more second predictions (Paragraph [0005], techniques disclosed herein include identifying and calculating body part key points of a person included in an input image and predicting a position of the person's feet using the image and the calculated body part key points), a degree of rotation (Paragraphs [0026]-[0029], FIG. 3A illustrates an example technique for training a FP CNN ... In addition to the input images 303 and the person location labels 355, a set of camera coordinates 340 is supplied that describes the position of the cameras 302 in real world space. For example, in some embodiments the camera coordinates 340 include latitude and longitude values of the camera, as well as directional information ... the directional information also includes spherical coordinates (i.e., declination and degree of rotation relative to world coordinates) ... A preprocessing module 310 performs any pre-processing necessary to prepare the images for further processing. This pre-processing may include, for example, cropping the input images 303 to a preferred size or to focus on target objects, denoising images, or converting the input images 303 from color to black and white (or vice versa) ...) associated with manipulating an end-effector of the machine (Paragraph [0021], upon determining that one or more images comprise the person 110, the one or more input images are fed into the PE model. For the purposes of this discussion, it is assumed that the PE model is a convolutional neural network (CNN)); and wherein the one or more operations include rotating the end-effector of the machine (Paragraph [0034], the bottom portion of FIG. 3B shows how the annotated images 380 are created. The input images 305 are pre-processed by the preprocessing module 310, as described above with respect to FIG. 3A).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR in view of Sholingar to incorporate the teachings of Carroll, applying the method for target object position prediction taught by Carroll to the system, using the machine learning model to render the image based on the degree of rotation and a preferred size. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR in view of Sholingar according to the relied-upon teachings of Carroll to obtain the invention as specified in the claim.
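
Practitioner note: claims 7 and 17 derive a degree of rotation for an end-effector from local features at the refined locations. One conventional way to get an in-plane angle from a local image patch is the dominant-gradient (structure-tensor) orientation; the sketch below is illustrative of that generic technique, not the applicant's method:

```python
import numpy as np

def grasp_rotation_from_features(local_patch):
    """Estimate an in-plane rotation angle (degrees) from image gradients
    in the patch around a refined predicted location (claims 7/17 sketch):
    the dominant gradient direction approximates the object's orientation."""
    gy, gx = np.gradient(local_patch.astype(float))
    # Doubled-angle averaging handles the 180-degree ambiguity of edges.
    angle = 0.5 * np.arctan2((2 * gx * gy).sum(), (gx**2 - gy**2).sum())
    return np.degrees(angle)   # degree of rotation for the end-effector
```

The control operation would then rotate the end-effector by the returned angle before manipulating the object.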
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over KUMAR et al (U.S. Patent Application Publication 2024/0038488 A1) in view of Sholingar et al (U.S. Patent Application Publication 2020/0160070 A1), and further in view of Bhatnagar et al (U.S. Patent Application Publication 2025/0341941 A1).

Regarding claim 10, the combination of KUMAR in view of Sholingar discloses everything claimed as applied above (see claim 9). However, KUMAR does not specifically disclose wherein a first zoom factor associated with the one or more virtual images is greater than a second zoom factor associated with the one or more second virtual images. In addition, Bhatnagar discloses (Abstract, while a view of a three-dimensional environment is visible via a display generation component, a computer system automatically detects an object in the three-dimensional environment ...; paragraph [0231], FIG. 7C illustrates a virtual magnifier 7032 magnifying (or increasing the size of) a first portion of application user interface 7030, according to some embodiments. For example, virtual magnifier 7032 magnifies search field 7036, among other user interface elements. In some embodiments, virtual magnifier 7032 magnifies portions of the view of the three-dimensional environment 7000′ on which user 7002 is focused) wherein a first zoom factor associated with the one or more virtual images is greater than a second zoom factor associated with the one or more second virtual images (Paragraph [0232], Virtual magnifier 7032 includes slider 7034, which is a user interface element for adjusting a zoom level of the virtual magnifier 7032. A user can increase or decrease the zoom or magnification level by interacting (e.g., directly, or indirectly) with slider 7034 ... If the zoom level is decreased in response to user's 7002 input directed at slider 7034, the size of the displayed search field and/or other user interface elements (displayed by or within virtual magnifier 7032) decreases in accordance with the decreased zoom level. Thus, the second zoom factor is less than the first zoom factor when the zoom level is decreased in response to user's 7002 input).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system taught by KUMAR in view of Sholingar to incorporate the teachings of Bhatnagar, applying the view of a three-dimensional environment taught by Bhatnagar to provide a zoom level of the virtual magnifier for controlling the size of the displayed object in the three-dimensional environment. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify KUMAR in view of Sholingar according to the relied-upon teachings of Bhatnagar to obtain the invention as specified in the claim.

Claims 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Bhatnagar et al (U.S. Patent Application Publication 2025/0341941 A1) in view of KUMAR et al (U.S. Patent Application Publication 2024/0038488 A1).
Claims 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Bhatnagar et al (U.S. Patent Application Publication 2025/0341941 A1) in view of KUMAR et al (U.S. Patent Application Publication 2024/0038488 A1).

Regarding claim 21, Bhatnagar discloses a method comprising: determining an area of interest surrounding an object in a virtual representation of an environment (Abstract, while a view of a three-dimensional environment is visible via a display generation component, a computer system automatically detects an object in the three-dimensional environment ...; FIG. 7B; paragraph [0229], a view of a three-dimensional environment 7000′ ...; paragraph [0230], application user interface 7030 includes one or more user interface elements, including a search field 7036 ...); generating, using a virtual camera (Paragraph [0047], a computer system displays a ray extending in the respective direction away from a reference point associated with the user, such as the user's viewpoint, and displays a cursor moving automatically along the ray. In response to one or more additional inputs from the user to stop the movement of the cursor, the cursor is stopped at a particular position along the ray, and a target corresponding to the particular position of the cursor is selected for further interaction) and based at least on zooming-in the virtual camera to magnify a view of the area of interest (FIG. 7C; paragraph [0232], Virtual magnifier 7032 includes slider 7034, which is a user interface element for adjusting a zoom level of the virtual magnifier 7032. A user can increase or decrease the zoom or magnification level by interacting (e.g., directly, or indirectly) with slider 7034 ... If the zoom level is decreased in response to user's 7002 input directed at slider 7034, the size of the displayed search field and/or other user interface elements (displayed by or within virtual magnifier 7032) decreases in accordance with the decreased zoom level. Thus, the second zoom factor is less than the first zoom factor when the zoom level is decreased in response to user's 7002 input), an image depicting a magnified view of the area of interest (Paragraph [0232], if the zoom level is decreased in response to user's 7002 input directed at slider 7034, the size of the displayed search field and/or other user interface elements (displayed by or within virtual magnifier 7032) decreases in accordance with the decreased zoom level. Thus, the second zoom factor is less than the first zoom factor when the zoom level is decreased in response to user's 7002 input).

However, Bhatnagar does not specifically disclose determining, using a machine learning model to analyze the image, a location of the object in the environment; and causing a machine to manipulate the object based at least on the location. In addition, KUMAR discloses (Paragraph [0050], the user could obtain the model (e.g., the cryo-EM model described herein) and plug it into supported data acquisition software stacks; FIG. 1; paragraph [0071], sample 130 being captured by microscope 110 may result in the images of a cryo-EM grid including holes within it. As described herein, a "grid" or "cryo-EM grid" refers to a substrate with which cryo-EM samples (e.g., samples 130) are placed for imaging ...; paragraph [0026], Cryo-EM's ability to elucidate 3D structures of previously uncharacterized, complex molecules has made it a powerful tool to define binding modes of novel compounds in protein targets) determining, using a machine learning model to analyze the image (Paragraph [0071], first images 112 captured at a first magnification level ... the first image montage comprising first images 112 may depict a cryogenic electron microscopy grid comprising a plurality of cryogenic electron microscopy grid units; paragraph [0072], the first image montage comprising first images 112 may be input to first ML model 122 ...), a location of the object in the environment (Paragraph [0076], the first machine learning model (e.g., first ML model 122) may be configured to determine a predicted location of a center of each of the cryogenic electron microscopy grid units within a corresponding training image from the first plurality of training images); and causing a machine (Paragraph [0040], computing system 120 may execute these machine learning models utilizing one or more processing devices (e.g., computing system 1500 discussed below with respect to FIG. 15) that may include hardware (e.g., a general purpose processor, a graphic processing unit (GPU) ...) to manipulate the object based at least on the location (Paragraph [0073], computing system 120 may be configured to detect, using first ML model 122, one or more cryogenic electron microscopy units (e.g., identified squares 132) that satisfy a first quality condition. In one or more examples, the cryo-EM grid units may use first ML model 122 to generate a first score indicating a quality of the cryogenic electron microscopy grid units).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system for viewing a three-dimensional environment taught by Bhatnagar to incorporate the teachings of KUMAR, applying the computing system taught by KUMAR to provide machine learning models for predicting the location in the environment and performing the control operations associated with the object. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Bhatnagar according to the relied-upon teachings of KUMAR to obtain the invention as specified in the claim.
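The KUMAR passages describe a coarse-to-fine, multi-stage inference pattern: a first model screens a low-magnification montage for regions meeting a quality condition, and a second model inspects re-captured, higher-magnification images of those regions. A schematic sketch of that control flow follows; `coarse_model`, `capture_at`, and `fine_model` are hypothetical stand-ins, not APIs from the cited references.

```python
# Schematic of the coarse-to-fine pipeline the KUMAR citations describe:
# stage 1 screens a low-magnification montage, stage 2 re-images the
# surviving regions at higher magnification and detects fine targets.
# All callables and the detection dict format are hypothetical.
def locate_targets(montage, coarse_model, capture_at, fine_model,
                   quality_threshold=0.5):
    # Stage 1: keep detections (e.g., grid squares) that meet the quality condition.
    candidates = [d for d in coarse_model(montage)
                  if d["score"] >= quality_threshold]

    targets = []
    for region in candidates:
        # Stage 2: capture a higher-magnification image centered on the
        # candidate and detect fine-scale targets (e.g., holes) within it.
        zoomed = capture_at(region["center"], magnification="high")
        targets.extend(fine_model(zoomed))
    return targets
```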
Regarding claim 22, the combination of Bhatnagar in view of KUMAR discloses everything claimed as applied above (see claim 21). However, Bhatnagar does not specifically disclose wherein the determining the area of interest comprises: applying a second image depicting the virtual representation of the environment to a second machine learning model; determining, using the second machine learning model, a predicted location of the object in the environment; and determining the area of interest based at least on the predicted location. In addition, KUMAR discloses wherein the determining the area of interest comprises: applying a second image depicting the virtual representation of the environment to a second machine learning model (FIG. 1; paragraphs [0078]-[0079], second images 114 may be captured by microscope 110 at a second magnification level ... computing system 120 may be configured to execute a second machine learning (ML) model 124. The second image montage comprising second images 114 may be input to second ML model 124); determining, using the second machine learning model, a predicted location of the object in the environment (Paragraph [0085], training the second machine learning model may include inputting each of the second plurality of training images to the second machine learning model to obtain an output of a predicted location of a center of each of the apertures (e.g., the holes), or a predicted boundary of each of the apertures (e.g., the holes)); and determining the area of interest based at least on the predicted location (Paragraph [0079], second ML model 124 may be configured to detect one or more apertures 134 within the cryogenic electron microscopy grid unit that satisfy a second quality condition based on the second image montage. As described herein, a "hole" refers to an aperture within the squares of the cryo-EM grid in which vitreous ice and protein are suspended).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system for viewing a three-dimensional environment taught by Bhatnagar to incorporate the teachings of KUMAR, applying the computing system taught by KUMAR to provide machine learning models for predicting the location in the environment and performing the control operations associated with the object. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Bhatnagar according to the relied-upon teachings of KUMAR to obtain the invention as specified in the claim.
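Claim 22's final step, determining the area of interest from a predicted location, is commonly implemented by expanding the predicted center into a bounding box clipped to the image bounds. A minimal sketch of that one step (the margin value and the function name are hypothetical, not from the record):

```python
# Minimal sketch: expand a model's predicted center point into an area
# of interest (an axis-aligned box clipped to the image bounds).
# The 64-pixel margin and the function name are hypothetical.
def area_of_interest(center_xy, image_shape, margin=64):
    cx, cy = center_xy
    h, w = image_shape[:2]
    x0, y0 = max(0, cx - margin), max(0, cy - margin)
    x1, y1 = min(w, cx + margin), min(h, cy + margin)
    return (x0, y0, x1, y1)  # (left, top, right, bottom)
```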
Regarding claim 23, the combination of Bhatnagar in view of KUMAR discloses everything claimed as applied above (see claim 21), and Bhatnagar discloses further comprising: generating, using a second virtual camera (Paragraph [0068], a viewpoint of a user determines what content is visible in the viewport; a viewpoint generally specifies a location and a direction relative to the three-dimensional environment, and as the viewpoint shifts, the view of the three-dimensional environment will also shift in the viewport. For a head mounted device, a viewpoint is typically based on a location and direction of the head, face, and/or eyes of a user to provide a view of the three-dimensional environment that is perceptually accurate and provides an immersive experience when the user is using the head-mounted device. For a handheld or stationed device, the viewpoint shifts as the handheld or stationed device is moved and/or as a position of a user relative to the handheld or stationed device changes (e.g., a user moving toward, away from, up, down, to the right, and/or to the left of the device) ... the viewpoint of the user moves as the field of view of the one or more cameras moves (and the appearance of one or more virtual objects displayed via the one or more display generation components is updated based on the viewpoint of the user (e.g., displayed positions and poses of the virtual objects are updated based on the movement of the viewpoint of the user)); paragraph [0223], ... changes the viewpoint of the user in the three-dimensional environment in accordance with the movement of the display generation component relative to the user's head or face or relative to the physical environment) and based at least on zooming-in the second virtual camera to magnify a second view of the area of interest (FIG. 7B; paragraph [0229], a view of a three-dimensional environment 7000′ ...; paragraph [0230], application user interface 7030 includes one or more user interface elements, including a search field 7036 and control 7038 ...; paragraph [0234], virtual magnifier 7032 is moved from left to right while the distance between the virtual magnifier 7032 and application user interface 7030 remains the same. The second portion of application user interface 7030 that is magnified includes a magnified version 7038′ of control 7038 for initiating a video call), a second image depicting a second magnified view of the area of interest from a different perspective than the image (Paragraph [0231], virtual magnifier 7032 magnifies portions of the view of the three-dimensional environment 7000′ on which user 7002 is focused, where the focus of the user can be determined in a number of ways (e.g., a location of a gaze, a cursor, a location pointed at by a controller, the user's field of view, or other ways); paragraph [0234], FIG. 7D shows the virtual magnifier 7032 at a position different from the position at which the virtual magnifier 7032 was in FIG. 7C ... The second portion of application user interface 7030 that is magnified includes a magnified version 7038′ of control 7038 for initiating a video call).

However, Bhatnagar does not specifically disclose wherein the determining the location of the object is based at least on using the machine learning model to analyze the image and the second image. In addition, KUMAR discloses wherein the determining the location of the object is based at least on using the machine learning model to analyze the image and the second image (FIG. 1; paragraph [0072], the first image montage comprising first images 112 may be input to first ML model 122. First ML model 122 may be configured to detect one or more cryogenic electron microscopy grid units (e.g., squares) that satisfy a first quality condition based on the first image montage ... as shown in FIG. 1, the identified squares 132 are provided to microscope 110 with instructions to capture second images 114; paragraph [0079], the second image montage comprising second images 114 may be input to second ML model 124. Second ML model 124 may be configured to detect one or more apertures 134 within the cryogenic electron microscopy grid unit that satisfy a second quality condition based on the second image montage. As described herein, a "hole" refers to an aperture within the squares of the cryo-EM grid in which vitreous ice and protein are suspended).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the computing system for viewing a three-dimensional environment taught by Bhatnagar to incorporate the teachings of KUMAR, applying the computing system taught by KUMAR to provide machine learning models for predicting the location in the environment and performing the control operations associated with the object. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Bhatnagar according to the relied-upon teachings of KUMAR to obtain the invention as specified in the claim.
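Claim 23 determines the object's location from an image and a second image captured from a different perspective. One textbook way to fuse two such views is midpoint triangulation of the two viewing rays; the sketch below illustrates that standard method only, and is not necessarily the approach taken by the application or the cited references.

```python
# Illustrative midpoint triangulation: estimate a 3D location from two
# camera centers (p1, p2) and unit viewing-ray directions (d1, d2).
# A standard textbook method, shown here only as one way two views can
# be combined; not taken from the record.
import numpy as np

def triangulate(p1, d1, p2, d2):
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b          # approaches 0 as the rays become parallel
    t1 = (b * e - c * d) / denom   # parameter of the closest point on ray 1
    t2 = (a * e - b * d) / denom   # parameter of the closest point on ray 2
    return ((p1 + t1 * d1) + (p2 + t2 * d2)) / 2.0  # midpoint of closest approach
```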
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Xilin Guo, whose telephone number is (571) 272-5786. The examiner can normally be reached Monday through Friday, 9:00 AM to 5:30 PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Hajnik, can be reached at 571-272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/XILIN GUO/
Primary Examiner, Art Unit 2616

Prosecution Timeline

Jul 12, 2024
Application Filed
Feb 13, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602855
LIVE MODEL PROMPTING AND REAL-TIME OUTPUT OF PHOTOREAL SYNTHETIC CONTENT
2y 5m to grant Granted Apr 14, 2026
Patent 12597403
DISPLAY DEVICE FOR A VEHICLE
2y 5m to grant Granted Apr 07, 2026
Patent 12579712
ASSET CREATION USING GENERATIVE ARTIFICIAL INTELLIGENCE
2y 5m to grant Granted Mar 17, 2026
Patent 12579766
SYSTEM AND METHOD FOR RAPID OUTFIT VISUALIZATION
2y 5m to grant Granted Mar 17, 2026
Patent 12573121
Automated Generation and Presentation of Sign Language Avatars for Video Content
2y 5m to grant Granted Mar 10, 2026
Study what changed to get these applications past this examiner, based on the 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
82%
Grant Probability
99%
With Interview (+17.4%)
2y 5m
Median Time to Grant
Low
PTA Risk
Based on 456 resolved cases by this examiner. Grant probability derived from career allow rate.
