Prosecution Insights
Last updated: April 19, 2026
Application No. 17/976,583

Animatable Neural Radiance Fields from Monocular RGB-D Inputs

Final Rejection §103
Filed: Oct 28, 2022
Examiner: HARRISON, CHANTE E
Art Unit: 2615
Tech Center: 2600 — Communications
Assignee: Meta Platforms Technologies, LLC
OA Round: 2 (Final)
Grant Probability: 69% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 4m
With Interview: 97%

Examiner Intelligence

Career Allow Rate: 69% (above average); 497 granted / 725 resolved; +6.6% vs TC avg
Interview Lift: +28.8% (strong), measured over resolved cases with an interview
Typical Timeline: 3y 4m average prosecution; 30 applications currently pending
Career History: 755 total applications across all art units

Statute-Specific Performance

§101: 8.9% (-31.1% vs TC avg)
§103: 40.3% (+0.3% vs TC avg)
§102: 31.8% (-8.2% vs TC avg)
§112: 15.2% (-24.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 725 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

1. This action is responsive to communications: Amendment & Request for Reconsideration, filed on 04/24/2025. This action is made FINAL.

2. Claims 1-25 are pending in the case. Claims 1, 19 and 20 are independent claims. Claims 1, 13, 15, 17, and 19-20 have been amended.

Response to Arguments

Applicant's arguments filed April 24, 2025 have been fully considered but they are not persuasive. Applicant argues (claims 1, 19 and 20) that Brualla fails to disclose “generating, using a temporal transformer, a second latent representation based on tracking and combining temporal relationship between the sequence of image frames and the set of key frames, wherein the second latent representation encodes pose information of the one or more objects”.

In response, Brualla (Para 7) discloses a system of one or more computers that can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them. Further, Brualla (Para 8) discloses that the systems and methods may generate, based on the blending weights and the virtual view, a synthesized image according to the view parameters. Brualla (Para 84) discloses a multiresolution blending engine that employs a two-stage, trained end-to-end convolutional network process, where the engine may utilize a number of source cameras. Additionally, Brualla (Fig. 1; Para 52) discloses that a plurality of input images captured at differing times (Fig. 5) are synthesized or aggregated with keyframe data and fed to an MLP (as Applicant’s Specification Para 42-43 similarly discloses). Brualla (Para 67) discloses that the image processor 216 also includes (and/or generates and/or receives) occlusion maps 226, depth maps 228, UV maps 230, and target view parameters 232 (e.g., second latent representation), where (Para 76) the target view parameters 232 may include image parameters and/or camera parameters associated with an image to be generated (e.g., synthesized). The target view parameters 232 may include a view direction, a pose, a camera perspective, and the like. The interpretation of the camera-captured image content and/or video content may be used in combination with the techniques described herein to create unseen versions and views (e.g., poses, expressions, angles, etc.) of the captured image content and/or video content (Para 37, 66, 71). Thus, Brualla discloses hardware having equivalent function and output to the temporal transformer to generate target view parameters including pose. Therefore, Brualla discloses “generating, using a temporal transformer, a second latent representation based on tracking and combining temporal relationship between the sequence of image frames and the set of key frames, wherein the second latent representation encodes pose information of the one or more objects”.

To the extent that the response to the applicant's arguments may have mentioned new portions of the prior art references which were not used in the prior office action, this does not constitute a new ground of rejection. It is clear that the prior art reference is of record and has been considered entirely by applicant. See In re Boyer, 363 F.2d 455, 458 n.2, 150 USPQ 441, 444, n.2 (CCPA 1966) and In re Bush, 296 F.2d 491, 496, 131 USPQ 263, 267 (CCPA 1961).
The mere fact that additional portions of the same reference may have been mentioned or relied upon does not constitute a new ground of rejection. In re Meinhardt, 392 F.2d 273, 280, 157 USPQ 270, 275 (CCPA 1968).

Priority

Acknowledgment is made of applicant's claim for foreign priority based on an application filed in the Hellenic Republic on September 21, 2022. It is noted, however, that applicant has not filed a certified copy of the GR202201011770 application as required by 37 CFR 1.55.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Martin Brualla et al., WO 2022/216333 A1, and further in view of Sida Peng et al., “Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans”, March 29, 2021.

Independent claim 1, Brualla discloses a method, implemented by a computing system, comprising: accessing a particular image frame of a dynamic scene and depth information associated with the particular image frame, the dynamic scene comprising one or more objects in motion (i.e. the process may retrieve (e.g., capture, obtain, receive, etc.) a number of input images and data (e.g., target view parameters) to predict a novel view (e.g., an unseen color image) – Para 31; view of the target subject generated for a three-dimensional video conference – Para 11; the target is a moving image – Para 29), wherein the depth information is used to generate a point cloud of the particular image frame (i.e. a 2D image may be projected to a 3D proxy model surface in order to generate a mesh proxy geometry 236. The proxy may function to represent a version of the actual geometry of particular image content – Para 74; a reproduced color (e.g., RGB) image, an i^arp value may be calculated to represent reprojected RGB images (e.g., reprojected images 404), using target viewpoint depths (e.g., depth maps 228) reprojected into 3D space – Para 82); generating a first latent representation based on the point cloud, the first latent representation encoding appearance information of the one or more objects depicted in the dynamic scene (i.e. The mesh proxy geometries 236 may represent a coarse geometry that includes a set of K proxies {P_1, ..., P_K} (i.e., rectangular, triangular, etc. meshes with UV coordinates). For example, a 2D image may be projected to a 3D proxy model surface in order to generate a mesh proxy geometry 236. The proxy may function to represent a version of the actual geometry of particular image content. In operation, system 200 uses proxy geometry principles to encode a geometric structure using a set of coarse proxy surfaces (e.g., mesh proxy geometries 236) as well as shape, albedo, and view dependent effects – Para 74); accessing (1) a sequence of image frames of the dynamic scene (i.e. the process may retrieve (e.g., capture, obtain, receive, etc.) a number of input images - Para 31; novel (e.g., unseen) views may include image content and/or video content that has been interpreted (e.g., synthesized, interpolated, modeled, etc.) based on one or more frames of camera-captured image content and/or video content – Para 37) and (2) a set of key frames, wherein the sequence of image frames comprises the one or more objects in motion at a particular time segment (i.e. receive one or more input images (e.g., frames, stream) and/or other capture/feature parameter data and generate a feature-preserving one or more output images (e.g., frames, stream) – Para 75), and wherein the key frames are used to complete missing information of the one or more objects in the sequence of image frames (i.e. in-painter generated content missing from an image – Para 80); generating, using a temporal transformer, a second latent representation based on tracking and combining temporal relationship between the sequence of image frames and the set of key frames, wherein the second latent representation encodes pose information of the one or more objects (i.e. images captured by witness cameras 119, 121 may be captured at substantially the same moment in time as a corresponding one of the other images (e.g., frames) captured by cameras 116, 118, 120, 122, 124, and/or 126, and combinations of such cameras; the system can track the position and orientation of the viewer's head, so that the 3D presentation can be rendered with the appearance corresponding to the viewer's current point of view – Para 52; target view parameters 232 may include image parameters, including a pose – Para 71; and are associated with the image depth map – Para 69, 82); accessing camera parameters for rendering the one or more objects from a desired novel viewpoint (i.e. The view parameters 218 may include camera parameters associated with capture of particular input images 132 and/or associated with capture of an image to be generated (e.g., synthesized). In general, view parameters 218 may represent a camera model approximation – Para 66); generating a third latent representation based on the camera parameters, the third latent representation encoding camera pose information for the rendering (i.e. view parameters 218 may represent a camera model approximation. The view parameters 218 may include any or all of a view direction, a camera perspective – Para 66); and training a neural network based model for free-viewpoint rendering of the dynamic scene based on the first, second, and third latent representations (i.e. generate a plurality of warped images based on the plurality of input images, the plurality of view parameters, and at least one of the plurality of depth images – abstract).

Brualla fails to disclose training an improved neural radiance field (NeRF), which Peng discloses (i.e. NeRF represents scenes with implicit fields of density and color and introduces a set of latent codes, which are used with a network to encode the local geometry and appearance - pg. 3, col. 1, “Neural representation-based methods”; Neural Body generates implicit 3D representations of a human body at different video frames from the same set of latent codes, which are anchored to the vertices of a deformable mesh. For each frame, we transform the spatial locations of codes based on the human pose, and use a network to regress the density and color for any 3D location based on the structured latent codes. Then, images at any viewpoints can be synthesized by the volume rendering - pg. 2, col. 1, Para 1 “The basic idea of Neural Body”; code locations based on human pose are estimated from sparse camera views – pg. 2, col. 1, Para 3 “The basic idea of Neural Body”). It would have been obvious to a POSITA before the effective filing date of the claimed invention at the time the invention was made to combine Peng’s known method of improved neural radiance field with the method of Brualla because enabling structured latent codes and neural networks to be jointly learned by minimizing the difference between the rendered images and input images provides the benefit of representation of a dynamic scene from effectively integrated observations in video.

Claim 2, Brualla discloses the method of Claim 1, wherein training the improved NeRF-based model comprises: generating, by the improved NeRF-based model, the image based on combining color and values of all pixels in the image (i.e. using a learned blending of color and depth views of input images – Para 4); comparing generated image with a ground-truth image to compute a loss (i.e. The loss functions 234 may assess differences between a ground truth image and a predicted image – Para 72); and updating the improved NeRF-based model based on the loss (i.e. the loss functions can include a reconstruction loss based on a reconstruction difference between a segmented ground truth image mapped to activations of layers in a NN and a segmented predicted image mapped to activations of layers in the NN - Para 72). Brualla fails to disclose generating, by the improved NeRF-based model, a color value and a density value, for each pixel, of an image to render; generating, by the improved NeRF-based model, the image based on combining color and density values of all pixels in the image; and updating the improved NeRF-based model based on the loss, which Peng discloses (i.e. use a network to regress the density and color for any 3D location based on the structured latent codes. Then, images at any viewpoints can be synthesized by the volume rendering - pg. 2, col. 1, Para 1 “The basic idea of Neural Body”; NeRF represents scenes with implicit fields of density and color and introduces a set of latent codes, which are used with a network to encode the local geometry and appearance - pg. 3, col. 1, “Neural representation-based methods”; optimize the Neural Body to minimize rendering error of observed images by accounting for loss that measures the difference between the rendered and observed images – pg. 5, sec. 3.5 “Training”). It would have been obvious to a POSITA before the effective filing date of the claimed invention at the time the invention was made to combine Peng’s known method of generating, by the improved NeRF-based model, a color value and a density value, for each pixel, of an image to render; generating, by the improved NeRF-based model, the image based on combining color and density values of all pixels in the image; and updating the improved NeRF-based model based on the loss with the method of Brualla because using a network to regress the density and color for any 3D location based on latent codes provides the benefit of synthesizing images at any viewpoint, while minimizing rendering error.

Claim 3, Brualla discloses the method of Claim 2, wherein the ground-truth image and the image generated by the improved NeRF-based model are associated with a same viewpoint, the same viewpoint being the desired novel viewpoint (i.e. witness cameras can be used by the systems described herein to ensure that image content that is uncaptured in the input images can be used to accurately predict novel views associated with the image content in the input images – Para 39; witness cameras may be used to capture high quality images, which may represent a ground truth image; images captured by witness cameras 119, 121 may be captured at substantially the same moment in time as a corresponding one of the other images (e.g., frames) captured by cameras 116, 118, 120, 122, 124, and/or 126 - Para 52).

Claim 4, Brualla discloses the method of Claim 1 including one or more convolutional layers and down-sampling layers (Para 79) of reprojected images that are obtained from back projecting a target image point to a ray (Para 89) and that function as candidate colors selectable for a pixel in a novel synthesized output image (Para 93). Peng discloses wherein generating the first latent representation comprises: obtaining a query pose of the one or more objects depicted in the dynamic scene by fitting points from the point cloud onto a predetermined body model (i.e. diffuse the latent codes defined on the surface to nearby 3D space – pg. 4, sec. 3.2. Code diffusion, Para 1); extracting, using a sparse convolutional neural network, three-dimensional (3D) features from the query pose (i.e. choose the SparseConvNet to process the structured latent codes – pg. 4, sec. 3.2. Code diffusion, Para 2); generating a 3D volume based on extracted 3D features (i.e. SparseConvNet utilizes 3D sparse convolutions to process the input volume and output latent code volumes with 2×, 4×, 8×, 16× downsampled sizes - pg. 4, sec. 3.2. Code diffusion, Para 2); casting camera rays from a particular point of interest into the 3D volume to extract a subset of 3D features (i.e. given a viewpoint, accumulate volume densities and color along a corresponding camera ray to estimate pixel colors – pg. 4, sec. 3.4. Volume rendering); and encoding, using a neural network, the subset of 3D features into the first latent representation (i.e. the Neural Body predicts volume densities and colors at these points – pg. 4, sec. 3.4. Volume rendering), which Brualla fails to disclose. It would have been obvious to a POSITA before the effective filing date of the claimed invention at the time the invention was made to combine Peng’s known method wherein generating the first latent representation comprises: obtaining a query pose of the one or more objects depicted in the dynamic scene by fitting points from the point cloud onto a predetermined body model; extracting, using a sparse convolutional neural network, three-dimensional (3D) features from the query pose; generating a 3D volume based on extracted 3D features; casting camera rays from a particular point of interest into the 3D volume to extract a subset of 3D features; and encoding, using a neural network, the subset of 3D features into the first latent representation with the method of Brualla because warping or reprojecting input images into reprojected images using the input image colors provides the benefit of reprojecting the input images into an output view that represents a desired novel view (Brualla, Para 93).

Claim 5, Brualla discloses the method of Claim 1, further comprising: accessing second depth information associated with each image frame of the sequence of image frames and the set of key frames; generating, using the second depth information, second point cloud associated with each image frame of the sequence of image frames and the set of key frames; accessing a predetermined body model or three-dimensional (3D) mesh corresponding to the one or more objects; and obtaining a sequence of query poses and a set of key poses corresponding to the sequence of image frames and the set of key frames, respectively, by fitting points from the second point cloud associated with each image frame and each key frame onto the predetermined body model (i.e. The images captured by the 3D content system 100 can be processed and thereafter displayed as a 3D presentation. As depicted in the example of FIG. 1, 3D image of user 104 is presented on the 3D display 110. As such, the user 102 can perceive the 3D image 104' (e.g., of a user) as a 3D representation of the user 104, who may be remotely located from the user 102. Similarly, the 3D image 102' is presented on the 3D display 112. As such, the user 104 can perceive the 3D image 102' as a 3D representation of the user 102 – Fig. 1; Para 56; for each of a plurality of captured images of either the first or second user the plurality of input images are processed and synthesized – Para 10). Similar rationale as applied in the rejection of claims 1 and 4 applies herein.

Claim 6, Brualla discloses the method of Claim 5, further comprising: extracting, using a sparse convolutional neural network, 3D features from each of the sequence of query poses and the set of key poses; generating a set of 3D volumes corresponding to the sequence of query poses and the set of key poses based on extracted 3D features from each of the sequence of query poses and the set of key poses; casting camera rays from a particular point of interest into each of the 3D volumes of the set to extract a subset of 3D features from the 3D volume (i.e. The images captured by the 3D content system 100 can be processed and thereafter displayed as a 3D presentation. As depicted in the example of FIG. 1, 3D image of user 104 is presented on the 3D display 110. As such, the user 102 can perceive the 3D image 104' (e.g., of a user) as a 3D representation of the user 104, who may be remotely located from the user 102. Similarly, the 3D image 102' is presented on the 3D display 112. As such, the user 104 can perceive the 3D image 102' as a 3D representation of the user 102 – Fig. 1; Para 56; for each of a plurality of captured images of either the first or second user, providing one or more convolutional layers and down-sampling layers - Para 79 - of reprojected images that are obtained from back projecting a target image point to a ray - Para 89 - and that function as candidate colors selectable for a pixel in a novel synthesized output image - Para 93). Peng discloses performing point tracking to identify (1) a first correspondence between the point of interest and a same point across the query poses and key poses and (2) a second correspondence between the point of interest and other points in each of the query poses and key poses (i.e. given a viewpoint, accumulate volume densities and color along a corresponding camera ray to estimate pixel colors – pg. 4, sec. 3.4. Volume rendering; predicting densities and colors at points based on distances between adjacent sampled points – pg. 5, sec. 3.4. Volume rendering), which Brualla fails to disclose. Similar rationale as applied in the rejection of claims 1 and 4 applies herein.

Claim 7, Brualla discloses the method of Claim 6. Peng discloses wherein generating, using the temporal transformer, the second latent representation comprises: combining the extracted subset of 3D features from each of the 3D volumes, the first correspondence, and the second correspondence; processing, using the temporal transformer, combined information; and encoding processed combined information into the second latent representation (i.e. Neural Body generates implicit 3D representations of a human body at different video frames from the same set of latent codes, which are anchored to the vertices of a deformable mesh. For each frame, we transform the spatial locations of codes based on the human pose, and use a network to regress the density and color for any 3D location based on the structured latent codes – pg. 2, Fig. 2, col. 1), which Brualla fails to disclose. Similar rationale as applied in the rejection of claims 1 and 6 applies herein.

Claim 8, Brualla discloses the method of Claim 1, further comprising performing the free-viewpoint rendering of a second dynamic scene using the improved NeRF-based model at inference time, wherein performing the free-viewpoint rendering of the second dynamic scene at the inference time comprises: accessing a single image of the second dynamic scene, second depth information associated with the single image, a second desired novel viewpoint from which to render the second dynamic scene, and the set of key frames; generating the first latent representation based on the single image and the second depth information associated with the single image; generating, using the temporal transformer, the second latent representation based on the single image of the dynamic scene, the second depth information associated with the single image, and the set of key frames; generating the third latent representation based on second camera parameters associated with the second desired novel viewpoint; and generating, using the improved NeRF-based model, color and density values for pixels of an image to render from the second desired novel viewpoint (i.e. The images captured by the 3D content system 100 can be processed and thereafter displayed as a 3D presentation. As depicted in the example of FIG. 1, 3D image of user 104 is presented on the 3D display 110. As such, the user 102 can perceive the 3D image 104' (e.g., of a user) as a 3D representation of the user 104, who may be remotely located from the user 102. Similarly, the 3D image 102' is presented on the 3D display 112. As such, the user 104 can perceive the 3D image 102' as a 3D representation of the user 102 – Fig. 1; Para 56). Similar rationale as applied in the rejection of claims 1 and 2 applies herein.

Claim 9, Brualla discloses the method of Claim 8, wherein the second dynamic scene comprises a pose of the one or more objects that was not seen or observed during the training of the improved NeRF-based model (i.e. the second scene is a pose of a second object not used/observed during training of the neural network for the first object – Fig. 1).

Claim 10, Brualla discloses the method of Claim 1, wherein the improved NeRF-based model is trained to perform the free-viewpoint rendering of the one or more objects in the dynamic scene under novel views and unseen poses (i.e. generating novel (e.g., unseen) views of image content – Para 29; enhance images in a 3D video conferencing system such as a telepresence system – Para 31).

Claim 11, Brualla discloses the method of Claim 1, wherein the key frames are used to complete missing information of the one or more objects when the dynamic scene is rendered from a first viewpoint that is different from a second viewpoint from which the sequence of image frames was captured (i.e. the in-painter may function to pull-push hole-filling in images having missing depth information – Para 80; the neural network incorporates view dependent effects modelling the difference between true appearance and a diffuse reprojection – Para 81).

Claim 12, Brualla discloses the method of Claim 1, wherein an object of the one or more objects in the dynamic scene comprises a human in motion (i.e. a target subject may include a user – Para 119; generating a virtual view of a target subject - Para 121; view of the target subject generated for a three-dimensional video conference – Para 11; the target is a moving image – Para 29; enhance images in a 3D video conferencing system such as a telepresence system – Para 31).

Claim 13, Brualla discloses the method of Claim 12, wherein the appearance information comprises one or more of facial characteristics of the human, body characteristics of the human, cloth wrinkles, or details of clothes that the human is wearing (i.e. create unseen versions and views (e.g., poses (body characteristics), expressions (facial characteristics), angles, etc.) of the captured image content – Para 37).

Claim 14, Brualla discloses the method of Claim 1, wherein the camera parameters comprise a spatial location and a viewing direction of the camera from which to render the one or more objects of the dynamic scene (i.e. camera parameters are associated with capture of an image to be generated (e.g., synthesized); the view parameters may include any or all of a view direction, a pose, a camera perspective – Para 119).

Claim 15, Brualla discloses the method of Claim 1, wherein the particular image frame that is used for generating the first latent representation is captured from the desired novel viewpoint (i.e. input images are captured with view parameters including a view direction – Para 66).
Claim 16, Brualla discloses the method of Claim 1, wherein the desired novel viewpoint is provided via user input through one or more input mechanisms (i.e. obtain target view parameters to predict a novel view – Para 31; user input provided via devices providing interaction with a user – Para 147).

Claim 17, Brualla discloses the method of Claim 1, wherein one of the image frames of the sequence of image frames comprises the particular image frame that is used for generating the first latent representation (i.e. The (mesh) proxy may function to represent a version of the actual geometry of particular image content – Para 74).

Claim 18, Brualla discloses the method of Claim 1, wherein each of the first, second, and third latent representations is generated using a neural network (i.e. the image processor 216 may receive (e.g., obtain) one or more input images 132 and/or view parameters 218 and may generate image content for NN, e.g. neural network – Para 65; The image processor 216 also includes (and/or generates and/or receives) occlusion maps 226, depth maps 228, UV maps 230, target view parameters 232, loss functions 234, and mesh proxy geometries 236 – Para 67; At least a portion of the synthesized view 250 may be determined based on output from a neural network (e.g., NN 224) using system 214 each time the user moves a head position while viewing the display and/or each time a particular image changes on the display – Para 84; the depth maps may be provided to the NN – Para 95; a number of view parameters, including any or all of a view direction, a pose, a camera perspective, lens distortions, and/or intrinsic and extrinsic parameters of a camera (virtual or actual camera), may be provided to the NN – Para 96).

Independent claim 19, the claim is similar in scope to claim 1. Therefore, similar rationale as applied in the rejection of claim 1 applies herein.

Independent claim 20, the claim is similar in scope to claim 1. Therefore, similar rationale as applied in the rejection of claim 1 applies herein.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHANTE HARRISON whose telephone number is (571) 272-7659. The examiner can normally be reached Monday - Friday, 8:00 am to 5:00 pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Thompson, can be reached at 571-272-2330. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHANTE E HARRISON/
Primary Examiner, Art Unit 2615
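
The claim 1 limitation in dispute (generating, using a temporal transformer, a pose-encoding latent by tracking and combining temporal relationships between a sequence of image frames and a set of key frames) can be illustrated with a toy sketch. The PyTorch snippet below is hypothetical: the module, feature shapes, and mean pooling are illustrative assumptions, not the applicant's implementation and not anything disclosed by Brualla or Peng.

# Illustrative sketch only: a "temporal transformer" that attends jointly over
# per-frame features of a frame sequence and a set of key frames and pools them
# into a single latent intended to encode pose. Hypothetical shapes and names.
import torch
import torch.nn as nn

class TemporalTransformer(nn.Module):
    def __init__(self, feat_dim: int = 256, n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # learned embedding distinguishing sequence frames from key frames
        self.frame_type = nn.Embedding(2, feat_dim)
        self.to_pose_latent = nn.Linear(feat_dim, feat_dim)

    def forward(self, seq_feats: torch.Tensor, key_feats: torch.Tensor) -> torch.Tensor:
        # seq_feats: (B, T, D) features of the image-frame sequence
        # key_feats: (B, K, D) features of the key frames
        seq = seq_feats + self.frame_type(torch.zeros(
            seq_feats.shape[1], dtype=torch.long, device=seq_feats.device))
        key = key_feats + self.frame_type(torch.ones(
            key_feats.shape[1], dtype=torch.long, device=key_feats.device))
        tokens = torch.cat([seq, key], dim=1)          # attend across both sets
        fused = self.encoder(tokens)                   # temporal relationships
        # pool into the "second latent representation" encoding pose information
        return self.to_pose_latent(fused.mean(dim=1))  # (B, D)

# usage with dummy data: 16 sequence frames, 4 key frames, 256-d features
model = TemporalTransformer()
second_latent = model(torch.randn(1, 16, 256), torch.randn(1, 4, 256))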
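Claim 2 and the cited Peng passages describe the standard NeRF training loop: regress a color value and a density value for each sampled 3D point, composite those values along each camera ray into pixel colors, compare the rendered pixels with a ground-truth image to compute a loss, and update the model. A minimal self-contained sketch of that loop, again with a hypothetical toy MLP and made-up shapes rather than the claimed model:

# Minimal sketch of a NeRF-style step: per-sample color + density, volume
# rendering along rays, photometric loss vs. ground truth, parameter update.
# Hypothetical network and shapes; illustrative only.
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    def __init__(self, in_dim: int = 3, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4))                      # RGB (3) + density (1)

    def forward(self, pts: torch.Tensor):
        out = self.mlp(pts)
        rgb = torch.sigmoid(out[..., :3])              # color in [0, 1]
        sigma = torch.relu(out[..., 3])                # non-negative density
        return rgb, sigma

def volume_render(rgb, sigma, deltas):
    # rgb: (R, S, 3), sigma: (R, S), deltas: (R, S) spacing between samples
    alpha = 1.0 - torch.exp(-sigma * deltas)           # per-sample opacity
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                            # compositing weights
    return (weights.unsqueeze(-1) * rgb).sum(dim=1)    # (R, 3) pixel colors

model = TinyRadianceField()
optim = torch.optim.Adam(model.parameters(), lr=5e-4)

# one toy training step: 1024 rays, 64 samples per ray
pts = torch.rand(1024, 64, 3)                          # sampled 3D points along rays
deltas = torch.full((1024, 64), 0.01)                  # sample spacing
gt_pixels = torch.rand(1024, 3)                        # ground-truth pixel colors

rgb, sigma = model(pts)
pred_pixels = volume_render(rgb, sigma, deltas)
loss = ((pred_pixels - gt_pixels) ** 2).mean()         # photometric loss
optim.zero_grad(); loss.backward(); optim.step()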

Prosecution Timeline

Oct 28, 2022: Application Filed
Oct 25, 2024: Non-Final Rejection (§103)
Apr 24, 2025: Response Filed
May 01, 2025: Response after Non-Final Action
Mar 12, 2026: Final Rejection (§103), current

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597213: GESTURE BASED TACTILE INTERACTION IN EXTENDED REALITY USING FORM FACTOR OF A PHYSICAL OBJECT (2y 5m to grant; granted Apr 07, 2026)
Patent 12592043: Systems, Methods, and Graphical User Interfaces for Displaying and Manipulating Virtual Objects in Augmented Reality Environments (2y 5m to grant; granted Mar 31, 2026)
Patent 12592045: AUGMENTED REALITY SYSTEM AND METHOD (2y 5m to grant; granted Mar 31, 2026)
Patent 12586322: OPTICAL DEVICE FOR AUGMENTED REALITY HAVING GHOST IMAGE PREVENTION FUNCTION (2y 5m to grant; granted Mar 24, 2026)
Patent 12561891: GRAPHICS PROCESSORS (2y 5m to grant; granted Feb 24, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 69%
With Interview: 97% (+28.8%)
Median Time to Grant: 3y 4m
PTA Risk: Moderate
Based on 725 resolved cases by this examiner. Grant probability derived from career allow rate.
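
These projection figures are consistent with simple arithmetic on the career counts shown under Examiner Intelligence; the snippet below reproduces them under that assumption (the tool's actual model is not disclosed on this page).

# Illustrative only: how the headline figures appear to follow from the
# examiner's career counts. Variable names are hypothetical.
granted = 497            # career grants
resolved = 725           # career resolved cases
interview_lift = 0.288   # reported allowance-rate lift when an interview is held

base_grant_probability = granted / resolved              # 497 / 725 = 0.6855...
with_interview = min(base_grant_probability + interview_lift, 1.0)

print(f"Career allow rate:        {base_grant_probability:.1%}")  # ~68.6%, shown as 69%
print(f"Projected with interview: {with_interview:.1%}")          # ~97.4%, shown as 97%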
