Last updated: May 29, 2026
Application No. 18/436,199
3D MASK GENERATION

Non-Final OA §103
Filed: Feb 08, 2024
Examiner: LIU, ZHENGXI
Art Unit: 2611
Tech Center: 2600 (Communications)
Assignee: Snap Inc.
OA Round: 3 (Non-Final)
This examiner grants 64% of cases after interview (+39.8% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 358 resolved cases, 2023–2026.
Examiner Intelligence
LIU, ZHENGXI
Career allowance rate: 64% of resolved cases (229 granted / 358 resolved), +2.0% vs TC avg
Interview lift: +39.8% (resolved cases with interview vs. without)
Typical timeline: 3y 2m average prosecution; 20 currently pending
Career history: 387 total applications across all art units
Statute-Specific Performance (TC avg = Tech Center average estimate; based on career data from 358 resolved cases)
§101: 1.3% (-38.7% vs TC avg)
§103: 94.1% (+54.1% vs TC avg)
§102: 1.6% (-38.4% vs TC avg)
§112: 2.8% (-37.2% vs TC avg)
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 4/20/2026 has been entered.
Claim Status 
Applicant’s amendments and arguments have been received on 4/20/2026.  Claims 1, 19, and 20 have been amended.  Claim 21 has been added.  Claim 2 has been cancelled.  Claims 1, 3-21 are pending. 
Response to Arguments
Some of Applicant’s arguments are persuasive, and they are moot in view of the Examiner’s new ground of rejection. In particular, the Examiner adopts on the record Applicant’s interpretation of “normal map” as explained on Remarks p. 8.
Applicant states: “Second, Wu does not ‘encod[] the 2D image with added noise, the normal map, and input text, into a latent space for a trained neural network’ as recited in amended claim 1.” Remarks p. 8.
The Examiner disagrees. 
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling].
The predicted noise is mapped to [equation image: predicted-noise term].
The encoded input text is mapped to y after embedding.
The 2D image with added noise is mapped to the image represented by [equation image: noisy latent code], which is the result after encoding.
In addition or alternatively, the Examiner’s new reference Github teaches that all inputs and conditions are encoded for a diffusion model.
[Image: Github (ControlNet) excerpt]

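As a rough illustration of what encoding a condition "for a diffusion model" can look like, the sketch below reduces ControlNet-style conditioning to a residual addition through a zero-initialized linear projection, a simplified stand-in for the "zero convolution" layers described for ControlNet. The function names, flat feature vectors, and single linear layer are assumptions for illustration, not the repository's code. The point shown is that, at initialization, the condition leaves the frozen model's features unchanged.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def zero_init_weights(rows: int, cols: int) -> Matrix:
    """Hypothetical stand-in for a ControlNet-style zero convolution: every weight starts at 0."""
    return [[0.0] * cols for _ in range(rows)]


def project(weights: Matrix, condition: Sequence[float]) -> List[float]:
    """Linear projection of the encoded condition (e.g. normal-map features)."""
    return [sum(w * c for w, c in zip(row, condition)) for row in weights]


def inject_condition(latent: Sequence[float], condition: Sequence[float],
                     weights: Matrix) -> List[float]:
    """Residual injection: latent features plus the projected condition."""
    return [f + p for f, p in zip(latent, project(weights, condition))]


latent = [0.5, -1.0, 2.0]           # toy features of the noisy latent inside a frozen encoder
condition = [1.0, 0.0, 0.25, 0.3]   # toy encoded condition (hypothetical values)

# At initialization the condition has no effect on the frozen model's features.
print(inject_condition(latent, condition, zero_init_weights(3, 4)))  # -> [0.5, -1.0, 2.0]
```

Training then updates the projection (in ControlNet, together with a trainable copy of the encoder), so the condition's influence grows from zero rather than disrupting the pretrained model at the start.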
Compact Prosecution 
With respect to Claim Interpretation, the Examiner has provided notes labeled “[BRI on the record]” throughout the Office Action, so that the record is clear about the scope of the claimed invention and about the basis for the Examiner’s analyses. A clear record of the claim interpretation could expedite examination by allowing it to focus on Applicant’s inventive concept and its comparison with the related prior art.
If there are disagreements, Applicant may present an alternative interpretation based on MPEP 2111. The Examiner will adopt Applicant’s interpretation on the record if Applicant’s interpretation is reasonable and/or Applicant’s arguments are persuasive.
Applicant may amend claims relying on the Examiner’s claim interpretation provided on the record. 
Double Patenting
The provisional rejections of Claims 1, 3-8, 10-11, 15-17, and 19-20 on the ground of nonstatutory double patenting are withdrawn in view of Applicant’s amendments to the independent claims.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-10, 14, 16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (“Text-Guided 3D Face Synthesis - From Generation to Editing”) in view of Metzer et al. (“Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures”) and Github (“ControlNet”).
Regarding Claim 1, Wu teaches A computing device comprising: 
one or more processors; and one or more memories storing instructions that, when executed by the one or more processors, configure the computing device to perform operations (
[Image: Wu excerpt] Wu discloses a machine learning model that uses a computer.) comprising:
rendering a trainable three-dimensional (3D) mask to generate a two-dimensional (2D) image, the 3D trainable mask comprising adjustable mesh vertices (
[BRI on the record] With respect to “trainable three-dimensional (3D) mask,” the Examiner reads it to mean a 3D mask generated by a trainable machine learning model, wherein the 3D mask comprises adjustable mesh vertices. The Examiner revised this interpretation in view of Applicant’s amendments.
[Mapping Analysis]
[Image: Wu excerpt]
“In this paper, we propose a unified text-guided framework from face generation to editing. In the generation stage, we propose a geometry-texture decoupled generation to mitigate the loss of geometric details caused by coupling. Besides, decoupling enables us to utilize the generated geometry as a condition for texture generation, yielding highly geometry-texture aligned results. We further employ a finetuned texture diffusion model to enhance texture quality in both RGB and YUV space. In the editing stage, we first employ a pre-trained diffusion model to update facial geometry or texture based on the texts.”  Wu Abstract.
The claimed “trainable three-dimensional (3D) mask” is mapped to the 3D mask generated by the disclosed text-guided learning model. 
Note the masked 3D model generated in response to the text commands “Let him wear a purple Zorro mask” and “Make his lips black” in Wu fig. 1.
Further, the generated 3D mask comprises adjustable mesh vertices, as shown in Wu fig. 2:
[Image: Wu fig. 2 excerpt]
The mesh vertices of the 3DMM-based mesh are adjusted to a geometry g resembling actress Scarlett Johansson.
The two-dimensional (2D) image is mapped to the images generated for animation and relighting shown in Wu fig. 1.);
adding noise to the 2D image to generate a 2D image with added noise (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling formula]
Wu fig. 2 illustrates the formula visually:
[Image: Wu fig. 2]
The image I is the rendered 2D image.
Here, noise is injected/added to the rendered 2D image.
The 2D image with added noise is mapped to the image represented by [equation image: noisy latent code].);

encoding the 2D image with added noise, 
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling].
The predicted noise is mapped to [equation image: predicted-noise term].
The encoded input text is mapped to y after embedding. Therefore, y is in an embedded latent space.
The 2D image with added noise is mapped to the image represented by [equation image: noisy latent code], which is the result after encoding. In addition, that term is the “noisy latent code” in the latent space.
The trained neural network is mapped to SD/ControlNet/InsP2P/TexDiffusion in Wu fig. 2. Wu states, “a pretrained 2D diffusion model ϕ with a denoising function ϵϕ(zt; y, t) to optimize 3D parameters θ.”
The Examiner’s secondary reference Github also has related teachings regarding encoding and the latent space.);
determining a loss between the 2D image with added noise and the predicted noise (
[equation image: loss term]); and
updating positions of the adjustable mesh vertices of the trainable 3D mask based on the loss (
“Score distillation sampling has been proposed in Dream-Fusion [32] for text-to-3D generation.”  Wu 3.1. Preliminaries (Score distillation sampling).
“The employed facial 3D morphable model provides strong priors to ensure the quality of generated geometry. As to the alignment with the input text, we utilize SDS on the network ϕsd of Stable Diffusion [35] to guide the geometry generation.” Wu 3.2.
Here, the 3D morphable model is mesh-based, as shown in Wu fig. 2.
The positions of the adjustable mesh vertices are updated, e.g., in fig. 2 the 3DMM-based mesh model is updated to look like Scarlett Johansson.).
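The render, add-noise, predict-noise, and update steps mapped above follow the score distillation sampling (SDS) pattern from DreamFusion, which Wu cites. In DreamFusion's form, the parameter gradient is w(t)·(ε̂ϕ(zt; y, t) − ε)·∂x/∂θ, with no Jacobian of the diffusion model. The following is a minimal sketch of that update under loudly stated assumptions: the "renderer" is a fixed linear map, the image encoder is treated as the identity (so the noisy image stands in for the noisy latent zt), the noise schedule and the weighting w(t) = 1 − ᾱt are illustrative choices, and `toy_denoiser` is a hypothetical stand-in for the frozen diffusion model. None of these names come from Wu, Metzer, or the application.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def render(R: Matrix, theta: Sequence[float]) -> List[float]:
    """Toy differentiable 'renderer': pixel values are a fixed linear function of vertex parameters."""
    return [sum(r * p for r, p in zip(row, theta)) for row in R]


def alpha_bar(t: float) -> float:
    """Assumed cosine-style cumulative noise schedule for t in (0, 1)."""
    return math.cos(0.5 * math.pi * t) ** 2


def toy_denoiser(z_t: Sequence[float], t: float, target: Sequence[float]) -> List[float]:
    """Hypothetical frozen 'diffusion model': the exact noise predictor for data concentrated
    at `target`, which plays the role of the text condition y."""
    ab = alpha_bar(t)
    return [(z - math.sqrt(ab) * g) / math.sqrt(1.0 - ab) for z, g in zip(z_t, target)]


def sds_step(theta, R, target, rng, lr=1.0):
    x = render(R, theta)                                     # render 3D parameters to a 2D image
    t = rng.uniform(0.02, 0.98)                              # sample a timestep
    ab = alpha_bar(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x]                   # sample Gaussian noise
    z_t = [math.sqrt(ab) * xi + math.sqrt(1.0 - ab) * e for xi, e in zip(x, eps)]
    eps_hat = toy_denoiser(z_t, t, target)                   # predicted noise
    w = 1.0 - ab                                             # assumed weighting w(t)
    # SDS: the residual (eps_hat - eps) is used directly as dL/dx; no denoiser Jacobian.
    grad_x = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    # Chain rule through the linear renderer: dL/dtheta = R^T dL/dx.
    grad_theta = [sum(R[i][j] * grad_x[i] for i in range(len(R))) for j in range(len(theta))]
    return [p - lr * g for p, g in zip(theta, grad_theta)]  # update the vertex parameters


rng = random.Random(0)
R = [[1.0, 0.5], [0.0, 1.0]]
theta = [0.0, 0.0]
target = [1.0, -2.0]
for _ in range(300):
    theta = sds_step(theta, R, target, rng)
print(render(R, theta))  # approaches `target`
```

With this particular stand-in denoiser, ε̂ − ε works out to √(ᾱt/(1 − ᾱt))·(x − target), so each step pulls the rendered image toward the condition; a real diffusion model supplies a noisier signal of the same shape.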
If updating positions of the adjustable mesh vertices of the trainable 3D mask based on the loss also requires updating the trainable machine learning model that generates the 3D mask, Wu is not entirely clear regarding its statement, “Score distillation sampling has been proposed in Dream-Fusion [32] for text-to-3D generation.” Wu 3.1. Preliminaries (Score distillation sampling). A person of ordinary skill in the art would have understood the statement to mean that the loss function based on score distillation sampling is used to update the machine learning model.
Further, Wu does not explicitly disclose:
determining a normal map for the trainable 3D mask; or 
encoding the normal map as a condition into the latent space for the trained neural network.
However, Metzer more clearly explains updating . . . the trainable 3D mask based on the loss (
[Image: Metzer §3.1 excerpt] Metzer 3.1.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s score distillation details with Wu’s score distillation. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model.  Metzer states, “Score Distillation is a method that enables using a diffusion model as a critic, i.e., using it as a loss without explicitly back-propagating through the diffusion process.”  Metzer 3.1.
Wu in view of Metzer does not explicitly disclose 
determining a normal map for the trainable 3D mask; or 
encoding the normal map as an additional condition into the latent space for the trained neural network.
Github teaches 
determining a normal map for the trainable 3D mask (
“This model use normal map. Right now in the APP, the normal is computed from the midas depth map and a user threshold (to determine how many area is background with identity normal face to viewer, tune the "Normal background threshold" in the gradio app to get a feeling).”  Github: normal map 
After the combination of Wu in view of Metzer and Github, the determined normal map is for Wu in view of Metzer’s trainable 3D mask.); and 
encoding the normal map as a condition into the latent space for the trained neural network (
Github teaches that the conditions (including the normal map) are encoded:
[Image: Github (ControlNet) excerpt], where the condition(s) are encoded by the SD Encoder Block.
Github teaches that the condition(s) are for a diffusion model, stating “Official implementation of Adding Conditional Control to Text-to-Image Diffusion Models,” which dovetails with Wu fig. 2’s teaching:
[Image: Wu fig. 2], where one is allowed to add a “condition (optional)” for a diffusion model.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Github’s use of a normal map as a condition for a diffusion model with Wu in view of Metzer. One of ordinary skill in the art would be motivated to enhance the quality of the generated image. Github states, “Compared to depth model, this model seems to be a bit better at preserving the geometry. This is intuitive: minor details are not salient in depth maps, but are salient in normal maps. Below is the depth result with same inputs. You can see that the hairstyle of the man in the input image is modified by depth model, but preserved by the normal model.” Github: normal map.
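The Github passage quoted above says the normal map is computed from a MiDaS depth map plus a user-set background threshold. A minimal sketch of that kind of computation follows, assuming a relative (inverse) depth map in which larger values are closer, central finite differences for the surface slope, and a viewer-facing normal (0, 0, 1) for pixels below the threshold. It is an illustrative reconstruction under those assumptions, not the ControlNet repository's code; the function name, sign convention, and `z_scale` parameter are hypothetical.

```python
import math
from typing import List, Tuple

Normal = Tuple[float, float, float]


def normals_from_depth(depth: List[List[float]], bg_threshold: float,
                       z_scale: float = 1.0) -> List[List[Normal]]:
    """Per-pixel unit normals from a relative-depth map where larger values are closer.

    Pixels below `bg_threshold` are treated as background and get a normal facing
    the viewer, (0, 0, 1). The sign convention and `z_scale` are assumptions.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for i in range(h):
        row = []
        for j in range(w):
            # Central differences, clamped at the image border.
            dx = (depth[i][min(j + 1, w - 1)] - depth[i][max(j - 1, 0)]) / 2.0
            dy = (depth[min(i + 1, h - 1)][j] - depth[max(i - 1, 0)][j]) / 2.0
            if depth[i][j] < bg_threshold:
                dx = dy = 0.0  # background: flat, facing the viewer
            nx, ny, nz = -dx, -dy, z_scale
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / norm, ny / norm, nz / norm))
        normals.append(row)
    return normals


ramp = [[float(j) for j in range(3)] for _ in range(3)]  # surface rising to the right
print(normals_from_depth(ramp, bg_threshold=-1.0)[1][1])  # tilted normal at the centre pixel
```

Raising `bg_threshold` flattens more of the image into viewer-facing background, which matches the role the quoted passage gives the "Normal background threshold" control.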

Claims 19-20 are substantially similar to Claim 1. The rejection analyses based on Wu in view of Metzer and Github for Claim 1 are also applied to Claims 19-20. In addition, Claim 19 recites “A non-transitory computer-readable storage medium including instructions that, when processed by one or more processors of a computing device, configure the computing device to perform operations . . .” (Wu 4.1). Claim 20 recites “A method performed on a computing device” (Wu fig. 2; Wu 4.1).

Regarding Claim 3, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein determining the loss further comprises: 
determining the loss based on a difference between the 2D image with added noise and the predicted noise (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling]).

Regarding Claim 4, Wu in view of Metzer and Github teaches The computing device of claim 3, wherein the operations further comprise: 
determining a gradient based on the loss (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling]);
backpropagating the gradient through the 2D image with added noise (
[Image: Metzer §3.1 excerpt] Metzer 3.1. Here, the backpropagation is through the diffusion process.
Wu provides details for the diffusion process:
[Image: Wu excerpt on the diffusion process]
Wu fig. 2 illustrates the formula visually:
[Image: Wu fig. 2]
The backpropagation through the diffusion process employs the 2D image with added noise, which is mapped to [equation image: noisy latent code]);
backpropagating the gradient through the 3D trainable mask; and updating the trainable 3D mask based on the gradient (“Score distillation sampling has been proposed in Dream-Fusion [32] for text-to-3D generation.” Wu 3.1. Preliminaries (Score distillation sampling). The diffusion process is used to train/optimize the 3D trainable mask, as shown in Wu figs. 1-2.).

Regarding Claim 5, Wu in view of Metzer and Github teaches The computing device of claim 4, 
wherein the backpropagating the gradient through the 2D image with added noise further comprises: 
subtracting the predicted noise ([equation image]) from the 2D image with added noise ([equation image]) (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling]. Metzer fig. 2 visually represents the process:
[Image: Metzer fig. 2]).
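The relationship among the noisy image, the added noise, and the predicted noise can be made explicit. Under the standard DDPM forward process zt = √ᾱt·x + √(1 − ᾱt)·ε, subtracting the scaled predicted noise from zt and rescaling gives an estimate of the clean image. The sketch below illustrates that identity only; the function names and the value ᾱt = 0.6 are assumptions, and it is not presented as the specific operation performed in Wu or Metzer.

```python
import math
from typing import List, Sequence


def add_noise(x0: Sequence[float], eps: Sequence[float], alpha_bar: float) -> List[float]:
    """Forward diffusion: z_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e for x, e in zip(x0, eps)]


def subtract_predicted_noise(z_t: Sequence[float], eps_hat: Sequence[float],
                             alpha_bar: float) -> List[float]:
    """Clean-image estimate: x0_hat = (z_t - sqrt(1 - alpha_bar) * eps_hat) / sqrt(alpha_bar)."""
    return [(z - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for z, e in zip(z_t, eps_hat)]


x0 = [0.2, -0.7, 1.0]
eps = [0.3, 1.1, -0.4]
z_t = add_noise(x0, eps, alpha_bar=0.6)
# With a perfect noise prediction (eps_hat == eps) the clean image is recovered.
x0_hat = subtract_predicted_noise(z_t, eps, alpha_bar=0.6)
```

Any error in the predicted noise shows up in the estimate scaled by √((1 − ᾱt)/ᾱt), which is why predictions at high-noise timesteps give coarser clean-image estimates.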

Regarding Claim 6, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein rendering further comprises: 
rendering, using a differential renderer component, the trainable 3D mask to generate the 2D image (
Metzer:
[Image: Metzer excerpt]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s differential renderers with Wu. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model. Metzer:
[Image: Metzer excerpt]

Regarding Claim 7, Wu in view of Metzer and Github teaches The computing device of claim 6, wherein the operations further comprise:
determining a gradient based on the loss (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling]); and
propagating the gradient through the differential renderer component (
[Image: Metzer §3.1 excerpt] Metzer 3.1.
[Image: Metzer excerpt]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s score distillation details with Wu’s score distillation. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model. Metzer states, “Score Distillation is a method that enables using a diffusion model as a critic, i.e., using it as a loss without explicitly back-propagating through the diffusion process.” Metzer 3.1. Metzer:
[Image: Metzer excerpt]


Regarding Claim 8, Wu in view of Metzer and Github teaches The computing device of claim 1, 
wherein the trained neural network is trained, using a diffusion model, to generate 2D images based on input texts and masks (
Wu:
[Image: Wu excerpt]
Wu: Fig. 1 shows a trained neural network that generates 2D images based on input text (e.g., “Make his lips black” and [Image: Wu excerpt]).
Wu:
[Image: Wu excerpt], which shows that diffusion models have been used.), and
the trained neural network comprises one or more of: 
convolutional layers, one or more up sampling layers, one or more down sampling layers, and one or more fully connected layers (Metzer:
[Image: Metzer excerpt]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s NeRF with Wu’s score distillation. One of ordinary skill in the art would be motivated to efficiently enhance a model. Metzer:
[Image: Metzer excerpt]


Regarding Claim 9, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein the operations further comprise: 
receiving the input text from a user (
[Image: Wu excerpt]
For example, the input text could be “Let him wear a purple Zorro mask” received from Zuckerberg.); 
accessing an image of the user (The system has access to an image of Zuckerberg.); and 
determining a shape of the trainable 3D mask based on a head of the user within the image (
[Image: Wu excerpt]).

Regarding Claim 10, Wu in view of Metzer and Github teaches The computing device of claim 1, 
wherein rendering the trainable 3D mask further comprises: 
selecting a camera angle (Wu: [Image: Wu excerpt]); and
rendering the trainable 3D mask based on the camera angle to generate the 2D image (
Wu: [Image: Wu excerpt]).
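The camera-angle limitation can be illustrated with a short sketch: sample an azimuth and elevation, place the camera on a sphere around the head, and project mesh vertices with a pinhole model. The sampling ranges, radius, focal length, and +y world-up convention are assumptions; Wu's own camera sampling is not reproduced, and rasterization is omitted.

```python
import math
import random
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def camera_position(azimuth_deg: float, elevation_deg: float, radius: float) -> Vec3:
    """Camera on a sphere around the origin; azimuth 0 / elevation 0 sits on +z looking along -z."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (radius * math.cos(el) * math.sin(az),
            radius * math.sin(el),
            radius * math.cos(el) * math.cos(az))


def project(vertex: Vec3, cam: Vec3, focal: float = 1.0) -> Tuple[float, float]:
    """Pinhole projection for a camera at `cam` looking at the origin (world up = +y)."""
    forward = normalize(sub((0.0, 0.0, 0.0), cam))
    right = normalize(cross(forward, (0.0, 1.0, 0.0)))  # undefined at elevation +/-90 degrees
    up = cross(right, forward)
    rel = sub(vertex, cam)
    depth = dot(rel, forward)
    return (focal * dot(rel, right) / depth, focal * dot(rel, up) / depth)


rng = random.Random(0)
azimuth = rng.uniform(-180.0, 180.0)   # select a camera angle (assumed ranges)
elevation = rng.uniform(-10.0, 30.0)
cam = camera_position(azimuth, elevation, radius=2.5)
print(project((0.1, 0.2, 0.0), cam))
```

Keeping elevation well away from ±90 degrees avoids the degenerate case where the view direction is parallel to the world-up vector.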

Regarding Claim 14, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein updating the trainable 3D mask comprises:
adjusting position of vertices of a plurality of vertices (Wu 3.1: [Image: Wu §3.1 excerpt]), the trainable 3D mask comprising the plurality of vertices (“T is the mean face and S is the vertices offset basis.”).
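Wu's parameterization, in which “T is the mean face and S is the vertices offset basis,” means that changing the coefficients moves the mesh vertices. A hedged sketch follows, assuming the usual linear 3DMM form V = T + S·α with flattened xyz coordinates; the two-vertex toy basis is hypothetical, and Wu's actual basis and any additional terms are not modeled.

```python
from typing import List, Sequence


def morphable_vertices(mean: Sequence[float], basis: Sequence[Sequence[float]],
                       alpha: Sequence[float]) -> List[float]:
    """Flattened xyz vertices V = T + sum_k alpha_k * S_k (mean face plus weighted offsets)."""
    v = list(mean)
    for a_k, s_k in zip(alpha, basis):
        for i, s in enumerate(s_k):
            v[i] += a_k * s
    return v


def alpha_gradient(basis: Sequence[Sequence[float]], grad_vertices: Sequence[float]) -> List[float]:
    """Chain rule: dL/dalpha_k = <S_k, dL/dV>, since V is linear in alpha."""
    return [sum(s * g for s, g in zip(s_k, grad_vertices)) for s_k in basis]


T = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]        # two vertices: (0, 0, 0) and (1, 0, 0)
S = [[0.0, 0.0, 0.0, 0.5, 0.0, 0.0],      # offset 1: move vertex 2 along x
     [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]]      # offset 2: lift both vertices along y
alpha = [2.0, 0.5]
print(morphable_vertices(T, S, alpha))    # -> [0.0, 0.5, 0.0, 2.0, 0.5, 0.0]
```

A loss gradient on the vertices, for example one backpropagated from a differentiable renderer, reaches the coefficients through `alpha_gradient`, so updating α is one way of adjusting vertex positions.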

Regarding Claim 16, Wu in view of Metzer and Github teaches The computing device of claim 1, 
wherein the inputting the 2D image further comprises:
inputting the 2D image with added noise, the input text, and a number of iterations into the trained neural network to generate the predicted noise of the 2D image with added noise (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling formula]
Wu fig. 2 illustrates the formula visually:
[Image: Wu fig. 2]
The image I is the rendered 2D image.
Here, noise is injected/added to the rendered 2D image.
The 2D image with added noise is mapped to [equation image: noisy latent code].
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling].
The predicted noise is mapped to [equation image: predicted-noise term].
The input text is mapped to y.
Metzer:
[Image: Metzer excerpt]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s score distillation details with Wu’s score distillation. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model.  Metzer states, “Score Distillation is a method that enables using a diffusion model as a critic, i.e., using it as a loss without explicitly back-propagating through the diffusion process.”  Metzer 3.1.

Regarding Claim 18, Wu in view of Metzer and Github teaches The computing device of claim 1,
wherein the trained neural network is a first trained neural network, the 2D image is a first 2D image (See the analysis for Claim 1; here, there is only renaming.), and 
wherein the operations further comprise: 
inputting a 2D mask and the text into a second trained neural network to generate a second 2D image in a shape of the 2D mask representing the text (
Wu fig. 2:
[Image: Wu fig. 2 excerpt]
The 2D mask is mapped to [image].
The second 2D image is mapped to [image].
The text is mapped to “let her wear a batman eyemask.”
[Image: Wu excerpt]); and
determining, based on colors of the second 2D image, colors for a plurality of vertices, the 3D trainable mask comprising the plurality of vertices (
Wu 3.1:
[Image: Wu §3.1 excerpt]
[Image: Wu excerpt]).
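As an illustration of determining vertex colors from a generated 2D image, the sketch below projects each vertex orthographically into the image and takes the nearest pixel's color. It is a simplification under stated assumptions: x and y in [-1, 1] span the image, row 0 is the top, and occlusion, UV mapping, and filtering are ignored. The function name is hypothetical, and the code is not Wu's texture pipeline.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def vertex_colors_from_image(vertices: Sequence[Tuple[float, float, float]],
                             image: List[List[RGB]]) -> List[RGB]:
    """Nearest-pixel color for each vertex under an assumed orthographic front view."""
    h, w = len(image), len(image[0])
    colors = []
    for x, y, _z in vertices:
        # x, y in [-1, 1] -> column/row indices, clamped to the image; row 0 is the top.
        col = min(w - 1, max(0, int((x + 1.0) / 2.0 * w)))
        row = min(h - 1, max(0, int((1.0 - y) / 2.0 * h)))
        colors.append(image[row][col])
    return colors


red, green, blue, white = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)
image = [[red, green],
         [blue, white]]
print(vertex_colors_from_image([(-0.5, 0.5, 0.0), (0.5, -0.5, 0.0)], image))
```

A production pipeline would typically also test visibility so that vertices on the far side of the mesh do not take colors from the front-facing image.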

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 10, in further view of Karthik (“Generating Photo-realistic Images with Stable Diffusion XL (SDXL 1.0)”).
Regarding Claim 11, Wu in view of Metzer and Github teaches The computing device of claim 10, 
wherein the inputting the 2D image further comprises: 

inputting the 2D image with added noise and 
the 
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling].
The predicted noise is mapped to [equation image: predicted-noise term].
The input text is mapped to y.
The 2D image with added noise is mapped to [equation image: noisy latent code]).
Wu in view of Metzer does not explicitly disclose 
modifying the input text in accordance with the camera angle; or
the modified input text is used for training. 
Karthik teaches 
modifying the input text in accordance with the camera angle (
[Image: Karthik excerpt]); or
the modified input text is used for training (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling].
The predicted noise is mapped to [equation image: predicted-noise term].
The modified input text is mapped to y used for training). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Karthik’s input text to specify camera angle with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to allow a user to control camera angle through text input for an AI generated image.  Karthik states, “But how do we bridge the gap between textual prompts and photorealistic images? In this exploration, we'll delve into constructing text-to-image prompts using foundational photography concepts. We'll consider the intricacies of the exposure triangle, the nuances introduced by different camera types (be it Mirrorless, DSLR etc), the characteristics of various lenses, and the impact of camera angles. We'll also dissect the essence of shots, from full to extreme close-ups, understand the significance of the camera's eye line (normal, low, or high), and factor in lighting conditions. Furthermore, we'll categorize the types of images, whether they're portraits, landscapes, or motion blurs, to guide AI in generating the perfect photograph.”
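Modifying the input text in accordance with the camera angle is commonly done through view-dependent prompting: DreamFusion, which Wu cites for score distillation sampling, appends view descriptors such as “front view”, “side view”, “back view”, or “overhead view” based on the sampled camera. The sketch below is a hedged approximation of that idea; the angle thresholds, the example prompt, and the function name are assumptions, not values taken from Wu, DreamFusion, or Karthik.

```python
def view_dependent_prompt(prompt: str, azimuth_deg: float, elevation_deg: float) -> str:
    """Append a view descriptor chosen from the camera angle (thresholds are assumptions)."""
    az = (azimuth_deg + 180.0) % 360.0 - 180.0  # wrap azimuth to [-180, 180)
    if elevation_deg > 60.0:
        view = "overhead view"
    elif abs(az) <= 45.0:
        view = "front view"
    elif abs(az) >= 135.0:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"


print(view_dependent_prompt("a man wearing a purple Zorro mask", azimuth_deg=10.0, elevation_deg=5.0))
# -> a man wearing a purple Zorro mask, front view
```

The modified prompt then replaces the original text as the condition y for that rendered view, so the diffusion model is asked for an image consistent with the camera that produced the render.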

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 10, in further view of Geng et al. (US 20060023923 A1). 
Regarding Claim 12, Wu in view of Metzer and Github teaches The computing device of claim 10, wherein the operations further comprise:
selecting one or more lighting sources, wherein the rendering is further based on the one or more lighting sources (
Wu A. Appendix:
[Image: Wu Appendix A excerpt]
[Image: Wu Appendix A excerpt]).
	Wu in view of Metzer does not explicitly disclose wherein the one or more lighting sources are selected based on an image of a user.
	Geng teaches wherein the one or more lighting sources are selected based on an image of a user (“selecting a lighting model to best depict said individual”  Geng ¶ 3.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Geng’s lighting selection with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to generate images that better depict an individual. Geng ¶ 3.
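To make concrete how rendering depends on selected light sources, a Lambertian (diffuse) shading sketch follows, together with one hypothetical rule for choosing among lighting presets using an image of the user: pick the preset whose average shading best matches the image's average brightness. The presets, the ambient term, and the matching rule are illustrative assumptions; neither Geng's selection method nor Wu's appendix lighting setup is reproduced.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Light = Tuple[Vec3, float]  # (unit direction toward the light, intensity)


def lambert(normal: Vec3, lights: Sequence[Light], ambient: float = 0.1) -> float:
    """Diffuse shading: ambient + sum of intensity * max(0, n . l) over the selected lights."""
    shade = ambient
    for direction, intensity in lights:
        shade += intensity * max(0.0, sum(n * l for n, l in zip(normal, direction)))
    return shade


def select_lighting(normals: Sequence[Vec3], presets: Dict[str, List[Light]],
                    target_brightness: float) -> str:
    """Hypothetical rule: choose the preset whose mean shading is closest to the photo's mean brightness."""
    def mean_shade(lights: List[Light]) -> float:
        return sum(lambert(n, lights) for n in normals) / len(normals)

    return min(presets, key=lambda name: abs(mean_shade(presets[name]) - target_brightness))


presets = {
    "soft_front": [((0.0, 0.0, 1.0), 0.4)],
    "strong_front": [((0.0, 0.0, 1.0), 0.9)],
}
normals = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8)]
print(select_lighting(normals, presets, target_brightness=0.95))  # -> strong_front
```

Any real selection would compare richer statistics (light direction, color temperature, shadow placement) than a single mean brightness; the single-number match is kept only to keep the example short.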

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer, Github, and Geng as applied to Claim 12, in further view of Karthik (“Generating Photo-realistic Images with Stable Diffusion XL (SDXL 1.0)”).
Regarding Claim 13, Wu in view of Metzer, Github, and Geng teaches The computing device of claim 12. 
Wu in view of Metzer, Github, and Geng does not explicitly disclose wherein the inputting the 2D image further comprises: 
modifying the input text in accordance with the one or more lighting sources; and 
inputting the 2D image with added noise and the modified input text into the trained neural network to generate the predicted noise. 
Karthik teaches wherein the inputting the 2D image further comprises: 
modifying the input text in accordance with the one or more lighting sources (Karthik: 
[Image: Karthik excerpt]); and
inputting the 2D image with added noise and the modified input text into the trained neural network to generate the predicted noise (
After Wu in view of Metzer and Geng is combined with Karthik, it teaches the limitation.
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling].
The predicted noise is mapped to [equation image: predicted-noise term].
The modified input text is mapped to y used for training.
The 2D image with added noise is mapped to [equation image: noisy latent code]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Karthik’s input text to specify camera angle with Wu in view of Metzer, Github, and Geng. One of ordinary skill in the art would be motivated to allow a user to control camera angle through text input for an AI generated image. Karthik states, “But how do we bridge the gap between textual prompts and photorealistic images? In this exploration, we'll delve into constructing text-to-image prompts using foundational photography concepts. We'll consider the intricacies of the exposure triangle, the nuances introduced by different camera types (be it Mirrorless, DSLR etc), the characteristics of various lenses, and the impact of camera angles. We'll also dissect the essence of shots, from full to extreme close-ups, understand the significance of the camera's eye line (normal, low, or high), and factor in lighting conditions. Furthermore, we'll categorize the types of images, whether they're portraits, landscapes, or motion blurs, to guide AI in generating the perfect photograph.”

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 1, in further view of El Hanchi El Amrani et al. (US 20240233246 A1) (hereinafter Amrani).
Regarding Claim 15, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein adding noise further comprises: 

adding the 
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: Wu §3.1 excerpt, Score distillation sampling formula]
Wu fig. 2 illustrates the formula visually:
[Image: Wu fig. 2]
The image I is the rendered 2D image.
Here, noise is injected/added to the rendered 2D image.
The 2D image with added noise is mapped to [equation image: noisy latent code]).
However, Wu in view of Metzer and Github does not explicitly disclose sampling a Gaussian distribution to determine the noise, wherein an amount of the noise is based on a number of iterations of updating the trainable 3D mask, thereby the noise is Gaussian noise.
Amrani teaches sampling a Gaussian distribution to determine the noise, thereby the noise is Gaussian noise (“In various instances, the training component can iteratively insert noise (e.g., Gaussian noise) into the first output, thereby yielding a second output. Accordingly, the second output can have the same size, format, or dimensionality as the first output (e.g., as the randomized array discussed above).”  Amrani ¶ 47.), 
wherein an amount of the noise is based on a number of iterations of updating the trainable 3D mask (“In various cases, the training component can perform noise insertion on the first output for any suitable number of iterations (e.g., can insert any suitable amount of noise into the first output).”  Amrani ¶ 47.  Here, the amount of the noise is correlated to the number of iterations. The training updates the 3D mask of Wu in view of Metzer.).
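A minimal sketch, not taken from Amrani, of the Examiner’s reading of Amrani ¶ 47: when independent Gaussian noise is inserted once per iteration, the total amount of noise grows with the number of iterations (for k insertions of N(0, sigma^2) noise, the accumulated standard deviation is sigma * sqrt(k)). The function `insert_noise_iteratively` and the parameter values are hypothetical.

```python
import numpy as np

def insert_noise_iteratively(x, num_iterations, sigma, rng):
    """Add independent N(0, sigma^2) noise to x once per iteration."""
    out = np.array(x, dtype=float)  # copy so the input is not modified
    for _ in range(num_iterations):
        out = out + rng.normal(0.0, sigma, size=out.shape)
    return out

rng = np.random.default_rng(0)
x = np.zeros(100_000)
for k in (1, 4, 16):
    noisy = insert_noise_iteratively(x, k, sigma=0.1, rng=rng)
    # Accumulated noise std grows like sigma * sqrt(k): roughly 0.1, 0.2, 0.4.
    print(k, float(noisy.std()))
```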
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Amrani’s noise insertion strategy with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to train using a suitable number of iterations, so that sufficient model quality could be achieved at a reasonable cost in computing resources.  “In various cases, the training component can perform noise insertion on the first output for any suitable number of iterations (e.g., can insert any suitable amount of noise into the first output).”  Amrani ¶ 47.  

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 1, in further view of ZHANG et al. (US 20230168326 A1).
Regarding Claim 17, Wu in view of Metzer and Github teaches The computing device of claim 1.
Wu in view of Metzer and Github does not explicitly disclose wherein the operations further comprise: 
repeating the rendering, the adding, the inputting, the determining, and the updating, until the loss transgresses a threshold value.
Zhang teaches repeating the rendering, the adding, the inputting, the determining, and the updating, until the loss transgresses a threshold value (
“Exemplary termination conditions may be that the value of a loss function obtained in the certain iteration is less than a threshold value, that a certain count of iterations has been performed, that the loss function converges such that the difference of the values of the loss function obtained in a previous iteration and the current iteration is within a threshold value, etc.”  Zhang ¶ 120.
After Wu in view of Metzer is combined with Zhang, the iterations of Wu in view of Metzer will be terminated when its loss function transgresses a threshold value.).
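The three termination conditions quoted from Zhang ¶ 120 can be sketched as a simple loop. This is an illustrative assumption, not Zhang’s or Wu’s implementation; `step` is a hypothetical callback standing in for one pass of the rendering, adding, inputting, determining, and updating operations, and it returns that iteration’s loss.

```python
from typing import Callable, List

def run_until_terminated(step: Callable[[], float], loss_threshold: float,
                         max_iterations: int, convergence_tol: float) -> List[float]:
    """Repeat one optimization step until a termination condition is met."""
    losses: List[float] = []
    for _ in range(max_iterations):          # condition: iteration count reached
        loss = step()
        losses.append(loss)
        if loss < loss_threshold:            # condition: loss below threshold
            break
        if len(losses) > 1 and abs(losses[-2] - losses[-1]) < convergence_tol:
            break                            # condition: loss has converged
    return losses

# Hypothetical loss values standing in for real training iterations.
fake_losses = iter([1.0, 0.5, 0.2, 0.05, 0.01])
history = run_until_terminated(lambda: next(fake_losses), loss_threshold=0.1,
                               max_iterations=100, convergence_tol=1e-6)
print(history)  # [1.0, 0.5, 0.2, 0.05]
```

Here the loop stops after the fourth iteration, the first one whose loss falls below the threshold.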
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Zhang’s termination condition with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to conserve computational resources: when the quality of the model is sufficient, it may not be necessary to continue the computation. 

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 18, in further view of Wong (US 20240404170 A1).
Regarding Claim 21, Wu in view of Metzer and Github teaches The apparatus of claim 18, wherein the operations further comprise:
accessing a user image ([Image: depiction of a user image])

Wu in view of Metzer and Github does not explicitly disclose selecting a 2D mask based on a body part depicted in the user image.
Wong teaches 
accessing a user image (“At 1310, the method 1300 may include applying the face decoration texture over a human face in a live video feed. Thus, once created, the face decoration texture may be usable by the user or other users immediately in a live video feed with constantly updating frames and the face decoration texture may stay located on the human face with high accuracy.”  Wong ¶ 35.); and
selecting a 2D mask based on a body part depicted in the user image (
“In addition, the mobile version may include one or more suggested prompts 84, which may be accompanied by images or video of corresponding face decoration textures. Selection of one of the suggested prompts 84 by the user may result in the suggested prompt 84 being added to the prompt input box 78 for the user, who may be free to modify or add to the suggested prompt 84 before finalizing the user text prompt 42.”  Wong ¶ 28.

[Image: Wong figure showing suggested face decoration textures, e.g., “GLAM”]

The mask, e.g., “GLAM,” is an example of the “images or video of corresponding face decoration textures,” which are understood to be rendered in 2D.  The Examiner takes Official Notice that it would have been well known in the art that an image or video could be displayed in 2D on a display.  The benefits of combining this well-known knowledge would have been to display images that are supported by the display and/or to reduce the complexity of generating images. 
These masks are based on the face depicted in the user image. 
).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Wong’s recommendations/suggestions with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to guide a user in making decisions, which makes it easier for a user who is not familiar with the software or process.  “In addition, the mobile version may include one or more suggested prompts 84, which may be accompanied by images or video of corresponding face decoration textures. Selection of one of the suggested prompts 84 by the user may result in the suggested prompt 84 being added to the prompt input box 78 for the user, who may be free to modify or add to the suggested prompt 84 before finalizing the user text prompt 42.”  Wong ¶ 28.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Wong (US 20240404170 A1) teaches a similar feature:

[Image: Wong Fig. 7]

Here, Wong Fig. 7 is similar to Applicant’s Figs. 7-9 and 12.  However, Wong’s figures do not disclose the claimed details of a process that generates similar results.  Note the tiger face mask.
Laine et al. (US 20220051481 A1) teaches features related to dependent claim 6: “A modular differentiable renderer design yields high performance by leveraging existing, highly optimized hardware graphics pipelines to reconstruct the 3D model. The differential renderer renders images of the 3D model and differences between the rendered images and reference images are propagated backwards through the rendering pipeline to iteratively adjust the 3D model.”  Abstract.  However, Laine does not teach the limitations of the parent claims.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZHENGXI LIU whose telephone number is (571)270-7509. The examiner can normally be reached M-F 9 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/ZHENGXI LIU/Primary Examiner, Art Unit 2611
Prosecution Timeline

Feb 08, 2024: Application Filed
Oct 22, 2025: Non-Final Rejection mailed (§103)
Jan 20, 2026: Response Filed
Feb 20, 2026: Final Rejection mailed (§103)
Apr 06, 2026: Response after Non-Final Action
Apr 20, 2026: Request for Continued Examination
Apr 23, 2026: Response after Non-Final Action
Apr 29, 2026: Non-Final Rejection (signed), §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/515,870
Patent 12633067
EFFECTIVENESS BOOSTING IN THE METAVERSE
2y 6m to grant Granted May 19, 2026
18/570,571
Patent 12626441
IMAGE PROCESSING METHOD AND APPARATUS, AND ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
2y 5m to grant Granted May 12, 2026
18/551,308
Patent 12608869
LIVE VIDEO BASED ON MOTION TRACKING AND ANIMATION OF FOREARM
2y 7m to grant Granted Apr 21, 2026
18/473,184
Patent 12602865
METHODS FOR DEPTH CONFLICT MITIGATION IN A THREE-DIMENSIONAL ENVIRONMENT
2y 6m to grant Granted Apr 14, 2026
18/554,610
Patent 12599463
COLOR MANAGEMENT PROCESS FOR CUSTOMIZED DENTAL RESTORATIONS
2y 6m to grant Granted Apr 14, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 64%
With Interview (+39.8%): 99%
Median Time to Grant: 3y 2m (~10m remaining)
PTA Risk: High
Based on 358 resolved cases by this examiner. Grant probability derived from career allowance rate.