Prosecution Insights
Last updated: May 29, 2026
Application No. 18/436,199

3D MASK GENERATION

Non-Final OA §103
Filed: Feb 08, 2024
Examiner: LIU, ZHENGXI
Art Unit: 2611
Tech Center: 2600 (Communications)
Assignee: Snap Inc.
OA Round: 3 (Non-Final)
Grant Probability: 64% (Moderate)
OA Rounds: 3-4
Est. Remaining: 10m
With Interview: 99%

Examiner Intelligence

Grants 64% of resolved cases
Career Allowance Rate: 64% (229 granted / 358 resolved), +2.0% vs TC avg
Strong +40% interview lift
Interview Lift: +39.8% (resolved cases with interview)
Typical timeline
Avg Prosecution: 3y 2m (20 currently pending)
Career history
Total Applications: 387 (across all art units)

Statute-Specific Performance

§101: 1.3% (-38.7% vs TC avg)
§103: 94.1% (+54.1% vs TC avg)
§102: 1.6% (-38.4% vs TC avg)
§112: 2.8% (-37.2% vs TC avg)
Based on career data from 358 resolved cases

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 4/20/2026 has been entered.

Claim Status

Applicant’s amendments and arguments have been received on 4/20/2026. Claims 1, 19, and 20 have been amended. Claim 21 has been added. Claim 2 has been cancelled. Claims 1, 3-21 are pending.

Response to Arguments

Some of Applicant’s arguments are persuasive and moot in view of the Examiner’s new ground of rejection. In particular, the Examiner adopts on the record Applicant’s interpretation of “normal map” as explained on Remarks p. 8.

Applicant states that “Second, Wu does not ‘encod[] the 2D image with added noise, the normal map, and input text, into a latent space for a trained neural network’ as recited in amended claim 1.” Remarks p. 8. The Examiner disagrees. Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image1.png]. The predicted noise is mapped to [Image: media_image2.png]. The encoded input text is mapped to y after embedding. The 2D image with added noise is mapped to the image that [Image: media_image3.png] represents. [Image: media_image3.png] is the result after encoding. In addition or alternatively, the Examiner’s new reference Github teaches all inputs and conditions are encoded for a diffusion model.
[Image: media_image4.png]

Compact Prosecution

With respect to Claim Interpretation, the Examiner has provided some notes regarding “[BRI on the record]” throughout the Office Action, so that the record is clear about the scope of the claimed invention, and the record is also clear about the basis for the Examiner’s analyses. A clear record of the claim interpretation could expedite the examination by creating the condition to allow the examination to focus on Applicant’s inventive concept and its comparison with related prior art. If there are disagreements, Applicant may present an alternative interpretation based on MPEP 2111. The Examiner will adopt Applicant’s interpretation on the record, if Applicant’s interpretation is reasonable and/or arguments are persuasive. Applicant may amend claims relying on the Examiner’s claim interpretation provided on the record.

Double Patenting

The provisional rejections of Claims 1, 3-8, 10-11, 15-17, and 19-20 on the ground of nonstatutory double patenting are withdrawn in view of Applicant’s amendments to the independent claims.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-10, 14, 16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (“Text-Guided 3D Face Synthesis - From Generation to Editing”) in view of Metzer et al. (“Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures”) and Github (“ControlNet”).

Regarding Claim 1, Wu teaches A computing device comprising: one or more processors; and one or more memories storing instructions that, when executed by the one or more processors, configure the computing device to perform operations ([Image: media_image5.png] Wu discloses a machine learning model that uses a computer.) comprising:

rendering a trainable three-dimensional (3D) mask to generate a two-dimensional (2D) image, the 3D trainable mask comprising adjustable mesh vertices ([BRI on the record] With respect to “trainable three-dimensional (3D) mask,” the Examiner is reading it to mean a 3D mask generated by a trainable machine learning model, wherein the 3D mask comprises adjustable mesh vertices. The Examiner revised the interpretation in view of Applicant’s amendments. [Mapping Analysis] [Image: media_image6.png] “In this paper, we propose a unified text-guided framework from face generation to editing.
In the generation stage, we propose a geometry-texture decoupled generation to mitigate the loss of geometric details caused by coupling. Besides, decoupling enables us to utilize the generated geometry as a condition for texture generation, yielding highly geometry-texture aligned results. We further employ a finetuned texture diffusion model to enhance texture quality in both RGB and YUV space. In the editing stage, we first employ a pre-trained diffusion model to update facial geometry or texture based on the texts.” Wu Abstract. The claimed “trainable three-dimensional (3D) mask” is mapped to the 3D mask generated by the disclosed text-guided learning model. Note the masked 3D model generated after the text commands, “Let him wear a purple Zorro mask” and “Make his lips black” in fig. 1. Further, the generated 3D mask comprises adjustable mesh vertices as shown in Wu fig. 2: [Image: media_image7.png] The mesh vertices of the 3DMM-based mesh are adjusted to geometry g similar to actress Scarlett Johansson. The two-dimensional (2D) image is mapped to the images generated for animation and relighting as shown in Wu fig. 1.);

adding noise to the 2D image to generate a 2D image with added noise (Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image8.png] Wu Fig. 2 visually explains the formula: [Image: media_image9.png] The image I is the rendered 2D image. Here, noise is injected/added to the rendered 2D image. The 2D image with added noise is mapped to the image that [Image: media_image3.png] represents.);

encoding the 2D image with added noise, (Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image1.png]. The predicted noise is mapped to [Image: media_image2.png]. The encoded input text is mapped to y after embedding. Therefore, y is in an embedded latent space.
The 2D image with added noise is mapped to the image that [Image: media_image3.png] represents. [Image: media_image3.png] is the result after encoding. In addition, [Image: media_image3.png] is “noisy latent code” in the latent space. The trained neural network is mapped to SD/ControlNet/InsP2P/TexDiffusion in fig. 2. Wu states, “a pretrained 2D diffusion model ϕ with a denoising function ϵ_ϕ(z_t; y, t) to optimize 3D parameters θ.” The Examiner’s secondary reference Github also has related teachings on encoding and latent space.);

determining a loss between the 2D image with added noise and the predicted noise ([Image: media_image10.png]); and

updating positions of the adjustable mesh vertices of the trainable 3D mask based on the loss (“Score distillation sampling has been proposed in Dream-Fusion [32] for text-to-3D generation.” Wu 3.1. Preliminaries (Score distillation sampling). “The employed facial 3D morphable model provides strong priors to ensure the quality of generated geometry. As to the alignment with the input text, we utilize SDS on the network ϕsd of Stable Diffusion [35] to guide the geometry generation.” Wu 3.2. Here, the 3D morphable model is mesh based as shown in fig. 2. The positions of the adjustable mesh vertices are updated, e.g., in fig. 2 the 3DMM-based mesh model is updated to look like Scarlett Johansson.).

If updating positions of the adjustable mesh vertices of the trainable 3D mask based on the loss also requires updating the trainable machine learning model that generates the 3D mask, Wu is not absolutely clear regarding its statement “Score distillation sampling has been proposed in Dream-Fusion [32] for text-to-3D generation.” Wu 3.1. Preliminaries (Score distillation sampling).
A person of ordinary skill in the art would have understood the statement to mean that the loss function based on the Score distillation sampling is used to update the machine learning model. Further, Wu does not explicitly disclose: determining a normal map for the trainable 3D mask; or encoding the normal map as a condition into the latent space for the trained neural network.

However, Metzer provides a clearer explanation of updating . . . the trainable 3D mask based on the loss ([Image: media_image11.png] Metzer 3.1.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s score distillation details with Wu’s score distillation. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model. Metzer states, “Score Distillation is a method that enables using a diffusion model as a critic, i.e., using it as a loss without explicitly back-propagating through the diffusion process.” Metzer 3.1.

Wu in view of Metzer does not explicitly disclose determining a normal map for the trainable 3D mask; or encoding the normal map as an additional condition into the latent space for the trained neural network.

Github teaches determining a normal map for the trainable 3D mask (“This model use normal map.
Right now in the APP, the normal is computed from the midas depth map and a user threshold (to determine how many area is background with identity normal face to viewer, tune the "Normal background threshold" in the gradio app to get a feeling).” Github: normal map. After the combination of Wu in view of Metzer and Github, the determined normal map is for Wu in view of Metzer’s trainable 3D mask.); and encoding the normal map as a condition into the latent space for the trained neural network (Github teaches the conditions (including normal map) are encoded: [Image: media_image12.png], where the condition(s) are encoded by SD Encoder Block. Github teaches the condition(s) are for a diffusion model, stating “Official implementation of Adding Conditional Control to Text-to-Image Diffusion Models,” which dovetails with Wu fig. 2’s teaching: [Image: media_image9.png], where one is allowed to add “condition (optional)” for a diffusion model.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Github’s use of a normal map as a condition for a diffusion model with Wu in view of Metzer. One of ordinary skill in the art would be motivated to enhance the quality of the generated image. Github states, “Compared to depth model, this model seems to be a bit better at preserving the geometry. This is intuitive: minor details are not salient in depth maps, but are salient in normal maps. Below is the depth result with same inputs. You can see that the hairstyle of the man in the input image is modified by depth model, but preserved by the normal model.” Github: normal map.

Claims 19-20 are substantially similar to Claim 1. The rejection analyses based on Wu in view of Metzer and Github for Claim 1 are also applied to Claims 19-20.
In addition, Claim 19 recites “A non-transitory computer-readable storage medium including instructions that, when processed by one or more processors of a computing device, configure the computing device to perform operations . . .” (Wu 4.1). Claim 20 recites “A method performed on a computing device” (Wu fig 2; Wu 4.1).

Regarding Claim 3, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein determining the loss further comprises: determining the loss based on a difference between the 2D image with added noise and the predicted noise (Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image1.png]).

Regarding Claim 4, Wu in view of Metzer and Github teaches The computing device of claim 3, wherein the operations further comprise: determining a gradient based on the loss (Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image1.png]); backpropagating the gradient through the 2D image with added noise ([Image: media_image11.png] Metzer 3.1. Here, the backpropagating is through the diffusion process. Wu provides details for the diffusion process: [Image: media_image13.png] Wu Fig. 2 visually explains the formula: [Image: media_image9.png] The backpropagation is through the diffusion process that employs the 2D image with added noise, which is mapped to [Image: media_image3.png]); backpropagating the gradient through the 3D trainable mask; and updating the trainable 3D mask based on the gradient (“Score distillation sampling has been proposed in Dream-Fusion [32] for text-to-3D generation.” Wu 3.1. Preliminaries (Score distillation sampling). The diffusion process is to train/optimize the 3D trainable mask as shown in Wu figs. 1-2.).
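For illustration only, the score distillation steps mapped above for Claims 1, 3, and 4 (render, add noise, predict noise, form a loss, and pass a gradient back to the mesh vertices) can be sketched as follows. This is a minimal sketch and is not taken from Wu, Metzer, or the application. The linear `render` function, the `predict_noise` stand-in for a pretrained diffusion model, the noise-level range, and the weighting are all hypothetical. The update uses the score distillation gradient in the form commonly attributed to DreamFusion, w(t)(predicted noise - added noise) times the renderer Jacobian, which is only assumed to correspond to the equations cited as images from Wu 3.1.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VERTS = 8    # hypothetical number of mesh vertices
N_PIXELS = 16  # hypothetical size of the flattened toy "image"

# Hypothetical stand-in for a differentiable renderer: a fixed linear map
# from flattened vertex positions to pixels, so d(image)/d(vertices) = A.
A = rng.normal(size=(N_PIXELS, N_VERTS * 3)) / np.sqrt(N_VERTS * 3)


def render(vertices):
    """Render trainable mesh vertices of shape (N_VERTS, 3) to a flat toy image."""
    return A @ vertices.ravel()


# Hypothetical image that the stand-in "diffusion model" prefers.
target_image = render(rng.normal(size=(N_VERTS, 3)))


def predict_noise(z_t, alpha_bar):
    """Stand-in for a pretrained denoiser eps_phi(z_t; y, t).

    A real model would be conditioned on text (and optionally a normal map);
    this toy version simply treats target_image as the clean image.
    """
    return (z_t - np.sqrt(alpha_bar) * target_image) / np.sqrt(1.0 - alpha_bar)


def sds_step(vertices, lr=0.5):
    """One score-distillation-style update of the vertex positions."""
    x = render(vertices)                      # 1. render the 3D mask to a 2D image
    alpha_bar = rng.uniform(0.2, 0.8)         # random noise level (assumed range)
    eps = rng.normal(size=x.shape)            # 2. sample Gaussian noise
    z_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = predict_noise(z_t, alpha_bar)   # 3. predicted noise
    residual = eps_hat - eps                  # 4. predicted minus added noise
    loss = float(np.mean(residual ** 2))
    # 5. SDS gradient: w(t) * residual pushed through the renderer Jacobian
    #    only; the denoiser itself is not differentiated.
    weight = 1.0 - alpha_bar
    grad_vertices = (A.T @ (weight * residual)).reshape(vertices.shape)
    return vertices - lr * grad_vertices, loss  # 6. update vertex positions


def optimize(vertices, steps=200):
    """Repeat the update for a fixed number of iterations."""
    loss = float("nan")
    for _ in range(steps):
        vertices, loss = sds_step(vertices)
    return vertices, loss
```

Calling `optimize(np.zeros((N_VERTS, 3)))` returns updated vertex positions together with the last loss value. A real pipeline would replace `render` with a differentiable renderer and `predict_noise` with a text-conditioned diffusion model; the toy versions are used here only so that the sketch runs on its own.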
Regarding Claim 5, Wu in view of Metzer and Github teaches The computing device of claim 4, wherein the backpropagating the gradient through the 2D image with added noise further comprises: subtracting the predicted noise ([Image: media_image14.png]) from the 2D image with added noise ([Image: media_image15.png]) (Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image1.png]. Metzer fig. 2 visually represents the process: [Image: media_image16.png]).

Regarding Claim 6, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein rendering further comprises: rendering, using a differential renderer component, the trainable 3D mask to generate the 2D image (Metzer: [Image: media_image17.png]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s differential renderers with Wu. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model. Metzer: [Image: media_image18.png]

Regarding Claim 7, Wu in view of Metzer and Github teaches The computing device of claim 6, wherein the operations further comprise: determining a gradient based on the loss (Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image1.png].); and propagating the gradient through the differential renderer component ([Image: media_image11.png] Metzer 3.1. [Image: media_image19.png]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s score distillation details with Wu’s score distillation.
One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model. Metzer states, “Score Distillation is a method that enables using a diffusion model as a critic, i.e., using it as a loss without explicitly back-propagating through the diffusion process.” Metzer 3.1. Metzer: [Image: media_image18.png]

Regarding Claim 8, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein the trained neural network is trained, using a diffusion model, to generate 2D images based on input texts and masks (Wu: [Image: media_image6.png] Wu: Fig. 1 shows a trained neural network that generates 2D images based on input text (e.g., “Make his lips black” and [Image: media_image20.png]). Wu: [Image: media_image21.png], which shows that diffusion models have been used.), and the trained neural network comprises one or more of: convolutional layers, one or more up sampling layers, one or more down sampling layers, and one or more fully connected layers (Metzer: [Image: media_image22.png]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s NeRF with Wu’s score distillation. One of ordinary skill in the art would be motivated to efficiently enhance a model.
Metzer: [Image: media_image23.png]

Regarding Claim 9, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein the operations further comprise: receiving the input text from a user ([Image: media_image6.png] For example, the input text could be “Let him wear a purple Zorro mask” received from Zuckerberg.); accessing an image of the user (The system has access to an image of Zuckerberg.); and determining a shape of the trainable 3D mask based on a head of the user within the image ([Image: media_image20.png]).

Regarding Claim 10, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein rendering the trainable 3D mask further comprises: selecting a camera angle (Wu: [Image: media_image24.png]); and rendering the trainable 3D mask based on the camera angle to generate the 2D image (Wu: [Image: media_image25.png]).

Regarding Claim 14, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein updating the trainable 3D mask comprises: adjusting position of vertices of a plurality of vertices (Wu 3.1: [Image: media_image26.png]), the trainable 3D mask comprising the plurality of vertices (“T is the mean face and S is the vertices offset basis.”).

Regarding Claim 16, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein the inputting the 2D image further comprises: inputting the 2D image with added noise, the input text, and a number of iterations into the trained neural network to generate the predicted noise of the 2D image with added noise (Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image8.png] Wu Fig. 2 visually explains the formula: [Image: media_image9.png] The image I is the rendered 2D image.
Here, noise is injected/added to the rendered 2D image. The 2D image with added noise is mapped to [Image: media_image3.png]. Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image1.png]. The predicted noise is mapped to [Image: media_image2.png]. The input text is mapped to y. Metzer: [Image: media_image27.png]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s score distillation details with Wu’s score distillation. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model. Metzer states, “Score Distillation is a method that enables using a diffusion model as a critic, i.e., using it as a loss without explicitly back-propagating through the diffusion process.” Metzer 3.1.

Regarding Claim 18, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein the trained neural network is a first trained neural network, the 2D image is a first 2D image (See the analysis for Claim 1; here, there is only renaming.), and wherein the operations further comprise: inputting a 2D mask and the text into a second trained neural network to generate a second 2D image in a shape of the 2D mask representing the text (Wu fig. 2: [Image: media_image28.png] The 2D mask is mapped to [Image: media_image29.png]. The second 2D image is mapped to [Image: media_image30.png].
The text is mapped to “let her wear a batman eyemask.” [Image: media_image31.png]); and determining, based on colors of the second 2D image, colors for a plurality of vertices, the 3D trainable mask comprising the plurality of vertices (Wu 3.1: [Image: media_image26.png] [Image: media_image32.png]).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 10, in further view of Karthik (“Generating Photo-realistic Images with Stable Diffusion XL (SDXL 1.0)”).

Regarding Claim 11, Wu in view of Metzer and Github teaches The computing device of claim 10, wherein the inputting the 2D image further comprises: inputting the 2D image with added noise and the (Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image1.png]. The predicted noise is mapped to [Image: media_image2.png]. The input text is mapped to y. The 2D image with added noise is mapped to [Image: media_image3.png]).

Wu in view of Metzer does not explicitly disclose modifying the input text in accordance with the camera angle; or the modified input text is used for training.

Karthik teaches modifying the input text in accordance with the camera angle ([Image: media_image33.png]); or the modified input text is used for training (Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image1.png]. The predicted noise is mapped to [Image: media_image2.png]. The modified input text is mapped to y used for training).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Karthik’s input text to specify camera angle with Wu in view of Metzer and Github.
One of ordinary skill in the art would be motivated to allow a user to control camera angle through text input for an AI generated image. Karthik states, “But how do we bridge the gap between textual prompts and photorealistic images? In this exploration, we'll delve into constructing text-to-image prompts using foundational photography concepts. We'll consider the intricacies of the exposure triangle, the nuances introduced by different camera types (be it Mirrorless, DSLR etc), the characteristics of various lenses, and the impact of camera angles. We'll also dissect the essence of shots, from full to extreme close-ups, understand the significance of the camera's eye line (normal, low, or high), and factor in lighting conditions. Furthermore, we'll categorize the types of images, whether they're portraits, landscapes, or motion blurs, to guide AI in generating the perfect photograph.”

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 10, in further view of Geng et al. (US 20060023923 A1).

Regarding Claim 12, Wu in view of Metzer and Github teaches The computing device of claim 10, wherein the operations further comprise: selecting one or more lighting sources, wherein the rendering is further based on the one or more lighting sources (Wu A. Appendix: [Image: media_image34.png] [Image: media_image35.png]).

Wu in view of Metzer does not explicitly disclose wherein the one or more lighting sources are selected based on an image of a user.

Geng teaches wherein the one or more lighting sources are selected based on an image of a user (“selecting a lighting model to best depict said individual” Geng ¶ 3.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Geng’s lighting selection with Wu in view of Metzer and Github.
One of ordinary skill in the art would be motivated to generate images that better depict an individual. Geng ¶ 3.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer, Github and Geng as applied to Claim 12, in further view of Karthik (“Generating Photo-realistic Images with Stable Diffusion XL (SDXL 1.0)”).

Regarding Claim 13, Wu in view of Metzer, Github, and Geng teaches The computing device of claim 12.

Wu in view of Metzer, Github, and Geng does not explicitly disclose wherein the inputting the 2D image further comprises: modifying the input text in accordance with the one or more lighting sources; and inputting the 2D image with added noise and the modified input text into the trained neural network to generate the predicted noise.

Karthik teaches wherein the inputting the 2D image further comprises: modifying the input text in accordance with the one or more lighting sources (Karthik: [Image: media_image36.png]); and inputting the 2D image with added noise and the modified input text into the trained neural network to generate the predicted noise (After Wu in view of Metzer and Geng is combined with Karthik, it teaches the limitation. Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image1.png]. The predicted noise is mapped to [Image: media_image2.png]. The modified input text is mapped to y used for training. The 2D image with added noise is mapped to [Image: media_image3.png]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Karthik’s input text to specify camera angle with Wu in view of Metzer, Github, and Geng. One of ordinary skill in the art would be motivated to allow a user to control camera angle through text input for an AI generated image.
Karthik states, “But how do we bridge the gap between textual prompts and photorealistic images? In this exploration, we'll delve into constructing text-to-image prompts using foundational photography concepts. We'll consider the intricacies of the exposure triangle, the nuances introduced by different camera types (be it Mirrorless, DSLR etc), the characteristics of various lenses, and the impact of camera angles. We'll also dissect the essence of shots, from full to extreme close-ups, understand the significance of the camera's eye line (normal, low, or high), and factor in lighting conditions. Furthermore, we'll categorize the types of images, whether they're portraits, landscapes, or motion blurs, to guide AI in generating the perfect photograph.”

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 1, in further view of El Hanchi El Amrani et al. (US 20240233246 A1) (hereinafter Amrani).

Regarding Claim 15, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein adding noise further comprises: adding the (Wu 3.1. Preliminaries (Score distillation sampling): [Image: media_image8.png] Wu Fig. 2 visually explains the formula: [Image: media_image9.png] The image I is the rendered 2D image. Here, noise is injected/added to the rendered 2D image. The 2D image with added noise is mapped to [Image: media_image3.png].).

However, Wu in view of Metzer and Github does not explicitly disclose sampling a Gaussian distribution to determine the noise, wherein an amount of the noise is based on a number of iterations of updating the trainable 3D mask, thereby the noise is Gaussian noise.
Amrani teaches sampling a Gaussian distribution to determine the noise, thereby the noise is Gaussian noise (“In various instances, the training component can iteratively insert noise (e.g., Gaussian noise) into the first output, thereby yielding a second output. Accordingly, the second output can have the same size, format, or dimensionality as the first output (e.g., as the randomized array discussed above).” Amrani ¶ 47.), wherein an amount of the noise is based on a number of iterations of updating the trainable 3D mask (“In various cases, the training component can perform noise insertion on the first output for any suitable number of iterations (e.g., can insert any suitable amount of noise into the first output).” Amrani ¶ 47. Here, the amount of the noise is correlated to the number of iterations. The training is to update Wu in view of Metzer’s 3D mask.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Amrani’s noise insertion strategy with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to train and use a suitable number of iterations, so that sufficient quality of the model could be achieved with reasonable expense of computing resources. “In various cases, the training component can perform noise insertion on the first output for any suitable number of iterations (e.g., can insert any suitable amount of noise into the first output).” Amrani ¶ 47.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 1, in further view of ZHANG et al. (US 20230168326 A1).

Regarding Claim 17, Wu in view of Metzer and Github teaches The computing device of claim 1.
Wu in view of Metzer and Github does not explicitly disclose wherein the operations further comprise: repeating the rendering, the adding, the inputting, the determining, and the updating, until the loss transgresses a threshold value.

Zhang teaches repeating the rendering, the adding, the inputting, the determining, and the updating, until the loss transgresses a threshold value (“Exemplary termination conditions may be that the value of a loss function obtained in the certain iteration is less than a threshold value, that a certain count of iterations has been performed, that the loss function converges such that the difference of the values of the loss function obtained in a previous iteration and the current iteration is within a threshold value, etc.” Zhang ¶ 120. After Wu in view of Metzer is combined with Zhang, Wu in view of Metzer’s iterations will be terminated when its loss function transgresses a threshold value.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Zhang’s termination condition with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to conserve computation resources. When the quality of the model is sufficient, it may not be necessary to continue the computation.

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 18, in further view of Wong (US 20240404170 A1).

Regarding Claim 21, Wu in view of Metzer and Github teaches The apparatus of claim 18 wherein the operations further comprise: accessing a user image ([Image: media_image37.png]).

Wu in view of Metzer and Github does not explicitly disclose selecting a 2D mask based on a body part depicted in the user image.

Wong teaches accessing a user image (“At 1310, the method 1300 may include applying the face decoration texture over a human face in a live video feed.
Thus, once created, the face decoration texture may be usable by the user or other users immediately in a live video feed with constantly updating frames and the face decoration texture may stay located on the human face with high accuracy.” Wong ¶ 35.); and selecting a 2D mask based on a body part depicted in the user image ( “In addition, the mobile version may include one or more suggested prompts 84, which may be accompanied by images or video of corresponding face decoration textures. Selection of one of the suggested prompts 84 by the user may result in the suggested prompt 84 being added to the prompt input box 78 for the user, who may be free to modify or add to the suggested prompt 84 before finalizing the user text prompt 42.” Wong ¶ 28. PNG media_image38.png 352 478 media_image38.png Greyscale The mask, e.g., “GLAM,” is “images or video of corresponding face decoration textures,” which are understood to be rendered in 2D. The Examiner takes an Official Notice that it would have been well-known in the art that image or video could be 2D on a display. The benefits of combining this well-known knowledge would have been to display image that is supported by the display and/or to reduce complication of generating images. These masks are based on the face, depicted in the user image. ). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Wong’s recommendation/suggestions with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated guide a user to make decisions, which make is easier for a user who is not familiar with the software or process. “In addition, the mobile version may include one or more suggested prompts 84, which may be accompanied by images or video of corresponding face decoration textures. 
Selection of one of the suggested prompts 84 by the user may result in the suggested prompt 84 being added to the prompt input box 78 for the user, who may be free to modify or add to the suggested prompt 84 before finalizing the user text prompt 42.” Wong ¶ 28. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Wong (US 20240404170 A1) teaches a similar feature PNG media_image39.png 522 716 media_image39.png Greyscale Here, fig. 7 is similar to Applicant’s Figs. 7-9, 12. However, they do not disclose the claimed details of a process that generates similar results. Note the tiger face mask. Laine et al. (US 20220051481 A1) teaches features related to some of the dependent claim 6: “A modular differentiable renderer design yields high performance by leveraging existing, highly optimized hardware graphics pipelines to reconstruct the 3D model. The differential renderer renders images of the 3D model and differences between the rendered images and reference images are propagated backwards through the rendering pipeline to iteratively adjust the 3D model.” Abstract. However, Laine does not teach the limitations of the parent claims. Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZHENGXI LIU whose telephone number is (571)270-7509. The examiner can normally be reached M-F 9 AM - 5 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /ZHENGXI LIU/Primary Examiner, Art Unit 2611
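
For readers mapping the Claim 17 rejection to practice, the three termination conditions quoted from Zhang ¶ 120 (loss below a threshold, a fixed iteration count, and convergence of successive loss values) can be illustrated with a minimal Python sketch. This is not code from Zhang, Wu, Metzer, or the application. The function name `run_until_terminated`, its parameters, and the toy `step` callback are all hypothetical, and the single `step()` call merely stands in for one pass of the claimed rendering, adding, inputting, determining, and updating operations.

```python
from typing import Callable, List, Tuple


def run_until_terminated(
    step: Callable[[], float],
    loss_threshold: float = 1e-3,    # hypothetical default
    max_iterations: int = 1000,      # hypothetical default
    convergence_tol: float = 1e-6,   # hypothetical default
) -> Tuple[str, List[float]]:
    """Call step() repeatedly until one of three termination conditions holds.

    step() is assumed to perform one full update and return the current loss.
    Returns the reason for stopping and the recorded loss history.
    """
    losses: List[float] = []
    for _ in range(max_iterations):
        loss = step()
        losses.append(loss)

        # Condition 1: the loss is less than a threshold value.
        if loss < loss_threshold:
            return "loss_below_threshold", losses

        # Condition 2: the loss has converged, i.e. the difference between
        # the previous and current loss is within a threshold value.
        if len(losses) >= 2 and abs(losses[-2] - losses[-1]) <= convergence_tol:
            return "converged", losses

    # Condition 3: a certain count of iterations has been performed.
    return "max_iterations", losses


if __name__ == "__main__":
    # Toy stand-in for an optimisation step: the loss halves on every call.
    state = {"loss": 1.0}

    def toy_step() -> float:
        state["loss"] *= 0.5
        return state["loss"]

    reason, history = run_until_terminated(toy_step, loss_threshold=0.1)
    print(reason, history)
```

In this sketch the loop stops as soon as any one condition is met, which is how the quoted passage reads; whether a given system checks the conditions in this order, or combines them, is a design choice the Office Action does not address.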

Prosecution Timeline

Feb 08, 2024: Application Filed
Oct 22, 2025: Non-Final Rejection mailed (§103)
Jan 20, 2026: Response Filed
Feb 20, 2026: Final Rejection mailed (§103)
Apr 06, 2026: Response after Non-Final Action
Apr 20, 2026: Request for Continued Examination
Apr 23, 2026: Response after Non-Final Action
Apr 29, 2026: Non-Final Rejection (signed), §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12633067: EFFECTIVENESS BOOSTING IN THE METAVERSE (2y 6m to grant; granted May 19, 2026)
Patent 12626441: IMAGE PROCESSING METHOD AND APPARATUS, AND ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM (2y 5m to grant; granted May 12, 2026)
Patent 12608869: LIVE VIDEO BASED ON MOTION TRACKING AND ANIMATION OF FOREARM (2y 7m to grant; granted Apr 21, 2026)
Patent 12602865: METHODS FOR DEPTH CONFLICT MITIGATION IN A THREE-DIMENSIONAL ENVIRONMENT (2y 6m to grant; granted Apr 14, 2026)
Patent 12599463: COLOR MANAGEMENT PROCESS FOR CUSTOMIZED DENTAL RESTORATIONS (2y 6m to grant; granted Apr 14, 2026)
Based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 64%
With Interview (+39.8%): 99%
Median Time to Grant: 3y 2m (~10m remaining)
PTA Risk: High
Based on 358 resolved cases by this examiner. Grant probability derived from career allowance rate.
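
The page does not state how these projections are computed. As a rough consistency check only, the Python sketch below reproduces two of the figures from numbers shown elsewhere on this page (229 granted of 358 resolved, a Feb 08, 2024 filing date, a May 29, 2026 "last updated" date, and a 3y 2m typical prosecution time). The formulas, the function names, and the 30.44 days-per-month approximation are assumptions, not the site's documented method.

```python
from datetime import date

# Approximate average month length in days (365.25 / 12); an assumption.
DAYS_PER_MONTH = 30.44


def allowance_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a simple ratio of granted to resolved cases."""
    return granted / resolved


def months_remaining(filed: date, as_of: date, typical_months: int) -> float:
    """Typical prosecution length minus the time already elapsed, in months."""
    elapsed_months = (as_of - filed).days / DAYS_PER_MONTH
    return typical_months - elapsed_months


if __name__ == "__main__":
    rate = allowance_rate(229, 358)
    print(f"Grant probability: {rate:.0%}")  # 64%

    remaining = months_remaining(
        filed=date(2024, 2, 8),
        as_of=date(2026, 5, 29),
        typical_months=3 * 12 + 2,  # 3y 2m
    )
    print(f"Estimated remaining: ~{round(remaining)}m")  # ~10m
```

Under these assumptions the results match the displayed 64% and ~10m. The 99% "With Interview" figure is not reproduced here because the page gives no basis for deriving it.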
