DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 4/20/2026 has been entered.
Claim Status
Applicant’s amendments and arguments have been received on 4/20/2026. Claims 1, 19, and 20 have been amended. Claim 21 has been added. Claim 2 has been cancelled. Claims 1, 3-21 are pending.
Response to Arguments
Some of Applicant’s arguments are persuasive but are moot in view of the Examiner’s new ground of rejection. In particular, the Examiner adopts on the record Applicant’s interpretation of “normal map” as explained at Remarks p. 8.
Applicant states that “Second, Wu does not ‘encod[] the 2D image with added noise, the normal map, and input text, into a latent space for a trained neural network’ as recited in amended claim 1.” Remarks p. 8.
The Examiner disagrees.
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image1.png].
The predicted noise is mapped to [Image: media_image2.png].
The encoded input text is mapped to y after embedding.
The 2D image with added noise is mapped to the image that [Image: media_image3.png] represents.
[Image: media_image3.png] is the result after encoding.
In addition or alternatively, the Examiner’s new reference Github teaches that all inputs and conditions are encoded for a diffusion model:
[Image: media_image4.png]
Compact Prosecution
With respect to claim interpretation, the Examiner has provided notes labeled “[BRI on the record]” throughout this Office Action so that the record is clear about the scope of the claimed invention and about the basis for the Examiner’s analyses. A clear record of the claim interpretation may expedite examination by allowing it to focus on Applicant’s inventive concept and its comparison with the related prior art.
If there are disagreements, Applicant may present an alternative interpretation based on MPEP 2111. The Examiner will adopt Applicant’s interpretation on the record, if Applicant’s interpretation is reasonable and/or arguments are persuasive.
Applicant may amend claims relying on the Examiner’s claim interpretation provided on the record.
Double Patenting
The provisional rejections of Claims 1, 3-8, 10-11, 15-17, and 19-20 on the ground of nonstatutory double patenting are withdrawn in view of Applicant’s amendments to the independent claims.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-10, 14, 16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (“Text-Guided 3D Face Synthesis - From Generation to Editing”) in view of Metzer et al. (“Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures”) and Github (“ControlNet”).
Regarding Claim 1, Wu teaches A computing device comprising:
one or more processors; and one or more memories storing instructions that, when executed by the one or more processors, configure the computing device to perform operations (
[Image: media_image5.png]
Wu discloses a machine learning model that uses a computer.) comprising:
rendering a trainable three-dimensional (3D) mask to generate a two-dimensional (2D) image, the 3D trainable mask comprising adjustable mesh vertices (
[BRI on the record] With respect to “trainable three-dimensional (3D) mask,” the Examiner is reading it to mean a 3D mask generated by a trainable machine learning model, wherein the 3D mask comprises adjustable mesh vertices. The Examiner has revised the interpretation in view of Applicant’s amendments.
[Mapping Analysis]
[Image: media_image6.png]
“In this paper, we propose a unified text-guided framework from face generation to editing. In the generation stage, we propose a geometry-texture decoupled generation to mitigate the loss of geometric details caused by coupling. Besides, decoupling enables us to utilize the generated geometry as a condition for texture generation, yielding highly geometry-texture aligned results. We further employ a finetuned texture diffusion model to enhance texture quality in both RGB and YUV space. In the editing stage, we first employ a pre-trained diffusion model to update facial geometry or texture based on the texts.” Wu Abstract.
The claimed “trainable three-dimensional (3D) mask” is mapped to the 3D mask generated by the disclosed text-guided learning model.
Note the masked 3D model generated after the text commands, “Let him wear a purple Zorro mask” and “Make his lips black” in fig. 1.
Further, the generated 3D mask comprises adjustable mesh vertices as shown in Wu fig. 2:
[Image: media_image7.png]
The mesh vertices of the 3DMM-based mesh are adjusted to geometry g, which resembles actress Scarlett Johansson.
The claimed two-dimensional (2D) image is mapped to the images generated for animation and relighting as shown in Wu fig. 1.);
adding noise to the 2D image to generate a 2D image with added noise (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image8.png]
Wu Fig. 2 visually explains the formula:
[Image: media_image9.png]
The image I is the rendered 2D image.
Here, noise is injected/added to the rendered 2D image.
The 2D image with added noise is mapped to the image that [Image: media_image3.png] represents.);
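For illustration only, the noising step discussed in this limitation can be sketched as follows, assuming the standard DDPM forward process z_t = sqrt(ᾱ_t)·x + sqrt(1 − ᾱ_t)·ε, with ε drawn from a standard Gaussian. Wu’s own formula in § 3.1 may use different notation, and the function name `add_noise` and parameter `alpha_bar_t` are hypothetical. Images are treated as flat lists of pixel values so that the sketch has no dependencies.

```python
import math
import random


def add_noise(image, alpha_bar_t, rng):
    """Return (noisy_image, noise) for a flattened image (list of floats).

    Sketch of the standard DDPM forward step (assumed, not Wu's code):
    z_t = sqrt(alpha_bar_t) * x + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    """
    # Draw one standard-normal sample per pixel.
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    a = math.sqrt(alpha_bar_t)        # weight on the rendered image
    b = math.sqrt(1.0 - alpha_bar_t)  # weight on the injected noise
    noisy = [a * x + b * e for x, e in zip(image, noise)]
    return noisy, noise


# Usage: noise a tiny three-pixel "rendered image".
rendered = [0.2, 0.5, 0.9]
noisy, eps = add_noise(rendered, 0.7, random.Random(0))
```

With `alpha_bar_t` equal to 1 no noise is added; smaller values mix in proportionally more noise.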
encoding the 2D image with added noise,
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image1.png].
The predicted noise is mapped to [Image: media_image2.png].
The encoded input text is mapped to y after embedding. Therefore, y is in an embedded latent space.
The 2D image with added noise is mapped to the image that [Image: media_image3.png] represents.
[Image: media_image3.png] is the result after encoding. In addition, [Image: media_image3.png] is the “noisy latent code” in the latent space.
The trained neural network is mapped to SD/ControlNet/InsP2P/TexDiffusion in fig. 2. Wu states, “a pretrained 2D diffusion model ϕ with a denoising function ϵϕ (zt; y, t) to optimize 3D parameters θ.”
The Examiner’s secondary reference Github also has related teachings regarding encoding and the latent space.);
determining a loss between the 2D image with added noise and the predicted noise (
[Image: media_image10.png]
); and
updating positions of the adjustable mesh vertices of the trainable 3D mask based on the loss (
“Score distillation sampling has been proposed in Dream-Fusion [32] for text-to-3D generation.” Wu 3.1. Preliminaries (Score distillation sampling).
“The employed facial 3D morphable model provides strong priors to ensure the quality of generated geometry. As to the alignment with the input text, we utilize SDS on the network ϕsd of Stable Diffusion [35] to guide the geometry generation.” Wu 3.2.
Here, the 3D morphable model is mesh based as shown in fig. 2.
The positions of the adjustable mesh vertices are updated, e.g., in fig. 2 the 3DMM-based mesh model is updated to look like Scarlett Johansson.).
To the extent that updating positions of the adjustable mesh vertices of the trainable 3D mask based on the loss also requires updating the trainable machine learning model that generates the 3D mask, Wu is not entirely explicit in its statement, “Score distillation sampling has been proposed in Dream-Fusion [32] for text-to-3D generation.” Wu 3.1. Preliminaries (Score distillation sampling). A person of ordinary skill in the art would have understood the statement to mean that the loss function based on score distillation sampling is used to update the machine learning model.
Further, Wu does not explicitly disclose:
determining a normal map for the trainable 3D mask; or
encoding the normal map as a condition into the latent space for the trained neural network.
However, Metzer more clearly explains updating . . . the trainable 3D mask based on the loss (
[Image: media_image11.png]
Metzer 3.1.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s score distillation details with Wu’s score distillation. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model. Metzer states, “Score Distillation is a method that enables using a diffusion model as a critic, i.e., using it as a loss without explicitly back-propagating through the diffusion process.” Metzer 3.1.
Wu in view of Metzer does not explicitly disclose
determining a normal map for the trainable 3D mask; or
encoding the normal map as an additional condition into the latent space for the trained neural network.
Github teaches
determining a normal map for the trainable 3D mask (
“This model use normal map. Right now in the APP, the normal is computed from the midas depth map and a user threshold (to determine how many area is background with identity normal face to viewer, tune the "Normal background threshold" in the gradio app to get a feeling).” Github: normal map
After the combination of Wu in view of Metzer and Github, the determined normal map is for Wu in view of Metzer’s trainable 3D mask.); and
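For illustration only, the following sketch shows one way a normal map can be estimated from a depth map, in the manner Github describes (normals derived from a depth map, with a user threshold marking background pixels as facing the viewer). The function name `normal_map_from_depth`, the central-difference scheme, and the threshold semantics are assumptions made for this sketch; it is not ControlNet’s code.

```python
import math


def normal_map_from_depth(depth, bg_threshold):
    """Estimate per-pixel unit normals from a depth map given as a list of rows.

    Hypothetical sketch: normals come from central finite differences
    (clamped at the borders); pixels whose value is below bg_threshold are
    treated as background and get the identity normal (0, 0, 1), i.e.,
    facing the viewer.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            if depth[y][x] < bg_threshold:
                row.append((0.0, 0.0, 1.0))
                continue
            # Finite differences along x and y, with indices clamped to the image.
            dzdx = (depth[y][min(x + 1, w - 1)] - depth[y][max(x - 1, 0)]) / 2.0
            dzdy = (depth[min(y + 1, h - 1)][x] - depth[max(y - 1, 0)][x]) / 2.0
            nx, ny, nz = -dzdx, -dzdy, 1.0
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / norm, ny / norm, nz / norm))
        normals.append(row)
    return normals
```

A flat depth map yields normals that all face the viewer; a depth ramp tilts the normals against the slope.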
encoding the normal map as a condition into the latent space for the trained neural network (
Github teaches the conditions (including normal map) are encoded:
[Image: media_image12.png], where the condition(s) are encoded by the SD Encoder Block.
Github teaches the condition(s) are for a diffusion model, stating “Official implementation of Adding Conditional Control to Text-to-Image Diffusion Models,” which dovetails with Wu fig. 2’s teaching:
[Image: media_image9.png], where one is allowed to add a “condition (optional)” for a diffusion model.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Github’s use of a normal map as a condition for a diffusion model with Wu in view of Metzer. One of ordinary skill in the art would be motivated to enhance the quality of the generated image. Github states, “Compared to depth model, this model seems to be a bit better at preserving the geometry. This is intuitive: minor details are not salient in depth maps, but are salient in normal maps. Below is the depth result with same inputs. You can see that the hairstyle of the man in the input image is modified by depth model, but preserved by the normal model.” Github: normal map.
Claims 19-20 are substantially similar to Claim 1. The rejection analyses based on Wu in view of Metzer and Github for Claim 1 are also applied to Claims 19-20. In addition, Claim 19 recites “A non-transitory computer-readable storage medium including instructions that, when processed by one or more processors of a computing device, configure the computing device to perform operations . . .” (Wu 4.1). Claim 20 recites “A method performed on a computing device” (Wu fig. 2; Wu 4.1).
Regarding Claim 3, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein determining the loss further comprises:
determining the loss based on a difference between the 2D image with added noise and the predicted noise (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image1.png]
).
Regarding Claim 4, Wu in view of Metzer and Github teaches The computing device of claim 3, wherein the operations further comprise:
determining a gradient based on the loss (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image1.png]
);
backpropagating the gradient through the 2D image with added noise (
[Image: media_image11.png]
Metzer 3.1. Here, the backpropagation is through the diffusion process.
Wu provides details for the diffusion process:
[Image: media_image13.png]
Wu Fig. 2 visually explains the formula:
[Image: media_image9.png]
In the backpropagation through the diffusion process, the 2D image with added noise is mapped to [Image: media_image3.png]
);
backpropagating the gradient through the 3D trainable mask; and updating the trainable 3D mask based on the gradient (“Score distillation sampling has been proposed in Dream-Fusion [32] for text-to-3D generation.” Wu 3.1. Preliminaries (Score distillation sampling). The diffusion process is to train/optimize the 3D trainable mask as shown in Wu figs. 1-2.).
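To illustrate the gradient flow recited in Claim 4, the following sketch assumes the score distillation gradient described in DreamFusion, which Wu cites: the per-pixel residual w(t)(ε̂ − ε) between the predicted and injected noise is passed back through the renderer’s Jacobian to the mask parameters, without back-propagating through the diffusion model itself. The toy linear renderer, the function name `sds_step`, and all parameter names are hypothetical; this is not Wu’s or Metzer’s implementation.

```python
def sds_step(theta, render_jacobian, eps_pred, eps, weight, lr):
    """One score-distillation-style update of mask parameters theta.

    Assumed form (DreamFusion-style SDS):
        grad_theta = w(t) * (eps_pred - eps) * d(image)/d(theta),
    with the diffusion model's own Jacobian omitted.
    render_jacobian[i][j] is d(pixel i)/d(theta j) for a toy renderer.
    Returns (updated_theta, grad).
    """
    # Per-pixel weighted residual between predicted and injected noise.
    residual = [weight * (p - e) for p, e in zip(eps_pred, eps)]
    # Chain rule through the renderer: grad_j = sum_i residual_i * J[i][j].
    grad = [
        sum(residual[i] * render_jacobian[i][j] for i in range(len(residual)))
        for j in range(len(theta))
    ]
    # Plain gradient-descent update of the parameters (e.g., vertex offsets).
    updated = [t - lr * g for t, g in zip(theta, grad)]
    return updated, grad
```

When the predicted noise equals the injected noise, the gradient is zero and the parameters are left unchanged.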
Regarding Claim 5, Wu in view of Metzer and Github teaches The computing device of claim 4,
wherein the backpropagating the gradient through the 2D image with added noise further comprises:
subtracting the predicted noise ([Image: media_image14.png]) from the 2D image with added noise ([Image: media_image15.png]) (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image1.png]. Metzer fig. 2 visually represents the process:
[Image: media_image16.png]
).
Regarding Claim 6, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein rendering further comprises:
rendering, using a differential renderer component, the trainable 3D mask to generate the 2D image (
Metzer:
[Image: media_image17.png]
).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s differential renderers with Wu. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model. Metzer:
[Image: media_image18.png]
Regarding Claim 7, Wu in view of Metzer and Github teaches The computing device of claim 6, wherein the operations further comprise:
determining a gradient based on the loss (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image1.png].); and
propagating the gradient through the differential renderer component (
[Image: media_image11.png]
Metzer 3.1.
[Image: media_image19.png]
).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s score distillation details with Wu’s score distillation. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model. Metzer states, “Score Distillation is a method that enables using a diffusion model as a critic, i.e., using it as a loss without explicitly back-propagating through the diffusion process.” Metzer 3.1. Metzer:
[Image: media_image18.png]
Regarding Claim 8, Wu in view of Metzer and Github teaches The computing device of claim 1,
wherein the trained neural network is trained, using a diffusion model, to generate 2D images based on input texts and masks (
Wu:
[Image: media_image6.png]
Wu: Fig. 1 shows a trained neural network that generates 2D images based on input text (e.g., “Make his lips black”) and [Image: media_image20.png].
Wu:
[Image: media_image21.png], which shows that diffusion models have been used.), and
the trained neural network comprises one or more of:
convolutional layers, one or more up sampling layers, one or more down sampling layers, and one or more fully connected layers (Metzer:
PNG
media_image22.png
220
554
media_image22.png
Greyscale
).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s NeRF with Wu’s score distillation. One of ordinary skill in the art would be motivated to efficiently enhance a model. Metzer:
PNG
media_image23.png
390
554
media_image23.png
Greyscale
Regarding Claim 9, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein the operations further comprise:
receiving the input text from a user (
[Image: media_image6.png]
For example, the input text could be “Let him wear a purple Zorro mask” received from Zuckerberg.);
accessing an image of the user (The system has access to an image of Zuckerberg.); and
determining a shape of the trainable 3D mask based on a head of the user within the image (
[Image: media_image20.png]
).
Regarding Claim 10, Wu in view of Metzer and Github teaches The computing device of claim 1,
wherein rendering the trainable 3D mask further comprises:
selecting a camera angle (Wu:
[Image: media_image24.png]
); and
rendering the trainable 3D mask based on the camera angle to generate the 2D image (
Wu:
[Image: media_image25.png]
).
Regarding Claim 14, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein updating the trainable 3D mask comprises:
adjusting position of vertices of a plurality of vertices (Wu 3.1:
[Image: media_image26.png]
), the trainable 3D mask comprising the plurality of vertices (“T is the mean face and S is the vertices offset basis.”).
Regarding Claim 16, Wu in view of Metzer and Github teaches The computing device of claim 1,
wherein the inputting the 2D image further comprises:
inputting the 2D image with added noise, the input text, and a number of iterations into the trained neural network to generate the predicted noise of the 2D image with added noise (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image8.png]
Wu Fig. 2 visually explains the formula:
[Image: media_image9.png]
The image I is the rendered 2D image.
Here, noise is injected/added to the rendered 2D image.
The 2D image with added noise is mapped to [Image: media_image3.png].
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image1.png].
The predicted noise is mapped to [Image: media_image2.png].
The input text is mapped to y.
Metzer:
[Image: media_image27.png]
).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Metzer’s score distillation details with Wu’s score distillation. One of ordinary skill in the art would be motivated to enhance the quality of the machine learning model. Metzer states, “Score Distillation is a method that enables using a diffusion model as a critic, i.e., using it as a loss without explicitly back-propagating through the diffusion process.” Metzer 3.1.
Regarding Claim 18, Wu in view of Metzer and Github teaches The computing device of claim 1,
wherein the trained neural network is a first trained neural network, the 2D image is a first 2D image (See the analysis for Claim 1; here, there is only renaming.), and
wherein the operations further comprise:
inputting a 2D mask and the text into a second trained neural network to generate a second 2D image in a shape of the 2D mask representing the text (
Wu fig. 2:
[Image: media_image28.png]
The 2D mask is mapped to [Image: media_image29.png].
The second 2D image is mapped to [Image: media_image30.png].
The text is mapped to “let her wear a batman eyemask.”
[Image: media_image31.png]
); and
determining, based on colors of the second 2D image, colors for a plurality of vertices, the 3D trainable mask comprising the plurality of vertices (
Wu 3.1:
[Image: media_image26.png]
[Image: media_image32.png]
).
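For illustration only, one simple way to determine vertex colors from a 2D image is to project each vertex into the image and read the nearest pixel. The sketch below assumes an orthographic projection with vertex x and y coordinates in [-1, 1] and an image stored as rows of RGB tuples; the function name and these conventions are hypothetical and are not taken from Wu.

```python
def vertex_colors_from_image(vertices, image):
    """Assign each vertex the color of the pixel it projects onto.

    Hypothetical sketch: vertices are (x, y, z) with x, y in [-1, 1];
    projection is orthographic along z; image is a list of rows of RGB
    tuples with row 0 at the top. Out-of-range vertices are clamped.
    """
    h, w = len(image), len(image[0])
    colors = []
    for x, y, _z in vertices:
        # Map [-1, 1] to the nearest pixel index; y is flipped so +y points up.
        col = min(w - 1, max(0, round((x + 1.0) / 2.0 * (w - 1))))
        row = min(h - 1, max(0, round((1.0 - y) / 2.0 * (h - 1))))
        colors.append(image[row][col])
    return colors
```

A differentiable renderer would typically interpolate colors rather than take the nearest pixel; nearest-pixel lookup is used here only to keep the sketch short.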
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 10, in further view of Karthik (“Generating Photo-realistic Images with Stable Diffusion XL (SDXL 1.0)”).
Regarding Claim 11, Wu in view of Metzer and Github teaches The computing device of claim 10,
wherein the inputting the 2D image further comprises:
inputting the 2D image with added noise and
the
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image1.png].
The predicted noise is mapped to [Image: media_image2.png].
The input text is mapped to y.
The 2D image with added noise is mapped to [Image: media_image3.png]
).
Wu in view of Metzer does not explicitly disclose
modifying the input text in accordance with the camera angle; or
the modified input text is used for training.
Karthik teaches
modifying the input text in accordance with the camera angle (
[Image: media_image33.png]
); or
the modified input text is used for training (
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image1.png].
The predicted noise is mapped to [Image: media_image2.png].
The modified input text is mapped to y used for training).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Karthik’s input text to specify camera angle with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to allow a user to control camera angle through text input for an AI generated image. Karthik states, “But how do we bridge the gap between textual prompts and photorealistic images? In this exploration, we'll delve into constructing text-to-image prompts using foundational photography concepts. We'll consider the intricacies of the exposure triangle, the nuances introduced by different camera types (be it Mirrorless, DSLR etc), the characteristics of various lenses, and the impact of camera angles. We'll also dissect the essence of shots, from full to extreme close-ups, understand the significance of the camera's eye line (normal, low, or high), and factor in lighting conditions. Furthermore, we'll categorize the types of images, whether they're portraits, landscapes, or motion blurs, to guide AI in generating the perfect photograph.”
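As an illustration of modifying input text in accordance with a camera angle, the sketch below appends a view phrase to the prompt based on the camera’s azimuth and elevation, similar in spirit to the view-dependent prompting used in DreamFusion. The angle thresholds, the phrases, and the function name `view_dependent_prompt` are assumptions made for this sketch and are not taken from Karthik.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg):
    """Append a camera-view phrase to a text prompt.

    Hypothetical thresholds: elevation above 60 degrees is "overhead";
    azimuth 0 is assumed to face the front of the head.
    """
    if elevation_deg > 60.0:
        view = "overhead view"
    else:
        a = azimuth_deg % 360.0  # normalize to [0, 360)
        if a <= 45.0 or a >= 315.0:
            view = "front view"
        elif 135.0 <= a <= 225.0:
            view = "back view"
        else:
            view = "side view"
    return f"{prompt}, {view}"


# Usage: the prompt changes with the sampled camera angle.
example = view_dependent_prompt("a purple Zorro mask", 90.0, 0.0)
```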
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 10, in further view of Geng et al. (US 20060023923 A1).
Regarding Claim 12, Wu in view of Metzer and Github teaches The computing device of claim 10, wherein the operations further comprise:
selecting one or more lighting sources, wherein the rendering is further based on the one or more lighting sources (
Wu A. Appendix:
[Image: media_image34.png]
[Image: media_image35.png]
).
Wu in view of Metzer does not explicitly disclose wherein the one or more lighting sources are selected based on an image of a user.
Geng teaches wherein the one or more lighting sources are selected based on an image of a user (“selecting a lighting model to best depict said individual” Geng ¶ 3.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Geng’s lighting selection with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to generate images that better depict an individual. Geng ¶ 3.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer, Github, and Geng as applied to Claim 12, in further view of Karthik (“Generating Photo-realistic Images with Stable Diffusion XL (SDXL 1.0)”).
Regarding Claim 13, Wu in view of Metzer, Github, and Geng teaches The computing device of claim 12.
Wu in view of Metzer, Github, and Geng does not explicitly disclose wherein the inputting the 2D image further comprises:
modifying the input text in accordance with the one or more lighting sources; and
inputting the 2D image with added noise and the modified input text into the trained neural network to generate the predicted noise.
Karthik teaches wherein the inputting the 2D image further comprises:
modifying the input text in accordance with the one or more lighting sources (Karthik:
[Image: media_image36.png]
); and
inputting the 2D image with added noise and the modified input text into the trained neural network to generate the predicted noise (
When Wu in view of Metzer and Geng is combined with Karthik, the combination teaches the limitation.
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image1.png].
The predicted noise is mapped to [Image: media_image2.png].
The modified input text is mapped to y used for training.
The 2D image with added noise is mapped to [Image: media_image3.png]
).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Karthik’s input text to specify camera angle with Wu in view of Metzer, Github, and Geng. One of ordinary skill in the art would be motivated to allow a user to control camera angle through text input for an AI generated image. Karthik states, “But how do we bridge the gap between textual prompts and photorealistic images? In this exploration, we'll delve into constructing text-to-image prompts using foundational photography concepts. We'll consider the intricacies of the exposure triangle, the nuances introduced by different camera types (be it Mirrorless, DSLR etc), the characteristics of various lenses, and the impact of camera angles. We'll also dissect the essence of shots, from full to extreme close-ups, understand the significance of the camera's eye line (normal, low, or high), and factor in lighting conditions. Furthermore, we'll categorize the types of images, whether they're portraits, landscapes, or motion blurs, to guide AI in generating the perfect photograph.”
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 1, in further view of El Hanchi El Amrani et al. (US 20240233246 A1) (hereinafter “Amrani”).
Regarding Claim 15, Wu in view of Metzer and Github teaches The computing device of claim 1, wherein adding noise further comprises:
adding the
Wu 3.1. Preliminaries (Score distillation sampling):
[Image: media_image8.png]
Wu Fig. 2 visually explains the formula:
[Image: media_image9.png]
The image I is the rendered 2D image.
Here, noise is injected/added to the rendered 2D image.
The 2D image with added noise is mapped to [Image: media_image3.png].).
However, Wu in view of Metzer and Github does not explicitly disclose sampling a Gaussian distribution to determine the noise, wherein an amount of the noise is based on a number of iterations of updating the trainable 3D mask, thereby the noise is Gaussian noise.
Amrani teaches sampling a Gaussian distribution to determine the noise, thereby the noise is Gaussian noise (“In various instances, the training component can iteratively insert noise (e.g., Gaussian noise) into the first output, thereby yielding a second output. Accordingly, the second output can have the same size, format, or dimensionality as the first output (e.g., as the randomized array discussed above).” Amrani ¶ 47.),
wherein an amount of the noise is based on a number of iterations of updating the trainable 3D mask (“In various cases, the training component can perform noise insertion on the first output for any suitable number of iterations (e.g., can insert any suitable amount of noise into the first output).” Amrani ¶ 47. Here, the amount of the noise is correlated to the number of iterations. The training is to update Wu in view of Metzer’s 3D mask.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Amrani’s noise insertion strategy with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to train with and use a suitable number of iterations, so that sufficient model quality could be achieved at a reasonable cost in computing resources. “In various cases, the training component can perform noise insertion on the first output for any suitable number of iterations (e.g., can insert any suitable amount of noise into the first output).” Amrani ¶ 47.
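For illustration only, the following sketch samples Gaussian noise whose scale depends on the iteration count of the update loop. The linear decay from `sigma_max` to `sigma_min` and the function name `noise_for_iteration` are hypothetical choices made for clarity; neither Wu nor Amrani is asserted to use this schedule.

```python
import random


def noise_for_iteration(num_values, iteration, total_iters, rng,
                        sigma_max=1.0, sigma_min=0.1):
    """Sample Gaussian noise whose scale depends on the iteration count.

    Hypothetical linear schedule: the standard deviation decays from
    sigma_max at iteration 0 to sigma_min at iteration total_iters - 1.
    Returns (sigma, samples).
    """
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(iteration / max(total_iters - 1, 1), 0.0), 1.0)
    sigma = sigma_max + (sigma_min - sigma_max) * frac
    samples = [rng.gauss(0.0, sigma) for _ in range(num_values)]
    return sigma, samples
```

Early iterations thus receive larger perturbations and later iterations smaller ones, which is one way an amount of noise can be tied to the number of iterations.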
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 1, in further view of ZHANG et al. (US 20230168326 A1).
Regarding Claim 17, Wu in view of Metzer and Github teaches The computing device of claim 1.
Wu in view of Metzer and Github does not explicitly disclose wherein the operations further comprise:
repeating the rendering, the adding, the inputting, the determining, and the updating, until the loss transgresses a threshold value.
Zhang teaches repeating the rendering, the adding, the inputting, the determining, and the updating, until the loss transgresses a threshold value (
“Exemplary termination conditions may be that the value of a loss function obtained in the certain iteration is less than a threshold value, that a certain count of iterations has been performed, that the loss function converges such that the difference of the values of the loss function obtained in a previous iteration and the current iteration is within a threshold value, etc.” Zhang ¶ 120.
After Wu in view of Metzer is combined with Zhang, Wu in view of Metzer’s iterations will be terminated when its loss function transgresses a threshold value.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Zhang’s termination condition with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to conserve computation resources. When the quality of the model is sufficient, it may not be necessary to continue the computation.
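The termination condition quoted from Zhang can be sketched as a loop that repeats one optimization step until the loss falls below a threshold or an iteration cap is reached. Here `step_fn` is a hypothetical stand-in for one pass of rendering, adding noise, inputting, determining the loss, and updating; the quadratic toy step is only for demonstration.

```python
def optimize_until_threshold(step_fn, threshold, max_iters):
    """Repeat step_fn until its loss falls below threshold or max_iters is hit.

    step_fn (hypothetical) performs one render -> add noise -> predict noise
    -> compute loss -> update pass and returns the current loss.
    Returns (iterations_run, last_loss).
    """
    loss = float("inf")
    for i in range(1, max_iters + 1):
        loss = step_fn()
        if loss < threshold:
            return i, loss
    return max_iters, loss


# Usage with a toy quadratic loss theta**2, minimized by gradient descent.
state = {"theta": 4.0}


def toy_step():
    grad = 2.0 * state["theta"]     # derivative of theta**2
    state["theta"] -= 0.25 * grad   # halves theta on every step
    return state["theta"] ** 2


result = optimize_until_threshold(toy_step, 0.1, 100)
```

Capping the iterations guards against a loss that never reaches the threshold, which is consistent with Zhang’s alternative of stopping after a certain count of iterations.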
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Metzer and Github as applied to Claim 18, in further view of Wong (US 20240404170 A1).
Regarding Claim 21, Wu in view of Metzer and Github teaches The apparatus of claim 18, wherein the operations further comprise:
accessing a user image (
[Image omitted: media_image37.png]
)
Wu in view of Metzer and Github does not explicitly disclose selecting a 2D mask based on a body part depicted in the user image.
Wong teaches
accessing a user image (“At 1310, the method 1300 may include applying the face decoration texture over a human face in a live video feed. Thus, once created, the face decoration texture may be usable by the user or other users immediately in a live video feed with constantly updating frames and the face decoration texture may stay located on the human face with high accuracy.” Wong ¶ 35.); and
selecting a 2D mask based on a body part depicted in the user image (
“In addition, the mobile version may include one or more suggested prompts 84, which may be accompanied by images or video of corresponding face decoration textures. Selection of one of the suggested prompts 84 by the user may result in the suggested prompt 84 being added to the prompt input box 78 for the user, who may be free to modify or add to the suggested prompt 84 before finalizing the user text prompt 42.” Wong ¶ 28.
[Image omitted: media_image38.png]
The mask, e.g., “GLAM,” is shown in “images or video of corresponding face decoration textures,” which are understood to be rendered in 2D. The Examiner takes Official Notice that it was well known in the art that an image or video may be displayed in 2D on a display. The benefits of combining this well-known knowledge would have been to display an image that is supported by the display and/or to reduce the complexity of generating images.
These masks are based on the face depicted in the user image.
).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Wong’s recommendations/suggestions with Wu in view of Metzer and Github. One of ordinary skill in the art would be motivated to guide a user in making decisions, which makes the process easier for a user who is not familiar with the software or process. “In addition, the mobile version may include one or more suggested prompts 84, which may be accompanied by images or video of corresponding face decoration textures. Selection of one of the suggested prompts 84 by the user may result in the suggested prompt 84 being added to the prompt input box 78 for the user, who may be free to modify or add to the suggested prompt 84 before finalizing the user text prompt 42.” Wong ¶ 28.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Wong (US 20240404170 A1) teaches a similar feature:
[Image omitted: media_image39.png]
Here, Fig. 7 is similar to Applicant’s Figs. 7-9 and 12 (note the tiger face mask). However, Wong does not disclose the claimed details of a process that generates similar results.
Laine et al. (US 20220051481 A1) teaches features related to dependent claim 6: “A modular differentiable renderer design yields high performance by leveraging existing, highly optimized hardware graphics pipelines to reconstruct the 3D model. The differential renderer renders images of the 3D model and differences between the rendered images and reference images are propagated backwards through the rendering pipeline to iteratively adjust the 3D model.” Abstract. However, Laine does not teach the limitations of the parent claims.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZHENGXI LIU whose telephone number is (571)270-7509. The examiner can normally be reached M-F 9 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ZHENGXI LIU/Primary Examiner, Art Unit 2611