Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is in response to Applicant’s Amendments and Remarks filed on 1/16/2026. Claims 1, 2, 9, 17, and 19 have been amended. Claims 1-20 are present for examination.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 9, and 17 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Pandey (Pandey et al., Total Relighting: Learning to Relight Portraits for Background Replacement) in view of Li (Li et al., Image Harmonization with Diffusion Model) and US Patent Publication No. 20230037591 A1 to Villegas et al.
Regarding claim 1, Pandey discloses A computer-implemented method comprising:
determining, in response to a request to generate a digital image, a target background image for inserting a foreground object into the target background image (Pandey, p. 1:4, col. 1, 2nd para., disclosing allowing the portrait’s subject to be relit and convincingly composited into any HDR lighting environment (and if only a background photo is available, its HDR lighting can be estimated), p. 1:15, col. 2, 3rd para., disclosing any image can be your background, p. 1:19, Fig. 23, showing estimating illumination from the input background images, relighting the input foreground images, and compositing the subjects into the new background with plausibly consistent illumination, indicating compositing the subjects into the input background image or HDR lighting environment can correspond to generating a digital image in response to a request and determining the input background image or the HDR lighting environment as the target background image for inserting the subject and/or input foreground image as a foreground object into the target background image);
generating, from the target background image and utilizing a lighting conditioning neural network, a lighting feature representation indicating one or more lighting features of the target background image (Pandey, p. 1:4, col. 1, 2nd para., disclosing allowing the portrait’s subject to be relit and convincingly composited into any HDR lighting environment (and if only a background photo is available, its HDR lighting can be estimated), p. 1:5, Fig. 4, showing the HDR map is prefiltered using diffuse and specular convolution operations and producing a per-pixel representation of diffuse and specular reflectance for the target illumination (light maps), Fig. 5, showing a specular light map can be obtained utilizing a specular net as a lighting conditioning neural network, p. 1:15, col. 2, 3rd para., disclosing any image can be your background, p. 1:19, Fig. 23, showing estimating illumination from the input background images, relighting the input foreground images, and compositing the subjects into the new background with plausibly consistent illumination, indicating estimating illumination from the input background image in Fig. 23 and the specular light map shown in Fig. 5 can correspond to generating, from the input background image as the target background image, the illumination and the specular light map as a lighting feature representation of the target background image); and
generating, utilizing a generative neural network conditioned on the lighting feature representation, the digital image including the foreground object inserted into the target background image based on a composite image comprising the foreground object and the target background image with a foreground mask corresponding to the foreground object (Pandey, p. 1:4, Fig. 3, showing a final output image has been generated based on the lighting environment of the background as the lighting feature representation, the final output includes the foreground object inserted into the target background image based on a composite image comprising the foreground object and the target background image with the matte as the foreground mask corresponding to the foreground object, p. 1:19, Fig. 23, disclosing the illumination estimated from the input background images is used to relight the input foreground image and to composite the subjects into the new background, col. 2, Sec. 3.2, disclosing the relighting module including using a per-pixel lighting representation or light maps which encode the specular S and diffuse D components and using a neural shading module to perform final foreground rendering, p. 1:5, Fig. 4, showing neural networks conditioned on the light maps to relight the foreground object, Fig. 5, showing the shading network with neural rendering to use the lighting maps including the specular light map and the foreground and relight the foreground object, indicating the relighting module can correspond to a generative neural network conditioned on the lighting feature representation such as the light maps being utilized to generate the digital image).
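For illustration only, the following minimal sketch shows how a neural shading module can be conditioned on per-pixel diffuse and specular light maps by channel concatenation, in the general manner described above. This is not code from Pandey; the module name, channel counts, and layer sizes are assumptions.

    import torch
    import torch.nn as nn

    class ShadingNet(nn.Module):
        # Hypothetical shading network: the per-pixel light maps serve as the
        # lighting feature representation that conditions foreground relighting.
        def __init__(self, in_ch=12, out_ch=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, out_ch, 3, padding=1),
            )

        def forward(self, foreground, normals, diffuse_map, specular_map):
            # Condition on the light maps by concatenating them with the
            # foreground and normals along the channel dimension (4 x 3 = 12).
            x = torch.cat([foreground, normals, diffuse_map, specular_map], dim=1)
            return self.net(x)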
However, Pandey does not expressly disclose that the generative neural network is a diffusion-based generative neural network, or that the lighting feature representation comprises an encoding of one or more lighting features corresponding to one or more light sources of the target background image.
On the other hand, Li discloses generating, utilizing a diffusion-based generative neural network conditioned on the lighting feature representation, the digital image including the foreground object inserted into the target background image based on a composite image comprising the foreground object and the target background image with a foreground mask corresponding to the foreground object (Li, p. 1, col. 1, Sec. 1, 1st para., disclosing image composition involves merging a foreground image with a background image to create a composite, 2nd para., disclosing image harmonization to adjust the appearance of the foreground image to achieve compatibility with the background, col. 2, 2nd para., disclosing using diffusion models for image harmonization tasks by conditioning on unharmonized images to generate high-quality outputs with realistic and consistent colors, integrating background “light” using brightness prediction method, p. 2, col. 2, Sec. 3.1, 1st para., disclosing obtaining brightness information from the images as a representation of the appearance, 2nd para., disclosing using appearance consistency discriminator to guide the diffusion process to ensure appearance consistency, p. 5, Figure 2, showing the image harmonization results on real composite images comprising the foreground object and the target background image with a foreground mask corresponding to the foreground object, and the harmonization results can correspond to digital images including the foreground object inserted into the target background image, indicating the appearance represented by brightness information can correspond to lighting feature representation conditioning the diffusion models as the diffusion-based generative neural network that can be utilized to generate digital image including the foreground object inserted into the target background image based on a composite image comprising the foreground object and the target background image with a foreground mask corresponding to the foreground object).
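For illustration only, the following minimal sketch shows how a composite image and foreground mask can condition a diffusion denoiser, consistent with the conditioning described above. This is not Li's code; the tensor shapes and the channel-concatenation interface are assumptions.

    import torch

    def make_composite(foreground, background, mask):
        # Composite image: foreground pasted over the background via the mask.
        return mask * foreground + (1.0 - mask) * background

    def denoiser_input(x_t, composite, mask):
        # Condition the denoiser by concatenating the noisy image with the
        # unharmonized composite and the foreground mask along channels.
        return torch.cat([x_t, composite, mask], dim=1)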
Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine Pandey with Li. The suggestion/motivation would have been to generate high-quality outputs with realistic and consistent colors, as suggested by Li (see Li, p. 1, col. 2, 2nd para.).
However, Pandey or Li does not expressly disclose the lighting feature representation comprising an encoding of one or more lighting features corresponding to one or more light sources of the target background image.
On the other hand, Villegas discloses the lighting feature representation comprising an encoding of one or more lighting features corresponding to one or more light sources of the target background image (Villegas, para. [0017], disclosing a light encoder neural network to extract a light source representation from a digital image including a background, para. [0018], disclosing the light encoder neural network to determine an embedding of at least one light source generating light and shadows in a digital image including a background, indicating the digital image including a background can correspond to the target background image and the light encoder neural network can provide the lighting feature representation comprising an embedding of at least one light source as an encoding of one or more lighting features corresponding to one or more light sources of the background image).
Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine Pandey in view of Li and Villegas. The suggestion/motivation would have been for generating realistic shading for three-dimensional objects inserted into digital images, as suggested by Villegas (see Villegas, para. [0004]).
Regarding claim 2, the combination of Pandey, Li, and Villegas discloses the computer-implemented method of claim 1, wherein generating the lighting feature representation comprises: extracting the one or more lighting features corresponding to one or more light sources from the target background image (Villegas, para. [0017], disclosing a light encoder neural network to extract a light source representation from a digital image including a background, para. [0018], disclosing the light encoder neural network to determine an embedding of at least one light source generating light and shadows in a digital image including a background); and encoding the one or more lighting features, as feature vectors representing image content into an encoding space utilizing the lighting conditioning neural network (Villegas, para. [0017], disclosing a light encoder neural network to extract a light source representation from a digital image including a background, para. [0018], disclosing the light encoder neural network to determine an embedding of at least one light source generating light and shadows in a digital image including a background, para. [0032], disclosing a light representation embedding includes a feature map or collection of feature vectors presenting lighting in a digital image, the light representation embedding includes a feature map generated by a neural network, indicating the neural network that generates the feature map can correspond to part of the lighting conditioning neural network and generating the feature map can correspond to the encoding the one or more lighting features as feature vectors representing image content into an encoding space as light representation embedding). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine Pandey in view of Li and Villegas. The suggestion/motivation would have been for generating realistic shading for three-dimensional objects inserted into digital images, as suggested by Villegas (see Villegas, para. [0004]).
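For illustration only, the following minimal sketch shows a light encoder that maps a background image to a lighting embedding expressible as a feature map or a collection of feature vectors, in the general sense described above. This is not Villegas's implementation; the architecture and embedding dimension are assumptions.

    import torch
    import torch.nn as nn

    class LightEncoder(nn.Module):
        # Hypothetical encoder producing an encoding of lighting features.
        def __init__(self, embed_dim=256):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(128, embed_dim, 4, stride=2, padding=1),
            )

        def forward(self, background):
            fmap = self.backbone(background)           # spatial feature map
            vectors = fmap.flatten(2).transpose(1, 2)  # collection of feature vectors
            return fmap, vectors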
Regarding claim 3, the combination of Pandey, Li, and Villegas discloses the computer-implemented method of claim 1, wherein generating the digital image comprises injecting the lighting feature representation into the diffusion-based generative neural network by providing conditional feature maps corresponding to the lighting feature representation to a plurality of diffusion decoders of the diffusion-based generative neural network (Pandey, p. 1:5, Fig. 4, showing injecting the light maps including diffuse light map and specular light map as the lighting feature representation into Shading Net as a generative neural network by providing these light maps as conditional feature maps corresponding to the lighting feature representation to the Shading Net as the generative neural network, Li, p. 1, col. 2, 2nd para., disclosing using two diffusion models, DDPM and LDM, for image harmonization tasks by conditioning on unharmonized images to generate high-quality outputs with realistic and consistent colors, integrating background “light” using brightness prediction method, p. 2, col. 2, Sec. 3.1, 1st para., disclosing obtaining brightness information from the images as a representation of the appearance, 2nd para., disclosing using appearance consistency discriminator to guide the diffusion process to ensure appearance consistency, indicating combining Pandey and Li could provide the light maps taught by Pandey as the conditional feature maps corresponding to the lighting feature representation to the two diffusion models taught by Li corresponding to a plurality of diffusion decoders of the diffusion-based generative neural network). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine Pandey with Li. The suggestion/motivation would have been to generate high-quality outputs with realistic and consistent colors, as suggested by Li (see Li, p. 1, col. 2, 2nd para.).
Regarding claim 17, it recites similar limitations of claim 1 but in a non-transitory computer-readable medium form. The rationale of claim 1 rejection is applied to reject claim 17.
Claims 4-8 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Pandey, Li, and Villegas as applied to claims 1 and 17 above, and further in view of Ren (Ren et al., Relightful Harmonization: Lighting-aware Portrait Background Replacement).
Regarding claim 4, the combination of Pandey, Li, and Villegas discloses the computer-implemented method of claim 1. However, Pandey, Li, or Villegas does not expressly disclose determining the target background image from a training tuple comprising a foreground image including the foreground object and the foreground mask, the target background image, and an environment map of the target background image; and jointly modifying parameters of the lighting conditioning neural network and the diffusion-based generative neural network to reduce an output of a loss function based on a noise input and the digital image generated utilizing the diffusion-based generative neural network according to the training tuple.
On the other hand, Ren discloses determining the target background image from a training tuple comprising a foreground image including the foreground object and the foreground mask, the target background image, and an environment map of the target background image (Ren, p. 4, col. 1, Sec. 3.1, 1st para., disclosing a training tuple includes the foreground image (with its corresponding alpha mask), the target background, the target environment map, and the target image); and jointly modifying parameters of the lighting conditioning neural network and the diffusion-based generative neural network to reduce an output of a loss function based on a noise input and the digital image generated utilizing the diffusion-based generative neural network according to the training tuple (Ren, p. 3, col. 2, Sec. 3, 1st para., disclosing lighting-aware diffusion training using a diffusion model and attaching a light representation learning branch to encode lighting information from the background image which is then injected into the UNet backbone, p. 4, col. 1, 4th para., disclosing noise is added to the target image and the UNet is conditioned on the background-extracted lighting feature and is trained to predict the noise with a loss function and jointly training both UNet and the conditioning branch, indicating the jointly training both UNet (corresponding to the diffusion-based generative neural network) and the light representation learning branch (corresponding to the lighting conditioning neural network) with added noise as noise input teaches jointly modifying parameters of the lighting conditioning neural network and the diffusion-based generative neural network to reduce an output of a loss function based on a noise input and the digital image generated utilizing the diffusion-based generative neural network according to the training tuple).
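For illustration only, the following minimal training-step sketch shows joint optimization of a denoising UNet and a lighting conditioning branch with a noise-prediction loss, in the general manner described above. This is not Ren's code; the module interfaces, the add_noise helper, and the scheduler details are assumptions.

    import torch
    import torch.nn.functional as F

    def training_step(unet, light_branch, add_noise, optimizer, batch, num_timesteps=1000):
        fg, mask, background, target = batch           # elements of a training tuple
        light_feat = light_branch(background)          # background-extracted lighting feature

        noise = torch.randn_like(target)               # noise input
        t = torch.randint(0, num_timesteps, (target.shape[0],), device=target.device)
        x_t = add_noise(target, noise, t)              # forward diffusion (assumed helper)

        pred = unet(x_t, t, fg, mask, light_feat)      # UNet conditioned on the lighting feature
        loss = F.mse_loss(pred, noise)                 # trained to predict the added noise

        optimizer.zero_grad()
        loss.backward()                                # gradients reach both modules, so their
        optimizer.step()                               # parameters are modified jointly
        return loss.item()

In such a sketch, the optimizer would be built over both parameter sets, e.g., torch.optim.Adam(list(unet.parameters()) + list(light_branch.parameters())), so that the loss jointly modifies the UNet and the conditioning branch.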
Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Li, and Villegas with Ren. The suggestion/motivation would have been to produce color and lighting harmonized output for a composite image as suggested by Ren (see Ren, p. 3, col. 2, Sec. 3, 1st para.).
Regarding claim 5, the combination of Pandey, Li, Villegas, and Ren discloses the computer-implemented method of claim 4, further comprising: generating, utilizing an environment lighting conditioning neural network, an environment lighting feature representation indicating one or more lighting parameters of the environment map of the target background image (Ren, p. 3, col. 2, Sec. 3, Stage II: lighting representation alignment, disclosing adapting the lighting representation extracted from a background image towards the learned representation from its environment map, p. 4, col. 2, 2nd para., disclosing using an environment map conditioned harmonization model to generate an environment-map derived light representation); and modifying the parameters of the lighting conditioning neural network by comparing the lighting feature representation to the environment lighting feature representation (Ren, p. 3, col. 2, Sec. 3, Stage II: lighting representation alignment, disclosing adapting the lighting representation extracted from a background image towards the learned representation from its environment map, p. 4, col. 2, 2nd para., disclosing using an environment map conditioned harmonization model to generate an environment-map derived light representation, Equation (2) showing loss functions based on the difference between the environment lighting representation and the lighting feature representation from the background image). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Li, and Villegas with Ren. The suggestion/motivation would have been to produce color and lighting harmonized output for a composite image as suggested by Ren (see Ren, p. 3, col. 2, Sec. 3, 1st para.).
Regarding claim 6, the combination of Pandey, Li, Villegas, and Ren discloses the computer-implemented method of claim 5, wherein: generating the environment lighting feature representation utilizing the environment lighting conditioning neural network with the diffusion-based generative neural network (Ren, p. 4, col. 2, 1st para., disclosing aligning the lighting representation extracted from a background image with features derived from environment map, 2nd para., disclosing using an environment map conditioned harmonization model to generate an environment-map derived light representation); freezing the parameters of the environment lighting conditioning neural network and the diffusion-based generative neural network (Ren, p. 4, col. 2, 3rd para., disclosing freezing the environment-conditioned model and introducing an alignment network that calibrates the background-extracted lighting representation to align with its environment map equivalent); and modifying the parameters of the lighting conditioning neural network and parameters of a representation alignment neural network layer between the lighting conditioning neural network and the environment lighting conditioning neural network according to differences between the lighting feature representation and the environment lighting feature representation (Ren, p. 3, col. 2, Sec. 3, Stage II: lighting representation alignment, disclosing adapting the lighting representation extracted from a background image towards the learned representation from its environment map, p. 4, col. 2, Equation (2) showing loss functions based on the difference between the environment lighting representation and the lighting feature representation from the background image). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Li, and Villegas with Ren. The suggestion/motivation would have been to produce color and lighting harmonized output for a composite image as suggested by Ren (see Ren, p. 3, col. 2, Sec. 3, 1st para.).
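For illustration only, the following minimal sketch shows the kind of alignment objective described above, with the environment-conditioned branch frozen. This is not Ren's Equation (2); the L2 form of the distance and the module interfaces are assumptions.

    import torch
    import torch.nn.functional as F

    def alignment_loss(bg_branch, align_layer, env_branch, background, env_map):
        with torch.no_grad():                    # frozen environment-conditioned branch
            env_feat = env_branch(env_map)
        bg_feat = align_layer(bg_branch(background))
        # Only bg_branch and align_layer receive gradients from this loss.
        return F.mse_loss(bg_feat, env_feat)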
Regarding claim 7, the combination of Pandey, Li, and Villegas discloses the computer-implemented method of claim 1. However, Pandey, Li, or Villegas does not expressly disclose generating a synthesis training dataset for modifying the diffusion-based generative neural network by: extracting an object from a training image according to an object mask; generating a synthetic background image by inpainting the training image to remove the object from the training image; and generating a synthetic digital image comprising a modified version of the object inserted into an additional background image utilizing the diffusion-based generative neural network comprising parameters modified based on an environment lighting feature representation of an environment map of the additional background image.
On the other hand, Ren discloses generating a synthesis training dataset for modifying the diffusion-based generative neural network by: extracting an object from a training image according to an object mask (Ren, p. 5, Figure 3, showing extracting objects from a real image according to an object mask); generating a synthetic background image by inpainting the training image to remove the object from the training image (Ren, p. 5, Figure 3, showing extracting objects from a real image according to an object mask and a synthetic background image is generated using generative inpainting); and generating a synthetic digital image comprising a modified version of the object inserted into an additional background image utilizing the diffusion-based generative neural network comprising parameters modified based on an environment lighting feature representation of an environment map of the additional background image (Ren, p. 5, col. 1, 2nd para., disclosing a data synthesis pipeline that extracts a foreground mask and inpaints the foreground region to create a clean background image, which can serve as the condition input for the training, and the lighting of the foreground subject(s) is altered by running the trained model from stage I/II with a randomly chosen background image or environment map as the condition, indicating a synthetic digital image comprising a modified version of the subject(s) as the object inserted into the randomly chosen background image as an additional background image utilizing the diffusion-based generative neural network comprising parameters modified based on an environment lighting feature representation of an environment map of the additional background image (stage I/II of the trained model)).
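For illustration only, the following minimal sketch shows a data synthesis step of the kind described above, where the real image remains the ground truth and the relit composite becomes the training input. This is not Ren's pipeline; segment, inpaint, sample_condition, and relight_model are assumed placeholder callables supplied by the caller.

    def synthesize_example(real_image, segment, inpaint, sample_condition, relight_model):
        mask = segment(real_image)                    # object/foreground mask
        background = inpaint(real_image, mask)        # clean, object-free background
        condition = sample_condition()                # randomly chosen background or environment map
        altered = relight_model(real_image, mask, condition)  # stage I/II model alters the lighting
        return {"input": altered, "background": background,
                "mask": mask, "target": real_image}   # real image stays the ground truth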
Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Li, and Villegas with Ren. The suggestion/motivation would have been to ensure that the ground truth for finetuning the diffusion model remains real images, as suggested by Ren (see Ren, p. 5, col. 1, 2nd para.).
Regarding claim 8, the combination of Pandey, Li, Villegas, and Ren discloses the computer-implemented method of claim 7, further comprising modifying the diffusion-based generative neural network by: generating, utilizing the lighting conditioning neural network, an additional lighting feature representation from the synthetic background image (Ren, p. 3, Figure 2, showing the synthetic dataset has the lighting feature representation generated from the synthetic background image, p. 5, Figure 3, showing the synthetic input image and the synthetic background in the synthetic dataset, col. 1, 2nd para., disclosing combining the original light stage dataset with the synthetic data to refine the model, indicating the synthetic background can be an input in the model where an additional lighting feature representation from the synthetic background image can be generated utilizing the stage I corresponding to the lighting conditioning neural network); generating, utilizing the diffusion-based generative neural network conditioned on the additional lighting feature representation, an additional digital image including the object inserted into the synthetic background image based on the modified version of the object in the synthetic digital image (Ren, p. 3, Figure 2, showing the stage III of the model that can generate the real image target utilizing the UNet in the final model as the diffusion-based generative neural network conditioned on the additional lighting feature representation, Figure 3, showing the synthetic input image and the synthetic background in the synthetic dataset, indicating the generated result in the final model in Stage III can correspond to an additional digital image including the object inserted into the synthetic background image based on the modified version of the object in the synthetic digital image); and modifying parameters of the diffusion-based generative neural network based on differences between the additional digital image and the training image (Ren, p. 3, Figure 2, showing the loss function in the final model is used to train the UNet, indicating the parameters of the diffusion-based generative neural network are modified based on the loss function corresponding to the differences between the additional digital image and the training image). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Li, and Villegas with Ren. The suggestion/motivation would have been to ensure that the ground truth for finetuning the diffusion model remains real images, as suggested by Ren (see Ren, p. 5, col. 1, 2nd para.).
Regarding claim 18, it recites similar limitations recited in claim 4 but in a non-transitory computer-readable medium form. The rationale of claim 4 rejection is applied to reject claim 18.
Regarding claim 19, it recites similar limitations recited in claims 4-6 but in a non-transitory computer-readable medium form. The rationale of claims 4-6 rejections is applied to reject claim 19.
Regarding claim 20, it recites similar limitations recited in claims 7-8 but in a non-transitory computer-readable medium form. The rationale of claims 7-8 rejections is applied to reject claim 20.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Pandey in view of US Patent Publication No. 20210065339 A1 to Cheon et al. and Villegas.
Regarding claim 9, Pandey discloses A system (Pandey, p. 1:2, col. 2, last para., disclosing a complete system from data generation to in-the-wild inference for portrait relighting and background replacement) comprising: one or more memory devices; and one or more processors coupled to the one or more memory devices (Pandey, p. 1:7, col. 1, Sec. 4.2, disclosing using GPUs and memory) that cause the system to perform operations comprising:
generating, utilizing an environment lighting conditioning neural network, an environment lighting feature representation from an environment map corresponding to a target background image (Pandey, p. 1:5, Fig. 4, showing using an input HDR map to generate convolved light maps including diffuse light map and specular light maps, indicating the input HDR map can correspond to the environment map corresponding to a target background image, the diffuse light map and the specular light maps can correspond to an environment lighting feature representation generated from the HDR map as the environment map corresponding to the target background image); generating, utilizing a lighting conditioning neural network, a lighting feature representation from the target background image (Pandey, p. 1:5, Figure 5, showing using a specular net to generate a specular light map based on the specular light maps, indicating the specular net can correspond to the lighting conditioning neural network to generate the specular light map as the lighting feature representation; p. 1:4, col. 1, 2nd para., disclosing allowing the portrait’s subject to be relit and convincingly composited into any HDR lighting environment (and if only a background photo is available, its HDR lighting can be estimated), p. 1:5, Fig. 4, showing the HDR map is prefiltered using diffuse and specular convolution operations and producing a per-pixel representation of diffuse and specular reflectance for the target illumination (light maps), Fig. 5, showing a specular light map can be obtained utilizing a specular net as a lighting conditioning neural network, p. 1:15, col. 2, 3rd para., disclosing any image can be your background, p. 1:19, Fig. 23, showing estimating illumination from the input background images, relighting the input foreground images, and compositing the subjects into the new background with plausibly consistent illumination, indicating the specular map as the lighting feature representation from the target background image can be generated utilizing the specular net as the lighting conditioning neural network), and modifying parameters of the lighting conditioning neural network to reduce differences between the lighting feature representation (Pandey, p. 1:6, col. 2, Sec. 4.1, disclosing the relighting module minimizes loss terms including specular loss).
However, Pandey does not expressly disclose wherein the environment lighting feature representation comprises a first encoding of one or more lighting features of the environment map; wherein the lighting feature representation comprises a second encoding of one or more lighting features of the target background image; and modifying parameters of the lighting conditioning neural network to reduce differences between the lighting feature representation and the environment lighting feature representation in an encoding space.
On the other hand, Cheon discloses modifying parameters of the lighting conditioning neural network to reduce differences between the lighting feature representation and the environment lighting feature representation (Cheon, para. [0067], disclosing learning a parameter by using an original HDR image and an LDR image, para. [0068], disclosing adjusting brightness by applying a pixel-specific brightness ratio identified by using the first parameter to the LDR image for learning, then generating an HDR image from the brightness-adjusted LDR image for learning, para. [0069], disclosing learning the first parameter and the second parameter by comparing the generated HDR image and the original HDR image, para. [0070], disclosing learning the first parameter such that the difference between the brightness of the generated HDR image and the brightness of the original HDR image becomes minimal, para. [0125], disclosing the first parameter may be learned by using a network having similar structure included in a test set, indicating the LDR image as the target background image and the brightness of the generated HDR image can correspond to lighting feature representation of the target background image, and the brightness of original HDR image can correspond to the environment lighting feature representation of the original HDR image as the environment map, learning the parameters such that the difference between the brightness of the generated HDR image and the brightness of the original HDR image becomes minimal can correspond to modifying parameters of the lighting conditioning neural network to reduce differences between the lighting feature representation and the environment lighting feature representation).
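For illustration only, the following minimal sketch shows learning so that the brightness of an HDR image generated from an LDR image approaches the brightness of the original HDR image, in the general sense described above. This is not Cheon's disclosure verbatim; the networks and the brightness measure are assumptions.

    import torch
    import torch.nn.functional as F

    def brightness(img):
        return img.mean(dim=1, keepdim=True)          # assumed luminance proxy

    def hdr_brightness_loss(ratio_net, hdr_net, ldr, original_hdr):
        ratio = ratio_net(ldr)                        # pixel-specific brightness ratio (first parameter)
        adjusted = ldr * ratio                        # brightness-adjusted LDR image
        generated_hdr = hdr_net(adjusted)             # generated HDR image (second parameter)
        # Minimize the brightness difference between the generated and original HDR.
        return F.mse_loss(brightness(generated_hdr), brightness(original_hdr))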
Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine Pandey and Cheon. The suggestion/motivation would have been to obtain HDR images with brightness information of an image close to original HDR images, as suggested by Cheon (see Cheon, para. [0017]). Because Pandey discloses estimating HDR lighting using a background photo (Pandey, p. 1:4, col. 1, 2nd para., disclosing allowing the portrait’s subject to be relit and convincingly composited into any HDR lighting environment (and if only a background photo is available, its HDR lighting can be estimated)), combining Pandey and Cheon will obtain HDR lighting with minimal difference compared with an original HDR lighting environment map.
However, Pandey or Cheon does not expressly disclose wherein the environment lighting feature representation comprises a first encoding of one or more lighting features of the environment map; wherein the lighting feature representation comprises a second encoding of one or more lighting features of the target background image; and to reduce differences between the lighting feature representations in an encoding space.
On the other hand, Villegas discloses wherein the first lighting feature representation comprises a first encoding of one or more lighting features of the ground-truth parameters of the target background image; wherein the second lighting feature representation comprises a second encoding of one or more lighting features of the target background image (Villegas, para. [0019], disclosing a first light encoder to generate a first light representation embedding from a digital background image, and a second light encoder to generate a second light representation embedding from ground-truth parameters associated with the digital background image). Because Pandey discloses generating an environment lighting feature representation from an environment map and generating a lighting feature representation from the target background image, combining Pandey in view of Cheon with Villegas could teach that the environment lighting feature representation, as the first lighting feature representation, comprises a first encoding of one or more lighting features of the environment map, and that the lighting feature representation, as the second lighting feature representation, comprises a second encoding of one or more lighting features of the target background image. Also, because the generated lighting feature representations are encodings generated by encoders, reducing the differences between these lighting feature representations will occur in an encoding space.
Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine Pandey in view of Cheon with Villegas. The suggestion/motivation would have been to utilize the contrastive loss to improve the light encoder neural network, as suggested by Villegas (see Villegas, para. [0019]).
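For illustration only, the following minimal sketch shows a contrastive objective over light embeddings, of the general kind the cited motivation refers to. This is not Villegas's loss; the InfoNCE-style form, the temperature, and the in-batch pairing are assumptions.

    import torch
    import torch.nn.functional as F

    def contrastive_light_loss(bg_embed, gt_embed, temperature=0.07):
        bg = F.normalize(bg_embed.flatten(1), dim=1)   # embeddings from the background image
        gt = F.normalize(gt_embed.flatten(1), dim=1)   # embeddings from ground-truth parameters
        logits = bg @ gt.t() / temperature             # pairwise similarities within the batch
        labels = torch.arange(bg.shape[0], device=bg.device)
        return F.cross_entropy(logits, labels)         # matching pairs lie on the diagonal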
Claims 10-16 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Pandey, Cheon, and Villegas as applied to claim 9 above, and further in view of Ren.
Regarding claim 10, the combination of Pandey, Cheon, and Villegas discloses the system of claim 9. However, Pandey, Cheon, or Villegas does not expressly disclose wherein generating the environment lighting feature representation comprises: determining a training tuple comprising a foreground image including a foreground object, the target background image, and the environment map corresponding to the target background image; and generating the environment lighting feature representation from the environment map utilizing the environment lighting conditioning neural network with frozen parameters in connection with generating a digital image utilizing a diffusion-based generative neural network conditioned on the environment lighting feature representation.
On the other hand, Ren discloses wherein generating the environment lighting feature representation comprises: determining a training tuple comprising a foreground image including a foreground object, the target background image, and the environment map corresponding to the target background image (Ren, p. 3, Figure 2, showing the pipeline of relightful harmonization, p. 4, col. 1, Sec. 3.1, 1st para., disclosing a training tuple includes the foreground image (with its corresponding alpha mask), the target background, the target environment map, and the target image); and generating the environment lighting feature representation from the environment map utilizing the environment lighting conditioning neural network with frozen parameters in connection with generating a digital image utilizing a diffusion-based generative neural network conditioned on the environment lighting feature representation (Ren, p. 3, Figure 2, showing environment lighting feature representation can be generated from the environment map utilizing the environment lighting conditioning neural network, col. 2, Sec. 3, Stage II: lighting representation alignment, disclosing adapting the lighting representation extracted from a background image towards the learned representation from its environment map, p. 4, col. 2, 3rd para., disclosing freezing the environment-conditioned model and introducing an alignment network that calibrates the background-extracted lighting representation to align with its environment map equivalent, and the aligned feature extraction and conditioning are integrated to formulate the final model of stage III, Equation (2) showing loss functions based on the difference between the environment lighting representation and the lighting feature representation from the background image, p. 5, col. 1, 2nd para., disclosing training the model by freezing the lighting representation extraction and conditioning branch to finetune the UNet backbone, then using the final model to perform portrait harmonization). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Cheon, and Villegas with Ren. The suggestion/motivation would have been to produce color and lighting harmonized output for a composite image as suggested by Ren (see Ren, p. 3, col. 2, Sec. 3, 1st para.).
Regarding claim 11, the combination of Pandey, Cheon, Villegas, and Ren discloses the system of claim 10, wherein generating the lighting feature representation comprises generating, utilizing the lighting conditioning neural network with modifiable parameters, the lighting feature representation from the target background image of the training tuple (Ren, p. 3, Figure 2, showing the lighting-aware diffusion using a background conditioned model and lighting alignment using an environment conditioned model that utilizes the UNetbg as the lighting conditioning neural network with modifiable parameters based on the loss LD, deriving lighting features from background, p. 4, col. 1, Sec. 3.1, disclosing a training tuple includes an input image along with its alpha mask, the target background, the target environment map, and the target image, indicating the lighting-aware diffusion and the lighting alignment can correspond to the lighting conditioning neural network with a loss function corresponding to modifiable parameters that can be utilized to generate the lighting representation as the lighting feature representation from the target background image of the training tuple). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Cheon, and Villegas with Ren. The suggestion/motivation would have been to produce color and lighting harmonized output for a composite image as suggested by Ren (see Ren, p. 3, col. 2, Sec. 3, 1st para.).
Regarding claim 12, the combination of Pandey, Cheon, Villegas, and Ren discloses the system of claim 11, wherein modifying the parameters of the lighting conditioning neural network comprises: determining the differences between the lighting feature representation and the environment lighting feature representation utilizing an alignment neural network layer between the lighting conditioning neural network and the environment lighting conditioning neural network (Ren, p. 3, Figure 2, showing Stage II Lighting Alignment that aligns lighting features derived from the background with the environment map, col. 2, Sec. 3, 3rd para., disclosing a representation alignment step that adapts the lighting representation extracted from a background image towards the learned representation from its environment map, p. 4, col. 2, 3rd para., disclosing an alignment network that calibrates the background-extracted lighting representation to align with its environment map equivalent, Equation (2), showing the loss function LA with respect to the difference between the lighting feature representation and the environment lighting feature representation); and modifying the parameters of the lighting conditioning neural network and parameters of the alignment neural network layer to reduce the differences between the lighting feature representation and the environment lighting feature representation (Ren, p. 3, Figure 2, showing the loss function LA is used to train the alignment neural network layer and the layer extracting the lighting feature representation from the background as the lighting conditioning neural network, indicating the parameters of the lighting conditioning neural network and parameters of the alignment neural network layer are modified according to the loss function that reduces the difference between the lighting feature representation and the environment lighting feature representation). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Cheon, and Villegas with Ren. The suggestion/motivation would have been to produce color and lighting harmonized output for a composite image as suggested by Ren (see Ren, p. 3, col. 2, Sec. 3, 1st para.).
Regarding claim 13, the combination of Pandey, Cheon, Villegas, and Ren discloses the system of claim 10, generating, utilizing a diffusion-based generative neural network conditioned on the lighting feature representation, a digital image including the foreground object inserted into the target background image based on a composite image comprising the foreground object and the target background image with a foreground mask corresponding to the foreground object (Ren, p. 3, Figure 2, p. 5, Figure 3, showing the diffusion-based generative neural network conditioned on the lighting feature representation is used to generate a digital image including the foreground object inserted into the target background image based on a composite image comprising the foreground object and the target background image with a foreground mask corresponding to the foreground object); and jointly modifying the parameters of the lighting conditioning neural network and parameters of the diffusion-based generative neural network to reduce an output of a loss function based on a noise input and the digital image (Ren, p. 3, col. 2, Sec. 3, 1st para., disclosing lighting-aware diffusion training using a diffusion model and attaching a light representation learning branch to encode lighting information from the background image which is then injected into the UNet backbone, p. 4, col. 1, 4th para., disclosing noise is added to the target image and the UNet is conditioned on the background-extracted lighting feature and is trained to predict the noise with a loss function and jointly training both UNet and the conditioning branch, indicating the jointly training both UNet (corresponding to the diffusion-based generative neural network) and the light representation learning branch (corresponding to the lighting conditioning neural network) with added noise as noise input teaches jointly modifying parameters of the lighting conditioning neural network and the diffusion-based generative neural network to reduce an output of a loss function based on a noise input and the digital image generated utilizing the diffusion-based generative neural network according to the training tuple). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Cheon, and Villegas with Ren. The suggestion/motivation would have been to produce color and lighting harmonized output for a composite image as suggested by Ren (see Ren, p. 3, col. 2, Sec. 3, 1st para.).
Regarding claim 14, the combination of Pandey, Cheon, Villegas, and Ren discloses the system of claim 9, further comprising modifying parameters of a diffusion-based generative neural network conditioned on the environment lighting feature representation of the target background image by: generating a synthesis training dataset comprising a plurality of training images, a plurality of synthetic background images comprising inpainted backgrounds from the plurality of training images, and a plurality of synthetic digital images generated by inserting objects of the plurality of training images into additional background images utilizing the diffusion-based generative neural network (Ren, p. 5, Figure 3, showing the data synthesis pipeline, col. 1, 2nd para., disclosing the process of the data synthesis); and modifying parameters of the diffusion-based generative neural network to reduce differences between the plurality of training images and a plurality of digital images generated by the diffusion-based generative neural network from the plurality of synthetic digital images (Ren, p. 5, col. 1, 2nd para., disclosing combining the original light stage dataset with the synthetic data to refine the model by finetuning the UNet backbone to refine the synthesis quality while maintaining the learned lighting plausibility). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Cheon, and Villegas with Ren. The suggestion/motivation would have been to produce color and lighting harmonized output for a composite image as suggested by Ren (see Ren, p. 3, col. 2, Sec. 3, 1st para.).
Regarding claim 15, the combination of Pandey, Cheon, Villegas, and Ren discloses the system of claim 14, wherein generating the synthesis training dataset comprises: extracting an object from a training image of the plurality of training images according to an object mask (Ren, p. 5, Figure 3, showing extracting objects from a real image according to an object mask); generating a synthetic background image of the plurality of synthetic background images by inpainting the training image to remove the object from the training image (Ren, p. 5, Figure 3, showing extracting objects from a real image according to an object mask and a synthetic background image is generated using generative inpainting); and generating a synthetic digital image of the plurality of synthetic digital images comprising a modified version of the object inserted into an additional background image with modified color values utilizing the diffusion-based generative neural network (Ren, p. 5, col. 1, 2nd para., disclosing a data synthesis pipeline that extracts a foreground mask and inpaints the foreground region to create a clean background image, which can serve as the condition input for the training, and the lighting of the foreground subject(s) is altered by running the trained model from stage I/II with a randomly chosen background image or environment map as the condition). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Cheon, and Villegas with Ren. The suggestion/motivation would have been to ensure that the ground truth for finetuning the diffusion model remains real images, as suggested by Ren (see Ren, p. 5, col. 1, 2nd para.).
Regarding claim 16, the combination of Pandey, Cheon, Villegas, and Ren discloses the system of claim 9, further comprising: determining, from a request to generate a digital image, an object from an input image to insert into a selected background image (Pandey, p. 1:4, Fig. 3, showing determining an object from an input image to insert into a selected background image); generating, utilizing the lighting conditioning neural network with modified parameters, an additional lighting feature representation indicating lighting parameters of the selected background image in an encoding space (Ren, p. 3, Figure 2, showing the synthetic dataset has the lighting feature representation generated from the synthetic background image, p. 5, Figure 3, showing the synthetic input image and the synthetic background in the synthetic dataset, col. 1, 2nd para., disclosing combining the original light stage dataset with the synthetic data to refine the model, indicating the synthetic background can be an input in the model where an additional lighting feature representation from the synthetic background image can be generated utilizing the stage I corresponding to the lighting conditioning neural network); and generating, utilizing a diffusion-based generative neural network conditioned on the additional lighting feature representation of the selected background image, the digital image comprising a modified version of the object within the selected background image with modified color values according to the lighting parameters of the selected background image (Ren, p. 3, Figure 2, showing the stage III of the model that can generate the real image target utilizing the UNet in the final model as the diffusion-based generative neural network conditioned on the additional lighting feature representation, Figure 3, showing the synthetic input image and the synthetic background in the synthetic dataset). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Pandey, Cheon, and Villegas with Ren. The suggestion/motivation would have been to ensure that the ground truth for finetuning the diffusion model remains real images, as suggested by Ren (see Ren, p. 5, col. 1, 2nd para.).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAIXIA DU whose telephone number is (571)270-5646. The examiner can normally be reached Monday - Friday 8:00 am-4:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HAIXIA DU/Primary Examiner, Art Unit 2611