Prosecution Insights
Last updated: April 19, 2026
Application No. 18/452,827

IMAGE RELIGHTING

Status: Non-Final OA (§103)
Filed: Aug 21, 2023
Examiner: WANG, JIN CHENG
Art Unit: 2617
Tech Center: 2600 — Communications
Assignee: Adobe Inc.
OA Round: 3 (Non-Final)

Grant Probability: 59% (Moderate)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 3y 7m
Grant Probability With Interview: 69%

Examiner Intelligence

Career Allow Rate: 59% (grants 59% of resolved cases; 492 granted / 832 resolved; -2.9% vs TC avg)
Interview Lift: +10.3% on resolved cases with interview (moderate, roughly +10%)
Typical Timeline: 3y 7m average prosecution; 40 applications currently pending
Career History: 872 total applications across all art units

Statute-Specific Performance

§101: 11.8% (-28.2% vs TC avg)
§102: 7.6% (-32.4% vs TC avg)
§103: 62.7% (+22.7% vs TC avg)
§112: 15.5% (-24.5% vs TC avg)

Tech Center average shown for comparison. Based on career data from 832 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed 11/10/2025 has been entered. Claims 1, 4, 8, 10-14, 16 and 18 have been amended. Claims 1-20 are pending in the current application.

Response to Arguments

Applicant's arguments filed 11/10/2025 have been fully considered but are moot in view of the new grounds of rejection set forth in the current Office Action. Applicant's arguments filed 11/10/2025 have been fully considered but they are not persuasive. In the Remarks, applicant repeated the claim language set forth in claim 1 and alleged that the previously cited references do not teach that the target lighting representation is in a light representation space different from the latent space. The examiner cannot concur. Yamada's lighting parameter in the lighting space, or a condition vector expressing a lighting environment in which relighting is performed, is different from the latent space of the input image. The condition vector expressing a lighting environment is a vector in the lighting space. In the Remarks, applicant argued that the previously cited references do not teach the mapping network that generates a modified latent vector based on the input latent vector and the target lighting representation. The examiner cannot concur. Sun, incorporated by reference in Yamada, teaches at FIG. 2 and Section 1 that the target illumination is different from the latent space and that the image generation network (a deep neural network) includes an encoder-decoder network operating under an arbitrary user-specified environment map. Sun teaches at Section 3 that the relit image is generated by the deep neural network using the input portrait image and the target lighting Lt. Sun teaches at Section 3.2 that, in the decoder of the network, the target lighting Lt is fed as input into the network and encoded using conv layers before being concatenated with the representation (latent space) of the source image produced by the encoder. It is known from Dherse that the encoder generates a latent space vector of the input portrait image as the representation of the source image. The encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT. The encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Yamada's processor 11 (the mapping network) maps the input latent vector of the input image and the condition vector expressing a lighting environment in which relighting is performed to a modified latent vector. For example, Yamada teaches at Paragraph 0028 that the mapping unit maps the latent vector of the input image to a modified latent space vector reflecting a condition vector expressing a lighting environment in which relighting is performed.
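For technical orientation only, and not part of the prosecution record: a minimal sketch of the kind of mapping network the disputed limitation recites, assuming a hypothetical 512-dimensional latent space and a 16-dimensional lighting-condition vector. The class and variable names (MappingNetwork, w, l) and all dimensions are illustrative assumptions, not drawn from Yamada, Sun, or the claims.

```python
# Illustrative sketch only (not from the record). A mapping network that takes a
# first input in the latent space and a second input in the lighting representation
# space, and produces an output back in the latent space. Dimensions are hypothetical.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    def __init__(self, latent_dim: int = 512, light_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + light_dim, latent_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, latent_dim),  # output stays in the same latent space
        )

    def forward(self, w: torch.Tensor, l: torch.Tensor) -> torch.Tensor:
        # Concatenate the input latent vector with the target lighting representation,
        # then map the combined features back into the decoder's latent space.
        return self.net(torch.cat([w, l], dim=-1))

w = torch.randn(1, 512)         # input latent vector (decoder latent space)
l = torch.randn(1, 16)          # target lighting representation (lighting space)
w_mod = MappingNetwork()(w, l)  # modified latent vector, shape (1, 512)
```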
Yamada further teaches at Paragraph 0044 that the processor 11 (performing the operations of the mapping unit 140) converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0028-0030 that each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150. Then, the mapping unit 140 acquires a latent space vector (the modified latent space vector) by embedding, in the latent space of the image generation model learned with the large-scale data set, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada teaches at Paragraph 0043 that the processor 11 executes the operation as the image structure feature extraction unit 130 simultaneously in parallel with the processing in step S12. That is, the processor 11 reads an input image I from the temporary storage area 13B, and extracts, from the input image I, a feature quantity for estimating a shape and/or texture, which is a feature quantity of an image structure of the input image I (step S13). The processor 11 stores the feature group A (latent vector of the input image), in which the feature quantities of the extracted image structures are stacked, in the temporary storage area 13B. Yamada teaches at Paragraph 0044 that the processor 11 executes the operation as the mapping unit 140, reads the feature group A in which the lighting environment and the feature quantity of the training image are stacked from the temporary storage area 13B, and combines the feature group A with the feature quantity of the image structure (step S14). The lighting environment of the training image is a lighting environment desired to be reflected. The processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0028-0030 that each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150 and the mapping unit 140 acquires the lighting environment of the training image from the data input unit 110. The mapping unit 140 acquires a latent space vector (a modified latent vector reflecting the lighting environment) by embedding, in the latent space of the image generation model, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. The generation unit 150 acquires the latent space vector group from the mapping unit 140 and generates the feature quantity of the image structure in each layer of the generator. The generation unit 150 generates a relighted image by converting the feature quantity having the highest resolution of the feature group B into an RGB color space and passes the generated relighted image to the feature correction unit 160, and the feature correction unit 160 generates a corrected relighted image obtained by correcting the relighted image.

Applicant then alleged that the cited references fail to disclose or teach that the generated modified latent vector represents the image content with the target lighting condition in a same input latent space as the input latent vector. The examiner cannot concur. Yamada teaches the same latent space of the image generation model where the latent vector of the input image is generated and the modified latent vector reflecting the lighting environment is generated.
Yamada teaches at Paragraph 0007 and Paragraph 0028 embedding an input image in a latent space of an image generation unit. It is the same latent space where the input image is converted into a latent vector in the latent space of the generation unit 150 and the modified latent vector is generated by the generation unit 150 in the same latent space of the image generation model that reflects the lighting environment.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-10 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over A. P. Dherse et al., "Scene relighting with illumination estimation in the latent space on an encoder-decoder scheme", DOI:10.48550/arXiv.2006.02333, June 3, 2020, pp. 1-30 (hereinafter Dherse) in view of Yamada et al., US-PGPUB No. 2024/0185391 (hereinafter Yamada), incorporating by reference Sun et al., "Single Image Portrait Relighting", https://dl.acm.org/doi/pdf/10.1145/3306346.3323008, ACM Transactions on Graphics (TOG), Vol. 38, Issue 4, August 2019, pp. 79:1-79:12 (hereinafter Sun).

Re Claim 1: Dherse teaches a method comprising: obtaining an input latent vector for an image generation network and a target lighting representation, wherein the input latent vector represents image content in a latent space of a decoder of the image generation network, and the target lighting representation indicates a target lighting condition different from a lighting condition of the image content in a lighting representation space different from the latent space (Applicant's specification at FIG. 4 shows that the relighting model 330 takes input image 225 (as opposed to an input latent vector), the encoder within the relighting model 330 produces the input latent vector 420 based on the target lighting representation 435 (Lr) to generate a new latent vector, and the StyleGAN 450 generates an output image 275. However, the target lighting representation is the target lighting latent vector (see Paragraph 0072 of applicant's specification). Dherse teaches at Section 3.1 that the deep learning network takes the original image I and the target image T from which the lighting conditions should be acquired, and the ground truth image G(I, T) corresponds to the input scene and the target illumination LT. It is known that the output image Ĝ(I, T) is generated by the deep learning network to approximate the actual G(I, T) corresponding to the input scene and the target illumination LT. Thus, the deep learning network takes the target illumination LT as the second input to generate the latent representation of the target illumination, and the latent vector encoded from the input scene I as the first input. Dherse teaches at Section 3.2 (Page 8) that the target illumination LT is represented by a small environment map image. Dherse teaches the claimed invention in the same manner as Applicant's specification. The encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT. The encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Dherse teaches at Section 3.1 that the deep learning network used to solve the image relighting task takes two images as input: the original image I in which lighting conditions should be changed and the target image T from which the lighting conditions LT should be acquired. The target illumination conditions are encoded as the latent representation to generate one encoding representing the original image under the new light conditions. Dherse teaches at Section 3.3 predicting light condition properties directly from it, and that the encoding fed to the decoder consists of the scene latent representation from I and the light latent representation encoded from T. Dherse teaches at Sections 3.2-3.3 and FIG. 3 that the image generation network includes encoder/decoder system G with the input latent vector representing the input image I and the target lighting representation T, that an encoder for I and T transforms the two images into their respective latent representations, and that the latent space of the decoder is different from the light conditions LT of a given image … the decoder will be provided with the encoding of T in conjunction with skip connections from the encoder of I. Accordingly, the lighting condition LT is different from the latent encoding of I); generating, using a mapping network, a modified latent vector based on the input latent vector and the target lighting representation, wherein the mapping network comprises a neural network layer that takes a first input in the latent space and a second input in the lighting representation space, and produces an output in the latent space (Applicant's specification at FIG. 4 shows that the relighting model 330 takes input image 225 (as opposed to an input latent vector), the encoder within the relighting model 330 produces the input latent vector 420 based on the target lighting representation 435 (Lr) to generate a new latent vector, and the StyleGAN 450 generates an output image 275. However, the target lighting representation is the target lighting latent vector (see Paragraph 0072 of applicant's specification). Dherse teaches at Section 3.1 that the deep learning network takes the original image I and the target image T from which the lighting conditions should be acquired, and the ground truth image G(I, T) corresponds to the input scene and the target illumination LT. It is known that the output image Ĝ(I, T) is generated by the deep learning network to approximate the actual G(I, T) corresponding to the input scene and the target illumination LT. Thus, the deep learning network takes the target illumination LT as the second input to generate the latent representation of the target illumination, and the latent vector encoded from the input scene I as the first input. Dherse teaches at Section 3.2 (Page 8) that the target illumination LT is represented by a small environment map image. Dherse teaches the claimed invention in the same manner as Applicant's specification. The encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT.
The encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Dherse teaches at Section 3.1 that the deep learning network used to solve the image relighting task takes two images as input: the original image I in which lighting conditions should be changed and the target image T from which the lighting conditions LT should be acquired. The target illumination conditions are encoded as the latent representation to generate one encoding representing the original image under the new light conditions. Dherse teaches at Section 3.3 predicting light condition properties directly from it, and that the encoding fed to the decoder consists of the scene latent representation from I and the light latent representation encoded from T. Dherse teaches at Sections 3.2-3.3 and FIG. 3 that the image generation network includes encoder/decoder system G with the input latent vector representing the input image I and the target lighting representation T, that an encoder for I and T transforms the two images into their respective latent representations, and that the latent space of the decoder is different from the light conditions LT of a given image … the decoder will be provided with the encoding of T in conjunction with skip connections from the encoder of I. Accordingly, the lighting condition LT is different from the latent encoding of I. Dherse teaches at Section 3 at Page 8 that the network encoder for I and T transforms the two images into their respective latent representations and a single representation is decoded into an output image); generating, using the decoder of the image generation network, an image by decoding the modified latent vector, wherein the image depicts the image content with the target lighting condition (Dherse teaches at Section 3 at Page 8 that the network encoder for I and T transforms the two images into their respective latent representations and a single representation is decoded into an output image). Sun, incorporated by reference in Yamada, teaches at FIG. 2 and Section 1 that the target illumination is different from the latent space and that the image generation network (a deep neural network) includes an encoder-decoder network operating under an arbitrary user-specified environment map. Sun teaches at Section 3 that the relit image is generated by the deep neural network using the input portrait image and the target lighting Lt. Sun teaches at Section 3.2 that, in the decoder of the network, the target lighting Lt is fed as input into the network and encoded using conv layers before being concatenated with the representation (latent space) of the source image produced by the encoder. It is known from Dherse that the encoder generates a latent space vector of the input portrait image as the representation of the source image. The encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT. The encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector.
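Again for orientation only, not part of the record: a minimal sketch of the two-encoder, one-decoder arrangement the Action attributes to Dherse FIG. 3, assuming hypothetical channel counts and a 64x64 input; skip connections, losses, and training are omitted.

```python
# Illustrative sketch only. Two encoders (source image I, target-lighting image T)
# whose latent representations are combined and decoded into a single relit image,
# loosely following the encoder/decoder arrangement described for Dherse FIG. 3.
import torch
import torch.nn as nn

enc_I = nn.Sequential(                       # scene latent from the original image I
    nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU())
enc_T = nn.Sequential(                       # light latent from the target image T
    nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU())
dec = nn.Sequential(                         # one combined representation -> output image
    nn.ConvTranspose2d(128, 32, 4, 2, 1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, 2, 1))

I = torch.randn(1, 3, 64, 64)                # image whose lighting should be changed
T = torch.randn(1, 3, 64, 64)                # image carrying the target lighting L_T
z = torch.cat([enc_I(I), enc_T(T)], dim=1)   # scene latent + light latent
relit = dec(z)                               # decoded image, same 64x64 size as I
```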
Yamada teaches a method comprising: obtaining an input latent vector for an image generation network and a target lighting representation, wherein the input latent vector represents image content in a latent space of a decoder of the image generation network, and the target lighting representation indicates a target lighting condition different from a lighting condition of the image content in a lighting representation space different from the latent space (Yamada teaches the same latent space of the image generation model where the latent vector of the input image is generated and the modified latent vector reflecting the lighting environment is generated. Yamada teaches at Paragraph 0007 and Paragraph 0028 embedding an input image in a latent space of an image generation unit. It is the same latent space where the input image is converted into a latent vector in the latent space of the generation unit 150 and the modified latent vector is generated by the generation unit 150 in the same latent space of the image generation model that reflects the lighting environment. Yamada teaches at Paragraph 0028 that the mapping unit maps the latent vector of the input image to a modified latent space vector reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada further teaches at Paragraph 0044 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0028-0030 that each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150. Then, the mapping unit 140 acquires a latent space vector (the modified latent space vector) by embedding, in the latent space of the image generation model learned with the large-scale data set, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada teaches at Paragraph 0043 that the processor 11 executes the operation as the image structure feature extraction unit 130 simultaneously in parallel with the processing in step S12. That is, the processor 11 reads an input image I from the temporary storage area 13B, and extracts, from the input image I, a feature quantity for estimating a shape and/or texture, which is a feature quantity of an image structure of the input image I (step S13). The processor 11 stores the feature group A (latent vector of the input image), in which the feature quantities of the extracted image structures are stacked, in the temporary storage area 13B. Yamada teaches at Paragraph 0044 that the processor 11 executes the operation as the mapping unit 140, reads the feature group A in which the lighting environment and the feature quantity of the training image are stacked from the temporary storage area 13B, and combines the feature group A with the feature quantity of the image structure (step S14). The lighting environment of the training image is a lighting environment desired to be reflected. The processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0028-0030 that each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150 and the mapping unit 140 acquires the lighting environment of the training image from the data input unit 110.
The mapping unit 140 acquires a latent space vector (a modified latent vector reflecting the lighting environment) by embedding, in the latent space of the image generation model, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. The generation unit 150 acquires the latent space vector group from the mapping unit 140 and generates the feature quantity of the image structure in each layer of the generator. The generation unit 150 generates a relighted image by converting the feature quantity having the highest resolution of the feature group B into an RGB color space and passes the generated relighted image to the feature correction unit 160, and the feature correction unit 160 generates a corrected relighted image obtained by correcting the relighted image. Yamada teaches at Paragraphs 0044-0046 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent vector) and stores a vector group in which the converted latent space vectors are stacked, and the processor 11 executes the operation as the generation unit 150, reads a vector group in which latent space vectors are stacked, acquires a feature quantity by a generator learned in advance using the vector group as an input, and generates the relighted image by converting the feature quantity into the RGB color space. Yamada teaches at Paragraphs 0052-0053 that the relighted image generation unit includes the mapping unit 140 that acquires a latent space vector capable of generating a target in which only the lighting environment is changed, by embedding, in a latent space of an image generation model learned with the large-scale data set, a feature quantity in which a condition vector expressing the lighting environment desired to be reflected is reflected in a feature quantity of an image structure of the input image); generating, using a mapping network, a modified latent vector based on the input latent vector and the target lighting representation, wherein the mapping network comprises a neural network layer that takes a first input in the latent space and a second input in the lighting representation space, and produces an output in the latent space (Yamada teaches the same latent space of the image generation model where the latent vector of the input image is generated and the modified latent vector reflecting the lighting environment is generated. Yamada teaches at Paragraph 0007 and Paragraph 0028 embedding an input image in a latent space of an image generation unit. It is the same latent space where the input image is converted into a latent vector in the latent space of the generation unit 150 and the modified latent vector is generated by the generation unit 150 in the same latent space of the image generation model that reflects the lighting environment. Yamada teaches at Paragraph 0028 that the mapping unit maps the latent vector of the input image to a modified latent space vector reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada further teaches at Paragraph 0044 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0028-0030 that each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150.
Then, the mapping unit 140 acquires a latent space vector (the modified latent space vector) by embedding, in the latent space of the image generation model learned with the large-scale data set, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada teaches at Paragraph 0043 that the processor 11 executes the operation as the image structure feature extraction unit 130 simultaneously in parallel with the processing in step S12. That is, the processor 11 reads an input image I from the temporary storage area 13B, and extracts, from the input image I, a feature quantity for estimating a shape and/or texture, which is a feature quantity of an image structure of the input image I (step S13). The processor 11 stores the feature group A (latent vector of the input image), in which the feature quantities of the extracted image structures are stacked, in the temporary storage area 13B. Yamada teaches at Paragraph 0044 that the processor 11 executes the operation as the mapping unit 140, reads the feature group A in which the lighting environment and the feature quantity of the training image are stacked from the temporary storage area 13B, and combines the feature group A with the feature quantity of the image structure (step S14). The lighting environment of the training image is a lighting environment desired to be reflected. The processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0028-0030 that each encoder converts the input data into a vector representing a latent space of the generation unit 150 and the mapping unit 140 acquires the lighting environment of the training image from the data input unit 110. The mapping unit 140 acquires a latent space vector by embedding, in the latent space of the image generation model, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada teaches at Paragraphs 0044-0046 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent vector) and stores a vector group in which the converted latent space vectors are stacked, and the processor 11 executes the operation as the generation unit 150, reads a vector group in which latent space vectors are stacked, acquires a feature quantity by a generator learned in advance using the vector group as an input, and generates the relighted image by converting the feature quantity into the RGB color space); and generating an image by decoding the modified latent vector using the image generation network, wherein the image depicts the image content with the target lighting condition (Yamada teaches the same latent space of the image generation model where the latent vector of the input image is generated and the modified latent vector reflecting the lighting environment is generated. Yamada teaches at Paragraph 0007 and Paragraph 0028 embedding an input image in a latent space of an image generation unit. It is the same latent space where the input image is converted into a latent vector in the latent space of the generation unit 150 and the modified latent vector is generated by the generation unit 150 in the same latent space of the image generation model that reflects the lighting environment.
Yamada teaches at Paragraph 0028 that the mapping unit maps the latent vector of the input image to a modified latent space vector reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada further teaches at Paragraph 0044 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0028-0030 that each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150. Then, the mapping unit 140 acquires a latent space vector (the modified latent space vector) by embedding, in the latent space of the image generation model learned with the large-scale data set, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada teaches at Paragraph 0043 that the processor 11 executes the operation as the image structure feature extraction unit 130 simultaneously in parallel with the processing in step S12. That is, the processor 11 reads an input image I from the temporary storage area 13B, and extracts, from the input image I, a feature quantity for estimating a shape and/or texture, which is a feature quantity of an image structure of the input image I (step S13). The processor 11 stores the feature group A (latent vector of the input image), in which the feature quantities of the extracted image structures are stacked, in the temporary storage area 13B. Yamada teaches at Paragraph 0044 that the processor 11 executes the operation as the mapping unit 140, reads the feature group A in which the lighting environment and the feature quantity of the training image are stacked from the temporary storage area 13B, and combines the feature group A with the feature quantity of the image structure (step S14). The lighting environment of the training image is a lighting environment desired to be reflected. The processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0044-0046 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent vector) and stores a vector group in which the converted latent space vectors are stacked, and the processor 11 executes the operation as the generation unit 150, reads a vector group in which latent space vectors are stacked, acquires a feature quantity by a generator learned in advance using the vector group as an input, and generates the relighted image by converting the feature quantity into the RGB color space). It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Dherse's generating a modified image based on the modified latent vector into the neural network of Yamada's deep layer generation model to have generated a relighted image using the deep layer generation model. One of ordinary skill in the art would have relighted the input image based on the modified latent vector and the target illumination to have generated a relighted image.

Re Claim 2: Claim 2 encompasses the same scope of invention as claim 1 except for the additional claim limitation that the image depicts an object that is lit according to the target lighting representation.
Dherse further teaches the claim limitation that the image depicts an object that is lit according to the target lighting representation (Dherse teaches at FIG. 7 that the image depicts an object that is lit according to the target lighting condition (Lr). Applicant's specification at FIG. 4 shows that the relighting model 330 takes input image 225 (as opposed to an input latent vector), the encoder within the relighting model 330 produces the input latent vector 420 based on the target lighting representation 435 (Lr) to generate a new latent vector, and the StyleGAN 450 generates an output image 275. However, the target lighting representation is the target lighting latent vector (see Paragraph 0072 of applicant's specification). Dherse teaches at Section 3.1 that the deep learning network takes the original image I and the target image T from which the lighting conditions should be acquired, and the ground truth image G(I, T) corresponds to the input scene and the target illumination LT. It is known that the output image Ĝ(I, T) is generated by the deep learning network to approximate the actual G(I, T) corresponding to the input scene and the target illumination LT. Thus, the deep learning network takes the target illumination LT as the second input to generate the latent representation of the target illumination, and the latent vector encoded from the input scene I as the first input. Dherse teaches at Section 3.2 (Page 8) that the target illumination LT is represented by a small environment map image. Dherse teaches the claimed invention in the same manner as Applicant's specification. The encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT. The encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Dherse teaches at Section 3.1 that the deep learning network used to solve the image relighting task takes two images as input: the original image I in which lighting conditions should be changed and the target image T from which the lighting conditions LT should be acquired. The target illumination conditions are encoded as the latent representation to generate one encoding representing the original image under the new light conditions. Dherse teaches at Section 3.3 predicting light condition properties directly from it, and that the encoding fed to the decoder consists of the scene latent representation from I and the light latent representation encoded from T. Dherse teaches at Sections 3.2-3.3 and FIG. 3 that the image generation network includes encoder/decoder system G with the input latent vector representing the input image I and the target lighting representation T, that an encoder for I and T transforms the two images into their respective latent representations, and that the latent space of the decoder is different from the light conditions LT of a given image … the decoder will be provided with the encoding of T in conjunction with skip connections from the encoder of I. Accordingly, the lighting condition LT is different from the latent encoding of I).

Re Claim 3: Claim 3 encompasses the same scope of invention as claim 1 except for the additional claim limitation that the target lighting representation comprises a vector in a lighting representation space.
Yamada and Dherse further teach the claim limitation that the target lighting representation comprises a vector in a lighting representation space (Dherse teaches at Section 3.1 that the deep learning network takes the original image I and the target image T from which the lighting conditions should be acquired, and the ground truth image G(I, T) corresponds to the input scene and the target illumination LT. It is known that the output image Ĝ(I, T) is generated by the deep learning network to approximate the actual G(I, T) corresponding to the input scene and the target illumination LT. Thus, the deep learning network takes the target illumination LT as the second input to generate the latent representation of the target illumination, and the latent vector encoded from the input scene I as the first input. Dherse teaches at Section 3.2 (Page 8) that the target illumination LT is represented by a small environment map image. Dherse teaches the claimed invention in the same manner as Applicant's specification. The encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT. The encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Dherse teaches at Section 3.1 that the deep learning network used to solve the image relighting task takes two images as input: the original image I in which lighting conditions should be changed and the target image T from which the lighting conditions LT should be acquired. The target illumination conditions are encoded as the latent representation to generate one encoding representing the original image under the new light conditions. Dherse teaches at Section 3.3 predicting light condition properties directly from it, and that the encoding fed to the decoder consists of the scene latent representation from I and the light latent representation encoded from T. Dherse teaches at Sections 3.2-3.3 and FIG. 3 that the image generation network includes encoder/decoder system G with the input latent vector representing the input image I and the target lighting representation T, that an encoder for I and T transforms the two images into their respective latent representations, and that the latent space of the decoder is different from the light conditions LT of a given image … the decoder will be provided with the encoding of T in conjunction with skip connections from the encoder of I. Accordingly, the lighting condition LT is different from the latent encoding of I. Yamada teaches at Paragraphs 0028-0030 that each encoder converts the input data into a vector representing a latent space of the generation unit 150 and the mapping unit 140 acquires the lighting environment of the training image from the data input unit 110. The mapping unit 140 acquires a latent space vector by embedding, in the latent space of the image generation model, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed).

Re Claim 4: Claim 4 encompasses the same scope of invention as claim 3 except for the additional claim limitation that the modified latent vector is generated using the mapping network that is trained by using a lighting loss that compares an output of the image generation network in the lighting representation space.
Dherse and Yamada further teach the claim limitation that the modified latent vector is generated using the mapping network that is trained by using a lighting loss that compares an output of the image generation network in the lighting representation space (Dherse teaches at Section 3 that the deep learning network generates a training image based on the input latent vector of the original image I and the target lighting representation Lt. Dherse teaches at Section 3.4 computing the scene latent loss and light latent loss. Dherse teaches at Page 5, Section 3.1 and FIG. 10 a latent light-scene split and an illumination predictor with light direction and light color temperature prediction auxiliary losses, where the scene is illuminated in different directions and an encoder-decoder predicts the illumination corresponding to the source image and replaces it with the target lighting. Dherse teaches at Section 3.1 that the illumination space is the Cartesian product of the light color temperature set (intensities) and the light direction set, and at Page 10 that the color temperature has been translated into RGB values with the use of the Color-Science Python library and that light direction is represented as a Gaussian distribution over the value/brightness component of the light color in HSL color space … The other half is used for light color estimation in HSL color space. Yamada teaches at Paragraphs 0044-0046 that the processor 11 (acting as the mapping unit 140) converts the combined feature quantity into a latent space vector (a modified latent vector) and stores a vector group in which the converted latent space vectors are stacked, and the processor 11 executes the operation as the generation unit 150, reads a vector group in which latent space vectors are stacked, acquires a feature quantity by a generator learned in advance using the vector group as an input, and generates the relighted image by converting the feature quantity into the RGB color space. Yamada teaches at Paragraph 0031 that the evaluation unit 170 updates the parameters of the image structure feature extraction unit 130, the lighting environment feature extraction unit 120, the mapping unit 140, and the feature correction unit 160 using an optimization method to minimize an error between the estimated lighting environment, the relighted image, the corrected relighted image, and the training data. In the present embodiment, the evaluation unit 170 acquires the estimated lighting environment from the lighting environment feature extraction unit 120, and acquires the relighted image and the corrected relighted image from the feature correction unit 160. Further, the evaluation unit 170 acquires the training image and the lighting environment of the training image from the data input unit 110. Furthermore, in a case where the lighting environment feature extraction unit 120 estimates the lighting environment of the input image, the evaluation unit 170 acquires the lighting environment of the input image from the data input unit 110. Then, the evaluation unit 170 calculates, from the error function, an error between the estimated lighting environment and the lighting environment of the training image or the lighting environment of the input image, and an error between each of the relighted image and the corrected relighted image and the training image. The error function uses an L1 norm or an L2 norm.
In addition, as an option, the L1 norm or the L2 norm of the feature calculated by an encoder used in existing image classification such as VGG, an encoder used for identification of the same person such as ArcFace, or the like may be added for the error between each of the relighted image and the corrected relighted image and the training image. Thereafter, using an optimization method designated in any manner by the user from the calculated error, the evaluation unit 170 obtains the gradient of the parameter of each of the lighting environment feature extraction unit 120, the image structure feature extraction unit 130, the mapping unit 140, and the feature correction unit 160 to minimize these errors, and updates each parameter. At this time, the parameters may be updated such that each error is treated equally and each error is minimized on average, or the parameters may be updated such that the error to be prioritized the most is minimized by giving a weight between the errors. Note that the generation unit 150 does not update the parameter. Finally, the evaluation unit 170 passes the parameters of the deep layer generation model that has learned the training image, the input image, and the corrected relighted image to the model storage unit 180).

Re Claim 5: Claim 5 encompasses the same scope of invention as claim 1 except for the additional claim limitation that the target lighting representation indicates a direction and an intensity of a light source. Dherse further teaches the claim limitation that the target lighting representation indicates a direction and an intensity of a light source (Dherse teaches at Page 5, Section 3.1 and FIG. 10 a latent light-scene split and an illumination predictor with light direction and light color temperature prediction auxiliary losses, where the scene is illuminated in different directions and an encoder-decoder predicts the illumination corresponding to the source image and replaces it with the target lighting. Dherse teaches at Section 3.1 that the illumination space is the Cartesian product of the light color temperature set (intensities) and the light direction set).

Re Claim 7: Claim 7 encompasses the same scope of invention as claim 1 except for the additional claim limitations of obtaining an additional lighting representation; generating an additional latent vector based on the input latent vector and the additional lighting representation; and generating an additional image based on the additional latent vector, wherein the additional image shares an attribute with the image and has different lighting from the image according to the additional lighting representation. Dherse further teaches the claim limitations of obtaining an additional lighting representation; generating an additional latent vector based on the input latent vector and the additional lighting representation; and generating an additional image based on the additional latent vector, wherein the additional image shares an attribute with the image and has different lighting from the image according to the additional lighting representation (Dherse teaches at Page 5, Section 3.1 and FIG. 10 a latent light-scene split and an illumination predictor with light direction and light color temperature prediction auxiliary losses, where the scene is illuminated in different directions and an encoder-decoder predicts the illumination corresponding to the source image and replaces it with the target lighting.
Dherse teaches at Section 3.1 that the illumination space is the Cartesian product of the light color temperature set (intensities) and the light direction set, and at Page 10 that the color temperature has been translated into RGB values with the use of the Color-Science Python library and that light direction is represented as a Gaussian distribution over the value/brightness component of the light color in HSL color space … The other half is used for light color estimation in HSL color space. Dherse teaches at FIGS. 7-9 examples in which relighting is performed for an input I and target T repeatedly).

Re Claim 8: Dherse teaches a method comprising: obtaining an input latent vector for an image generation network and a target lighting representation, wherein the input latent vector represents image content in a latent space of a decoder of the image generation network, and the target lighting representation indicates a target lighting condition different from a lighting condition of the image content in a lighting representation space different from the latent space (Applicant's specification at FIG. 4 shows that the relighting model 330 takes input image 225 (as opposed to an input latent vector), the encoder within the relighting model 330 produces the input latent vector 420 based on the target lighting representation 435 (Lr) to generate a new latent vector, and the StyleGAN 450 generates an output image 275. However, the target lighting representation is the target lighting latent vector (see Paragraph 0072 of applicant's specification). Dherse teaches at Section 3.1 that the deep learning network takes the original image I and the target image T from which the lighting conditions should be acquired, and the ground truth image G(I, T) corresponds to the input scene and the target illumination LT. It is known that the output image Ĝ(I, T) is generated by the deep learning network to approximate the actual G(I, T) corresponding to the input scene and the target illumination LT. Thus, the deep learning network takes the target illumination LT as the second input to generate the latent representation of the target illumination, and the latent vector encoded from the input scene I as the first input. Dherse teaches at Section 3.2 (Page 8) that the target illumination LT is represented by a small environment map image. Dherse teaches the claimed invention in the same manner as Applicant's specification. The encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT. The encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Dherse teaches at Section 3.1 that the deep learning network used to solve the image relighting task takes two images as input: the original image I in which lighting conditions should be changed and the target image T from which the lighting conditions LT should be acquired. The target illumination conditions are encoded as the latent representation to generate one encoding representing the original image under the new light conditions. Dherse teaches at Section 3.3 predicting light condition properties directly from it, and that the encoding fed to the decoder consists of the scene latent representation from I and the light latent representation encoded from T. Dherse teaches at Sections 3.2-3.3 and FIG. 3 that the image generation network includes encoder/decoder system G with the input latent vector representing the input image I and the target lighting representation T, that an encoder for I and T transforms the two images into their respective latent representations, and that the latent space of the decoder is different from the light conditions LT of a given image … the decoder will be provided with the encoding of T in conjunction with skip connections from the encoder of I. Accordingly, the lighting condition LT is different from the latent encoding of I); generating a training image based on the input latent vector and the target lighting representation using the image generation network (Dherse teaches at Section 3 that the deep learning network generates a training image based on the input latent vector of the original image I and the target lighting representation Lt); computing a lighting loss based on the training image (Dherse teaches at Section 3.4 computing the scene latent loss and light latent loss); generating, using a mapping network, a modified latent vector based on the input latent vector and the target lighting representation, wherein the mapping network comprises a neural network layer that takes a first input in the latent space and a second input in the lighting representation space, and produces an output in the latent space (Applicant's specification at FIG. 4 shows that the relighting model 330 takes input image 225 (as opposed to an input latent vector), the encoder within the relighting model 330 produces the input latent vector 420 based on the target lighting representation 435 (Lr) to generate a new latent vector, and the StyleGAN 450 generates an output image 275. However, the target lighting representation is the target lighting latent vector (see Paragraph 0072 of applicant's specification). Dherse teaches at Section 3.1 that the deep learning network takes the original image I and the target image T from which the lighting conditions should be acquired, and the ground truth image G(I, T) corresponds to the input scene and the target illumination LT. It is known that the output image Ĝ(I, T) is generated by the deep learning network to approximate the actual G(I, T) corresponding to the input scene and the target illumination LT. Thus, the deep learning network takes the target illumination LT as the second input to generate the latent representation of the target illumination, and the latent vector encoded from the input scene I as the first input. Dherse teaches at Section 3.2 (Page 8) that the target illumination LT is represented by a small environment map image. Dherse teaches the claimed invention in the same manner as Applicant's specification. The encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT. The encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Dherse teaches at Section 3.1 that the deep learning network used to solve the image relighting task takes two images as input: the original image I in which lighting conditions should be changed and the target image T from which the lighting conditions LT should be acquired. The target illumination conditions are encoded as the latent representation to generate one encoding representing the original image under the new light conditions.
Dherse teaches at Section 3.3 predicting light condition properties directly from it, and that the encoding fed to the decoder consists of the scene latent representation from I and the light latent representation encoded from T. Dherse teaches at Sections 3.2-3.3 and FIG. 3 that the image generation network includes encoder/decoder system G with the input latent vector representing the input image I and the target lighting representation T, that an encoder for I and T transforms the two images into their respective latent representations, and that the latent space of the decoder is different from the light conditions LT of a given image … the decoder will be provided with the encoding of T in conjunction with skip connections from the encoder of I. Accordingly, the lighting condition LT is different from the latent encoding of I. Dherse teaches at Section 3 at Page 8 that the network encoder for I and T transforms the two images into their respective latent representations and a single representation is decoded into an output image); generating, using the decoder of the image generation network, an image by decoding the modified latent vector, wherein the image depicts the image content with the target lighting condition (Dherse teaches at Section 3 at Page 8 that the network encoder for I and T transforms the two images into their respective latent representations and a single representation is decoded into an output image). Sun, incorporated by reference in Yamada, teaches at FIG. 2 and Section 1 that the target illumination is different from the latent space and that the image generation network (a deep neural network) includes an encoder-decoder network operating under an arbitrary user-specified environment map. Sun teaches at Section 3 that the relit image is generated by the deep neural network using the input portrait image and the target lighting Lt. Sun teaches at Section 3.2 that, in the decoder of the network, the target lighting Lt is fed as input into the network and encoded using conv layers before being concatenated with the representation (latent space) of the source image produced by the encoder. It is known from Dherse that the encoder generates a latent space vector of the input portrait image as the representation of the source image. The encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT. The encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Yamada further teaches a method of training an image generation network, comprising: obtaining an input latent vector for an image generation network and a target lighting representation, wherein the input latent vector represents image content in a latent space of a decoder of the image generation network, and the target lighting representation indicates a target lighting condition different from a lighting condition of the image content in a lighting representation space different from the latent space (Yamada teaches the same latent space of the image generation model where the latent vector of the input image is generated and the modified latent vector reflecting the lighting environment is generated. Yamada teaches at Paragraph 0007 and Paragraph 0028 embedding an input image in a latent space of an image generation unit.
It is the same latent space where the input image is converted into a latent vector in the latent space of the generation unit 150 and the modified latent vector is generated by the generation unit 150 in the same latent space of the image generation model that reflects the lighting environment. Yamada teaches at Paragraph 0028 that the mapping unit maps the latent vector of the input image to a modified latent space vector reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada further teaches at Paragraph 0044 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0028-0030 that each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150. Then, the mapping unit 140 acquires a latent space vector (the modified latent space vector) by embedding, in the latent space of the image generation model learned with the large-scale data set, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada teaches at Paragraph 0043 that the processor 11 executes the operation as the image structure feature extraction unit 130 simultaneously in parallel with the processing in step S12. That is, the processor 11 reads an input image I from the temporary storage area 13B, and extracts, from the input image I, a feature quantity for estimating a shape and/or texture, which is a feature quantity of an image structure of the input image I (step S13). The processor 11 stores the feature group A (latent vector of the input image), in which the feature quantities of the extracted image structures are stacked, in the temporary storage area 13B. Yamada teaches at Paragraph 0044 that the processor 11 executes the operation as the mapping unit 140, reads the feature group A in which the lighting environment and the feature quantity of the training image are stacked from the temporary storage area 13B, and combines the feature group A with the feature quantity of the image structure (step S14). The lighting environment of the training image is a lighting environment desired to be reflected. The processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0028-0030 that each encoder converts the input data into a vector representing a latent space of the generation unit 150 and the mapping unit 140 acquires the lighting environment of the training image from the data input unit 110. The mapping unit 140 acquires a latent space vector by embedding, in the latent space of the image generation model, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. The generation unit 150 acquires the latent space vector group from the mapping unit 140 and generates the feature quantity of the image structure in each layer of the generator. The generation unit 150 generates a relighted image by converting the feature quantity having the highest resolution of the feature group B into an RGB color space and passes the generated relighted image to the feature correction unit 160, and the feature correction unit 160 generates a corrected relighted image obtained by correcting the relighted image.
Yamada teaches at Paragraph 0044-0046 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent vector) and stores a vector group in which the converted latent space vectors are stacked, and the processor 11 executes the operation as the generation unit 150, reads the vector group in which the latent space vectors are stacked, acquires a feature quantity by a generator learned in advance using the vector group as an input, and generates the relighted image by converting the feature quantity into the RGB color space. Yamada teaches at Paragraph 0052-0053 that the relighted image generation unit includes the mapping unit 140, which acquires a latent space vector capable of generating a target in which only the lighting environment is changed, by embedding, in a latent space of an image generation model learned with the large-scale data set, a feature quantity in which a condition vector expressing the lighting environment desired to be reflected is reflected in a feature quantity of an image structure of the input image); generating a training image based on the input latent vector and the target lighting representation using the image generation network (Yamada teaches at Paragraphs 0044-0046, as noted above, that the processor 11, acting as the mapping unit 140, converts the combined feature quantity into a modified latent vector and that the generation unit 150 generates the relighted image from the stacked latent space vectors by converting the generator's feature quantity into the RGB color space. Yamada teaches at Paragraph 0031 that the evaluation unit 170 updates the parameters of the image structure feature extraction unit 130, the lighting environment feature extraction unit 120, the mapping unit 140, and the feature correction unit 160 using an optimization method to minimize an error between the estimated lighting environment, the relighted image, the corrected relighted image, and the training data. In the present embodiment, the evaluation unit 170 acquires the estimated lighting environment from the lighting environment feature extraction unit 120, acquires the relighted image and the corrected relighted image from the feature correction unit 160, and acquires the training image and the lighting environment of the training image from the data input unit 110; in a case where the lighting environment feature extraction unit 120 estimates the lighting environment of the input image, the evaluation unit 170 also acquires the lighting environment of the input image from the data input unit 110. Then, the evaluation unit 170 calculates, from the error function, an error between the estimated lighting environment and the lighting environment of the training image or the lighting environment of the input image, and an error between each of the relighted image and the corrected relighted image and the training image. The error function uses an L1 norm or an L2 norm.
In addition, as an option, the L1 norm or the L2 norm of the feature calculated by an encoder used in existing image classification, such as VGG, or an encoder used for identification of the same person, such as ArcFace, may be added for the error between each of the relighted image and the corrected relighted image and the training image. Thereafter, using an optimization method designated by the user, the evaluation unit 170 obtains the gradient of the parameter of each of the lighting environment feature extraction unit 120, the image structure feature extraction unit 130, the mapping unit 140, and the feature correction unit 160 to minimize these errors, and updates each parameter. The parameters may be updated such that each error is treated equally and minimized on average, or such that the error to be prioritized most is minimized by weighting between the errors. Note that the generation unit 150 does not update its parameters. Finally, the evaluation unit 170 passes the parameters of the deep layer generation model that has learned the training image, the input image, and the corrected relighted image to the model storage unit 180); computing a lighting loss based on the training image (Yamada teaches at Paragraphs 0044-0046 and 0031, as set forth above, that the generation unit 150 generates the relighted image from the stacked modified latent space vectors and that the evaluation unit 170 calculates, from an error function using an L1 norm or an L2 norm, an error between the estimated lighting environment and the lighting environment of the training image or of the input image, and an error between each of the relighted image and the corrected relighted image and the training image);
generating, using a mapping network, a modified latent vector based on the input latent vector and the target lighting representation, wherein the mapping network comprises a neural network layer that takes a first input in the latent space and a second input in the lighting representation space, and produces an output in the latent space (Yamada teaches the same latent space of the image generation model in which the latent vector of the input image is generated and in which the modified latent vector reflecting the lighting environment is generated. Yamada teaches at Paragraph 0007 and Paragraph 0028 embedding an input image in a latent space of an image generation unit; it is the same latent space in which the input image is converted into a latent vector of the generation unit 150 and in which the modified latent vector is generated. Yamada teaches at Paragraph 0028 that the mapping unit maps the latent vector of the input image to a modified latent space vector reflecting a condition vector expressing a lighting environment in which relighting is performed, and at Paragraph 0044 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraph 0028-0030 that each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150. Then, the mapping unit 140 acquires a latent space vector (the modified latent space vector) by embedding, in the latent space of the image generation model learned with the large-scale data set, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada teaches at Paragraph 0043 that the processor 11 executes the operation as the image structure feature extraction unit 130 in parallel with the processing in step S12.
That is, the processor 11 reads an input image I from the temporary storage area 13B and extracts, from the input image I, a feature quantity for estimating a shape and/or texture, which is a feature quantity of an image structure of the input image I (step S13). The processor 11 stores the feature group A (the latent vector of the input image), in which the feature quantities of the extracted image structures are stacked, in the temporary storage area 13B. Yamada teaches at Paragraph 0044 that the processor 11 executes the operation as the mapping unit 140, reads the feature group A, in which the lighting environment and the feature quantity of the training image are stacked, from the temporary storage area 13B, and combines the feature group A with the feature quantity of the image structure (step S14). The lighting environment of the training image is the lighting environment desired to be reflected. The processor 11 converts the combined feature quantity into a latent space vector as a modified latent space vector); and generating, using the decoder of the image generation network, an image by decoding the modified latent vector, wherein the image depicts the image content with the target lighting condition (Yamada teaches at Paragraph 0030 that the feature correction unit 160 includes a multilayer decoder and generates a corrected relighted image obtained by correcting the relighted image; the feature correction unit 160 (decoder) acquires the feature group A, in which feature quantities of image structures in the respective layers of the encoder are stacked, uses the feature quantity obtained by combining the selected feature quantities as an input to the decoder, enlarges the resolution of the feature quantity of group B to an equal resolution, uses the feature quantity as an input to the next layer of the decoder, and outputs, as a corrected relighted image, the relighted image obtained by converting the feature quantity from the final layer of the decoder into the RGB color space. Yamada teaches at Paragraph 0044-0046 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent vector) and stores a vector group in which the converted latent space vectors are stacked, and the processor 11 executes the operation as the generation unit 150, reads the vector group in which the latent space vectors are stacked, acquires a feature quantity by a generator learned in advance using the vector group as an input, and generates the relighted image by converting the feature quantity into the RGB color space.
Yamada further teaches at Paragraph 0031, as set forth above, that the evaluation unit 170 updates the parameters of the image structure feature extraction unit 130, the lighting environment feature extraction unit 120, the mapping unit 140, and the feature correction unit 160 using an optimization method to minimize the errors between the estimated lighting environment, the relighted image, the corrected relighted image, and the training data, calculated from an error function using an L1 norm or an L2 norm (optionally augmented with the L1 or L2 norm of features from an encoder such as VGG or ArcFace), while the generation unit 150 does not update its parameters; finally, the evaluation unit 170 passes the parameters of the learned deep layer generation model to the model storage unit 180).
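For orientation, the claim 8 mapping above describes a mapper trained against a lighting/image error while the generator itself stays frozen. The following PyTorch sketch illustrates that arrangement in outline only; the module names, toy linear layers, and dimensions are hypothetical stand-ins and are not taken from Yamada, Dherse, or the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, LIGHT_DIM, IMG_DIM = 512, 16, 3 * 64 * 64

class MappingNetwork(nn.Module):
    # Takes a first input in the latent space and a second input in the
    # lighting representation space; produces an output in the latent space.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + LIGHT_DIM, LATENT_DIM),
            nn.LeakyReLU(0.2),
            nn.Linear(LATENT_DIM, LATENT_DIM),
        )

    def forward(self, w, light):
        return self.net(torch.cat([w, light], dim=-1))

mapper = MappingNetwork()
generator = nn.Linear(LATENT_DIM, IMG_DIM)       # stand-in for a pretrained generator
light_estimator = nn.Linear(IMG_DIM, LIGHT_DIM)  # stand-in lighting predictor
for p in generator.parameters():
    p.requires_grad_(False)  # the generation unit does not update its parameters

opt = torch.optim.Adam(
    list(mapper.parameters()) + list(light_estimator.parameters()), lr=1e-4)

w = torch.randn(4, LATENT_DIM)            # input latent vector (image content)
target_light = torch.randn(4, LIGHT_DIM)  # condition vector for the target lighting
ground_truth = torch.randn(4, IMG_DIM)    # training image under the target lighting

w_mod = mapper(w, target_light)           # modified latent vector
relit = generator(w_mod)                  # decode the modified latent into an image
est_light = light_estimator(relit)        # estimated lighting of the generated image

# L1 errors between estimated and target lighting and between the relit and
# training images, mirroring the evaluation-unit description; only the mapper
# and estimator are updated, never the frozen generator.
loss = F.l1_loss(est_light, target_light) + F.l1_loss(relit, ground_truth)
opt.zero_grad()
loss.backward()
opt.step()
```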
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Dherse's generation of a modified image based on the modified latent vector into the neural network of Yamada's deep layer generation model, so as to generate a relighted image using the deep layer generation model. One of ordinary skill in the art would have relighted the input image based on the modified latent vector and the target illumination to generate a relighted image. Re Claim 9: Claim 9 encompasses the same scope of invention as claim 8 except for the additional claim limitation of computing an output lighting representation based on the training image, and comparing the output lighting representation to the target lighting representation, wherein the lighting loss is based on the comparison of the output lighting representation to the target lighting representation. Dherse and Yamada further teach the claim limitation of computing an output lighting representation based on the training image (Dherse teaches at Section 3 generating, by the deep learning network, a training image based on the input latent vector of the original image I and the target lighting representation Lt, and at Section 3.4 computing the scene latent loss and the light latent loss. Dherse teaches at Page 5, Section 3.1 and FIG. 10 a latent light/scene split and an illumination predictor with light-direction and light-color-temperature prediction auxiliary losses, in which the scene is illuminated in different directions and an encoder-decoder predicts the illumination corresponding to the source image and replaces it with the target lighting. Dherse teaches at Section 3.1 that the illumination space is the Cartesian product of the light color temperature set (intensities) and the light direction set, and at Page 10 that the color temperature is translated into RGB values using the Color-Science Python library, that light direction is represented as a Gaussian distribution over the value/brightness component of the light color in HSL color space, and that the other half is used for light color estimation in HSL color space. Yamada teaches at Paragraph 0028-0030 that each encoder converts the input data into a vector representing a latent space of the generation unit 150, that the mapping unit 140 acquires the lighting environment of the training image from the data input unit 110 and acquires a latent space vector by embedding, in the latent space of the image generation model, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed, and that the generation unit 150 acquires the latent space vector group from the mapping unit 140 and generates the feature quantity of the image structure in each layer of the generator.
The generation unit 150 generates a relighted image by converting the feature quantity having the highest resolution of the feature group B into an RGB color space and passes the generated relighted image to the feature correction unit 160, which generates a corrected relighted image obtained by correcting the relighted image); and comparing the output lighting representation to the target lighting representation, wherein the lighting loss is based on the comparison of the output lighting representation to the target lighting representation (Yamada teaches at Paragraph 0031 that the evaluation unit 170 acquires the estimated lighting environment from the lighting environment feature extraction unit 120, acquires the relighted image and the corrected relighted image from the feature correction unit 160, and acquires the training image and the lighting environment of the training image from the data input unit 110; in a case where the lighting environment feature extraction unit 120 estimates the lighting environment of the input image, the evaluation unit 170 also acquires the lighting environment of the input image from the data input unit 110. Then, the evaluation unit 170 calculates, from the error function, an error between the estimated lighting environment and the lighting environment of the training image or the lighting environment of the input image, and an error between each of the relighted image and the corrected relighted image and the training image. The error function uses an L1 norm or an L2 norm, optionally augmented with the L1 or L2 norm of features calculated by an encoder used in existing image classification, such as VGG, or an encoder used for identification of the same person, such as ArcFace. Using an optimization method designated by the user, the evaluation unit 170 obtains the gradients of the parameters of the lighting environment feature extraction unit 120, the image structure feature extraction unit 130, the mapping unit 140, and the feature correction unit 160 to minimize these errors and updates each parameter, while the generation unit 150 does not update its parameters; finally, the evaluation unit 170 passes the parameters of the deep layer generation model that has learned the training image, the input image, and the corrected relighted image to the model storage unit 180). Re Claim 10: Claim 10 encompasses the same scope of invention as claim 9 except for the additional claim limitation that the mapping network is trained based on the lighting loss.
Yamada further teaches the claim limitation that the mapping network is trained based on the lighting loss (Yamada teaches at Paragraphs 0044-0046 and 0031, as set forth above for claim 8, that the processor 11, acting as the mapping unit 140, converts the combined feature quantity into a modified latent space vector, that the generation unit 150 generates the relighted image from the stacked latent space vectors, and that the evaluation unit 170 obtains the gradients of the parameters of the lighting environment feature extraction unit 120, the image structure feature extraction unit 130, the mapping unit 140, and the feature correction unit 160 to minimize the calculated errors and updates each parameter, while the generation unit 150 does not update its parameters; accordingly, the mapping unit 140 is trained based on the lighting error).
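The claim 9/10 mapping turns on estimating a lighting representation from the generated image and comparing it to the target. The hedged sketch below shows one way such a predictor and loss could look; the direction/color split loosely echoes the examiner's reading of Dherse Section 3.1, and every module name and layer size here is an assumption rather than anything from the record.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightingPredictor(nn.Module):
    # Maps a generated image to an output lighting representation consisting
    # of a unit light-direction vector and a light-color triplet.
    def __init__(self, image_dim=3 * 64 * 64):
        super().__init__()
        self.backbone = nn.Linear(image_dim, 64)
        self.direction_head = nn.Linear(64, 3)  # light direction
        self.color_head = nn.Linear(64, 3)      # light color (e.g., an HSL-like triplet)

    def forward(self, img):
        h = torch.relu(self.backbone(img.flatten(1)))
        direction = F.normalize(self.direction_head(h), dim=-1)  # unit direction
        return torch.cat([direction, self.color_head(h)], dim=-1)

def lighting_loss(predictor, training_image, target_light):
    # Compare the output lighting representation to the target; L2 is one of
    # the recited options ("an L1 norm or an L2 norm").
    return F.mse_loss(predictor(training_image), target_light)

predictor = LightingPredictor()
img = torch.randn(2, 3, 64, 64)   # generated training images
target = torch.randn(2, 6)        # target lighting representations
print(lighting_loss(predictor, img, target))  # claim 10: backpropagate this into the mapper
```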
Re Claim 16: Dherse teaches an apparatus comprising: an image generation network comprising parameters stored in the one or more memories, wherein the image generation network takes an input latent vector and a target lighting representation as input, the input latent vector represents image content in a latent space of a decoder of the image generation network, the target lighting representation indicates a target lighting condition different from a lighting condition of the image content in a lighting representation space different from the latent space, and the image generation network is trained to generate images based on the target lighting representation using a lighting loss by performing operations comprising (Applicant's specification at FIG. 4 shows that the relighting model 330 takes input image 225 (as opposed to an input latent vector), that the encoder within the relighting model 330 produces the input latent vector 420 based on the target lighting representation 435 (Lr) to generate a new latent vector, and that the StyleGAN 450 generates an output image 275; the target lighting representation is the target lighting latent vector (see Paragraph 0072 of Applicant's specification). Dherse teaches at Section 3.1 that the deep learning network takes the original image I and the target image T from which the lighting conditions should be acquired, and that the ground truth image G(I, T) corresponds to the input scene and the target illumination LT. It is known that the output image Ĝ(I, T) is generated by the deep learning network to approximate the actual G(I, T) corresponding to the input scene and the target illumination LT. Thus, the deep learning network takes the target illumination LT as the second input to generate the latent representation of the target illumination, and the latent vector encoded from the input scene I as the first input. Dherse teaches at Section 3.2 (Page 8) that the target illumination LT is represented by a small environment map image. Dherse thus teaches the claimed invention in the same manner as Applicant's specification: the encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT, and the encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Dherse teaches at Section 3.1 that the deep learning network used to solve the image relighting task takes two images as input: the original image I, in which lighting conditions should be changed, and the target image T, from which the lighting conditions LT should be acquired. The target illumination conditions are encoded as the latent representation to generate one encoding representing the original image under the new light conditions. Dherse teaches at Section 3.3 predicting light condition properties directly from the latent representation, the encoding fed to the decoder consisting of the scene latent representation from I and the light latent representation encoded from T. Dherse teaches at Section 3.2-3.3 and FIG. 3
that the image generation network includes an encoder/decoder system G with the input latent vector representing the input image I and the target lighting representation T; that an encoder for I and T transforms the two images into their respective latent representations; and that the latent space of the decoder is different from the light conditions LT of a given image, the decoder being provided with the encoding of T in conjunction with skip connections from the encoder of I. Accordingly, the lighting condition LT is different from the latent encoding of I); generating, using a mapping network, a modified latent vector based on the input latent vector and the target lighting representation, wherein the mapping network comprises a neural network layer that takes a first input in the latent space and a second input in the lighting representation space, and produces an output in the latent space (as set forth immediately above: Applicant's specification at FIG. 4 shows that the relighting model 330 takes input image 225 (as opposed to an input latent vector), that the encoder within the relighting model 330 produces the input latent vector 420 based on the target lighting representation 435 (Lr) to generate a new latent vector, and that the StyleGAN 450 generates an output image 275, the target lighting representation being the target lighting latent vector (see Paragraph 0072 of Applicant's specification); Dherse teaches at Section 3.1 that the deep learning network takes two images as input, the original image I in which lighting conditions should be changed and the target image T from which the lighting conditions LT should be acquired, with the target illumination conditions encoded as a latent representation to generate one encoding representing the original image under the new light conditions; and Dherse teaches at Section 3.3 predicting light condition properties directly from the latent representation, the encoding fed to the decoder consisting of the scene latent representation from I and the light latent representation encoded from T. Dherse teaches at Section 3.2-3.3 and FIG. 3
that the image generation network includes the encoder/decoder system G, that the encoder for I and T transforms the two images into their respective latent representations, and that the latent space of the decoder is different from the light conditions LT of a given image, the decoder being provided with the encoding of T in conjunction with skip connections from the encoder of I; accordingly, the lighting condition LT is different from the latent encoding of I, and Dherse teaches at Section 3, Page 8, that the network encoder for I and T transforms the two images into their respective latent representations and a single representation is decoded into an output image); and generating, using the decoder of the image generation network, an image by decoding the modified latent vector, wherein the image depicts the image content with the target lighting condition (Dherse teaches at Section 3, Page 8, that the network encoder for I and T transforms the two images into their respective latent representations and a single representation is decoded into an output image). Sun, incorporated in Yamada by reference, teaches at FIG. 2 and Section 1 that the target illumination is different from the latent space and that the image generation network (a deep neural network) includes an encoder-decoder network operating under an arbitrary user-specified environment map. Sun teaches at Section 3 that the relit image is generated by the deep neural network using the input portrait image and the target lighting Lt, and at Section 3.2 that, in the decoder of the network, the target lighting Lt is fed as input and encoded using conv layers before being concatenated with the representation (latent space) of the source image produced by the encoder. It is known from Dherse that the encoder generates a latent space vector of the input portrait image as the representation of the source image; the encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT, and the encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Yamada further teaches an apparatus comprising: one or more processors (e.g., processor 11 of FIG. 2); one or more memories including instructions executable by the one or more processors (e.g., the memory 12 of FIG. 2 and Paragraph 0035);
an image generation network comprising parameters stored in the one or more memories (Yamada teaches at Paragraph 0032 that the model storage unit 180 holds the parameters of the learned deep layer generation model and at Paragraph 0036 that the data memory 13 is used as the model storage unit 180), wherein the image generation network takes an input latent vector and a target lighting representation as input, the input latent vector represents image content in a latent space of a decoder of the image generation network, the target lighting representation indicates a target lighting condition different from a lighting condition of the image content in a lighting representation space different from the latent space, and the image generation network is trained to generate images based on the target lighting representation using a lighting loss by performing operations comprising (Yamada teaches, as set forth above for claim 8, the same latent space of the image generation model in which the latent vector of the input image is generated and in which the modified latent vector reflecting the lighting environment is generated (Paragraphs 0007 and 0028). Yamada teaches at Paragraph 0028 that the mapping unit maps the latent vector of the input image to a modified latent space vector reflecting a condition vector expressing a lighting environment in which relighting is performed, and at Paragraph 0044 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraph 0028-0030 that each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150, and that the mapping unit 140 then acquires a latent space vector (the modified latent space vector) by embedding, in the latent space of the image generation model learned with the large-scale data set, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed. Yamada teaches at Paragraphs 0043-0044 that the processor 11 extracts, from the input image I, a feature quantity of the image structure (step S13), stores the feature group A (the latent vector of the input image), reads the feature group A, in which the lighting environment and the feature quantity of the training image are stacked, from the temporary storage area 13B, and combines the feature group A with the feature quantity of the image structure (step S14).
The lighting environment of the training image is the lighting environment desired to be reflected, and the processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector). Yamada teaches at Paragraphs 0044-0046 and 0031, as set forth above for claim 8, that the generation unit 150 generates the relighted image from the stacked latent space vectors by converting the generator's feature quantity into the RGB color space, and that the evaluation unit 170 calculates, from an error function using an L1 norm or an L2 norm (optionally augmented with VGG or ArcFace feature norms), the errors between the estimated lighting environment, the relighted image, the corrected relighted image, and the training data, and updates the parameters of the lighting environment feature extraction unit 120, the image structure feature extraction unit 130, the mapping unit 140, and the feature correction unit 160 to minimize those errors, while the generation unit 150 does not update its parameters.
Finally, the evaluation unit 170 passes the parameters of the deep layer generation model that has learned the training image, the input image, and the corrected relighted image to the model storage unit 180); generating, using a mapping network, a modified latent vector based on the input latent vector and the target lighting representation, wherein the mapping network comprises a neural network layer that takes a first input in the latent space and a second input in the lighting representation space, and produces an output in the latent space (Yamada teaches, as set forth above, the same latent space of the image generation model in which the latent vector of the input image is generated and in which the modified latent vector reflecting the lighting environment is generated (Paragraphs 0007 and 0028); the mapping unit maps the latent vector of the input image to a modified latent space vector reflecting a condition vector expressing a lighting environment in which relighting is performed (Paragraph 0028); each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150, and the mapping unit 140 acquires a latent space vector (the modified latent space vector) by embedding, in the latent space of the image generation model learned with the large-scale data set, a feature quantity obtained by reflecting the condition vector (Paragraphs 0028-0030); and the processor 11 extracts the feature quantity of the image structure of the input image I (step S13), stores the feature group A (the latent vector of the input image), and combines the feature group A with the feature quantity of the image structure (step S14), the lighting environment of the training image being the lighting environment desired to be reflected (Paragraphs 0043-0044).
The processor 11 converts the combined feature quantity into a latent space vector as a modified latent space vector); and generating, using the decoder of the image generation network, an image by decoding the modified latent vector, wherein the image depicts the image content with the target lighting condition (Yamada teaches, as set forth above, that the input image is embedded in the latent space of the image generation unit (Paragraphs 0007 and 0028), that the mapping unit maps the latent vector of the input image to a modified latent space vector reflecting the condition vector expressing the lighting environment (Paragraph 0028), and that the processor 11, per Paragraphs 0043-0044, extracts the feature quantity of the image structure of the input image I, stores the feature group A (the latent vector of the input image), combines the feature group A with the feature quantity of the image structure (step S14), the lighting environment of the training image being the lighting environment desired to be reflected, and converts the combined feature quantity into a latent space vector (a modified latent space vector).
Yamada teaches at Paragraph 0028 that the mapping unit 140 acquires a latent space vector by embedding, in the latent space of the image generation model learned with the large-scale data set, a feature quantity obtained by reflecting a condition vector expressing a lighting environment in which relighting is performed, which is indicated by the lighting environment of the training image acquired from the data input unit 110, in the feature quantity of the image structure acquired from the image structure feature extraction unit 130. That is, the mapping unit 140 regards the acquired lighting environment as a condition vector, expands it to match the resolution of the feature quantity of the image structure of each layer to create the feature quantity obtained (or combined) by taking the element product, inputs the feature quantity of the image structure conditioned by the created lighting environment to each encoder included in the mapping unit 140, converts the feature quantity into a latent space vector expressing the latent space by each encoder, stacks the converted latent space vectors, and passes the stacked latent space vector group to the generation unit 150. Yamada teaches at Paragraph 0029 that the generation unit 150 has a multilayer generator and generates a (relighted) image using the latent space vector as an input. The generation unit 150 inputs the latent space vector corresponding to each layer of the generator from the acquired vector group, generates the feature quantity of the image structure in each layer of the generator, and stacks the feature quantities of the generated image structures; a feature group in which feature quantities of image structures generated in the respective layers of the generator are stacked is referred to as a "feature group B". The generation unit 150 passes the feature group B to the feature correction unit 160. Yamada teaches at Paragraph 0030 that the feature correction unit 160 includes a multilayer decoder and generates a corrected relighted image obtained by correcting the relighted image; the feature correction unit 160 (decoder) acquires the feature group A, in which feature quantities of image structures in the respective layers of the encoder are stacked, uses the feature quantity obtained by combining the selected feature quantities as an input to the decoder, enlarges the resolution of the feature quantity of group B to an equal resolution, uses the feature quantity as an input to the next layer of the decoder, and outputs, as a corrected relighted image, the relighted image obtained by converting the feature quantity from the final layer of the decoder into the RGB color space. Yamada teaches at Paragraph 0044-0046 that the processor 11 converts the combined feature quantity into a latent space vector (a modified latent vector) and stores a vector group in which the converted latent space vectors are stacked, and the processor 11 executes the operation as the generation unit 150, reads the vector group in which the latent space vectors are stacked, acquires a feature quantity by a generator learned in advance using the vector group as an input, and generates the relighted image by converting the feature quantity into the RGB color space).
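The per-layer "stacked" latent vector group fed to a frozen generator, as recited from Yamada's Paragraphs 0028-0030, can be pictured with the toy sketch below. A real StyleGAN2-style generator is assumed but not reproduced; every class name, shape, and layer here is illustrative rather than drawn from the reference.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLayeredGenerator(nn.Module):
    # One latent vector per generator layer (the "stacked" vector group);
    # the final, highest-resolution feature map is converted to RGB.
    def __init__(self, latent_dim=512, n_layers=4, channels=32):
        super().__init__()
        self.inject = nn.ModuleList(
            nn.Linear(latent_dim, channels) for _ in range(n_layers))
        self.blocks = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_layers))
        self.to_rgb = nn.Conv2d(channels, 3, 1)  # convert final features to RGB
        self.start = nn.Parameter(torch.randn(1, channels, 4, 4))

    def forward(self, latent_group):
        x = self.start.expand(latent_group[0].size(0), -1, -1, -1)
        for w, proj, block in zip(latent_group, self.inject, self.blocks):
            style = proj(w)[:, :, None, None]     # per-layer style from the vector group
            x = torch.relu(block(x + style))
            x = F.interpolate(x, scale_factor=2)  # grow resolution layer by layer
        return self.to_rgb(x)

gen = ToyLayeredGenerator()
for p in gen.parameters():
    p.requires_grad_(False)  # the generator is pretrained and not updated

latents = [torch.randn(2, 512) for _ in range(4)]  # stacked latent vector group
print(gen(latents).shape)  # torch.Size([2, 3, 64, 64])
```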
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Dherse's generation of a modified image based on the modified latent vector into the neural network of Yamada's deep layer generation model, so as to generate a relighted image using the deep layer generation model. One of ordinary skill in the art would have relighted the input image based on the modified latent vector and the target illumination to generate a relighted image. Re Claim 17: Claim 17 encompasses the same scope of invention as claim 16 except for the additional claim limitation that the image generation network comprises a generative adversarial network. Dherse and Yamada further teach the claim limitation that the image generation network comprises a generative adversarial network (Dherse teaches at Section 5.1 that the image generation network is a GAN, and Yamada teaches at Paragraph 0029 that the generator uses a deep layer generation model obtained by pre-learning a task of generating only a target to be relighted using a large-scale data set, such as StyleGAN2; it is known that StyleGAN is a generative adversarial network). Re Claim 18: Claim 18 encompasses the same scope of invention as claim 17 except for the additional claim limitation that the image generation network comprises the mapping network configured to generate the modified latent vector based on an input latent vector and the target lighting representation. Dherse and Yamada further teach this limitation (Dherse teaches at Section 3.1 that the deep learning network takes the original image I and the target image T from which the lighting conditions should be acquired, and that the ground truth image G(I, T) corresponds to the input scene and the target illumination LT. It is known that the output image Ĝ(I, T) is generated by the deep learning network to approximate the actual G(I, T) corresponding to the input scene and the target illumination LT; thus, the deep learning network takes the target illumination LT as the second input to generate the latent representation of the target illumination, and the latent vector encoded from the input scene I as the first input. Dherse teaches at Section 3.2 (Page 8) that the target illumination LT is represented by a small environment map image, and thus teaches the claimed invention in the same manner as Applicant's specification: the encoder-decoder network of Dherse FIG. 3 corresponds to the relighting model 330, which takes an input image I (as opposed to an input latent vector) and the target image T representing the target illumination LT, and the encoder within the encoder-decoder network produces an input latent vector for I based on the new lighting condition T to generate a new latent vector. Dherse teaches at Section 3.1 that the deep learning network used to solve the image relighting task takes two images as input, the original image I in which lighting conditions should be changed and the target image T from which the lighting conditions LT should be acquired, with the target illumination conditions encoded as the latent representation to generate one encoding representing the original image under the new light conditions.
Dherse teaches at Section 3.3 predicting light condition properties directly from the latent representation, the encoding fed to the decoder consisting of the scene latent representation from I and the light latent representation encoded from T. Dherse teaches at Section 3.2-3.3 and FIG. 3 that the image generation network includes the encoder/decoder system G with the input latent vector representing the input image I and the target lighting representation T, that the encoder for I and T transforms the two images into their respective latent representations, and that the latent space of the decoder is different from the light conditions LT of a given image, the decoder being provided with the encoding of T in conjunction with skip connections from the encoder of I; accordingly, the lighting condition LT is different from the latent encoding of I. Dherse teaches at Section 3, Page 8, that the network encoder for I and T transforms the two images into their respective latent representations and a single representation is decoded into an output image. Yamada teaches, as set forth above, the same latent space of the image generation model in which the latent vector of the input image is generated and in which the modified latent vector reflecting the lighting environment is generated (Paragraphs 0007 and 0028); the mapping unit maps the latent vector of the input image to a modified latent space vector reflecting a condition vector expressing a lighting environment in which relighting is performed (Paragraph 0028); each encoder converts the input data into a vector representing a latent space (input latent vector) of the generation unit 150, and the mapping unit 140 acquires a latent space vector (the modified latent space vector) by embedding, in the latent space of the image generation model learned with the large-scale data set, a feature quantity obtained by reflecting the condition vector (Paragraphs 0028-0030); and the processor 11 extracts, from the input image I, the feature quantity of the image structure (step S13), stores the feature group A (the latent vector of the input image), reads the feature group A, in which the lighting environment and the feature quantity of the training image are stacked, from the temporary storage area 13B, and combines the feature group A with the feature quantity of the image structure (step S14).
The lighting environment of the training image is a lighting environment desired to be reflected. The processor 11 converts the combined feature quantity into a latent space vector (a modified latent space vector).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over A. P. Dherse, et al., "Scene relighting with illumination estimation in the latent space on an encoder-decoder scheme", DOI:10.48550/arXiv.2006.02333, June 3, 2020, pp. 1-30 (hereinafter Dherse) in view of Yamada et al. US-PGPUB No. 2024/0185391 (hereinafter Yamada), incorporating by reference Sun et al., "Single Image Portrait Relighting", https://dl.acm.org/doi/pdf/10.1145/3306346.3323008, ACM Transactions on Graphics (TOG), Vol. 38, Issue 4, August 2019, pp. 79:1-79:12 (hereinafter Sun); and Zhu et al. US-PGPUB No. 2022/0335689 (hereinafter Zhu).

Re Claim 6: The claim 6 encompasses the same scope of invention as that of the claim 1 except the additional claim limitation of generating a random input vector and generating the input latent vector based on the random input vector. Zhu teaches this claim limitation (Zhu teaches at Paragraph 0113 that Z is the noise vector in the latent space. Zhu teaches at Paragraph [0067] that the generator network G of the Info-WGAN is trained to map a combination of a noise vector z and a latent code vector c as input to a simulated image x (such as a 2D image of pixels or 3D volume of voxels) of geological facies. For example, the pixels or voxels of the simulated image x can represent attributes, such as rock type, of geological facies of a subterranean formation or portion thereof. The noise vector z is in a one-dimensional latent space. The latent code vector c can have specific values for different categories of geological facies represented by the simulated image x. Zhu teaches at Paragraph [0034] that, in embodiments, the noise vector can be in a one-dimensional latent space, and the category code vector can have specific values for different categories of geological facies represented by the simulated images produced by the generator neural network. Zhu teaches at Paragraph 0068 that the discriminator network can be trained to map images to labels corresponding to the categories of geological facies, and at Paragraph 0070 that in an online phase, after the training phase is complete and the generator network G has been trained, combinations of values for the noise vector z and latent code vector c can be input to the generator network G, which is configured to map each combination of noise vector z and latent code vector c into a simulated image of geological facies. The trained discriminator network D can be used in the online phase to map one or more simulated images of geological facies produced by the generator network G as input to a label corresponding to a particular category of geological facies for each simulated image. The dimensional space of the category label output by the discriminator network D corresponds to the different categories of geological facies represented by the latent code vector c input to the generator network G during training).
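To make the claim 6 pattern concrete, the following sketch draws a random input vector and derives an input latent vector from it, in the spirit of the noise-vector-plus-code GAN input Zhu describes. It is illustrative only; the mapper and its dimensions are hypothetical stand-ins, not Zhu's Info-WGAN.

```python
import torch
import torch.nn as nn

latent_dim = 512
z = torch.randn(1, latent_dim)        # random input vector (noise)

# Hypothetical learned mapping from the noise vector to an input latent
# vector, analogous to a StyleGAN-style mapping stage.
mapper = nn.Sequential(
    nn.Linear(latent_dim, latent_dim), nn.ReLU(),
    nn.Linear(latent_dim, latent_dim),
)
w = mapper(z)                         # input latent vector based on z
```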
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated the GAN of Zhu to have modified the StyleGAN of Yamada, training the generator network to map a combination of the input noise vector and a latent code vector to a simulated image and using a discriminator to identify the category of the simulated image. One of ordinary skill in the art would have been motivated to have trained the generator network using the input noise vector and a latent code vector.

Claims 11-15 are rejected under 35 U.S.C. 103 as being unpatentable over A. P. Dherse, et al., "Scene relighting with illumination estimation in the latent space on an encoder-decoder scheme", DOI:10.48550/arXiv.2006.02333, June 3, 2020, pp. 1-30 (hereinafter Dherse) in view of Yamada et al. US-PGPUB No. 2024/0185391 (hereinafter Yamada), incorporating by reference Sun et al., "Single Image Portrait Relighting", https://dl.acm.org/doi/pdf/10.1145/3306346.3323008, ACM Transactions on Graphics (TOG), Vol. 38, Issue 4, August 2019, pp. 79:1-79:12 (hereinafter Sun); and Liu et al. US-PGPUB No. 2023/0252692 (hereinafter Liu).

Re Claim 11: The claim 11 encompasses the same scope of invention as that of the claim 8 except the additional claim limitation of generating a comparison image based on the input latent vector and computing an attribute loss based on the training image and the comparison image, wherein the image generation network is trained based on the attribute loss. Yamada at least suggests this claim limitation (Yamada teaches at Paragraphs 0044-0046 that the processor 11 (acting as the mapping unit 140) converts the combined feature quantity into a latent space vector (a modified latent vector) and stores a vector group in which the converted latent space vectors are stacked; the processor 11 then executes the operation as the generation unit 150, reads a vector group in which latent space vectors are stacked, acquires a feature quantity by a generator learned in advance using the vector group as an input, and generates the relighted image by converting the feature quantity into the RGB color space. Yamada teaches at Paragraph [0031] that the evaluation unit 170 updates the parameters of the image structure feature extraction unit 130, the lighting environment feature extraction unit 120, the mapping unit 140, and the feature correction unit 160 using an optimization method to minimize an error between the estimated lighting environment, the relighted image, the corrected relighted image, and the training data. In the present embodiment, the evaluation unit 170 acquires the estimated lighting environment from the lighting environment feature extraction unit 120, and acquires the relighted image and the corrected relighted image from the feature correction unit 160. Further, the evaluation unit 170 acquires the training image and the lighting environment of the training image from the data input unit 110. Furthermore, in a case where the lighting environment feature extraction unit 120 estimates the lighting environment of the input image, the evaluation unit 170 acquires the lighting environment of the input image from the data input unit 110.
Then, the evaluation unit 170 calculates, from the error function, an error between the estimated lighting environment and the lighting environment of the training image or the lighting environment of the input image, and an error between each of the relighted image and the corrected relighted image and the training image. The error function uses an L1 norm or an L2 norm. In addition, as an option, the L1 norm or the L2 norm of features calculated by an encoder used in existing image classification, such as VGG, or an encoder used for identification of the same person, such as ArcFace, may be added for the error between each of the relighted image and the corrected relighted image and the training image. Thereafter, using an optimization method designated in any manner by the user from the calculated error, the evaluation unit 170 obtains the gradient of the parameter of each of the lighting environment feature extraction unit 120, the image structure feature extraction unit 130, the mapping unit 140, and the feature correction unit 160 to minimize these errors, and updates each parameter. At this time, the parameters may be updated such that each error is treated equally and each error is minimized on average, or the parameters may be updated such that the error to be prioritized the most is minimized by giving a weight between the errors. Note that the generation unit 150 does not update its parameters. Finally, the evaluation unit 170 passes the parameters of the deep layer generation model that has learned the training image, the input image, and the corrected relighted image to the model storage unit 180).
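The error function Yamada describes reduces to a pixel-space L1 or L2 term, optionally augmented with a feature-space term from a pretrained encoder such as VGG or ArcFace. A minimal sketch of that computation, with all names hypothetical, might look like this:

```python
import torch
import torch.nn.functional as F

def relighting_error(relit, target, feature_encoder=None, norm="l1"):
    """Hypothetical sketch of a Yamada-style error: pixel L1/L2 between the
    relighted image and the training image, plus an optional feature-space
    term from a pretrained encoder (e.g., VGG or ArcFace)."""
    err = F.l1_loss(relit, target) if norm == "l1" else F.mse_loss(relit, target)
    if feature_encoder is not None:
        err = err + F.l1_loss(feature_encoder(relit), feature_encoder(target))
    return err

relit = torch.rand(1, 3, 256, 256)      # relighted (comparison) image
target = torch.rand(1, 3, 256, 256)     # training image
loss = relighting_error(relit, target)  # backpropagated to update parameters
```

As Yamada notes, several such errors can be combined with equal weights or with user-chosen weights before the gradient step.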
Liu further teaches this claim limitation (Liu teaches at Paragraph [0044] that the correspondence generation system may be trained with one or more objectives, including a texture swapping constraint, a structure swapping constraint, a warping loss, a Chamfer loss, and a standard GAN objective. The loss function unit 205 evaluates an overall training objective including one or more of the objectives and backpropagates the transformation and generator parameter updates to reduce the loss. The texture swapping constraint is applied to ensure that the correspondence generation system generates the same identity and image layout when the structure is fixed and only the encoded image texture 250 is modified. Given a pair of synthesized images with a shared encoded image structure 240 (derived from the latent structure vector z.sub.s) and different encoded image textures 250 (derived from the latent texture vectors z.sub.t.sub.1, z.sub.t.sub.2), the texture swapping loss L.sub.t is defined as the learned perceptual image patch similarity (LPIPS) loss between the two synthesized images. Liu teaches at Paragraph [0061] that combining the encoder 310 with the correspondence generation system 100, along with the texture map neural network 320 and the structure map neural network 330, enables the extraction of dense correspondence from real images. Specifically, an encoder E(⋅;θ.sub.E) parameterized by θ.sub.E is introduced to map an image x to a pair of structure w.sub.s,E and texture w.sub.t,E latent codes (e.g., latent texture and latent structure vectors). The latent texture and latent structure vectors are then mapped to the encoded image texture 150 and structure 140, respectively, and input to the correspondence generation system 100 to synthesize a replica of the image. Embedding real images directly into W+ space rather than W space typically results in better reconstruction. Therefore, in an embodiment, the encoder 310 outputs latent texture vectors w.sub.t,E.sup.+ in W+ space as opposed to encoding w.sub.t,E.sup.+ in W space. During training, the modulated generator 120 may be fixed (i.e., the generator parameters 165 may be held constant) while the encoder parameters are optimized via latent consistency, reconstruction, and/or texture swapping losses. Liu teaches at Paragraph [0062] that the latent consistency loss may be introduced by feeding synthesized images back into the encoder 310 and matching the distribution of outputs produced by the encoder 310 to that originally produced by the correspondence generation system 100. Suppose an image is synthesized with encoded image texture 150 w.sub.t, the encoded image structure 140 w.sub.s, and the correspondence map 160 C.sup.w. Inputting the synthesized image back into the encoder 310 produces the encoded image texture 150 w.sub.t,E.sup.+, the encoded image structure 140 w.sub.s,E, and the correspondence map 160 C.sub.E).

It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Liu's image generation network, using latent consistency and texture swapping losses, to have optimized the parameters of the image generation network. One of ordinary skill in the art would have been motivated to have synthesized an image by the image generation network to have optimized the texture swapping losses.

Re Claim 12: The claim 12 encompasses the same scope of invention as that of the claim 8 except the additional claim limitation of generating a comparison image based on the input latent vector and computing a texture loss based on the training image and the comparison image, wherein the image generation network is trained based on the texture loss. Liu further teaches this claim limitation (Liu teaches at Paragraph [0044] that the correspondence generation system may be trained with one or more objectives, including a texture swapping constraint, a structure swapping constraint, a warping loss, a Chamfer loss, and a standard GAN objective. The loss function unit 205 evaluates an overall training objective including one or more of the objectives and backpropagates the transformation and generator parameter updates to reduce the loss. The texture swapping constraint is applied to ensure that the correspondence generation system generates the same identity and image layout when the structure is fixed and only the encoded image texture 250 is modified. Given a pair of synthesized images with a shared encoded image structure 240 (derived from the latent structure vector z.sub.s) and different encoded image textures 250 (derived from the latent texture vectors z.sub.t.sub.1, z.sub.t.sub.2), the texture swapping loss L.sub.t is defined as the learned perceptual image patch similarity (LPIPS) loss between the two synthesized images.
Liu teaches at Paragraph [0061] that combining the encoder 310 with the correspondence generation system 100, along with the texture map neural network 320 and the structure map neural network 330, enables the extraction of dense correspondence from real images. Specifically, an encoder E(⋅;θ.sub.E) parameterized by θ.sub.E is introduced to map an image x to a pair of structure w.sub.s,E and texture w.sub.t,E latent codes (e.g., latent texture and latent structure vectors). The latent texture and latent structure vectors are then mapped to the encoded image texture 150 and structure 140, respectively, and input to the correspondence generation system 100 to synthesize a replica of the image. Embedding real images directly into W+ space rather than W space typically results in better reconstruction. Therefore, in an embodiment, the encoder 310 outputs latent texture vectors w.sub.t,E.sup.+ in W+ space as opposed to encoding w.sub.t,E.sup.+ in W space. During training, the modulated generator 120 may be fixed (i.e., the generator parameters 165 may be held constant) while the encoder parameters are optimized via latent consistency, reconstruction, and/or texture swapping losses. Liu teaches at Paragraph [0062] that the latent consistency loss may be introduced by feeding synthesized images back into the encoder 310 and matching the distribution of outputs produced by the encoder 310 to that originally produced by the correspondence generation system 100. Suppose an image is synthesized with encoded image texture 150 w.sub.t, the encoded image structure 140 w.sub.s, and the correspondence map 160 C.sup.w. Inputting the synthesized image back into the encoder 310 produces the encoded image texture 150 w.sub.t,E.sup.+, the encoded image structure 140 w.sub.s,E, and the correspondence map 160 C.sub.E).

It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Liu's image generation network, using latent consistency and texture swapping losses, to have optimized the parameters of the image generation network. One of ordinary skill in the art would have been motivated to have synthesized an image by the image generation network to have optimized the texture swapping losses.
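The texture-swapping loss Liu describes penalizes the perceptual distance between two images synthesized from the same structure code but different texture codes. A minimal sketch, assuming the `lpips` package and a hypothetical generator G(z_structure, z_texture):

```python
import torch
import lpips

lpips_fn = lpips.LPIPS(net="alex")  # learned perceptual image patch similarity

def texture_swap_loss(G, z_s, z_t1, z_t2):
    """Hypothetical sketch of a Liu-style texture swapping loss: same
    structure code, two texture codes, LPIPS distance between outputs."""
    x1 = G(z_s, z_t1)   # shared structure, first texture code
    x2 = G(z_s, z_t2)   # shared structure, second texture code
    return lpips_fn(x1, x2).mean()  # inputs expected in [-1, 1]
```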
Re Claim 13: The claim 13 encompasses the same scope of invention as that of the claim 8 except the additional claim limitation of generating a comparison image based on the input latent vector and computing an expression loss based on the training image and the comparison image, wherein the image generation network is trained based on the expression loss. Yamada teaches this claim limitation (Yamada teaches at Paragraph [0031] that the evaluation unit 170 updates the parameters of the image structure feature extraction unit 130, the lighting environment feature extraction unit 120, the mapping unit 140, and the feature correction unit 160 using an optimization method to minimize an error between the estimated lighting environment, the relighted image, the corrected relighted image, and the training data. In the present embodiment, the evaluation unit 170 acquires the estimated lighting environment from the lighting environment feature extraction unit 120, and acquires the relighted image and the corrected relighted image from the feature correction unit 160. Further, the evaluation unit 170 acquires the training image and the lighting environment of the training image from the data input unit 110. Furthermore, in a case where the lighting environment feature extraction unit 120 estimates the lighting environment of the input image, the evaluation unit 170 acquires the lighting environment of the input image from the data input unit 110. Then, the evaluation unit 170 calculates, from the error function, an error between the estimated lighting environment and the lighting environment of the training image or the lighting environment of the input image, and an error between each of the relighted image and the corrected relighted image and the training image. The error function uses an L1 norm or an L2 norm. In addition, as an option, the L1 norm or the L2 norm of features calculated by an encoder used in existing image classification, such as VGG, or an encoder used for identification of the same person, such as ArcFace, may be added for the error between each of the relighted image and the corrected relighted image and the training image. Thereafter, using an optimization method designated in any manner by the user from the calculated error, the evaluation unit 170 obtains the gradient of the parameter of each of the lighting environment feature extraction unit 120, the image structure feature extraction unit 130, the mapping unit 140, and the feature correction unit 160 to minimize these errors, and updates each parameter. At this time, the parameters may be updated such that each error is treated equally and each error is minimized on average, or the parameters may be updated such that the error to be prioritized the most is minimized by giving a weight between the errors. Note that the generation unit 150 does not update its parameters. Finally, the evaluation unit 170 passes the parameters of the deep layer generation model that has learned the training image, the input image, and the corrected relighted image to the model storage unit 180).

It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Liu's image generation network, using latent consistency and texture swapping losses, to have optimized the parameters of the image generation network. One of ordinary skill in the art would have been motivated to have synthesized an image by the image generation network to have optimized the texture swapping losses.

Re Claim 14: The claim 14 encompasses the same scope of invention as that of the claim 8 except the additional claim limitation of generating a comparison image based on the input latent vector and computing a consistency loss based on the training image and the comparison image, wherein the image generation network is trained based on the consistency loss.
Liu further teaches this claim limitation (Liu teaches at Paragraph [0044] that the correspondence generation system may be trained with one or more objectives, including a texture swapping constraint, a structure swapping constraint, a warping loss, a Chamfer loss, and a standard GAN objective. The loss function unit 205 evaluates an overall training objective including one or more of the objectives and backpropagates the transformation and generator parameter updates to reduce the loss. The texture swapping constraint is applied to ensure that the correspondence generation system generates the same identity and image layout when the structure is fixed and only the encoded image texture 250 is modified. Given a pair of synthesized images with a shared encoded image structure 240 (derived from the latent structure vector z.sub.s) and different encoded image textures 250 (derived from the latent texture vectors z.sub.t.sub.1, z.sub.t.sub.2), the texture swapping loss L.sub.t is defined as the learned perceptual image patch similarity (LPIPS) loss between the two synthesized images. Liu teaches at Paragraph [0061] that combining the encoder 310 with the correspondence generation system 100, along with the texture map neural network 320 and the structure map neural network 330, enables the extraction of dense correspondence from real images. Specifically, an encoder E(⋅;θ.sub.E) parameterized by θ.sub.E is introduced to map an image x to a pair of structure w.sub.s,E and texture w.sub.t,E latent codes (e.g., latent texture and latent structure vectors). The latent texture and latent structure vectors are then mapped to the encoded image texture 150 and structure 140, respectively, and input to the correspondence generation system 100 to synthesize a replica of the image. Embedding real images directly into W+ space rather than W space typically results in better reconstruction. Therefore, in an embodiment, the encoder 310 outputs latent texture vectors w.sub.t,E.sup.+ in W+ space as opposed to encoding w.sub.t,E.sup.+ in W space. During training, the modulated generator 120 may be fixed (i.e., the generator parameters 165 may be held constant) while the encoder parameters are optimized via latent consistency, reconstruction, and/or texture swapping losses. Liu teaches at Paragraph [0062] that the latent consistency loss may be introduced by feeding synthesized images back into the encoder 310 and matching the distribution of outputs produced by the encoder 310 to that originally produced by the correspondence generation system 100. Suppose an image is synthesized with encoded image texture 150 w.sub.t, the encoded image structure 140 w.sub.s, and the correspondence map 160 C.sup.w. Inputting the synthesized image back into the encoder 310 produces the encoded image texture 150 w.sub.t,E.sup.+, the encoded image structure 140 w.sub.s,E, and the correspondence map 160 C.sub.E).

It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Liu's image generation network, using latent consistency and texture swapping losses, to have optimized the parameters of the image generation network. One of ordinary skill in the art would have been motivated to have synthesized an image by the image generation network to have optimized the latent consistency losses.
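The latent consistency idea in Liu [0062] amounts to re-encoding a synthesized image and requiring the recovered latent codes to match the codes used for synthesis. A minimal sketch, with the encoder E and generator G as hypothetical stand-ins:

```python
import torch
import torch.nn.functional as F

def latent_consistency_loss(E, G, w_s, w_t):
    """Hypothetical sketch: synthesize from structure/texture latents,
    re-encode the result, and penalize drift in the recovered latents."""
    x = G(w_s, w_t)            # synthesized image
    w_s_rec, w_t_rec = E(x)    # latents recovered by the encoder
    return F.mse_loss(w_s_rec, w_s) + F.mse_loss(w_t_rec, w_t)
```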
Re Claim 15: The claim 15 encompasses the same scope of invention as that of the claim 8 except the additional claim limitation of computing a discriminator loss using a discriminator network, wherein the image generation network is trained based on the discriminator loss. Liu teaches this claim limitation (Liu teaches at Paragraph [0045] that the structure swapping constraint is applied to encourage images that share the same encoded image texture 250 to have similarly looking textures. This consists of encouraging two images with the same encoded image texture 250 (derived from the latent texture vector z.sub.t) but different encoded image structures 240 (derived from the latent structure vectors z.sub.s.sub.1, z.sub.s.sub.2) to have similar textures. A non-saturating GAN loss based on a patch discriminator may be used to define the structure swapping loss. Liu teaches at Paragraph [0046] that a warping loss is defined to explicitly regularize the correspondence map produced by the coordinate warping unit 210. Given a pair of synthesized images x.sub.1=G(z.sub.s.sub.1, z.sub.t.sub.1; θ.sub.G) and x.sub.2=G(z.sub.s.sub.2, z.sub.t.sub.2; θ.sub.G), x.sub.1 is warped to the coordinate frame of x.sub.2 by transferring pixel colors according to Equation (1). In practice, Equation (1) may be relaxed with an affinity matrix to make the warping differentiable, producing a warped image x.sub.2,1.sup.w. A warping loss is based on the LPIPS loss).

It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Liu's image generation network, using the GAN discriminator losses, latent consistency, and texture swapping losses, to have optimized the parameters of the image generation network. One of ordinary skill in the art would have been motivated to have synthesized an image by the image generation network to have optimized the discriminator losses.
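The non-saturating GAN objective referenced in Liu [0045] trains a discriminator to separate real from synthesized images and trains the generator against the discriminator's logits. A minimal sketch, where D is a hypothetical (possibly patch-based) discriminator returning logits:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, real, fake):
    # softplus(-x) = -log(sigmoid(x)): real images pushed toward "real",
    # detached fakes pushed toward "fake".
    return (F.softplus(-D(real)).mean() +
            F.softplus(D(fake.detach())).mean())

def generator_loss(D, fake):
    # Non-saturating form: maximize log(sigmoid(D(fake))).
    return F.softplus(-D(fake)).mean()
```

With a patch discriminator, D returns a grid of logits and the means above average over patches as well as over the batch.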
Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over A. P. Dherse, et al., "Scene relighting with illumination estimation in the latent space on an encoder-decoder scheme", DOI:10.48550/arXiv.2006.02333, June 3, 2020, pp. 1-30 (hereinafter Dherse) in view of Yamada et al. US-PGPUB No. 2024/0185391 (hereinafter Yamada), incorporating by reference Sun et al., "Single Image Portrait Relighting", https://dl.acm.org/doi/pdf/10.1145/3306346.3323008, ACM Transactions on Graphics (TOG), Vol. 38, Issue 4, August 2019, pp. 79:1-79:12 (hereinafter Sun); Zhu et al. US-PGPUB No. 2022/0335689 (hereinafter Zhu); and Ranganathan et al. US-PGPUB No. 2023/0215128 (hereinafter Ranganathan).

Re Claim 19: The claim 19 encompasses the same scope of invention as that of the claim 18 except the additional claim limitation that the mapping network comprises one or more fully connected layers with ReLU activation. However, Ranganathan teaches this claim limitation (Ranganathan teaches at Paragraph [0047] that, in an example implementation of modeling system 220, the model type for encoder neural network 710 may be a convolutional neural network for image generation. Encoder neural network 710 may use convolutional layers with a Rectified Linear Unit (ReLU) activation function and a Wasserstein loss function. In this example, the model type for decoder neural network 720 may be an image classification model. Decoder neural network 720 may use convolutional layers and fully connected dense layers with a ReLU activation function and a Wasserstein loss function). Zhu also teaches this claim limitation (Zhu teaches at Paragraph [0005] that a more efficient and practical structure of GANs was proposed by Radford et al. in "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," arXiv:1511.06434, 2015, which introduced deep convolutional generative adversarial networks (DCGAN) to learn a hierarchy of representations from object parts to scenes in both the generator and discriminator. The DCGAN replaces pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator), uses batch normalization in both the generator and the discriminator, removes fully connected hidden layers for deeper architectures, uses the ReLU activation function in the generator for all layers except for the output (which uses the Tanh activation function), and uses the LeakyReLU activation function in the discriminator for all layers. This DCGAN structure has become a standard implementation of GANs in general image representations and image generations). It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have recognized that fully connected dense layers inherently include a ReLU activation function. One of ordinary skill in the art would have been motivated to have implemented the fully connected dense layers with a ReLU activation function.

Re Claim 20: The claim 20 encompasses the same scope of invention as that of the claim 19 except the additional claim limitation of a training component comprising a lighting estimator configured to generate an estimated lighting representation of an output image, wherein the lighting loss is based on the estimated lighting representation. Dherse further teaches this claim limitation (Dherse teaches at Section 3 generating, by the deep learning network, a training image based on the input latent vector of the original image I and the target lighting representation LT. Dherse teaches at Section 3.4 computing the scene latent loss and the light latent loss. Dherse teaches at Page 5, Section 3.1, and FIG. 10 a latent light-scene split and an illumination predictor with light direction and light color temperature prediction auxiliary losses; the scene is illuminated in different directions, and an encoder-decoder predicts the illumination corresponding to the source image and replaces it with the target lighting. Dherse teaches at Section 3.1 that the illumination space is the Cartesian product of the light color temperature set (intensities) and the light direction set, and at Page 10 that the color temperature has been translated into RGB values with the use of the Color-Science Python library, and that light direction is represented as a Gaussian distribution over the value/brightness component of the light color in HSL color space… The other half is used for light color estimation in HSL color space. Dherse teaches at FIGS. 7-9 examples in which relighting is performed repeatedly for an input I and target T).
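Claim 20's lighting estimator can be pictured as a small network that predicts a lighting representation from the output image, with the lighting loss comparing that estimate to the target lighting. The sketch below is a hypothetical stand-in, not Dherse's illumination predictor or the applicant's training component:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightingEstimator(nn.Module):
    """Hypothetical sketch: convolutional layers pooled down to a fixed-size
    estimated lighting representation (dimension chosen arbitrarily)."""
    def __init__(self, lighting_dim=27):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, lighting_dim),
        )

    def forward(self, image):
        return self.net(image)   # estimated lighting representation

estimator = LightingEstimator()
output_img = torch.rand(1, 3, 256, 256)   # relighted output image
target_light = torch.randn(1, 27)         # target lighting representation
lighting_loss = F.mse_loss(estimator(output_img), target_light)
```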
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIN CHENG WANG, whose telephone number is (571) 272-7665. The examiner can normally be reached Mon-Fri, 8:00-5:00.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, King Poon, can be reached at 571-270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JIN CHENG WANG/
Primary Examiner, Art Unit 2617

Prosecution Timeline

Aug 21, 2023
Application Filed
Apr 15, 2025
Non-Final Rejection — §103
May 06, 2025
Interview Requested
May 16, 2025
Applicant Interview (Telephonic)
May 16, 2025
Examiner Interview Summary
Jun 10, 2025
Response Filed
Aug 20, 2025
Final Rejection — §103
Sep 11, 2025
Interview Requested
Sep 24, 2025
Applicant Interview (Telephonic)
Oct 01, 2025
Examiner Interview Summary
Oct 15, 2025
Response after Non-Final Action
Nov 20, 2025
Request for Continued Examination
Dec 01, 2025
Response after Non-Final Action
Feb 11, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12594883
DISPLAY DEVICE FOR DISPLAYING PATHS OF A VEHICLE
2y 5m to grant Granted Apr 07, 2026
Patent 12597086
Tile Region Protection in a Graphics Processing System
2y 5m to grant Granted Apr 07, 2026
Patent 12592012
METHOD, APPARATUS, ELECTRONIC DEVICE AND READABLE MEDIUM FOR COLLAGE MAKING
2y 5m to grant Granted Mar 31, 2026
Patent 12586270
GENERATING AND MODIFYING DIGITAL IMAGES USING A JOINT FEATURE STYLE LATENT SPACE OF A GENERATIVE NEURAL NETWORK
2y 5m to grant Granted Mar 24, 2026
Patent 12579709
IMAGE SPECIAL EFFECT PROCESSING METHOD AND APPARATUS
2y 5m to grant Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
59%
Grant Probability
69%
With Interview (+10.3%)
3y 7m
Median Time to Grant
High
PTA Risk
Based on 832 resolved cases by this examiner. Grant probability derived from career allow rate.
