Prosecution Insights
Last updated: April 19, 2026
Application No. 18/636,910

DEVICE AND METHOD FOR DETERMINING AN ALBEDO AND A SHADING OF AN OBJECT

Status: Non-Final OA (§103)
Filed: Apr 16, 2024
Examiner: PHAM, NHUT HUY
Art Unit: 2674
Tech Center: 2600 — Communications
Assignee: Robert Bosch GmbH
OA Round: 1 (Non-Final)

Grant Probability: 79% (Favorable)
Expected OA Rounds: 1-2
Expected Time to Grant: 3y 0m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 79% (above average; 42 granted / 53 resolved; +17.2% vs TC avg)
Interview Lift: +26.8% for resolved cases with an interview
Typical Timeline: 3y 0m average prosecution; 31 applications currently pending
Career History: 84 total applications across all art units

Statute-Specific Performance

§101: 9.4% (-30.6% vs TC avg)
§103: 62.2% (+22.2% vs TC avg)
§102: 11.9% (-28.1% vs TC avg)
§112: 14.5% (-25.5% vs TC avg)
Based on career data from 53 resolved cases; the Tech Center averages are estimates.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

DETAILED ACTION

The United States Patent & Trademark Office appreciates the application submitted by the inventor/assignee. The Office has reviewed the application and makes the comments below.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 04/16/2024 has been considered and is attached.

Priority

This application claims the benefit of foreign priority under 35 U.S.C. 119(a)-(d) of EP23168780.7, filed in EP on 04/19/2023.

Claim Status

Claims 1-5 and 7-14 are rejected under 35 U.S.C. § 103:
- Claims 1, 2, 5, 7, 11, and 14 are rejected over Wimbauer in view of Sevas.
- Claim 3 is rejected over Wimbauer in view of Sevas, further in view of Barron.
- Claim 4 is rejected over Wimbauer in view of Sevas, further in view of Ranjit.
- Claims 8-10 are rejected over Wimbauer in view of Sevas, further in view of Li.
- Claims 12-13 are rejected over Wang in view of Wimbauer, further in view of Sevas, further in view of Li.
Claims 2 and 6 are objected to.

Claim Objections

Claim 2 is objected to because of the following informality: "the shading s determined" at line 9. The Examiner believes the intended language is "the shading is determined". Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 5, 7, 11, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Wimbauer et al. (Wimbauer, Felix, Shangzhe Wu, and Christian Rupprecht, "De-rendering 3D objects in the wild," IEEE; hereinafter Wimbauer) in view of Sevastopolskiy et al. (US-2022/0157014-A1; hereinafter Sevas).

CLAIM 1

Regarding claim 1, Wimbauer teaches a computer-implemented method, comprising: training a machine learning system (Wimbauer, page 18493, section 3.2: "Our network architecture is composed of several sub networks, that predict the different shape, material, and lighting properties of an input image … we propose a training scheme, with two additional objectives that regularize the learning problem and prevent degenerate solutions"; see FIG. 2; Wimbauer teaches a method to train a machine learning model), wherein the machine learning system is configured for determining an albedo and a shading of an object (Wimbauer, abstract: "We present a weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters"), the training method including the following steps:

obtaining a plurality of measurements (Wimbauer, page 18494, section 4.1, Datasets and Metrics), wherein each measurement from the plurality of measurements characterizes a measurement of a spatial location of a point located on an object and a measurement of a color of the object at the point (Wimbauer, page 18494, section 4.1: "Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds." The Examiner notes Wimbauer uses RGB images (MS-COCO) that come with corresponding depth information (point clouds), which corresponds to the "plurality of measurements");

determining, by the machine learning system (Wimbauer, page 18494, left col., Learning to De-render; see reconstructed text below), a direction of light shining on the object by using the plurality of measurements as input (Wimbauer, page 18492, right col., Light & Material: "we model the light as a single directional light source and a global ambient light, both emitting perfectly white light. It is parameterized by ambient and directional strength samb, sdir ∈ [0,1], and a light direction l ∈ SO(3)");

[Image: media_image1.png]

determining surface normal vectors at the measurements of spatial locations (Wimbauer, page 18494, section 4.1: "Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds … we use the Point Cloud Library [42] to compute surface normals from the point clouds");

determining, by the machine learning system, a shading of the object based on the determined surface normal vectors and the determined direction of the light (Wimbauer, page 18492, section 3.1, subsections Shape and Light & Material: surface normals and light direction are used to compute the shading map; see modified FIG. 2 below);

[Image: media_image2.png]

determining, by the machine learning system, an albedo by using the plurality of measurements as input (Wimbauer, page 18494, left col., Learning to De-render: "The albedo network Φalbedo predicts the albedo map A, Au ∈ [0,1]"; see FIG. 2, albedo A);

[Image: media_image3.png]

determining a reconstruction of colors of the plurality of measurements based on the determined shading and the determined albedo (Wimbauer, pages 18492-18493, equation (2); see the reconstruction Î image in modified FIG. 2); and

training the machine learning system based on a first loss function (Wimbauer, page 18494, fourth paragraph: "we apply a reconstruction loss between the rendered and the input image to train our model to capture all local details in the decomposition. Specifically, this loss term is computed from the combination of a per-pixel L1 loss and the patch-based structural similarity score SSIM(I, Î)").

Wimbauer does not explicitly disclose that the first loss function includes a term characterizing a difference between the colors of the plurality of measurements and the reconstruction of the colors of the plurality of measurements.

Sevas is in the same field of art of intrinsic decomposition of point cloud data. Further, Sevas teaches that the first loss function includes a term characterizing a difference between the colors of the plurality of measurements and the reconstruction of the colors of the plurality of measurements (Sevas, ¶¶ [0038]-[0039] and [0072]: "The albedo color matching loss may be calculated as a mismatch between the median reference texture (from S115) and the predicted albedo (from S105) resampled in the texture space by the bilinear interpolation"; see reconstructed text below).

[Image: media_image4.png]

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wimbauer by incorporating the color matching loss between the predicted albedo and ground truth taught by Sevas, to make a machine learning model that predicts albedo from image data with the correct gamma value; one of ordinary skill in the art would be motivated to combine the references since there is a recognized need to obtain the correct gamma value for the predicted albedo (Sevas, ¶ [0072]: "To select the gamma for albedo, the albedo color matching loss may be introduced"). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 2

[Image: media_image1.png]

Regarding claim 2, the combination of Wimbauer and Sevas teaches the method of claim 1. In addition, the combination of Wimbauer and Sevas teaches that the albedo is determined by providing the plurality of measurements as input to a first part of the machine learning system and providing an output of the first part as the albedo (Wimbauer, page 18494, left col., Learning to De-render; see reconstructed text above, green highlight), and/or (The Examiner notes that since a listing with "or" is disjunctive, any one of the elements found in the prior art is sufficient to reject the claim. While citations have been provided for completeness and rapid prosecution, only one element is required.)
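Stepping back from the limitation-by-limitation mapping above: the claim 1 pipeline the rejection reads onto Wimbauer and Sevas is shading from surface normals plus a light direction, reconstruction as albedo times shading, and training against an L1 + SSIM reconstruction loss. A minimal NumPy sketch of that flow (illustrative only; the array shapes, the simplified global rather than patch-based SSIM term, and all function names are assumptions, not the actual Wimbauer or Sevas implementations):

```python
import numpy as np

def lambertian_shading(normals, light_dir, s_amb=0.3, s_dir=0.7):
    """Per-point shading from unit surface normals (N, 3) and a global light
    direction, with ambient + directional strengths as in Wimbauer's light model."""
    l = light_dir / np.linalg.norm(light_dir)
    return s_amb + s_dir * np.clip(normals @ l, 0.0, None)

def reconstruct_colors(albedo, shading):
    """Reconstructed color = albedo * shading, per point and per channel."""
    return albedo * shading[:, None]

def first_loss(colors, recon):
    """Per-point L1 term plus a crude global SSIM-style term (a stand-in for
    the patch-based SSIM in the cited reconstruction loss)."""
    l1 = np.abs(colors - recon).mean()
    mu_x, mu_y = colors.mean(), recon.mean()
    cov = ((colors - mu_x) * (recon - mu_y)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (colors.var() + recon.var() + c2))
    return l1 + (1.0 - ssim)
```

With normals facing the light, the shading approaches s_amb + s_dir, and a perfect reconstruction drives the loss to zero.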
the direction of light is determined by providing the plurality of measurements to a second part of the machine learning system and providing an output of the second part as the direction of the light (Wimbauer, page 18494, left col., Learning to De-render; see reconstructed text above, red highlight), and/or the shading is determined by providing the determined surface normal vectors and the determined direction of the light to a trainable shader (Wimbauer, page 18494, left col.: "We train our model using complementary losses on the decomposition and on the rendered image... The loss is computed using the (precomputed) coarse shape, albedo, and light information as pseudo supervision." The Examiner notes that a trainable/learnable model that predicts the shading map from a loss function corresponds to the "trainable shader") and providing an output of the trainable shader as the shading (Wimbauer, page 18492, section 3.1, subsections Shape and Light & Material: surface normals and light direction are used to compute the shading map; see modified FIG. 2 below).

[Image: media_image2.png]

CLAIM 5

Regarding claim 5, the combination of Wimbauer and Sevas teaches the method of claim 2. In addition, the combination of Wimbauer and Sevas teaches that the second part and/or the trainable shader are additionally trained based on a second loss function, wherein the second loss function includes: a term characterizing a difference between the determined light direction and a desired light direction (Wimbauer, page 18494, left col., third paragraph; see reconstructed text below. A loss function regarding the difference between precomputed light information and predicted light information is computed, and the light information includes the direction of the light), and/or a term characterizing a difference between the determined shading and a desired shading.

[Image: media_image5.png]

CLAIM 7

Regarding claim 7, the combination of Wimbauer and Sevas teaches the method of claim 1. In addition, the combination of Wimbauer and Sevas teaches determining an albedo and a shading of a first object using the trained machine learning system (Wimbauer, page 18496, Figure 4, Qualitative results: "Every row contains the input image Iin, predicted albedo A and normals N, diffuse shading map Idiff, specular shading map Ispec and reconstructed image Î". Wimbauer teaches using the trained model with multiple datasets; results are shown in FIG. 4).

CLAIM 11

Regarding claim 11, Wimbauer teaches training a machine learning system (Wimbauer, page 18493, section 3.2: "Our network architecture is composed of several sub networks, that predict the different shape, material, and lighting properties of an input image … we propose a training scheme, with two additional objectives that regularize the learning problem and prevent degenerate solutions"; see FIG. 2; Wimbauer teaches training a machine learning model), wherein the machine learning system is configured for determining an albedo and a shading of an object (Wimbauer, abstract: "We present a weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters"), the training method including the following steps:

obtaining a plurality of measurements (Wimbauer, page 18494, section 4.1, Datasets and Metrics), wherein each measurement from the plurality of measurements characterizes a measurement of a spatial location of a point located on an object and a measurement of a color of the object at the point (Wimbauer, page 18494, section 4.1: "Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds." The Examiner notes Wimbauer uses RGB images (MS-COCO) with corresponding depth information (point clouds), which corresponds to the "plurality of measurements");

determining, by the machine learning system (Wimbauer, page 18494, left col., Learning to De-render; see reconstructed text below), a direction of light shining on the object by using the plurality of measurements as input (Wimbauer, page 18492, right col., Light & Material: "we model the light as a single directional light source and a global ambient light, both emitting perfectly white light. It is parameterized by ambient and directional strength samb, sdir ∈ [0,1], and a light direction l ∈ SO(3)");

[Image: media_image1.png]

determining surface normal vectors at the measurements of spatial locations (Wimbauer, page 18494, section 4.1: "Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds … we use the Point Cloud Library [42] to compute surface normals from the point clouds");

[Image: media_image2.png]

determining, by the machine learning system, a shading of the object based on the determined surface normal vectors and the determined direction of the light (Wimbauer, page 18492, section 3.1, subsections Shape and Light & Material: surface normals and light direction are used to compute the shading map; see modified FIG. 2);

determining, by the machine learning system, an albedo by using the plurality of measurements as input (Wimbauer, page 18494, left col., Learning to De-render: "The albedo network Φalbedo predicts the albedo map A, Au ∈ [0,1]"; see FIG. 2, albedo A);

[Image: media_image3.png]

determining a reconstruction of colors of the plurality of measurements based on the determined shading and the determined albedo (Wimbauer, pages 18492-18493, equation (2); see the reconstruction Î image in modified FIG. 2); and

training the machine learning system based on a first loss function (Wimbauer, page 18494, fourth paragraph: "we apply a reconstruction loss between the rendered and the input image to train our model to capture all local details in the decomposition. Specifically, this loss term is computed from the combination of a per-pixel L1 loss and the patch-based structural similarity score SSIM(I, Î)").

Wimbauer does not explicitly disclose a training system configured to train a machine learning system, or that the first loss function includes a term characterizing a difference between the colors of the plurality of measurements and the reconstruction of the colors of the plurality of measurements.

Sevas is in the same field of art of intrinsic decomposition of point cloud data. Further, Sevas teaches a training system configured to train a machine learning system (Sevas, ¶¶ [0076]-[0077]: "a computing device … Memory 50.3 is configured to store processor-executable instructions instructing the computing device 50 to perform any step or substep of the disclosed method, as well as weights of the deep neural network, latent descriptors, and auxiliary parameters obtained during the training stage"), and that the first loss function includes a term characterizing a difference between the colors of the plurality of measurements and the reconstruction of the colors of the plurality of measurements.
(Sevas, ¶¶ [0038]-[0039] and [0072]: "The albedo color matching loss may be calculated as a mismatch between the median reference texture (from S115) and the predicted albedo (from S105) resampled in the texture space by the bilinear interpolation"; see reconstructed text below).

[Image: media_image4.png]

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wimbauer by incorporating the color matching loss between the predicted albedo and ground truth taught by Sevas, to make a machine learning model that predicts albedo from image data with the correct gamma value; one of ordinary skill in the art would be motivated to combine the references since there is a recognized need to obtain the correct gamma value for the predicted albedo (Sevas, ¶ [0072]: "To select the gamma for albedo, the albedo color matching loss may be introduced"). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 14

Regarding claim 14, Wimbauer teaches training a machine learning system (Wimbauer, page 18493, section 3.2: "Our network architecture is composed of several sub networks, that predict the different shape, material, and lighting properties of an input image … we propose a training scheme, with two additional objectives that regularize the learning problem and prevent degenerate solutions"; see FIG. 2; Wimbauer teaches training a machine learning model), wherein the machine learning system is configured for determining an albedo and a shading of an object (Wimbauer, abstract: "We present a weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters"), the training method including the following steps:

obtaining a plurality of measurements (Wimbauer, page 18494, section 4.1, Datasets and Metrics), wherein each measurement from the plurality of measurements characterizes a measurement of a spatial location of a point located on an object and a measurement of a color of the object at the point (Wimbauer, page 18494, section 4.1: "Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds." The Examiner notes Wimbauer uses RGB images (MS-COCO) with corresponding depth information (point clouds), which corresponds to the "plurality of measurements");

determining, by the machine learning system (Wimbauer, page 18494, left col., Learning to De-render; see reconstructed text below), a direction of light shining on the object by using the plurality of measurements as input (Wimbauer, page 18492, right col., Light & Material: "we model the light as a single directional light source and a global ambient light, both emitting perfectly white light. It is parameterized by ambient and directional strength samb, sdir ∈ [0,1], and a light direction l ∈ SO(3)");

[Image: media_image1.png]

determining surface normal vectors at the measurements of spatial locations (Wimbauer, page 18494, section 4.1: "Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds … we use the Point Cloud Library [42] to compute surface normals from the point clouds");

[Image: media_image2.png]

determining, by the machine learning system, a shading of the object based on the determined surface normal vectors and the determined direction of the light (Wimbauer, page 18492, section 3.1, subsections Shape and Light & Material: surface normals and light direction are used to compute the shading map; see modified FIG. 2);

determining, by the machine learning system, an albedo by using the plurality of measurements as input (Wimbauer, page 18494, left col., Learning to De-render: "The albedo network Φalbedo predicts the albedo map A, Au ∈ [0,1]"; see FIG. 2, albedo A);

determining a reconstruction of colors of the plurality of measurements based on the determined shading and the determined albedo (Wimbauer, pages 18492-18493, equation (2); see the reconstruction Î image in modified FIG. 2); and

[Image: media_image3.png]

training the machine learning system based on a first loss function (Wimbauer, page 18494, fourth paragraph: "we apply a reconstruction loss between the rendered and the input image to train our model to capture all local details in the decomposition. Specifically, this loss term is computed from the combination of a per-pixel L1 loss and the patch-based structural similarity score SSIM(I, Î)").

Wimbauer does not explicitly disclose a non-transitory machine-readable storage medium on which is stored a computer program for training a machine learning system, or that the first loss function includes a term characterizing a difference between the colors of the plurality of measurements and the reconstruction of the colors of the plurality of measurements.

Sevas is in the same field of art of intrinsic decomposition of point cloud data. Further, Sevas teaches a non-transitory machine-readable storage medium on which is stored a computer program for training a machine learning system (Sevas, ¶¶ [0076]-[0077]: "Memory 50.3 is configured to store processor-executable instructions instructing the computing device 50 to perform any step or substep of the disclosed method, as well as weights of the deep neural network, latent descriptors, and auxiliary parameters obtained during the training stage"), and that the first loss function includes a term characterizing a difference between the colors of the plurality of measurements and the reconstruction of the colors of the plurality of measurements (Sevas, ¶¶ [0038]-[0039] and [0072]: "The albedo color matching loss may be calculated as a mismatch between the median reference texture (from S115) and the predicted albedo (from S105) resampled in the texture space by the bilinear interpolation"; see reconstructed text below).

[Image: media_image4.png]

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wimbauer by incorporating the color matching loss between the predicted albedo and ground truth taught by Sevas, to make a machine learning model that predicts albedo from image data with the correct gamma value; one of ordinary skill in the art would be motivated to combine the references since there is a recognized need to obtain the correct gamma value for the predicted albedo (Sevas, ¶ [0072]: "To select the gamma for albedo, the albedo color matching loss may be introduced"). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 3

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Wimbauer in view of Sevas, and further in view of Barron et al. (Barron, Jonathan T., and Jitendra Malik, "Shape, illumination, and reflectance from shading," IEEE, published 2014; hereinafter Barron).

Regarding claim 3, the combination of Wimbauer and Sevas teaches the method of claim 2. The combination of Wimbauer and Sevas does not explicitly disclose that the training of the machine learning system based on the first loss function is achieved by updating parameters of the first part and/or the second part and/or the trainable shader according to a negative gradient of a loss value determined from the first loss function with respect to the parameters.

Barron is in the same field of art of intrinsic image decomposition.
Further, Barron teaches that the training of the machine learning system based on the first loss function is achieved by updating parameters of the first part and/or the second part and/or the trainable shader according to a negative gradient of a loss value determined from the first loss function with respect to the parameters (Barron, page 1673, section 4.1; see reconstructed text below. The Examiner notes the gradient of the negative log-likelihood loss corresponds to "a negative gradient of a loss value").

[Image: media_image6.png]

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wimbauer and Sevas by incorporating the Gaussian mixture model taught by Barron, to make a machine learning model that predicts albedo with piecewise constancy; one of ordinary skill in the art would be motivated to combine the references since there is a recognized need to maintain constancy between neighboring pixels in the predicted albedo/reflectance image (Barron, page 1673, section 4.1: "The reflectance images of natural objects tend to be piecewise constant—or equivalently, variation in reflectance images tends to be small and sparse"). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 4

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Wimbauer in view of Sevas, and further in view of Ranjit et al. (Ranjit, S. Sharan, and Raj K. Jaiswal, "Intrinsic decomposition with deep supervision from a single image," Journal of King Saud University-Computer and Information Sciences 34.10, published 2022; hereinafter Ranjit).

Regarding claim 4, the combination of Wimbauer and Sevas teaches the method of claim 1.
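As an illustration of the update recited in claim 3 above, adjusting parameters along the negative gradient of a loss value is plain gradient descent. A generic sketch (the toy quadratic loss, step size, and function names are hypothetical, not taken from Barron's formulation):

```python
def numeric_grad(loss, params, eps=1e-6):
    """Central-difference gradient of a scalar loss w.r.t. each parameter."""
    grads = []
    for i in range(len(params)):
        hi, lo = list(params), list(params)
        hi[i] += eps
        lo[i] -= eps
        grads.append((loss(hi) - loss(lo)) / (2 * eps))
    return grads

def gradient_step(params, grads, lr=0.1):
    """Update each parameter along the negative gradient of the loss."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy quadratic loss with its minimum at (1, -2); repeated negative-gradient
# steps drive the parameters toward that minimum.
toy_loss = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
params = [0.0, 0.0]
for _ in range(200):
    params = gradient_step(params, numeric_grad(toy_loss, params))
```

In the claimed arrangement, the loss would be the first loss function and the parameters those of the first part, the second part, and/or the trainable shader; the update rule is the same.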
In addition, the combination of Wimbauer and Sevas teaches that the first loss function further includes a term that characterizes a cross-correlation loss between the determined albedo and the desired albedo (Sevas, ¶¶ [0038]-[0039] and [0072]: "The albedo color matching loss may be calculated as a mismatch between the median reference texture (from S115) and the predicted albedo (from S105) resampled in the texture space by the bilinear interpolation"; see reconstructed text below).

[Image: media_image4.png]

The combination of Wimbauer and Sevas does not explicitly disclose that the first loss function further includes a term that characterizes a difference of gradients of the determined albedo and gradients of a desired albedo.

Ranjit is in the same field of art of intrinsic decomposition. Further, Ranjit teaches that the first loss function further includes a term that characterizes a difference of gradients of the determined albedo and gradients of a desired albedo (Ranjit, page 8650, section 5.2; see reconstructed text below).

[Image: media_image7.png]

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wimbauer and Sevas by incorporating the gradient loss taught by Ranjit, to make a machine learning model that predicts albedo from image data with sharper details; one of ordinary skill in the art would be motivated to combine the references since there is a recognized need to preserve details when predicting albedo (Ranjit, page 8650, section 5.2: "To obtain sharper edges in albedo map, L1 loss is defined over gradient(G) of the image"). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Wimbauer in view of Sevas, and further in view of Li et al. (Li, Maohui, Zhuoqun Fang, and Senxiang Lu, "An accurate object detector with effective feature extraction by intrinsic prior knowledge," IEEE, published 2020; hereinafter Li).

CLAIM 8

Regarding claim 8, the combination of Wimbauer and Sevas teaches the method of claim 1. In addition, the combination of Wimbauer and Sevas teaches:

obtaining a plurality of first measurements (Wimbauer, page 18494, section 4.1, Datasets and Metrics), wherein each measurement from the plurality of first measurements characterizes a first measurement of a spatial location of a point located on a first object and a first measurement of a color of the first object at the point (Wimbauer, page 18494, section 4.1: "Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds.");

determining a first albedo using the trained machine learning system (Wimbauer, page 18494, left col., Learning to De-render: "The albedo network Φalbedo predicts the albedo map A, Au ∈ [0,1]"; see FIG. 2, albedo A);

determining first surface normal vectors at the first measurements of spatial locations (Wimbauer, page 18494, section 4.1: "Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds … we use the Point Cloud Library [42] to compute surface normals from the point clouds");

selecting a desired lighting direction; and determining, by the trained machine learning system, a first shading based on the determined first surface normal vectors (Wimbauer, page 18492, section 3.1, subsections Shape and Light & Material.
Surface normals and light direction are used to compute the shading map) and the desired direction of the light (Wimbauer, page 18494, left col., last paragraph: "we render two images in each forward pass: one with the predicted lighting conditions, denoted as Î, which is also used in the reconstruction loss term, and one with randomly sampled lighting conditions, denoted as Î'." Wimbauer teaches randomly selecting a lighting condition); and determining an image based on the determined first albedo and the determined first shading (Wimbauer, FIG. 2, see the "Rand. relit Î'" image).

The combination of Wimbauer and Sevas does not explicitly disclose creating a training dataset including images for training an image classifier, including adding the image to the training dataset.

Li is in the same field of art of intrinsic image decomposition. Further, Li teaches creating a training dataset including images for training an image classifier (Li, page 130609, right col., CoReSh NETWORK: "The prediction of our network includes the classification of detected objects and the coordinates of bounding boxes."), including adding the image to the training dataset (Li, page 130609, right col., CoReSh NETWORK: "Firstly, we use the autoencoder network to decompose an original image into the corresponding intrinsic image. After that, in order to eliminate the difference of data distribution between the original image and the intrinsic image, it is necessary to preprocess the data: adjust it into a matrix with a suitable structure and then input to our network…"; see FIG. 2. Li teaches generating a new training image for a detector network using intrinsic image decomposition).

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wimbauer and Sevas by incorporating the system for generating training images taught by Li, to make a machine learning model that can decompose images into intrinsic components and generate training data for a detector network based on those components; one of ordinary skill in the art would be motivated to combine the references since there is a recognized need to introduce intrinsic priors into image classification to improve performance (Li, page 130614, section V: "we believe that introducing intrinsic prior knowledge to image classification and semantic segmentation tasks will also improve performance"). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 9

Regarding claim 9, the combination of Wimbauer and Sevas teaches the method of claim 1. In addition, the combination of Wimbauer and Sevas teaches obtaining a training image and spatial locations for pixels of the training image (Wimbauer, page 18494, section 4.1: "Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds."); and determining a first albedo by providing the pixels and the corresponding spatial locations as input to the trained machine learning system (Wimbauer, page 18494, left col., Learning to De-render: "The albedo network Φalbedo predicts the albedo map A, Au ∈ [0,1]"; see FIG. 2, albedo A).

The combination of Wimbauer and Sevas does not explicitly disclose training an image classifier, including training the image classifier using the first albedo as input to the image classifier.

Li is in the same field of art of intrinsic image decomposition. Further, Li teaches training an image classifier (Li, page 130609, right col., CoReSh NETWORK: "Firstly, we use the autoencoder network to decompose an original image into the corresponding intrinsic image. After that, in order to eliminate the difference of data distribution between the original image and the intrinsic image, it is necessary to preprocess the data: adjust it into a matrix with a suitable structure and then input to our network … The prediction of our network includes the classification of detected objects and the coordinates of bounding boxes."; see FIG. 2), including training the image classifier using the first albedo as input to the image classifier (Li, page 130611, section F: "the input of CoReSh network consists of three images (color image, reflectance image and shading image)").

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wimbauer and Sevas by incorporating the training-image generation taught by Li; one of ordinary skill in the art would be motivated to combine intrinsic decomposition with image classification to improve classification performance (Li, page 130614, section V: "we believe that introducing intrinsic prior knowledge to image classification and semantic segmentation tasks will also improve performance").
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention. CLAIM 10 Regarding Claim 10, the combination of Wimbauer, Sevas and Li teaches the method of Claim 9. In addition, the combination of Wimbauer, Sevas and Li teaches classifying an image (Li, page 130613, section IV: “The experiment chose Pascal VOC as the data set … Table 3 lists the detection precision of 20 categories using our network and the ordinary deep learning-based detection network.” Li discloses using his detector network on the Pascal VOC dataset, results are summarized in table 3) including: obtaining an image and spatial locations for pixels of the image (Wimbauer, page 18494, section 4.1: “Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds.”); determining a second albedo by providing the pixels and the corresponding special locations as input to the trained machine learning system (Wimbauer, page 18494, left col, Learning to De-render: “The albedo network Φalbedo predicts the albedo map A, Au ∈ [0,1]”; See fig. 2, albedo A); and classifying the image by using the determined second albedo as input to the trained image classifier. (Li, page 130611-130612, section F: “The detection part of our network uses end-to-end structure. As shown in Fig. 4, the input of CoReSh network consists of three images (color image, reflectance image and shading image) … After the feature map is operated by non-maximum suppression, the position of the bounding box and the category of the target can be obtained by simple mathematical transformation on the vector if they contain the detected target”, see FIG. 2.) Claim(s) 12-13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. 
(US20240020897A1, filed 2022, hereinafter Wang) in view of Wimbauer, in view of Sevas, and further in view of Li.

CLAIM 12

Regarding Claim 12, Wang teaches a control system (Wang, ¶ [0154-0155]: “system(s) on chip(s) (“SoC(s)”) … vehicle may include any number of SoCS. In at least one embodiment, each of SoCS may include, without limitation, central processing units (“CPU(s)”), graphics processing units (“GPU(s)”), processor(s), cache(s), accelerator(s), data store(s), and/or other components and features not illustrated. In at least one embodiment, SoC(s) may be used to control vehicle in a variety of platforms and systems”); determine a control signal based on the classification of the image (Wang, ¶ [0193]: “a CNN for facial recognition and vehicle owner identification may use data from camera sensors to identify presence of an authorized driver and/or owner of vehicle … to unlock a vehicle when an owner approaches a driver door and turns on lights, and, in a security mode, to disable such vehicle when an owner leaves such vehicle... In this way, SoC(s) provide for security against theft and/or carjacking”. Wang teaches using a machine learning model to perform image classification and controlling the vehicle based on classification results), wherein the control signal is configured to control an actuator (Wang, ¶ [0134-0135]: “…system on chips (“SoCS”) … provide signals (e.g., representative of commands) to one or more components and/or systems of vehicle. For instance, in at least one embodiment, controller(s) may send signals to operate vehicle brakes via brake actuator(s), to operate steering system via steering actuator(s)”) and/or a display (Wang, ¶ [0137]: “provide outputs (e.g., represented by output data, display data, etc.) via a human-machine interface (“HMI”) display”).

Wang does not explicitly disclose the machine learning system is trained by: obtaining a plurality of measurements, wherein each measurement from the plurality of measurements characterizes a measurement of a spatial location of a point located on an object and a measurement of a color of the object at the point; determining, by the machine learning system, a direction of light shining on the object by using the plurality of measurements as input; determining surface normal vectors at the measurements of spatial locations; determining, by the machine learning system, a shading of the object based on the determined surface normal vectors and the determined direction of the light; determining, by the machine learning system, an albedo by using the plurality of measurements as input; determining a reconstruction of colors of the plurality of measurements based on the determined shading and the determined albedo; and training the machine learning system based on a first loss function, wherein the first loss function includes a term characterizing a difference between the colors of the plurality of measurements and the reconstruction of the colors of the plurality of measurements.

Wimbauer is in the same field of art of intrinsic image decomposition. Further, Wimbauer teaches a machine learning system is trained (Wimbauer, page 18493, section 3.2: “Our network architecture is composed of several sub networks, that predict the different shape, material, and lighting properties of an input image … we propose a training scheme, with two additional objectives that regularize the learning problem and prevent degenerate solutions”, see FIG. 2; Wimbauer teaches a method to train a machine learning model) by:

obtaining a plurality of measurements (Wimbauer, page 18494, section 4.1, Datasets and Metrics), wherein each measurement from the plurality of measurements characterizes a measurement of a spatial location of a point located on an object and a measurement of a color of the object at the point (Wimbauer, page 18494, section 4.1: “Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds.” The Examiner notes Wimbauer uses RGB images (MS-COCO) that come with corresponding depth information (point clouds));

determining, by the machine learning system (Wimbauer, page 18494, left col, Learning to De-render; see reconstructed text [image omitted]), a direction of light shining on the object by using the plurality of measurements as input (Wimbauer, page 18492, right col, Light & Material: “we model the light as a single directional light source and a global ambient light, both emitting perfectly white light. It is parameterized by ambient and directional strength samb, sdir ∈ [0,1], and a light direction l ∈ SO(3)”);

determining surface normal vectors at the measurements of spatial locations (Wimbauer, page 18494, section 4.1: “Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds … we use the Point Cloud Library [42] to compute surface normals from the point clouds” [image omitted]);

determining, by the machine learning system, a shading of the object based on the determined surface normal vectors and the determined direction of the light (Wimbauer, page 18492, section 3.1, subsections Shape and Light & Material: surface normals and light direction are used to compute the shading map; see modified FIG. 2 [image omitted]);

determining, by the machine learning system, an albedo by using the plurality of measurements as input (Wimbauer, page 18494, left col, Learning to De-render: “The albedo network Φalbedo predicts the albedo map A, Au ∈ [0,1]”; see FIG. 2, albedo A);

determining a reconstruction of colors of the plurality of measurements based on the determined shading and the determined albedo (Wimbauer, pages 18492-18493, equation (2); see the Reconstruction Î image in modified FIG. 2); and

training the machine learning system based on a first loss function (Wimbauer, page 18494, fourth paragraph: “we apply a reconstruction loss between the rendered and the input image to train our model to capture all local details in the decomposition. Specifically, this loss term is computed from the combination of a per-pixel L1 loss and the patch-based structural similarity score SSIM(I, Î)”).

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by incorporating the training system that is taught by Wimbauer, to make a system that trains a machine learning model to perform intrinsic image decomposition. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes a need to improve the performance of the image decomposition task (Wimbauer, page 18491, right col, first paragraph: “We show that our model produces accurate and convincing image decompositions that improve the state of the art and even generalizes beyond the categories of objects it was trained on”).

The combination of Wang and Wimbauer does not explicitly disclose the first loss function includes a term characterizing a difference between the colors of the plurality of measurements and the reconstruction of the colors of the plurality of measurements.

Sevas is in the same field of art of intrinsic decomposition of point cloud data. Further, Sevas teaches the first loss function includes a term characterizing a difference between the colors of the plurality of measurements and the reconstruction of the colors of the plurality of measurements (Sevas, ¶ [0038-0039 and 0072]: “The albedo color matching loss may be calculated as a mismatch between the median reference texture (from S115) and the predicted albedo (from S105) resampled in the texture space by the bilinear interpolation”; see reconstructed text [image omitted]).

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang and Wimbauer by incorporating the color matching loss between the predicted albedo and the ground truth that is taught by Sevas, to make a machine learning model that predicts albedo from image data with the correct gamma value. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes a need to obtain the correct gamma value for the predicted albedo (Sevas, ¶ [0072]: “To select the gamma for albedo, the albedo color matching loss may be introduced”).
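[Editor's illustration, not part of the Office Action.] The training scheme the rejection maps onto the claims — shading computed from surface normals and a light direction, the image recomposed as albedo modulated by shading, and a reconstruction loss combining per-pixel L1 with a structural-similarity term — can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: Lambertian shading, a whole-image SSIM rather than Wimbauer's patch-based score, and an assumed loss weighting `w_ssim`; all function names are illustrative and none of this code comes from the cited references.

```python
import numpy as np

def shading(normals, light_dir, s_amb, s_dir):
    # Lambertian shading from per-pixel surface normals (H, W, 3) and a
    # single directional light plus a global ambient term (assumption).
    lambert = np.clip(normals @ light_dir, 0.0, None)   # (H, W)
    return s_amb + s_dir * lambert

def reconstruct(albedo, shade):
    # Recompose the image as albedo modulated by shading.
    return albedo * shade[..., None]                    # (H, W, 3)

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    # Whole-image SSIM (no sliding window) -- a simplification.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def reconstruction_loss(img, albedo, shade, w_ssim=0.5):
    # Per-pixel L1 plus a structural-similarity penalty; the relative
    # weighting w_ssim is an assumption, not a value from the paper.
    recon = reconstruct(albedo, shade)
    l1 = np.abs(img - recon).mean()
    return l1 + w_ssim * (1.0 - ssim_global(img, recon))
```

When the predicted albedo and shading exactly recompose the input image, the L1 term is zero and the SSIM term reaches its maximum of 1, so the loss vanishes; training drives the decomposition toward that fixed point.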
The combination of Wang, Wimbauer and Sevas teaches classifying an image (Wang, ¶ [0174]: “a CNN for facial recognition and vehicle owner identification may use data from camera sensors to identify presence of an authorized driver and/or owner of vehicle … to unlock a vehicle when an owner approaches a driver door and turns on lights, and, in a security mode, to disable such vehicle when an owner leaves such vehicle...”).

The combination of Wang, Wimbauer and Sevas does not explicitly disclose classifying the image by using the determined second albedo as input to the trained image classifier.

Li is in the same field of art of intrinsic image decomposition and image classification. Further, Li teaches classifying the image by using the determined second albedo as input to the trained image classifier (Li, pages 130611-130612, section F: “The detection part of our network uses end-to-end structure. As shown in Fig. 4, the input of CoReSh network consists of three images (color image, reflectance image and shading image) … After the feature map is operated by non-maximum suppression, the position of the bounding box and the category of the target can be obtained by simple mathematical transformation on the vector if they contain the detected target”, see FIG. 2).

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang, Wimbauer and Sevas by incorporating the image classification system that is taught by Li, to make a machine learning model that can decompose images into intrinsic components and use said components for image classification. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes a need to combine intrinsic decomposition with image classification to improve the performance of the classification task (Li, page 130614, section V: “we believe that introducing intrinsic prior knowledge to image classification and semantic segmentation tasks will also improve performance”).

The combination of Wang, Wimbauer, Sevas and Li then teaches classifying an image including: obtaining an image and spatial locations for pixels of the image (Wimbauer, page 18494, section 4.1: “Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds.”); determining a second albedo by providing the pixels and the corresponding spatial locations as input to the trained machine learning system (Wimbauer, page 18494, left col, Learning to De-render: “The albedo network Φalbedo predicts the albedo map A, Au ∈ [0,1]”; see FIG. 2, albedo A); and classifying the image by using the determined second albedo as input to the trained image classifier (Li, pages 130611-130612, section F: “The detection part of our network uses end-to-end structure. As shown in Fig. 4, the input of CoReSh network consists of three images (color image, reflectance image and shading image) … After the feature map is operated by non-maximum suppression, the position of the bounding box and the category of the target can be obtained by simple mathematical transformation on the vector if they contain the detected target”, see FIG. 2). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 13

Regarding Claim 13, the combination of Wang, Wimbauer, Sevas and Li teaches the system of Claim 12. In addition, the combination of Wang, Wimbauer, Sevas and Li teaches the image classifier is trained (Li, page 130609, right col, CoReSh NETWORK: “Firstly, we use the autoencoder network to decompose an original image into the corresponding intrinsic image. After that, in order to eliminate the difference of data distribution between the original image and the intrinsic image, it is necessary to preprocess the data: adjust it into a matrix with a suitable structure and then input to our network … The prediction of our network includes the classification of detected objects and the coordinates of bounding boxes.”, see FIG. 2) by: obtaining a training image and spatial locations for pixels of the training image (Wimbauer, page 18494, section 4.1: “Co3D [40] is a collection of nearly 19,000 videos capturing objects from 50 MS-COCO [30] categories, that come with per-frame depth, camera pose data, and reconstructed sparse point clouds.”); determining a first albedo by providing the pixels and the corresponding spatial locations as input to the trained machine learning system (Wimbauer, page 18494, left col, Learning to De-render: “The albedo network Φalbedo predicts the albedo map A, Au ∈ [0,1]”; see FIG. 2, albedo A); and training the image classifier using the first albedo as input to the image classifier (Li, page 130611, section F: “the input of CoReSh network consists of three images (color image, reflectance image and shading image)”).

Allowable Subject Matter

Claim 6 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The closest prior art references for Claim 6 are:

Wimbauer et al. (Wimbauer, Felix, Shangzhe Wu, and Christian Rupprecht. "De-rendering 3d objects in the wild." IEEE), which teaches a system to train a machine learning model to perform intrinsic image decomposition. The system capitalizes on the coarse 3D shape reconstructions obtained from unsupervised methods and learns to predict a refined shape as well as to further decompose the material into albedo and specular components.

Sevastopolskiy et al. (US-20220157014-A1), which teaches a method for rendering a relighted 3D portrait of a person, the method including: receiving an input defining a camera viewpoint and lighting conditions; rasterizing latent descriptors of a 3D point cloud at different resolutions based on the camera viewpoint to obtain rasterized images; processing the rasterized images with a deep neural network to predict albedo, normals, environmental shadow maps, and a segmentation mask for the received camera viewpoint; and fusing the predicted albedo, normals, environmental shadow maps, and segmentation mask into a relighted 3D portrait based on the lighting conditions.

Both Wimbauer and Sevastopolskiy teach training a machine learning model to perform intrinsic image decomposition.
Neither Wimbauer nor Sevastopolskiy, nor their combination, teaches: “the second part and the trainable shader are trained based on the second loss function in a first stage and the first part is then trained based on the first loss function in a subsequent second stage.”

Pertinent Arts

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Han et al. (Han, Guangyun, et al. "Learning an intrinsic image decomposer using synthesized RGB-D dataset." IEEE, published 2018) is directed to using a physically based renderer to generate color images and their underlying ground-truth albedo and shading from three-dimensional models. The model supports both RGB and RGB-D input and employs both high-level and low-level features to avoid blurry outputs. Experimental results verify the effectiveness of the model on realistic images.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NHUT HUY (JEREMY) PHAM, whose telephone number is (703) 756-5797. The examiner can normally be reached Mon.-Fri., 8:30 am - 6:00 pm ET.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, O'Neal Mistry, can be reached at (313) 446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NHUT HUY PHAM/
Examiner, Art Unit 2674

/Ross Varndell/
Primary Examiner, Art Unit 2674

Prosecution Timeline

Apr 16, 2024
Application Filed
Feb 27, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598397
DIRT DETECTION METHOD AND DEVICE FOR CAMERA COVER
2y 5m to grant Granted Apr 07, 2026
Patent 12598074
FACIAL RECOGNITION METHOD AND APPARATUS, DEVICE, AND MEDIUM
2y 5m to grant Granted Apr 07, 2026
Patent 12597254
TRACKING OPERATING ROOM PHASE FROM CAPTURED VIDEO OF THE OPERATING ROOM
2y 5m to grant Granted Apr 07, 2026
Patent 12592087
IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM
2y 5m to grant Granted Mar 31, 2026
Patent 12579622
METHOD AND APPARATUS FOR PROCESSING IMAGE SIGNAL, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
2y 5m to grant Granted Mar 17, 2026
Based on the examiner's 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
79%
Grant Probability
99%
With Interview (+26.8%)
3y 0m
Median Time to Grant
Low
PTA Risk
Based on 53 resolved cases by this examiner. Grant probability derived from career allow rate.
