DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation - 35 USC § 101
The limitation “generating, by the processing device, a synthesized image having a reconstruction of the scene based on the set of lighting conditions using the view-independent radiance and the view-dependent radiance” of claim 1 is considered a practical application of generating a relighted synthetic image view, based on lighting conditions and radiance.
The limitation “generating, by the processing device, a synthesized image having a reconstruction of the scene using the radiance” of claim 11 is considered a practical application of generating a relighted synthetic image view, based on radiance.
The limitation “generating a synthesized image from the reconstruction using the diffuse radiance and specular radiance” of claim 15 is considered a practical application of generating a relighted synthetic image view, based on radiance.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1 and 8 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Thies et al. (“Image-Guided Neural Object Rendering”, 2020) (hereinafter “Thies”).
Regarding claim 1, Thies teaches A method (We propose a learned image-guided rendering technique that combines the benefits of image-based rendering and GAN-based image synthesis. See abstract) comprising:
receiving, by a processing device, a plurality of digital images that depict a scene from multiple perspectives (The view dependent effects are subtracted from the original images to get the diffuse images that can be reprojected into the target image space. See figure 1 caption and figure 1);
determining, by the processing device, a view-independent radiance of the scene based on the plurality of digital images (A main contribution of our work is a convolutional neural network that learns the disentanglement of view-dependent and view-independent effects in a self-supervised manner (see Fig. 2). Since our training data consists of a series of images taken from different viewing directions, assuming constant illumination, the reflected radiance of two corresponding points in two different images only differs by the view-dependent effects. Our self-supervised training procedure is based on a Siamese network that gets a pair of randomly selected images from the training set as input. The task of the network is to extract view-dependent lighting effects from an image, based on the geometric information from the proxy geometry. See section 5, EffectsNet, page 4);
determining, by the processing device, a view-dependent radiance of the scene based on the plurality of digital images (A main contribution of our work is a convolutional neural network that learns the disentanglement of view-dependent and view-independent effects in a self-supervised manner (see Fig. 2). Since our training data consists of a series of images taken from different viewing directions, assuming constant illumination, the reflected radiance of two corresponding points in two different images only differs by the view-dependent effects. Our self-supervised training procedure is based on a Siamese network that gets a pair of randomly selected images from the training set as input. The task of the network is to extract view-dependent lighting effects from an image, based on the geometric information from the proxy geometry. See section 5, EffectsNet, page 4) (The core of our approach is a neural network called EffectsNet which is trained in a Siamese way to estimate view-dependent effects, for example, specular highlights or reflections. See page 3, first paragraph );
determining, by the processing device, a set of lighting conditions associated with an input perspective (The core of our approach is a neural network called EffectsNet which is trained in a Siamese way to estimate view-dependent effects, for example, specular highlights or reflections. This allows us to remove view-dependent effects from the input images, resulting in images that contain view-independent appearance information of the object. This view-independent information can be projected into a novel view using the reconstructed geometry, where new view-dependent effects can be added. See page 2, first paragraph);
generating, by the processing device, a synthesized image having a reconstruction of the scene based on the set of lighting conditions using the view-independent radiance and the view-dependent radiance (The core of our approach is a neural network called EffectsNet which is trained in a Siamese way to estimate view-dependent effects, for example, specular highlights or reflections. This allows us to remove view-dependent effects from the input images, resulting in images that contain view-independent appearance information of the object. This view-independent information can be projected into a novel view using the reconstructed geometry, where new view-dependent effects can be added. See page 2, first paragraph) (Since CompositionNet is trained to generate photo-realistic output images, it is resolving reprojection errors as well as filling regions where no image content is available. We demonstrate the effectiveness of our algorithm using synthetic and real data, and compare to classical computer graphics and learned approaches. See page 2, first paragraph)(To summarize, we propose a novel neural image-guided rendering method, a hybrid between classical image-based rendering and machine learning. The core contribution is the explicit handling of view-dependent effects in the source and the target views using EffectsNet that can be learned in a self-supervised fashion. The composition of the reprojected views to a final output image without the need of hand-crafted blending schemes is enabled using our network called CompositionNet. See page 2, second paragraph);
and outputting, by the processing device, the synthesized image (The core of our approach is a neural network called EffectsNet which is trained in a Siamese way to estimate view-dependent effects, for example, specular highlights or reflections. This allows us to remove view-dependent effects from the input images, resulting in images that contain view-independent appearance information of the object. This view-independent information can be projected into a novel view using the reconstructed geometry, where new view-dependent effects can be added. See page 2, first paragraph) (Since CompositionNet is trained to generate photo-realistic output images, it is resolving reprojection errors as well as filling regions where no image content is available. We demonstrate the effectiveness of our algorithm using synthetic and real data, and compare to classical computer graphics and learned approaches. See page 2, first paragraph)(To summarize, we propose a novel neural image-guided rendering method, a hybrid between classical image-based rendering and machine learning. The core contribution is the explicit handling of view-dependent effects in the source and the target views using EffectsNet that can be learned in a self-supervised fashion. The composition of the reprojected views to a final output image without the need of hand-crafted blending schemes is enabled using our network called CompositionNet. See page 2, second paragraph).
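For clarity of the record, the workflow described in the passages cited above (subtracting view-dependent effects to recover view-independent content, reprojecting that content into a novel view, and adding new view-dependent effects) can be illustrated with the following sketch. This is an examiner-provided illustration only; the function names are hypothetical and do not appear in Thies or in the claims.

```python
import numpy as np

def synthesize_view(source_image, source_effects, target_effects, reproject):
    # Remove the source view's view-dependent effects to recover the
    # view-independent (diffuse) content, reproject that content into the
    # target view, then add the target view's view-dependent effects.
    diffuse = source_image - source_effects
    return reproject(diffuse) + target_effects
```

Under this sketch, a perfect effects estimate yields an output equal to the scene's diffuse content plus the new view's view-dependent effects, which is consistent with the decomposition Thies describes.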
Regarding claim 8, Thies teaches The method of claim 1, further comprising: generating, by the processing device, the reconstruction by supervising a loss function with the view-dependent radiance and with the view-independent radiance (Since we assume constant illumination, the diffuse light reflected by a surface point is the same in every image, thus, the appearance of a surface point only changes by the view-dependent components. We train our network in a self-supervised manner based on a Siamese network that predicts the view-dependent effects of two random views such that the difference of the diffuse aligned images is minimal (see Fig. 2). To this end, we use the re-projection ability (see Sec. 6) to align pairs of input images, from which the view-dependent effects have been removed (original image minus view-dependent effects), and train the network to minimize the resulting differences in the overlap region of the two images. See page 5, second paragraph) (See equation 1, page 5, for loss function of self-supervised training for EffectsNet).
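The self-supervised Siamese objective quoted above (minimizing the difference of the diffuse-aligned images in the overlap region) can be sketched as follows. This is an examiner-provided illustration, not a reproduction of equation 1 of Thies; the function and parameter names are hypothetical.

```python
import numpy as np

def siamese_diffuse_loss(img_a, effects_a, img_b, effects_b, overlap_mask):
    # Subtracting each view's predicted view-dependent effects should leave
    # the same diffuse appearance at corresponding points, so the loss is the
    # mean absolute difference of the diffuse images in the overlap region.
    diffuse_a = img_a - effects_a
    diffuse_b = img_b - effects_b
    masked = np.abs(diffuse_a - diffuse_b) * overlap_mask
    return masked.sum() / max(overlap_mask.sum(), 1)
```

The loss is zero only when both effect predictions are exact, which is what permits the disentanglement to be learned without ground-truth labels.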
Allowable Subject Matter
Claims 2-7, 9, 10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 11-20 are allowed.
The following is a statement of reasons for the indication of allowable subject matter: the closest prior art of record, alone or in combination, is silent as to the limitations “further comprising: overfitting, by the processing device, a neural network based on the digital images to determine the view-dependent radiance and to determine the view-independent radiance” of claim 2, when read in light of the rest of the limitations in claim 2 and the claims from which claim 2 depends; thus claim 2 contains allowable subject matter.
Claim 3 contains allowable subject matter because it depends on a claim that contains allowable subject matter.
The closest prior art of record, alone or in combination, is silent as to the limitations “determining, by the processing device, a latent appearance of the scene based on the digital images, wherein determining the view-independent radiance comprises determining the view-independent radiance based on the latent appearance” of claim 4, when read in light of the rest of the limitations in claim 4 and the claims from which claim 4 depends; thus claim 4 contains allowable subject matter.
Claims 5-7 contain allowable subject matter because they depend on a claim that contains allowable subject matter.
The closest prior art of record, alone or in combination, is silent as to the limitations “further comprising: generating, by the processing device, an environment map to store the view-dependent radiance and the view-independent radiance as lighting information used for generating the synthesized digital image” of claim 9, when read in light of the rest of the limitations in claim 9 and the claims from which claim 9 depends; thus claim 9 contains allowable subject matter.
The closest prior art of record, alone or in combination, is silent as to the limitations “wherein the environment map comprises a Laplacian pyramid environment map structure” of claim 10, when read in light of the rest of the limitations in claim 10 and the claims from which claim 10 depends; thus claim 10 contains allowable subject matter.
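For context on the structure recited in claim 10: a Laplacian pyramid environment map stores lighting information as a stack of frequency bands plus a coarse residual. The following is an examiner-provided sketch only, assuming simple 2x2 averaging for downsampling and nearest-neighbor upsampling; it is not drawn from the application or from Thies.

```python
import numpy as np

def down(img):
    # Downsample by 2x2 averaging (odd trailing rows/columns are cropped).
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img, shape):
    # Nearest-neighbor upsample back to `shape`.
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]

def build_laplacian_pyramid(env_map, levels):
    # Each band stores the detail lost by one downsampling step; the last
    # entry stores the coarse residual, so reconstruction is exact.
    pyramid, current = [], env_map
    for _ in range(levels - 1):
        smaller = down(current)
        pyramid.append(current - up(smaller, current.shape))
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    # Invert the decomposition from coarse to fine.
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = band + up(current, band.shape)
    return current
```

Because each band is defined as the exact difference between a level and its upsampled coarser level, summing the bands back from coarse to fine recovers the original environment map exactly.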
Regarding claim 11, Thies teaches A method (We propose a learned image-guided rendering technique that combines the benefits of image-based rendering and GAN-based image synthesis. See abstract) comprising: receiving, by a processing device, a plurality of digital images that depict a scene from multiple perspectives (The view dependent effects are subtracted from the original images to get the diffuse images that can be reprojected into the target image space. See figure 1 caption and figure 1);
determining, by the processing device, a radiance of the scene based on the lighting conditions (A main contribution of our work is a convolutional neural network that learns the disentanglement of view-dependent and view-independent effects in a self-supervised manner (see Fig. 2). Since our training data consists of a series of images taken from different viewing directions, assuming constant illumination, the reflected radiance of two corresponding points in two different images only differs by the view-dependent effects. Our self-supervised training procedure is based on a Siamese network that gets a pair of randomly selected images from the training set as input. The task of the network is to extract view-dependent lighting effects from an image, based on the geometric information from the proxy geometry. See section 5, EffectsNet, page 4);
generating, by the processing device, a synthesized image having a reconstruction of the scene using the radiance (The core of our approach is a neural network called EffectsNet which is trained in a Siamese way to estimate view-dependent effects, for example, specular highlights or reflections. This allows us to remove view-dependent effects from the input images, resulting in images that contain view-independent appearance information of the object. This view-independent information can be projected into a novel view using the reconstructed geometry, where new view-dependent effects can be added. See page 2, first paragraph) (Since CompositionNet is trained to generate photo-realistic output images, it is resolving reprojection errors as well as filling regions where no image content is available. We demonstrate the effectiveness of our algorithm using synthetic and real data, and compare to classical computer graphics and learned approaches. See page 2, first paragraph)(To summarize, we propose a novel neural image-guided rendering method, a hybrid between classical image-based rendering and machine learning. The core contribution is the explicit handling of view-dependent effects in the source and the target views using EffectsNet that can be learned in a self-supervised fashion. The composition of the reprojected views to a final output image without the need of hand-crafted blending schemes is enabled using our network called CompositionNet. See page 2, second paragraph);
and outputting, by the processing device, the synthesized image (The core of our approach is a neural network called EffectsNet which is trained in a Siamese way to estimate view-dependent effects, for example, specular highlights or reflections. This allows us to remove view-dependent effects from the input images, resulting in images that contain view-independent appearance information of the object. This view-independent information can be projected into a novel view using the reconstructed geometry, where new view-dependent effects can be added. See page 2, first paragraph) (Since CompositionNet is trained to generate photo-realistic output images, it is resolving reprojection errors as well as filling regions where no image content is available. We demonstrate the effectiveness of our algorithm using synthetic and real data, and compare to classical computer graphics and learned approaches. See page 2, first paragraph) (To summarize, we propose a novel neural image-guided rendering method, a hybrid between classical image-based rendering and machine learning. The core contribution is the explicit handling of view-dependent effects in the source and the target views using EffectsNet that can be learned in a self-supervised fashion. The composition of the reprojected views to a final output image without the need of hand-crafted blending schemes is enabled using our network called CompositionNet. See page 2, second paragraph), but is silent as to generating, by the processing device, a Laplacian pyramid environment map structure that encodes lighting conditions of the scene.
The prior art of record, alone or in combination, is silent as to the limitations “generating, by the processing device, a Laplacian pyramid environment map structure that encodes lighting conditions of the scene” of claim 11, when read in light of the rest of the limitations in claim 11; thus claim 11 contains allowable subject matter.
Claims 12-14 contain allowable subject matter because they depend on a claim that contains allowable subject matter.
Regarding claim 15, Thies teaches A system (Figure 1: Overview of our image-guided rendering approach: based on the nearest neighbor views, we predict the corresponding view-dependent effects using our EffectsNet architecture. See caption figure 1) (To this end, we propose EffectsNet, a deep neural network that predicts view-dependent effects. Based on these estimations, we are able to convert observed images to diffuse images. See abstract) comprising: a memory component (it is clear this is performed on a computer with memory); and a processing device coupled to the memory component to perform operations (it is clear this is performed on a computer with a processing device coupled to memory) (To keep processing time low, we restrict this search to a small subset of the input images. See page 5, second to last paragraph) including:
determining a view-independent radiance of a scene from a plurality of digital images that depict the scene from multiple perspectives (A main contribution of our work is a convolutional neural network that learns the disentanglement of view-dependent and view-independent effects in a self-supervised manner (see Fig. 2). Since our training data consists of a series of images taken from different viewing directions, assuming constant illumination, the reflected radiance of two corresponding points in two different images only differs by the view-dependent effects. Our self-supervised training procedure is based on a Siamese network that gets a pair of randomly selected images from the training set as input. The task of the network is to extract view-dependent lighting effects from an image, based on the geometric information from the proxy geometry. See section 5, EffectsNet, page 4);
supervising a diffuse radiance output for a reconstruction of the scene using the view-independent radiance (Given this 3D reconstruction and the set of images of the video, we are able to train our pipeline in a self-supervised manner. The core of our approach is a neural network called EffectsNet which is trained in a Siamese way to estimate view-dependent effects, for example, specular highlights or reflections. This allows us to remove view-dependent effects from the input images, resulting in images that contain view-independent appearance information of the object. This view-independent information can be projected into a novel view using the reconstructed geometry, where new view-dependent effects can be added. See page 1 last paragraph to page 2 first paragraph)( The view-dependent effects are subtracted from the original images to get the diffuse images that can be reprojected into the target image space. See caption figure 1.);
determining a view-dependent radiance of the scene based on the plurality of digital images (A main contribution of our work is a convolutional neural network that learns the disentanglement of view-dependent and view-independent effects in a self-supervised manner (see Fig. 2). Since our training data consists of a series of images taken from different viewing directions, assuming constant illumination, the reflected radiance of two corresponding points in two different images only differs by the view-dependent effects. Our self-supervised training procedure is based on a Siamese network that gets a pair of randomly selected images from the training set as input. The task of the network is to extract view-dependent lighting effects from an image, based on the geometric information from the proxy geometry. See section 5, EffectsNet, page 4) (The core of our approach is a neural network called EffectsNet which is trained in a Siamese way to estimate view-dependent effects, for example, specular highlights or reflections. See page 3, first paragraph );
and
generating a synthesized image from the reconstruction using the diffuse radiance and specular radiance (The core of our approach is a neural network called EffectsNet which is trained in a Siamese way to estimate view-dependent effects, for example, specular highlights or reflections. This allows us to remove view-dependent effects from the input images, resulting in images that contain view-independent appearance information of the object. This view-independent information can be projected into a novel view using the reconstructed geometry, where new view-dependent effects can be added. CompositionNet, a second network, composites the projected K nearest neighbor images to a final output. Since CompositionNet is trained to generate photo-realistic output images, it is resolving reprojection errors as well as filling regions where no image content is available. We demonstrate the effectiveness of our algorithm using synthetic and real data, and compare to classical computer graphics and learned approaches. See page 2, first paragraph), but is silent as to supervising a specular radiance output for the reconstruction using the view-independent radiance.
In Thies, the supervising of the specular radiance is instead based on the view-dependent output (The core of our approach is a neural network called EffectsNet which is trained in a Siamese way to estimate view-dependent effects, for example, specular highlights or reflections. See page 2, first paragraph).
The prior art of record, alone or in combination, is silent as to the limitations “supervising a specular radiance output for the reconstruction using the view-independent radiance” of claim 15, when read in light of the rest of the limitations in claim 15; thus claim 15 contains allowable subject matter.
Claims 16-20 contain allowable subject matter because they depend on a claim that contains allowable subject matter.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS R WILSON whose telephone number is (571) 272-0936. The examiner can normally be reached M-F, 7:30 AM-5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (572) 272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NICHOLAS R WILSON/Primary Examiner, Art Unit 2611