DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on August 28, 2024 was filed after the May 3, 2024 filing date of the application. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The replacement drawings were received on July 9, 2024. These drawings are accepted.
Double Patenting
The non-statutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A non-statutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on non-statutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a non-statutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1 and 2 are provisionally rejected on the ground of non-statutory double patenting as being unpatentable over claims 1 and 2 of co-pending Application No. 18/653,552 (reference application) in view of Ubaru et al. (US 2024/0135185). Please see the tables below.
This is a provisional non-statutory double patenting rejection.
Present Application No. 18/655,028, Claim 1
Co-pending Application No. 18/653,552, Claim 1
A computer implemented method comprising:
A computer implemented method comprising:
receiving one or more videos of a scene where each of the one or more videos is associated with camera extrinsics including a three-dimensional (3D) camera location and a camera direction;
receiving one or more videos of a scene where each of the one or more videos is associated with camera extrinsics including a three-dimensional (3D) camera location and a camera direction;
training a neural network using the one or more videos and the camera extrinsics to encode frames of the one or more videos as a plurality of models of the scene in a polynomial-based latent space; and (see Limitation A below for “latent model decoder”)
training a neural network using the one or more videos and the camera extrinsics to encode frames of the one or more videos as a plurality of models of the scene in a latent space associated with a latent model decoder; and
transmitting one or more of the plurality of models of the scene to a viewing device including a latent model decoder; (Limitation A)
transmitting one or more of the plurality of models of the scene to a viewing device including the latent model decoder;
wherein the latent model decoder is configured to decode the one or more of the plurality of models to generate imagery corresponding to novel 3D views of the scene.
wherein the latent model decoder is configured to decode the one or more of the plurality of models to generate imagery corresponding to novel 3D views of the scene.
Claim 1 of the present application differs from claim 1 of the co-pending application in that claim 1 of the present application recites, “…a polynomial-based latent space,” whereas claim 1 of the co-pending application simply recites, “…a latent space…” However, Ubaru et al. disclose using a neural network to determine a mapping between data input into a machine learning model and data output from the machine learning model, including learning a polynomial chaos expansion to map new data samples in a latent space to corresponding data, thus “in polynomial-based latent space” (see all steps of Figure 2, where step 240 notes learning a polynomial chaos expansion to map the new data samples in the latent space to the corresponding data output to learn the data distributions and their relation, to perform estimation with a high-dimensional dataset under uncertainty such as missing values, by estimating the values using the distribution).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the co-pending application with Ubaru et al.’s method of learning a polynomial chaos expansion to map new data samples in a latent space to corresponding data, thus “in polynomial-based latent space,” in order to learn data representations under uncertainty, such as missing values, by estimating those values, thus enhancing accuracy (see Background and Summary of Ubaru et al.).
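For illustration only, the sketch below shows one way a polynomial-based mapping over a latent space could look in code. It uses a plain monomial basis rather than the orthogonal polynomial chaos basis described by Ubaru et al., and every function and variable name is a hypothetical placeholder rather than anything taken from the cited reference.

```python
import numpy as np

def polynomial_latent_map(z, coeffs, degree=3):
    """Map a latent vector z to an output vector with a learned polynomial
    expansion: y = sum_k C_k contracted with z**k (elementwise powers).

    z      : (d,) latent sample
    coeffs : (degree + 1, m, d) learned coefficient tensor (hypothetical layout)
    """
    # Build the polynomial basis [1, z, z**2, ..., z**degree].
    basis = np.stack([z ** k for k in range(degree + 1)])   # shape (degree+1, d)
    # Contract the basis terms against the learned coefficients.
    return np.einsum('kd,kmd->m', basis, coeffs)            # shape (m,)

# Hypothetical usage: map a 4-dimensional latent sample to an 8-dimensional output.
rng = np.random.default_rng(0)
z = rng.standard_normal(4)
coeffs = rng.standard_normal((4, 8, 4))   # degree 3 -> 4 basis terms
y = polynomial_latent_map(z, coeffs)
print(y.shape)                            # (8,)
```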
Present Application No. 18/655,028, Claim 2
Co-pending Application No. 18/653,552, Claim 2
A computer implemented method comprising:
A computer implemented method comprising:
receiving a two-dimensional (2D) training image of a scene where the 2D training image is associated with a camera location and a camera direction;
receiving a two-dimensional (2D) training image of a scene where the 2D training image is associated with a camera location and a camera direction;
providing the 2D training image, the camera location and the camera direction to a neural network;
providing the 2D training image, the camera location and the camera direction to a neural network;
encoding the 2D training image using a neural network to produce an initial polynomial-based latent space model of the scene;
encoding the 2D training image using a neural network to produce an initial latent space model of the scene;
decoding the initial polynomial-based latent space model of the scene using a pre-trained latent model decoder to produce initial generated imagery corresponding to the scene;
decoding the initial latent space model of the scene using a pre-trained latent model decoder to produce initial generated imagery corresponding to the scene;
comparing the initial generated imagery to the 2D training image to evaluate an encoding loss based upon differences between the initial generated imagery to the 2D training image; and
comparing the initial generated imagery to the 2D training image to evaluate an encoding loss based upon differences between the initial generated imagery to the 2D training image; and
updating weights of the neural network using a parameter of the encoding loss.
updating weights of the neural network using a parameter of the encoding loss.
Claim 2 of the present application differs from claim 2 of the co-pending application in that claim 2 of the present application recites, “…a polynomial-based latent space,” whereas claim 2 of the co-pending application simply recites, “…a latent space…” Ubaru et al. disclose using a neural network to determine a mapping between data input into a machine learning model and data output from the machine learning model, including learning a polynomial chaos expansion to map new data samples in a latent space to corresponding data, thus “in polynomial-based latent space” (see all steps of Figure 2, where step 240 notes learning a polynomial chaos expansion to map the new data samples in the latent space to the corresponding data output to learn the data distributions and their relation, to perform estimation with a high-dimensional dataset under uncertainty such as missing values, by estimating the values using the distribution).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the co-pending application with Ubaru et al.’s method of learning a polynomial chaos expansion to map new data samples in a latent space to corresponding data, thus “in polynomial-based latent space,” in order to learn data representations under uncertainty, such as missing values, by estimating those values, thus enhancing accuracy (see Background and Summary of Ubaru et al.).
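To make the training procedure recited in claim 2 concrete (encode a 2D image, decode with a pre-trained latent model decoder, evaluate an encoding loss, update weights), the following is a minimal PyTorch-style sketch. It assumes a frozen pre-trained decoder whose call signature accepts the latent model plus camera location and direction; the architecture, layer sizes, and all names are illustrative assumptions, not taken from either application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical encoder producing a latent model of the scene from a 2D image
# plus camera location/direction; layer sizes are illustrative only.
class LatentEncoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64 + 6, latent_dim)   # + 3D camera location and direction

    def forward(self, image, cam_loc, cam_dir):
        feats = self.backbone(image)
        return self.head(torch.cat([feats, cam_loc, cam_dir], dim=-1))

def train_step(encoder, frozen_decoder, optimizer, image, cam_loc, cam_dir):
    # Encode the 2D training image into an initial latent model of the scene.
    latent = encoder(image, cam_loc, cam_dir)
    # Decode with the pre-trained (frozen) latent model decoder.
    generated = frozen_decoder(latent, cam_loc, cam_dir)
    # Encoding loss based on differences between generated imagery and the training image.
    loss = F.mse_loss(generated, image)
    # Update only the encoder's weights using the loss.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical setup: freeze the pre-trained decoder and optimize only the encoder.
# for p in frozen_decoder.parameters():
#     p.requires_grad_(False)
# optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
```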
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 and 2 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 12,322,068) in view of Ubaru et al. (US 2024/0135185).
As to claim 1, Kim et al. disclose a computer implemented method (e.g. Figure 1A, content generation system/neural network 104, to execute the method as outlined in Figure 5A) comprising:

receiving one or more videos of a scene (e.g. one or more images 102 as a sequence of images 102, where column 8, lines 35-37 notes the sequence of images may include frames of a video file or stream captured in, or with respect to, a scene) where each of the one or more videos is associated with camera extrinsics including a three-dimensional (3D) camera location and a camera direction (e.g. the one or more images 102 associated with information about a virtual camera used to capture the images, including location and orientation, as well as relevant camera parameters, e.g. depth of focus or field of view, and camera orientation) (e.g. step 502, column 8, lines 20-40 notes receiving a sequence of images for a scene, and parameters of one or more cameras used to capture this sequence, the camera parameters can include at least a location and orientation of cameras, with respect to the scene, where column 3, lines 32-36 also notes one or more images 102 can be input to a content generation system 104 that can generate 3D image content 112, such as may correspond to a scene or other object or representation, column 4, lines 7-26 notes a given sequence of images 102 associated with information about a virtual camera used to capture those images (e.g. location and orientation) as well as relevant camera parameters (e.g. depth of focus or field of view), camera orientation information, and other relevant information);

training a neural network (e.g. training neural network 104, further comprising neural network encoder 106, latent space representation 108, and neural network decoder 110) using the one or more videos and the camera extrinsics (e.g. using the one or more images 102 and camera information noted above) to encode frames of the one or more videos (e.g. to encode frames of the one or more images 102 via neural network encoder 106) as a plurality of models of the scene (e.g. as a voxel-based 3D latent representation 108 of the scene, where Figure 3, column 6, lines 33-39 notes the latent representation 108 can be decomposed into a set of smaller representations) in a polynomial-based latent space (e.g. in a multi-dimensional latent space) (e.g. step 504, column 8, lines 41-59 notes generating, using an autoencoder, a set of density voxels and a set of feature voxels from this sequence and parameters, and step 506, generating, based at least in part upon feature and density voxels, a single voxel-based 3D representation of the scene, where, as noted above, one or more images 102 can be input to a content generation system 104, where column 4, lines 6-7 notes content generation system 104 as a neural network that can be trained to learn a 3D representation of a scene, and column 8, lines 35-37 notes the sequence of images may include frames of a video file or stream captured in, or with respect to, a scene, where column 3, lines 36-44 further notes a neural network encoder 106 can analyze images 102, which may correspond to the sequence of images captured using at least one camera, extract representative features of those images (e.g. camera information noted above), and encode those features into a latent representation 108 of a scene represented in images 102, latent representation 108 may take the form of a latent space or latent vector, which may represent values in various locations of a voxel space, where column 4, lines 7-26 notes generating the voxel-based 3D representation 108 using a generative model (as may include encoder 106) using the sequence of images 102, relevant camera parameters (e.g. depth of focus or field of view), camera orientation information, as well as any other relevant information, and column 4, lines 41-56 notes latent representation 108 can be any appropriate representation that can represent important shape and detail information about a scene, such as a multi-dimensional latent space in which features from input sequence 102 are encoded at various points in this multi-dimensional latent space, this latent space can be represented by a 3D voxel space, such as a 3D array of voxels where each voxel represents a 3D sub-region of a scene corresponding to input sequence 102, which may further include both density and feature voxels); and

transmitting one or more of the plurality of models of the scene (e.g. transmitting the latent representation 108 of the scene) to a viewing device including a latent model decoder (e.g. to neural network decoder 110);

wherein the latent model decoder (e.g. neural network decoder 110) is configured to decode the one or more of the plurality of models (e.g. decodes the latent representation 108) to generate imagery corresponding to novel 3D views of the scene (e.g. to generate output image content 112 based at least in part upon the input latent representation 108, e.g. corresponding to new or unique points of view of the scene) (e.g. step 508, column 8, lines 60-66 notes providing camera parameters and voxel-based 3D representation to a decoder to generate a reconstruction of at least one scene image, where column 3, lines 44-48 notes the latent representation 108 can be passed as input to neural network decoder 110 which generates output image content 112 based at least in part upon the input latent representation 108, where column 4, lines 26-34 further notes decoder 110 uses the camera information along with the latent representation 108 to generate a series of images 112 that are accurate reconstructions of these input images 102, where changing camera parameters can cause decoder 110 to generate new images of this scene from one or more new or unique points of view that were not represented in this input image sequence 102 or set).
As noted above, Kim et al. disclose training a neural network using the one or more videos and the camera extrinsics to encode frames of the one or more videos as a plurality of models of the scene in a latent space, e.g. generating a voxel-based 3D latent representation in a multi-dimensional latent space, but do not disclose the multi-dimensional latent space as a “polynomial-based latent space.”
Ubaru et al. disclose using a neural network to determine a mapping between data input into a machine learning model and data output from the machine learning model, including learning a polynomial chaos expansion to map new data samples in a latent space to corresponding data, thus “in polynomial-based latent space” (see all steps of Figure 2, where step 240 notes learning a polynomial chaos expansion to map the new data samples in the latent space to the corresponding data output to learn the data distributions and their relation, to perform estimation with a high-dimensional dataset under uncertainty such as missing values, by estimating the values using the distribution).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kim et al.’s method of training a neural network to encode frames of one or more videos as a plurality of models of the scene in a latent space with Ubaru et al.’s method of learning a polynomial chaos expansion to map new data samples in a latent space to corresponding data, thus “in polynomial-based latent space,” in order to learn data representations under uncertainty, such as missing values, by estimating those values, thus enhancing accuracy (see Background and Summary of Ubaru et al.).
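As a rough sketch of the claim 1 pipeline mapped above (encode video frames with camera extrinsics into latent scene models, transmit the models, decode novel views on a viewing device), one possible arrangement follows. The encoder/decoder call signatures, the one-model-per-frame assumption, and the serialization step are illustrative assumptions only, not a description of Kim et al.'s or the applicant's implementation.

```python
import io
import torch

def encode_and_transmit(encoder, frames, cam_locs, cam_dirs, send):
    """Encode each frame (with its 3D camera location and direction) into a latent
    scene model and hand the serialized model to a transport callback `send`."""
    for frame, loc, direction in zip(frames, cam_locs, cam_dirs):
        with torch.no_grad():
            model = encoder(frame, loc, direction)   # hypothetical encoder signature
        buffer = io.BytesIO()
        torch.save(model, buffer)                    # serialize the latent model
        send(buffer.getvalue())                      # e.g. network transmission to the viewer

def render_novel_view(decoder, payload, novel_loc, novel_dir):
    """On the viewing device: deserialize a received latent model and decode it into
    imagery for a camera pose that was not present in the input videos."""
    model = torch.load(io.BytesIO(payload))
    with torch.no_grad():
        return decoder(model, novel_loc, novel_dir)  # hypothetical decoder signature
```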
As to claim 2, Kim et al. disclose a computer implemented method (e.g. Figure 1A, content generation system/neural network 104, to execute the method as outlined in Figures 5A and 5B) comprising:

receiving a two-dimensional (2D) training image of a scene (e.g. one or more images 102 as a sequence of images 102, where column 5, lines 32-35 notes learning a 3D representation (further noted below) from 2D images (e.g. as the input sequence)) where the 2D training image is associated with a camera location and a camera direction (e.g. the one or more images 102 associated with information about a virtual camera used to capture the images, including location and orientation, as well as relevant camera parameters, e.g. depth of focus or field of view, and camera orientation) (e.g. step 502, column 8, lines 20-40 notes receiving a sequence of images for a scene, and parameters of one or more cameras used to capture this sequence, the camera parameters can include at least a location and orientation of cameras, with respect to the scene, where step 550, column 9, lines 35-37 further notes receiving a plurality of two-dimensional images of a scene, such as images captured in a sequence at different locations, where column 3, lines 32-36 also notes one or more images 102 can be input to a content generation system 104 that can generate 3D image content 112, such as may correspond to a scene or other object or representation, column 4, lines 7-26 notes a given sequence of images 102 associated with information about a virtual camera used to capture those images (e.g. location and orientation) as well as relevant camera parameters (e.g. depth of focus or field of view), camera orientation information, and other relevant information);

providing the 2D training image, the camera location and the camera direction to a neural network (e.g. providing the one or more images 102 as well as the camera information noted above to neural network 104, further comprising neural network encoder 106, latent space representation 108, and neural network decoder 110) (e.g. as noted above, one or more images 102 can be input to a content generation system 104, where column 4, lines 6-7 notes content generation system 104 as a neural network that can be trained to learn a 3D representation of a scene);

encoding the 2D training image (e.g. encoding the one or more images 102 via neural network encoder 106) using a neural network (e.g. of neural network 104) to produce an initial polynomial-based latent space model of the scene (e.g. to produce a voxel-based 3D latent representation 108 in a multi-dimensional latent space of the scene) (e.g. step 504, column 8, lines 41-59 notes generating, using an autoencoder, a set of density voxels and a set of feature voxels from this sequence and parameters, and step 506, generating, based at least in part upon feature and density voxels, a single voxel-based 3D representation of the scene, where, as noted above, one or more images 102 can be input to a content generation system 104, where column 4, lines 6-7 notes content generation system 104 as a neural network that can be trained to learn a 3D representation of a scene, and column 8, lines 35-37 notes the sequence of images may include frames of a video file or stream captured in, or with respect to, a scene, where column 3, lines 36-44 further notes a neural network encoder 106 can analyze images 102, which may correspond to the sequence of images captured using at least one camera, extract representative features of those images (e.g. camera information noted above), and encode those features into a latent representation 108 of a scene represented in images 102, latent representation 108 may take the form of a latent space or latent vector, which may represent values in various locations of a voxel space, where column 4, lines 7-26 notes generating the voxel-based 3D representation 108 using a generative model (as may include encoder 106) using the sequence of images 102, relevant camera parameters (e.g. depth of focus or field of view), camera orientation information, as well as any other relevant information, and column 4, lines 41-56 notes latent representation 108 can be any appropriate representation that can represent important shape and detail information about a scene, such as a multi-dimensional latent space in which features from input sequence 102 are encoded at various points in this multi-dimensional latent space, this latent space can be represented by a 3D voxel space, such as a 3D array of voxels where each voxel represents a 3D sub-region of a scene corresponding to input sequence 102, which may further include both density and feature voxels);

decoding the initial polynomial-based latent space model of the scene (e.g. decoding the latent representation 108 of the scene) using a pre-trained latent model decoder (e.g. using neural network decoder 110, where Figure 4, column 7, lines 23-28 notes the decoder may be pre-trained) to produce initial generated imagery corresponding to the scene (e.g. to generate output image content 112 based at least in part upon the input latent representation 108, e.g. corresponding to new or unique points of view of the scene) (e.g. step 508, column 8, lines 60-66 notes providing camera parameters and voxel-based 3D representation to a decoder to generate a reconstruction of at least one scene image, where column 3, lines 44-48 notes the latent representation 108 can be passed as input to neural network decoder 110 which generates output image content 112 based at least in part upon the input latent representation 108, where column 4, lines 26-34 further notes decoder 110 uses the camera information along with the latent representation 108 to generate a series of images 112 that are accurate reconstructions of these input images 102, where changing camera parameters can cause decoder 110 to generate new images of this scene from one or more new or unique points of view that were not represented in this input image sequence 102 or set);

comparing the initial generated imagery to the 2D training image (e.g. comparing the output image content 112 to the input sequence, e.g. one or more images 102) to evaluate an encoding loss based upon differences between the initial generated imagery to the 2D training image (e.g. to evaluate an encoding loss based upon differences between the output image content 112 and the input sequence, e.g. one or more images 102) (e.g. step 510, column 9, lines 1-3 notes the recreated image can be compared against the original scene image to determine reconstruction loss, where column 4, lines 34-41 also notes reconstructing an input sequence can be useful for training this neural network 104, such as by determining a reconstruction loss by comparing this input sequence 102 with this output sequence 112 to ensure the latent representation 108 contains enough information about a relevant input scene corresponding to input sequence 102); and

updating weights of the neural network using a parameter of the encoding loss (e.g. step 510, column 9, lines 1-5 notes one or more network parameters can then be adjusted to attempt to minimize this reconstruction loss as part of the training process, where column 14, line 50 through column 15, line 10 further notes adjusting weights to refine an output using a loss function and adjustment algorithm).
As noted above, Kim et al. disclose encoding the 2D training image using a neural network to produce an initial latent space model of the scene, e.g. generating a voxel-based 3D latent representation in a multi-dimensional latent space, but do not disclose the multi-dimensional latent space as a “polynomial-based latent space.”
Ubaru et al. disclose using a neural network to determine a mapping between data input into a machine learning model and data output from the machine learning model, including learning a polynomial chaos expansion to map new data samples in a latent space to corresponding data, thus “in polynomial-based latent space” (see all steps of Figure 2, where step 240 notes learning a polynomial chaos expansion to map the new data samples in the latent space to the corresponding data output to learn the data distributions and their relation, to perform estimation with a high-dimensional dataset under uncertainty such as missing values, by estimating the values using the distribution).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kim et al.’s method of encoding the 2D training image using a neural network to produce an initial latent space model of the scene with Ubaru et al.’s method of learning a polynomial chaos expansion to map new data samples in a latent space to corresponding data, thus “in polynomial-based latent space,” in order to learn data representations under uncertainty, such as missing values, by estimating those values, thus enhancing accuracy (see Background and Summary of Ubaru et al.).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Sajjadi et al. (US 2024/0169662) disclose a system and method of obtaining source images of a scene, in which a query associated with a target view of the scene is obtained, a portion of the query is parameterized in a latent pose space, and an output image of the scene associated with the target view is generated using a machine-learned image view synthesis model; the latent pose space is learned by reconstructing, using the machine-learned image view synthesis model, training target views of training scenes from training source images, and the latent pose values are used by the machine-learned image view synthesis model to reconstruct the training target views.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACINTA M CRAWFORD, whose telephone number is (571) 270-1539. The examiner can normally be reached 8:30 a.m. to 4:30 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Y. Poon, can be reached at (571) 272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JACINTA M CRAWFORD/Primary Examiner, Art Unit 2617