Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 4-6, 8, 9, 17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Van Hoff et al. (U.S. Patent Application Publication No. 2020/0134911), referred to herein as Van Hoff, in view of Faulkner et al. (U.S. Patent Application Publication No. 2019/0369836), referred to herein as Faulkner.
Regarding claim 1, Van Hoff teaches a method comprising: receiving, by a processing device, a digital video depicting a scene including a human and an object (paragraphs 25 and 27; paragraph 35; paragraph 38, lines 1-14; video is obtained that contains a plurality of objects, including a human and other objects);
generating, by the processing device, depth information indicating a depth of the human and a depth of the object in the digital video (paragraph 38, lines 14-27; paragraphs 58 and 60; paragraph 67; paragraphs 129 and 130; depths of the person and each object are determined, and a 3D mesh model is generated for the person and each object);
determining, by the processing device using a machine learning model, a three-dimensional position of the object in the scene by comparing the depth information to an estimated dimension of a human mesh modeled from the human in the digital video (paragraphs 26 and 27; paragraph 33; paragraph 38, lines 14-27; paragraph 40, lines 1-9; paragraphs 52 and 53; paragraph 93; paragraph 131; paragraph 132, lines 1-5; a machine learning model uses the depth and mesh information to determine the position, orientation, and/or motion of the object, and the dimensions of the mesh); and
generating, by the processing device, a three-dimensional representation of the object, and generating, by the processing device, a scene reconstruction including the human mesh and the three-dimensional representation of the object at the three-dimensional position (paragraph 40, the last 11 lines; paragraph 53; paragraphs 61 and 63; paragraph 104; paragraph 132, lines 5-9; a reconstruction of the scene that includes the mesh representations is generated based on the position of the object).
Van Hoff does not explicitly teach that the generated depth information is a depth map, or determining a size of the object and position of the object by comparing the depth map to the mesh.
However, in a similar field of endeavor, Faulkner teaches a method comprising receiving a digital video depicting a scene including a plurality of objects, determining a depth of the objects, and generating a mesh modeled from the objects in the video (fig 1; paragraph 64, lines 1-8; paragraph 76, lines 1-7; paragraph 78; paragraphs 80 and 81), further comprising generating a depth map indicating a depth of objects in the video, and determining a size and position of an object based on comparing the depth map to the mesh (paragraph 64, lines 1-8; paragraphs 78 and 79; paragraph 80, lines 1-7; paragraph 81; paragraph 82, lines 1-7; paragraph 83, lines 1-6; paragraph 84; newly placed virtual objects, whose depths are represented by the depth map, are compared to the current mesh to determine their 3D size and location, and the mesh may then be updated to reflect the change).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the depth map and object size and location determination of Faulkner with the object determination of Van Hoff because this enables objects to be represented in a way that is more accurate and intuitive, while increasing the efficiency of reconstructing the scene for display (see, for example, Faulkner, paragraph 57, the last 11 lines; paragraph 82, the last 10 lines; paragraph 84, the last 5 lines).
Regarding claim 2, Van Hoff in view of Faulkner teaches the method of claim 1, wherein a viewpoint of the scene changes (Van Hoff, paragraph 41; paragraph 69, lines 1-12; paragraph 102, lines 1-14).
Regarding claim 4, Van Hoff in view of Faulkner teaches the method of claim 1, wherein the depth of the human and the depth of the object are determined using a monocular depth model (Van Hoff, paragraph 18, lines 10-19; paragraph 38, lines 14-27; paragraph 82, lines 1-15; paragraph 89, lines 1-9; paragraph 93).
Regarding claim 5, Van Hoff in view of Faulkner teaches the method of claim 1, wherein the machine learning model is a simultaneous localization and mapping (SLAM) model (Van Hoff, paragraph 40, lines 1-9; paragraph 82, lines 1-15).
Regarding claim 6, Van Hoff in view of Faulkner teaches the method of claim 1, wherein the scene reconstruction includes scene point clouds indicating three-dimensional features of the object (Van Hoff, paragraph 38, lines 14-24; paragraph 58).
Regarding claim 8, Van Hoff in view of Faulkner teaches the method of claim 1, wherein the human mesh tracks movement of the human in the scene (Van Hoff, paragraph 40, lines 1-9; paragraph 82, lines 1-15; paragraph 131).
Regarding claim 9, Van Hoff in view of Faulkner teaches the method of claim 1, wherein the digital video is an RGB video (Van Hoff, paragraph 38, lines 11-16).
Regarding claim 17, the limitations of this claim substantially correspond to the limitations of claim 1 (except for the medium, instructions, and processing device, which are disclosed by Van Hoff, fig 14, medium 1406 and processor 1404; paragraph 158); thus they are rejected on similar grounds.
Regarding claims 19 and 20, the limitations of these claims substantially correspond to the limitations of claims 5 and 8, respectively; thus they are rejected on similar grounds as their corresponding claims.
Claims 3 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Van Hoff, in view of Faulkner, and further in view of Chen et al. (U.S. Patent Application Publication No. 2024/0185399), referred to herein as Chen.
Regarding claim 3, Van Hoff in view of Faulkner teaches the method of claim 2, further comprising determining a camera view corresponding to the viewpoint of the scene based on the object relative to the human mesh in the scene reconstruction (Van Hoff, paragraph 63, lines 1-8; paragraphs 69 and 90; paragraph 102, lines 1-14; paragraph 144, lines 1-10).
Van Hoff in view of Faulkner does not explicitly teach determining a camera trajectory based on a determined position of the object.
However, in a similar field of endeavor, Chen teaches a method comprising receiving images depicting a scene including a plurality of objects, and determining positions and distances between objects that correspond to a camera viewpoint (figs 4-7; paragraph 66; paragraphs 78 and 81; paragraph 88), wherein a camera trajectory corresponding to the viewpoint is determined based on determined positions of the objects relative to one another (paragraphs 88 and 90).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the camera trajectory determination of Chen with the view determination of Van Hoff in view of Faulkner because this helps efficiently improve the accuracy of the reconstructed viewpoint, thus creating a higher fidelity output result (see, for example, Chen, paragraph 87).
Regarding claim 18, the limitations of this claim substantially correspond to the limitations of claims 2 and 3; thus they are rejected on similar grounds.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Van Hoff, in view of Faulkner, and further in view of Totty et al. (U.S. Patent Application Publication No. 2020/0302686), referred to herein as Totty.
Regarding claim 7, Van Hoff in view of Faulkner teaches the method of claim 1, wherein the human mesh is generated by predicting per-frame position, orientation, and/or motion for the human (Van Hoff, paragraph 82; paragraph 85, lines 1-7; paragraph 95).
Van Hoff in view of Faulkner teaches event segmentation (Van Hoff, paragraph 84, lines 1-4), but does not explicitly teach using segmentation masks.
However, in a similar field of endeavor, Totty teaches a method comprising receiving digital video depicting a scene including a plurality of objects at different viewpoints, determining the size, depth, and/or dimensions of each object, and generating a scene reconstruction based on the object determinations (paragraph 40; paragraph 84; paragraphs 103 and 104), wherein segmentation masks are used for the objects (paragraph 98).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the segmentation masks of Totty with the per-frame prediction of Van Hoff in view of Faulkner because this helps produce geometry of objects in the scene efficiently and effectively, while producing a more compelling, higher quality result for the user (see, for example, Totty, paragraph 22, lines 1-7; paragraphs 23-25).
Claims 10, 12-14, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Van Hoff, in view of Kroeger (U.S. Patent No. 11,555,903), referred to herein as Kroeger.
Regarding claim 10, Van Hoff teaches a system comprising: a memory component, and a processing device coupled to the memory component, the processing device to perform operations (fig 14, memory 1406 and processor 1404; paragraph 158) comprising:
receiving a digital video depicting a scene with a changing viewpoint, including a human and an object (paragraphs 25 and 27; paragraph 35; paragraph 38, lines 1-14; paragraph 90; paragraph 102, lines 1-14; video is obtained that contains a plurality of objects, including a human and other objects, and whose viewpoint changes);
generating depth information indicating a depth of the human and a depth of the object in the digital video (paragraph 38, lines 14-27; paragraphs 58 and 60; paragraph 67; paragraphs 129 and 130; depths of the person and each object are determined, and a 3D mesh model is generated for the person and each object);
determining, using a machine learning model, a position, orientation, and/or motion corresponding to a viewpoint of the scene by comparing the depth information to a human mesh modeled from the human in the digital video (paragraphs 26 and 27; paragraph 33; paragraph 38, lines 14-27; paragraph 40, lines 1-9; paragraph 69; paragraphs 90 and 92; paragraph 102, lines 1-14; a machine learning model uses the depth and mesh information to determine the position, orientation, and/or motion of the object corresponding to a viewpoint by comparing depth information to a human mesh model of a human in the video); and
displaying a scene reconstruction indicating the camera viewpoint (paragraph 40, the last 11 lines; paragraph 53; paragraphs 61 and 63; paragraph 104; paragraph 132, lines 5-9; a reconstruction of the scene that includes the mesh representations is generated and displayed based on the position, orientation, and/or motion of the object corresponding to the viewpoint).
Van Hoff does not explicitly teach that the depth information is a depth map, or determining a camera trajectory based on the depth map.
However, in a similar field of endeavor, Kroeger teaches a system for receiving digital video depicting a scene with a changing camera trajectory viewpoint that includes a variety of objects such as a human and other objects, and displaying a scene reconstruction indicating the camera trajectory (column 6, lines 2-22; column 7, lines 33-38 and 48-62), and comprising generating a depth map that indicates a depth of the objects in the digital video, and determining a camera trajectory based on the depth map and a mesh modeled from the objects in the digital video (column 7, line 63 through column 8, line 7; column 8, lines 11-22 and 29-44; column 10, lines 3-20).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the camera trajectory determination of Kroeger with the view determination of Van Hoff because this helps prevent any potential misalignment between captured image data and the rendering image data, thereby improving the image processing results (see, for example, Kroeger, column 7, lines 36-40; column 8, lines 23-27).
Regarding claim 12, Van Hoff in view of Kroeger teaches the system of claim 10, wherein the depth of the human and the depth of the object are determined using a monocular depth model (Van Hoff, paragraph 18, lines 10-19; paragraph 38, lines 14-27; paragraph 82, lines 1-15; paragraph 89, lines 1-9; paragraph 93).
Regarding claim 13, Van Hoff in view of Kroeger teaches the system of claim 10, wherein the machine learning model is a simultaneous localization and mapping (SLAM) model (Van Hoff, paragraph 40, lines 1-9; paragraph 82, lines 1-15).
Regarding claim 14, Van Hoff in view of Kroeger teaches the system of claim 10, wherein the scene reconstruction includes scene point clouds indicating three-dimensional features of the object (Van Hoff, paragraph 38, lines 14-24; paragraph 58).
Regarding claim 16, Van Hoff in view of Kroeger teaches the system of claim 10, wherein the human mesh tracks movement of the human in the scene (Van Hoff, paragraph 40, lines 1-9; paragraph 82, lines 1-15; paragraph 131).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Van Hoff, in view of Kroeger, and further in view of Faulkner.
Regarding claim 11, Van Hoff in view of Kroeger teaches the system of claim 10, further comprising determining, using the machine learning model, a position, orientation, and/or motion of the object by comparing the depth of the human, the depth of the object, and an estimated dimension of the human mesh (Van Hoff, paragraph 40, lines 1-9; paragraphs 52 and 53; paragraph 93; paragraph 131; paragraph 132, lines 1-5).
Van Hoff in view of Kroeger does not explicitly teach determining a size of the object.
However, in a similar field of endeavor, Faulkner teaches a system for receiving a digital video depicting a scene including a plurality of objects, determining a depth of the objects, and generating a mesh modeled from the objects in the video (fig 1; paragraph 64, lines 1-8; paragraph 76, lines 1-7; paragraph 78; paragraphs 80 and 81), wherein a size of an object is determined based on comparing the depth of one object to another, and based on the estimated dimensions of the mesh (paragraphs 78 and 79; paragraph 81; paragraph 82, lines 1-7).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the object size determination of Faulkner with the object determination of Van Hoff in view of Kroeger because this enables objects to be represented in a way that is more accurate and intuitive, while increasing the efficiency of reconstructing the scene for display (see, for example, Faulkner, paragraph 57, the last 11 lines; paragraph 82, the last 10 lines).
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Van Hoff, in view of Kroeger, and further in view of Totty.
Regarding claim 15, Van Hoff in view of Kroeger teaches the system of claim 10, wherein the human mesh is generated by predicting per-frame position, orientation, and/or motion for the human (Van Hoff, paragraph 82; paragraph 85, lines 1-7; paragraph 95).
Van Hoff in view of Kroeger teaches event segmentation (Van Hoff, paragraph 84, lines 1-4), but does not explicitly teach using segmentation masks.
However, in a similar field of endeavor, Totty teaches a system for receiving digital video depicting a scene including a plurality of objects at different viewpoints, determining the size, depth, and/or dimensions of each object, and generating a scene reconstruction based on the object determinations (paragraph 40; paragraph 84; paragraphs 103 and 104), wherein segmentation masks are used for the objects (paragraph 98).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the segmentation masks of Totty with the per-frame prediction of Van Hoff in view of Kroeger because this helps produce geometry of objects in the scene efficiently and effectively, while producing a more compelling, higher quality result for the user (see, for example, Totty, paragraph 22, lines 1-7; paragraphs 23-25).
Response to Arguments
On page 11 of the Applicant’s Remarks, with respect to the 103 rejection of claim 1, the Applicant argues that the prior art does not teach a depth map indicating the depth of the human and the depth of the object because 1) Van Hoff discloses depth data for the objects, but this is different from generating a depth map, and 2) Faulkner discusses using mesh data to determine the depth of object locations, but this is not the same as generating a depth map, and 3) the depth described by Faulkner is based on a user selection for placing an object, rather than determining a depth of a human and an object in a digital video. The Examiner respectfully disagrees with these arguments.
With respect to the first argument, it is respectfully submitted that, as shown in the above Office Action, Van Hoff is not relied upon to teach the depth map, as this is disclosed by Faulkner.
With respect to the second argument, it is respectfully submitted that, as shown in the above Office Action, the size and location determination in Faulkner is explicitly described as utilizing a depth map, as opposed to simply determining depth.
With respect to the third argument, it is respectfully submitted that the objects in Faulkner are necessarily depicted in a digital video, not least because the processing is occurring in a three-dimensional virtual world. However, even though Faulkner discloses this feature, Faulkner is also not relied upon to teach receiving video depicting a human and object, as this is disclosed in Van Hoff. Additionally, regardless of how an object comes to be placed in the virtual world in Faulkner, it remains the case that its size and location are determined, and that depth maps are the means for such determination in Faulkner.
It is noted that one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Thus, the Examiner respectfully submits that the combination of Van Hoff and Faulkner teaches this limitation.
On pages 11 and 12 of the Applicant’s Remarks, with respect to the 103 rejection of claim 1, the Applicant argues that the prior art does not teach determining a size and position of the object by comparing the depth map to the dimension of the human mesh because Faulkner discusses computing the size based on the depth, but this is different because determining a size based on a computed depth is not equivalent to determining a size based on comparing a depth map to a dimension of a human mesh. The Examiner respectfully disagrees with this argument.
It is first respectfully submitted that, as discussed in the above Office Action, Van Hoff discloses comparing depth information to the human mesh model, but does not explicitly teach determining the size using a depth map. Thus, Faulkner is not relied upon to teach a human mesh model (although the determinations in Faulkner are disclosed as applying to any type of object). As shown in the above Office Action, however, Faulkner discloses using a depth map to determine the size and location of objects, and discloses that when an object is placed, its depth information in the depth map is compared to the mesh to determine its size and location, and then the mesh may be updated to reflect the changes in preparation for the next object interaction.
It is noted that one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Thus, the Examiner respectfully submits that the combination of Van Hoff and Faulkner teaches this limitation.
On pages 12 and 13 of the Applicant’s Remarks, with respect to the 103 rejection of claim 1, the Applicant argues that the prior art does not teach generating a three-dimensional representation of the object based on the size of the object and generating a scene reconstruction including the mesh and the object at the position because 1) Van Hoff discloses generating a volumetric 3D model of an object, which does not involve generating a scene reconstruction including the human mesh and the object, and 2) Faulkner places virtual objects at locations in a real-world environment, which is different from generating a scene reconstruction including the human mesh and the object. The Examiner respectfully disagrees with these arguments.
With respect to the first argument, it is respectfully submitted that, as shown in the above Office Action, Van Hoff explicitly teaches generating a 3D simulation of the environment including the human (which is defined by the mesh) and a three-dimensional representation of any virtual objects – as just one example, paragraph 104 and paragraphs 129-132 describe generating a 3D simulation of the human and the objects, and the remaining citations detail the other features regarding how this is accomplished. As shown in the above Office Action, Van Hoff does not teach the size limitations, thus Faulkner is relied upon to teach these features.
With respect to the second argument, Faulkner is not relied upon to teach the scene reconstruction, as this is clearly disclosed in Van Hoff. However, it is respectfully submitted that Faulkner does indeed reconstruct the 3D scene, and determines the size and location of objects in the scene using a depth map, as previously discussed. Faulkner does not explicitly teach the human mesh model, but is not relied upon to teach this, as this is disclosed by Van Hoff.
It is noted that one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Thus, the Examiner respectfully submits that the combination of Van Hoff and Faulkner teaches this limitation.
On page 13 of the Applicant’s Remarks, with respect to the 103 rejection of claims 10 and 17, the Applicant argues that these claims are not taught by the prior art for similar reasons as those discussed in regard to claim 1. The Examiner respectfully disagrees with these arguments, for the reasons discussed above.
With respect to claim 10, it is noted that other than the depth map, none of the limitations or arguments at issue in claims 1 and 17 apply to this claim, as the claim does not comprise any limitations reciting the object size, 3D object position, generating the virtual object, or reconstructing the scene to include the human mesh and 3D representation of the object at the 3D position. Thus, even if claim 1 comprised distinguishing features, claim 10 would not. The rejection of claim 10 has been updated based on the new depth map limitations, however, as shown in the above Office Action.
On page 13 of the Applicant’s Remarks, with respect to the 103 rejection of the dependent claims, the Applicant argues that these claims are not taught by the prior art insomuch as they depend from claims that are not taught by the prior art. The Examiner respectfully disagrees with these arguments, for the reasons discussed above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID T WELCH whose telephone number is (571)270-5364. The examiner can normally be reached Monday-Thursday, 8:30-5:30 EST, and alternate Fridays, 9:00-2:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
DAVID T. WELCH
Primary Examiner
Art Unit 2613
/DAVID T WELCH/Primary Examiner, Art Unit 2613