DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
2. Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. § 119(a)-(d). The certified copy has been placed of record in the file.
Information Disclosure Statement
3. The information disclosure statements (IDS) were submitted on 10/30/2024 and 02/11/2026. The submissions are in compliance with the provisions of 37 CFR § 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 103
4. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
5. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
7. Claims 1, 8-11, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ren et al. (US 2021/0097297A1) (hereinafter Ren) in view of Vodrahalli et al. (US 2023/0092766A1) (hereinafter Vodrahalli).
Regarding claim 1, Ren discloses a video compression method, performed by a computer device (e.g., abstract; Figs. 5, 9-10), the method comprising:
obtaining a to-be-processed video frame (e.g., see abstract: acquiring one guided image of the first image; Fig. 5, paragraphs 0087-0089: a first image F1; also see Fig. 6, paragraphs 0093-0098) and a previous video frame of the to-be-processed video frame (e.g., see abstract: obtaining a reconstructed image by subjecting the first image; Fig. 5, paragraphs 0087-0089: a second image F2), the previous video frame adjacent to the to-be-processed video frame and before the to-be-processed video frame in a video frame sequence (e.g., see Fig. 5, paragraphs 0087-0089: a first image F1 and a second image F2; Fig. 6, paragraphs 0093-0098; also see Fig. 9, steps 10-30, paragraphs 0152-0154);
extracting a key point from the to-be-processed video frame to obtain first position information of a first key point in the to-be-processed video frame (e.g., see paragraphs 0059, 0094: a first position of a key point in a first image; Fig. 7, paragraphs 0099, 0100, 0114: position of each key point; also see Fig. 8, paragraphs 0126, 0131), and extracting a key point from the previous video frame to obtain second position information of a second key point in the previous video frame (e.g., see paragraph 0059: a second position of a key point in a second image; Fig. 7, paragraphs 0099, 0100, 0114: position of each key point; also see Fig. 8, paragraphs 0126, 0131);
performing motion estimation based on the first position information and the second position information to obtain motion information of the to-be-processed video frame relative to the previous video frame (e.g., see Fig. 3, paragraphs 0057-0061: performing affine transformation that involves motions and translations; Figs. 5-6, paragraphs 0088, 0091, 0128, 0129);
performing image inpainting based on the motion information and the previous video frame to obtain an initial video frame (e.g., see Fig. 3, paragraph 0059: images are obtained by inpainting).
Ren does not explicitly disclose determining a latent feature based on the to-be-processed video frame and the initial video frame, the latent feature representing an inpainting deviation of the initial video frame relative to the to-be-processed video frame; and performing video compression based on the first position information, the second position information, and the latent feature to obtain a video compressed file.
However, Vodrahalli discloses determining a latent feature based on the to-be-processed video frame and the initial video frame (e.g., see paragraphs 0092, 0094: initial image), the latent feature (e.g., see paragraphs 0062, 0088: latent feature) representing an inpainting deviation of the initial video frame relative to the to-be-processed video frame (e.g., see paragraphs 0008, 0009, 0013: inpainting model; Fig. 3, paragraphs 0083-0086: inpainting model 305; also see Fig. 4, paragraphs 0087-0089); and
performing video compression based on the first position information, the second position information (e.g., see Fig. 3, paragraphs 0083, 0084: input image 301, mask image 303; paragraphs 0070, 0098: positions or orientations of the image), and the latent feature to obtain a video compressed file (e.g., see paragraphs 0062, 0088: latent feature).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system disclosed by Ren to add the teachings of Vodrahalli as above, in order to provide an encoder of the inpainting model that is used as an image embedding model to generate the one or more temporal image embeddings (see Vodrahalli, paragraph 0009).
Regarding claim 8, Ren and Vodrahalli disclose all the limitations of claim 1, and are analyzed as previously discussed with respect to that claim.
Furthermore, Ren discloses wherein the first key point comprises a key point of a body part comprised in a first object in the to-be-processed video frame (e.g., see paragraphs 0059, 0094, 0099: key point of image; paragraphs 0100, 0114, 0126, 0131: key point of part of image; paragraph 0139: part of body such as eyes, nose, mouth, eyebrow and face), and the second key point comprises a key point of a body part comprised in a second object in the previous video frame (e.g., see paragraph 0139: part of body such as eyes, nose, mouth, eyebrow and face; paragraphs 0059, 0094, 0099: key point of image; paragraphs 0100, 0114, 0126, 0131: key point of part of image).
Regarding claim 9, Ren and Vodrahalli disclose all the limitations of claim 8, and are analyzed as previously discussed with respect to that claim.
Furthermore, Ren discloses wherein the extracting a key point from the to-be-processed video frame to obtain first position information of a first key point in the to-be-processed video frame (e.g., see paragraphs 0059, 0094, 0099: position of key point of image; paragraphs 0100, 0114, 0126, 0131: position of key point), and extracting a key point from the previous video frame to obtain second position information of a second key point in the previous video frame comprises: recognizing the body part comprised in the first object in the to-be-processed video frame, and recognizing the body part comprised in the second object in the previous video frame (e.g., see paragraph 0139: part of body such as eyes, nose, mouth, eyebrow and face); and determining, based on a mapping relationship between a body part and a key point, a key point corresponding to the body part comprised in the first object, determining first position information of the key point corresponding to the body part comprised in the first object in the to-be-processed video frame (e.g., see paragraphs 0059, 0094, 0099: key point of image; paragraphs 0100, 0114, 0126, 0131: key point of part of image; paragraph 0139: part of body such as eyes, nose, mouth, eyebrow and face), determining, based on the mapping relationship between the body part and the key point, a key point corresponding to the body part comprised in the second object (e.g., see paragraph 0139: part of body such as eyes, nose, mouth, eyebrow and face), and determining second position information of the key point corresponding to the body part comprised in the second object in the previous video frame (e.g., see paragraphs 0059, 0094, 0099: key point of image; paragraphs 0100, 0114, 0126, 0131: key point of part of image; paragraph 0139: part of body such as eyes, nose, mouth, eyebrow and face).
Regarding claim 10, Ren and Vodrahalli disclose all the limitations of claim 8, and are analyzed as previously discussed with respect to that claim.
Furthermore, Ren discloses wherein the extracting a key point from the to-be-processed video frame to obtain first position information of a first key point in the to-be-processed video frame, and extracting a key point from the previous video frame to obtain second position information of a second key point in the previous video frame (e.g., see paragraphs 0059, 0094, 0099: position of key point of image; paragraphs 0100, 0114, 0126, 0131: position of key point) comprises: extracting the key point from the to-be-processed video frame by using a key point detection model to obtain the first position information, and extracting the key point from the previous video frame by using the key point detection model to obtain the second position information (e.g., see paragraphs 0059, 0094, 0099: position of key point of image; paragraphs 0100, 0114, 0126, 0131: position of key point), the key point detection model being obtained through training a training sample, the training sample comprising a plurality of sample images (e.g., see Fig. 6, paragraphs 0092-0098: training image), a sample object in each sample image comprising a body part, and body parts comprised in sample objects in the plurality of sample images comprising various body parts (e.g., see paragraph 0139: part of body such as eyes, nose, mouth, eyebrow and face).
Regarding claim 11, this claim is the video decoding method counterpart of the video encoding method of claim 1 above, wherein the video decoding method performs the same limitations cited in claim 1, the rejections of which are incorporated herein.
Regarding claim 16, this claim is the video compression apparatus counterpart of the method of claim 1 above, wherein the video compression apparatus performs the same limitations cited in claim 1, the rejections of which are incorporated herein. Furthermore, Ren discloses the video compression apparatus (see Figs. 5, 9-11).
Regarding claim 20, it contains the limitations of claims 8 and 16, and is analyzed as previously discussed with respect to those claims.
Allowable Subject Matter
9. Claims 2-7, 12-15, and 17-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
The claims are directed to a method and a corresponding apparatus including uniquely distinct features or limitations: “wherein the determining a latent feature based on the to-be-processed video frame and the initial video frame comprises: extracting a feature from the initial video frame by using a feature extractor to obtain a feature vector of the initial video frame, and using the feature vector of the initial video frame as a video frame compression context; and splicing a pixel matrix of the to-be-processed video frame and the video frame compression context, and inputting a first splicing result obtained through the splicing into a context encoder to obtain the latent feature; further comprising: performing probabilistic modeling on the latent feature to obtain a distribution parameter, the distribution parameter configured to represent distribution of different information in the latent feature; and using the distribution parameter to assist in performing arithmetic coding on the latent feature to obtain an encoded latent feature, wherein the performing video compression based on the first position information, the second position information, and the latent feature to obtain a video compressed file comprises: writing the first position information, the second position information, the encoded latent feature, and the distribution parameter into the video compressed file; wherein the performing probabilistic modeling on the latent feature to obtain a distribution parameter comprises: performing hierarchical prior learning on the latent feature to obtain first prior information; performing spatial prior learning on the latent feature to obtain second prior information; performing temporal prior learning on the latent feature to obtain third prior information; and integrating the first prior information, the second prior information, and the third prior information to obtain the distribution parameter; wherein the performing motion estimation based on the first position information and the second position information to obtain motion information of the to-be-processed video frame relative to the previous video frame comprises: performing thin plate spline transformation based on the first position information and the second position information to obtain a thin plate spline transformation matrix; transforming the previous video frame based on the thin plate spline transformation matrix to obtain a transformed image; outputting a contribution graph over a motion network based on the transformed image, the contribution graph configured to represent a contribution of the thin plate spline transformation matrix to motion of each pixel on the previous video frame; and calculating the motion information based on the contribution graph and the thin plate spline transformation matrix; further comprising: splicing the to-be-processed video frame and the previous video frame, and inputting a second splicing result obtained through the splicing into a background motion prediction network to obtain an affine transformation matrix, the affine transformation matrix configured to represent background motion of the to-be-processed video frame relative to the previous video frame, wherein the transforming the previous video frame based on the thin plate spline transformation matrix to obtain a transformed image comprises: transforming the previous video frame by using the thin plate spline transformation matrix and the affine transformation matrix to obtain the transformed image, and wherein the performing video compression based on the first position information, the second position information, and the latent feature to obtain a video compressed file comprises: writing the first position information, the second position information, the latent feature, and the affine transformation matrix into the video compressed file; wherein the outputting a contribution graph over a motion network based on the transformed image comprises: outputting the contribution graph and mask information over the motion network based on the transformed image, and wherein the performing image inpainting based on the motion information and the previous video frame to obtain an initial video frame comprises: performing image inpainting based on the motion information, the mask information, and the previous video frame to obtain the initial video frame; wherein the video compressed file further comprises a distribution parameter, and the latent feature comprised in the video compressed file is a latent feature obtained by arithmetic encoding based on the distribution parameter, and before the performing second inpainting on the initial video frame by using the latent feature to obtain a final video frame, the method further comprises: using the distribution parameter to assist in performing arithmetic decoding on an encoded latent feature to obtain the latent feature.”
The prior art of record fails to explicitly disclose, suggest, or teach the combination of the limitations recited above when considered as a whole. The combination of the above limitations, as presented, distinguishes the independent claims over the prior art, rendering them allowable. No strong motivation is found to combine the prior art of record to teach the combination of said limitations.
Conclusion
13. Any inquiry concerning this communication or earlier communications from the examiner should be directed to ON MUNG whose telephone number is (571) 270-7557 and whose direct fax number is (571) 270-8557. The examiner can normally be reached on Mon-Fri 9am - 6pm (ET).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JAMIE ATALA, can be reached at (571) 272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ON S MUNG/
Primary Examiner, Art Unit 2486