DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 03/16/2026 has been entered.
Response to Arguments
Applicant's arguments filed on 03/16/2026 have been fully considered but they are not persuasive.
In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., the cropped frame is encoded without including the portion of the frame outside of the bounding shape) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). As discussed during the interview, Applicant is attempting to claim that the portion of the frame outside of the bounding shape is not encoded. However, the amended claims merely define the “cropped frame”; there is no recitation of the portion of the frame outside of the bounding shape not being encoded. Furthermore, Applicant’s specification recites in [0052]: The border determination and object tracking will decide a suitable cropping area, with the non-facial background handled separately, potentially on a lower video quality level. Therefore, the claimed invention does not exclude encoding the portion of the frame outside of the bounding shape. Furthermore, Boyce discloses in [0082] that coding of regions of interest at high resolution without coding the full frame provides bit savings and efficiency while providing high quality regions of interest. Boyce shows in FIG. 9 decoded regions of interest without including the background.
Regarding Applicant’s argument, with respect to claim 1, that Boyce does not disclose “removing, based at least on the cropping information, at least a portion of the frame to generate a cropped frame including the at least one feature of interest and without including the portion of the frame outside of the bounding shape,” the examiner respectfully disagrees.
Boyce clearly describes in [0043], cropped images 301-304 (see FIG. 4) are generated without including the portion of the frame outside of the bounding shape (see FIG. 3).
Therefore, Boyce discloses “removing, based at least on the cropping information, at least a portion of the frame to generate a cropped frame including the at least one feature of interest and without including the portion of the frame outside of the bounding shape; and encoding the cropped frame and the cropping information for transmission in at least one bitstream.”
Regarding Applicant’s argument, with respect to claim 12 and claim 17, that Boyce does not disclose "decoding the encoded data to generate decoded data representative of the cropped frame, without including the portion of the frame outside of the bounding shape, and the cropping information, and compositing at least a portion of the cropped frame as foreground in a composited frame at a position determined based at least on the cropping information" and “generating a composite image based at least on data representative of a cropped image and cropping information corresponding to the cropped image, the composite image being generated, without including a portion of the composite image outside of at least one feature of interest, such that at least a portion of the cropped image is included in the composite image at a position and a relative size determined based at least on the cropping information,” the examiner respectfully disagrees.
Boyce discloses in FIG. 9 and [0063], generating a composite frame using only decoded regions of interest as foreground, without including the background.
Therefore, Boyce discloses "decoding the encoded data to generate decoded data representative of the cropped frame, without including the portion of the frame outside of the bounding shape, and the cropping information, and compositing at least a portion of the cropped frame as foreground in a composited frame at a position determined based at least on the cropping information" and “generating a composite image based at least on data representative of a cropped image and cropping information corresponding to the cropped image, the composite image being generated, without including a portion of the composite image outside of at least one feature of interest, such that at least a portion of the cropped image is included in the composite image at a position and a relative size determined based at least on the cropping information.”
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-3, 5, 9-10, 12-14, 16-18 and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Boyce et al. (US 20230067541).
As to claim 1, Boyce discloses a computer-implemented method (FIG. 13) comprising:
determining a location of at least one feature of interest depicted in a frame (FIG. 13 and [0076], step 1301; see FIG. 3 and [0042], ROI detector and extractor 111 of patch-based encoder 100 or ROI detector and extractor 211 of patch-based encoder 200 may detect and extract regions of interest 301, 302, 303, 304 … each of regions of interest 301, 302, 303, 304, as indicated by their bounding boxes are located at particular positions within input image or frame 300);
generating a bounding shape corresponding to the at least one feature of interest (see FIG. 3 and [0042], inclusive of indicating a bounding box around each of regions of interest 301, 302, 303, 304);
determining, based at least on the bounding shape, cropping information corresponding to the frame (see [0037], metadata may provide correspondence, for a patch (e.g., region of interest), between the location (and size) of the patch in the input video and the location (and size) of the patch in one of any number of atlases. For each patch, the information may include the following: original image patch position and size (top, left, width, height) and atlas patch position and size (top, left, width, height); see [0043]);
removing, based at least on the cropping information, at least a portion of the frame to generate a cropped frame including the at least one feature of interest and without including the portion of the frame outside of the bounding shape (FIGS. 3-4 and [0043], each detected region of interest 301, 302, 303, 304 [which are cropped frames without including the background] is put into a rectangular patch, and all of the patches are arranged into atlas 400 as shown. For example, atlas constructor 111 of patch-based encoder 100 or atlas constructor 211 of patch-based encoder 200 may generate atlas 400 of FIG. 4 based on the detected faces (and corresponding patches), detected objects, detected features, or the like of input image or frame 300 illustrated in FIG. 3); and
encoding the cropped frame and the cropping information for transmission in at least one bitstream (FIG. 13 and [0079], step 1304).
As to claim 2, Boyce further discloses wherein the cropping information includes at least one of a dimension of the bounding shape or a position of the bounding shape within the frame (see [0037]).
As to claim 3, Boyce further discloses wherein another dimension of another bounding shape is determined for another frame in a same video content stream as the frame, the dimension being different from the another dimension (see FIG. 6 and [0051], [0054]).
As to claim 5, Boyce further discloses wherein the encoding of the cropping information causes a decoder to generate a decoded representation of the at least one feature of interest based at least on the cropping information (FIG. 13 and [0082]-[0083], steps 1305-1306; FIG. 1, decoder 110).
As to claim 9, Boyce further discloses wherein the cropping information includes a position of the bounding shape within the frame (FIG. 3 and [0037]), and wherein the cropping information is used after decoding to position at least a portion of the cropped frame within a composited frame according to the position of the bounding shape within the frame (FIGS. 9-10).
As to claim 10, Boyce further discloses wherein the video content stream is encoded to be at least substantially compliant with at least one video compression standard from a group of video compression standards comprising: H.264/MPEG-4 Advanced Video Coding (“AVC”), H.265/High Efficiency Video Encoding (“HEVC”), VP8, VP9, AV1, Versatile Video Coding (“VVC”), or MPEG-5/Essential Video Compression (“EVC”) (see [0027], [0032]).
As to claim 12, Boyce discloses a system (FIG. 14, system 1400) comprising:
one or more processors (FIG. 14 and [0073], central processor 1401 may implement one or more of an encoder 1411 (e.g., all or portions of patch-based encoder 100 and/or patch-based encoder 200), a decoder 1412 (e.g., all or portions of patch-based encoder 110 and/or patch-based encoder 210), and machine learning module 1413 (e.g., to implement one or both of machine learning system 120 and machine learning system 220)) configured to:
receive encoded data representative of a cropped frame and cropping information corresponding to the cropped frame (FIG. 13 and [0082], step 1305; FIGS. 3-4), the cropped frame being cropped based at least on:
a dimension of a variably sized bounding shape associated with one or more features of a subject depicted using the cropped frame (FIG. 3 and [0037]; see FIG. 6 and [0051], [0054]);
a removal of at least a portion of a frame outside the bounding shape (FIGS. 3-4 and [0043], region of interest 301, 302, 303, 304 [i.e. cropped frames with removed background]); and
decode the encoded data to generate decoded data representative of the cropped frame, without including the portion of the frame outside of the bounding shape (FIGS. 3-4 and 9, regions of interest 301, 302, 303, 304 [which are decoded cropped frames without including the background]; see [0082]), and the cropping information (FIG. 13 and [0083], step 1306); and
composite at least a portion of the cropped frame as foreground in a composited frame at a position determined based at least on the cropping information (FIG. 9 and [0063]).
As to claim 13, Boyce further discloses wherein the system comprises at least one of: a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing light transport simulation; a system for rendering graphical output; a system using one or more multi-dimensional assets at least partially generated using a collaborative content creation platform; a system implementing digital twin simulation; a system for performing deep learning operations; a system implemented using an edge device; a system incorporating one or more virtual machines (“VMs”); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources (FIG. 14, system 1400; see [0029], [0111]).
As to claim 14, Boyce further discloses wherein the cropping information includes at least one of a dimension of the bounding shape or a position of the bounding shape within an original frame corresponding to the cropped frame (see [0037]).
As to claim 16, Boyce further discloses wherein the position of the cropped frame within the composited frame corresponds to a position of the variably sized bounding shape within an original frame corresponding to the cropped frame (FIGS. 9-10 and [0063], [0065]).
As to claim 17, Boyce discloses a processor (FIG. 14 and [0073], central processor 1401 may implement one or more of an encoder 1411 (e.g., all or portions of patch-based encoder 100 and/or patch-based encoder 200), a decoder 1412 (e.g., all or portions of patch-based encoder 110 and/or patch-based encoder 210), and machine learning module 1413 (e.g., to implement one or both of machine learning system 120 and machine learning system 220)) configured to generate a composite image based at least on data representative of a cropped image and cropping information corresponding to the cropped image (FIG. 9 and [0063], image reconstructor 124 may reconstruct an image or frame 900; see [0065] and FIG. 10), the composite image being generated, without including a portion of the composite image outside of at least one feature of interest, such that at least a portion of the cropped image is included in the composite image at a position and a relative size determined based at least on the cropping information (FIG. 9 and [0063], [0082]).
As to claim 18, Boyce further discloses wherein the cropping information includes at least one of a dimension of a bounding shape or a position of the bounding shape within an original frame corresponding to the cropped image (see [0037]).
As to claim 20, Boyce further discloses wherein the composite image corresponds to one of a video conferencing application or a gaming application (see [0029], [0111]).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 4, 15 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Boyce et al. (US 20230067541) in view of Choi et al. (US 20220417544).
As to claim 4, Boyce fails to explicitly disclose wherein the cropping information is encoded in the at least one bitstream as supplemental enhancement information (“SEI”).
However, Choi teaches wherein the cropping information is encoded in the at least one bitstream as supplemental enhancement information (“SEI”) (FIG. 12, S1220-S1230; FIG. 13, S1310).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Boyce using Choi’s teachings to encode the cropping information in the at least one bitstream as supplemental enhancement information (“SEI”) in order to assist in the decoding and/or display of the encoded bitstream by decoding the determined independent region irrespective of whether the entire current picture is decoded (Choi; [0022] and [0115]).
As to claim 15, Boyce fails to explicitly disclose wherein the cropping information is included in the encoded data as supplemental enhancement information (“SEI”).
However, Choi teaches wherein the cropping information is included in the encoded data as supplemental enhancement information (“SEI”) (FIG. 12, S1220-S1230; FIG. 13, S1310).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Boyce using Choi’s teachings to include the cropping information in the encoded data as supplemental enhancement information (“SEI”) in order to assist in the decoding and/or display of the encoded bitstream by decoding the determined independent region irrespective of whether the entire current picture is decoded (Choi; [0022] and [0115]).
As to claim 19, Boyce fails to explicitly disclose wherein the cropping information is received as a supplemental enhancement information (“SEI”) message.
However, Choi teaches wherein the cropping information is received as a supplemental enhancement information (“SEI”) message (FIG. 12, S1220-S1230; FIG. 13, S1310).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Boyce using Choi’s teachings to receive the cropping information as a supplemental enhancement information (“SEI”) message in order to assist in the decoding and/or display of the encoded bitstream by decoding the determined independent region irrespective of whether the entire current picture is decoded (Choi; [0022] and [0115]).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Boyce et al. (US 20230067541) in view of Jouppi et al. (US 6785402).
As to claim 6, Boyce fails to explicitly disclose wherein the generating the bounding shape includes applying at least one non-linear exponential equation to determine movement of at least one of the bounding shape or a tracking window corresponding to the at least one feature of interest.
However, Jouppi teaches wherein the generating the bounding shape includes applying at least one non-linear exponential equation to determine movement of at least one of the bounding shape or a tracking window corresponding to the at least one feature of interest (col. 6, lines 38-58).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Boyce using Jouppi’s teachings to apply at least one non-linear exponential equation to determine movement of at least one of the bounding shape or a tracking window corresponding to the at least one feature of interest in order to, when there is a large change in head position, make the bounding box move slowly but steadily until it nears the new position and then settle down more slowly to accurately track the user’s head (Jouppi; col. 5, lines 1-3; col. 6, lines 55-58).
Claims 7-8 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Boyce et al. (US 20230067541) in view of Zingade et al. (US 11748845).
As to claim 7, although Boyce further discloses wherein the generating the bounding shape includes resizing a prior bounding shape corresponding to one or more prior frames (see FIG. 6 and [0051], [0054]), Boyce fails to explicitly disclose wherein the generating the bounding shape includes resizing a prior bounding shape corresponding to one or more prior frames based at least on one or more of a zoom operation or a pan operation with respect to the at least one feature of interest.
However, Zingade teaches wherein the generating the bounding shape includes resizing a prior bounding shape corresponding to one or more prior frames based at least on one or more of a zoom operation or a pan operation with respect to the at least one feature of interest (see FIGS. 2-4; col. 7, lines 39-44; col. 11, line 54 to col. 12, line 19; col. 15, lines 17-21).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Boyce using Zingade’s teachings to include resizing a prior bounding shape corresponding to one or more prior frames based at least on one or more of a zoom operation or a pan operation with respect to the at least one feature of interest in order to enhance the resultant image quality of the modified video segment and improve the video conference (Zingade; col. 8, lines 40-43; col. 15, lines 17-21).
As to claim 8, the combination of Boyce and Zingade further discloses further comprising: maintaining the at least one feature of interest within the bounding shape across a plurality of frames including the frame at least by dynamically resizing the bounding shape for at least one frame of the plurality of frames (Boyce; FIG. 6 and [0051], [0054]).
As to claim 11, although Boyce further discloses further comprising: applying at least a portion of data from the bitstream to a machine learning system (FIGS. 1-2, machine learning system 120, 220), Boyce fails to explicitly disclose applying at least a portion of data from the bitstream to a neural network to cause the neural network to perform at least one of video frame inferencing, video frame generation, video frame reconstruction, or adjustment of a two-dimensional or three-dimensional characteristic of an object of interest.
However, Zingade teaches applying at least a portion of data from the bitstream to a neural network to cause the neural network to perform at least one of video frame inferencing, video frame generation, video frame reconstruction, or adjustment of a two-dimensional or three-dimensional characteristic of an object of interest (col. 11, line 54 to col. 12, line 19).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Boyce using Zingade’s teachings to include applying at least a portion of data from the bitstream to a neural network to cause the neural network to perform at least one of video frame inferencing, video frame generation, video frame reconstruction, or adjustment of a two-dimensional or three-dimensional characteristic of an object of interest in order to enhance the resultant image quality of the modified video segment (Zingade; col. 12, lines 18-19).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BOUBACAR ABDOU TCHOUSSOU whose telephone number is (571)272-7625. The examiner can normally be reached M-F 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chris Kelley, can be reached at 571-272-7331. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BOUBACAR ABDOU TCHOUSSOU/Primary Examiner, Art Unit 2482