DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/09/2025 has been entered.
Status of Claims
Applicant’s Amendments filed on 12/09/2025 have been entered and made of record.
Currently pending claim(s): 1–20
Independent claim(s): 1, 10, 14
Amended claim(s): 1, 5–6, 10–12, 14
Response to Arguments
This office action is responsive to Applicant’s Arguments/Remarks Made in an Amendment received on 12/09/2025.
Applicant’s Arguments/Remarks (December 9, 2025) include substantive amendments to the claims. This Office action has been updated with new grounds of rejection addressing those amendments. Further, Applicant’s Arguments/Remarks with respect to independent claims 1, 10, and 14, on pages 1–3, have been considered but are moot because the arguments do not apply to the combination of references used in the current rejection; the amended limitations are now addressed by the newly cited art, Sun et al. (“Deep video matting via spatio-temporal alignment and aggregation”) and Chen et al. (“Robust multi-focus image fusion using edge model and multi-matting”). Furthermore, the dependent claims are rejected under 35 U.S.C. 103 based on the new grounds of rejection of the independent claims from which they depend.
Furthermore, Applicant argues, on page 2, that Cahill “generates a binary mask for compositing, not an alpha matte with continuous alpha values.” However, the Examiner disagrees. Cahill (US 2004/0062439 A1) teaches the alignment of features using information from the boundary regions between the foreground object of interest and the background [¶0048], while Sun (“Deep video matting via spatio-temporal alignment and aggregation”) teaches aligning features and generating an alpha matte [Figure 3; pg. 6975, left column, Temporal Feature Alignment Module, second paragraph; right column, Feature Fusion Module, second paragraph], and Chen (“Robust multi-focus image fusion using edge model and multi-matting”) teaches continuous alpha values [pg. 1530, left column, 1) Image Matting and Application in Multi-Focus Fusion, first paragraph].
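To illustrate the distinction at issue, the following is a minimal sketch (the examiner's illustration only, not code from any cited reference) of the standard compositing relation I = αF + (1 − α)B that Chen's matting model instantiates: a binary mask restricts α to {0, 1}, whereas a continuous alpha matte permits fractional α at boundary pixels.

```python
import numpy as np

def composite(fg, bg, alpha):
    # Standard compositing relation: I = alpha * F + (1 - alpha) * B
    a = alpha[..., None]          # broadcast (H, W) matte over color channels
    return a * fg + (1.0 - a) * bg

h, w = 4, 4
fg = np.ones((h, w, 3))                                    # white foreground
bg = np.zeros((h, w, 3))                                   # black background
binary_mask = (np.random.rand(h, w) > 0.5).astype(float)   # only 0 or 1 per pixel
alpha_matte = np.random.rand(h, w)                         # any value in [0, 1] per pixel
hard = composite(fg, bg, binary_mask)   # every pixel is fully foreground or background
soft = composite(fg, bg, alpha_matte)   # boundary pixels blend fractionally
```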
Positive Statement regarding 35 U.S.C. 101: Claims 1–9 are determined to be eligible under 35 U.S.C. 101. These claims recite “One or more computer storage media storing computer-usable instructions...” The Specification at page 26 excludes the claimed computer storage media from being a transitory embodiment. Page 26, at paragraph [0062], states: “Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 1300. Computer storage media does not comprise signals per se.” [Emphasis added]
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3–8, 10, 14, and 16–20 are rejected under 35 U.S.C. 103 as being unpatentable over Sun et al. (Sun, Yanan, et al. "Deep video matting via spatio-temporal alignment and aggregation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021) (hereafter, “Sun”) in view of Cahill (US 2004/0062439 A1) (hereafter, “Cahill”) and further in view of Chen et al. (Chen, Yibo, Jingwei Guan, and Wai-Kuen Cham. "Robust multi-focus image fusion using edge model and multi-matting." IEEE Transactions on Image Processing 27.3 (2017): 1526-1541.) (hereafter, “Chen”).
Regarding claim 1, Sun discloses receiving an image burst comprising a set of images of a scene having a foreground object of interest and a background [we create a new video (the examiner interprets a video to be an image burst) matting dataset, which is composed of real foreground videos, their groundtruth alpha mattes, and background videos of a great variety of natural and real-life scenes ... for the test set, we similarly combine each object from 50 images and 12 videos with 4 background videos, thus generating 248 test samples, pg. 6973, 3.1 Composited Dataset, left column, first paragraph; right column, third paragraph]; selecting a reference image from the set of images for matte generation, wherein the foreground object of interest is present in the reference image [Figure 2; after generating coarse trimaps for the target frame, pg. 6974, right column, 4.2 Encoder-Decoder Network, first paragraph ... our video matting framework. The lightweight trimap propagation network generates trimap of target frame It, pg. 6973, Figure 2 citation of 2.2 Video Matting]; aligning features of the reference image with features of other images from the set of images to provide aligned features [wherein the aligning comprises using information from boundary regions between the foreground object of interest and the background in the other images to resolve ambiguities in boundary regions between the foreground object of interest and the background in the reference image] [Figure 3; we make our model aware of motion information by aligning the features of neighboring frames with features of target frame, pg. 6975, left column, Temporal Feature Alignment Module, second paragraph]; and generating, using the aligned features between the reference image and the other images, an alpha matte for the reference image [that comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background, such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image] [Figure 2 & 3; we apply a prediction head, composed of a 3 × 3 convolution and a sigmoid function, to generate the alpha matte for target frame, pg. 6975, right column, Temporal Feature Fusion Module, second paragraph].
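For illustration only, the following is a minimal PyTorch-style sketch of a prediction head of the kind Sun describes (a 3 × 3 convolution followed by a sigmoid); the channel count and class name are illustrative assumptions, not Sun's code.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    # A 3x3 convolution followed by a sigmoid, per Sun's description of the
    # prediction head; in_channels=64 is an illustrative assumption.
    def __init__(self, in_channels=64):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)

    def forward(self, aggregated_features):
        # The sigmoid bounds every pixel to (0, 1), i.e. a continuous alpha matte.
        return torch.sigmoid(self.conv(aggregated_features))

head = PredictionHead()
alpha = head(torch.randn(1, 64, 128, 128))  # -> (1, 1, 128, 128) alpha matte
```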
Sun fails to explicitly disclose one or more computer storage media storing computer-usable instructions that, when used by a computing device, cause the computing device to perform operations, the operations comprising: [aligning features of the reference image with features of other images from the set of images to provide aligned features] wherein the aligning comprises using information from boundary regions between the foreground object of interest and the background in the other images to resolve ambiguities in boundary regions between the foreground object of interest and the background in the reference image; [and generating, using the aligned features between the reference image and the other images, an alpha matte for the reference image] that comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background, such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image.
However, Cahill teaches [aligning features of the reference image with features of other images from the set of images to provide aligned features] wherein the aligning comprises using information from boundary regions between the foreground object of interest and the background in the other images to resolve ambiguities in boundary regions between the foreground object of interest and the background in the reference image [ambiguities can develop in boundary regions between foreground and background. These contour areas are refined (the examiner interprets the refined contour areas to be aligning) using the full resolution image 102 and image 116 by generating a band of boundary pixels, para 0048].
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun’s reference and incorporate the teachings of Cahill to effectively distinguish the foreground from the background, as recognized by Cahill.
Neither Sun nor Cahill appears to explicitly disclose one or more computer storage media storing computer-usable instructions that, when used by a computing device, cause the computing device to perform operations; [generating, using the aligned features between the reference image and the other images, an alpha matte for the reference image] that comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background, such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image.
However, Chen teaches one or more computer storage media storing computer-usable instructions that, when used by a computing device, cause the computing device to perform operations [all experiments were performed using Matlab R2013b on a computer equipped with a 3.10 GHz CPU and 8 GB memory, pg. 1539, right column, 4) Computational Efficiency Analysis, first paragraph]; [and generating, using the aligned features between the reference image and the other images, an alpha matte for the reference image] that comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background [foreground and background will be represented as layers Ip1 and Ip2 respectively ... where α is the opacity of layer Ip1 called alpha matte. α(i) = 1 or 0 means the i-th pixel belongs to layer Ip1 or layer Ip2, respectively, pg. 1530, left column, 1) Image Matting and Application in Multi-Focus Fusion, first paragraph], such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image [Figure 3b; Fig. 3b focuses in foreground (a small clock), pg. 1529, left column, III. Problem Formulation and Proposed Approach, second paragraph ... the focused region can only be accurately extracted from Ik, pg. 1530, right column, 2) Proposed Multi-Matting Model, second paragraph].
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun’s reference in view of Cahill and incorporate the teachings of Chen with alpha values to discriminate between regions, as recognized by Chen.
Further, one skilled in the art could have combined the elements as described above with known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Chen with Sun and Cahill to obtain the invention as specified in claim 1.
Regarding claim 3, which claim 1 is incorporated, Sun discloses wherein aligning the features of the reference image with the features of the other images comprises: causing an encoder to generate a feature map for the reference image and feature maps for the other images [our deep video matting framework employs an effective auto encoder-decoder structure to extract features of multiple image-trimap pairs. We first apply an encoder network to extract both low-level structural features and high-level semantic features of pixels, pg. 6974, right column, 4.2. Encoder-Decoder Network, first paragraph]; and causing a machine learning model to generate the aligned features using the feature map for the reference image and the feature maps for the other images [we make our model aware of motion information by aligning features of neighboring frames with features of target frame, pg. 6975, left column, Temporal Feature Alignment Module, second paragraph]; and wherein generating the alpha matte for the reference image comprises causing a decoder to generate the alpha matte using the aligned features [Figure 2; the spatio-temporal feature aggregation module (ST-FAM) analyzes and fuses information at different levels from neighboring frames {It−n, ··· , It+n} and progressively outputs aggregated features to be fed into corresponding layers of the decoder, pg. 6973, Figure 2 citation of section 2.2. Video Matting ... after deriving useful features aggregated in both spatial and temporal dimensions from our decoder, we apply a prediction head, composed of a 3 × 3 convolution and a sigmoid function, to generate the alpha matte for target frame, pg. 6975, right column, Temporal Feature Fusion Module, second paragraph].
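For illustration only, the encode-align-decode flow mapped above can be sketched as follows; the callables are placeholders for the networks Sun describes, not actual implementations.

```python
def generate_alpha_matte(reference, neighbors, encoder, align, decoder):
    # Illustrative control flow only; encoder, align, and decoder stand in
    # for the networks Sun describes and are not Sun's actual modules.
    ref_feat = encoder(reference)                       # feature map for the reference image
    nbr_feats = [encoder(img) for img in neighbors]     # feature maps for the other images
    aligned = [align(f, ref_feat) for f in nbr_feats]   # align neighbor features to the reference
    return decoder(ref_feat, aligned)                   # decode aligned features into the alpha matte
```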
Regarding claim 4, which claim 1 is incorporated, Sun discloses wherein aligning the features of the reference image with the features of the other images comprises: generating a preliminary matte for each image from the set of images [Figure 2-4; the trimap propagation module predicts the trimap for a target frame, pg. 6974, left column, 4. Method, first paragraph ... this encoder network receives multiple frames with corresponding propagated trimaps, pg. 6975, left column, 4.2 Encoder-Decoder Network, first paragraph]; and aligning features of the preliminary matte for the reference image and features of the preliminary matte for the other images [we not only need to consider global context information and local detailed structural information in a single frame, but also need to incorporate motion information of moving pixels by utilizing temporal information from neighboring features, pg. 6975, left column, 4.3. Spatio-Temporal Feature Aggregation Module, first paragraph ... we make our model aware of motion information by aligning features of neighboring frames with features of target frame, pg. 6975, left column, Temporal Feature Alignment Module, second paragraph].
Regarding claim 5, which claim 1 is incorporated, Sun discloses identifying the boundary regions in the reference image and the boundary regions in the other images [Figure 2; these aggregated features are decoded to classify all pixels into three categories, i.e., foreground, background or unknown (the examiner interprets unknown to be boundary regions), through a classification head, pg. 6974, right column, 4.1. Trimap Propagation, first paragraph].
Sun fails to explicitly disclose wherein the aligned features are from comparison of the boundary regions in the reference image with the boundary regions in the other images.
However, Cahill teaches identifying the boundary regions in the reference image and the boundary regions in the other images [overlap certain portions of the first and second images in the display 904 to enhance the match up process, para 0051], wherein the aligned features are from comparison of the boundary regions in the reference image with the boundary regions in the other images [not only the zone borders but all or a portion of the second peripheral zone 910 can be used to help with the match, para 0051].
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun and incorporate the teachings of Cahill to effectively distinguish the foreground from the background, as recognized by Cahill.
Further, one skilled in the art could have combined the elements as described above with known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Cahill with Sun to obtain the invention as specified in claim 5.
Regarding claim 6, which claim 5 is incorporated, Sun discloses wherein the boundary regions in the reference image and the boundary regions in the other images are determined using a trimap [Figure 2; the trimap propagation module predicts the trimap for a target frame, pg. 6974, left column, 4. Method, first paragraph ... our deep video matting framework employs an effective auto encoder-decoder structure to extract features of multiple image-trimap pairs, pg. 6974, right column, 4.2. Encoder-Decoder Network, first paragraph].
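For illustration only, a common way to derive the three trimap regions (definite foreground, definite background, and the unknown/boundary band) from a binary mask is by morphological erosion and dilation; this OpenCV sketch is the examiner's illustration, not code from any cited reference, and the band width is an assumed parameter.

```python
import cv2
import numpy as np

def trimap_from_mask(mask, band=10):
    # mask: uint8 binary segmentation, 255 = object, 0 = elsewhere.
    kernel = np.ones((band, band), np.uint8)
    fg = cv2.erode(mask, kernel)         # shrink mask -> definite foreground
    outer = cv2.dilate(mask, kernel)     # grow mask -> outer edge of the uncertain band
    trimap = np.zeros_like(mask)         # definite background = 0 (black)
    trimap[outer > 0] = 128              # unknown/boundary band = 128 (grey)
    trimap[fg > 0] = 255                 # definite foreground = 255 (white)
    return trimap
```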
Regarding claim 7, which claim 1 is incorporated, Sun discloses wherein generating the alpha matte for the reference image using the aligned features comprises: generating a background image using the aligned features [learning offset and aligning features between t and t + Δt enable our model to automatically map identical or similar regions and pixels, pg. 6975, right column, Temporal Feature Alignment Module, first paragraph]; and generating the alpha matte using the reference image and the background image [after deriving useful features aggregated in both spatial and temporal dimensions from our decoder, we apply a prediction head, composed of a 3 × 3 convolution and a sigmoid function, to generate the alpha matte for target frame, pg. 6975, right column, Temporal Feature Fusion Module, second paragraph].
Regarding claim 8, which claim 1 is incorporated, Sun discloses wherein generating the alpha matte for the reference image using the aligned features comprises: generating a foreground image using the aligned features [learning offset and aligning features between t and t + Δt enable our model to automatically map identical or similar regions and pixels, pg. 6975, right column, Temporal Feature Alignment Module, first paragraph]; and generating the alpha matte using the reference image and the foreground image [after deriving useful features aggregated in both spatial and temporal dimensions from our decoder, we apply a prediction head, composed of a 3 × 3 convolution and a sigmoid function, to generate the alpha matte for target frame, pg. 6975, right column, Temporal Feature Fusion Module, second paragraph].
Regarding claim 10, Sun discloses receiving an image burst comprising a set of images of a scene having a foreground object of interest and a background [we create a new video (the examiner interprets a video to be an image burst) matting dataset, which is composed of real foreground videos, their groundtruth alpha mattes, and background videos of a great variety of natural and real-life scenes ... for the test set, we similarly combine each object from 50 images and 12 videos with 4 background videos, thus generating 248 test samples, pg. 6973, 3.1 Composited Dataset, left column, first paragraph; right column, third paragraph]; selecting a reference image from the set of images for matte generation, wherein the foreground object of interest is present in the reference image [Figure 2; after generating coarse trimaps for the target frame, pg. 6974, right column, 4.2 Encoder-Decoder Network, first paragraph ... our video matting framework. The lightweight trimap propagation network generates trimap of target frame It, pg. 6973, Figure 2 citation of 2.2 Video Matting]; generating a background reconstruction from the set of images using feature alignment information generated by aligning portions of the reference image with portions of other images from the set of images [wherein the aligning comprises using information from boundary regions between the foreground object of interest and the background in the other images to resolve ambiguities in boundary regions between the foreground object of interest and the background in the reference image] [Figure 3; we make our model aware of motion information by aligning the features of neighboring frames with features of target frame, pg. 6975, left column, Temporal Feature Alignment Module, second paragraph ... learning offset and aligning features between t and t + Δt enable our model to automatically map identical or similar regions and pixels, pg. 6975, right column, Temporal Feature Alignment Module, first paragraph]; and generating an alpha matte for a reference image from the set of images using the reference image and the background reconstruction [wherein the alpha matte comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background, such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image] [Figure 2 & 3; we apply a prediction head, composed of a 3 × 3 convolution and a sigmoid function, to generate the alpha matte for target frame, pg. 6975, right column, Temporal Feature Fusion Module, second paragraph].
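For illustration only, a background reconstruction of the claimed kind can be sketched as a per-pixel median over burst frames already aligned to the reference, ignoring foreground pixels; this procedure is the examiner's illustration under stated assumptions and is not taken verbatim from any cited reference.

```python
import numpy as np

def reconstruct_background(aligned_frames, fg_masks):
    # aligned_frames: list of (H, W, 3) frames already warped to the reference.
    # fg_masks: list of (H, W) boolean arrays, True where the foreground object is.
    stack = np.stack(aligned_frames).astype(float)       # (N, H, W, 3)
    masks = np.stack(fg_masks)[..., None]                # (N, H, W, 1)
    stack[np.broadcast_to(masks, stack.shape)] = np.nan  # discard foreground observations
    # Median over the frames that actually saw the background at each pixel;
    # pixels occluded in the reference are filled from the other burst frames.
    return np.nanmedian(stack, axis=0)
```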
Sun fails to explicitly disclose a computer-implemented method comprising: [generating a background reconstruction from the set of images using feature alignment information generated by aligning portions of the reference image with portions of other images from the set of images,] wherein the aligning comprises using information from boundary regions between the foreground object of interest and the background in the other images to resolve ambiguities in boundary regions between the foreground object of interest and the background in the reference image; [and generating an alpha matte for a reference image from the set of images using the reference image and the background reconstruction] wherein the alpha matte comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background, such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image.
However, Cahill teaches [generating a background reconstruction from the set of images using feature alignment information generated by aligning portions of the reference image with portions of other images from the set of images,] wherein the aligning comprises using information from boundary regions between the foreground object of interest and the background in the other images to resolve ambiguities in boundary regions between the foreground object of interest and the background in the reference image [ambiguities can develop in boundary regions between foreground and background. These contour areas are refined (the examiner interprets the refined contour areas to be aligning) using the full resolution image 102 and image 116 by generating a band of boundary pixels, para 0048].
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun and incorporate the teachings of Cahill to effectively distinguish the foreground from the background, as recognized by Cahill.
Neither Sun nor Cahill appears to explicitly disclose a computer-implemented method comprising: [generating an alpha matte for a reference image from the set of images using the reference image and the background reconstruction] wherein the alpha matte comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background, such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image.
However, Chen teaches a computer-implemented method [all experiments were performed using Matlab R2013b on a computer equipped with a 3.10 GHz CPU and 8 GB memory, pg. 1539, right column, 4) Computational Efficiency Analysis, first paragraph] comprising: [generating an alpha matte for a reference image from the set of images using the reference image and the background reconstruction] wherein the alpha matte comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background [foreground and background will be represented as layers Ip1 and Ip2 respectively ... where α is the opacity of layer Ip1 called alpha matte. α(i) = 1 or 0 means the i-th pixel belongs to layer Ip1 or layer Ip2, respectively, pg. 1530, left column, 1) Image Matting and Application in Multi-Focus Fusion, first paragraph], such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image [Figure 3b; Fig. 3b focuses in foreground (a small clock), pg. 1529, left column, III. Problem Formulation and Proposed Approach, second paragraph ... the focused region can only be accurately extracted from Ik, pg. 1530, right column, 2) Proposed Multi-Matting Model, second paragraph].
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun’s reference in view of Cahill and incorporate the teachings of Chen with alpha values to discriminate between regions, as recognized by Chen. Further, one skilled in the art could have combined the elements as described above with known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Chen with Sun and Cahill to obtain the invention as specified in claim 10.
Regarding claim 14, Sun discloses receiving an image burst comprising a set of images of a scene having a foreground object of interest and a background, the set of images including a reference image and a plurality of burst images [we create a new video (the examiner interprets a video to be an image burst) matting dataset, which is composed of real foreground videos, their groundtruth alpha mattes, and background videos of a great variety of natural and real-life scenes ... for the test set, we similarly combine each object from 50 images and 12 videos with 4 background videos, thus generating 248 test samples, pg. 6973, 3.1 Composited Dataset, left column, first paragraph; right column, third paragraph], wherein the foreground object of interest is present in the reference image [Figure 2; our video matting framework. The lightweight trimap propagation network generates trimap of target frame It, pg. 6973, Figure 2 citation of 2.2 Video Matting]; determining feature alignment information by aligning portions of the reference image with portions of the burst images [wherein the aligning comprises using information from boundary regions between the foreground object of interest and the background in the burst images to resolve ambiguities in boundary regions between the foreground object of interest and the background in the reference image] [Figure 3; we make our model aware of motion information by aligning the features of neighboring frames with features of target frame, pg. 6975, left column, Temporal Feature Alignment Module, second paragraph]; and generating, using the feature alignment information between the reference image and the burst images, an alpha matte for the reference image [that comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background, such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image] [Figure 2 & 3; we apply a prediction head, composed of a 3 × 3 convolution and a sigmoid function, to generate the alpha matte for target frame, pg. 6975, right column, Temporal Feature Fusion Module, second paragraph].
Sun fails to explicitly disclose a computer system comprising: one or more processors; and one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to perform operations comprising: [aligning features of the reference image with features of other images from the set of images to provide aligned features] wherein the aligning comprises using information from boundary regions between the foreground object of interest and the background in the other images to resolve ambiguities in boundary regions between the foreground object of interest and the background in the reference image; [and generating, using the aligned features between the reference image and the other images, an alpha matte for the reference image] that comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background, such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image.
However, Cahill teaches [aligning features of the reference image with features of other images from the set of images to provide aligned features] wherein the aligning comprises using information from boundary regions between the foreground object of interest and the background in the other images to resolve ambiguities in boundary regions between the foreground object of interest and the background in the reference image [ambiguities can develop in boundary regions between foreground and background. These contour areas are refined (the examiner interprets the refined contour areas to be aligning) using the full resolution image 102 and image 116 by generating a band of boundary pixels, para 0048].
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun’s reference and incorporate the teachings of Cahill to effectively distinguish the foreground from the background, as recognized by Cahill.
Neither Sun nor Cahill appears to explicitly disclose a computer system comprising: one or more processors; and one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to perform operations comprising: [generating, using the aligned features between the reference image and the other images, an alpha matte for the reference image] that comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background, such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image.
However, Chen teaches a computer system comprising: one or more processors [a computer equipped with a 3.10 GHz CPU, pg. 1539, right column, 4) Computational Efficiency Analysis, first paragraph]; and one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to perform operations [all experiments were performed using Matlab R2013b on a computer equipped with a 3.10 GHz CPU and 8 GB memory, pg. 1539, right column, 4) Computational Efficiency Analysis, first paragraph] comprising: [generating, using the aligned features between the reference image and the other images, an alpha matte for the reference image] that comprises alpha values indicating high opacity for pixels of the reference image for the foreground object of interest and alpha values indicating low opacity for pixels of the reference image for the background [foreground and background will be represented as layers Ip1 and Ip2 respectively ... where α is the opacity of layer Ip1 called alpha matte. α(i) = 1 or 0 means the i-th pixel belongs to layer Ip1 or layer Ip2, respectively, pg. 1530, left column, 1) Image Matting and Application in Multi-Focus Fusion, first paragraph], such that the alpha matte retains the foreground object of interest and removes the background when applied to the reference image [Figure 3b; Fig. 3b focuses in foreground (a small clock), pg. 1529, left column, III. Problem Formulation and Proposed Approach, second paragraph ... the focused region can only be accurately extracted from Ik, pg. 1530, right column, 2) Proposed Multi-Matting Model, second paragraph].
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun’s reference in view of Cahill and incorporate the teachings of Chen with alpha values to discriminate between regions, as recognized by Chen.
Further, one skilled in the art could have combined the elements as described above with known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Chen with Sun and Cahill to obtain the invention as specified in claim 14.
Regarding claim 16, which claim 14 is incorporated, Sun discloses wherein determining the feature alignment information comprises: generating feature maps for the reference image and the burst images using an encoder [Figure 2; our deep video matting framework employs an effective auto encoder-decoder structure to extract features of multiple image-trimap pairs. We first apply an encoder network to extract both low-level structural features and high-level semantic features of pixels, pg. 6974, right column, 4.2. Encoder-Decoder Network, first paragraph]; and generating the feature alignment information using a first machine learning network and the feature maps [Figure 2; we make our model aware of motion information by aligning features of neighboring frames with features of target frame, pg. 6975, left column, Temporal Feature Alignment Module, second paragraph].
Regarding claim 17, which claim 14 is incorporated, Sun discloses wherein determining the feature alignment information comprises: generating a preliminary matte for the reference image and the burst images [Figure 2-4; the trimap propagation module predicts the trimap for a target frame, pg. 6974, left column, 4. Method, first paragraph ... this encoder network receives multiple frames with corresponding propagated trimaps, pg. 6975, left column, 4.2 Encoder-Decoder Network, first paragraph]; and aligning features of the preliminary matte for the reference image and features of the preliminary matte for the burst images [we not only need to consider global context information and local detailed structural information in a single frame, but also need to incorporate motion information of moving pixels by utilizing temporal information from neighboring features, pg. 6975, left column, 4.3. Spatio-Temporal Feature Aggregation Module, first paragraph ... we make our model aware of motion information by aligning features of neighboring frames with features of target frame, pg. 6975, left column, Temporal Feature Alignment Module, second paragraph].
Regarding claim 18, which claim 14 is incorporated, Sun discloses wherein the portions of the reference image and the portions of the burst images are determined using a trimap [Figure 2; the trimap propagation module predicts the trimap for a target frame, pg. 6974, left column, 4. Method, first paragraph ... our deep video matting framework employs an effective auto encoder-decoder structure to extract features of multiple image-trimap pairs, pg. 6974, right column, 4.2. Encoder-Decoder Network, first paragraph].
Regarding claim 19, which claim 14 is incorporated, Sun discloses generating a background image using the feature alignment information [learning offset and aligning features between t and t + Δt enable our model to automatically map identical or similar regions and pixels, pg. 6975, right column, Temporal Feature Alignment Module, first paragraph]; and generating the matte using the reference image and the background image [after deriving useful features aggregated in both spatial and temporal dimensions from our decoder, we apply a prediction head, composed of a 3 × 3 convolution and a sigmoid function, to generate the alpha matte for target frame, pg. 6975, right column, Temporal Feature Fusion Module, second paragraph].
Regarding claim 20, which claim 14 is incorporated, Sun discloses generating a foreground image using the feature alignment information [learning offset and aligning features between t and t + Δt enable our model to automatically map identical or similar regions and pixels, pg. 6975, right column, Temporal Feature Alignment Module, first paragraph]; and generating the matte using the reference image and the foreground image [after deriving useful features aggregated in both spatial and temporal dimensions from our decoder, we apply a prediction head, composed of a 3 × 3 convolution and a sigmoid function, to generate the alpha matte for target frame, pg. 6975, right column, Temporal Feature Fusion Module, second paragraph].
Claims 2, 9, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Sun (“Deep video matting via spatio-temporal alignment and aggregation”) in view of Cahill (US 2004/0062439 A1) and further in view of Chen (“Robust multi-focus image fusion using edge model and multi-matting”), as applied above, and further in view of Bhat et al. (“Deep burst super-resolution.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021) (hereafter, “Bhat”).
Regarding claim 2, which claim 1 is incorporated, Sun discloses wherein generating the alpha matte for the reference image comprises causing a second machine learning model to generate the alpha matte using the reference image and the aligned features [Figure 3; ST-FAM is composed of temporal feature alignment (TFA) and temporal feature fusion (TFF), which are respectively responsible for aligning and aggregating features of different frames, pg. 6974, Figure 3 citation of section 3.2. Real-World High-Resolution Videos ... after deriving useful features aggregated in both spatial and temporal dimensions from our decoder, we apply a prediction head, composed of a 3 × 3 convolution and a sigmoid function, to generate the alpha matte for target frame, pg. 6975, right column, Temporal Feature Fusion Module, second paragraph].
Neither Sun, Cahill, nor Chen appears to explicitly disclose wherein aligning the features of the reference image with the features of the other images comprises causing a first machine learning model to generate the aligned features using the reference image and the other images.
However, Bhat discloses wherein aligning the features of the reference image with the features of the other images comprises causing a first machine learning model to generate the aligned features using the reference image and the other images [Figure 2; explicitly aligning the individual image embeddings ei to a common reference LR image, called the base frame ... we allow greater flexibility in our alignment module by computing dense pixel-wise optical flow fi ∈ R^(W/2 × H/2 × 2) between every burst image ˜bi and the reference image ˜b1 ... we use a state-of-the-art optical flow network PWC-Net [38] as our flow estimator F due to its high accuracy and speed, pg. 9211, 3.2. Alignment Module].
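For illustration only, a minimal PyTorch-style sketch of flow-based feature alignment of the general kind Bhat describes (warping each burst frame's features to the base frame using estimated optical flow); the flow estimator is treated as given, and all identifiers are illustrative assumptions rather than Bhat's code.

```python
import torch
import torch.nn.functional as F

def warp_to_reference(feat, flow):
    # feat: (B, C, H, W) features of one burst frame.
    # flow: (B, 2, H, W) per-pixel displacement toward the reference frame,
    #       as produced by a flow estimator such as PWC-Net (treated as given).
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=feat.device),
                            torch.arange(w, device=feat.device), indexing="ij")
    base = torch.stack((xs, ys)).float()        # (2, H, W) pixel coordinates
    coords = base.unsqueeze(0) + flow           # absolute sampling positions
    # grid_sample expects coordinates normalized to [-1, 1]
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)        # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)
```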
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun in view of Cahill and further in view of Chen and incorporate the teachings of Bhat with a first machine learning model to resolve the displacement between the image frames and improve the final result, as recognized by Bhat.
Further, one skilled in the art could have combined the elements as described above with known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Bhat with Sun, Cahill, and Chen to obtain the invention as specified in claim 2.
Regarding claim 9, which claim 1 is incorporated, neither Sun, Cahill, nor Chen appears to explicitly disclose wherein the set of images comprises raw images.
However, Bhat discloses wherein the set of images comprises raw images [Figure 2; Our network inputs multiple noisy, RAW, low-resolution (LR) images captured in a single burst, pg. 9211, 3. Burst Super-Resolution Network].
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun in view of Cahill and further in view of Chen and incorporate the teachings of Bhat to generate a high-resolution image as the output, as recognized by Bhat.
Further, one skilled in the art could have combined the elements as described above with known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Bhat with Sun, Cahill, and Chen to obtain the invention as specified in claim 9.
Regarding claim 13 (drawn to a computer-implemented method), the proposed combination of Sun in view of Cahill and further in view of Chen and Bhat, explained in the rejection of computer storage media claim 9, renders obvious the steps of the computer-implemented method of claim 13 because these steps occur in the operation of the computer storage media as discussed above. Thus, arguments similar to those presented above for claim 9 are equally applicable to claim 13.
Regarding claim 15, which claim 14 is incorporated, Sun discloses wherein generating the alpha matte for the reference image comprises generating the alpha matte using a second machine learning model [Figure 3; ST-FAM is composed of temporal feature alignment (TFA) and temporal feature fusion (TFF), which are respectively responsible for aligning and aggregating features of different frames, pg. 6974, Figure 3 citation of section 3.2. Real-World High-Resolution Videos ... after deriving useful features aggregated in both spatial and temporal dimensions from our decoder, we apply a prediction head, composed of a 3 × 3 convolution and a sigmoid function, to generate the alpha matte for target frame, pg. 6975, right column, Temporal Feature Fusion Module, second paragraph].
Neither Sun, Cahill, nor Chen appears to explicitly disclose wherein determining the feature alignment information comprises generating the feature alignment information using a first machine learning model.
However, Bhat discloses wherein determining the feature alignment information comprises generating the feature alignment information using a first machine learning model [Figure 2; explicitly aligning the individual image embeddings ei to a common reference LR image, called the base frame ... we allow greater flexibility in our alignment module by computing dense pixel-wise optical flow fi ∈ R^(W/2 × H/2 × 2) between every burst image ˜bi and the reference image ˜b1 ... we use a state-of-the-art optical flow network PWC-Net [38] as our flow estimator F due to its high accuracy and speed, pg. 9211, 3.2. Alignment Module].
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun in view of Cahill and further in view of Chen and incorporate the teachings of Bhat with a first machine learning model to align the deep feature encodings of each image, as recognized by Bhat.
Further, one skilled in the art could have combined the elements as described above with known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Bhat with Sun, Cahill, and Chen to obtain the invention as specified in claim 15.
Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Sun (“Deep video matting via spatio-temporal alignment and aggregation”) in view of Cahill (US 2004/0062439 A1) and further in view of Chen (“Robust multi-focus image fusion using edge model and multi-matting”), as applied above, and further in view of Goel (US 2021/0166400 A1).
Regarding claim 11, which claim 10 is incorporated, neither Sun, Cahill, nor Chen appears to explicitly disclose wherein the background reconstruction is generated for the boundary regions between the foreground object and the background in the reference image.
However, Goel teaches wherein the background reconstruction is generated for the boundary regions between the foreground object and the background in the reference image [Figure 1; the image processing system (102) generates a tri-map (107) for each of the one or more objects (104) in the image (103) ... the tri-map (107) of the image (103) is indicative of a definite background (i.e., black color), a definite foreground (i.e., white color) and an unknown region (i.e., grey color) (the examiner interprets the unknown region to be a boundary between a foreground object and background), para 0046].
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Sun in view of Cahill and further in view of Chen and incorporate the teachings of Goel to determine whether each pixel belongs to the foreground or the background, as recognized by Goel.
Further, one skilled in the art could have combined the elements as described above with known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Goel with Sun, Cahill, and Chen to obtain the invention as specified in claim 11.
Regarding claim 12, which claim 11 is incorporated, Sun discloses wherein the boundary regions of the reference image are based on a trimap for the reference image [Figure 2; the trimap propagation module predicts the trimap for a target frame, pg. 6974, left column, 4. Method, first paragraph ... our deep video matting framework employs an effective auto encoder-decoder structure to extract features of multiple image-trimap pairs, pg. 6974, right column, 4.2. Encoder-Decoder Network, first paragraph].
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
GB 2557417 A to Zhaowen et al. discloses an image alignment for burst mode images using detected feature points and finding matching feature point pairs to generate an aligned image.
“Video Matting via Consistency-Regularized Graph Neural Networks” to Wang et al. discloses temporal coherence enhancement through Consistency-Regularized Graph Neural Networks (CRGNN) with a video matting dataset.
US 8,611,728 B2 to Bhagavathy et al. discloses a method for propagating foreground-background constraint information for a first video frame to subsequent frames to extract foreground objects by using the estimated alpha matte for each frame.
US 7,834,894 B2 to Swanson et al. discloses a method and apparatus for background replacement in images by using an alpha mask.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TOLUWANI MARY-JANE IJASEUN whose telephone number is (571)270-1877. The examiner can normally be reached Monday - Friday 7:30AM-4PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Henok Shiferaw can be reached at (571) 272-4637. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TOLUWANI MARY-JANE IJASEUN/Examiner, Art Unit 2676
/Henok Shiferaw/Supervisory Patent Examiner, Art Unit 2676