DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 04/03/2024. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Status
Claims 13 and 14 are interpreted under 35 U.S.C. 112(f).
Claims 1-2, 4-5, 7-10, 12-13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Criminisi (US 20060193509 A1) in view of Du (US 20200160533 A1).
Claims 3, 6, 11, and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
“computer program product is configured to…” in claim 13
“computer program product is configured to…” in claim 14
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4-5, 7-10, 12-13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Criminisi (US 20060193509 A1) in view of Du (US 20200160533 A1).
Regarding claim 1, Criminisi discloses A method of background replacement, comprising: (Criminisi: ¶39 “For example, all or a portion of the background may be faded, blurred, replaced with an alternative scene, color modified, and the like, to focus attention on the located object, to conceal distracting or confidential backgrounds,”)
receiving a main image of a scene from a main image sensor and a secondary image of a scene from a secondary image sensor, (Criminisi: ¶33 “a left image 202 is captured by a camera 201 mounted on the left. Likewise, a right image 204 is captured by a second camera 203 mounted on the right.”) wherein the main image sensor and secondary image sensor are displaced relative to each other in at least one dimension; (Criminisi: ¶34 “The disparity map is an array of pixel values which represent the stereo disparity between the left and right images at matching pixels.”)
performing stereo rectification on the main image and secondary image; (Criminisi: ¶75 “The images and/or cameras providing the images may be suitably synchronized, rectified, and/or calibrated.”)
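Examiner's note (illustrative only): stereo rectification, as cited above, warps the two images so that corresponding points lie on the same horizontal scanline. The following minimal Python sketch shows one common way to perform this step with OpenCV; all calibration inputs (K1, d1, K2, d2, R, T) are hypothetical placeholders and are not drawn from Criminisi or Du.
```python
# Illustrative sketch only: stereo rectification with OpenCV. All calibration
# parameters passed in are hypothetical placeholders.
import cv2

def rectify_pair(left, right, K1, d1, K2, d2, R, T):
    """Warp a stereo pair so corresponding points share a scanline."""
    h, w = left.shape[:2]
    # Rectification rotations (R1, R2) and projection matrices (P1, P2).
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K1, d1, K2, d2, (w, h), R, T)
    # Per-pixel remap tables for each sensor.
    m1x, m1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
    m2x, m2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)
    # Resample both images onto the common rectified geometry.
    return (cv2.remap(left, m1x, m1y, cv2.INTER_LINEAR),
            cv2.remap(right, m2x, m2y, cv2.INTER_LINEAR))
```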
Criminisi fails to specifically disclose inputting the rectified images into a deep neural network; and
applying a deep neural network on the rectified images to generate an alpha matting mask for the main image.
In related art, Du discloses inputting the rectified images into a deep neural network; and (Du: ¶44 “in a CNN based disparity estimation system, first, deep features is extracted from the rectified left and right images.” Du discloses passing the images into a deep learning model)
applying a deep neural network on the rectified images to generate an alpha matting mask for the main image. (Du: ¶45 "Furthermore, information from other low-level vision tasks such as semantic segmentation" ¶49 "The AMNet 100 may be extended to a foreground-background-aware AM-Net (FBA-AMNet), which utilizes foreground-background segmentation to improve disparity estimation." Du discloses passing the rectified images to a deep learning model to perform foreground-background segmentation)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the deep learning network performing segmentation, as disclosed by Du, into the method of stereo image capturing and rectification disclosed by Criminisi, in order to separate the foreground and background as stated in ¶37 of Criminisi and replace the background as stated in ¶39 of Criminisi.
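Examiner's note (illustrative only): the combination reads on feeding the rectified pair to a deep network that outputs a per-pixel alpha matte. The PyTorch sketch below is a generic stand-in for such a network; it is not the AMNet/FBA-AMNet architecture of Du, and every layer choice is an assumption made for illustration.
```python
# Generic stand-in for a stereo-input matting network; not Du's architecture.
import torch
import torch.nn as nn

class StereoMattingNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Concatenated main (3ch) + secondary (3ch) rectified images -> 6ch input.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # 1x1 head producing a single-channel alpha matte in [0, 1].
        self.head = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Sigmoid())

    def forward(self, main_img, secondary_img):
        x = torch.cat([main_img, secondary_img], dim=1)
        return self.head(self.encoder(x))

# Usage on dummy rectified tensors (batch=1, 3x240x320).
net = StereoMattingNet()
alpha = net(torch.rand(1, 3, 240, 320), torch.rand(1, 3, 240, 320))
assert alpha.shape == (1, 1, 240, 320)
```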
Regarding claim 2, Criminisi, as modified by Du, discloses wherein the step of applying a deep neural network comprises performing a depth determination on the pixels of the main image based on displacement of the pixels of the secondary image relative to the main image. (Du: ¶38 "Given a rectified stereo image pair, depth estimation may be converted to disparity estimation with camera calibration. For each pixel in one image, disparity estimation finds the shifts between one pixel and its corresponding pixel in the other image on the same horizontal line so that the two pixels are the projections of a similar 3D position.")
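Examiner's note (illustrative only): the cited passage of Du relies on the standard rectified-stereo relation, depth Z = f·B/d, where f is the focal length in pixels, B is the baseline between the sensors, and d is the per-pixel disparity in pixels. A minimal numeric sketch with hypothetical values:
```python
# Disparity-to-depth relation for a rectified pair; all values hypothetical.
import numpy as np

f_px, baseline_m = 700.0, 0.06           # assumed focal length and 6 cm baseline
disparity = np.array([[35.0, 70.0]])     # example per-pixel disparities (px)
depth_m = f_px * baseline_m / disparity  # -> [[1.2, 0.6]] metres
print(depth_m)
```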
Regarding claim 4, Criminisi, as modified by Du, discloses wherein the alpha matting mask defines a silhouette of a person and objects placed upon or held by the person. (Du: ¶81 "Semantic information such as semantic segmentation maps and semantic boundaries define each object's category and location in one image" ¶126 "the FBA-AMNet (e.g., 600) is designed and trained to generate smoother and more accurate shapes for foreground objects," Du discloses that the segmentation is designed to highlight the shapes of foreground objects)
Regarding claim 5, Criminisi, as modified by Du, discloses further comprising generating a combined image by applying a new background (Du: ¶37 "Other applications of accurate depth estimation include three-dimensional (3D) object reconstruction and virtual reality applications, where it is desired to change the background") to the alpha matting mask. (Du: ¶49 "the AMNet 100 is extended to be a multi-task network, in which the main task is disparity estimation and the auxiliary task is foreground-background segmentation.")
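Examiner's note (illustrative only): applying a new background to an alpha matting mask is conventionally done by alpha compositing, out = α·F + (1 − α)·B. A minimal sketch with hypothetical arrays:
```python
# Standard alpha compositing of a new background; dummy data for illustration.
import numpy as np

def composite(foreground, background, alpha):
    """foreground/background: HxWx3 float arrays in [0,1];
    alpha: HxW matte in [0,1], e.g. produced by the network."""
    a = alpha[..., None]                       # broadcast matte over channels
    return a * foreground + (1.0 - a) * background

fg = np.random.rand(240, 320, 3)               # hypothetical main image
bg = np.random.rand(240, 320, 3)               # hypothetical new background
alpha = np.random.rand(240, 320)               # hypothetical alpha matte
combined = composite(fg, bg, alpha)
```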
Regarding claim 7, Criminisi, as modified by Du, discloses further comprising capturing the plurality of images with the image sensors. (Criminisi: ¶33 "a left image 202 is captured by a camera 201 mounted on the left. Likewise, a right image 204 is captured by a second camera 203 mounted on the right.")
Regarding claim 8, Criminisi, as modified by Du, discloses wherein the image sensors are integrated within a single hardware device. (Criminisi: ¶31 "In a typical video teleconferencing environment, a single video camera is focused on a conference participant," ¶27 discloses the stereo-based image processing system as a single computing device)
Regarding claim 9, Criminisi discloses A system comprising: (Criminisi: ¶2 “FIG. 1 is an example computing system for implementing a stereo-based image processing system;”)
a main image sensor and a secondary image sensor, the secondary image sensor being displaced in at least one dimension relative to the main image sensor; and (Criminisi: ¶33 “a left image 202 is captured by a camera 201 mounted on the left. Likewise, a right image 204 is captured by a second camera 203 mounted on the right.”)
a computer program product comprising instructions, which, when executed by a computer, cause the computer to carry out the following steps: (Criminisi: ¶26 “Although not required, the stereo-based image processing system will be described in the general context of computer-executable instructions”)
receiving a main image of a scene from the main image sensor and a secondary image of the scene from the secondary image sensor; (Criminisi: ¶33 “a left image 202 is captured by a camera 201 mounted on the left. Likewise, a right image 204 is captured by a second camera 203 mounted on the right.”)
performing stereo rectification on the images; (Criminisi: ¶75 “The images and/or cameras providing the images may be suitably synchronized, rectified, and/or calibrated.”)
Criminisi fails to specifically disclose inputting the rectified images into a deep neural network; and
applying a deep neural network on the rectified images to generate an alpha matting mask for the main image.
In related art, Du discloses inputting the rectified images into a deep neural network; and (Du: ¶44 “in a CNN based disparity estimation system, first, deep features is extracted from the rectified left and right images.” Du discloses passing the images into a deep learning model)
applying a deep neural network on the rectified images to generate an alpha matting mask for the main image. (Du: ¶45 "Furthermore, information from other low-level vision tasks such as semantic segmentation" ¶49 "The AMNet 100 may be extended to a foreground-background-aware AM-Net (FBA-AMNet), which utilizes foreground-background segmentation to improve disparity estimation." Du discloses passing the rectified images to a deep learning model to perform foreground-background segmentation)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the deep learning network performing segmentation, as disclosed by Du, into the method of stereo image capturing and rectification disclosed by Criminisi, in order to separate the foreground and background as stated in ¶37 of Criminisi and replace the background as stated in ¶39 of Criminisi.
Regarding claim 10, Criminisi, as modified by Du, discloses wherein the instructions further include, within the deep neural network, performing a depth determination on the pixels of the main image based on displacement of the pixels of the secondary image relative to the main image. (Du: ¶38 "Given a rectified stereo image pair, depth estimation may be converted to disparity estimation with camera calibration. For each pixel in one image, disparity estimation finds the shifts between one pixel and its corresponding pixel in the other image on the same horizontal line so that the two pixels are the projections of a similar 3D position.")
Regarding claim 12, Criminisi, as modified by Du, discloses wherein the alpha matting mask defines a silhouette of a person and objects placed upon or held by the person. (Du: ¶81 "Semantic information such as semantic segmentation maps and semantic boundaries define each object's category and location in one image" ¶126 "the FBA-AMNet (e.g., 600) is designed and trained to generate smoother and more accurate shapes for foreground objects," Du discloses that the segmentation is designed to highlight the shapes of foreground objects)
Regarding claim 13, Criminisi, as modified by Du, discloses wherein the computer program product is configured to generate the combined image by applying a new background (Du: ¶37 "Other applications of accurate depth estimation include three-dimensional (3D) object reconstruction and virtual reality applications, where it is desired to change the background") to the alpha matting mask. (Du: ¶49 "the AMNet 100 is extended to be a multi-task network, in which the main task is disparity estimation and the auxiliary task is foreground-background segmentation.")
Regarding claim 15, Criminisi, as modified by Du, discloses wherein the image sensors are integrated within a single hardware device. (Criminisi: ¶31 "In a typical video teleconferencing environment, a single video camera is focused on a conference participant," ¶27 discloses the stereo-based image processing system as a single computing device)
Allowable Subject Matter
Claims 3, 6, 11, and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Jampani (US 12260572 B2) discloses a method that includes determining, based on an image having an initial viewpoint, a depth image, and determining a foreground visibility map including visibility values that are inversely proportional to a depth gradient of the depth image. The method also includes determining, based on the depth image, a background disocclusion mask indicating a likelihood that a pixel of the image will be disoccluded by a viewpoint adjustment. The method additionally includes generating, based on the image, the depth image, and the background disocclusion mask, an inpainted image and an inpainted depth image. The method further includes generating, based on the depth image and the inpainted depth image, respectively, a first three-dimensional (3D) representation of the image and a second 3D representation of the inpainted image, and generating a modified image having an adjusted viewpoint by combining the first and second 3D representations based on the foreground visibility map.
Puri (US 20220101047 A1) discloses that a background of an object may be modified to generate a training image. A segmentation mask may be generated and used to generate an object image that includes image data representing the object. The object image may be integrated into a different background and used for data augmentation in training a neural network. Data augmentation may also be performed using hue adjustment (e.g., of the object image) and/or rendering three-dimensional capture data that corresponds to the object from selected views. Inference scores may be analyzed to select a background for an image to be included in a training dataset. Backgrounds may be selected and training images may be added to a training dataset iteratively during training (e.g., between epochs). Additionally, early or late fusion may be employed that uses object mask data to improve inferencing performed by a neural network trained using object mask data.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL KIM MAIDEN whose telephone number is (703)756-1264. The examiner can normally be reached Monday - Friday 7:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Koziol, can be reached at 408-918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL KIM MAIDEN/Examiner, Art Unit 2665
/Stephen R Koziol/Supervisory Patent Examiner, Art Unit 2665