DETAILED ACTION
Status of the Claims
Original claims 1-20 filed December 27, 2022, are pending.
Deferred Examination
Examination was deferred for 36 months from filing per Applicant’s request.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on April 6, 2023, is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim(s) 7 and 15 is/are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term “deep” in claim 7 is a relative term which renders the claim indefinite. The term “deep” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
Claim 7 recites that “the first neural network is a deep learning-based neural network.” Thus claim 7 seeks to further limit the first neural network so that its scope does not cover neural networks in general, but only “deep learning-based” neural networks in particular. “Deep learning” is a term of art that generally refers to the number of layers in a neural network. All neural networks rely on learning. Neural networks are “deeper” as they include more layers, and vice versa. A neural network is “deep learning-based” if it has enough layers (i.e., enough depth) to be considered “deep”. The determination of whether a given neural network has enough layers to be considered “deep” is fundamentally subjective and relative. That is, one person of ordinary skill in the art may consider a given neural network to be “deep”, while another person of ordinary skill in the art may consider the same neural network to be relatively shallow and not deep. As an illustrative example, the well-known AlexNet neural network has 8 layers and was considered “deep” at the time of its introduction, but is much shallower than other well-known neural networks introduced later, such as GoogLeNet (22 layers) and ResNet (can be configured with 34, 50, 101, or 152 layers).
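For illustration only (not part of the record), the indefiniteness issue can be sketched as a threshold judgment in which the threshold itself is the quantity that neither the claim nor the specification supplies:

```python
# Illustrative sketch: whether a network counts as "deep" depends entirely
# on an arbitrary layer-count threshold. The networks and layer counts below
# are the well-known examples cited above.
layer_counts = {
    "AlexNet": 8,      # considered "deep" when introduced in 2012
    "GoogLeNet": 22,
    "ResNet-152": 152,
}

def is_deep(n_layers: int, threshold: int) -> bool:
    """Returns True only if the network meets the chosen (subjective) threshold."""
    return n_layers >= threshold

# AlexNet is "deep" under one plausible threshold and shallow under another,
# illustrating why the term is relative without a stated standard.
print(is_deep(layer_counts["AlexNet"], 5))   # True
print(is_deep(layer_counts["AlexNet"], 20))  # False
```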
The specification does not provide any standard for determining whether a given neural network specifically qualifies as being “deep learning-based”. For example, par. [0025] (as published) states that the multi-view enhancement network 104 (i.e., the claimed first neural network) may be a RNN, a CNN, or a GAN, “for example, among other types of deep-learning networks.” This only provides examples of deep-learning-based neural networks and does not provide a standard for one of ordinary skill in the art to determine whether a given neural network qualifies as being “deep learning-based”. This ambiguity makes the scope of the claim unclear and renders the claim indefinite.
Examiner stresses that this issue is caused by the claims attempting to draw a distinction between neural networks in general and deep-learning-based neural networks in particular. The claim is indefinite because this distinction is relative and subjective and the specification does not provide a standard for making it.
Claim 15 is also indefinite for substantially the same reasons as claim 7.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-2, 7-10, and 15-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over ‘Lee’ (US 2023/0360177 A1) in view of ‘Bleibel’ (US 2017/0358092 A1).
Regarding claim 1, Lee teaches a method for generating multi-camera background mattes for video (e.g., Figs. 1-2 and 4; see Note Regarding Multiple Cameras below), the method comprising:
receiving, by a first neural network (e.g., Fig. 3E, alpha-trimap refinement network 348) trained to generate alpha mattes (e.g., [0050], first network 348 outputs a refined alpha matte) and foreground estimates (e.g., [0050], first network 348 outputs “three channels of a foreground RGB”) of multi-camera images (see Note Regarding Multiple Cameras below), first inputs generated by a second neural network (e.g., Fig. 3D, second machine learning model 306 is a second neural network, which generates alpha data 212; e.g., Fig. 3D, first network 348 receives alpha data 212);
receiving, by the first neural network, second inputs (e.g., Fig. 3, video data 112) comprising grayscale images (e.g., [0050], Fig. 3E, video data is input in an RGB format, which can be seen as three grayscale, single-channel inputs, each representing the intensity of red, green and blue colors, respectively) and depth maps of the multi-camera images (see Note Regarding Multiple Cameras below); and
generating, by the first neural network, based on the first inputs and the second inputs, multi-view alpha mattes (e.g., [0050], Fig. 3E, refined alpha mattes output for each frame in the video, the frames reflecting different views due to scene/camera motion; also see Note Regarding Multiple Cameras below) and multi-view foreground estimates (e.g., [0050], Fig. 3E, “three channels of a foreground RGB” output for each frame in the video, the frames reflecting different views due to scene/camera motion such as is illustrated in Fig. 3B; also see Note Regarding Multiple Cameras below) for the multi-camera images (see Note Regarding Multiple Cameras below).
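As a minimal sketch of the mapping of the “grayscale images” limitation above (assuming NumPy; this example is illustrative and not from the record), an RGB frame can be viewed as three single-channel, grayscale-like intensity images:

```python
import numpy as np

# Hypothetical example: a 4x4 RGB frame viewed as three single-channel
# intensity images, one per color component, consistent with the reading
# of Lee's RGB input as three grayscale channels.
rgb_frame = np.zeros((4, 4, 3))
rgb_frame[..., 0] = 0.9  # red intensities
rgb_frame[..., 1] = 0.5  # green intensities
rgb_frame[..., 2] = 0.1  # blue intensities

red, green, blue = (rgb_frame[..., c] for c in range(3))
# Each channel is a single-channel (H x W) intensity image.
assert red.shape == green.shape == blue.shape == (4, 4)
```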
Note Regarding Multiple Cameras. Lee teaches techniques for automatically matting video data (e.g., Figs. 1-2). Lee teaches processing video data including multiple images (e.g., Fig. 3B), but does not explicitly teach that the video images are “multi-camera” video images. Lee also does not explicitly teach the video data including depth maps.
However, Bleibel does teach techniques for automatically matting video data, where the video data includes multi-camera images (e.g., Fig. 1, multi-view video stream 100; e.g., [0058], multi-view video may be captured by an array of multiple cameras) and depth maps (e.g., Fig. 1, depth data 104).
Lee reduces the amount of effort needed to apply matting to a video by propagating matting from one annotated trimap throughout an entire video (e.g., [0036], Fig. 2). Bleibel demonstrates that such matting information can be propagated not only through time, but also between views in a multi-camera/view video (e.g., Figs. 2-3; [0061]) and teaches that this saves even more effort because the matting process does not need to be repeated for each view/camera in a video stream (e.g., [0007]; [0060]). Bleibel also recognizes that background matting is ultimately a depth-based effect (e.g., [0008]-[0009]; [0059]), and that depth information may be useful for refining a foreground-background segmentation ([0078]) and/or view propagation (e.g., [0143]).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the method of Lee with the multi-view video stream input of Bleibel in order to improve the method with the reasonable expectation that this would result in a method that could further reduce the resources needed for background matting by propagating results not only in time, but also across multiple views, and that would obtain additional useful information about a scene. This technique for improving the method of Lee was within the ordinary ability of one of ordinary skill in the art based on the teachings of Lee and Bleibel.
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Lee and Bleibel to obtain the invention as specified in claim 1.
Regarding claim 2, Lee in view of Bleibel teaches the method of claim 1, and Lee further teaches that the multi-view alpha mattes and the multi-view foreground estimates are generated based on a loss function (e.g., [0060], “Objective functions are set for all outputs of the machine learning models except for the hidden features”; Also see training process at [0057] et seq.).
Regarding claim 7, Lee in view of Bleibel teaches the method of claim 1, and Lee further teaches that the first neural network is a deep learning-based neural network (e.g., [0043], machine learning model may include “deep convolutional neural networks,” “deep learning,” etc.).
Regarding claim 8, Lee in view of Bleibel teaches the method of claim 1, and Lee further teaches that the first inputs comprise alpha mattes for each camera view of the multi-camera images (e.g., [0050], inputs to the third machine learning model include “a predicted alpha matte” for a given frame of video, with each frame of the video being processed such that the “first inputs” comprise alpha mattes for each of the video images; As explained in the rejection of claim 1, Lee has been modified such that the video includes images from multiple cameras, each corresponding to a different camera view).
Regarding claim 9, Examiner notes that the claim recites a non-transitory computer-readable medium storing computer-executable instructions, which when executed by one or more processors result in performing a method that is substantially the same as the method of claim 1.
Lee in view of Bleibel teaches the method of claim 1 (see above).
Lee further teaches implementing its method as a non-transitory computer-readable medium storing computer-executable instructions, which when executed by one or more processors result in performing the method (e.g., Fig. 6, [0081]-[0082]).
Therefore, claim 9 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel for substantially the same reasons as claim 1.
Regarding claim 10, Examiner notes that the claim recites limitations that are substantially the same as limitations recited in claim 2. Lee in view of Bleibel teaches the invention of claim 2 (see above). Accordingly, claim 10 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel for substantially the same reasons as claim 2.
Regarding claim 15, Examiner notes that the claim recites limitations that are substantially the same as limitations recited in claim 7. Lee in view of Bleibel teaches the invention of claim 7 (see above). Accordingly, claim 15 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel for substantially the same reasons as claim 7.
Regarding claim 16, Examiner notes that the claim recites limitations that are substantially the same as limitations recited in claim 8. Lee in view of Bleibel teaches the invention of claim 8 (see above). Accordingly, claim 16 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel for substantially the same reasons as claim 8.
Regarding claim 17, Examiner notes that the claim recites a device comprising memory storing instructions, the memory coupled to at least one processor configured to perform a method that is substantially the same as the method of claim 1.
Lee in view of Bleibel teaches the method of claim 1 (see above).
Lee further teaches implementing its method as a device comprising memory storing instructions, the memory coupled to at least one processor configured to perform the method (e.g., Fig. 6, [0076]-[0077], [0081]-[0082]).
Therefore, claim 17 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel for substantially the same reasons as claim 1.
Regarding claim 18, Examiner notes that the claim recites limitations that are substantially the same as limitations recited in claim 2. Lee in view of Bleibel teaches the invention of claim 2 (see above). Accordingly, claim 18 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel for substantially the same reasons as claim 2.
Claim(s) 3-6, 11-14, and 19-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel as applied above, and further in view of ‘Hou’ (“Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation,” 2019).
Regarding claim 3, Lee in view of Bleibel teaches the method of claim 2 (see above).
Lee teaches a third machine learning model (i.e., a “first neural network”) that outputs predicted alpha mattes and foreground estimates (e.g., [0050], “the third machine learning model outputs one channel of a refined alpha matte … [and] three channels of a foreground RGB”). Lee further teaches that “Objective functions are set for all outputs of the machine learning models” ([0060]). Nevertheless, Lee does not explicitly teach what specific objective (i.e., loss) functions are used. In particular, Lee does not explicitly teach that the loss function minimizes a difference between predicted alpha mattes and ground truth alpha mattes for the multi-camera images.
Bleibel also does not explicitly teach this feature.
However, Hou does teach a neural network that, like Lee’s neural network, is trained to output predicted alpha mattes and foreground estimates (e.g., Fig. 2), and Hou further teaches training its neural network using a loss function that minimizes a difference between predicted alpha mattes and ground truth alpha mattes for input training images (e.g., Section 3.1, equation 2 and related text, L_α^lap).
Hou teaches that its alpha matte loss function allows its neural network to be trained to produce “more numerically accurate results” than are achieved using an alternative loss function (e.g., Sec. 4.3, Loss functions, first sentence).
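As an illustrative sketch only (an assumed L1-style formulation; Hou’s exact equation 2 may differ in form), a loss that minimizes the difference between predicted and ground-truth alpha mattes can be expressed as:

```python
import numpy as np

def alpha_matte_loss(pred_alpha: np.ndarray, gt_alpha: np.ndarray) -> float:
    """Mean absolute difference between predicted and ground-truth alpha mattes.

    An illustrative L1-style loss: minimizing it drives the predicted matte
    toward the ground truth, which is the rationale relied on above.
    """
    return float(np.abs(pred_alpha - gt_alpha).mean())

# A perfect prediction yields zero loss; any deviation yields a positive loss.
perfect = alpha_matte_loss(np.full((2, 2), 0.5), np.full((2, 2), 0.5))
off = alpha_matte_loss(np.full((2, 2), 0.75), np.full((2, 2), 0.5))
assert perfect == 0.0 and off > 0.0
```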
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the method of Lee in view of Bleibel as applied above with the alpha matte loss function of Hou in order to improve the method with the reasonable expectation that this would result in a method that used a loss function that was advantageously demonstrated to be suitable for training neural networks for alpha matte prediction and to provide more numerically accurate results than an alternative loss function. This technique for improving the method of Lee in view of Bleibel was within the ordinary ability of one of ordinary skill in the art based on the teachings of Hou.
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Lee, Bleibel, and Hou to obtain the invention as specified in claim 3.
Regarding claim 4, Lee in view of Bleibel and Hou teaches the method of claim 3.
Lee teaches training the first neural network (e.g., [0059]; note that the third machine learning model in the reference corresponds to the first neural network in the claims) using a loss function (e.g., [0060], objective functions set for outputs of the model).
As explained above in the rejection of claim 3, Lee has been modified to use the alpha matte loss function of Hou. The alpha matte loss function of Hou uses the ground truth alpha mattes (e.g., Section 3.1, equation 2 and related text, ground truth alpha matte α̂).
For at least these reasons, the method of Lee in view of Bleibel and Hou as applied above also falls within the scope of claim 4.
Regarding claim 5, Lee in view of Bleibel teaches the method of claim 2 (see above).
Lee teaches a third machine learning model (i.e., a “first neural network”) that outputs predicted alpha mattes and foreground estimates (e.g., [0050], “the third machine learning model outputs one channel of a refined alpha matte … [and] three channels of a foreground RGB”). Lee further teaches that “Objective functions are set for all outputs of the machine learning models” ([0060]). Nevertheless, Lee does not explicitly teach what specific objective (i.e., loss) functions are used. In particular, Lee does not explicitly teach that the loss function minimizes a difference between predicted foreground estimates and ground truth foreground estimates for the multi-camera images.
Bleibel also does not explicitly teach this feature.
However, Hou does teach a neural network that, like Lee’s neural network, is trained to output predicted alpha mattes and foreground estimates (e.g., Fig. 2), and Hou further teaches training its neural network using a loss function that minimizes a difference between predicted foreground estimates and ground truth foreground estimates for input training images (e.g., Sec. 3.1, eqn. 5 and related text, L_C^1).
Hou teaches that its foreground loss function is “naturally needed” when the neural network outputs foreground color predictions and that its foreground loss also can improve alpha map prediction slightly (e.g., Sec. 4.3, Loss functions, second paragraph).
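Similarly, as an illustrative sketch only (an assumed L1 form; Hou’s equation 5 may differ, e.g., in how it weights or restricts the pixels considered), a foreground loss of this general kind can be expressed as:

```python
import numpy as np

def foreground_loss(pred_fg: np.ndarray, gt_fg: np.ndarray,
                    alpha: np.ndarray) -> float:
    """L1 color difference between predicted and ground-truth foregrounds,
    weighted by the alpha matte so that foreground pixels dominate the loss.

    pred_fg, gt_fg: H x W x 3 color images; alpha: H x W matte in [0, 1].
    """
    w = alpha[..., None]  # broadcast the matte over the color channels
    return float((w * np.abs(pred_fg - gt_fg)).mean())

# Zero loss for a perfect foreground prediction; positive otherwise.
assert foreground_loss(np.ones((2, 2, 3)), np.ones((2, 2, 3)),
                       np.ones((2, 2))) == 0.0
```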
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the method of Lee in view of Bleibel as applied above with the foreground loss function of Hou in order to improve the method with the reasonable expectation that this would result in a method that used a loss function that was naturally needed for training neural networks for foreground prediction and that advantageously provided improvement in alpha matte prediction as well. This technique for improving the method of Lee in view of Bleibel was within the ordinary ability of one of ordinary skill in the art based on the teachings of Hou.
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Lee, Bleibel, and Hou to obtain the invention as specified in claim 5.
Regarding claim 6, Lee in view of Bleibel and Hou teaches the method of claim 5.
Lee teaches training the first neural network (e.g., [0059]; note that the third machine learning model in the reference corresponds to the first neural network in the claims) using a loss function (e.g., [0060], objective functions set for outputs of the model).
As explained above in the rejection of claim 5, Lee has been modified to use the foreground loss function of Hou. The foreground loss function of Hou uses the ground truth foreground estimates (e.g., Sec. 3.1, eqn. 5 and related text, ground truth foreground F̂).
For at least these reasons, the method of Lee in view of Bleibel and Hou as applied above also falls within the scope of claim 6.
Regarding claim 11, Examiner notes that the claim recites limitations that are substantially the same as limitations recited in claim 3. Lee in view of Bleibel and Hou teaches the invention of claim 3 (see above). Accordingly, claim 11 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel and Hou for substantially the same reasons as claim 3.
Regarding claim 12, Examiner notes that the claim recites limitations that are substantially the same as limitations recited in claim 4. Lee in view of Bleibel and Hou teaches the invention of claim 4 (see above). Accordingly, claim 12 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel and Hou for substantially the same reasons as claim 4.
Regarding claim 13, Examiner notes that the claim recites limitations that are substantially the same as limitations recited in claim 5. Lee in view of Bleibel and Hou teaches the invention of claim 5 (see above). Accordingly, claim 13 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel and Hou for substantially the same reasons as claim 5.
Regarding claim 14, Examiner notes that the claim recites limitations that are substantially the same as limitations recited in claim 6. Lee in view of Bleibel and Hou teaches the invention of claim 6 (see above). Accordingly, claim 14 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel and Hou for substantially the same reasons as claim 6.
Regarding claim 19, Examiner notes that the claim recites limitations that are substantially the same as limitations recited in claim 3. Lee in view of Bleibel and Hou teaches the invention of claim 3 (see above). Accordingly, claim 19 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel and Hou for substantially the same reasons as claim 3.
Regarding claim 20, Examiner notes that the claim recites limitations that are substantially the same as limitations recited in claim 4. Lee in view of Bleibel and Hou teaches the invention of claim 4 (see above). Accordingly, claim 20 is also rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Bleibel and Hou for substantially the same reasons as claim 4.
Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
‘Chen’ (“Real-time multi-view background matting for 3D light field video,” 2021)
Uses a multi-branch neural network for multi-view video matting
‘Oz’ (US 2022/0191431 A1)
Predicts alpha mattes from multi-camera video
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEOFFREY E SUMMERS whose telephone number is (571)272-9915. The examiner can normally be reached Monday-Friday, 7:00 AM to 3:30 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park can be reached at (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GEOFFREY E SUMMERS/Examiner, Art Unit 2669