DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The disclosure is objected to because of the following informalities:
In paragraph [0077], lines 5-6, symbols “I_T”, “602” and “604” are not shown in Figure 6.
In paragraph [0079], lines 2-3, symbols “606” and “608” are not shown in Figure 6.
In paragraph [0080], lines 1-3, symbols “612” and “614” are not shown in Figure 6.
In paragraphs [0081]-[0083], symbols “616” and “630” are not shown in Figure 6.
Appropriate correction is required.
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc. In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The abstract of the disclosure is objected to because it includes the phrase “This disclosure provides”. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8 and 10-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (CVPR 2017), hereinafter Zhou, in view of Zhou et al. (arXiv:1605.03557v3, 11 Feb 2017), hereinafter Zhou1.
-Regarding claim 1, Zhou discloses a method for depth estimation, comprising (Abstract; FIGS. 1-10): generating, in accordance with first image data of a first image frame and second image data of a second image frame (FIGS. 2-3; equation (1), p. 1853, 1st Col., Sec. 3.1., 2nd paragraph, quoted excerpt reproduced as media_image1.png), a first mask indicating one or more pixels determined not to change position between the first image frame and the second image frame (p. 1853, 2nd Col., Sec. 3.3, 1st paragraph – p. 1854, 1st Col., 1st paragraph, “outputs a per-pixel soft mask Ê_s for each target-source pair, indicating the ...”, remainder of the quotation reproduced as media_image2.png; p. 1855, 1st Col., 2nd paragraph, “Explainability mask”; p. 1858, Sec. 4.3.); generating, in accordance with the first image data and the second image data, a second mask indicating one or more pixels determined not to change position between the first image frame and the second image frame, wherein the first mask and the second mask are generated using at least some different input data (p. 1853, equations (1)-(2); p. 1854, equations (3)-(4), 2nd Col., 1st paragraph; Note: one or more pixels p in the target view image frame I_t, or a scale of the target view image frame I_t, are considered the first image data, and the corresponding pixels or scales of the source view frame I_s are considered the second image data. Other pixels p in the target view image frame I_t, or a different scale of the target view image frame I_t, and the corresponding other pixels or a different scale of the source view frame I_s, are considered the different input data);
Zhou does not disclose combining the first mask with the second mask to generate a third mask. However, Zhou does teach combining the output of each per-pixel soft mask Ê_s for each target-source pair (p. 1854, equation (3)).
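For context, and without reproducing the imaged excerpts, equations (1) and (3) of Zhou cited above are understood to take approximately the following form, where I_t is the target view, Î_s is the source view I_s warped to the target view, and Ê_s is the per-pixel explainability (soft) mask:

    L_{vs} = \sum_{s}\sum_{p} \left| I_t(p) - \hat{I}_s(p) \right|  \qquad \text{(equation (1))}

    L_{vs} = \sum_{\langle I_1, \ldots, I_N \rangle \in S}\sum_{p} \hat{E}_s(p) \left| I_t(p) - \hat{I}_s(p) \right|  \qquad \text{(equation (3))}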
In the same field of endeavor, Zhou1 teaches a view synthesis method for multiple input views by learning how to optimally combine single-view predictions (Zhou1: Abstract; FIGS. 1-9). Zhou1 further teaches combining the first mask with the second mask to generate a third mask (Zhou1: FIG. 3; p. 7, Sec. 3.2., 2nd paragraph, quoted excerpt reproduced as media_image3.png).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zhou with the teaching of Zhou1 by combining the first mask with the second mask to generate a third mask, in order to leverage the individual strengths of different input views to synthesize target views that might not be feasible with any input view alone (Zhou1: p. 7, Sec. 3.2., 1st paragraph).
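Purely as an illustration of the claimed combining step, and not as a characterization of the applicant's or the references' specific implementation, a minimal sketch of combining two per-pixel masks into a third mask might look as follows; the function name and the elementwise-product choice are assumptions for illustration only:

    import numpy as np

    def combine_masks(mask_a: np.ndarray, mask_b: np.ndarray) -> np.ndarray:
        """Combine two per-pixel masks into a third mask.

        An elementwise product keeps a pixel only where both masks agree that
        the pixel did not change position; averaging or a learned weighting
        (as Zhou1 learns for single-view predictions) would be alternative
        ways of combining the masks.
        """
        if mask_a.shape != mask_b.shape:
            raise ValueError("masks must have the same spatial dimensions")
        return mask_a * mask_b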
-Regarding claim 10, Zhou discloses an apparatus, comprising (Abstract; FIGS. 1-10): a memory storing processor-readable code; and at least one processor coupled to the memory, the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations including (one or more processors and memories must be used in order to implement Zhou's FIGS. 1-4): generating, in accordance with first image data of a first image frame and second image data of a second image frame (FIGS. 2-3; equation (1), p. 1853, 1st Col., Sec. 3.1., 2nd paragraph, quoted excerpt reproduced as media_image1.png), a first mask indicating one or more pixels determined not to change position between the first image frame and the second image frame (p. 1853, 2nd Col., Sec. 3.3, 1st paragraph – p. 1854, 1st Col., 1st paragraph, “outputs a per-pixel soft mask Ê_s for each target-source pair, indicating the ...”, remainder of the quotation reproduced as media_image2.png; p. 1855, 1st Col., 2nd paragraph, “Explainability mask”; p. 1858, Sec. 4.3.); generating, in accordance with the first image data and the second image data, a second mask indicating one or more pixels determined not to change position between the first image frame and the second image frame, wherein the first mask and the second mask are generated using at least some different input data (p. 1853, equations (1)-(2); p. 1854, equations (3)-(4), 2nd Col., 1st paragraph; Note: one or more pixels p in the target view image frame I_t, or a scale of the target view image frame I_t, are considered the first image data, and the corresponding pixels or scales of the source view frame I_s are considered the second image data. Other pixels p in the target view image frame I_t, or a different scale of the target view image frame I_t, and the corresponding other pixels or a different scale of the source view frame I_s, are considered the different input data);
Zhou does not disclose combining the first mask with the second mask to generate a third mask. However, Zhou does teach combining the output of each per-pixel soft mask Ê_s for each target-source pair (p. 1854, equation (3)).
In the same field of endeavor, Zhou1 teaches a view synthesis method for multiple input views by learning how to optimally combine single-view predictions (Zhou1: Abstract; FIGS. 1-9). Zhou1 further teaches combining the first mask with the second mask to generate a third mask (Zhou1: FIG. 3; p. 7, Sec. 3.2., 2nd paragraph, quoted excerpt reproduced as media_image3.png).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zhou with the teaching of Zhou1 by combining the first mask with the second mask to generate a third mask, in order to leverage the individual strengths of different input views to synthesize target views that might not be feasible with any input view alone (Zhou1: p. 7, Sec. 3.2., 1st paragraph).
-Regarding claim 16, Zhou discloses an apparatus, comprising (Abstract; FIGS. 1-10): at least one image sensor configured to capture first image data and second image data; a positioning engine; a memory storing processor-readable code; and at least one processor coupled to the memory, to the at least one image sensor (p. 1852, 2nd Col., Sec. 3., 1st paragraph, “a moving camera”), and to the positioning engine (FIGS. 1-2, pose CNN), the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations including (one or more processors and memories must be used in order to implement Zhou's FIGS. 1-4): generating, in accordance with first image data of a first image frame and second image data of a second image frame (FIGS. 2-3; equation (1), p. 1853, 1st Col., Sec. 3.1., 2nd paragraph, quoted excerpt reproduced as media_image1.png), a first mask indicating one or more pixels determined not to change position between the first image frame and the second image frame (p. 1853, 2nd Col., Sec. 3.3, 1st paragraph – p. 1854, 1st Col., 1st paragraph, “outputs a per-pixel soft mask Ê_s for each target-source pair, indicating the ...”, remainder of the quotation reproduced as media_image2.png; p. 1855, 1st Col., 2nd paragraph, “Explainability mask”; p. 1858, Sec. 4.3.); generating, in accordance with the first image data and the second image data, a second mask indicating one or more pixels determined not to change position between the first image frame and the second image frame, wherein the first mask and the second mask are generated using at least some different input data (p. 1853, equations (1)-(2); p. 1854, equations (3)-(4), 2nd Col., 1st paragraph; Note: one or more pixels p in the target view image frame I_t, or a scale of the target view image frame I_t, are considered the first image data, and the corresponding pixels or scales of the source view frame I_s are considered the second image data. Other pixels p in the target view image frame I_t, or a different scale of the target view image frame I_t, and the corresponding other pixels or a different scale of the source view frame I_s, are considered the different input data); wherein: generating the first mask comprises generating the first mask in accordance with first positioning information indicating a position of the at least one image sensor that captured the first image frame and the second image frame from the positioning engine (FIGS. 1-3; equations (1)-(4)), and generating the second mask comprises generating the second mask in accordance with second positioning information indicating the position of the at least one image sensor that captured the first image frame and the second image frame from a pose estimation network (FIGS. 1-3; equations (1)-(4); it is known that algorithms based on Structure from Motion (SFM) often assume precise per-frame camera poses (e.g., camera position and orientation) as auxiliary inputs, which are typically estimated with SFM; see Kopf et al (US 12243251 B1): Col. 1, lines 50-55).
Zhou does not disclose combining the first mask with the second mask to generate a third mask. However, Zhou does teach combining the output of each per-pixel soft mask Ê_s for each target-source pair (p. 1854, equation (3)).
In the same field of endeavor, Zhou1 teaches a view synthesis method for multiple input views by learning how to optimally combine single-view predictions (Zhou1: Abstract; FIGS. 1-9). Zhou1 further teaches combining the first mask with the second mask to generate a third mask (Zhou1: FIG. 3; p. 7, Sec. 3.2., 2nd paragraph, quoted excerpt reproduced as media_image3.png).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zhou with the teaching of Zhou1 by combining the first mask with the second mask to generate a third mask, in order to leverage the individual strengths of different input views to synthesize target views that might not be feasible with any input view alone (Zhou1: p. 7, Sec. 3.2., 1st paragraph).
-Regarding claims 2, 11 and 17, Zhou in view of Zhou1 teaches the method of claim 1, the apparatus of claim 10, and the apparatus of claim 16. Zhou discloses wherein the first mask comprises a first explainability mask and the second mask comprises a second explainability mask (p. 1853, 2nd Col., Sec. 3.3, 1st paragraph – p. 1854, 1st Col., 1st paragraph, “outputs a per-pixel soft mask Ê_s for each target-source pair, indicating the ...”, remainder of the quotation reproduced as media_image2.png; p. 1855, 1st Col., 2nd paragraph, “Explainability mask”; p. 1858, Sec. 4.3.).
Zhou does not disclose wherein the third mask comprises a third explainability mask.
In the same field of endeavor, Zhou1 teaches a view synthesis method for multiple input views by learning how to optimally combine single-view predictions (Zhou1: Abstract; FIGS. 1-9). Zhou1 further teaches wherein the third mask comprises a third explainability mask (Zhou1: FIG. 3; p. 7, Sec. 3.2., 2nd paragraph, quoted excerpt reproduced as media_image3.png).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zhou with the teaching of Zhou1 by combining the first mask with the second mask to generate a third mask, in order to leverage the individual strengths of different input views to synthesize target views that might not be feasible with any input view alone (Zhou1: p. 7, Sec. 3.2., 1st paragraph).
-Regarding claims 3 and 12, Zhou in view of Zhou1 teaches the method of claim 1 and the apparatus of claim 10. The combination further teaches wherein generating the first mask comprises generating the first mask in accordance with first positioning information indicating a position of the at least one image sensor that captured the first image frame and the second image frame from the positioning engine (Zhou: FIGS. 1-3; equations (1)-(4)), and generating the second mask comprises generating the second mask in accordance with second positioning information indicating the position of the at least one image sensor that captured the first image frame and the second image frame from a pose estimation network (Zhou: FIGS. 1-3; equations (1)-(4); it is known that algorithms based on Structure from Motion (SFM) often assume precise per-frame camera poses (e.g., camera position and orientation) as auxiliary inputs, which are typically estimated with SFM; see Kopf et al (US 12243251 B1): Col. 1, lines 50-55).
-Regarding claims 4, 13 and 18, Zhou in view of Zhou1 teaches the method of claim 3, the apparatus of claim 12, and the apparatus of claim 17. The combination further teaches receiving, from the positioning engine, the first positioning information; generating a first reconstructed version of the first image frame in accordance with the first positioning information, the second image data, and depth information for the first image frame (Zhou: FIGS. 1-3; equation (1); p. 1853, 2nd Col., Sec. 3.2., “As indicated in Eq. 1, a key component of our learning framework is a differentiable depth image-based renderer that reconstructs the target view I_t by sampling pixels from a source view I_s based on the predicted depth map D̂_t and the relative pose T̂_t→s.”); and generating the first mask in accordance with the first reconstructed version of the first image frame and the first image frame (Zhou: equation (3); p. 1853 – p. 1854, Sec. 3.3.).
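For illustration of the cited depth image-based rendering step, the following is a minimal sketch of reconstructing the target view from a source view, a predicted depth map, and a relative pose, under the assumption of a pinhole camera model with intrinsics K and nearest-neighbor sampling; it is not a reproduction of Zhou's actual implementation, which uses differentiable bilinear sampling, and all names are illustrative:

    import numpy as np

    def inverse_warp(source_img, depth_t, K, T_t_to_s):
        """Reconstruct the target view by sampling a grayscale source view.

        Each target pixel p_t is projected into the source view via
        p_s ~ K * T_t->s * D_t(p_t) * K^-1 * p_t, and the source intensity
        at p_s is copied back to p_t. Assumes all projected depths are
        positive; source_img is an H x W array, depth_t an H x W depth map,
        K a 3x3 intrinsic matrix, and T_t_to_s a 4x4 rigid transform.
        """
        h, w = depth_t.shape
        ys, xs = np.mgrid[0:h, 0:w]
        ones = np.ones_like(xs)
        pix = np.stack([xs, ys, ones], axis=0).reshape(3, -1).astype(np.float64)  # homogeneous pixel coords
        cam = (np.linalg.inv(K) @ pix) * depth_t.reshape(1, -1)                   # back-project to 3-D points
        cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])                      # homogeneous 3-D points
        proj = K @ (T_t_to_s @ cam_h)[:3]                                         # project into the source view
        u = np.clip(np.round(proj[0] / proj[2]), 0, w - 1).astype(int)
        v = np.clip(np.round(proj[1] / proj[2]), 0, h - 1).astype(int)
        return source_img[v, u].reshape(h, w)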
-Regarding claims 5, 14 and 19, Zhou in view of Zhou1 teaches the method of claim 4, the apparatus of claim 13, and the apparatus of claim 18. The combination further teaches receiving, from the pose estimation network, the second positioning information; generating a second reconstructed version of the first image frame in accordance with the second positioning information, the second image data, and depth information for the first image frame (Zhou: FIGS. 1-3; equation (1); p. 1853, 2nd Col., Sec. 3.2., “As indicated in Eq. 1, a key component of our learning framework is a differentiable depth image-based renderer that reconstructs the target view I_t by sampling pixels from a source view I_s based on the predicted depth map D̂_t and the relative pose T̂_t→s.”); and generating the second mask in accordance with the second reconstructed version of the first image frame and the first image frame (Zhou: equation (3); p. 1853 – p. 1854, Sec. 3.3.).
-Regarding claims 6, 15 and 20, Zhou in view of Zhou1 teaches the method of claim 5, the apparatus of claim 14, and the apparatus of claim 19. The combination further teaches generating a third reconstructed version of the first image frame in accordance with the first reconstructed version of the first image frame and the second reconstructed version of the first image frame (Zhou: FIGS. 1-4; equations (1)-(4)).
-Regarding claim 7, Zhou in view of Zhou1 teaches the method of claim 1. The combination further teaches determining a photometric loss in accordance with the third mask; and training a depth estimation network based on the photometric loss (Zhou: FIG. 2 (caption), “the photometric reconstruction loss is used for training the CNN”; equation (3)).
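By way of a hypothetical sketch of a mask-weighted photometric loss of the kind cited above (the normalization and the NumPy formulation are assumptions for illustration, not Zhou's exact objective, which sums the weighted residuals and adds a regularization term encouraging the mask toward one):

    import numpy as np

    def masked_photometric_loss(target: np.ndarray,
                                reconstructed: np.ndarray,
                                mask: np.ndarray) -> float:
        """Mask-weighted L1 photometric loss between a target frame and its
        reconstruction; the mask down-weights pixels that the view-synthesis
        model is not expected to explain (e.g., moving objects, occlusions)."""
        weighted = mask * np.abs(target - reconstructed)
        return float(np.sum(weighted) / (np.sum(mask) + 1e-8))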
-Regarding claim 8, Zhou in view of Zhou1 teaches the method of claim 1.
Zhou discloses determining a first mask value of a first pixel of the first mask; determining a second mask value of a second pixel of the second mask, wherein the second pixel corresponds to the first pixel (Equation (3); p. 1853 – p.1854, Sec.3.3.);
Zhou does not disclose determining a combined mask value for a third pixel of the third mask in accordance with the first mask value and the second mask value, wherein the third pixel corresponds to the first and second pixels.
In the same field of endeavor, Zhou1 teaches a view synthesis method for multiple input views by learning how to optimally combine single-view predictions (Zhou1: Abstract; FIGS. 1-9). Zhou1 further teaches determining a combined mask value for a third pixel of the third mask in accordance with the first mask value and the second mask value, wherein the third pixel corresponds to the first and second pixels (Zhou1: FIG. 3; p. 7, Sec. 3.2., 2nd paragraph, quoted excerpt reproduced as media_image3.png).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zhou with the teaching of Zhou1 by combining the first mask with the second mask to generate a third mask, in order to leverage the individual strengths of different input views to synthesize target views that might not be feasible with any input view alone (Zhou1: p. 7, Sec. 3.2., 1st paragraph).
Allowable Subject Matter
Claim 9 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAO LIU whose telephone number is (571)272-4539. The examiner can normally be reached Monday-Thursday and alternate Fridays, 8:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XIAO LIU/Primary Examiner, Art Unit 2664