Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The disclosure is objected to because of the following informalities:
Paragraphs “00100” through “00121” should be “0100” through “0121”.
In paragraph 00109, line 3, “solid state drive” should be “solid-state drive”.
In paragraph 00111, lines 4-5, “world wide packet” should be “world-wide packet”.
Appropriate correction is required.
Claim Objections
Claim 30 is objected to because of the following informalities: in line 7, “images and and” should be “images and”. Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 38 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 38 recites the following limitations:
“nonselected cameras” in line 8.
“the second subset” in line 9.
“the selected camera pair” in line 9.
There is insufficient antecedent basis for these limitations in the claim. It appears that the applicant inadvertently retained limitations that were deleted from claim 29, a claim with substantially similar limitations to claim 38. Because these limitations do not make sense in view of the claims from which claim 38 depends, an art rejection cannot be made for claim 38 as written. However, in the interest of compact prosecution, claim 38 has been examined as substantially similar to claim 29 for purposes of the art rejection.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 25, 32-34, and 41-43 are rejected under 35 U.S.C. 103 as being unpatentable over Porter et al. (US 2021/0158615 A1, hereinafter Porter) in view of Merras et al. ("Multi-view 3D reconstruction and modeling of the unknown 3D scenes using genetic algorithms.", hereinafter Merras).
Regarding claim 25, Porter teaches a method for generating three-dimensional data, the method comprising: ([0002] “the present disclosure relates to systems and methods for reconstructing a three dimensional (“3D”) structure in world coordinates from one or more two dimensional (“2D”) images.”)
obtaining a plurality of images depicting an object, ([0034] “In step 12, the system 10 performs an imagery selection phase. The imagery selection phase retrieves one or more images and metadata of the retrieved images based on a geospatial region of interest (“ROI”)”, where “The region can be of interest to the user because the region may contain one or more buildings.” [0036])
wherein each image of the plurality of images is taken at an individual position about the object, ([0037] “However, it should be understood that images which cover the geospatial ROI and in which the geospatial ROI is close to the center of the image can be preferred for detecting property features of the roof.”)
and wherein the images are associated with camera properties comprising extrinsic camera parameters; ([0038] “In step 26, the system retrieves metadata for the selected image. The metadata can include data about the camera used to capture each selected image, such as but not limited to, intrinsic and extrinsic parameters of the camera.”)
obtaining for each image of the plurality of images, via a machine learning model, semantic labels describing one or more elements of the object, ([0039] “Specifically, FIG. 4 illustrates the process steps performed during the neural network inference phase. More specifically, the neural network inference phase includes a neural network (or another computer vision system) which generates annotations of the roof in the images retrieved in the imagery selection phase.”, where “The annotations can be pixel-level annotations which include, but are not limited to, roof line types, roof line directions, roof gradient, corner locations, face types, etc.” [0039])
wherein the semantic labels are associated with two-dimensional positions in the plurality of images; ([0034] “The neural network inference phase produces 2D outputs in pixel space, such as surface gradients, line gradients, line types, corners, etc., for one or more structures in the retrieved image(s).”)
estimating respective three-dimensional positions for the one or more elements, ([0034] “In step 20, the system 10 performs a 3D reconstruction phase. The 3D reconstruction phase processes the output from the line graph construction phase and the metadata from the image(s) to transform the line data into 3D line segment geometries in world space.”)
estimating a selected image pair from the plurality of images ([0066] “the system matches line segments between multiple views from multiple images and uses this correspondence to generate a set of line segments in world space. The system then projects the line segments back onto an image to add the elevation information onto three dimensional (“3D”) line segments… from the accuracy of line segments recovered from the neural network outputs.”)
and generating a three-dimensional representation of at least a portion of the object based on the estimated one or more three-dimensional positions ([0071] “FIG. 17 shows a flowchart illustrating step 20 of FIG. 1 in greater detail. Specifically, FIG. 17 illustrates process steps performed during the 3D reconstruction phase. More specifically, the 3D reconstruction phase generates a 3D representation of the roof in world space.”).
Porter fails to teach wherein estimating is based on camera properties. However, this is known in the art as taught by Merras. Merras teaches wherein estimating is based on camera properties ([pg. 6278, sec. 3.3, par. 1, lines 1-4] “we use the Euclidean reconstruction to get the 3D points cloud from the points matching between pairs of images and the intrinsic and extrinsic parameters of the cameras”). Merras is analogous to the claimed invention, as both relate to 3D reconstruction from multi-view 2D images. Merras further teaches that their pipeline helps address “the main problems of the 3D reconstruction and modeling from stereo images [which] are generally linked to the proposed constraints on the vision system (the characteristics of the cameras[)]”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Merras into Porter in order to achieve better 3D reconstruction by accounting for the characteristics of the camera, such as its extrinsic parameters.
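For context, the role of the intrinsic and extrinsic camera parameters relied upon by Merras can be illustrated with a minimal pinhole-projection sketch. All values and names below (K, R, t, project) are hypothetical illustrations, not code from either reference.

```python
import numpy as np

# Illustrative sketch: projecting a 3D world point into pixel coordinates
# with a pinhole camera model, using intrinsic matrix K and extrinsic
# parameters (rotation R, translation t). Values are hypothetical.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])   # intrinsic parameters
R = np.eye(3)                           # extrinsic rotation (world -> camera)
t = np.array([0.0, 0.0, 5.0])           # extrinsic translation

def project(point_3d):
    """Project a 3D world point into 2D pixel coordinates."""
    cam = R @ point_3d + t              # world -> camera coordinates
    uvw = K @ cam                       # camera -> homogeneous pixel coords
    return uvw[:2] / uvw[2]             # dehomogenize

p = project(np.array([0.0, 0.0, 0.0]))
# the world origin, 5 units in front of this camera, projects to the
# image center (320.0, 240.0)
```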
Porter teaches a reprojection error ([0080] “Each roof candidate may then be projected back into pixel space of the multiple views and compared with the neural network outputs to determine how well the 3D reconstructed output matches the neural network outputs. This can be used to generate a confidence score for each 3D reconstruction candidate and select the highest scoring reconstruction as a final candidate roof.”), but fails to teach that the reprojection error is associated with the one or more elements in an image from the plurality of images other than the image pair.
However, this is known in the art as taught by Merras. Merras teaches a reprojection error associated with the one or more elements in an image from the plurality of images other than the image pair (“We also calculated the reprojection error of the 3D points reconstructed by formula (26)”, where “εij is a binary factor of visibility, N is the images number, n number of the 3D points, Pi is the projection matrix” [pg. 6280, sec. 4.1, col. 2, full par. 1, lines 1-4]). Merras further teaches that “the minimization of the reprojection error to find the optimal 3D point cloud… allows reducing the errors of our method compared to other approaches” [pg. 6282, col. 1, full par. 1, lines 9-11]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Merras into Porter, such that calculating the reprojection error one image at a time allows the 3D reconstruction to produce fewer errors.
[Image: Formula 26 – Merras et al.]
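The structure of Merras's summed, visibility-weighted reprojection error can be sketched as follows. The function and variable names are illustrative assumptions, not Merras's implementation; project() stands in for applying the projection matrix Pᵢ of image i.

```python
import numpy as np

# Illustrative sketch of a reprojection error summed over N images and
# n 3D points, with a binary visibility factor eps[i][j] as in Merras's
# formula (26). Names and structure are assumptions for illustration.
def project(P, X):
    x = P @ np.append(X, 1.0)             # homogeneous projection
    return x[:2] / x[2]

def reprojection_error(Ps, points_3d, observations, eps):
    """Sum of squared pixel distances between observed and reprojected points."""
    err = 0.0
    for i, P in enumerate(Ps):             # over the N images
        for j, X in enumerate(points_3d):  # over the n 3D points
            if eps[i][j]:                  # point j visible in image i
                err += np.sum((observations[i][j] - project(P, X)) ** 2)
    return err
```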
Regarding claim 32, the combination of Porter and Merras teaches the method of claim 25, further comprising generating additional three-dimensional semantic geometries based on geometrical constraints associated with a particular semantic label (Porter; [0043] “The direction the roof is sloping may allow for the application of the constraints over possible configurations of a final line graph”, where the neural network first “In step 32, the system assigns line type labels to the roof detected” [0040] and “In step 34, the system assigns line direction labels” [0041], and “the system can generate 3D line segment geometries in world space from the line graph” [0005]).
Regarding claim 33, the combination of Porter and Merras teaches the method of claim 25, wherein the object is a building object (Porter; [0033] “methods and embodiments discussed throughout this disclosure may be applied to any structure, including but not limited to, roofs, walls, buildings, awnings, houses, decks, pools, temporary structures such as tents, motor vehicles, foundations, etc.”).
Regarding claim 34, claim 34 recites substantially similar limitations to claim 25, but in a medium form. The combination of Porter and Merras further teaches one or more non-transitory storage media storing instructions that when executed by a system of one or more processors, cause the processors to perform operations (Porter; [0081] “The system can include a plurality of internal servers 224a-224n having at least one processor and memory for executing the computer instructions and methods described above (which could be embodied as computer software 222 illustrated in the diagram). The system can also include a plurality of image storage servers 226a-226n for receiving the image data and video data.”).
Claim 41 recites substantially similar limitations as claim 32 and, therefore, is rejected under the same rationale as claim 32.
Claim 42 recites substantially similar limitations as claim 33 and, therefore, is rejected under the same rationale as claim 33.
Regarding claim 43, claim 43 recites substantially similar limitations to claim 34, but in a system form. The combination of Porter and Merras further teaches a system comprising one or more processors and non-transitory computer storage media storing instructions that when executed by the one or more processors, cause the one or more processors to perform operations (Porter; [0081] “The system can include a plurality of internal servers 224a-224n having at least one processor and memory for executing the computer instructions and methods described above”).
Claims 26, 35, and 44 are rejected under 35 U.S.C. 103 as being unpatentable over Porter (US 2021/0158615 A1) in view of Merras ("Multi-view 3D reconstruction and modeling of the unknown 3D scenes using genetic algorithms."), and further in view of O’Shea et al. (“An Introduction to Convolutional Neural Networks”, hereinafter O’Shea).
Regarding claim 26, the combination of Porter and Merras teaches the method of claim 25, wherein the machine learning model is a neural network which is trained to classify portions of an input image according to a plurality of semantic labels (Porter; [0039] “More specifically, the neural network inference phase includes a neural network (or another computer vision system) which generates annotations of the roof in the images retrieved in the imagery selection phase.”),
wherein the semantic labels are associated with object-specific elements, and wherein the object-specific elements are building object elements comprising roof elements (Porter; [0039] “The annotations can be pixel-level annotations which include, but are not limited to, roof line types, roof line directions, roof gradient, corner locations, face types, etc.”).
The combination of Porter and Merras fails to teach a convolutional neural network. However, this is known in the art as taught by O’Shea. O’Shea teaches that “CNNs are primarily used in the field of pattern recognition within images… [and] this allows us to encode image-specific features into the architecture, making the network more suited for image-focused tasks- whilst further reducing the parameters required to set up the model” [pg. 2, full par. 4, lines 1-3 - pg. 3, par. 1, lines 1-2]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of O’Shea into the combination of Porter and Merras to use a convolutional neural network for 3D reconstruction using 2D images, as convolutional neural networks are well suited for image-focused tasks.
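The image-specific feature encoding O'Shea describes can be sketched with a single convolution operation, the building block of a CNN. This is an illustrative toy example, not O'Shea's code; the kernel values are hypothetical.

```python
import numpy as np

# Illustrative sketch: one 2D convolution layer pass (technically
# cross-correlation, as conventionally implemented in CNNs). A small
# shared kernel slides over the image, encoding local image features
# with far fewer parameters than a fully connected layer would need.
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

edge_kernel = np.array([[1.0, -1.0]])   # responds to vertical intensity edges
```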
Claim 35 recites substantially similar limitations as claim 26 and, therefore, is rejected under the same rationale as claim 26.
Claim 44 recites substantially similar limitations as claim 26 and, therefore, is rejected under the same rationale as claim 26.
Claims 27 and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Porter (US 2021/0158615 A1) in view of Merras ("Multi-view 3D reconstruction and modeling of the unknown 3D scenes using genetic algorithms."), and further in view of Ben-Artzi et al. (“Epipolar Geometry Based On Line Similarity”, hereinafter Ben-Artzi).
Regarding claim 27, the combination of Porter and Merras teaches the method of claim 25, wherein a first element of the one or more elements is associated with a first semantic label, (Porter; “In step 32, the system assigns line type labels to the roof detected in the retrieved image” [0040])
and wherein estimating the three-dimensional position of the first element comprises: obtaining a first subset of images from the plurality of images, (Porter; [Abstract] “The processor selects one or more images and the respective metadata thereof from the first database”)
wherein the first semantic label was obtained for each image of the first subset (Porter; [0042] “It should be understood that the line type label and the direction vector can be used to segment out line instances in a later phase (e.g., the line extraction phase)”, where “as indicated above, multiple images can also be used by the methods and systems of this disclosure” [0037]. Note: the line type label, which is the first semantic label, is being used by the system in a later phase, which requires the system to obtain the semantic label first before being used to create line segments.).
The combination of Porter and Merras fails to teach generating, based on epipolar constraints, a plurality of image pairs from the first subset of images. However, this is known in the art as taught by Ben-Artzi.
Ben-Artzi teaches generating, based on epipolar constraints, a plurality of image pairs from the first subset of images ([pg. 1868, sec. 6, par. 1, lines 2-5] “The house dataset includes 10 images, representing different angles. The images are presented in Figure 7. We used every consecutive pair of images as a stereo pair, which results in 9 pairs”, where “It is based on the quality of stereo matching between the two lines... instead of an exhaustive search over all possible pairs of lines, the search space is substantially reduced when two corresponding point pairs are given.” [Abstract, par. 2, lines 4-9]). Ben-Artzi is analogous to the claimed invention, as both relate to epipolar geometry between pairs of real-world images, more specifically images of houses. Ben-Artzi further teaches that “based on the quality of stereo matching between the two lines… corresponding epipolar lines yield a good stereo correspondence”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Ben-Artzi into the combination of Porter and Merras, as epipolar constraints allow images with high-quality stereo matching to be paired together.
Claim 36 recites substantially similar limitations as claim 27 and, therefore, is rejected under the same rationale as claim 27.
Claims 28 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Porter (US 2021/0158615 A1) in view of Merras ("Multi-view 3D reconstruction and modeling of the unknown 3D scenes using genetic algorithms.") and Ben-Artzi et al. (“Epipolar Geometry Based On Line Similarity”, hereinafter Ben-Artzi), and further in view of Jepson et al. (“Epipolar Geometry”, hereinafter Jepson).
Regarding claim 28, the combination of Porter, Merras, and Ben-Artzi teaches the method of claim 27, wherein the first element in a first image of a particular image pair satisfies a first epipolar constraint of a second image of the particular image pair and the first element in the second image of the particular image pair satisfies a second epipolar constraint of the first image (Ben-Artzi; [Abstract, par. 1, lines 5-9] “This paper proposes a similarity measure between lines that indicates whether two lines are corresponding epipolar lines and enables finding epipolar line correspondences as needed for the computation of epipolar geometry”).
This is further taught by Jepson. Jepson describes how a point in the right image (second image) lies on the epipolar line of the left image (first image), and vice versa [pg. 3, par. 1, and image on pg. 3], teaching that image points satisfy the epipolar constraints of their pair by falling on each other’s corresponding epipolar lines. Ben-Artzi further teaches that “stereo matching will be more successful when applied to corresponding epipolar lines, rather than to random, unrelated lines.” [pg. 1865, sec. 2.4, par. 1, lines 3-5]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further incorporate the teachings of Jepson into the combination of Porter, Merras, and Ben-Artzi, as epipolar constraints allow for successful pairing of images.
[Image: Figure on pg. 3 – Jepson et al.]
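The epipolar constraint Jepson illustrates can be sketched numerically: a correspondence (x1, x2) between two views with fundamental matrix F satisfies x2ᵀ F x1 = 0, so each point lies on the epipolar line induced by the other. The names and the example F below are illustrative assumptions; the chosen F corresponds to a rectified (purely horizontally translated) stereo pair, not to data from either reference.

```python
import numpy as np

# Illustrative sketch: checking the epipolar constraint x2^T F x1 = 0.
# x1 lies on the epipolar line F^T x2 in the first image, and x2 lies
# on the line F x1 in the second, so the check is symmetric.
def satisfies_epipolar(F, x1, x2, tol=1e-6):
    x1h = np.append(x1, 1.0)    # homogeneous image coordinates
    x2h = np.append(x2, 1.0)
    return abs(x2h @ F @ x1h) < tol

# F for a rectified stereo pair: corresponding points share a row (same y).
F_rectified = np.array([[0.0, 0.0,  0.0],
                        [0.0, 0.0, -1.0],
                        [0.0, 1.0,  0.0]])
```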
Claim 37 recites substantially similar limitations as claim 28 and, therefore, is rejected under the same rationale as claim 28.
Claims 29 and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Porter (US 2021/0158615 A1) in view of Merras ("Multi-view 3D reconstruction and modeling of the unknown 3D scenes using genetic algorithms.") and Ben-Artzi (“Epipolar Geometry Based On Line Similarity”), and further in view of Chen et al. (“Multi-View Triangulation: Systematic Comparison and an Improved Method”, hereinafter Chen).
Regarding claim 29, the combination of Porter, Merras, and Ben-Artzi teaches the method of claim 27, further comprising: iteratively selecting image pairs from the plurality of image pairs (Ben-Artzi; [pg. 1868, sec. 6, par. 3, lines 1-3] “For each pair of images we repeatedly executed 10 iterations. In each iteration we randomly sampled two pairs of corresponding points as input to our approach”)
and calculating a reprojection score for each selected image pair (Ben-Artzi; Figure 8 [pg. 1869]). Note: “symmetric epipolar distance” is explained as “accuracy” with a “median error” [Figure 8 description], which the examiner interprets as a score based on how accurately the reprojected pairs match each other.
The combination of Porter, Merras, and Ben-Artzi fails to teach iteratively triangulating a three-dimensional position of the first element, reprojecting each iteratively triangulated three-dimensional position into each image of the first subset of images; and calculating a reprojection score.
However, this is known in the art as taught by Chen. Chen teaches iteratively triangulating a three-dimensional position of the first element ([pg. 21019, sec. C, par. 1, line 2] “iterative optimization method” under “L2 triangulation method”)
reprojecting each iteratively triangulated three-dimensional position into each image of the first subset of images; and calculating a reprojection score (as seen in formula 11 [pg. 21019, sec. C], where “In addition to directly figuring out the location of spatial points, more scholars adopt the iterative optimization method… This is the reprojection error in 3D reconstruction” [pg. 21019, sec. C, par. 1, lines 1-10], and “Xi is the coordinate of the measured image point, [and] x̂i is the coordinate of the image point calculated by the quadratic projection of the spatial point” [pg. 21019, sec. C, par. 1, lines 11-13]. Note: the formula shows the summation of the reprojection errors, which requires a reprojection error score for each image to be calculated).
Chen is analogous to the claimed invention, as both relate to 3D reconstruction using 2D multi-view images. Chen also teaches that “triangulation is an important task in the 3D reconstruction of computer vision.” Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Chen into the combination of Porter, Merras, and Ben-Artzi, as triangulation is known in the art of computer vision to be significant to 3D reconstruction using 2D multi-view images.
[Image: Formula 11 – Chen et al.]
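The triangulate-then-reproject pattern underlying formula (11) can be sketched as follows: a 3D position is triangulated from several views, reprojected into each image of the subset, and the squared pixel residuals are summed into a single reprojection score. This is a generic linear (DLT) triangulation sketch for illustration, not Chen's iterative optimization method; all names are assumptions.

```python
import numpy as np

# Illustrative sketch: multi-view linear (DLT) triangulation of one
# element's 3D position, followed by a summed reprojection score in
# the spirit of Chen's formula (11).
def triangulate(Ps, observations):
    """Triangulate one 3D point from projection matrices Ps and 2D points."""
    A = []
    for P, (u, v) in zip(Ps, observations):
        A.append(u * P[2] - P[0])       # two linear constraints per view
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                          # null-space direction
    return X[:3] / X[3]                 # dehomogenize

def reprojection_score(Ps, observations, X):
    """Sum of squared residuals between observed and reprojected points."""
    Xh = np.append(X, 1.0)
    score = 0.0
    for P, x in zip(Ps, observations):
        proj = P @ Xh
        score += np.sum((np.asarray(x) - proj[:2] / proj[2]) ** 2)
    return score
```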
Claim 38 recites substantially similar limitations as claim 29 and, therefore, is rejected under the same rationale as claim 29.
Claims 31 and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Porter (US 2021/0158615 A1) in view of Merras ("Multi-view 3D reconstruction and modeling of the unknown 3D scenes using genetic algorithms."), Ben-Artzi (“Epipolar Geometry Based On Line Similarity”) and Chen et al. (“Multi-View Triangulation: Systematic Comparison and an Improved Method”, hereinafter Chen), and further in view of Lindstrom (“Triangulation Made Easy”).
Regarding claim 31, the combination of Porter, Merras, Ben-Artzi, and Chen teaches the method of claim 29, but fails to teach wherein the estimated three-dimensional position of the first element is a triangulated three-dimensional position of the first element that produced the lowest reprojection score. However, this is taught by Lindstrom (Lindstrom; [Abstract] “We describe a simple and efficient algorithm for two view triangulation of 3D points from approximate 2D matches based on minimizing the L2 reprojection error.”). Lindstrom further teaches that “In practice, however, the measured and reprojected points do not exactly coincide which causes the rays from the camera centers through the imaged points not to intersect in 3D,” [pg. 3, sec. 1, par. 1, lines 13-15]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Lindstrom into the combination of Porter, Merras, Ben-Artzi, and Chen in order to allow the measured and reprojected positions to be more accurate and to intersect in 3D space.
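The selection step in this rationale, picking the triangulated candidate that produced the lowest reprojection score, can be sketched with hypothetical data (not Lindstrom's algorithm or values):

```python
# Illustrative sketch: among several iteratively triangulated candidate
# positions for the first element, select the one whose reprojection
# score is lowest. Candidate positions and scores are hypothetical.
candidates = {
    (0.0, 0.0, 4.0): 0.02,   # candidate 3D position -> reprojection score
    (0.1, 0.0, 4.1): 0.35,
    (0.0, 0.2, 3.9): 0.18,
}
best = min(candidates, key=candidates.get)   # position with the lowest score
```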
Claim 40 recites substantially similar limitations as claim 31 and, therefore, is rejected under the same rationale as claim 31.
Allowable Subject Matter
Claims 30 and 39 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
In regards to claim 30, the prior art, taken singly or in combination, does not teach or suggest the limitation of “The method of claim 29, wherein the reprojection score comprises a combination of reprojection errors for a triangulated three-dimensional position in each image of the first subset of images, wherein the reprojection error in an image of the first subset of images is a positional difference between a triangulated three-dimensional position of the first element reprojected onto the image of the first subset of images and a two-dimensional position of the first semantic label in the image of the first subset of images.”
In regards to claim 39, the prior art, taken singly or in combination, does not teach or suggest the limitation of “The one or more non-transitory storage media of claim 38, wherein the reprojection score comprises a combination of reprojection errors for a triangulated three-dimensional position in each image of the first subset of images, wherein the reprojection error in an image of the first subset of images is a positional difference between a triangulated three-dimensional position of the first element reprojected onto the image of the first subset of images and a two-dimensional position of the first semantic label in the image of the first subset of images.”
Therefore, claims 30 and 39 are considered allowable.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALICIA HA whose telephone number is (571)272-3601. The examiner can normally be reached Mon-Thurs 9:00 AM - 6:00 PM, and Fri 9:00 AM - 1:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at (571) 272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KEE M TUNG/Supervisory Patent Examiner, Art Unit 2611
/ALICIA HA/Examiner, Art Unit 2611