DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/20/2025 has been entered.
Response to Amendment
This Office Action is in response to Applicant’s amendment/response filed on 11/20/2025, which has been entered and made of record.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 7-9, 11-15, and 18-21 are rejected under 35 U.S.C. 103 as being unpatentable over Cier et al. (US 11,252,329 B1, hereinafter “Cier”).
Regarding claim 1, Cier teaches A method comprising: at a device having a processor: (col. 15, lines 43-46, “the mobile computing device of the user may include various hardware components, such as memory 152, a display 142, one or more hardware processors 132”; col. 49, line 19, “A computer-implemented method”).
obtaining sensor data of a room in a physical environment during a scan of the room, wherein the sensor data contributes to at least a portion of a first three-dimensional (3D) representation of the room in a first format; (col. 3, lines 37-46, “using motion data from IMU sensors on the mobile computing device in combination with visual data from one or more image sensors on the mobile computing device, including in at least some such embodiments to use the additional data captured by the mobile computing device to generate an estimated three-dimensional ("3D") shape of the enclosing room (e.g., based on a 3D point cloud with a plurality of 3D data points and/or estimated planar surfaces of walls and optionally the floor and/or ceiling”). Note that: (1) image data of the room are acquired using one or more image sensors for the room; (2) an estimated three-dimensional ("3D") shape of the enclosing room is mapped into a first 3D representation; and (3) the first 3D representation can be in a format of 3D point cloud and/or estimated planar surfaces as a first format.
determining two-dimensional (2D) shapes representing boundaries of the room based on the first 3D representation; (col. 5, lines 52-57, “analyzing the visual data of the target panorama image (or other target image) and the additional visual data of one or more further images captured by the mobile computing device to identify features (e.g., 2D features) visible in both that visual data and that additional visual data”; col. 10, line 62 – col. 11, line 28, “a Mapping Information Generation Manager (MIGM) system may analyze various images acquired in and around a building in order to automatically determine room shapes of the building's rooms (e.g., 3D room shapes, 2D room shapes, etc.) and to automatically generate a floor plan for the building … using SLAM techniques for multiple video frame images and/or other SfM techniques for a 'dense' set of images that are separated by at most a defined distance (such as 6 feet) to generate a 3D point cloud for the room including 3D points along walls of the room and at least some of the ceiling and floor of the room and optionally with 3D points corresponding to other objects in the room, etc.) and/or by determining and aggregating information about planes for detected features and normal (orthogonal) directions to those planes to identify planar surfaces for likely locations of walls and other surfaces of the room and to connect the various likely wall locations and form an estimated room shape for the room”). Note that: (1) using MIGM, the sensor data (various images) are analyzed to generate a 3D point cloud for the room (the first 3D representation), including at least some of the ceiling and floor of the room and optionally 3D points corresponding to other objects in the room; (2) MIGM automatically determines room shapes of the building's rooms (e.g., 3D room shapes, 2D room shapes, etc.) and automatically generates a floor plan; and (3) it is known to one having ordinary skill in the art that a floor plan specifies the locations and sizes of elements of the physical environment that are approximately planar/architectural, e.g., walls, floors, ceilings, openings, windows, doors, etc., including the boundaries of the room.
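For illustration only, the following sketch shows one common way 2D boundary shapes could be derived from a 3D point cloud: project the points onto the floor plane and take their convex hull as the room outline. Cier does not describe its MIGM system at this level of detail; the function name, the convex-hull choice, and the example data are all assumptions.

```python
# Illustrative sketch only (not Cier's implementation): derive a 2D boundary
# polygon for a room by projecting a 3D point cloud onto the floor plane and
# taking the convex hull of the projected points.
import numpy as np
from scipy.spatial import ConvexHull

def estimate_floor_polygon(points_3d: np.ndarray) -> np.ndarray:
    """points_3d: (N, 3) array of [x, y, z] points from a room scan.
    Returns the 2D boundary polygon vertices in hull order."""
    footprint = points_3d[:, :2]      # drop z: project onto the floor plane
    hull = ConvexHull(footprint)      # convex approximation of the room outline
    return footprint[hull.vertices]   # ordered polygon vertices

# Hypothetical example: points sampled inside a 4 m x 3 m x 2.5 m room.
rng = np.random.default_rng(0)
points = rng.uniform([0.0, 0.0, 0.0], [4.0, 3.0, 2.5], size=(1000, 3))
print(estimate_floor_polygon(points))
```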
determining a 3D primitive representing a 3D object in the room based on the first 3D representation; (col. 4, lines 18-58, “the determining of the additional estimated room shape for the enclosing room using the visual data of the target image may further include using data from one or more IMU sensors of the camera device (e.g., using SLAM and/or SfM techniques) …using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g.,
3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”). Note that: (1) 3D bounding boxes for the detected elements are the 3D primitives representing the detected elements (countertops, bath tubs, sinks, islands, fireplaces, etc.); (2) a trained neural network can estimate a 3D point cloud (the 3D representation) of the walls and other surfaces of the enclosing room; and (3) a trained neural network can detect wall structural elements, borders, corners, and generate 3D bounding boxes for the detected objects.
generating a second 3D representation of the room by combining the 2D shapes to form at least a portion of the boundaries of the room and positioning the 3D primitive within the at least a portion of the boundaries of the room, wherein the second 3D representation has a different format than the first format; and (col. 4, lines 29-58, “using one or more trained neural networks or other techniques to estimate a 3D room shape shown in the target image-as non-exclusive examples, such 3D room shape estimation may include one or more of the following: using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to take the target image as input and to estimate wireframe structural lines of the enclosing room from the visual contents of the target image (e.g., structural lines to show one or more of borders between walls, borders between walls and ceiling, borders between walls and floor, outlines of doorways and/or other inter-room wall openings, outlines of windows, etc.); using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”; col. 27, lines 35-38, “distances between one or more (e.g., all) structural wall elements identified in the estimated room shape may be measured (e.g., using corresponding 3D bounding boxes for those structural wall element objects)”). Note that: (1) a 3D room shape is regarded as a second 3D representation of the room, and has a second format different from the first 3D representation (3D point cloud and/or estimated planar surfaces); (2) the 3D room shape includes or combines both 2D shapes (wall structural elements and borders of the room, etc.) and 3D bounding boxes (fixed structural elements, e.g., countertops, bath tubs, sinks, etc.); and (3) the 3D primitives (bounding boxes) are located, placed, or positioned within the boundaries of the room in the 3D room plan, with distance measurement used to align objects to the boundaries and avoid conflicts of object occupation, and it would have been obvious for one having ordinary skill in the art to understand this and operate accordingly.
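For illustration only, a minimal sketch of what a parametric "second representation" combining 2D boundary shapes with 3D box primitives might look like, in contrast to an unstructured point-cloud first format. All class and field names here are hypothetical assumptions; neither Cier nor the application defines this structure.

```python
# Illustrative sketch only: a parametric room representation that combines
# 2D boundary polygons (walls, openings) with 3D box primitives (objects).
from dataclasses import dataclass, field

@dataclass
class Polygon2D:
    points: list   # [(x, y, z), ...] vertices of a planar polygon in 3D space
    label: str     # e.g., "wall", "door", "window"

@dataclass
class BoxPrimitive3D:
    center: tuple  # (x, y, z) position in the room coordinate system
    size: tuple    # (width, depth, height)
    label: str     # e.g., "countertop", "bathtub"

@dataclass
class RoomModel:
    boundaries: list = field(default_factory=list)  # Polygon2D boundary shapes
    objects: list = field(default_factory=list)     # BoxPrimitive3D objects inside

# Usage: one wall polygon plus one object primitive positioned within it.
room = RoomModel()
room.boundaries.append(Polygon2D([(0, 0, 0), (4, 0, 0), (4, 0, 2.5), (0, 0, 2.5)], "wall"))
room.objects.append(BoxPrimitive3D(center=(1.0, 0.4, 0.45), size=(1.8, 0.8, 0.9), label="bathtub"))
```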
Providing the second 3D representation of the room with the 2D shapes representing the boundaries and the 3D primitive representing the 3D object for display. (col. 43, lines 11-15, “and to optionally further use the generated mapping information, such as to provide the generated 2D floor plan and/or 3D computer model floor plan for display on one or more client devices and/or to one or more other devices”). Note that: (1) after the previous step of this claim above, the 3D representation of the room with the corresponding 2D shapes and 3D primitives of objects has been generated; and (2) the 3D representation of the room is equivalent to a 3D model floor plan of rooms that is provided and displayed on one or more client devices as cited above.
Before the effective filing date of the claimed invention, it would have been obvious to one having ordinary skill in the art to use the teachings of Cier on generating a 3D room shape or 3D room representation that includes 2D shapes and 3D primitives for the detected elements, which are positioned inside the room and within the boundaries of the corresponding generated structural lines, outlines, and borders. Although the positioning, locations, and sizes of the 2D shapes (polygons and lines) and 3D primitives (bounding boxes) are not explicitly described by Cier using the application's terms for the computing methods and their relationships, it would have been obvious for one having ordinary skill in the art to understand and perform the corresponding operations, calculations, and determinations. The motivation would have been “automated operations for determining, generating and presenting information on a floor plan for a building based on images taken in the building interior, including to illustrate information in FIGS. 2E-2J that relates to determining one type of estimate of the likely shape of a room from analyzing images in the room” (col. 1, line 66 – col. 2, line 4). Doing so would allow one to generate a 3D room representation with 2D shapes and 3D primitives and to perform various manipulations that improve the accuracy of a room plan. Therefore, it would have been obvious to use the teachings of Cier.
Regarding claim 2, Cier teaches The method of claim 1, wherein each of the 2D shapes is defined by parameters specifying a plurality of points that define a position and a size of a respective 2D polygon in a 3D coordinate system. (col. 6, lines 7-16, “analyzing visual data of the target panorama image and further visual data of another panorama image captured in the enclosing room (e.g., another previously acquired target image) to identify features (e.g., 2D features) visible in both the target panorama image and the other panorama image, and using offsets of positions of the identified features to determine a common coordinate system for the target panorama image and the other panorama image to use in combining the visual data”; col. 4, lines 32-56, “using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to take the target image as input and to estimate wireframe structural lines of the enclosing room from the visual contents of the target image (e.g., structural lines to show one or more of borders between walls, borders between walls and ceiling, borders between walls and floor, outlines of doorways and/or other inter-room wall openings, outlines of windows, etc.); using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.)”). Note that: (1) all acquired sensor data (images) of rooms are based on a common coordinate system in 3D space; (2) the generated wall structural elements consist of 2D shapes formed by a series of point sets (wireframes, solid geometry vertices, borders, etc.), and the points constitute or specify various 2D polygons (e.g., with vertices as parameters); and (3) one having ordinary skill in the art can readily determine the sizes and positions of the polygons under the common coordinate system.
Regarding claim 3, Cier teaches The method of claim 1, wherein the 3D primitive is defined by parameters specifying a plurality of points that define a position and a size of the 3D primitive in a 3D coordinate system. (col. 6, lines 7-16, “analyzing visual data of the target panorama image and further visual data of another panorama image captured in the enclosing room (e.g., another previously acquired target image) to identify features (e.g., 2D features) visible in both the target panorama image and the other panorama image, and using offsets of positions of the identified features to determine a common coordinate system for the target panorama image and the other panorama image to use in combining the visual data”; col. 4, lines 46-58, “using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”). Note that: (1) all acquired sensor data (images) of rooms are based on a common coordinate system in 3D space; (2) the generated 3D bounding boxes are formed by a series of point sets (at least eight 3D points for each 3D box), and the points constitute or specify various 3D bounding boxes (e.g., with vertices as parameters); and (3) one having ordinary skill in the art can readily determine the sizes and positions of the 3D bounding boxes under the common coordinate system.
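For illustration only, a short sketch of the point made in the note above: the position and size parameters of a 3D bounding box follow directly from its eight corner points in a common coordinate system. An axis-aligned box is assumed for brevity; the function name and example values are hypothetical.

```python
# Illustrative sketch only: recover (center, size) parameters of an
# axis-aligned 3D bounding box from its eight corner points.
import numpy as np

def box_parameters(corners: np.ndarray):
    """corners: (8, 3) array of box corner points. Returns (center, size)."""
    lo, hi = corners.min(axis=0), corners.max(axis=0)
    center = (lo + hi) / 2.0   # position of the box in the room frame
    size = hi - lo             # extents along x, y, z
    return center, size

# Hypothetical bathtub box: corners of a 1.8 m x 0.8 m x 0.9 m volume.
corners = np.array([[x, y, z] for x in (1.0, 2.8) for y in (0.2, 1.0) for z in (0.0, 0.9)])
center, size = box_parameters(corners)
print(center, size)   # [1.9 0.6 0.45] [1.8 0.8 0.9]
```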
Regarding claim 4, Cier teaches The method of claim 1, wherein determining the 2D shapes comprises: detecting walls and wall openings based on the first 3D representation of the room;
performing a wall opening consistency process;
detecting windows or doors on walls of the room; or
estimating a wall or a wall opening height based on the first 3D representation. (col. 4, lines 29-58, “using one or more trained neural networks or other techniques to estimate a 3D room shape shown in the target image-as non-exclusive examples, such 3D room shape estimation may include one or more of the following: using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”; col. 42, line 66 – col. 43, line 2, “In block 585, the routine further estimates heights of walls in some or all rooms, such as from analysis of images and optionally sizes of known objects in the images”). Note that: (1) the first 3D representation (3D point cloud) of the 3D room shape is generated; (2) walls and wall openings (doorways) are detected in the generation process of the first 3D room representation; (3) windows and doors (doorways) are detected; and (4) wall heights are estimated.
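For illustration only, a sketch of one way a wall height could be estimated from a first 3D representation: measure the vertical extent between the estimated floor and ceiling levels of the point cloud. Cier (col. 42-43) describes height estimation only at a high level (from images and known object sizes); the robust-percentile approach below is an assumption, not Cier's routine.

```python
# Illustrative sketch only: estimate wall height as the vertical extent of a
# room point cloud, using percentiles to suppress outlier points.
import numpy as np

def estimate_wall_height(points_3d: np.ndarray) -> float:
    """points_3d: (N, 3) array of [x, y, z] scan points. Returns height in meters."""
    z = points_3d[:, 2]
    floor_z = np.percentile(z, 1)     # robust floor-level estimate
    ceiling_z = np.percentile(z, 99)  # robust ceiling-level estimate
    return float(ceiling_z - floor_z)
```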
Regarding claim 7, Cier teaches The method of claim 1, wherein determining the 2D shapes comprises refining edge positions of the 2D shapes based on the sensor data. (col. 28, lines 3-7, “once the acquisition position information is determined for such a target image, it may be shown on updated versions of the floor plan for the building, such as illustrated for updated floor plans 230u and 265u in information 255u of FIG. 2U”). Note that the 2D shapes are part of a room floor plan that includes object edges, outlines, and boundaries. When the floor plan is updated, the edges, outlines, and boundaries of the objects are updated or refined accordingly.
Regarding claim 8, Cier teaches The method of claim 1, wherein determining the 3D primitive comprises:
detecting the 3D object based on the sensor data;
refining object boundaries based on the sensor data; or
aligning the 3D primitive with at least one of the 2D shapes. (col. 4, lines 46-58, “using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”). Note that 3D objects are detected and the bounding boxes are generated.
Regarding claim 9, Cier teaches The method of claim 1, wherein determining the 3D primitive comprises:
detecting the 3D object based on the sensor data; (col. 4, lines 46-58, “using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”). Note that 3D objects are detected and the bounding boxes are generated.
refining object boundaries based on the sensor data; and (col. 28, lines 3-7, “once the acquisition position information is determined for such a target image, it may be shown on updated versions of the floor plan for the building, such as illustrated for updated floor plans 230u and 265u in information 255u of FIG. 2U”). Note that the 2D shapes are part of a room floor plan that includes object edges, outlines, and boundaries. When the floor plan is updated, the edges, outlines, and boundaries of the objects are updated or refined.
aligning the 3D primitive with at least one of the 2D shapes; and (col. 27, lines 35-42, “distances between one or more (e.g., all) structural wall elements identified in the estimated room shape may be measured (e.g., using corresponding 3D bounding boxes for those structural wall element objects), such as shown 234a for the two doorways at the lower left side of the rooms, and with the smaller the distance reflecting the better the match (and in some embodiments, the higher the matching score)”). Note that: (1) distances between structural wall elements are measured using the corresponding 3D bounding boxes (primitives) to improve the alignment of the 3D bounding boxes to the estimated room shape, including the 2D version of the 3D room representation and the 2D structural elements (boundaries, outlines, and borders); and (2) it would have been obvious to one having ordinary skill in the art that the distance measurement can guide or align the 3D primitives with the 2D shapes and avoid conflicts of occupied space.
producing data specifying the position of the 3D primitive. Note that the bounding boxes are parts of the 3D room shape, 3D room plan, or 3D room representation, and their positions relative to the room shape are determined when they are detected and generated by the neural networks cited above.
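For illustration only, a sketch in the spirit of the distance measurements quoted above (col. 27): a 3D box primitive can be aligned with a 2D wall shape by measuring its gap to the wall plane and snapping it flush when the gap is small. The snap threshold, the wall-normal convention, and the function name are assumptions; Cier does not describe this specific operation.

```python
# Illustrative sketch only: snap an axis-aligned box flush against a wall
# plane when the measured gap between its face and the wall is small.
import numpy as np

def snap_box_to_wall(center: np.ndarray, size: np.ndarray,
                     wall_point: np.ndarray, wall_normal: np.ndarray,
                     threshold: float = 0.10) -> np.ndarray:
    """Returns an adjusted box center; translates along the wall normal
    so the nearest box face touches the wall if within `threshold` meters."""
    n = wall_normal / np.linalg.norm(wall_normal)
    half_extent = float(np.abs(size) @ np.abs(n)) / 2.0  # box half-size along the normal
    gap = float((center - wall_point) @ n) - half_extent # signed face-to-wall distance
    if abs(gap) < threshold:
        center = center - gap * n                        # snap flush against the wall
    return center

# Usage: a box 5 cm off the wall plane y = 0 gets pulled flush (y = 0.40).
center = np.array([1.0, 0.45, 0.45])
size = np.array([1.8, 0.8, 0.9])
print(snap_box_to_wall(center, size, np.zeros(3), np.array([0.0, 1.0, 0.0])))
```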
Regarding claim 11, Cier teaches The method of claim 1, wherein obtaining the sensor data comprises obtaining the first three-dimensional (3D) representation of the room based on the sensor data. (col. 4, lines 18-58, “the determining of the additional estimated room shape for the enclosing room using the visual data of the target image may further include using data from one or more IMU sensors of the camera device (e.g., using SLAM and/or SfM techniques) … using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”). Note that: (1) the 3D room shape includes or combines both 2D shapes (wall structural elements and borders of the room, etc.) and 3D bounding boxes (fixed structural elements, e.g., countertops, bath tubs, sinks, etc.); and (2) using the visual data includes using data from one or more IMU sensors of the camera device.
Claim 12, reciting “A system comprising: a non-transitory computer-readable storage medium; and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations comprising:”, corresponds to claim 1. Therefore, claim 12 is rejected under the same rationale as claim 1.
In addition, Cier teaches A system comprising: a non-transitory computer-readable storage medium; and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations comprising: (col. 60, lines 40-47, “A system comprising: one or more hardware processors of one or more computing devices; and one or more memories with stored instructions that, when executed by at least one of the one or more hardware processors, cause at least one of the one or more computing devices to perform automated operations including at least:”; Fig. 3, mobile computing device(s) as a system comprising “storage 365”, “memory 367”, and “CPU 361” coupled to “storage 365” and “memory 367”; col. 37, lines 44-49, “Some or all of the components, systems and data structures may also be stored (e.g., as software instructions or structured data) on a non-transitory computer-readable storage mediums, such as a hard disk or flash drive or other non-volatile storage device”).
Claims 13-15 correspond to claims 2-4, respectively. Therefore, claims 13-15 are rejected under the same rationale as claims 2-4, respectively.
Regarding claim 18, Cier teaches The system of claim 12, wherein determining the 2D shapes comprises:
refining edge positions of the 2D shapes based on the sensor data;
detecting the 3D object based on the sensor data;
refining object boundaries based on the sensor data; or
aligning the 3D primitive with at least one of the 2D shapes. (col. 4, lines 46-58, “using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”). Note that 3D objects are detected and the bounding boxes are generated.
Claim 19 corresponds to claim 9. Therefore, claim 19 is rejected under the same rationale as claim 9.
Claim 20, reciting “A non-transitory computer-readable storage medium storing program instructions executable via one or more processors to perform operations comprising:”, corresponds to claim 1. Therefore, claim 20 is rejected under the same rationale as claim 1.
In addition, Cier teaches A non-transitory computer-readable storage medium storing program instructions executable via one or more processors to perform operations comprising: (col. 59, lines 20-22, “A non-transitory computer-readable medium having stored contents that cause one or more computing devices to perform automated operations including at least”; Fig. 3, mobile computing device(s) as a system comprising “storage 365”, “memory 367”, and “CPU 361” coupled to “storage 365” and “memory 367”; col. 37, lines 44-49, “Some or all of the components, systems and data structures may also be stored (e.g., as software instructions or structured data) on a non-transitory computer-readable storage mediums, such as a hard disk or flash drive or other non-volatile storage device”).
Regarding claim 21, Cier discloses The method of claim 1, wherein the first 3D representation has a 3D point cloud format and the second 3D representation has a parametric representation format. (col. 3, lines 37-46, “using motion data from IMU sensors on the mobile computing device in combination with visual data from one or more image sensors on the mobile computing device, including in at least some such embodiments to use the additional data captured by the mobile computing device to generate an estimated three-dimensional ("3D") shape of the enclosing room (e.g., based on a 3D point cloud with a plurality of 3D data points and/or estimated planar surfaces of walls and optionally the floor and/or ceiling”; col. 4, lines 29-58, “using one or more trained neural networks or other techniques to estimate a 3D room shape shown in the target image-as non-exclusive examples, such 3D room shape estimation may include one or more of the following: using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to take the target image as input and to estimate wireframe structural lines of the enclosing room from the visual contents of the target image (e.g., structural lines to show one or more of borders between walls, borders between walls and ceiling, borders between walls and floor, outlines of doorways and/or other inter-room wall openings, outlines of windows, etc.); using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”; col. 27, lines 35-38, “distances between one or more (e.g., all) structural wall elements identified in the estimated room shape may be measured (e.g., using corresponding 3D bounding boxes for those structural wall element objects)”). Note that: (1) the first 3D representation includes a 3D point cloud and/or estimated planar surfaces; therefore, the first 3D representation has a 3D point cloud format; and (2) the second 3D representation includes the 3D room shape, which includes or combines both 2D shapes (wall structural elements and borders of the room, etc.) and 3D bounding boxes (fixed structural elements, e.g., countertops, bath tubs, sinks, etc.); therefore, the second 3D representation (2D shapes and 3D bounding boxes) has a parametric representation format.
Claims 5-6 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Cier in view of Howe et al. (US 2010/0268512 A1, hereinafter “Howe”).
Regarding claim 5, Cier teaches The method of claim 1, wherein determining the 2D shapes comprises:
detecting walls and wall openings based on the first 3D representation of the room;
performing a wall opening consistency process during the scan;
detecting windows or doors on walls of the room;
estimating a wall or a wall opening height based on the first 3D representation; and producing a floor plan based on the walls, wall openings, windows, doors, wall heights, and wall opening heights. (col. 4, lines 32-58, “using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to take the target image as input and to estimate wireframe structural lines of the enclosing room from the visual contents of the target image (e.g., structural lines to show one or more of borders between walls, borders between walls and ceiling, borders between walls and floor, outlines of doorways and/or other inter-room wall openings, outlines of windows, etc.); using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”; col. 42, line 66 – col. 43, line 2, “In block 585, the routine further estimates heights of walls in some or all rooms, such as from analysis of images and optionally sizes of known objects in the images”). Note that: (1) the first 3D representation (3D point cloud) of the 3D room shape is generated; (2) walls and wall openings (doorways) are detected in the generation process of the 3D room representation; (3) windows and doors (doorways) are detected; (4) wall heights are estimated; (5) the operations to generate the 3D room representation, including the 2D shapes and 3D primitive, are integrated during the room scan; and (6) since the wall openings are detected by the neural networks and are included in the 3D room plan, the wall opening heights can be readily determined by one having ordinary skill in the art.
However, Cier fails to teach, but in the same art of analyzing data, Howe discloses performing a wall opening consistency process; (Howe, page 3, para. [0049], “a structural feature 270 and a wall opening vector 272 may conflict about whether there is a wall or an opening. Structural feature 270 indicates that there is a wall where wall opening vector 272 indicates there is an opening. Vector 270 is removed in order to allow wall opening vector 272 to constrain the solution”; page 4, claim 6, “determining that a wall opening vector of a first structural feature conflicts with a second structural feature; and removing the wall opening vector”). Note that: (1) the conflict of a wall opening between two structural features is determined; and (2) the conflict is resolved by removing the wall opening vector, indicating a consistency process is performed.
Cier and Howe are in the same field of endeavor, namely analyzing data. Before the effective filing date of the claimed invention, it would have been obvious to apply the wall opening consistency process taught by Howe to Cier. The motivation would have been “maintaining a wall opening vector to constrain a solution” (Howe, page 3, para. [0049]). Doing so would allow the combined system to resolve wall opening conflicts and guarantee consistency. Therefore, it would have been obvious to combine Cier and Howe.
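For illustration only, a sketch of a wall opening consistency process in the spirit of Howe para. [0049]: when a detected wall segment and a detected opening claim the same span, one of the conflicting constraints is removed (in para. [0049], the conflicting wall feature is removed so the opening vector constrains the solution). The interval representation and the keep/remove policy below are assumptions, not Howe's data structures.

```python
# Illustrative sketch only: resolve wall-vs-opening conflicts along a wall by
# dropping wall spans that overlap a detected opening, letting the opening
# constrain the final solution.
def resolve_opening_conflicts(wall_spans, opening_spans):
    """Each span is (start, end) in meters along the wall.
    Returns (consistent wall spans, opening spans kept as constraints)."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]
    consistent_walls = [w for w in wall_spans
                        if not any(overlaps(w, o) for o in opening_spans)]
    return consistent_walls, opening_spans

# Usage: the first wall span conflicts with a doorway opening and is removed.
walls, openings = resolve_opening_conflicts([(0.0, 2.2), (2.6, 4.0)], [(1.8, 2.6)])
print(walls)   # [(2.6, 4.0)]
```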
Regarding claim 6, Cier in view of Howe teaches The method of claim 5, wherein the floor plan is produced during the scan. (Cier, col. 10, lines 62-67, “a Mapping Information Generation Manager (MIGM) system may analyze various images acquired in and around a building in order to automatically determine room shapes of the building's rooms (e.g., 3D room shapes, 2D room shapes, etc.) and to automatically generate a floor plan for the building”; col. 4, lines 32-56, “using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to take the target image as input and to estimate wireframe structural lines of the enclosing room from the visual contents of the target image (e.g., structural lines to show one or more of borders between walls, borders between walls and ceiling, borders between walls and floor, outlines of doorways and/or other inter-room wall openings, outlines of windows, etc.); using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.)”). Note that: (1) the room scan and the generation of the 3D room representation, including the 2D shapes, are integrated during the scan; and (2) the floor plan is generated as the 2D part of the outputs of the corresponding neural networks and the Mapping Information Generation Manager (MIGM) system.
Claims 16-17 correspond to claims 5-6, respectively. Therefore, claims 16-17 are rejected under the same rationale as claims 5-6, respectively.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Cier in view of Palmer (US 11,094,042 B1, hereinafter “Palmer”).
Regarding claim 10, Cier teaches The method of claim 1, wherein generating the second 3D representation of the room is further based on detecting a mirror in the room. (col. 4, lines 32-49, “using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to take the target image as input and to estimate wireframe structural lines of the enclosing room from the visual contents of the target image (e.g., structural lines to show one or more of borders between walls, borders between walls and ceiling, borders between walls and floor, outlines of doorways and/or other inter-room wall openings, outlines of windows, etc.); using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways”). Note that: (1) the room scan and the generation of the 3D room representation, including the 2D shapes, are integrated during the room scan; and (2) wall structural elements (e.g., windows and/or sky-lights) are detected in the room.
However, Cier fails to teach, but in the same art of analyzing data, Palmer discloses a mirror in the room (Palmer, col. 4, lines 47-49, “mirrors in interior rooms and mirrored building surfaces in external environments result in relatively sharp reflected images”). Note that mirrors in rooms produce relatively sharp reflected images, which differ from images of ordinary surfaces and can therefore be detected.
Cier and Palmer are in the same field of endeavor, namely analyzing data. Before the effective filing date of the claimed invention, it would have been obvious to apply the detection of a mirror in the room, as taught by Palmer, to Cier. The motivation would have been to “effectively operate in interior spaces which include reflective surfaces (e.g., mirrors, windows, translucent partitions or doors, etc.)” (Palmer, col. 2, lines 9-11). Doing so would allow the system to analyze the reflectiveness of the planar surfaces of objects and detect mirror objects. Therefore, it would have been obvious to combine Cier and Palmer.
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Cier in view of Hu et al. (US 2022/0012942 A1, hereinafter “Hu”).
Regarding claim 22, Cier discloses The method of claim 1, wherein the first 3D representation (col. 4, lines 29-58, “using one or more trained neural networks or other techniques to estimate a 3D room shape shown in the target image-as non-exclusive examples, such 3D room shape estimation may include one or more of the following: using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to take the target image as input and to estimate wireframe structural lines of the enclosing room from the visual contents of the target image (e.g., structural lines to show one or more of borders between walls, borders between walls and ceiling, borders between walls and floor, outlines of doorways and/or other inter-room wall openings, outlines of windows, etc.); using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”; col. 27, lines 35-38, “distances between one or more (e.g., all) structural wall elements identified in the estimated room shape may be measured (e.g., using corresponding 3D bounding boxes for those structural wall element objects)”). Note that Cier discloses the first 3D representation, which includes a 3D point cloud and/or estimated planar surfaces of the enclosing room.
However, Cier fails to disclose, but in the same art of computer graphics, Hu discloses
has a 3D mesh format (Hu, page 10, claim 1, “A method for generating a mesh representation of a surface comprising: receiving a three-dimensional (3D) point cloud representing the surface; generating, from the 3D point cloud, a reconstruction dataset having a higher resolution than the 3D point cloud in one or more regions corresponding to the surface; and generating, using the reconstruction dataset, a polygon mesh representation of the surface by using a fine-to-coarse hash map for building polygons at a highest resolution first followed by progressively coarser resolution polygons”). Note that: (1) the first 3D representation of claim 1 includes a 3D point cloud and/or estimated planar surfaces; therefore, the first 3D representation of claim 1 has a 3D point cloud format; and (2) the teaching of Hu can generate a mesh representation from a 3D point cloud; therefore, the first 3D representation can be converted into a 3D mesh representation so that the first 3D representation has a 3D mesh format.
Cier and Hu are in the same field of endeavor, namely computer graphics. Before the effective filing date of the claimed invention, it would have been obvious to apply generating a mesh representation from the 3D point cloud, as taught by Hu, to Cier. The motivation would have been “receiving a three-dimensional (3D) point cloud representing the surface, generating a reconstruction dataset having a higher resolution than the 3D point cloud” (Hu, Abstract). Doing so would provide a mesh representation having a higher resolution than the 3D point cloud. Therefore, it would have been obvious to combine Cier and Hu.
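For illustration only, a minimal sketch of what "a 3D mesh format" means concretely: an estimated planar wall surface (part of a point-cloud-based first representation) expressed as an indexed triangle mesh. Hu's actual method builds the mesh with a fine-to-coarse hash map at multiple resolutions; the trivial quad-splitting below is not Hu's algorithm, only a format illustration.

```python
# Illustrative sketch only: convert a rectangular planar wall surface into an
# indexed triangle mesh (vertices plus faces), the usual "3D mesh format".
import numpy as np

def plane_to_mesh(corners: np.ndarray):
    """corners: (4, 3) rectangle corners in order. Returns (vertices, faces),
    where each face is a triple of vertex indices."""
    vertices = corners
    faces = np.array([[0, 1, 2], [0, 2, 3]])  # split the quad into two triangles
    return vertices, faces

# Usage: a 4 m x 2.5 m wall in the plane y = 0 becomes two triangles.
wall = np.array([[0, 0, 0], [4, 0, 0], [4, 0, 2.5], [0, 0, 2.5]], dtype=float)
v, f = plane_to_mesh(wall)
print(v.shape, f.shape)   # (4, 3) (2, 3)
```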
Response to Arguments
Applicant's arguments with respect to the rejections under 35 U.S.C. 103 have been fully considered, but they are not persuasive.
Applicant alleges, “Cier describes using multiple devices to determine a location. It describes estimating a room shape using visual data of a target image. However, it does not suggest determining 2D shapes and a 3D primitive based on a first 3D representation of a room and generating a second 3D representations by combining those 2D shapes and the 3D primitive, where the second 3D representation has a different format than the first 3D representation, as recited in the independent claims. Accordingly, withdrawal of the rejection of all claims is respectfully requested. Favorable consideration of new dependent claims 21-22 is also respectfully requested.” (page 8, lines 19-25). However, Examiner respectfully disagrees with the respective allegations as a whole because:
Cier discloses obtaining sensor data of a room in a physical environment during a scan of the room, wherein the sensor data contributes to at least a portion of a first three-dimensional (3D) representation of the room in a first format; (col. 3, lines 37-46, “using motion data from IMU sensors on the mobile computing device in combination with visual data from one or more image sensors on the mobile computing device, including in at least some such embodiments to use the additional data captured by the mobile computing device to generate an estimated three-dimensional ("3D") shape of the enclosing room (e.g., based on a 3D point cloud with a plurality of 3D data points and/or estimated planar surfaces of walls and optionally the floor and/or ceiling”). Note that: (1) image data of the room are acquired using one or more image sensors for the room; (2) an estimated three-dimensional ("3D") shape of the enclosing room is mapped into a first 3D representation; and (3) the first 3D representation can be in a format of 3D point cloud and/or estimated planar surfaces as a first format.
Cier discloses determining two-dimensional (2D) shapes representing boundaries of the room based on the first 3D representation; (col. 5, lines 52-57, “analyzing the visual data of the target panorama image (or other target image) and the additional visual data of one or more further images captured by the mobile computing device to identify features (e.g., 2D features) visible in both that visual data and that additional visual data”; col. 10, line 62 – col. 11, line 28, “a Mapping Information Generation Manager (MIGM) system may analyze various images acquired in and around a building in order to automatically determine room shapes of the building's rooms (e.g., 3D room shapes, 2D room shapes, etc.) and to automatically generate a floor plan for the building … using SLAM techniques for multiple video frame images and/or other SfM techniques for a 'dense' set of images that are separated by at most a defined distance (such as 6 feet) to generate a 3D point cloud for the room including 3D points along walls of the room and at least some of the ceiling and floor of the room and optionally with 3D points corresponding to other objects in the room, etc.) and/or by determining and aggregating information about planes for detected features and normal (orthogonal) directions to those planes to identify planar surfaces for likely locations of walls and other surfaces of the room and to connect the various likely wall locations and form an estimated room shape for the room”). Note that: (1) using MIGM, the sensor data (various images) are analyzed to generate a 3D point cloud for the room (the first 3D representation), including at least some of the ceiling and floor of the room and optionally 3D points corresponding to other objects in the room; (2) MIGM automatically determines room shapes of the building's rooms (e.g., 3D room shapes, 2D room shapes, etc.) and automatically generates a floor plan; and (3) it is known to one having ordinary skill in the art that a floor plan specifies the locations and sizes of elements of the physical environment that are approximately planar/architectural, e.g., walls, floors, ceilings, openings, windows, doors, etc., including the boundaries of the room.
Cier discloses determining a 3D primitive representing a 3D object in the room based on the first 3D representation; (col. 4, lines 18-58, “the determining of the additional estimated room shape for the enclosing room using the visual data of the target image may further include using data from one or more IMU sensors of the camera device (e.g., using SLAM and/or SfM techniques) …using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”). Note that: (1) 3D bounding boxes for the detected elements are the 3D primitives representing the detected elements (countertops, bath tubs, sinks, islands, fireplaces, etc.); (2) a trained neural network can estimate a 3D point cloud (the 3D representation) of the walls and other surfaces of the enclosing room; and (3) a trained neural network can detect wall structural elements, borders, corners, and generate 3D bounding boxes for the detected objects.
Cier discloses generating a second 3D representation of the room by combining the 2D shapes to form at least a portion of the boundaries of the room and positioning the 3D primitive within the at least a portion of the boundaries of the room, wherein the second 3D representation has a different format than the first format; and (col. 4, lines 29-58, “using one or more trained neural networks or other techniques to estimate a 3D room shape shown in the target image-as non-exclusive examples, such 3D room shape estimation may include one or more of the following: using a trained convolutional neural network or other analysis technique to take the target image as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image; using a trained neural network or other analysis technique to take the target image as input and to estimate wireframe structural lines of the enclosing room from the visual contents of the target image (e.g., structural lines to show one or more of borders between walls, borders between walls and ceiling, borders between walls and floor, outlines of doorways and/or other inter-room wall openings, outlines of windows, etc.); using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or sky-lights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements”; col. 27, lines 35-38, “distances between one or more (e.g., all) structural wall elements identified in the estimated room shape may be measured (e.g., using corresponding 3D bounding boxes for those structural wall element objects)”). Note that: (1) a 3D room shape is regarded as a second 3D representation of the room, and has a second format different from the first 3D representation (3D point cloud and/or estimated planar surfaces); (2) the 3D room shape includes or combines both 2D shapes (wall structural elements and borders of the room, etc.) and 3D bounding boxes (fixed structural elements, e.g., countertops, bath tubs, sinks, etc.); and (3) the 3D primitives (bounding boxes) are located, placed, or positioned within the boundaries of the room in the 3D room plan, with distance measurement used to align objects to the boundaries and avoid conflicts of object occupation, and it would have been obvious for one having ordinary skill in the art to understand this and operate accordingly.
Cier discloses Providing the second 3D representation of the room with the 2D shapes representing the boundaries and the 3D primitive representing the 3D object for display. (col. 43, lines 11-15, “and to optionally further use the generated mapping information, such as to provide the generated 2D floor plan and/or 3D computer model floor plan for display on one or more client devices and/or to one or more other devices”). Note that: (1) after the previous step of this claim above, the 3D representation of the room with the corresponding 2D shapes and 3D primitives of objects has been generated; and (2) the 3D representation of the room is equivalent to a 3D model floor plan of rooms that is provided and displayed on one or more client devices as cited above.
Before the effective filing date of the claimed invention, it would have been obvious to one having ordinary skill in the art to use the teachings of Cier on generating a 3D room shape or 3D room representation that includes 2D shapes and 3D primitives for the detected elements, which are positioned inside the room and within the boundaries of the corresponding generated structural lines, outlines, and borders. Although the positioning, locations, and sizes of the 2D shapes (polygons and lines) and 3D primitives (bounding boxes) are not explicitly described by Cier using the application's terms for the computing methods and their relationships, it would have been obvious for one having ordinary skill in the art to understand and perform the corresponding operations, calculations, and determinations. The motivation would have been “automated operations for determining, generating and presenting information on a floor plan for a building based on images taken in the building interior, including to illustrate information in FIGS. 2E-2J that relates to determining one type of estimate of the likely shape of a room from analyzing images in the room” (col. 1, line 66 – col. 2, line 4). Doing so would allow one to generate a 3D room representation with 2D shapes and 3D primitives and to perform various manipulations that improve the accuracy of a room plan. Therefore, it would have been obvious to use the teachings of Cier.
Cier discloses all limitations of claim 1.
Independent claims 12 and 20 correspond to claim 1. Therefore, independent claims 12 and 20 are rejected under the same rationale as claim 1.
All dependent claims, including new claims 21-22, are rejected under the corresponding rationales; please see the respective details above.
The arguments are not persuasive.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BIAO CHEN whose telephone number is (703)756-1199. The examiner can normally be reached M-F 8am-5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee M Tung can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KEE M TUNG/Supervisory Patent Examiner, Art Unit 2611
/Biao Chen/
Patent Examiner, Art Unit 2611