DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
2. The information disclosure statement (IDS) submitted on 03/24/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
3. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
4. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
5. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
6. Claims 1-5 and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Gausebeck et al. (US 2019/0026956 A1) in view of Speasl et al. (US 2019/0236732 A1).
7. With reference to claim 1, Gausebeck teaches a resource efficient computing system configured to run on a camera enabled hand-held computing device, (“system 100 includes a user device 3002 with one or more cameras 1404 configured to capture 2D image data (e.g., including panoramic images and video) and of an object or environment and 3D-from-2D processing module 1406 configured to derive depth data for one or more 2D images included in the 2D image data. … The user device 3002 can include various types of computing devices that include one or more cameras on or within a housing thereof configured to capture 2D image data of an environment and a display/rendering component 1408 that include hardware and/or software that facilitates rendering digital objects on or within a representation of an environment via a display of the user device 3002, as hologram, or the like. For example, in some embodiments, the user device 3002 can include an AR headset configured to be worn by a user and including a display (e.g., a transparent glass display) that is position in front of the user's eyes (e.g., glasses, goggles, an HUD, etc.). In another, embodiment the user device can be or include a mobile handheld device, such as a mobile phone or smartphone, a tablet PC, or a similar device. Still in other embodiments, the user device 3002 can include a device that can be positioned in a relatively fixed position relative to an environment, such as a laptop PC, a desktop PC or the like.” [0216]) Gausebeck also teaches the system configured to determine geometry and semantic information for a virtual representation of a location in real time with spatially localized information of elements within the location being embedded in the virtual representation, (“the 3D models can include immersive virtual reality VR environments that can be navigated as facilitated by the navigation component 126.
In the embodiment shown, the reconstructed representations/3D models and associated alignment data generated by the 3D model generation component 118 is identified as 3D model and alignment data 128. System 100 can further include a suitable user device 130 comprising a display 132 that can receive and render the reconstructed/3D models generated by the 3D model generation component 118. For example, the user device 130 can include but is not limited to: a desktop computer, a laptop computer, a mobile phone, a smartphone, a tablet personal computer (PC), a personal digital assistant (PDA), a heads-up display (HUD), a virtual reality (VR) headset, augmented reality (AR) headset or device, a standalone digital camera, or another type of wearable computing device.” [0062] “the 3D model generation component 118 can employ the derived 3D data 116 for respective images received by the computing device 104 to generate reconstructed 3D models of objects or environments included in the images. The 3D models described herein can include data representing positions, geometric shapes, curved surfaces, and the like. … Portions of the 3D model geometric data (e.g., the mesh) can include image data describing texture, color, intensity, and the like. For example, the geometric data can comprise data points of geometry in addition to comprising texture coordinates associated with the data points of geometry (e.g., texture coordinates that indicate how to apply texture data to geometric data). In various embodiments, received 2D image data 102 (or portions thereof) can be associated with portions of the mesh to associate visual data from the 2D image data 102 (e.g., texture data, color data, etc.) with the mesh.” [0070-0071] “the derived 3D data can include depth information for each and every pixel of a single 2D image, depth information for subsets or groups of pixels (e.g., superpixels), depth information for only one or more portions of a 2D image, and the like. 
In some implementations, the 2D images can also be associated with additional known or derived spatial information that can be used facilitate aligning the 2D image data to one another in the 3D coordinate space, including but not limited to, the relative capture position and the relative capture orientation of the respective 2D images relative to the 3D coordinate space.” [0074] “the 3D model generation component 118 can be configured to generate such reconstructed 3D models in real-time or substantially real-time as the 2D image data is received and the derived 3D data 116 for the 2D image data is generated. Accordingly, a user viewing the rendered 3D model be provided with live or substantially live feedback during the entire alignment process regarding the progression of the 3D model as new 2D image data 102 is received and aligned.” [0082] “The semantic labeling component 928 can be configured to process 2D image data 102 to determine semantic labels for features included in the image data.” [0161]) Gausebeck further teaches the system comprising machine-readable instructions configured to be executed by one or more hardware processors to: (“The devices and/or systems described in FIGS. 14-25 can include machine-executable components embodied within machine(s), e.g. embodied in one or more computer readable mediums (or media) associated with one or more machines. Such components, when executed by the one or more machines, e.g. computer(s), computing device(s), virtual machine(s), etc. can cause the machine(s) to perform the operations described. In this regard, although not shown, the devices and/or systems described in FIGS. 14-25 can include or be operatively coupled to at least one memory and at least one processor. 
The at least one memory can further store the computer-executable instructions/components that when executed by the at least one processor facilitate performance of operations defined by the computer-executable instructions/components.” [0178]) Gausebeck teaches receive video data of the location, the video data being generated via a camera, the video data comprising a plurality of successive frames; (“related 2D images can include neighboring images, images with partially overlapping fields-of-view, images with slightly different capture positions and/or capture orientations, stereo-image pairs, images providing different perspectives of a same object or environment captured at significantly different capture locations (e.g., beyond a threshold distance so as to not constitute a stereo-image pair, such as greater than the inter-ocular distance, which is about 6.5 centimeters), and the like. The source of the related 2D images included in the 2D image data 102 and the relationship between the related 2D images can vary. For example, in some implementations, the 2D image data 102 can include video data 902 comprising sequential frames of video captured in association with movement of the video camera. Related 2D images can also include frames of video captured by a video camera with a fixed position/orientation yet captured at different points in time as one or more characteristics of the environment change at the different points in time. In another example, similar to the sequential frames of video captured by a video camera, an entity (e.g., a user, a robot, an autonomous navigating vehicle, etc.) 
can capture several 2D images of an environment using a camera in association with movement of the entity about the environment.” [0141]) Gausebeck also teaches determine, with a depth estimation module, depth information for each of the plurality of successive frames of the video data; (“a method for using panoramic image data to generate accurate depth predictions using 3D-from-2D is provided that can comprise receiving, by a system operatively coupled to a processor, a request for depth data associated with a region of an environment depicted in a panoramic image. The method can further comprise, based on the receiving, deriving, by the system, depth data for an entirety of the panoramic image using a neural network model configured to derive depth data from a single 2D image. The method can further comprise extracting, by the system, a portion of the depth data corresponding to the region of the environment, and providing, by the system, the portion of the depth data to an entity associated with the request.” [0034] “in some implementations, the 2D image data 102 can include video data 902 comprising sequential frames of video captured in association with movement of the video camera. Related 2D images can also include frames of video captured by a video camera with a fixed position/orientation yet captured at different points in time as one or more characteristics of the environment change at the different points in time. … SLAM techniques employ algorithms that are configured to algorithms simultaneously localize (e.g., determine the position and orientation of) a capture device (e.g., a 2D image capture device or 3D capture device) with respect to its surroundings, while at the same time mapping the structure of that environment. 
SLAM algorithms can involve tracking sets of points through a sequence of images using these tracks to triangulate the 3D positions of the points, while simultaneously using the point locations to determine the relative position/orientation of the capture device that captured them. In this regard, in addition to determining the position/orientation of the capture device, SLAM algorithms can also be used to estimate depth information for features included in one or more images of the sequence of images.” [0141-0142] “The depth estimation component 916 can also evaluate related images to estimate depth data for one or more of the related images. For example, in some embodiments, the depth estimation component 916 can employ SLAM to estimate depth data based on a sequence of images. The depth estimation component 916 can also employ related photogrammetry techniques to determine depth information for a 2D images based on one or more related images.” [0144]) Gausebeck further teaches aggregate, with a reconstruction and rendering module, using surfels, the depth information for each of the plurality of successive frames of video data to generate a 3-dimensional (3D) model of the location; render, with the reconstruction and rendering module, the 3D model in real time for display on the hand-held computing device; (“the 2D image data 102 can include video data 902 comprising sequential frames of video captured in association with movement of the video camera. 
Related 2D images can also include frames of video captured by a video camera with a fixed position/orientation yet captured at different points in time as one or more characteristics of the environment change at the different points in time.” [0142] “Systems 100, 500, 800, and 1300 discussed above respectively depict an architecture wherein 2D image data, and optionally auxiliary data associated with the 2D image data, is received and processed by a universal computing device (e.g., computing device 104) to generate derived depth data for the 2D images, generate 3D reconstructed models and/or facilitate navigation of the 3D reconstructed models. For example, the universal computing device can be or correspond to a server device, a client device, a virtual machine, a cloud computing device, etc. Systems 100, 500, 800, and 1300 further include a user device 130 configured to receive and display the reconstructed models,” [0172] “The user device 1402 can also include 3D model generation component 118 to generate reconstructed 3D models based on the 3D data and the 2D image data, and a display/rendering to facilitate presenting the 3D reconstructed models at the user device 1402 (e.g., via a device display).” [0179] “the user device 3002 can include an AR headset configured to be worn by a user and including a display (e.g., a transparent glass display) that is position in front of the user's eyes (e.g., glasses, goggles, an HUD, etc.). In another, embodiment the user device can be or include a mobile handheld device, such as a mobile phone or smartphone, a tablet PC, or a similar device. Still in other embodiments, the user device 3002 can include a device that can be positioned in a relatively fixed position relative to an environment, such as a laptop PC, a desktop PC or the like.” [0216] “The AR component 3004 can employ 3D/depth data derived by the 3D-from-2D processing module 1406 from live 2D image data (e.g., snapshots or video frames) of an object or environment captured via the one or more cameras 1404 to facilitate various AR applications. In particular, the AR component 3004 can employ the 3D-from-2D techniques described herein to facilitate enhancing various AR applications with more accurate and photorealist integration of AR data objects as overlays onto a live view of an environment. In this regard, in accordance with various embodiments, the one or more cameras 1404 can capture live image data of an environment that corresponds to a current perspective of the environment view on or through a display of the user device 3002. The 3D-from-2D processing module 1406 can further derive depth data from the image data in real-time or substantially real-time.” [0220] “a mesh comprising a series of triangles, lines, curved surfaces (e.g. non-uniform rational basis splines (NURBS)), quads, n-grams, or other geometric shapes can connect the collection of points. For example, a 3D model of an interior environment of building can comprise mesh data (e.g., a triangle mesh, a quad mesh, a parametric mesh, etc.), one or more texture-mapped meshes (e.g., one or more texture-mapped polygonal meshes, etc.), a point cloud, a set of point clouds, surfels and/or other data constructed by employing one or more 3D sensors. In some implementations, portions of the 3D model geometric data (e.g., the mesh) can include image data describing texture, color, intensity, and the like.” [0247])
[Image: media_image1.png]
Gausebeck does not explicitly teach generate, based on the 3D model, a virtual representation of the location by annotating the 3D model with spatially localized data associated with the location. This is what Speasl teaches (“The UGV 180 drives a path 185 over the surface 150 around the structure 120, may test soil at the surface 150 and underground 155 at various points along the path 185 while outside the structure 120, and enters the interior 135 of the structure 120. Once the unmanned vehicles 105 and 180 are in the interior 135 the structure 120, they may map or model a virtual layout of the interior 135 as discussed further with respect to FIG. 2A and FIG. 2B. Digital media data gathered by the sensors of the UAV 105, the sensors of the UGV 180, and optionally other sensors may be combined, for example using a space mapping algorithm, to generate a two-dimensional or three-dimensional layout or model 190 of the property 110 and the structure 120 within it as illustrated in and discussed further with respect to FIG. 2B.” [0028-0029] “The generated layout or model 190 may include various “references” or “links” or “hyperlinks” or “pointers” at specific locations within the layout 190 that allow a user viewing the layout 190 to view the original media data captured at the corresponding location within the actual property … a first reference 160 is a reference image 160 identifying damage to the roof 140. The UAV 105 or UGV 180, or a server or other computer system 1300 that the UAV 105 or UGV 180 sends its media data to upon capture, may automatically identify irregularities in the property such as damage, and automatically mark those areas with reference images such as the reference image 160. 
Capture data associated with the reference image 160 shows it was captured at latitude/longitude coordinates (37.79, −122.39), that the capture device was facing north-east at the time of capture (more precise heading angle data may be used instead), that the capture device was at an altitude of 20 meters when this image 160 was captured, and that the inclination of the capture device was −16 degrees at capture.” [0034-0035]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Speasl into Gausebeck, by annotating Gausebeck's 3D model with references at specific locations within the model, in order to allow a user viewing the model to access the data captured at a specific location within the property and thereby improve analysis of the property.
8. With reference to claim 2, Gausebeck teaches the 3D model comprises components of the location, the components of the location comprising one or more rooms, a layout, walls, doors, windows, ceilings, openings, and/or floors. (“in association with generating a 3D model of an environment, the 3D model generation component 118 can determine positions of objects, barriers, flat planes, and the like. For example, based on aligned 3D data derived for respective images captured of the environment, the 3D model generation component 116 can identify barriers, walls, objects (e.g., countertops, furniture, etc.), or other 3D features included in the aligned 3D data” [0089] “a 3D model generated by the 3D model generation component 118 can be associated with tags at various defined locations relative to the 3D model that identify objects (e.g., appliances, furniture, walls, buildings, etc.), provide information about the objects, provide hyperlinks to applications associated with the objects, and the like.” [0227])
9. With reference to claim 3, Gausebeck teaches the 3D model comprises contents of the location, the contents of the location comprising furniture, wall hangings, personal items, and/or appliances. (“The navigation component 126 can provide various navigation tools that allow a user to provide input that facilitates viewing different parts or perspectives of the 3D model and interacting with the different parts of the 3D model. These navigation tools can include but are not limited to: selecting a location on the representation of the 3D model for viewing (e.g., which can include a point, an area, an object, a room, a surface, etc.), selecting a location on the representation of the 3D model for positioning a virtual camera (e.g., including a waypoint), … in association with generating a 3D model of an environment, the 3D model generation component 118 can determine positions of objects, barriers, flat planes, and the like. For example, based on aligned 3D data derived for respective images captured of the environment, the 3D model generation component 116 can identify barriers, walls, objects (e.g., countertops, furniture, etc.), or other 3D features included in the aligned 3D data” [0088-0089] “a 3D model generated by the 3D model generation component 118 can be associated with tags at various defined locations relative to the 3D model that identify objects (e.g., appliances, furniture, walls, buildings, etc.), provide information about the objects, provide hyperlinks to applications associated with the objects, and the like.” [0227])
10. With reference to claim 4, Gausebeck teaches the spatially localized data comprises dimensional information associated with components and/or contents of the location; color information associated with the components and/or contents of the location; geometric properties of the components and/or contents of the location; a condition of the components and/or contents of the location; audio, visual, or natural language notes; and/or metadata associated with the components and/or contents of the location. (“the derived 3D data can include depth information for each and every pixel of a single 2D image, depth information for subsets or groups of pixels (e.g., superpixels), depth information for only one or more portions of a 2D image, and the like. In some implementations, the 2D images can also be associated with additional known or derived spatial information that can be used facilitate aligning the 2D image data to one another in the 3D coordinate space, including but not limited to, the relative capture position and the relative capture orientation of the respective 2D images relative to the 3D coordinate space. …The alignment process can involve iteratively aligning different point clouds from neighboring and overlapping images captured from different positions and orientations relative to an object or environment to generate a global alignment between the respective point clouds using correspondences in derived position information for the respective points. Visual feature information including correspondences in color data, texture data, luminosity data, etc. for respective points or pixels included in the point clouds can also be used (along with other sensor data if available) to generate the aligned data.” [0074-0075] “The 3D model generation component 118 can further remove objects photographed (e.g., walls, furniture, fixtures, etc.) 
from the 3D model, integrate new 2D and 3D graphical objects on or within the 3D model in spatially aligned positions relative to the 3D model, change the appearance of visual features of the 3D model (e.g., color, texture, etc.), and the like. …A floorplan model can contain locations of boundary edges for each given surface, portal (e.g., door opening), and/or window opening. A floorplan model can also include one or more objects. Alternatively, a floorplan can be generated without objects (e.g., objects can be omitted from a floorplan). In some implementations, a floorplan model can include one or more dimensions associated with surfaces (e.g., walls, floors, ceilings, etc.), portals (e.g., door openings) and/or window openings.” [0077-0078] “the received 2D image data 102 can include 2D images captured by a 2D/3D capture device or 2D/3D capture device assembly that includes one or more 3D sensors in addition to one or more 2D cameras (e.g., RGB cameras). In various implementations, the 2D/3D capture device can be configured to capture a 2D image using the one or more cameras (or one or more camera lenses) and associated depth data for the 2D image using the one or more 3D sensors simultaneously (e.g., at or near the same time), or in a manner in which they can be correlated after capture if not simultaneously. The level of sophistication (e.g., complexity, hardware cost, etc.) of the such a 2D/3D capture device/assembly, can vary. For example, in some implementations, to reduce cost, the 2D/3D capture device can include one or more cameras (or one or more camera lenses) and limited range/field-of-view 3D sensor configured to capture partial 3D data for a 2D image. One version of such a 2D/3D capture device can include a 2D/3D capture device that produces spherical color images plus depth data. … it should be appreciated that 2D image data 102 can be received with native auxiliary data 802 associated therewith as single data object/file, as metadata, or the like. 
For example, the 2D image data 102 can include 2D images with 3D sensor depth data captured for the 2D images associated therewith, metadata describing the camera/image parameters, and the like.” [0135-0136])
11. With reference to claim 5, Gausebeck teaches the depth estimation module is configured to determine the depth information for each of the plurality of successive frames of the video data at a rate sufficient for real time virtual representation generation. (“a method for using panoramic image data to generate accurate depth predictions using 3D-from-2D is provided that can comprise receiving, by a system operatively coupled to a processor, a request for depth data associated with a region of an environment depicted in a panoramic image. The method can further comprise, based on the receiving, deriving, by the system, depth data for an entirety of the panoramic image using a neural network model configured to derive depth data from a single 2D image. The method can further comprise extracting, by the system, a portion of the depth data corresponding to the region of the environment, and providing, by the system, the portion of the depth data to an entity associated with the request.” [0034] “in some implementations in which the user is facilitating or controlling capture of the 2D image data 102 used to create a 3D model, system 100 can facilitate providing the user with real-time or live feedback over the course of the capture process regarding the progression of the 3D model generated based on the captured and aligned 2D image data (and derived 3D data).” [0082] “in some implementations, the 2D image data 102 can include video data 902 comprising sequential frames of video captured in association with movement of the video camera. Related 2D images can also include frames of video captured by a video camera with a fixed position/orientation yet captured at different points in time as one or more characteristics of the environment change at the different points in time. 
… SLAM techniques employ algorithms that are configured to algorithms simultaneously localize (e.g., determine the position and orientation of) a capture device (e.g., a 2D image capture device or 3D capture device) with respect to its surroundings, while at the same time mapping the structure of that environment. SLAM algorithms can involve tracking sets of points through a sequence of images using these tracks to triangulate the 3D positions of the points, while simultaneously using the point locations to determine the relative position/orientation of the capture device that captured them. In this regard, in addition to determining the position/orientation of the capture device, SLAM algorithms can also be used to estimate depth information for features included in one or more images of the sequence of images.” [0141-0142] “The depth estimation component 916 can also evaluate related images to estimate depth data for one or more of the related images. For example, in some embodiments, the depth estimation component 916 can employ SLAM to estimate depth data based on a sequence of images. The depth estimation component 916 can also employ related photogrammetry techniques to determine depth information for a 2D images based on one or more related images.” [0144])
12. With reference to claim 14, Gausebeck teaches the video data is captured by a mobile computing device associated with a user and transmitted to the one or more hardware processors without user interaction. (“the 2D image data 102 can include video data 902 comprising sequential frames of video captured in association with movement of the video camera. Related 2D images can also include frames of video captured by a video camera with a fixed position/orientation yet captured at different points in time as one or more characteristics of the environment change at the different points in time.” [0141] “the reception/communication component 1604 can be or include various hardware and software devices associated with establishing and/or conducting wireless communication between the user device 1602 and an external device. For example, reception/communication component 1604 can control operation of a transmitter-receiver or transceiver (not shown) of the user device to receive information from the capture device 1601 (e.g., 2D image data), provide information to the capture device 1601, and the like.” [0186] “the user device 3002 can include an AR headset configured to be worn by a user and including a display (e.g., a transparent glass display) that is position in front of the user's eyes (e.g., glasses, goggles, an HUD, etc.). In another, embodiment the user device can be or include a mobile handheld device, such as a mobile phone or smartphone, a tablet PC, or a similar device. Still in other embodiments, the user device 3002 can include a device that can be positioned in a relatively fixed position relative to an environment, such as a laptop PC, a desktop PC or the like.” [0216])
13. With reference to claim 15, Gausebeck teaches receiving the video data of the location comprises receiving a real time video stream of the location. (“the reception component 108 can receive 2D image data in real-time as it is captured, (or within substantially real-time as it is captured such that it is received within a manner of seconds of capture) to facilitate real-time processing applications associated with real-time derivation of 3D data from the 2D image data, including real-time generation and rendering of 3D models based on the 2D image data, live object tracking, live relative position estimation, live AR applications, and the like.” [0064] “the 2D image data 102 can include video data 902 comprising sequential frames of video captured in association with movement of the video camera. Related 2D images can also include frames of video captured by a video camera with a fixed position/orientation yet captured at different points in time as one or more characteristics of the environment change at the different points in time. … the orientation estimation component 912 and/or the position estimation component 914 can employ capture device motion data 904, capture device location data 906, and/or 3D sensor data 910 in association with evaluating a sequence of images using visual odometry and/or SLAM to determine the capture position/orientation of a 2D image. SLAM techniques employ algorithms that are configured to algorithms simultaneously localize (e.g., determine the position and orientation of) a capture device (e.g., a 2D image capture device or 3D capture device) with respect to its surroundings, while at the same time mapping the structure of that environment. SLAM algorithms can involve tracking sets of points through a sequence of images using these tracks to triangulate the 3D positions of the points, while simultaneously using the point locations to determine the relative position/orientation of the capture device that captured them. 
In this regard, in addition to determining the position/orientation of the capture device, SLAM algorithms can also be used to estimate depth information for features included in one or more images of the sequence of images.” [0141-0142])
14. With reference to claim 16, Gausebeck teaches generating the virtual representation comprises generating or updating the 3D model based on the real time video stream of the location. (“the 3D models can include immersive virtual reality VR environments that can be navigated as facilitated by the navigation component 126. In the embodiment shown, the reconstructed representations/3D models and associated alignment data generated by the 3D model generation component 118 is identified as 3D model and alignment data 128. System 100 can further include a suitable user device 130 comprising a display 132 that can receive and render the reconstructed/3D models generated by the 3D model generation component 118. For example, the user device 130 can include but is not limited to: a desktop computer, a laptop computer, a mobile phone, a smartphone, a tablet personal computer (PC), a personal digital assistant (PDA), a heads-up display (HUD), a virtual reality (VR) headset, augmented reality (AR) headset or device, a standalone digital camera, or another type of wearable computing device.” [0062] “as new images are captured, they can be provided to the computing device 104, and 3D data can be derived for the respective images and used to align them to generate a 3D model of the environment. The 3D model can further be rendered at the user device 130 and updated in real-time based on new image data as it is received over the course of capture of the 2D image data. With these embodiments, system 100 can thus provide visual feedback during the capture process regarding the 2D image data that has been captured and aligned based on derived 3D data for the 2D image data, as well as the quality of the alignment and the resulting 3D model generated therefrom.” [0082] “the 2D image data 102 can include video data 902 comprising sequential frames of video captured in association with movement of the video camera. 
Related 2D images can also include frames of video captured by a video camera with a fixed position/orientation yet captured at different points in time as one or more characteristics of the environment change at the different points in time. … the orientation estimation component 912 and/or the position estimation component 914 can employ capture device motion data 904, capture device location data 906, and/or 3D sensor data 910 in association with evaluating a sequence of images using visual odometry and/or SLAM to determine the capture position/orientation of a 2D image. SLAM techniques employ algorithms that are configured to algorithms simultaneously localize (e.g., determine the position and orientation of) a capture device (e.g., a 2D image capture device or 3D capture device) with respect to its surroundings, while at the same time mapping the structure of that environment. SLAM algorithms can involve tracking sets of points through a sequence of images using these tracks to triangulate the 3D positions of the points, while simultaneously using the point locations to determine the relative position/orientation of the capture device that captured them. In this regard, in addition to determining the position/orientation of the capture device, SLAM algorithms can also be used to estimate depth information for features included in one or more images of the sequence of images.” [0141-0142])
15. With reference to claim 17, Gausebeck teaches the video data of the location is received room by room and used to reconstruct multiple rooms in a common coordinate space, and wherein the 3D model and the virtual representation of the location include all rooms on at least one floor of the location. (“FIG. 2 provides a visualization of an example 3D model 200 of a living room in association with generation of the 3D model by the 3D model generation component 118. In this regard, the 3D model 200 as depicted is currently under construction and includes missing image data. … FIG. 3 provides a visualization of an example 3D floorplan model 300 that can be generated by the 3D model generation component 118 based on image data captured of the environment. For example, in one implementation, 2D image data of the portion of the house depicted in the 3D floorplan model was captured by a camera held and operated by a user as the user walked from room to room and took pictures of the house from different perspectives within the rooms (e.g., while standing on the floor). Based on the captured image data, the 3D model generation component 118 can use depth data derived from the respective images to generate the 3D floorplan model 300 which provides an entirely new (not included in the 2D image data), reconstructed top-town perspective of the environment. FIG. 4 provides a visualization of an example 3D dollhouse view representation 400 of a model that can be generated by the 3D model generation component 118 based on image data captured of the environment. For example, in a same manner as that described above with respect to FIG. 3, in one implementation, 2D image data of the portion of the house depicted in the dollhouse view of the 3D could have been captured by a camera held and operated by a user as the user walked from room to room and took pictures of the house from different perspectives within the rooms (e.g., while standing on the floor). 
Based on the captured image data, the 3D model generation component 118 can use depth data derived from the respective images to generate a 3D model (e.g., a mesh) of the environment, by aligning the respective images to one another relative to a common 3D coordinate space using depth data respectively derived for the images.” [0084-0086] “the 2D image data 102 can include video data 902 comprising sequential frames of video captured in association with movement of the video camera. Related 2D images can also include frames of video captured by a video camera with a fixed position/orientation yet captured at different points in time as one or more characteristics of the environment change at the different points in time. … the orientation estimation component 912 and/or the position estimation component 914 can employ capture device motion data 904, capture device location data 906, and/or 3D sensor data 910 in association with evaluating a sequence of images using visual odometry and/or SLAM to determine the capture position/orientation of a 2D image. SLAM techniques employ algorithms that are configured to algorithms simultaneously localize (e.g., determine the position and orientation of) a capture device (e.g., a 2D image capture device or 3D capture device) with respect to its surroundings, while at the same time mapping the structure of that environment. SLAM algorithms can involve tracking sets of points through a sequence of images using these tracks to triangulate the 3D positions of the points, while simultaneously using the point locations to determine the relative position/orientation of the capture device that captured them. In this regard, in addition to determining the position/orientation of the capture device, SLAM algorithms can also be used to estimate depth information for features included in one or more images of the sequence of images.” [0141-0142])
16. With reference to claim 18, Gausebeck teaches instructions to determine whether a user has exited a room. (“FIG. 3 provides a visualization of an example 3D floorplan model 300 that can be generated by the 3D model generation component 118 based on image data captured of the environment. For example, in one implementation, 2D image data of the portion of the house depicted in the 3D floorplan model was captured by a camera held and operated by a user as the user walked from room to room and took pictures of the house from different perspectives within the rooms (e.g., while standing on the floor). Based on the captured image data, the 3D model generation component 118 can use depth data derived from the respective images to generate the 3D floorplan model 300 which provides an entirely new (not included in the 2D image data), reconstructed top-town perspective of the environment.” [0085] “Walking mode can refer to a mode for navigating and viewing a 3D model from viewpoints within the 3D model. The viewpoints can be based on a camera position, a point within a 3D model, a camera orientation, and the like. In an aspect, the walking mode can provide views of a 3D model that simulate a user walking through or otherwise traveling through the 3D model (e.g., a real-world scene). The user can rotate and move freely to view the scene from different angles, vantage points, heights, or perspectives.” [0090] “using a standalone digital camera, a smartphone, or similar device with a camera, a user can walk around an environment and take 2D images at several points nearby along the way, capturing different perspectives of the environment.” [0141])
Allowable Subject Matter
17. Claims 6-13, 19 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance:
Regarding claim 6, the prior art of record fails to either individually or in combination teach the claimed feature of “a cost volume is constructed using a set of reference keyframes that are selected based on a relative pose metric to select useful nearby images in the plurality of successive frames of the video data;” and “the cost volume is determined using a parallel algorithm; an input image and the cost volume are passed through a CNN to produce dense metric depth; and the CNN uses an efficiently parameterized backbone for real time inference.”
Regarding claim 12, the prior art of record fails to either individually or in combination teach the claimed feature of “for each surfel, the reconstruction and rendering module is configured to generate a canonical triangle, and use associated data to place a surfel in three dimensional space.”
Regarding claim 19, the prior art of record fails to either individually or in combination teach the claimed feature of “the bounding rectangle is divided into square bins and surfel normals that fall into each bin are accumulated, wherein an average normal vector for each bin is determined and bin dimensions are an implementation dependent variable, per-bin segments are determined for bins that accumulated more than a threshold number of surfels, where the threshold number of surfels is an implementation dependent variable,”
Claims 7-11 are also objected to for depending from claim 6.
Claim 13 is also objected to for depending from claim 12.
Claim 20 is also objected to for depending from claim 19.
Conclusion
18. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michelle Chin whose telephone number is (571)270-3697. The examiner can normally be reached on Monday-Friday 8:00 AM-4:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kent Chang can be reached on (571)272-7667. The fax phone number for the organization where this application or proceeding is assigned is (571)273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHELLE CHIN/
Primary Examiner, Art Unit 2614