DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement filed 7/8/2024 fails to comply with the provisions of 37 CFR 1.97, 1.98 and MPEP § 609 because Item 2 under the Non-Patent Literature Documents lacks a date of publication. It has been placed in the application file, but the information referred to therein has not been considered as to the merits. Applicant is advised that the date of any re-submission of any item of information contained in this information disclosure statement or the submission of any missing element(s) will be the date of submission for purposes of determining compliance with the requirements based on the time of filing the statement, including all certification requirements for statements under 37 CFR 1.97(e). See MPEP § 609.05(a).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino).
Re Claim 1:
Zhu teaches a method comprising:
at a device having a processor and a camera (the device of FIG. 8 in relation to the image capture positions of FIG. 5):
prior to a movement of an object in a physical environment: acquiring a first set of one or more images of the object in the physical environment via the camera on the device;
identifying three-dimensional (3D) keypoints on one or more surfaces of the object in the first set of one or more images; and (
Zhu teaches at FIG. 9 and Paragraph 0051 that in referring back to FIG. 9, the cup depicted in the images 902, 904, 906 (surfaces) includes stickers comprising various graphics or images, which were placed on the cup so that the surface of the cup had distinct and/or identifiable features.
Zhu teaches at FIG. 9 and Paragraph [0044] that, referring back to FIG. 9, suppose that the electronic device 830 is used to capture images of a cup similar to the cup illustrated in FIG. 9, except that the cup imaged by the electronic device is not resting on a pedestal. The images of the cup captured by the electronic device 830 may be analyzed to identify features of the cup, such as the textures described above. The identified features may then be transformed into feature points and the model may be used to determine the position of the camera relative to the object by comparing the feature points identified from the images to feature points defined within the model. For example, if the feature points determined by the electronic device correspond to the feature points identified for image 902 during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the camera position shown at image 902 of FIG. 9. If, however, the feature points determined by the electronic device correspond to the feature points identified for image 904 (e.g., feature points corresponding to the second texture and to the first texture are present, but not the third texture) during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the view shown at image 904 of FIG. 9, and so on.
Zhu teaches at Paragraph 0063-0064 analyzing each of the plurality of images to identify features of the 3D object within each image. The features may comprise: lines, edges, shapes, patterns, colors, textures, edge features such as between the object and the background, corner features, blob features….identification of the features of the 3D object, e.g., enables the image analysis algorithm to distinguish between the 3D object on the pedestal and background information.
Zhu teaches at Paragraph 0034 capturing the plurality of images of the three-dimensional object from different angles may improve the capabilities of the model…the model may be utilized during tracking of the 3D object 200. During tracking of the three-dimensional object 200 using the model, an image of stream of images depicting the three-dimensional object 200 may be analyzed to identify features of the three-dimensional object 220. The model may be utilized to identify information corresponding to the identified features and then provide information associated with an orientation of the camera based on the model such as based on the coordinate system 400 and/or based on other information included in the model. Thus, acquiring images from different angles during construction of the model may enable the features of the three-dimensional model 200 to be identified more easily because there are more angles in which the features of the three-dimensional object can be identified);
identifying a subset of data corresponding to the object, wherein the data is based on the sensor data and identifying the subset of the data comprises distinguishing the subset of the data associated with three-dimensional (3D) key-points corresponding to the object from other data of the data corresponding to the background (
Zhu teaches at Paragraph 0063-0064 analyzing each of the plurality of images to identify features of the 3D object within each image. The features may comprise: lines, edges, shapes, patterns, colors, textures, edge features such as between the object and the background, corner features, blob features….identification of the features of the 3D object, e.g., enables the image analysis algorithm to distinguish between the 3D object on the pedestal and background information.
Zhu teaches at Paragraph 0073-0073 when the particular set of strong feature points is identified within the image, the 3D object may be identified as present within the image even when the another set of weak feature points are not identified in the image…models constructed according to embodiments are constructed with a relatively sparse points-based model of the 3D object with only the identified distinct feature points, whereas other 3D models comprise dense models that include large point clouds that include points corresponding to non-distinct and non-distinguishing features of the 3D object. This enables methods for tracking of cameras position/orientation and identification of 3D object to be performed faster since there are less points in the model to compare to points identified in real-time images received from a camera.
Zhu teaches at Paragraph 0052 that by including only information associated with distinct features of the three dimensional object, and excluding information that does not facilitate identification of the three dimensional object, such as information associated with smooth and texture-less portions of the object’s body, models constructed contain less points than three dimensional models generated using three dimensional scanners and/or three dimensional modelling software and result in models that are smaller in size);
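As an aside for illustration only, the kind of object/background key-point separation described in the cited portions of Zhu may be sketched as follows; this sketch is not taken from Zhu or Kaino, and the ORB detector, the binary object mask input, and all function names are assumptions chosen purely for illustration:

import cv2

def split_object_keypoints(image, object_mask):
    # Detect ORB keypoints, then keep only those whose pixel location falls
    # inside the binary object mask (nonzero = object, zero = background).
    orb = cv2.ORB_create(nfeatures=2000)
    keypoints = orb.detect(image, None)
    object_kps, background_kps = [], []
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if object_mask[y, x] > 0:
            object_kps.append(kp)
        else:
            background_kps.append(kp)
    return object_kps, background_kps

A mask of this kind could come, for example, from segmentation or from known pedestal geometry; only key-points inside the mask would be retained for the object model.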
tracking positions of the device in an object-based coordinate system during acquisition of the first set of images based on identifying the 3D keypoints in the first set of one or more images, wherein, during acquisition of the first set of one or more images, the object-based coordinate system has a first positional relationship with respect to a coordinate system of the physical environment (
Zhu teaches at Paragraph 0073-0073 when the particular set of strong feature points is identified within the image, the 3D object may be identified as present within the image even when the another set of weak feature points are not identified in the image…models constructed according to embodiments are constructed with a relatively sparse points-based model of the 3D object with only the identified distinct feature points, whereas other 3D models comprise dense models that include large point clouds that include points corresponding to non-distinct and non-distinguishing features of the 3D object. This enables methods for tracking of cameras position/orientation and identification of 3D object to be performed faster since there are less points in the model to compare to points identified in real-time images received from a camera.
Zhu teaches that the coordinate system 400 is defined based on a position and orientation of the pedestal 100 wherein the position and orientation of the pedestal is the same as the position and orientation of the object as the object is placed on the pedestal. Zhu teaches at Paragraph 0029-0030 that the coordinate system 400 may be defined relative to the pedestal 100….assume that a front portion of the 3D object faces in the direction of the positive Y-axis…the target orientation of the camera indicates the image of the environment or 3D object should be captured from the left side of the 3D object, e.g., the side of the object along the negative X-axis….the model may be used to determine that, in order to properly orient the camera to view the 3D object from the left side, the camera needs to be moved in a negative direction along both the X-axis and the Y-axis while maintaining the camera pointed towards the 3D object. Zhu teaches at Paragraph 0055 that the method 1100 also includes defining by the processor a coordinate system with respect to the pedestal upon which the 3D object is placed…the coordinate system may be defined based at least in part on one or more markers placed on the pedestal…when defining the coordinate system with respect to the pedestal upon which the object is placed further comprises, the method 1100 may assign a point of origin for the coordinate system…the point of origin may be defined to be located at a center of the top surface of the pedestal. Defining the coordinate system may further include orienting the coordinate system with respect to the pedestal.
Zhu teaches at Paragraph 0005-0006 the position of a camera may be determined by first matching feature points identified in an image of the three dimensional object to feature points of the model and then mapping the feature points to the corresponding camera position determined during construction of the model…..the position of the camera relative to the object may be determined based on the model by matching the features determined from the image to features included in the model and then mapping the features to a camera position based on the model and at Paragraph 0029 that to provide the position information, a coordinate system may be defined relative to the pedestal 100 and at Paragraph 0034 capturing the plurality of images of the three-dimensional object from different angles may improve the capabilities of the model…the model may be utilized during tracking of the 3D object 200. During tracking of the three-dimensional object 200 using the model, an image of stream of images depicting the three-dimensional object 200 may be analyzed to identify features of the three dimensional object 220. The model may be utilized to identify information corresponding to the identified features and then provide information associated with an orientation of the camera based on the model such as based on the coordinate system 400 and/or based on other information included in the model. Thus, acquiring images from different angles during construction of the model may enable the features of the three-dimensional model 200 to be identified more easily because there are more angles in which the features of the three-dimensional object can be identified);
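For illustration only, the camera-pose determination that Zhu describes (matching image feature points to model feature points and mapping them to a camera position) resembles a perspective-n-point computation; a minimal sketch under that assumption follows, with all names and library calls being illustrative rather than drawn from the references:

import cv2
import numpy as np

def estimate_camera_pose(model_points_3d, image_points_2d, camera_matrix):
    # Solve the perspective-n-point problem for the camera pose relative to the
    # object model, given matched 3D model points and 2D image points
    # (at least four non-degenerate correspondences are required).
    dist_coeffs = np.zeros(4)  # assume an undistorted (or pre-rectified) image
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_points_3d, dtype=np.float64),
        np.asarray(image_points_2d, dtype=np.float64),
        camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("pose could not be estimated from the correspondences")
    rotation, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return rotation, tvec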
acquiring a second set of one or more images of the object in the physical environment via the camera; identifying the 3D keypoints on the one or more surfaces of the object in the second set of one or more images; and (
Zhu teaches at FIG. 9 and Paragraph 0051 that in referring back to FIG. 9, the cup depicted in the images 902, 904, 906 (surfaces) includes stickers comprising various graphics or images, which were placed on the cup so that the surface of the cup had distinct and/or identifiable features.
Zhu teaches at FIG. 9 and Paragraph [0044] that, referring back to FIG. 9, suppose that the electronic device 830 is used to capture images of a cup similar to the cup illustrated in FIG. 9, except that the cup imaged by the electronic device is not resting on a pedestal. The images of the cup captured by the electronic device 830 may be analyzed to identify features of the cup, such as the textures described above. The identified features may then be transformed into feature points and the model may be used to determine the position of the camera relative to the object by comparing the feature points identified from the images to feature points defined within the model. For example, if the feature points determined by the electronic device correspond to the feature points identified for image 902 during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the camera position shown at image 902 of FIG. 9. If, however, the feature points determined by the electronic device correspond to the feature points identified for image 904 (e.g., feature points corresponding to the second texture and to the first texture are present, but not the third texture) during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the view shown at image 904 of FIG. 9, and so on.
Zhu teaches at FIG. 9 determining the feature points relating to the markers 132, 142, 152, 162 of the pedestal outside of the 3D object/cup.
Zhu teaches at Paragraph 0005 matching feature points identified in an image of the 3D object to feature points of the model and at Paragraph 0048 and FIG. 10B that the AR application may analyze the image and determine that the first 3D object 1020 is present in the image by identifying features 1024 and 1026 of the first 3D object 1020….may recognize that the features 1024 correspond to the first 3D object 1020 by comparing the features to the models corresponding to each of the 3D object 1020, 1030 and 1040.
Zhu teaches at Paragraph 0063-0064 analyzing each of the plurality of images to identify features of the 3D object within each image. The features may comprise: lines, edges, shapes, patterns, colors, textures, edge features such as between the object and the background, corner features, blob features….identification of the features of the 3D object, e.g., enables the image analysis algorithm to distinguish between the 3D object on the pedestal and background information.
Zhu teaches at Paragraph 0052 that by including only information associated with distinct features of the three dimensional object, and excluding information that does not facilitate identification of the three dimensional object);
tracking positions of the device in an object-based coordinate system during acquisition of the second set of images based on identifying the 3D keypoints in the second set of one or more images, wherein, during acquisition of the second set of one or more images, the object-based coordinate system has a second positional relationship different than the first positional relationship (
Zhu teaches at FIG. 9 and Paragraph 0051 that in referring back to FIG. 9, the cup depicted in the images 902, 904, 906 (surfaces) includes stickers comprising various graphics or images, which were placed on the cup so that the surface of the cup had distinct and/or identifiable features.
Zhu teaches at FIG. 9 and Paragraph [0044] that, referring back to FIG. 9, suppose that the electronic device 830 is used to capture images of a cup similar to the cup illustrated in FIG. 9, except that the cup imaged by the electronic device is not resting on a pedestal. The images of the cup captured by the electronic device 830 may be analyzed to identify features of the cup, such as the textures described above. The identified features may then be transformed into feature points and the model may be used to determine the position of the camera relative to the object by comparing the feature points identified from the images to feature points defined within the model. For example, if the feature points determined by the electronic device correspond to the feature points identified for image 902 during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the camera position shown at image 902 of FIG. 9. If, however, the feature points determined by the electronic device correspond to the feature points identified for image 904 (e.g., feature points corresponding to the second texture and to the first texture are present, but not the third texture) during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the view shown at image 904 of FIG. 9, and so on.
Zhu teaches at Paragraph 0073-0073 when the particular set of strong feature points is identified within the image, the 3D object may be identified as present within the image even when the another set of weak feature points are not identified in the image…models constructed according to embodiments are constructed with a relatively sparse points-based model of the 3D object with only the identified distinct feature points, whereas other 3D models comprise dense models that include large point clouds that include points corresponding to non-distinct and non-distinguishing features of the 3D object. This enables methods for tracking of cameras position/orientation and identification of 3D object to be performed faster since there are less points in the model to compare to points identified in real-time images received from a camera.
Zhu teaches that the coordinate system 400 is defined based on a position and orientation of the pedestal 100 wherein the position and orientation of the pedestal is the same as the position and orientation of the object as the object is placed on the pedestal. Zhu teaches at Paragraph 0029-0030 that the coordinate system 400 may be defined relative to the pedestal 100….assume that a front portion of the 3D object faces in the direction of the positive Y-axis…the target orientation of the camera indicates the image of the environment or 3D object should be captured from the left side of the 3D object, e.g., the side of the object along the negative X-axis….the model may be used to determine that, in order to properly orient the camera to view the 3D object from the left side, the camera needs to be moved in a negative direction along both the X-axis and the Y-axis while maintaining the camera pointed towards the 3D object. Zhu teaches at Paragraph 0055 that the method 1100 also includes defining by the processor a coordinate system with respect to the pedestal upon which the 3D object is placed…the coordinate system may be defined based at least in part on one or more markers placed on the pedestal…when defining the coordinate system with respect to the pedestal upon which the object is placed further comprises, the method 1100 may assign a point of origin for the coordinate system…the point of origin may be defined to be located at a center of the top surface of the pedestal. Defining the coordinate system may further include orienting the coordinate system with respect to the pedestal.
Zhu teaches at Paragraph 0005-0006 the position of a camera may be determined by first matching feature points identified in an image of the three dimensional object to feature points of the model and then mapping the feature points to the corresponding camera position determined during construction of the model…..the position of the camera relative to the object may be determined based on the model by matching the features determined from the image to features included in the model and then mapping the features to a camera position based on the model and at Paragraph 0029 that to provide the position information, a coordinate system may be defined relative to the pedestal 100 and at Paragraph 0034 capturing the plurality of images of the three-dimensional object from different angles may improve the capabilities of the model…the model may be utilized during tracking of the 3D object 200. During tracking of the three-dimensional object 200 using the model, an image of stream of images depicting the three-dimensional object 200 may be analyzed to identify features of the three dimensional object 220. The model may be utilized to identify information corresponding to the identified features and then provide information associated with an orientation of the camera based on the model such as based on the coordinate system 400 and/or based on other information included in the model. Thus, acquiring images from different angles during construction of the model may enable the features of the three-dimensional model 200 to be identified more easily because there are more angles in which the features of the three-dimensional object can be identified);
subsequent to a movement of the object in the physical environment: acquiring a second set of one or more images of the object in the physical environment via the camera; identifying the 3D keypoints on the one or more surfaces of the object in the second set of one or more images; and tracking positions of the device in the object-based coordinate system during acquisition of the second set of images based on identifying the 3D keypoints in the second set of one or more images, wherein, during acquisition of the second set of one or more images, the object-based coordinate system has a second positional relationship with respect to the coordinate system of the physical environment, the second positional relationship different than the first positional relationship (
Applicant’s specification discloses at Paragraph 0020 that as the object moves within a field of view of the camera of the device, the guiding indicators move with respect to the object based on an adjusted coordinate system defined based on an adjusted position and an adjusted orientation of the object, wherein the adjusted position and the adjusted orientation of the object are based on the movement of the object.
Zhu teaches at FIG. 9 and Paragraph 0051 that in referring back to FIG. 9, the cup depicted in the images 902, 904, 906 (surfaces) includes stickers comprising various graphics or images, which were placed on the cup so that the surface of the cup had distinct and/or identifiable features.
Zhu teaches at FIG. 9 and Paragraph [0044] that, referring back to FIG. 9, suppose that the electronic device 830 is used to capture images of a cup similar to the cup illustrated in FIG. 9, except that the cup imaged by the electronic device is not resting on a pedestal. The images of the cup captured by the electronic device 830 may be analyzed to identify features of the cup, such as the textures described above. The identified features may then be transformed into feature points and the model may be used to determine the position of the camera relative to the object by comparing the feature points identified from the images to feature points defined within the model. For example, if the feature points determined by the electronic device correspond to the feature points identified for image 902 during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the camera position shown at image 902 of FIG. 9. If, however, the feature points determined by the electronic device correspond to the feature points identified for image 904 (e.g., feature points corresponding to the second texture and to the first texture are present, but not the third texture) during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the view shown at image 904 of FIG. 9, and so on.
Zhu teaches at Paragraph 0073-0073 when the particular set of strong feature points is identified within the image, the 3D object may be identified as present within the image even when the another set of weak feature points are not identified in the image…models constructed according to embodiments are constructed with a relatively sparse points-based model of the 3D object with only the identified distinct feature points, whereas other 3D models comprise dense models that include large point clouds that include points corresponding to non-distinct and non-distinguishing features of the 3D object. This enables methods for tracking of cameras position/orientation and identification of 3D object to be performed faster since there are less points in the model to compare to points identified in real-time images received from a camera.
Zhu teaches that the coordinate system 400 is defined based on a position and orientation of the pedestal 100 wherein the position and orientation of the pedestal is the same as the position and orientation of the object as the object is placed on the pedestal. Zhu teaches at Paragraph 0029-0030 that the coordinate system 400 may be defined relative to the pedestal 100….assume that a front portion of the 3D object faces in the direction of the positive Y-axis…the target orientation of the camera indicates the image of the environment or 3D object should be captured from the left side of the 3D object, e.g., the side of the object along the negative X-axis….the model may be used to determine that, in order to properly orient the camera to view the 3D object from the left side, the camera needs to be moved in a negative direction along both the X-axis and the Y-axis while maintaining the camera pointed towards the 3D object. Zhu teaches at Paragraph 0055 that the method 1100 also includes defining by the processor a coordinate system with respect to the pedestal upon which the 3D object is placed…the coordinate system may be defined based at least in part on one or more markers placed on the pedestal…when defining the coordinate system with respect to the pedestal upon which the object is placed further comprises, the method 1100 may assign a point of origin for the coordinate system…the point of origin may be defined to be located at a center of the top surface of the pedestal. Defining the coordinate system may further include orienting the coordinate system with respect to the pedestal.
Zhu teaches at Paragraph 0005-0006 the position of a camera may be determined by first matching feature points identified in an image of the three dimensional object to feature points of the model and then mapping the feature points to the corresponding camera position determined during construction of the model…..the position of the camera relative to the object may be determined based on the model by matching the features determined from the image to features included in the model and then mapping the features to a camera position based on the model and at Paragraph 0029 that to provide the position information, a coordinate system may be defined relative to the pedestal 100 and at Paragraph 0034 capturing the plurality of images of the three-dimensional object from different angles may improve the capabilities of the model…the model may be utilized during tracking of the 3D object 200. During tracking of the three-dimensional object 200 using the model, an image of stream of images depicting the three-dimensional object 200 may be analyzed to identify features of the three dimensional object 220. The model may be utilized to identify information corresponding to the identified features and then provide information associated with an orientation of the camera based on the model such as based on the coordinate system 400 and/or based on other information included in the model. Thus, acquiring images from different angles during construction of the model may enable the features of the three-dimensional model 200 to be identified more easily because there are more angles in which the features of the three-dimensional object can be identified
);
generating a three-dimensional (3D) model of the object based on the first set of one or more images and the tracked positions of the device during acquisition of the first set of one or more images; and the second set of one or more images and the tracked positions of the device during acquisition of the second set of one or more images (
Zhu teaches at Paragraph 0005-0006 the position of a camera may be determined by first matching feature points identified in an image of the three dimensional object to feature points of the model and then mapping the feature points to the corresponding camera position determined during construction of the model…..the position of the camera relative to the object may be determined based on the model by matching the features determined from the image to features included in the model and then mapping the features to a camera position based on the model and at Paragraph 0029 that to provide the position information, a coordinate system may be defined relative to the pedestal 100 and at Paragraph 0034 capturing the plurality of images of the three-dimensional object from different angles may improve the capabilities of the model…the model may be utilized during tracking of the 3D object 200. During tracking of the three dimensional object 200 using the model, an image of stream of images depicting the three dimensional object 200 may be analyzed to identify features of the three dimensional object 220. The model may be utilized to identify information corresponding to the identified features and then provide information associated with an orientation of the camera based on the model such as based on the coordinate system 400 and/or based on other information included in the model. Thus, acquiring images from different angles during construction of the model may enable the features of the three-dimensional model 200 to be identified more easily because there are more angles in which the features of the three dimensional object can be identified)
Zhu teaches at Paragraph 0044 that the identified features may then be transformed into feature points and at Paragraph 0051 that typically a 3D model comprises a very dense point cloud that contains hundreds of thousand points. In contrast, models constructed according to embodiments are only interested in certain points on the object’s body, namely, feature points, e.g., information comprising distinguishing features or aspects of the object’s body. The model of the cup depicted in FIG. 9 may comprise information associated with those identified features and/or feature points, e.g., edge features of the object, features or feature points corresponding to the graphics depicted in the stickers applied to the cup, but may not comprise information associated with the smooth and texture-less parts of the cup which do not provide useful information.
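For illustration only, retaining just the distinct (strong) feature points for a sparse model, rather than a dense point cloud, could look like the following sketch; the detector choice, the response-based ranking, and the point budget are assumptions for this example, not details from Zhu:

import cv2

def strongest_keypoints(image, max_points=500):
    # Detect a large candidate set, rank by detector response, and keep only
    # the strongest points as the sparse, distinguishing feature set.
    orb = cv2.ORB_create(nfeatures=5000)
    keypoints = orb.detect(image, None)
    keypoints = sorted(keypoints, key=lambda kp: kp.response, reverse=True)
    return keypoints[:max_points]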
Kaino implicitly teaches the claim limitation:
acquiring a second set of one or more images of the object in the physical environment via the camera; identifying the 3D keypoints on the one or more surfaces of the object in the second set of one or more images; and (
Kaino teaches at Paragraph 0102-0103 that the SLAM can simultaneously estimate a position and an attitude of a camera and a position of a feature point included in an image of the camera…by performing matching between the environmental map and a 3D position of a feature point belonging to the object and at Paragraph 0059 that generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view…the observation information can include an estimation result of a position and an attitude of an observation device….the 3D model includes at least coordinate information of a feature point and at Paragraph 0061 that a plurality of pieces of observation information are generally used after having been subjected to averaging….by averaging spatial positional shifts, a model becomes less sharp at an edge portion such as an object boundary and at Paragraph 0100-0102 that the user performs observation while directing the terminal apparatus 200 toward an unobserved region of a real space…the system 1 generates a 3D model by estimating a position and an attitude of the terminal apparatus 200 and a relationship between the terminal apparatus 200 and a surround real object…The information processing apparatus 100 accumulates the acquired 3D model into a 3D model DB and the accumulation is performed in a case where an unobserved region of a real space is newly observed by the terminal apparatus and at Paragraph 0062 by integrating respective observation information pieces obtainable in a case where observation devices are at positions 27A- to 27H, it is possible to model the flat surface 26 and at Paragraph 0132 that prompting the user to perform additional observation for making a generation 3D model more detailed….the system 1 may display information specifically instructing a position and an attitude of the terminal apparatus 200);
identifying the object in at least some of the images (
Kaino teaches at Paragraph 0151 the user preliminarily prepares a definite shaped model DB. Subsequently, the user causes a 3D model to be generated by observing a surrounding real space of the user and cause the 3D model to be updated by the allocation of a definite shaped model. At this time, complementing of an observed region by expansion of an allocated defined shape model such as extending a flat surface, granting of an Unknown flat, deletion of noise and the like are also performed and at Paragraph 0132 an unallocated portion can be typically generated due to insufficient observation or a complicated shape of a real object. The system 1 may display information specifically instructing a position and an attitude of the terminal apparatus 200.
Kaino teaches at Paragraph 0217 the definite shaped model includes…information indicating a grounding surface and at FIGS. 17-19 and Paragraph 0136 that generating the cuboid virtual object 344A….the cuboid virtual object 362A is converted into a grass virtual object 364A and at Paragraph 0143 that a data creator pre-registers in a definite shaped model DB definite shaped models such as a floor surface, a wall surface, a sofa, a chair, a desk as real objects….the data creator performs such adjustment that 3D models corresponding to walls, a sofa, a chair and a desk are left and 3D models corresponding to clothes, books and goods that had been placed on the sofa are deleted and at Paragraph 0151 that the application may set as a dangerous region, all definite shaped models other than a flat surface allocated to a floor surface);
tracking positions of the device in an object-based coordinate system during acquisition of the second set of images based on identifying the 3D keypoints in the second set of one or more images, wherein, during acquisition of the second set of one or more images, the object-based coordinate system has a second positional relationship different than the first positional relationship (
Kaino teaches at Paragraph 0059 generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view. The observation information can include an estimation result obtained by Pose Estimation, an estimation result of a position and an attitude of an observation device that is obtained by SLAM, depth information of each image obtained by a depth sensor. A 3D model to be output is represented by a point cloud including an aggregate of feature points, an aggregate of polygons including a plurality of feature points. The 3D model includes at least coordinate information of a feature point and at Paragraph 0100 the user performs observation, e.g., acquisition of a captured image and depth information while directing the terminal apparatus 200 toward an observed region of a real space and at Paragraph 0084 that the system 1 generates a 3D model 14 from the pet bottle 12 being a real object);
generating a three-dimensional (3D) model of the object based on the first set of one or more images and the tracked positions of the device during acquisition of the first set of one or more images; and the second set of one or more images and the tracked positions of the device during acquisition of the second set of one or more images (
Kaino teaches at Paragraph 0059 generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view. The observation information can include an estimation result obtained by Pose Estimation, an estimation result of a position and an attitude of an observation device that is obtained by SLAM, depth information of each image obtained by a depth sensor. A 3D model to be output is represented by a point cloud including an aggregate of feature points, an aggregate of polygons including a plurality of feature points. The 3D model includes at least coordinate information of a feature point and at Paragraph 0100 the user performs observation, e.g., acquisition of a captured image and depth information while directing the terminal apparatus 200 toward an observed region of a real space and at Paragraph 0084 that the system 1 generates a 3D model 14 from the pet bottle 12 being a real object).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Kaino’s teaching of excluding the 3D key-points that are within a predetermined distance of the depth edges of the model 182 associated with the object into Zhu’s removal of feature points from a very dense point cloud of the object and Zhu’s segmentation of the object from the background, so as to remove the feature points associated with the background and keep the distinguishing feature points used to determine the 3D model. One of ordinary skill in the art would have been motivated to provide key-point filtering that keeps the distinguishing feature points for generating the 3D object model.
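For illustration only, integrating feature points observed from several tracked device positions into a single model expressed in one common coordinate system, in the spirit of the Zhu and Kaino passages cited above, might be sketched as follows; the data layout and the pose convention are assumptions chosen for the example:

import numpy as np

def accumulate_model(observations):
    # observations: iterable of (R, t, points) where R is a 3x3 rotation, t a
    # 3-vector, and points an (N, 3) array of feature points in the camera frame
    # of that observation.  Returns all points expressed in the common frame.
    world_points = []
    for R, t, points in observations:
        # X_common = R @ X_camera + t, applied to every row of the point array.
        world_points.append(points @ R.T + t)
    return np.vstack(world_points)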
Re Claim 19:
Claim 19 is parallel to claim 1 in the form of an apparatus claim and is subject to the same rationale of rejection as claim 1.
Claim 19 recites a device comprising: a non-transitory computer-readable storage medium; and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations [of claim 1].
However, Zhu further teaches the claim limitation of a device comprising: a non-transitory computer-readable storage medium; and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations [of the claim 1] (
Zhu teaches at FIG. 8 and Paragraphs 0038-0040 that the model generation device 810 includes one or more processors, a memory 820…the memory 820 may store instructions 822 that, when executed by the one or more processors 812, cause the one or more processors to perform operations for generating models of 3D objects in accordance with embodiments).
Re Claim 20:
Claim 20 is parallel to claim 1 in the form of a computer product claim and is subject to the same rationale of rejection as claim 1.
Claim 20 further recites a non-transitory computer-readable storage medium storing computer-executable program instructions on a computer to perform operations [of the method of claim 1].
However, Zhu further teaches the claim limitation of a non-transitory computer-readable storage medium storing computer-executable program instructions on a computer to perform operations [of the method of claim 1] (Zhu teaches at FIG. 8 and Paragraphs 0038-0040 that the model generation device 810 includes one or more processors, a memory 820…the memory 820 may store instructions 822 that, when executed by the one or more processors 812, cause the one or more processors to perform operations for generating models of 3D objects in accordance with embodiments).
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino).
Re Claim 2:
Claim 2 encompasses the same scope of invention as claim 1 except for the additional claim limitation that acquiring the second set of one or more images of the object subsequent to the movement of the object comprises acquiring images from different perspectives of the object as the device is moved around the object.
Zhu teaches the claim limitation that acquiring the second set of one or more images of the object subsequent to the movement of the object comprises acquiring images from different perspectives of the object as the device is moved around the object (
Zhu teaches at FIG. 9 and Paragraph 0051 that in referring back to FIG. 9, the cup depicted in the images 902, 904, 906 (surfaces) includes stickers comprising various graphics or images, which were placed on the cup so that the surface of the cup had distinct and/or identifiable features.
Zhu teaches at FIG. 9 and Paragraph [0044] that, referring back to FIG. 9, suppose that the electronic device 830 is used to capture images of a cup similar to the cup illustrated in FIG. 9, except that the cup imaged by the electronic device is not resting on a pedestal. The images of the cup captured by the electronic device 830 may be analyzed to identify features of the cup, such as the textures described above. The identified features may then be transformed into feature points and the model may be used to determine the position of the camera relative to the object by comparing the feature points identified from the images to feature points defined within the model. For example, if the feature points determined by the electronic device correspond to the feature points identified for image 902 during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the camera position shown at image 902 of FIG. 9. If, however, the feature points determined by the electronic device correspond to the feature points identified for image 904 (e.g., feature points corresponding to the second texture and to the first texture are present, but not the third texture) during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the view shown at image 904 of FIG. 9, and so on).
Kaino further teaches the claim limitation that acquiring the second set of one or more images of the object subsequent to the movement of the object comprises acquiring images from different perspectives of the object as the device is moved around the object (
Kaino teaches at Paragraph 0102-0103 that the SLAM can simultaneously estimate a position and an attitude of a camera and a position of a feature point included in an image of the camera…by performing matching between the environmental map and a 3D position of a feature point belonging to the object and at Paragraph 0059 that generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view…the observation information can include an estimation result of a position and an attitude of an observation device….the 3D model includes at least coordinate information of a feature point and at Paragraph 0061 that a plurality of pieces of observation information are generally used after having been subjected to averaging….by averaging spatial positional shifts, a model becomes less sharp at an edge portion such as an object boundary and at Paragraph 0100-0102 that the user performs observation while directing the terminal apparatus 200 toward an unobserved region of a real space…the system 1 generates a 3D model by estimating a position and an attitude of the terminal apparatus 200 and a relationship between the terminal apparatus 200 and a surround real object…The information processing apparatus 100 accumulates the acquired 3D model into a 3D model DB and the accumulation is performed in a case where an unobserved region of a real space is newly observed by the terminal apparatus and at Paragraph 0062 by integrating respective observation information pieces obtainable in a case where observation devices are at positions 27A- to 27H, it is possible to model the flat surface 26 and at Paragraph 0132 that prompting the user to perform additional observation for making a generation 3D model more detailed….the system 1 may display information specifically instructing a position and an attitude of the terminal apparatus 200.
Kaino teaches at FIG. 12 and Paragraph 0118 that the key-point interpolation comprises exclusion of a subset of 3D key-points such as the feature points 183 from the definite shaped model 182 and the system 1 deletes the feature points 183 existing within a predetermined distance from the definite shaped model 182.
It is understood that when the feature points 183 in the 3D space are within a predetermined distance from the definite shaped model 182, in the form of a square, the feature points 183 are also within a proximity range of depth edges of the definite shaped model 182).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Kaino’s teaching of excluding the 3D key-points that are within a predetermined distance of the depth edges of the model 182 associated with the object into Zhu’s removal of feature points from a very dense point cloud of the object and Zhu’s segmentation of the object from the background, so as to remove the feature points associated with the background and keep the distinguishing feature points used to determine the 3D model. One of ordinary skill in the art would have been motivated to provide key-point filtering that keeps the distinguishing feature points for generating the 3D object model.
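For illustration only, Kaino’s deletion of feature points lying within a predetermined distance of an allocated model (FIG. 12, Paragraph 0118) can be sketched as a nearest-neighbor distance filter; the KD-tree implementation and the 2 cm threshold are assumptions chosen for the example, not details from the reference:

import numpy as np
from scipy.spatial import cKDTree

def remove_points_near_model(feature_points, model_points, threshold=0.02):
    # feature_points, model_points: (N, 3) and (M, 3) arrays in the same frame.
    # threshold: the "predetermined distance" (2 cm here, an arbitrary example).
    tree = cKDTree(model_points)
    distances, _ = tree.query(feature_points)  # nearest model point per feature
    return feature_points[distances > threshold]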
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino).
Re Claim 3:
Claim 3 encompasses the same scope of invention as claim 1 except for the additional claim limitation that the device comprises a user interface, and wherein the method further comprises, during the movement of the object, displaying the acquired second set of one or more images of the physical environment including the object within the user interface.
Zhu teaches the claim limitation that the device comprises a user interface, and wherein the method further comprises, during the movement of the object, displaying the acquired second set of one or more images of the physical environment including the object within the user interface (
Zhu teaches at FIG. 9 and Paragraph 0051 that in referring back to FIG. 9, the cup depicted in the images 902, 904, 906 (surfaces) includes stickers comprising various graphics or images, which were placed on the cup so that the surface of the cup had distinct and/or identifiable features.
Zhu teaches at FIG. 9 and Paragraph [0044] that, referring back to FIG. 9, suppose that the electronic device 830 is used to capture images of a cup similar to the cup illustrated in FIG. 9, except that the cup imaged by the electronic device is not resting on a pedestal. The images of the cup captured by the electronic device 830 may be analyzed to identify features of the cup, such as the textures described above. The identified features may then be transformed into feature points and the model may be used to determine the position of the camera relative to the object by comparing the feature points identified from the images to feature points defined within the model. For example, if the feature points determined by the electronic device correspond to the feature points identified for image 902 during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the camera position shown at image 902 of FIG. 9. If, however, the feature points determined by the electronic device correspond to the feature points identified for image 904 (e.g., feature points corresponding to the second texture and to the first texture are present, but not the third texture) during construction of the model, it may be determined, based on the model, that the camera is positioned in a manner similar to the view shown at image 904 of FIG. 9, and so on.
Zhu teaches at Paragraph 0039 that a user may capture images during the day and then the images may be processed overnight to generate the model and at Paragraph 0052 that when matching a live camera-fed image with a template image or information stored in a model constructed, the matching process will be faster.
Zhu teaches at Paragraph 0005-0006 the position of a camera may be determined by first matching feature points identified in an image of the three dimensional object to feature points of the model and then mapping the feature points to the corresponding camera position determined during construction of the model…..the position of the camera relative to the object may be determined based on the model by matching the features determined from the image to features included in the model and then mapping the features to a camera position based on the model and at Paragraph 0029 that to provide the position information, a coordinate system may be defined relative to the pedestal 100 and at Paragraph 0034 capturing the plurality of images of the three-dimensional object from different angles may improve the capabilities of the model…the model may be utilized during tracking of the 3D object 200. During tracking of the three-dimensional object 200 using the model, an image of stream of images depicting the three-dimensional object 200 may be analyzed to identify features of the three dimensional object 220. The model may be utilized to identify information corresponding to the identified features and then provide information associated with an orientation of the camera based on the model such as based on the coordinate system 400 and/or based on other information included in the model. Thus, acquiring images from different angles during construction of the model may enable the features of the three-dimensional model 200 to be identified more easily because there are more angles in which the features of the three-dimensional object can be identified).
Kaino further teaches the claim limitation that the device comprises a user interface, and wherein the method further comprises, during the movement of the object, displaying the acquired second set of one or more images of the physical environment including the object within the user interface (Kaino teaches at Paragraph 0135 that a UI 351 is a UI illustrating a live-view image of a real space…By such a UI, the user can recognize a region for which allocation is not sufficient and can perform additional observation and such a UI can be said to be a UI prompting the user to perform additional observation and at Paragraph 0138 that the system 1 acquires input information and generates a generation 3D model by sequentially generating and accumulating 3D models…and updates the generation 3D model and at Paragraph 0151 the user preliminarily prepares a definite shaped model DB. Subsequently, the user causes a 3D model to be generated by observing a surrounding real space of the user and cause the 3D model to be updated by the allocation of a definite shaped model. At this time, complementing of an observed region by expansion of an allocated defined shape model such as extending a flat surface, granting of an Unknown flat, deletion of noise and the like are also performed and at Paragraph 0132 an unallocated portion can be typically generated due to insufficient observation or a complicated shape of a real object. The system 1 may display information specifically instructing a position and an attitude of the terminal apparatus 200.
Kaino teaches at FIG. 12 and Paragraph 0118 that the key-point interpolation comprises exclusion of a subset of 3D key-points such as the feature points 183 from the definite shaped model 182 and the system 1 deletes the feature points 183 existing within a predetermined distance from the definite shaped model 182.
It is understood that when the feature points 183 in the 3D space are within a predetermined distance from the definite shaped model 182, in the form of a square, the feature points 183 are also within a proximity range of depth edges of the definite shaped model 182).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Kaino’s teaching of excluding the 3D key-points that are within a predetermined distance of the depth edges of the model 182 associated with the object into Zhu’s removal of feature points from a very dense point cloud of the object and Zhu’s segmentation of the object from the background, so as to remove the feature points associated with the background and keep the distinguishing feature points used to determine the 3D model. One of ordinary skill in the art would have been motivated to provide key-point filtering that keeps the distinguishing feature points for generating the 3D object model.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino).
Re Claim 4:
Claim 4 encompasses the same scope of invention as claim 1 except for the additional claim limitation that identifying the 3D keypoints on the one or more surfaces of the object in the first set of one or more images comprises: generating a preliminary object model based on depth information from the first set of one or more images in the physical environment.
Zhu implicitly teaches the claim limitation that identifying the 3D keypoints on the one or more surfaces of the object in the first set of one or more images comprises: generating a preliminary object model based on depth information from the first set of one or more images in the physical environment (
Zhu teaches at Paragraph 0055-0056 that the coordinate system 400 of FIG. 4 may be defined based at least in part on one or more markers present on the pedestal such as the markers 132, 142, 152 and 162 and at Paragraph 0056 that the coordinate system 400 serves as the coordinate system for model construction and the three dimensional coordinates of every marker corner relative to the coordinate system C can be determined by measuring the physical side lengths of the pedestal and the printed marker and the markers with known three dimensional structure in the reference coordinate system, may enable the camera pose to be determined with six degrees of freedom from the images captured of the three dimensional object.
Zhu teaches at Paragraph 0073-0073 when the particular set of strong feature points is identified within the image, the 3D object may be identified as present within the image even when the another set of weak feature points are not identified in the image…models constructed according to embodiments are constructed with a relatively sparse points-based model of the 3D object with only the identified distinct feature points, whereas other 3D models comprise dense models that include large point clouds that include points corresponding to non-distinct and non-distinguishing features of the 3D object. This enables methods for tracking of cameras position/orientation and identification of 3D object to be performed faster since there are less points in the model to compare to points identified in real-time images received from a camera.
Zhu teaches at Paragraph 0052 that by including only information associated with distinct features of the three dimensional object, and excluding information that does not facilitate identification of the three dimensional object, such as information associated with smooth and texture-less portions of the object’s body, models so constructed contain fewer points than three dimensional models generated using three dimensional scanners and/or three dimensional modelling software and result in models that are smaller in size.
Zhu teaches at Paragraphs 0005-0006 that the position of a camera may be determined by first matching feature points identified in an image of the three dimensional object to feature points of the model and then mapping the feature points to the corresponding camera position determined during construction of the model…the position of the camera relative to the object may be determined based on the model by matching the features determined from the image to features included in the model and then mapping the features to a camera position based on the model and at Paragraph 0029 that to provide the position information, a coordinate system may be defined relative to the pedestal 100 and at Paragraph 0034 that capturing the plurality of images of the three-dimensional object from different angles may improve the capabilities of the model…the model may be utilized during tracking of the 3D object 200. During tracking of the three-dimensional object 200 using the model, an image or stream of images depicting the three-dimensional object 200 may be analyzed to identify features of the three dimensional object 220. The model may be utilized to identify information corresponding to the identified features and then provide information associated with an orientation of the camera based on the model such as based on the coordinate system 400 and/or based on other information included in the model. Thus, acquiring images from different angles during construction of the model may enable the features of the three-dimensional model 200 to be identified more easily because there are more angles in which the features of the three-dimensional object can be identified).
Kaino further teaches the claim limitation that identifying the 3D keypoints on the one or more surfaces of the object in the first set of one or more image comprises: generating a preliminary object model based on depth information from the first set of one or more images in the physical environment (Kaino teaches at Paragraph 0102-0103 that the SLAM can simultaneously estimate a position and an attitude of a camera and a position of a feature point included in an image of the camera…by performing matching between the environmental map and a 3D position of a feature point belonging to the object and at Paragraph 0059 that generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view…the observation information can include an estimation result of a position and an attitude of an observation device….the 3D model includes at least coordinate information of a feature point.
Kaino teaches at Paragraph 0151 the user preliminarily prepares a definite shaped model DB. Subsequently, the user causes a 3D model to be generated by observing a surrounding real space of the user and cause the 3D model to be updated by the allocation of a definite shaped model. At this time, complementing of an observed region by expansion of an allocated defined shape model such as extending a flat surface, granting of an Unknown flat, deletion of noise and the like are also performed and at Paragraph 0132 an unallocated portion can be typically generated due to insufficient observation or a complicated shape of a real object. The system 1 may display information specifically instructing a position and an attitude of the terminal apparatus 200.
Kaino teaches at FIG. 12 and Paragraph 0118 that the key-point interpolation comprises exclusion of a subset of 3D key-points, such as the feature points 183, from the definite shaped model 182, and that the system 1 deletes the feature points 183 existing within a predetermined distance from the definite shaped model 182.
It is understood that when the feature points 183 in the 3D space are within a predetermined distance from the definite shaped model 182, in the form of a square, the feature points 183 are also within a proximity range of depth edges of the definite shaped model 182).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Kaino’s teaching of excluding the 3D key-points that are within a predetermined distance of depth edges of the model 182 associated with the object into Zhu’s removal of feature points from a very dense point cloud of the object and Zhu’s segmentation of the object from the background, so as to remove the feature points associated with the background and keep the distinguishing feature points used to determine the 3D model. One of ordinary skill in the art would have been motivated to provide key-point filtering that keeps the distinguishing feature points for generating the 3D object model.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino).
Re Claim 5:
The claim 5 encompasses the same scope of invention as that of the claim 4 except additional claim limitation that the preliminary object model is a 3D bounding box, wherein generating a 3D bounding box comprises:
obtaining a 3D representation of the physical environment that was generated based on the depth data;
determining a ground plane corresponding to the object in the physical environment based on the 3D representation; and
generating the 3D bounding box corresponding to the object in the physical environment based on the ground plane and the 3D representation.
Kaino further teaches the claim limitation that the preliminary object model is a 3D bounding box, wherein generating a 3D bounding box comprises:
obtaining a 3D representation of the physical environment that was generated based on the depth data;
determining a ground plane corresponding to the object in the physical environment based on the 3D representation; and
generating the 3D bounding box corresponding to the object in the physical environment based on the ground plane and the 3D representation (Kaino teaches at Paragraph 0217 the definite shaped model includes…information indicating a grounding surface and at FIGS. 17-19 and Paragraph 0136 that generating the cuboid virtual object 344A….the cuboid virtual object 362A is converted into a grass virtual object 364A and at Paragraph 0143 that a data creator pre-registers in a definite shaped model DB definite shaped models such as a floor surface, a wall surface, a sofa, a chair, a desk as real objects….the data creator performs such adjustment that 3D models corresponding to walls, a sofa, a chair and a desk are left and 3D models corresponding to clothes, books and goods that had been placed on the sofa are deleted and at Paragraph 0151 that the application may set as a dangerous region, all definite shaped models other than a flat surface allocated to a floor surface.
Kaino teaches at Paragraph 0102-0103 that the SLAM can simultaneously estimate a position and an attitude of a camera and a position of a feature point included in an image of the camera…by performing matching between the environmental map and a 3D position of a feature point belonging to the object and at Paragraph 0059 that generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view…the observation information can include an estimation result of a position and an attitude of an observation device….the 3D model includes at least coordinate information of a feature point.
Kaino teaches at Paragraph 0151 the user preliminarily prepares a definite shaped model DB. Subsequently, the user causes a 3D model to be generated by observing a surrounding real space of the user and cause the 3D model to be updated by the allocation of a definite shaped model. At this time, complementing of an observed region by expansion of an allocated defined shape model such as extending a flat surface, granting of an Unknown flat, deletion of noise and the like are also performed and at Paragraph 0132 an unallocated portion can be typically generated due to insufficient observation or a complicated shape of a real object. The system 1 may display information specifically instructing a position and an attitude of the terminal apparatus 200.
Kaino teaches at FIG. 12 and Paragraph 0118 that the key-point interpolation comprises exclusion of a subset of 3D key-points, such as the feature points 183, from the definite shaped model 182, and that the system 1 deletes the feature points 183 existing within a predetermined distance from the definite shaped model 182.
It is understood that when the feature points 183 in the 3D space are within a predetermined distance from the definite shaped model 182, in the form of a square, the feature points 183 are also within a proximity range of depth edges of the definite shaped model 182).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Kaino’s teaching of excluding the 3D key-points that are within a predetermined distance of depth edges of the model 182 associated with the object into Zhu’s removal of feature points from a very dense point cloud of the object and Zhu’s segmentation of the object from the background, so as to remove the feature points associated with the background and keep the distinguishing feature points used to determine the 3D model. One of ordinary skill in the art would have been motivated to provide key-point filtering that keeps the distinguishing feature points for generating the 3D object model.
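For illustration only (not drawn from Kaino or the claimed invention), one simple way to realize a 3D bounding box seated on a detected ground plane is sketched below; the assumption that the scene has been aligned so the ground plane is z = 0, and all names and values, are hypothetical.

```python
import numpy as np

def bounding_box_on_ground(object_points, ground_z=0.0):
    # Axis-aligned bounding box of the object's 3D points, with the bottom
    # face snapped onto the ground plane z = ground_z.
    pts = np.asarray(object_points, dtype=float)
    min_corner = pts.min(axis=0)
    max_corner = pts.max(axis=0)
    min_corner[2] = ground_z
    return min_corner, max_corner

# Hypothetical object points floating slightly above the ground plane.
obj = np.random.uniform([0.2, 0.3, 0.02], [0.6, 0.7, 0.4], size=(200, 3))
box_min, box_max = bounding_box_on_ground(obj)
```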
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino).
Re Claim 6:
The claim 6 encompasses the same scope of invention as that of the claim 5 except additional claim limitation that identifying the object further comprises adjusting the preliminary object model based on the 3D keypoints corresponding to one or more surfaces of the object.
Kaino further teaches the claim limitation that identifying the object further comprises adjusting the preliminary object model based on the 3D keypoints corresponding to one or more surfaces of the object (Kaino teaches at Paragraph 0102-0103 that the SLAM can simultaneously estimate a position and an attitude of a camera and a position of a feature point included in an image of the camera…by performing matching between the environmental map and a 3D position of a feature point belonging to the object and at Paragraph 0059 that generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view…the observation information can include an estimation result of a position and an attitude of an observation device….the 3D model includes at least coordinate information of a feature point.
Kaino teaches at Paragraph 0151 the user preliminarily prepares a definite shaped model DB. Subsequently, the user causes a 3D model to be generated by observing a surrounding real space of the user and cause the 3D model to be updated by the allocation of a definite shaped model. At this time, complementing of an observed region by expansion of an allocated defined shape model such as extending a flat surface, granting of an Unknown flat, deletion of noise and the like are also performed and at Paragraph 0132 an unallocated portion can be typically generated due to insufficient observation or a complicated shape of a real object. The system 1 may display information specifically instructing a position and an attitude of the terminal apparatus 200.
Kaino teaches at FIG. 12 and Paragraph 0118 that the key-point interpolation comprises exclusion of a subset of 3D key-points, such as the feature points 183, from the definite shaped model 182, and that the system 1 deletes the feature points 183 existing within a predetermined distance from the definite shaped model 182.
It is understood that when the feature points 183 in the 3D space are within a predetermined distance from the definite shaped model 182, in the form of a square, the feature points 183 are also within a proximity range of depth edges of the definite shaped model 182).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Kaino’s teaching of excluding the 3D key-points that are within a predetermined distance of depth edges of the model 182 associated with the object into Zhu’s removal of feature points from a very dense point cloud of the object and Zhu’s segmentation of the object from the background, so as to remove the feature points associated with the background and keep the distinguishing feature points used to determine the 3D model. One of ordinary skill in the art would have been motivated to provide key-point filtering that keeps the distinguishing feature points for generating the 3D object model.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino).
Re Claim 7:
The claim 7 encompasses the same scope of invention as that of the claim 6 except additional claim limitation that adjusting the preliminary object model is based on a 3D bounding box constraint used to remove background information included in the 3D bounding box to generate an updated 3D bounding box.
Kaino further teaches the claim limitation that adjusting the preliminary object model is based on a 3D bounding box constraint used to remove background information included in the 3D bounding box to generate an updated 3D bounding box (Kaino teaches Paragraph 0107 that the system 1 deletes feature points encompassed by the allocated definite shape models and Paragraph 0117 that the system 1 deletes surrounding feature points of the allocated definite shaped model and at Paragraph 0118 that the system 1 allocates a cuboid definite shaped model 182 similar to the 3D model 181…a feature point 183 that is not included in the definite shaped model 182 remains and the system 1 deletes the feature point 183 existing within a predetermined distance from the definite shaped model 182….such a feature can be typically generated by an observation error.
Kaino teaches at Paragraph 0102-0103 that the SLAM can simultaneously estimate a position and an attitude of a camera and a position of a feature point included in an image of the camera…by performing matching between the environmental map and a 3D position of a feature point belonging to the object and at Paragraph 0059 that generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view…the observation information can include an estimation result of a position and an attitude of an observation device….the 3D model includes at least coordinate information of a feature point.
Kaino teaches at Paragraph 0151 the user preliminarily prepares a definite shaped model DB. Subsequently, the user causes a 3D model to be generated by observing a surrounding real space of the user and cause the 3D model to be updated by the allocation of a definite shaped model. At this time, complementing of an observed region by expansion of an allocated defined shape model such as extending a flat surface, granting of an Unknown flat, deletion of noise and the like are also performed and at Paragraph 0132 an unallocated portion can be typically generated due to insufficient observation or a complicated shape of a real object. The system 1 may display information specifically instructing a position and an attitude of the terminal apparatus 200.
Kaino teaches at FIG. 12 and Paragraph 0118 that the key-point interpolation comprises exclusion of a subset of 3D key-points, such as the feature points 183, from the definite shaped model 182, and that the system 1 deletes the feature points 183 existing within a predetermined distance from the definite shaped model 182.
It is understood that when the feature points 183 in the 3D space are within a predetermined distance from the definite shaped model 182, in the form of a square, the feature points 183 are also within a proximity range of depth edges of the definite shaped model 182).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Kaino’s teaching of excluding the 3D key-points that are within a predetermined distance of depth edges of the model 182 associated with the object into Zhu’s removal of feature points from a very dense point cloud of the object and Zhu’s segmentation of the object from the background, so as to remove the feature points associated with the background and keep the distinguishing feature points used to determine the 3D model. One of ordinary skill in the art would have been motivated to provide key-point filtering that keeps the distinguishing feature points for generating the 3D object model.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino).
Re Claim 8:
The claim 8 encompasses the same scope of invention as that of the claim 4 except additional claim limitation that the depth information includes a sparse 3D point cloud, wherein identifying the object further comprises densification of the sparse 3D point cloud based on the 3D keypoints corresponding to the object.
Kaino further teaches the claim limitation that the depth information includes a sparse 3D point cloud, wherein identifying the object further comprises densification of the sparse 3D point cloud based on the 3D keypoints corresponding to the object (Kaino teaches Paragraph 0107 that the system 1 deletes feature points encompassed by the allocated definite shape models and Paragraph 0117 that the system 1 deletes surrounding feature points of the allocated definite shaped model and at Paragraph 0118 that the system 1 allocates a cuboid definite shaped model 182 similar to the 3D model 181…a feature point 183 that is not included in the definite shaped model 182 remains and the system 1 deletes the feature point 183 existing within a predetermined distance from the definite shaped model 182….such a feature can be typically generated by an observation error.
Kaino teaches at FIG. 12 and Paragraph 0118 that the key-point interpolation comprises exclusion of a subset of 3D key-points, such as the feature points 183, from the definite shaped model 182, and that the system 1 deletes the feature points 183 existing within a predetermined distance from the definite shaped model 182.
It is understood that when the feature points 183 in the 3D space are within a predetermined distance from the definite shaped model 182, in the form of a square, the feature points 183 are also within a proximity range of depth edges of the definite shaped model 182).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Kaino’s teaching of excluding the 3D key-points that are within a predetermined distance of depth edges of the model 182 associated with the object into Zhu’s removal of feature points from a very dense point cloud of the object and Zhu’s segmentation of the object from the background, so as to remove the feature points associated with the background and keep the distinguishing feature points used to determine the 3D model. One of ordinary skill in the art would have been motivated to provide key-point filtering that keeps the distinguishing feature points for generating the 3D object model.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino).
Re Claim 9:
The claim 9 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that generating the 3D model of the object further comprises keypoint interpolation based on 3D keypoints corresponding to the object, wherein keypoint interpolation comprises exclusion of 3D keypoints that are within a proximity range of depth edges of the object.
Kaino further teaches the claim limitation that generating the 3D model of the object further comprises keypoint interpolation based on 3D keypoints corresponding to the object, wherein keypoint interpolation comprises exclusion of 3D keypoints that are within a proximity range of depth edges of the object (Kaino teaches Paragraph 0107 that the system 1 deletes feature points encompassed by the allocated definite shape models and Paragraph 0117 that the system 1 deletes surrounding feature points of the allocated definite shaped model and at Paragraph 0118 that the system 1 allocates a cuboid definite shaped model 182 similar to the 3D model 181…a feature point 183 that is not included in the definite shaped model 182 remains and the system 1 deletes the feature point 183 existing within a predetermined distance from the definite shaped model 182….such a feature can be typically generated by an observation error.
Kaino teaches at FIG. 12 and Paragraph 0118 that the key-point interpolation comprises exclusion of a subset of 3D key-points, such as the feature points 183, from the definite shaped model 182, and that the system 1 deletes the feature points 183 existing within a predetermined distance from the definite shaped model 182.
It is understood that when the feature points 183 in the 3D space are within a predetermined distance from the definite shaped model 182, in the form of a square, the feature points 183 are also within a proximity range of depth edges of the definite shaped model 182).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Kaino’s teaching of excluding the 3D key-points that are within a predetermined distance of depth edges of the model 182 associated with the object into Zhu’s removal of feature points from a very dense point cloud of the object and Zhu’s segmentation of the object from the background, so as to remove the feature points associated with the background and keep the distinguishing feature points used to determine the 3D model. One of ordinary skill in the art would have been motivated to provide key-point filtering that keeps the distinguishing feature points for generating the 3D object model.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino) and Holzer et al. US-PGPUB No. 2020/0234424 (hereinafter Holzer ‘424).
Re Claim 10:
The claim 10 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that the tracked positions of the device identify position and orientation of the device with respect to the object-based coordinate system.
Kaino and Holzer ‘424 further teach the claim limitation that the tracked positions of the device identify position and orientation of the device with respect to the object-based coordinate system (
Holzer ‘424 teaches at Paragraph 0346 that the user may adjust a position and/or orientation of the camera while it is capturing the image data.
Holzer ‘424 teaches at Paragraph 0051 that a user may be guided to collect multi-view data and at Paragraph 0077 that a user may be presented with a graphical guide to assist the user in capturing an additional image from a target perspective and at Paragraph 0087 that a 3D reconstruction of the vehicle may be computed and fitted to an existing 3D CAD model of the vehicle in order to identify the single components and at Paragraph 0144-0145 that recording guidance for capturing an image for damage analysis is provided…the recording guidance may guide a user to position a camera to one or more specific positions and at Paragraph 0238 that a virtual guide can be inserted into live image data from a mobile and at Paragraph 0262 that the plurality of images can include images with different temporal information…the plurality of images can represent moving objects…the images may include an object of interest moving through scenery, such as a vehicle traveling along a road or a plane traveling through the sky and at Paragraph 0278 that the camera is moved in a convex motion 2910 and the convex motion 2910 can orbit around the object. It is noted that the object is moving, the convex motion 2910 orbiting around the object is also moving and at Paragraph 0338-0341 that the track can include indicators that provide feedback to a user while images associated with a MVIDMR are being recorded and the live image data is augmented with a path 3422…the cross hairs can move and remain on the object as the object 3500a moves in the image data.
Kaino teaches at Paragraph 0059 generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view. The observation information can include an estimation result obtained by Pose Estimation, an estimation result of a position and an attitude of an observation device that is obtained by SLAM, depth information of each image obtained by a depth sensor. A 3D model to be output is represented by a point cloud including an aggregate of feature points, an aggregate of polygons including a plurality of feature points. The 3D model includes at least coordinate information of a feature point and at Paragraph 0100 the user performs observation, e.g., acquisition of a captured image and depth information while directing the terminal apparatus 200 toward an observed region of a real space and at Paragraph 0084 that the system 1 generates a 3D model 14 from the pet bottle 12 being a real object).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the object-based coordinate system to calculate and determine the pose of the target object, based on which the 3D model of the target object can be formed. One of ordinary skill in the art would have been motivated to generate a 3D model based on the object-based coordinate system.
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of
Holzer et al. US-PGPUB No. 2020/0234424 (hereinafter Holzer ‘424); Shakib et al. US-PGPUB No. 2018/0144547 (hereinafter Shakib) and Zhou et al. US-PGPUB No. 2021/0158009 (hereinafter Zhou).
Re Claim 11:
The claim 11 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that the device comprises a user interface, wherein tracking positions of the device in the object-based coordinate system during acquisition of the second set of images comprises displaying guiding indicators on the user interface to guide moving the device to a new position to acquire additional images of the object at the new position, wherein the guiding indicators guide moving the device to the new position and a new orientation, wherein the guiding indicators are positioned in 3D space in a live camera view of the device.
However, Holzer ‘424 implicitly teaches the claim limitation that the device comprises a user interface, wherein tracking positions of the device in the object-based coordinate system during acquisition of the second set of images comprises displaying guiding indicators on the user interface to guide moving the device to a new position to acquire additional images of the object at the new position, wherein the guiding indicators guide moving the device to the new position and a new orientation, wherein the guiding indicators are positioned in 3D space in a live camera view of the device (Holzer ‘424 teaches at Paragraph 0077 that a user may be presented with a graphical guide to assist the user in capturing an additional image from a target perspective and at Paragraph 0087 that a 3D reconstruction of the vehicle may be computed and fitted to an existing 3D CAD model of the vehicle in order to identify the single components).
Shakib implicitly teaches the claim limitation the device comprises a user interface, wherein tracking positions of the device in the object-based coordinate system during acquisition of the second set of images comprises displaying guiding indicators on the user interface to guide moving the device to a new position to acquire additional images of the object at the new position (Shakib teaches at Paragraph 0029 that the user can walk around the perimeter of the room and capture different perspectives of the room from different points along the perimeter and at Paragraph 0030 that the trace lines 302 correspond to the path of the mobile device camera as it was moved by a user around the real world environment depicted in the 3D model 300 and at Paragraph 0031 that the system can present a user with interface including 3D model 200 and allow the user to essentially walk through and/or orbit around the room to view the room from different perspectives and to view specific objects in the room…the system can provide a representation of the 3D model from a first perspective and can receive input requesting movement of the virtual camera relative to the 3D model…the user can provide gesture input directing the virtual camera to move left, move right, move up, move down, or orbit/rotate horizontally around a vertical axis at a specific anchor point or orbit/rotate vertically around a horizontal axis at a specific anchor point ), wherein the guiding indicators guide moving the device to the new position and a new orientation, wherein the guiding indicators are positioned in 3D space in a live camera view of the device (Shakib teaches at Paragraph 0029 that the user can walk around the perimeter of the room and capture different perspectives of the room from different points along the perimeter and at Paragraph 0030 that the trace lines 302 correspond to the path of the mobile device camera as it was moved by a user around the real world environment depicted in the 3D model 300 and at Paragraph 0031 that the system can present a user with interface including 3D model 200 and allow the user to essentially walk through and/or orbit around the room to view the room from different perspectives and to view specific objects in the room…the system can provide a representation of the 3D model from a first perspective and can receive input requesting movement of the virtual camera relative to the 3D model…the user can provide gesture input directing the virtual camera to move left, move right, move up, move down, or orbit/rotate horizontally around a vertical axis at a specific anchor point or orbit/rotate vertically around a horizontal axis at a specific anchor point).
However, Zhou explicitly teaches the claim limitation that the device comprises a user interface, wherein tracking positions of the device in the object-based coordinate system during acquisition of the second set of images comprises displaying guiding indicators on the user interface to guide moving the device to a new position to acquire additional images of the object at the new position, wherein the guiding indicators guide moving the device to the new position and a new orientation, wherein the guiding indicators are positioned in 3D space in a live camera view of the device (Zhou teaches at Paragraph 0011-0012 based on the sparse point cloud and the UAV flight trajectory, predict the completeness of the scene collection information and judge the details (density) of the building to obtain the confidence map of scene coverage and the details in need of close-up shots and optimize the flight path in real time…obtain the high-resolution images with more than 19 million pixels and at Paragraph 0019 extract SIFT feature to determine the area of the building in the current shot and at Paragraph 0025 for the remaining uncovered area, calculate the points to be added to the path and the orientation of the onboard camera lens, optimize the UAV flight path in real time and enable the UAV to complete the scene collection information in real time).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the densification algorithm of Zhou and/or Shakib to modify the aggregation of feature points of Holzer ‘424, so as to aggregate the feature points collected from the different observation viewpoints of the camera using the sparse point clouds collected from those viewpoints. One of ordinary skill in the art would have been motivated to provide complete 3D point clouds so as to build the 3D model based on complete coverage of the object by the point clouds.
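Purely as an illustration of guidance indicators positioned around an object (this sketch is not taken from Holzer ‘424, Shakib, Zhou, or the claimed invention), the code below computes evenly spaced capture waypoints on a circle about the object's center; all names and values are hypothetical.

```python
import numpy as np

def orbit_waypoints(center, radius, height, n=8):
    # n capture positions on a horizontal circle around the object center,
    # raised by `height`; a UI could render guiding indicators at these 3D positions.
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.stack([center[0] + radius * np.cos(angles),
                     center[1] + radius * np.sin(angles),
                     np.full(n, center[2] + height)], axis=1)

waypoints = orbit_waypoints(center=np.array([0.0, 0.0, 0.0]), radius=1.5, height=0.5)
```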
Re Claim 12:
The claim 12 encompasses the same scope of invention as that of the claim 11 except additional claim limitation that when the object moves within a field of view of the camera of the device, the guiding indicators are moved with respect to the object based on an adjusted object-based coordinate system defined based on an adjusted position and an adjusted orientation of the object, wherein the adjusted position and the adjusted orientation of the object are based on the movement of the object.
However, Holzer ‘424 teaches the claim limitation that when the object moves within a field of view of the camera of the device, the guiding indicators are moved with respect to the object based on an adjusted object-based coordinate system defined based on an adjusted position and an adjusted orientation of the object, wherein the adjusted position and the adjusted orientation of the object are based on the movement of the object (Holzer ‘424 teaches at Paragraph 0051 that a user may be guided to collect multi-view data and at Paragraph 0077 that a user may be presented with a graphical guide to assist the user in capturing an additional image from a target perspective and at Paragraph 0087 that a 3D reconstruction of the vehicle may be computed and fitted to an existing 3D CAD model of the vehicle in order to identify the single components and at Paragraph 0144-0145 that recording guidance for capturing an image for damage analysis is provided…the recording guidance may guide a user to position a camera to one or more specific positions and at Paragraph 0238 that a virtual guide can be inserted into live image data from a mobile and at Paragraph 0262 that the plurality of images can include images with different temporal information…the plurality of images can represent moving objects…the images may include an object of interest moving through scenery, such as a vehicle traveling along a road or a plane traveling through the sky and at Paragraph 0278 that the camera is moved in a convex motion 2910 and the convex motion 2910 can orbit around the object. It is noted that the object is moving, the convex motion 2910 orbiting around the object is also moving and at Paragraph 0338-0341 that the track can include indicators that provide feedback to a user while images associated with a MVIDMR are being recorded and the live image data is augmented with a path 3422…the cross hairs can move and remain on the object as the object 3500a moves in the image data).
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino);
Holzer et al. US-PGPUB No. 2020/0234424 (hereinafter Holzer ‘424) and Cherukuri US-PGPUB No. 2020/0242835 (hereinafter Cherukuri).
Re Claim 13:
The claim 13 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that tracking the positions of the device in the object-based coordinate system comprises adjusting the images of the physical environment using: a two-dimensional (2D) mask to remove background image pixels of the images, wherein the 2D mask is determined based on the coordinate system of the object; or a 3D bounding box constraint to remove background image pixels of the images, wherein the 3D bounding box constraint is determined based on the coordinate system of the object.
Holzer ‘424 and Kaino implicitly teach the claim limitation that tracking the positions of the device in the object-based coordinate system comprises adjusting the images of the physical environment using: a two-dimensional (2D) mask to remove background image pixels of the images, wherein the 2D mask is determined based on the coordinate system of the object (Holzer ‘424 teaches at Paragraph 0174 that a determination is made at 910 as to whether the pixel intersects with the object 3D mesh. If the pixel does not intersect with the object 3D mesh, then at 912 the pixel is set as belonging to the background and at Paragraph 0175 that the machine learning algorithm may identify 2D locations of each pixel in the top-down image and at Paragraph 0183 that a pixel may be ignored rather than setting it as a background pixel at 912); or a 3D bounding box constraint to remove background image pixels of the images, wherein the 3D bounding box constraint is determined based on the coordinate system of the object (Kaino teaches at Paragraph 0107 that the system 1 deletes feature points encompassed by the allocated definite shape models and at Paragraph 0117 that the system 1 deletes surrounding feature points of the allocated definite shaped model and at Paragraph 0118 that the system 1 allocates a cuboid definite shaped model 182 similar to the 3D model 181…a feature point 183 that is not included in the definite shaped model 182 remains and the system 1 deletes the feature point 183 existing within a predetermined distance from the definite shaped model 182….such a feature can be typically generated by an observation error).
However, Cherukuri teaches the claim limitation that tracking the positions of the device in the object-based coordinate system comprises adjusting the images of the physical environment using: a two-dimensional (2D) mask to remove background image pixels of the images (Cherukuri teaches at Paragraph 0042 that the point cloud building component 122 is configured to extract 3D points triangulated out of the key frames ORB features and add them to the sparse point cloud and detecting and removing invalid cloud points, detecting and removing invalid key frames and optimizing the point cloud with local bundle adjustment), wherein the 2D mask is determined based on the coordinate system of the object (Cherukuri teaches at Paragraph 0042 that the point cloud building component 122 is configured to extract 3D points triangulated out of the key frames ORB features and add them to the sparse point cloud and detecting and removing invalid cloud points, detecting and removing invalid key frames and optimizing the point cloud with local bundle adjustment), or a 3D bounding box constraint to remove background image pixels of the images, wherein the 3D bounding box constraint is determined based on the coordinate system of the object (Cherukuri teaches at Paragraph 0044 that during construction of the sparse point cloud new points are successively added for each observation of a given point of view and at Paragraph 0045 that the list of extracted features is used along with the sparse point cloud to build a dense point cloud and the computing device 106 converts the dense point cloud into a converted 3D model consisting of a mesh of triangles. The triangles constituting the 3D model are triangulated from a concave hull computed on the dense point cloud.
Cherukuri teaches at Paragraph 0042 that the point cloud building component 122 is configured to extract 3D points triangulated out of the key frames ORB features and add them to the sparse point cloud and detecting and removing invalid cloud points, detecting and removing invalid key frames and optimizing the point cloud with local bundle adjustment).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the densification algorithm of Cherukuri to modify the aggregation of feature points of Kaino and Holzer ‘424, so as to aggregate the feature points collected from the different observation viewpoints of the camera (Kaino, Paragraph 0059) using the sparse point clouds collected from those viewpoints. One of ordinary skill in the art would have been motivated to provide complete 3D point clouds so as to build the 3D model based on complete coverage of the object by the point clouds.
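For illustration only (not the applicant's or the cited references' implementation), the sketch below removes background pixels using a 3D bounding box constraint applied to per-pixel 3D points; the depth back-projection, array names, and box bounds are hypothetical assumptions.

```python
import numpy as np

def mask_background(image, points_3d, box_min, box_max):
    # points_3d: H x W x 3 camera-space points (e.g. back-projected from a depth map).
    # Pixels whose 3D point falls outside the object bounding box are zeroed out.
    inside = np.all((points_3d >= box_min) & (points_3d <= box_max), axis=-1)
    masked = image.copy()
    masked[~inside] = 0
    return masked, inside    # `inside` doubles as a 2D object mask

H, W = 120, 160
rgb = np.random.randint(0, 255, (H, W, 3), dtype=np.uint8)   # hypothetical image
xyz = np.random.uniform(-1.0, 1.0, (H, W, 3))                # hypothetical per-pixel 3D points
masked_rgb, obj_mask = mask_background(rgb, xyz,
                                       np.array([-0.3, -0.3, -0.3]),
                                       np.array([0.3, 0.3, 0.3]))
```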
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of
Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino);
Smolic et al. US-PGPUB No. 2020/0320727 (hereinafter Smolic) and Zhou et al. US-PGPUB No. 2021/0158009 (hereinafter Zhou).
Re Claim 14:
The claim 14 encompasses the same scope of invention as that of the claim 1 except the additional claim limitation that acquiring the first set and the second set of one or more images of the object comprises acquiring depth information that includes a sparse 3D point cloud for each image, wherein tracking the positions of the device in the object-based coordinate system comprises adjusting the images of the physical environment based on a densification of the sparse 3D point clouds based on the 3D keypoints corresponding to the object.
With the exception of a densification (algorithm) of the sparse 3D point clouds, Kaino further teaches the claim limitation that acquiring the first set and the second set of one or more images of the object comprises acquiring depth information that includes a sparse 3D point cloud for each image (Kaino teaches at Paragraph 0059 generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view. The observation information can include an estimation result obtained by Pose Estimation, an estimation result of a position and an attitude of an observation device that is obtained by SLAM, depth information of each image obtained by a depth sensor. A 3D model to be output is represented by a point cloud including an aggregate of feature points, an aggregate of polygons including a plurality of feature points. The 3D model includes at least coordinate information of a feature point and at Paragraph 0100 the user performs observation, e.g., acquisition of a captured image and depth information while directing the terminal apparatus 200 toward an observed region of a real space and at Paragraph 0084 that the system 1 generates a 3D model 14 from the pet bottle 12 being a real object), wherein tracking the positions of the device in the object-based coordinate system comprises adjusting the images of the physical environment based on [a densification of] the sparse 3D point clouds based on the 3D keypoints corresponding to the object (Kaino teaches at Paragraph 0059 integrating a plurality of pieces of observation information obtained by a camera, a depth sensor or the like and a 3D model to be output is represented by a point cloud including an aggregate of feature points.
Kaino teaches at Paragraph 0132 promoting the user to perform additional observation for making a generation of 3D model more detailed.
Kaino teaches at Paragraph 0102-0103 that the SLAM can simultaneously estimate a position and an attitude of a camera and a position of a feature point included in an image of the camera…by performing matching between the environmental map and a 3D position of a feature point belonging to the object and at Paragraph 0059 that generating a 3D model by integrating a plurality of pieces of observation information obtained by a camera, a depth sensor from various types of points of view…the observation information can include an estimation result of a position and an attitude of an observation device….the 3D model includes at least coordinate information of a feature point.
Kaino teaches at Paragraph 0151 the user preliminarily prepares a definite shaped model DB. Subsequently, the user causes a 3D model to be generated by observing a surrounding real space of the user and cause the 3D model to be updated by the allocation of a definite shaped model. At this time, complementing of an observed region by expansion of an allocated defined shape model such as extending a flat surface, granting of an Unknown flat, deletion of noise and the like are also performed and at Paragraph 0132 an unallocated portion can be typically generated due to insufficient observation or a complicated shape of a real object. The system 1 may display information specifically instructing a position and an attitude of the terminal apparatus 200).
However, Smolic teaches the claim limitation of densification of the 3D sparse point clouds (Smolic teaches at Paragraph 0167 a sparse point cloud is calculated using SIFT features and a patch-based point cloud densification algorithm generates the final dense cloud where the density of the resulting 3D point cloud depends on the number of cameras and the amount of overlap in the image).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the densification algorithm of Smolic to modify the aggregation of feature points of Zhu and Kaino, so as to aggregate the feature points collected from the different observation viewpoints of the camera (Kaino, Paragraph 0059) using the densification algorithm. One of ordinary skill in the art would have been motivated to provide densified 3D point clouds so as to build the 3D model based on the densified 3D point clouds.
However, Zhou teaches the claim limitation of densification of the 3D sparse point clouds (Zhou teaches at Paragraph 0011-0012 based on the sparse point cloud and the UAV flight trajectory, predict the completeness of the scene collection information and judge the details (density) of the building to obtain the confidence map of scene coverage and the details in need of close-up shots and optimize the flight path in real time…obtain the high-resolution images with more than 19 million pixels and at Paragraph 0019 extract SIFT feature to determine the area of the building in the current shot and at Paragraph 0025 for the remaining uncovered area, calculate the points to be added to the path and the orientation of the onboard camera lens, optimize the UAV flight path in real time and enable the UAV to complete the scene collection information in real time).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the densification algorithm of Zhou to modify the aggregation of feature points of Zhu and Kaino, so as to aggregate the feature points collected from the different observation viewpoints of the camera (Kaino, Paragraph 0059) using the sparse point clouds collected from those viewpoints. One of ordinary skill in the art would have been motivated to provide complete 3D point clouds so as to build the 3D model based on complete coverage of the object by the point clouds.
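As an illustration only of what a densification step might look like (this is not Smolic's or Zhou's actual algorithm, nor the claimed method), the sketch below naively inserts midpoints between each sparse keypoint and its k nearest neighbours; all names and parameters are hypothetical.

```python
import numpy as np

def densify(points, k=3):
    # Insert the midpoint between every point and each of its k nearest
    # neighbours, then append the new points to the original sparse cloud.
    pts = np.asarray(points, dtype=float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    new_pts = []
    for i in range(len(pts)):
        for j in np.argsort(dists[i])[:k]:
            new_pts.append((pts[i] + pts[j]) / 2.0)
    return np.vstack([pts, np.unique(np.array(new_pts), axis=0)])

sparse_cloud = np.random.rand(50, 3)   # hypothetical sparse 3D point cloud
dense_cloud = densify(sparse_cloud, k=3)
```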
Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino);
and Cherukuri US-PGPUB No. 2020/0242835 (hereinafter Cherukuri).
Re Claim 15:
The Claim 15 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that when the object is reoriented or repositioned within a field of view of the camera, the method further comprises a relocalization process, the relocalization process comprising: comparing a first image of the first set of images of the physical environment with a plurality of keyframe images of the object, the first image comprising the object; identifying a first keyframe from the plurality of keyframe images based on the comparing, the first keyframe associated with a first keyframe position in the coordinate system; and based on identifying the first keyframe, determining a re-localized position of the device with respect to the object-based coordinate system based on the first keyframe position.
However, Cherukuri teaches the claim limitation that when the object is reoriented or repositioned within a field of view of the camera, the method further comprises a relocalization process, the relocalization process comprising: comparing a first image of the first set of images of the physical environment with a plurality of keyframe images of the object, the first image comprising the object; identifying a first keyframe from the plurality of keyframe images based on the comparing, the first keyframe associated with a first keyframe position in the coordinate system; and based on identifying the first keyframe, determining a re-localized position of the device with respect to the object-based coordinate system based on the first keyframe position (Cherukuri teaches at Paragraph 0041 that the pose tracking component 120 is configured to determine if a new key frame needs to be inserted into the set of key frames…so that it can re-localize the device 102 in the sparse point cloud and resume tracking.
Cherukuri teaches at Paragraph 0039 that the computing device 106 is configured to compute a dense point cloud utilizing the scan data…convert the dense point cloud to a 3D mesh and at Paragraph 0040 that the scan data enables to track the position and orientation of the AR device 102 capturing the video feed data in 3D space. The constructed sparse point cloud provides a key frame data comprising a list of key frames.
Cherukuri teaches at Paragraph 0009 the scan component comprises a pose tracking component, a point cloud building component and a loop closing component…The pose tracking component is configured to determine an orientation information of the AR device in a 3D space with respect to each frame of the video feed data and determine if each frame need to be added to the sparse point cloud and list of key frames based on the orientation information…The sparse point cloud is constructed from a local feature descriptor using SLAM algorithm).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the densification algorithm of Cherukuri to modify the aggregation of feature points of Zhu and Kaino, so as to aggregate the feature points collected from the different observation viewpoints of the camera using the sparse point clouds collected from those viewpoints. One of ordinary skill in the art would have been motivated to provide complete 3D point clouds so as to build the 3D model based on complete coverage of the object by the point clouds.
Re Claim 16:
The claim 16 encompasses the same scope of invention as that of the claim 15 except additional claim limitation wherein the object is reoriented or repositioned within the field of view of the camera following a period in which the object is not within the field of view of the camera.
Cherukuri teaches the claim limitation wherein the object is reoriented or repositioned within the field of view of the camera following a period in which the object is not within the field of view of the camera (Cherukuri teaches at Paragraph 0041 that the pose tracking component 120 is configured to determine if a new key frame needs to be inserted into the set of key frames….The output of the pose tracking component 120 is the current pose of the device in 3D space, as well as a decision whether the sparse point cloud and set of key frames should be expanded by the currently processed frame…so that it can re-localize the device 102 in the sparse point cloud and resume tracking…in case an initial pose estimate cannot be obtained, the system 100 is presumed to be lost).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the densification algorithm of Cherukuri to modify the aggregation of feature points of Zhu and Kaino, so as to aggregate the feature points collected from the different observation viewpoints of the camera using the sparse point clouds collected from those viewpoints. One of ordinary skill in the art would have been motivated to provide complete 3D point clouds so as to build the 3D model based on complete coverage of the object by the point clouds.
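For illustration only (not Cherukuri's implementation or the claimed method), a keyframe-based relocalization step can be sketched as matching a descriptor of the current frame against descriptors stored with previously recorded keyframes and reusing the best match's pose; the descriptor size, pose format, and names below are hypothetical.

```python
import numpy as np

def relocalize(frame_descriptor, keyframe_descriptors, keyframe_poses):
    # keyframe_descriptors: (N, D) array of per-keyframe global descriptors.
    # keyframe_poses: list of N 4x4 device poses in the object-based coordinate system.
    dists = np.linalg.norm(keyframe_descriptors - frame_descriptor, axis=1)
    best = int(np.argmin(dists))
    return keyframe_poses[best], best, float(dists[best])

kf_desc = np.random.rand(10, 128)              # hypothetical stored keyframe descriptors
kf_poses = [np.eye(4) for _ in range(10)]      # hypothetical stored keyframe poses
pose, idx, score = relocalize(np.random.rand(128), kf_desc, kf_poses)
```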
Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. US-PGPUB No. 2018/0211404 (hereinafter Zhu) in view of
Kaino et al. US-PGPUB No. 2020/0394841 (hereinafter Kaino);
Holzer et al. US-PGPUB No. 2020/0234424 (hereinafter Holzer ‘424); and Cherukuri US-PGPUB No. 2020/0242835 (hereinafter Cherukuri).
Re Claim 17:
The claim 17 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that the generated 3D model of the object is determined based on refined images, wherein the refined images are determined based on at least one of a 3D keypoint interpolation, densification of 3D sparse point clouds associated with the images, a two-dimensional (2D) mask corresponding to the object to remove background image pixels of the images, a 3D bounding box constraint corresponding to the object to remove background image pixels of the images.
Holzer ‘424 and Kaino implicitly teach the claim limitation that the generated 3D model of the object is determined based on refined images, wherein the refined images are determined based on at least one of a 3D keypoint interpolation, densification of 3D sparse point clouds associated with the images, a two-dimensional (2D) mask corresponding to the object to remove background image pixels of the images, a 3D bounding box constraint corresponding to the object to remove background image pixels of the images (Holzer ‘424 teaches at Paragraph 0174 that a determination is made at 910 as to whether the pixel intersects with the object 3D mesh. If the pixel does not intersect with the object 3D mesh, then at 912 the pixel is set as belonging to the background and at Paragraph 0175 that the machine learning algorithm may identify 2D locations of each pixel in the top-down image and at Paragraph 0183 that a pixel may be ignored rather than setting it as a background pixel at 912.
Kaino teaches Paragraph 0107 that the system 1 deletes feature points encompassed by the allocated definite shape models and Paragraph 0117 that the system 1 deletes surrounding feature points of the allocated definite shaped model and at Paragraph 0118 that the system 1 allocates a cuboid definite shaped model 182 similar to the 3D model 181…a feature point 183 that is not included in the definite shaped model 182 remains and the system 1 deletes the feature point 183 existing within a predetermined distance from the definite shaped model 182….such a feature can be typically generated by an observation error).
However, Smolic et al. US-PGPUB No. 2020/0320727 teaches the claim limitation of densification of the 3D sparse point clouds (Smolic teaches at Paragraph 0167 a sparse point cloud is calculated using SIFT features and a patch-based point cloud densification algorithm generates the final dense cloud where the density of the resulting 3D point cloud depends on the number of cameras and the amount of overlap in the image).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the densification algorithm of Smolic to modify the aggregation of feature points of Kaino and Holzer ’424, aggregating the feature points collected from the camera's different observation viewpoints (Kaino Paragraph 0059) using the densification algorithm. One of ordinary skill in the art would have been motivated to provide a densified 3D point cloud in order to build the 3D model from the densified 3D point cloud.
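For illustration only, and not Smolic's patch-based method, the following minimal Python sketch shows a toy densification step that inserts the midpoint between each sparse point and its nearest neighbor, roughly doubling point density; the point count and helper name are hypothetical.

import numpy as np

def densify_once(sparse: np.ndarray) -> np.ndarray:
    # sparse: Nx3 array. Returns the original points plus nearest-neighbor midpoints.
    diffs = sparse[:, None, :] - sparse[None, :, :]   # NxNx3 pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)            # NxN pairwise distances
    np.fill_diagonal(dists, np.inf)                   # ignore self-distance
    nearest = dists.argmin(axis=1)                    # index of each point's nearest neighbor
    midpoints = 0.5 * (sparse + sparse[nearest])
    return np.vstack([sparse, midpoints])

cloud = np.random.rand(200, 3)   # hypothetical sparse cloud
print(densify_once(cloud).shape) # (400, 3): density roughly doubled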
In addition, Cherukuri teaches the claim limitation of densification of the 3D sparse point clouds (Cherukuri teaches at Paragraph 0041 that the pose tracking component 120 is configured to determine if a new key frame needs to be inserted into the set of key frames…so that it can re-localize the device 102 in the sparse point cloud and resume tracking.
Cherukuri teaches at Paragraph 0039 that the computing device 106 is configured to compute a dense point cloud utilizing the scan data…convert the dense point cloud to a 3D mesh, and at Paragraph 0040 that the scan data enables tracking of the position and orientation of the AR device 102 capturing the video feed data in 3D space, and that the constructed sparse point cloud provides key frame data comprising a list of key frames).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the densification algorithm of Cherukuri to modify the aggregation of feature points of Kaino and Holzer ’424, aggregating the feature points collected from the camera's different observation viewpoints (Kaino Paragraph 0059) using the sparse point clouds collected from those observation points. One of ordinary skill in the art would have been motivated to provide a complete 3D point cloud in order to build the 3D model based on complete coverage of the object by the point clouds.
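For illustration only, and not code from Cherukuri, the following minimal Python sketch shows one way a pose tracking loop could decide whether the currently processed frame should be inserted as a new key frame, based on how far the tracked pose has moved since the last key frame; the thresholds and helper names are hypothetical.

import numpy as np

def should_insert_keyframe(last_kf_pose, current_pose,
                           trans_thresh=0.10, rot_thresh_deg=15.0) -> bool:
    # Poses are (R 3x3, t 3). Insert a key frame when the translation or rotation
    # change since the last key frame exceeds a threshold.
    R0, t0 = last_kf_pose
    R1, t1 = current_pose
    translation = np.linalg.norm(t1 - t0)
    # Rotation angle between the two orientations, from the relative rotation matrix.
    cos_angle = np.clip((np.trace(R1 @ R0.T) - 1.0) / 2.0, -1.0, 1.0)
    rotation_deg = np.degrees(np.arccos(cos_angle))
    return translation > trans_thresh or rotation_deg > rot_thresh_deg

last_kf = (np.eye(3), np.zeros(3))
current = (np.eye(3), np.array([0.2, 0.0, 0.0]))
print(should_insert_keyframe(last_kf, current))  # True: the camera moved 0.2 units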
Re Claim 18:
Claim 18 encompasses the same scope of invention as claim 17 except for the additional claim limitation that the 3D keypoint interpolation, the densification of the 3D sparse point clouds, the 2D mask, and the 3D bounding box constraint are based on the coordinate system of the object.
However, Cherukuri US-PGPUB No. 2020/0242835 teaches the claim limitation that the 3D keypoint interpolation, the densification of the 3D sparse point clouds, the 2D mask, and the 3D bounding box constraint are based on the coordinate system of the object (Cherukuri teaches at Paragraph 0041 that the pose tracking component 120 is configured to determine if a new key frame needs to be inserted into the set of key frames…The output of the pose tracking component 120 is the current pose of the device in 3D space, as well as a decision whether the sparse point cloud and set of key frames should be expanded by the currently processed frame…so that it can re-localize the device 102 in the sparse point cloud and resume tracking…in case an initial pose estimate cannot be obtained, the system 100 is presumed to be lost).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the densification algorithm of Cherukuri to modify the aggregation of feature points of Kaino and Holzer ’424, aggregating the feature points collected from the camera's different observation viewpoints (Kaino Paragraph 0059) using the sparse point clouds collected from those observation points. One of ordinary skill in the art would have been motivated to provide a complete 3D point cloud in order to build the 3D model based on complete coverage of the object by the point clouds.
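For illustration only, and not code from Cherukuri, the following minimal Python sketch shows one way a refinement operation such as a 3D bounding box constraint could be applied in the object's own coordinate system: world-frame points are re-expressed in the object frame before the box test, so the constraint follows the object if it is repositioned or reoriented. The object pose, box extent, and helper names are hypothetical.

import numpy as np

def world_to_object(points_w: np.ndarray, R_obj: np.ndarray, t_obj: np.ndarray) -> np.ndarray:
    # The object pose (R_obj, t_obj) maps object frame -> world; invert it for points:
    # X_o = R_obj^T @ (X_w - t_obj).
    return (points_w - t_obj) @ R_obj

def crop_to_object_box(points_w, R_obj, t_obj, half_extent):
    # Keep only points inside an axis-aligned box defined in the object frame.
    p_obj = world_to_object(points_w, R_obj, t_obj)
    keep = np.all(np.abs(p_obj) <= np.asarray(half_extent), axis=1)
    return points_w[keep]

# Hypothetical world-frame points and object pose (identity rotation, offset center).
pts = np.random.rand(500, 3)
cropped = crop_to_object_box(pts, np.eye(3), np.array([0.5, 0.5, 0.5]), [0.2, 0.2, 0.2])
print(cropped.shape)  # only points within the object-frame box remain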
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIN CHENG WANG whose telephone number is (571)272-7665. The examiner can normally be reached Mon-Fri 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon, can be reached at 571-270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JIN CHENG WANG/Primary Examiner, Art Unit 2617