Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed December 9th, 2025 has been entered. Claims 1, 9, 14 and 22 have been amended. Claims 1-26 remain pending and rejected in the application. Applicant’s amendments to the specification and claims have overcome each and every objection previously set forth in the Non-Final Office Action mailed September 18th, 2025, and those objections have therefore been withdrawn.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5 and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (Pub. No.: US 2023/0245396 A1), hereinafter Xiong, in view of Fleishman et al. (Pub. No.: US 2019/0043203 A1), hereinafter Fleishman, and further in view of Wang et al. (US 2025/0181145 A1), hereinafter Wang.
Regarding claim 1, Xiong discloses an apparatus for image processing (Paragraph 22 teaches that FIG. 1 illustrates a non-limiting example of a device 100 for performing three-dimensional scene reconstruction and understanding in extended reality (XR) applications according to some embodiments of this disclosure.), comprising:
at least one memory (Paragraph 23 teaches that the device 100 further includes a speaker 130, a main processor 140, an input/output (I/O) interface (IF) 145, I/O device(s) 150, and a memory 160. The memory 160 includes an operating system (OS) program 161 and one or more applications 162.);
and at least one processor coupled to the at least one memory (Paragraph 27 teaches that the main processor 140 can include one or more processors or other processing devices and execute the OS program 161 stored in the memory 160 in order to control the overall operation of the device 100.) and configured to:
generate a set of surface representation values for one or more surfaces visible in one or more first images using a first three dimensional scene reconstruction (3DR) algorithm (Paragraph 80 teaches that referring to the illustrative example of FIG. 4, at an operation 445, values of a TSDF are computed based on the surface normal and dense depth map to generate one or more voxel grids for the one or more areas representing one or more planes detected and tracked at the operations 439 and 441. Additionally, paragraph 81 teaches that according to various embodiments, the three-dimensional mesh may be generated incrementally from the boundaries of planes or objects, although other meshing algorithms may be used.). However, Xiong fails to disclose preprocess the set of surface representation values to generate preprocessing information for a second 3DR algorithm.
Fleishman discloses preprocess the set of surface representation values to generate preprocessing information for a second 3DR algorithm (Paragraph 68 teaches that as part of the feature extraction, process 500 may include “extract historical semantic high level intermediate value features” 520. Here, feature extraction is performed on the semantic segmentation map, which already has semantic labels on the map from the 3D semantic model, and reflect the past semantic labeling on the 3D semantic model. This extraction may be performed separately from the extraction of features of the RGBD current image. Specifically, this may first involve a pre-processing operation by pre-processing unit 810 (FIG. 8) that converts the rendered semantic map input to an expected format for input to a neural network.). Since Xiong teaches generating surface representation values using a first type of three dimensional scene reconstruction (3DR) algorithm and Fleishman teaches a pre-processing method that can be used for generating a different type of 3DR algorithm, it would have been obvious to a person having ordinary skill in the art to combine the concepts together so that when generating surface representation values (e.g., TSDF values) for use in a machine learning algorithm or a different type of algorithm, such as a computer vision algorithm, the values could be pre-processed ahead of time and then could be applied and used with any necessary requested algorithm.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xiong to incorporate the teachings of Fleishman, so that any surface values generated for use by a specific algorithm could be pre-processed, thereby improving the performance and accuracy of that algorithm.
However, Xiong in view of Fleishman fails to disclose wherein the preprocessing information indicates an area of the set of surface representation values for refinement, wherein the area of the set of surface representation values for refinement is less than the set of surface representation values for the one or more surfaces visible in the one or more first images.
Wang discloses wherein the preprocessing information indicates an area of the set of surface representation values for refinement, wherein the area of the set of surface representation values for refinement is less than the set of surface representation values for the one or more surfaces visible in the one or more first images (Paragraphs 63 and 64 teach that in Operation 402: Determine an SDF value of a voxel based on a distance from the voxel in the stereoscopic space to the camera and a depth value of a projection point of the voxel on a camera image, the SDF value being configured for representing a positional relationship between the voxel and a surface of an obstacle. For example, a signed distance field (SDF) value for each voxel of a plurality of voxels in the stereoscopic space is determined based on a distance from the respective voxel to the camera and a depth value of a voxel projection point on a surface of the obstacle. The voxel projection point is an intersection point between a three-dimensional line from the voxel to an optical center of the camera and the surface of the obstacle. The SDF value represents a positional relationship between the respective voxel and the surface of the obstacle. In this aspect, a positional relationship between a voxel and a surface of an obstacle in the environment is represented by an SDF value. SDF is a manner of expressing a three-dimensional model in a voxel grid. Additionally, paragraph 82 teaches that in Operation 403a: Perform three-dimensional reconstruction calculation based on the SDF values of all the voxels by using a marching cubes (MC) algorithm, to obtain an initial three-dimensional environment model and lastly, FIG. 8 and paragraph 113 teach that as shown in FIG. 8, after the initial three-dimensional environment model is obtained through calculation by using the MC algorithm, a repair process 801 may be included. 
In this process, patch division is first performed on the initial three-dimensional environment model. Then, duplicate points and duplicate patches are removed. In addition, an independent point, an independent edge, an independent patch, and a pathological patch are removed, and a connection relationship is re-established. After the repair process 801 ends, a simplification process 802 is performed, including removing a patch whose total quantity of connections is less than 10, removing an independent region with a total volume less than one thousandth of a total volume of a model (mesh), and removing a patch 3 m away. After the simplification process 802 ends, smoothing processing is performed, that is, Laplacian filtering is performed on the mesh, to obtain the three-dimensional environment model.). Since Xiong in view of Fleishman teach steps for preprocessing surface representation values using information from a three dimensional scene reconstruction (3DR) algorithm and Wang teaches a three-dimensional reconstruction refinement algorithm for determining and indicating surface representation values based on a distance relationship between a voxel and a surface area of an object and then making refinement updates based on functions that determine whether surface regions are less than a certain value compared to other surface regions, it would have been obvious to a person having ordinary skill in the art to combine the concepts together so that the three dimensional scene reconstruction algorithm being used to preprocess surface representation values would include and indicate a particular area of a surface representation value for refinement usage and be able to recognize if those values were less than another visible surface representation value when generating information for updating and refining the environment.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xiong in view of Fleishman to incorporate the teachings of Wang, so that a specific reconstruction algorithm could be utilized to indicate a particular area of the surface representation values, which would improve overall accuracy during the reconstruction process by focusing on a particular surface area.
Furthermore, Xiong in view of Fleishman and Wang disclose generate, by the second 3DR algorithm, a refined set of surface representation values for the indicated areas based on the set of surface representation values and the preprocessing information (Paragraph 94 of Xiong teaches that in embodiments in which three-dimensional reconstruction involves calculating projections of rays from one or more specified poses (such as in TSDF-based approaches), errors or inconsistencies in pose data can be readily propagated into the three-dimensional reconstruction of the scene. According to various embodiments, at an operation 725, the back end generates a three-dimensional reconstruction of at least part of the real-world scene based on the image data, the depth data, and (in some embodiments) the optimized pose data. Additionally, paragraph 61 of Fleishman teaches that process 500 may include “generate/update 3D geometric model” 506, and also as mentioned above, the 3D geometric model, which may be operated by “performing RGB-SLAM” 508, or other methods, may initially be formed from one or more depth maps. ... Thus, in the RGBD-SLAM here, each frame not only adds new portions, but also refines and improves already existing areas of the geometric models which are seen in the frame.);
and output the refined set of surface representation values (Paragraph 66 of Xiong teaches that referring to the illustrative example of FIG. 4, a depth reconstruction stage 401 outputs a reconstructed high-resolution depth map. The reconstructed high-resolution depth map can be generated using various inputs.).
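For illustration only, the voxel-wise computation described in the cited paragraphs 63 and 64 of Wang (an SDF value determined from the distance of a voxel to the camera and the depth value at the projection point of the voxel) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of any cited reference or of the application: the sign convention, the truncation distance `trunc`, the function names, and the near-surface band used to pick a smaller area for refinement are all hypothetical.

```python
def truncated_sdf(voxel_depth, surface_depth, trunc=0.1):
    """Signed distance from a voxel to the observed surface along the camera ray.

    The distance is divided by `trunc` and clamped to [-1, 1]. Positive values
    lie in front of the surface (assumed sign convention).
    """
    if trunc <= 0:
        raise ValueError("trunc must be positive")
    sdf = surface_depth - voxel_depth
    return max(-1.0, min(1.0, sdf / trunc))


def refinement_area(tsdf_values, band=0.5):
    """Indices of voxels whose TSDF magnitude is below `band`.

    The band criterion is a hypothetical stand-in for "an area indicated for
    refinement"; it simply selects voxels close to the surface.
    """
    return [i for i, value in enumerate(tsdf_values) if abs(value) < band]


# Voxels sampled along one camera ray; the surface is observed at a depth of 1.0 m.
voxel_depths = [0.80, 0.97, 1.00, 1.02, 1.30]
tsdf = [truncated_sdf(d, surface_depth=1.0) for d in voxel_depths]
area = refinement_area(tsdf)
```

In this sketch the selected area covers fewer voxels than the full set of values, which is the relationship recited in the claim limitation mapped to Wang above.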
Regarding claim 2, Xiong in view of Fleishman and Wang disclose everything claimed as applied above (see claim 1), in addition, Xiong in view of Fleishman and Wang disclose wherein the first 3DR algorithm comprises at least one of a computer vision based 3DR algorithm or a machine learning based 3DR algorithm (Paragraph 24 of Fleishman teaches that a number of different RGBD-SLAM algorithms may be used to perform the 3D geometric construction. A dense RGBD-SLAM algorithm may use a 3D reconstruction algorithm that builds a 3D model incrementally, referring to adding increments, or 3D sections, to the 3D geometric model 110 one at a time, and may be provided by different frame each time and paragraph 74 of Fleishman teaches that the 3D semantic model consists of a truncated signed distance function for each voxel from RGBD-SLAM algorithm and a semantic data in the form of the top-X semantic classes.), and wherein the second 3DR algorithm comprises at least one of a machine learning based 3DR algorithm or a computer vision based 3DR algorithm (Paragraph 50 of Xiong teaches that according to various embodiments, the architecture 300 splits up the operations of generating an XR display such that the often computationally-expensive tasks associated with generating a scene reconstruction and scene comprehension are performed by the back end 395 and obtaining initial pose data and rendering virtual objects of a frame of an XR display are performed by the front end 397. This bifurcation between the back end 395 and the front end 397 facilitates the reapportionment of the processing tasks associated with generating an XR display. 
As a result, the computationally-intensive tasks performed by the back end 395 may, in certain embodiments, be performed using multicore processor architectures (such as one where one set of processing cores is designed for energy efficiency and a second set of processing cores is designed for performance) or chips (such as neural processing units (NPUs) specifically intended for implementing neural networks and machine learning algorithms)).
Regarding claim 3, Xiong in view of Fleishman and Wang disclose everything claimed as applied above (see claim 2), in addition, Xiong in view of Fleishman and Wang disclose wherein, to preprocess the set of surface representation values, the at least one processor is configured to identify areas of the set of surface representation values that may be refined (Paragraph 59 of Fleishman teaches now to the explanation of process 500, this method may include “obtain image data of current frame of video sequence” 502, and as mentioned above, may include obtaining raw RGB data, pre-processing the image data sufficient for geometric and semantic segmentation as well as other applications. Additionally, paragraph 60 of Xiong teaches that as shown in the explanatory example of FIG. 3, at an operation 333, the three-dimensional scene shape generating module 327 performs semantic segmentation, identifying objects and boundaries of object regions from the data obtained from the image sensor(s) 305 and the depth sensor(s) 303. According to some embodiments, the operation 333 may be performed by applying one or more neural network-based object recognition algorithms executed by a processor.).
Regarding claim 4, Xiong in view of Fleishman and Wang disclose everything claimed as applied above (see claim 3), in addition, Xiong in view of Fleishman and Wang disclose wherein the first 3DR algorithm generates weight values corresponding to areas of the set of surface representation values, and wherein identifying areas of the set of surface representation values that may be refined is based on the weight values (Paragraph 70 of Xiong teaches that as shown in FIG. 4, once Gaussian distribution weights for pose differentials, color differentials, and spatial differentials in the neighborhood of a point p are computed at the operation 421, a depth reconstruction filter F can propagate depth points in the neighborhood N of point p at the operation 423 to obtain a reconstruct dense depth map data 425.).
Regarding claim 5, Xiong in view of Fleishman and Wang disclose everything claimed as applied above (see claim 3), in addition, Xiong in view of Fleishman and Wang disclose wherein identifying areas of the set of surface representation values that may be refined is based on at least one of a segmentation map or object detection information (Paragraph 18 of Xiong teaches that FIGS. 5A-5C illustrate visual aspects of object detection, semantic segmentation, and instance segmentation according to some embodiments of this disclosure and paragraph 85 of Xiong teaches that FIGS. 5A-5C illustrate visual aspects of object detection, semantic segmentation, and instance segmentation according to some embodiments of this disclosure. Referring to the illustrative example of FIG. 5A, a frame of image data 500 of a scene is shown. The scene includes a person and six sculptures having human forms. In the example of FIG. 5A, object detection and extraction of regions of interest (ROIs) according to some embodiments of this disclosure have been performed. As shown in FIG. 5A, ROIs corresponding to objects for which a neural network trained for object recognition has been detected have been defined.).
Regarding claim 14, the method steps correspond to and are rejected similarly to the apparatus steps of claim 1 (see claim 1 above).
Regarding claim 15, the method steps correspond to and are rejected similarly to the apparatus steps of claim 2 (see claim 2 above).
Regarding claim 16, the method steps correspond to and are rejected similarly to the apparatus steps of claim 3 (see claim 3 above).
Regarding claim 17, the method steps correspond to and are rejected similarly to the apparatus steps of claim 4 (see claim 4 above).
Regarding claim 18, the method steps correspond to and are rejected similarly to the apparatus steps of claim 5 (see claim 5 above).
Claims 9, 10, 12, 13, 22, 23, 25 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Fleishman and further in view of Sato et al. (US 2010/0303344 A1), hereinafter Sato.
Regarding claim 9, Xiong discloses an apparatus for image processing (Paragraph 22 teaches that FIG. 1 illustrates a non-limiting example of a device 100 for performing three-dimensional scene reconstruction and understanding in extended reality (XR) applications according to some embodiments of this disclosure.), comprising:
at least one memory (Paragraph 23 teaches that the device 100 further includes a speaker 130, a main processor 140, an input/output (I/O) interface (IF) 145, I/O device(s) 150, and a memory 160. The memory 160 includes an operating system (OS) program 161 and one or more applications 162.);
and at least one processor coupled to the at least one memory (Paragraph 27 teaches that the main processor 140 can include one or more processors or other processing devices and execute the OS program 161 stored in the memory 160 in order to control the overall operation of the device 100.) and configured to:
generate a first set of surface representation values for one or more surfaces visible in one or more first images using a first three dimensional scene reconstruction (3DR) algorithm (Paragraph 80 teaches that referring to the illustrative example of FIG. 4, at an operation 445, values of a TSDF are computed based on the surface normal and dense depth map to generate one or more voxel grids for the one or more areas representing one or more planes detected and tracked at the operations 439 and 441. Additionally, paragraph 81 teaches that at an operation 447, the processing platform performs volume reconstruction based on the one or more voxel grids obtained through computation of the TSDF for one or more regions of the real-world operating environment to obtain a three-dimensional scene reconstruction 449. ... According to various embodiments, the three-dimensional mesh may be generated incrementally from the boundaries of planes or objects, although other meshing algorithms may be used.). However, Xiong fails to disclose generate a second set of surface representation values for surfaces visible in one or more second images using a second 3DR algorithm, wherein the second 3DR algorithm differs from the first 3DR algorithm.
Fleishman discloses generate a second set of surface representation values for surfaces visible in one or more second images using a second 3DR algorithm, wherein the second 3DR algorithm differs from the first 3DR algorithm (Paragraph 61 teaches that process 500 may include “generate/update 3D geometric model” 506, and also as mentioned above, the 3D geometric model, which may be operated by “performing RGB-SLAM” 508, or other methods, may initially be formed from one or more depth maps. Additionally, paragraph 74 teaches that process 500 may include “register semantic class outputs to update 3D semantic model” 532, and this is performed by placing the semantic labels from the output of the last neural network layer and onto the matching segments or voxels of the 3D semantic segmentation model. In order to conserve memory, the top-X class candidate semantic labels may be stored at each voxel. The 3D semantic model consists of a truncated signed distance function for each voxel from RGBD-SLAM algorithm and a semantic data in the form of the top-X semantic classes.). Since Xiong teaches generating surface representation values (e.g., TSDF values) using a first type of algorithm and Fleishman teaches utilizing generated TSDF values for a different type of algorithm, it would have been obvious to a person having ordinary skill in the art to combine the teachings together so that different types of algorithms could be utilized for processing and generating surface representation values.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xiong to incorporate the teachings of Fleishman, so that when generating surface representation values (e.g., TSDF values), multiple different algorithms could be used, which would help improve the quality and accuracy of 3D reconstructions.
In addition, Xiong in view of Fleishman disclose combine, using machine learning model, the first set of surface representation values and second set of surface representation values into a refined set of surface representation values (Paragraph 50 of Fleishman teaches that process 400 may include “generate a current and historical semantically segmented frame comprising using both the current semantic features and the historically-influenced semantic features as input to a neural network that indicates semantic labels for areas of the current historical semantically segmented frame” 410. This may occur in a number of different ways as long as both the current semantic features and the historically-influenced semantic features are input for analysis together such as input to a neural network, such as a CNN. Thus, the current semantic features and the historically-influenced semantic features may be combined, or by one example concatenated, before being input to the neural network together.). However, Xiong in view of Fleishman fails to disclose based on counters received from the first 3DR algorithm, wherein the counters indicate how many times a block of the first set of surface representation values was updated.
Sato discloses based on counters received from the first 3DR algorithm, wherein the counters indicate how many times a block of the first set of surface representation values was updated (Paragraph 303 teaches that the simplex method is based on the expectation that if one of the vertices of the simplex that has the greatest function value is selected, the function value of its mirror image will decrease. If this expectation is right, the minimum value of the function can be obtained by repeating the same process a number of times. That is to say, the parameter given by the initial value is repeatedly updated by three kinds of operations until the error from the target, represented by an estimate function, becomes less than a threshold value. Additionally, paragraph 306 teaches that in Step S411, counters n and k that memorize the number of times of update of iterative operations are reset to zero. In this case, the counter n counts the number of times the initial value has been updated. On the other hand, the counter k counts the number of times a candidate parameter according to the simplex method has been updated with respect to an initial value.). Since Xiong in view of Fleishman teach generating sets of surface representation values (e.g., TSDF values) using different reconstruction algorithms for use in a machine learning model and Sato teaches a functionality that can be used within an image reconstruction algorithm that can determine whether a particular value has been updated or not and then keep track and count how many times that value has been updated, it would have been obvious to a person having ordinary skill in the art to combine the functions together so that the surface representation values being determined from the different reconstruction algorithms could implement a counting function to then keep track of how many times a particular block of data related to the surface representation values has been updated.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xiong in view of Fleishman to incorporate the teachings of Sato, so that the number of times that a block of surface representation values was updated could then be tracked and counted, which would help improve the overall surface accuracy and detail during the reconstruction/machine learning process by being able to identify frequently updated surface areas, which would indicate that those areas need more detail refinements compared to areas that show fewer updates being performed on them.
Furthermore, Xiong in view of Fleishman and Sato disclose and output the refined set of surface representation values (Paragraph 66 of Xiong teaches that referring to the illustrative example of FIG. 4, a depth reconstruction stage 401 outputs a reconstructed high-resolution depth map. The reconstructed high-resolution depth map can be generated using various inputs.).
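For illustration only, the counter limitation discussed above (counters indicating how many times a block of surface representation values was updated) can be sketched as a per-block update tally. This is a minimal sketch under stated assumptions: the class name `BlockUpdateCounters`, its methods, the block size, and the `min_updates` threshold are hypothetical and are not taken from Xiong, Fleishman, Sato, or the application.

```python
from collections import Counter


class BlockUpdateCounters:
    """Counts how many times each block of voxels has been updated."""

    def __init__(self, block_size=8):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.counts = Counter()

    def block_of(self, voxel):
        """Map integer voxel coordinates (x, y, z) to the index of their block."""
        x, y, z = voxel
        b = self.block_size
        return (x // b, y // b, z // b)

    def record_update(self, voxel):
        """Increment the counter of the block that contains `voxel`."""
        self.counts[self.block_of(voxel)] += 1

    def updates(self, block):
        """Number of recorded updates for `block` (0 if never updated)."""
        return self.counts[block]

    def frequently_updated(self, min_updates):
        """Sorted list of blocks updated at least `min_updates` times."""
        return sorted(b for b, n in self.counts.items() if n >= min_updates)


counters = BlockUpdateCounters(block_size=8)
for voxel in [(0, 0, 0), (1, 2, 3), (7, 7, 7), (8, 0, 0)]:
    counters.record_update(voxel)
busy = counters.frequently_updated(min_updates=2)
```

Here three of the four updated voxels fall in block (0, 0, 0), so only that block meets the assumed threshold of two updates.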
Regarding claim 10, Xiong in view of Fleishman and Sato disclose everything claimed as applied above (see claim 9), in addition, Xiong in view of Fleishman and Sato disclose wherein the first 3DR algorithm and second 3DR algorithm comprise at least one of a computer vision based 3DR algorithm (Paragraph 24 of Fleishman teaches that a number of different RGBD-SLAM algorithms may be used to perform the 3D geometric construction. A dense RGBD-SLAM algorithm may use a 3D reconstruction algorithm that builds a 3D model incrementally, referring to adding increments, or 3D sections, to the 3D geometric model 110 one at a time, and may be provided by different frame each time and paragraph 74 of Fleishman teaches that the 3D semantic model consists of a truncated signed distance function for each voxel from RGBD-SLAM algorithm and a semantic data in the form of the top-X semantic classes.) and a machine learning based 3DR algorithm (Paragraph 50 of Xiong teaches that according to various embodiments, the architecture 300 splits up the operations of generating an XR display such that the often computationally-expensive tasks associated with generating a scene reconstruction and scene comprehension are performed by the back end 395 and obtaining initial pose data and rendering virtual objects of a frame of an XR display are performed by the front end 397. This bifurcation between the back end 395 and the front end 397 facilitates the reapportionment of the processing tasks associated with generating an XR display. As a result, the computationally-intensive tasks performed by the back end 395 may, in certain embodiments, be performed using multicore processor architectures (such as one where one set of processing cores is designed for energy efficiency and a second set of processing cores is designed for performance) or chips (such as neural processing units (NPUs) specifically intended for implementing neural networks and machine learning algorithms)).
Regarding claim 12, Xiong in view of Fleishman and Sato disclose everything claimed as applied above (see claim 9), in addition, Xiong in view of Fleishman and Sato disclose wherein the combining is based on at least one of depth information, segmentation information, or object detection information received by the machine learning model (Paragraph 18 of Xiong teaches that FIGS. 5A-5C illustrate visual aspects of object detection, semantic segmentation, and instance segmentation according to some embodiments of this disclosure and Paragraph 86 of Xiong teaches that FIG. 5B provides an illustrative visualization of a semantic segmentation 505 of frame of image data 500 from FIG. 5A. Referring to the explanatory example of FIG. 5B, each the constituent pixels of the image data 500 has been classified (such as by using a machine learning tool trained for semantic segmentation, like DeepLab)).
Regarding claim 13, Xiong in view of Fleishman and Sato disclose everything claimed as applied above (see claim 9), in addition, Xiong in view of Fleishman and Sato disclose wherein the set of surface representation values comprises a truncated sign distance function (TSDF) values (Paragraph 81 of Xiong teaches that at an operation 447, the processing platform performs volume reconstruction based on the one or more voxel grids obtained through computation of the TSDF for one or more regions of the real-world operating environment to obtain a three-dimensional scene reconstruction 449. Specifically, at the operation 447, image data is used to transform the voxel grid (which in many embodiments includes a grid or matrix of square or hexagonal voxels that may or may not accurately represent object boundaries or edges of objects in the real-world operating environment) to a three-dimensional mesh of depth points connected by lines to define planar regions (typically triangles) within part or all of the real-world operating environment described by the one or more voxel grids.).
Regarding claim 22, the method steps correspond to and are rejected similarly to the apparatus steps of claim 9 (see claim 9 above).
Regarding claim 23, the method steps correspond to and are rejected similarly to the apparatus steps of claim 10 (see claim 10 above).
Regarding claim 25, the method steps correspond to and are rejected similarly to the apparatus steps of claim 12 (see claim 12 above).
Regarding claim 26, the method steps correspond to and are rejected similarly to the apparatus steps of claim 13 (see claim 13 above).
Claims 6-8 and 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Fleishman and Wang, as applied to claims 2 and 15 above, and further in view of Paluri (Pub. No.: US 2019/0197667 A1).
Regarding claim 6, Xiong in view of Fleishman and Wang disclose everything claimed as applied above (see claim 2), however, Xiong in view of Fleishman and Wang fail to disclose wherein, to preprocess the set of surface representation values, the at least one processor is configured to upsample the set of surface representation values generated by the machine learning based 3DR algorithm.
Paluri discloses wherein, to preprocess the set of surface representation values, the at least one processor is configured to upsample the set of surface representation values generated by the machine learning based 3DR algorithm (Paragraph 44 teaches that with these inputs, the machine learning model generates and outputs upsampled depth information. In one embodiment, the upsampled depth information is a feature vector corresponding to an upsampled depth image, hereafter referred to as the upsampled depth feature vector. In another embodiment, the outputted upsampled depth information is an upsampled depth image. The upsampled depth image can be predicted from the upsampled depth feature vector.). Since Xiong in view of Fleishman teach the initial steps for performing pre-processing of the surface representation values and Paluri teaches the function of generating and up-sampling data by utilizing a machine learning model, it would have been obvious to a person having ordinary skill in the art to combine the teachings together so that a machine learning model could be implemented into the pre-processing process to incorporate the function of being able to then up-sample any generated data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xiong in view of Fleishman to incorporate the teachings of Paluri so that any pre-processed data and/or values could be up-sampled using a machine learning model, which would help improve the model's overall performance and accuracy.
Regarding claim 7, Xiong in view of Fleishman, Wang and Paluri disclose everything claimed as applied above (see claim 6), in addition, Xiong in view of Fleishman, Wang and Paluri disclose wherein, to generate the refined set of surface representation values, the at least one processor is configured to generate refined surface representation values based on the upsampled set of surface representation values (Paragraph 53 of Paluri teaches that in various embodiments, the model training module 145 backpropagates the error such that the parameters of the machine learning model are tuned to minimize the error. As one example, the parameters of the machine learning model are tuned to improve the generation of an upsampled depth feature vector. As another example, the parameters of the machine learning model can be tuned to better predict an upsampled depth image from an upsampled depth feature vector. Specifically, the assigned weight to each feature in the upsampled depth feature vector can be tuned to minimize the error. In one embodiment, the machine learning model is a fully convoluted neural network and therefore, the model training module 145 minimizes the error by tuning an N×N learned patch. Therefore, the machine learning model can identify and extract features from the color images and low resolution depth image using the N×N patch. The extracted features can be used by the machine learning model to generate an output with reduced error.).
Regarding claim 8, Xiong in view of Fleishman, Wang and Paluri disclose everything claimed as applied above (see claim 6), in addition, Xiong in view of Fleishman, Wang and Paluri disclose wherein the set of surface representation values comprises a truncated sign distance function (TSDF) values for voxels, wherein, to upsample the set of surface representation values, the at least one processor is configured to upsample a TSDF value for a voxel to multiple TSDF values for a block of voxels, wherein the at least one processor is configured to receive depth information associated with the block of voxels, and wherein, to generate the refined set of surface representation values, the at least one processor is configured to update the multiple TSDF values for the block of voxels based on the received depth information (Paragraph 59 of Xiong teaches that in some embodiments, the operation 335 may be performed using a truncated signed distance field (TSDF), which utilizes multiple views of an object or operating environment and determines distances between each object and its distance to the nearest surface and can be less sensitive to artifacts or irregularities in a depth map. Additionally, paragraph 81 of Xiong teaches that at an operation 447, the processing platform performs volume reconstruction based on the one or more voxel grids obtained through computation of the TSDF for one or more regions of the real-world operating environment to obtain a three-dimensional scene reconstruction 449. 
Specifically, at the operation 447, image data is used to transform the voxel grid (which in many embodiments includes a grid or matrix of square or hexagonal voxels that may or may not accurately represent object boundaries or edges of objects in the real-world operating environment) to a three-dimensional mesh of depth points connected by lines to define planar regions (typically triangles) within part or all of the real-world operating environment described by the one or more voxel grids. Also, paragraph 92 of Fleishman teaches that a semantic segmentation update unit 952 is provided to update a 3D semantic model with the semantic output of the segmentation output unit 952.).
Regarding claim 19, the method steps correspond to and are rejected similarly to the apparatus steps of claim 6 (see claim 6 above).
Regarding claim 20, the method steps correspond to and are rejected similarly to the apparatus steps of claim 7 (see claim 7 above).
Regarding claim 21, the method steps correspond to and are rejected similarly to the apparatus steps of claim 8 (see claim 8 above).
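For illustration only, the operation recited in claims 8 and 21 (upsampling one TSDF value to multiple TSDF values for a block of voxels, then updating those values based on received depth information) can be sketched as follows. This sketch is not taken from Xiong, Fleishman, Wang or Paluri. The function names, the replication-based upsampling and the weighted running-average update are assumptions chosen for simplicity.

```python
# Hypothetical sketch; names and the update rule are assumptions, not the
# method of any cited reference.

def upsample_voxel(tsdf_value, factor=2):
    """Replicate one coarse TSDF value into a block of factor**3 fine voxels."""
    return [tsdf_value] * (factor ** 3)


def refine_block(block, weights, measured, trunc=1.0):
    """Update each fine voxel with a depth-derived signed distance.

    measured[i] is the signed distance observed for fine voxel i, or None
    when no depth sample covers that voxel. Observations are clamped to
    [-trunc, trunc] and fused with a weighted running average.
    """
    new_block, new_weights = [], []
    for value, weight, dist in zip(block, weights, measured):
        if dist is None:
            # No depth information: keep the upsampled value unchanged.
            new_block.append(value)
            new_weights.append(weight)
            continue
        dist = max(-trunc, min(trunc, dist))
        new_block.append((weight * value + dist) / (weight + 1))
        new_weights.append(weight + 1)
    return new_block, new_weights


block = upsample_voxel(0.5, factor=2)        # 8 fine voxels, all 0.5
weights = [1] * len(block)
measured = [0.1] * 4 + [None] * 4            # depth covers half the block
refined, refined_weights = refine_block(block, weights, measured)
```

In this sketch, only the fine voxels covered by a depth sample move toward the observed distance; the remaining voxels retain the value inherited from the coarse voxel.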
Claims 11 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Fleishman and Sato, as applied to claims 9 and 22 above, and further in view of Chandler et al. (Pub. No.: US 2022/0277515 A1), hereinafter Chandler.
Regarding claim 11, Xiong in view of Fleishman and Sato disclose everything claimed as applied above (see claim 9), however, Xiong in view of Fleishman and Sato fail to disclose wherein the combining is based on counters received from the first 3DR algorithm and weights received from the second 3DR algorithm.
Chandler discloses wherein the combining is based on counters received from the first 3DR algorithm and weights received from the second 3DR algorithm (Paragraphs 280-288 teach that an SDF representing an object is stored in a voxel tree format. There is one constituent data structure: the voxel. Each voxel contains the following data: a pointer to its parent voxel, a pointer to the first of its child voxels (all of which are stored contiguously, therefore only one pointer is needed), the distance from the voxel centre to the closest point on the object surface, denoted as α, a weight representing the confidence of this distance, a mean colour of the voxel and a variance of each channel of the RGB colour in each voxel, a count of the number of measurements which have been used to update voxel data, the coordinates of the voxel centre, the level of the voxel within the voxel tree.). Since Xiong in view of Fleishman and Sato teach the initial combining steps of surface representation values from different 3DR algorithms using a machine learning model and Chandler teaches different characteristics and features that can be utilized for surface representation values, such as weights and counters, it would have been obvious to a person having ordinary skill in the art to combine the teachings together so that the surface representation values used for the first 3DR algorithm would possess a counter feature and those used for the second 3DR algorithm would possess a weight feature, so that, when the two are combined, the combination could be based on a counter from the first 3DR algorithm and a weight from the second 3DR algorithm.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xiong in view of Fleishman and Sato to incorporate the teachings of Chandler so that the surface representation values could be combined based on the counters from one type of algorithm and the weights from a different type of algorithm, which would improve the overall accuracy and predictions of any machine learning model used.
Regarding claim 24, the method steps correspond to and are rejected similarly to the apparatus steps of claim 11 (see claim 11 above).
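For illustration only, the per-voxel data described in paragraphs 280-288 of Chandler, and one way a counter and a weight could drive the combining recited in claims 11 and 24, can be sketched as follows. The field names and the count/weight-proportional average are assumptions. Neither the claims nor Chandler is relied upon as specifying this particular formula.

```python
# Hypothetical sketch; field names and the fusion formula are assumptions.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Voxel:
    """Per-voxel record mirroring the fields listed in Chandler."""
    distance: float                           # distance to closest surface point
    weight: float                             # confidence of that distance
    count: int                                # measurements used to update the voxel
    centre: Tuple[float, float, float]
    level: int                                # depth of the voxel in the voxel tree
    mean_colour: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    colour_variance: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    parent: Optional["Voxel"] = None
    first_child: Optional["Voxel"] = None     # children assumed stored contiguously


def combine(value_a, count_a, value_b, weight_b):
    """Fuse two surface values: the first carries a counter, the second a weight.

    Each value contributes in proportion to its counter or weight.
    Returns None when neither source carries any support.
    """
    total = count_a + weight_b
    if total == 0:
        return None
    return (count_a * value_a + weight_b * value_b) / total


a = Voxel(distance=0.2, weight=0.0, count=3, centre=(0.0, 0.0, 0.0), level=0)
b = Voxel(distance=0.6, weight=1.0, count=0, centre=(0.0, 0.0, 0.0), level=0)
fused = combine(a.distance, a.count, b.distance, b.weight)
```

Here the value observed three times by the first source dominates the single unit of weight from the second, so the fused distance lies closer to 0.2 than to 0.6.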
Response to Arguments
Applicant’s arguments with respect to independent claims 1, 9, 14 and 22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The prior art of Wang has been incorporated into the rejection of independent claims 1 and 14 and therefore teaches the newly amended claim language (see claims 1 and 14 above). Additionally, the prior art of Sato has been incorporated into the rejection of independent claims 9 and 22 and therefore teaches the newly amended claim language (see claims 9 and 22 above).
The additional arguments regarding dependent claims 2-8, 10-13, 15-21 and 23-26 are moot by virtue of their dependency, because the independent claims are not allowable.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
McCormac et al. (U.S. Patent: #12,062,200 B2) teaches a method for applying an object recognition pipeline to frames of video data with the ability to count and keep track of voxel data and then update the data using updated TSDF values in 3D areas.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to George Renze whose telephone number is (703)756-5811. The examiner can normally be reached Monday-Friday 9:00am - 6:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/G.R./Examiner, Art Unit 2613
/XIAO M WU/Supervisory Patent Examiner, Art Unit 2613