Prosecution Insights
Last updated: April 19, 2026
Application No. 16/675,120

IMAGE ALIGNING NEURAL NETWORK

Non-Final OA (§103)
Filed: Nov 05, 2019
Examiner: TRAN, KIM THANH THI
Art Unit: 2615
Tech Center: 2600 — Communications
Assignee: Nvidia Corporation
OA Round: 8 (Non-Final)
Grant Probability: 77% (Favorable)
OA Rounds: 8-9
To Grant: 2y 10m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 77% (281 granted / 367 resolved); +14.6% vs TC avg (above average)
Interview Lift: +24.1% for resolved cases with an interview (strong)
Typical Timeline: 2y 10m avg prosecution; 12 applications currently pending
Career History: 379 total applications across all art units
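
The headline figures above are simple ratios and can be checked directly. Below is a minimal Python sketch of the arithmetic; the 281/367 counts and the +14.6% and +24.1% deltas come from this page, while the with/without-interview split is hypothetical, chosen only to be consistent with the reported lift.

```python
# Sanity-check of the examiner statistics shown above.
granted, resolved = 281, 367
allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")      # ~76.6%, displayed as 77%

tc_average = allow_rate - 0.146                    # page reports +14.6% vs TC avg
print(f"Implied Tech Center average: {tc_average:.1%}")

def interview_lift(with_rate: float, without_rate: float) -> float:
    """Percentage-point lift in allowance from conducting an interview."""
    return with_rate - without_rate

# Hypothetical split consistent with the reported +24.1% lift.
print(f"Interview lift: {interview_lift(0.92, 0.679):+.1%}")
```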

Statute-Specific Performance

§101: 9.7% (-30.3% vs TC avg)
§103: 65.1% (+25.1% vs TC avg)
§102: 15.3% (-24.7% vs TC avg)
§112: 3.3% (-36.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 367 resolved cases.
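
The "vs TC avg" deltas imply the underlying Tech Center averages. A short sketch recovering them; the interpretation of each percentage as this examiner's share of rejections under that statute is an assumption, and the arithmetic only restates the table.

```python
# Statute-specific rates shown above (percent), with deltas vs the TC average.
examiner = {"101": 9.7, "103": 65.1, "102": 15.3, "112": 3.3}
delta    = {"101": -30.3, "103": 25.1, "102": -24.7, "112": -36.7}

for statute in examiner:
    tc_avg = examiner[statute] - delta[statute]   # implied Tech Center average
    print(f"§{statute}: examiner {examiner[statute]:5.1f}%  "
          f"TC avg {tc_avg:5.1f}%  delta {delta[statute]:+6.1f}%")
```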

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 05, 2026 has been entered.

Response to Arguments

Applicant's arguments, see Remarks filed January 05, 2026, with respect to the rejections of claims 1 and 3-52 under 35 U.S.C. 103 have been fully considered but are not persuasive. Claims 1, 9, 15, 21, 27, 33, 39 and 45-48 have been amended. Claims 1 and 3-52 remain pending.

Regarding claim 1:

Applicant's argument: Applicant argues on page 11 that the cited art fails to teach or suggest "align one or more point clouds using a statistical distribution representing a density of a plurality of points in the one or more point clouds generated by one or more neural networks from a plurality of images representing different views of an object; and generate a three-dimensional (3D) model of the object using the one or more point clouds that are aligned using the statistical distribution," as recited in claim 1.

Examiner's response: The Examiner respectfully disagrees, because Manivasagam discloses aligning multiple point clouds using a statistical distribution, based on neural networks, showing 3D point clouds generated from LiDAR sweeps that are aligned to produce one set of 3D point cloud data (see FIG. 3A and pars. [0091]-[0095]).

Claims 9, 15, 21, 27, 33 and 39 are argued on the same basis as claim 1; those arguments are therefore not persuasive. Claims 3-8, 45-52, 10-14, 16-20, 22-26, 28-32, 34-38 and 40-44 depend from claims 1, 9, 15, 21, 27, 33 or 39, respectively. For the same reasons stated above, and as detailed in the Office Action below, the argument is not persuasive.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-7, 9-10, 15-16, 21-22, 27-28, 33-34, 37, 39-40, 46 and 49-52 are rejected under 35 U.S.C. 103 as being unpatentable over Manivasagam et al. (US 20200301799 A1) in view of Di Febbo et al. (US 2018/0268256 A1, hereinafter Di Febbo).

Regarding claim 1: Manivasagam discloses one or more processors, comprising circuitry to: align one or more point clouds using a statistical distribution representing a density of a plurality of points in the one or more point clouds generated by one or more neural networks from a plurality of images representing different views of an object (Manivasagam, see FIG. 3A and pars. [0091]-[0095]: "The multiple LiDAR sweeps 204 can then be associated to a common coordinate system (e.g., referred to as map-relative frame) using, for example, offline Graph-SLAM with multi-sensory fusion (e.g., leveraging wheel-odometry, IMU, LiDAR and GPS). This provides centimeter level dense alignments of multiple LiDAR sweeps (e.g., shown as aligned frames 206). Without effective segmentation, the resulting maps will contain multiple instances of the same moving object. [0092] Next, the aggregated LiDAR point cloud 206 from multiple drives can be converted into a surfel-based 3D mesh 208 of the scene (e.g., through voxel-based downsampling and normal estimation). In particular, in one example, all the points are bucketed into voxels (e.g., of size 4×4×4 cm³) and each occupied voxel returns exactly one point by averaging all the points inside it. [0093] For each point, normal estimation can be conducted through principal components analysis over neighboring points. The surfel-based representation 208 can be used due to its simple construction, effective occlusion reasoning, and efficient collision checking. To be precise, in some implementations, each surfel can be generated from a single point. [0094] Statistical outlier removal can be conducted to clean the road LiDAR mesh due to spurious points from incomplete dynamic object removal. For example, a point will be trimmed if its distance to its nearest neighbors is outside the global distance mean plus a standard deviation. [0095] Since a majority of road points lie on the same xy-plane, a warped cartesian distance weighted heavily on the Z-dimension can be used to compute the nearest neighbors. A disk surfel can then be generated with the disk center to be the input point and disk orientation to be its normal direction.").

Manivasagam further describes generating a model of an object from the aligned point clouds (see pars. [0091]-[0094] and [0138]: "In some implementations, the mesh (or other representations of virtual objects) can also be generated from real-world LiDAR data. In one example, a process for generating a model of an object can include obtaining one or more sets of real-world LiDAR data physically collected by one or more LiDAR systems in a real-world environment. The one or more sets of real-world LiDAR data can respectively include one or more three-dimensional point clouds. The process can include defining a three-dimensional bounding box for an object included in the real-world environment; identifying points from the one or more three-dimensional point clouds that are included within the three-dimensional bounding box to generate a set of accumulated points."). To the extent Manivasagam does not explicitly disclose generating a three-dimensional (3D) model of the object using the one or more point clouds that are aligned using the statistical distribution, Di Febbo discloses this limitation (Di Febbo, see at least pars. [0024] and [0197], generating a 3-D model of an object in the scene).
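
The Manivasagam passage quoted above describes two concrete steps: voxel-based downsampling (bucket points into 4×4×4 cm voxels and return the average of each occupied voxel) and statistical outlier removal (trim a point whose nearest-neighbor distance falls outside the global mean plus a standard deviation). The NumPy sketch below is written from that quoted description alone; function names and parameters are illustrative, not taken from either reference's actual code.

```python
import numpy as np
from scipy.spatial import cKDTree

def voxel_downsample(points: np.ndarray, voxel: float = 0.04) -> np.ndarray:
    """Bucket points into voxels and return one averaged point per occupied
    voxel (per the quoted par. [0092]; 0.04 m corresponds to 4x4x4 cm)."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    sums = np.zeros((inverse.max() + 1, 3))
    np.add.at(sums, inverse, points)            # accumulate per-voxel sums
    counts = np.bincount(inverse).astype(float)
    return sums / counts[:, None]               # per-voxel averages

def remove_statistical_outliers(points: np.ndarray, k: int = 8) -> np.ndarray:
    """Trim points whose mean distance to their k nearest neighbors exceeds
    the global mean plus one standard deviation (per the quoted par. [0094])."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)      # first neighbor is the point itself
    mean_d = dists[:, 1:].mean(axis=1)
    keep = mean_d <= mean_d.mean() + mean_d.std()
    return points[keep]

cloud = np.random.rand(10_000, 3)               # stand-in for aligned LiDAR sweeps
cloud = remove_statistical_outliers(voxel_downsample(cloud))
```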
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam to generate a three-dimensional (3D) model of the object using the one or more point clouds that are aligned using the statistical distribution, as taught by Di Febbo. The modification provides an improved system and method for performing computer vision tasks using artificial intelligence and minimizing a distance between corresponding points in respective point clouds, thereby reducing power consumption and processing time in comparison to comparative keypoint detectors implemented using standard image processing techniques, and enabling real-time operation at a high frame rate (e.g., 60 frames per second) with a power consumption level appropriate for a mobile or handheld device and/or battery-powered operation (Di Febbo, see par. [0008]).

As to claim 3: Manivasagam in view of Di Febbo discloses claim 1 (as rejected above), and further discloses wherein an image, of the plurality of images, comprises three-dimensional data indicative of three-dimensional locations on a surface of the object (Di Febbo, see at least par. [0183]: "Like any regular camera, an RGB-D camera records color images of the scene. In addition, an RGB-D camera computes and records the distance of the closest surface element along the line of sight through each pixel. Proper calibration of the RGB-D camera allows the assignment of a three-dimensional coordinate (X, Y, Z) to each pixel (for example, by identifying matching features in the images along epipolar lines and computing a disparity map), precisely characterizing the location in 3-D of the surface element seen by each pixel.").

Regarding claim 4: Manivasagam in view of Di Febbo discloses claim 1 (as rejected above), and further discloses wherein the 3D model comprises a Gaussian mixture model (Di Febbo, see at least par. [0124]: "As shown in FIG. 6B, the resulting response function r̂ imposes a two-dimensional Gaussian over each keypoint (610, 612, and 614 of FIG. 6A), where the intensity of the color indicates the magnitude of the response function r̂ at a given point. The two-dimensional Gaussian simulates the smoothness of a typical response function of a CNN, where the constants A and σ model the amplitude and the smoothness of the Gaussian, which has the same shape for every keypoint."), wherein parameters for the Gaussian mixture model are generated based at least in part on alignment of the plurality of images of the object, the alignment based at least in part on a registration transform generated from the Gaussian mixture model (Di Febbo, see par. [0197]: "Some embodiments of the present invention are directed to using a CNN keypoint detector in accordance with embodiments of the present invention to accelerate the process of merging point clouds. FIG. 13 is a flowchart of a method for generating a three-dimensional (3-D) model of an object by aligning captured point clouds according to one embodiment of the present invention. Referring to FIG. 13, in operation 1302, a next frame (e.g., 3-D point cloud) of a scene is captured by a range camera and supplied to the CNN keypoint detector 308 (which may be configured with learned parameters computed by the training system 320). The CNN keypoint detector 308 computes keypoints 1304, and the keypoints are supplied to a computer vision system (e.g., a computer system including a processor and memory) configured to perform the ICP computer vision task 1306. In particular, the computer vision system may be used to attempt to establish correspondences between the keypoints detected in the image with keypoints detected in previous image frames, and the point clouds can be merged by performing a rigid transformation on at least one of the point clouds to align the identified corresponding keypoints. The result of the ICP is the stitched 3-D point cloud 1308 (e.g., the combination of the newly captured frame with one or more previously captured frames). If scanning is to continue, another frame is captured by the range camera in operation 1302. If scanning is complete, then the merged point clouds can be rendered 1310 to generate a 3-D model 1312, which can be displayed (e.g., on a display device).").

As to claim 5: Manivasagam in view of Di Febbo discloses claim 1 (as rejected above), and further discloses wherein the registration transform is generated to be in a closed form enabling back-propagation of a registration error (Di Febbo, see at least par. [0132]: "In operation 540, the training system 320 trains a convolutional neural network using the first training set, where the sampled labels from the response space correspond to the target output of the neural network and the patches correspond to the input that generates the target output. A metric is defined to compare the output of the system against the output of the desired keypoint detector, producing a distance value. The parameters can then be tuned to minimize the sum of estimated error distances (magnitude of training keypoint vectors vs. estimated keypoint vectors generated by the neural network) over all training images. Standard minimization algorithms (e.g., stochastic gradient descent and backpropagation) can be used for this purpose.").

As to claim 6: Manivasagam in view of Di Febbo discloses claim 1 (as rejected above), and further discloses wherein the registration transform maps points in the plurality of images to a common coordinate system (Manivasagam, see at least par. [0135]: "The computing system can associate the plurality of sets of real-world LiDAR data to a common coordinate system to generate an aggregate LiDAR point cloud. For example, each set of LiDAR data can be transitioned from respective vehicle coordinate system to a common coordinate system based on a respective pose (e.g., location and orientation) of the vehicle at the time of data collection.").

Regarding claim 7: Manivasagam in view of Di Febbo discloses claim 1 (as rejected above), and further discloses wherein the one or more neural networks encode a geometry of the object (Di Febbo, see at least par. [0189]: "One by-product of computing SfM is the estimate of a 'sparse' description of the scene three-dimensional geometry (in correspondence to tracked features). This estimate could be used also to initialize or complement other three-dimensional geometry estimation techniques, such as in dense stereoscopy.").

Claims 9, 15, 21, 27, 33 and 39 perform the same steps as claim 1. Therefore, claims 9, 15, 21, 27, 33 and 39 are further rejected based on the same rationale as claim 1 set forth above and incorporated herein.
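
Claim 5 recites a registration transform "generated to be in a closed form enabling back-propagation of a registration error." A standard example of such a closed form is the Kabsch/SVD solution for the rigid transform between corresponding points; because every step is differentiable, autograd frameworks can carry a registration error back to whatever network produced the points. The PyTorch sketch below illustrates that general technique only; it is not the applicant's or any cited reference's implementation.

```python
import torch

def kabsch(src: torch.Tensor, dst: torch.Tensor):
    """Closed-form (SVD-based) rigid transform (R, t) aligning src to dst,
    given Nx3 corresponding points. All ops are differentiable."""
    src_c = src - src.mean(dim=0)
    dst_c = dst - dst.mean(dim=0)
    U, _, Vh = torch.linalg.svd(src_c.T @ dst_c)
    # Flip the last axis if the optimal orthogonal matrix is a reflection.
    sign = torch.sign(torch.linalg.det(Vh.T @ U.T))
    D = torch.diag(torch.cat([torch.ones(2, dtype=src.dtype), sign.reshape(1)]))
    R = Vh.T @ D @ U.T
    t = dst.mean(dim=0) - R @ src.mean(dim=0)
    return R, t

# Toy check: gradients of a registration error reach the source points,
# which stand in here for the output of a point-cloud-producing network.
pred = torch.randn(64, 3, requires_grad=True)
target = torch.randn(64, 3)
R, t = kabsch(pred, target)
loss = ((pred @ R.T + t - target) ** 2).mean()   # registration error
loss.backward()                                   # gradients flow into `pred`
print(loss.item(), pred.grad.norm().item())
```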
As to claim 37: Manivasagam in view of Di Febbo discloses claim 33 (as rejected above), and further discloses wherein the one or more neural networks are trained based at least in part on back-propagation of a registration error (Di Febbo, see at least par. [0132], quoted above for claim 5).

Regarding claim 46: Manivasagam in view of Di Febbo discloses the one or more processors of claim 1 (as rejected above), and further discloses wherein the circuitry is to use the three-dimensional model to align images of the object (Di Febbo, see at least par. [0197], quoted above for claim 4).

Regarding claim 49: Manivasagam in view of Di Febbo discloses the one or more processors of claim 1 (as rejected above), and further discloses wherein the circuitry is to further use the 3D model to visualize the object (Manivasagam, see at least par. [0099]: "As one example, FIG. 3B shows one example visualization of the building of a model of an object from LiDAR data.").

Regarding claim 50: Manivasagam in view of Di Febbo discloses the one or more processors of claim 1 (as rejected above), and further discloses wherein the circuitry is to further use the 3D model to visualize the object based, at least in part, on one or more points that are excluded from one or more visible points associated with the object (Manivasagam, see at least pars. [0099]-[0100]: "As one example, FIG. 3B shows one example visualization of the building of a model of an object from LiDAR data. Specifically, from left to right, FIG. 3B shows an individual sweep; an accumulated point cloud; symmetry completion and trimming; and outlier removal and surfel meshing. These steps can be performed as follows. [0100] A large-scale collection of dynamic objects can be built using real-world LiDAR data (e.g., data collected from a self-driving fleet). It is difficult to build full 3D mesh representations from sparse LiDAR scans due to the motion of objects and the partial observations captured by the LiDAR due to occlusion. Naively accumulating point clouds will produce a trajectory of point clouds for each dynamic object. Automatic algorithms such as ICP or LiDAR flow do not work well enough to produce the quality necessary for simulation.").

Regarding claim 51: Manivasagam in view of Di Febbo discloses the one or more processors of claim 1 (as rejected above), and further discloses wherein the circuitry is further to use the 3D model to visualize the object in an environment represented by the plurality of images (Manivasagam, see at least par. [0087]: "Referring to FIG. 2, a scenario can be generated which includes a virtual object (e.g., an autonomous vehicle featuring a LiDAR data collection system) included in an environment optionally along with one or more additional dynamic objects. In particular, the environment can be described by a three-dimensional map (e.g., generated according to process shown in FIG. 3A). A trajectory of the virtual object through the environment can be described by a six degree of freedom (DOF) pose (e.g., as contained within a generated scenario). The one or more additional (potentially dynamic) objects can be selected from an object bank (e.g., which can be generated as described with reference to FIGS. 3B and 3C).").

Regarding claim 52: Manivasagam in view of Di Febbo discloses the one or more processors of claim 1 (as rejected above), and further discloses wherein the circuitry is further to use the 3D model to visualize the object along a path in an environment represented by the plurality of images (Manivasagam, see at least par. [0175]: "The prediction system 860 can be configured to predict a motion of the object(s) within the surrounding environment of the vehicle 805. For instance, the prediction system 860 can generate prediction data 875 associated with such object(s). The prediction data 875 can be indicative of one or more predicted future locations of each respective object. For example, the prediction system 860 can determine a predicted motion trajectory along which a respective object is predicted to travel over time. A predicted motion trajectory can be indicative of a path that the object is predicted to traverse and an associated timing with which the object is predicted to travel along the path. The predicted path can include and/or be made up of a plurality of way points. In some implementations, the prediction data 875 can be indicative of the speed and/or acceleration at which the respective object is predicted to travel along its associated predicted motion trajectory. The prediction system 860 can output the prediction data 875 (e.g., indicative of one or more of the predicted motion trajectories) to the motion planning system 865.").

Claims 10, 16, 22, 28, 34 and 40 perform the same steps as claim 3. Therefore, claims 10, 16, 22, 28, 34 and 40 are further rejected based on the same rationale as claim 3 set forth above and incorporated herein.

Claims 8, 11, 35, 41 and 45 are rejected under 35 U.S.C. 103 as being unpatentable over Manivasagam et al. (US 20200301799 A1) in view of Di Febbo et al. (US 2018/0268256 A1, hereinafter Di Febbo), as applied to claims 1, 9, 15, 21, 33 and 39 above, and further in view of Jiang et al. (US 20210103776 A1, hereinafter Jiang).

Regarding claim 8: Manivasagam in view of Di Febbo discloses the one or more processors of claim 1 (as rejected above), but does not disclose wherein the plurality of images comprise one or more labelled points corresponding to locations on an occluded surface of the object.
However, Jiang discloses wherein the plurality of images comprise one or more labelled points corresponding to locations on an occluded surface of the object (Jiang, see at least par. [0089]: "the computer executable components can further comprise an occlusion mapping component configured to determine a relative position of the graphical data object to another object included in the representation of the object or environment based on the current perspective and the 3D data. In this regard, based on a determination that the relative position of the graphical data object is behind the other object, the integration component can be configured to occlude at least a portion of the graphical data object located behind the other object in association with integrating the graphical data object on or within the representation of the object or environment."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that the plurality of images comprise one or more labelled points corresponding to locations on an occluded surface of the object, as taught by Jiang. The modification provides an improved system and method for performing computer vision tasks using artificial intelligence, and more particularly a method that provides more accurate ground-truth 2D-3D correspondences and reduces noise for training (Jiang, see par. [0093]).

Regarding claim 11: Manivasagam in view of Di Febbo discloses the one or more processors of claim 9 (as rejected above), but does not disclose wherein the 3D model is a probabilistic model. However, Jiang discloses wherein the 3D model is a probabilistic model (Jiang, see at least par. [0110]: "The output of the RoIAlign Layer 214B is fed into a fully convolutional network (FCN) 214C for object (or background) classification and 2D object surface coordinates 606 (e.g., estimated 2D object surface coordinates of each pixel). Classification of each pixel in the object bounding box 502 may be accomplished using, for example, the SVM-based method described above. As a result of classification, a set of K+1 probabilities is determined that indicates the probability of a pixel belonging to k-object parts or the non-object background (non-object part). That is, the pixels associated with each object part or non-object background are classified according to their probabilities, such as a classification score."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that the 3D model is a probabilistic model, as taught by Jiang, for the same reasons given for claim 8 (Jiang, see par. [0093]).

Claims 35 and 41 perform the same steps as claim 8. Therefore, claims 35 and 41 are further rejected based on the same rationale as claim 8 set forth above and incorporated herein.
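
Claim 11's probabilistic 3D model (and the Gaussian mixture model of claims 4 and 17) follows a familiar pattern: fit a mixture of Gaussians to a point cloud so the cloud is summarized by a density rather than by raw points. A generic scikit-learn sketch of that idea, offered as an illustration of the pattern, not as the claimed method:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

points = np.random.rand(2_000, 3)      # stand-in for an aligned point cloud

# Fit a Gaussian mixture so the cloud is represented by a statistical
# distribution (means, covariances, weights) instead of raw points.
gmm = GaussianMixture(n_components=16, covariance_type="full").fit(points)

# The model assigns a log-density to any query point; registration schemes
# can align a second cloud by maximizing its likelihood under this density.
query = np.random.rand(5, 3)
print(gmm.score_samples(query))        # per-point log-likelihoods
```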
Regarding claim 45: Manivasagam in view of Di Febbo discloses the one or more processors of claim 1 (as rejected above), but does not disclose wherein the one or more neural networks are trained based, at least in part, on differences between the three-dimensional model and the object. However, Jiang discloses this limitation (Jiang, see at least par. [0127]: "Training the deep neural network begins at step 702, where training datasets including a set of two-dimensional (2D) images of the object from multiple views are received. The set of 2D images may be captured in different settings (e.g., different angles, different light settings, different environments, etc.) for each of the training datasets. A set of 3D models from the set of 2D images in each of the training datasets are reconstructed based on salient points of the object selected during reconstruction at 704. From the reconstructed 3D models, salient 3D models of the object may be generated that are an aggregation of the salient points of the object in the set of 3D models. At 706, a set of training 2D-3D correspondence data are generated between the set of 2D images of the object in a first training dataset and the salient 3D model of the object generated using the first training dataset. Using the set of training 2D-3D correspondence data generated using the first training dataset, a deep neural network is trained at 708 for object detection and segmentation."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that the one or more neural networks are trained based, at least in part, on differences between the three-dimensional model and the object, as taught by Jiang, to provide more accurate ground-truth 2D-3D correspondences and reduce noise for training (Jiang, see par. [0093]).

Claims 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Manivasagam et al. (US 20200301799 A1) in view of Di Febbo et al. (US 2018/0268256 A1), further in view of Jiang et al. (US 20210103776 A1), as applied to claim 11 above, and further in view of HomChaudhuri et al. (US 20180150125 A1, hereinafter "HomChaudhuri").

Regarding claim 12: Manivasagam in view of Di Febbo and further in view of Jiang discloses claim 11 (as rejected above), but does not disclose wherein the probabilistic model is computed based at least in part on a weight matrix output by the one or more neural networks. However, HomChaudhuri discloses this limitation (HomChaudhuri, see at least par. [0139]: "The model training module 744 can train the parameters (e.g., weight matrices of the hidden layers) of the neural network to predict the expected outputs from the inputs. Once trained, the trained parameters 745 represent encodings of the sequential patterns of the pages in the set of possible pages and can be stored as transition probabilities model 746; probabilities can be generated by passing the current window of page usages 743 through the trained model. Alternatively, a set of probabilities can be generated for some or all possible input pages using the trained model, and these pre-generated probabilities (or a subset thereof satisfying probability threshold requirements) can be stored as a graph or database for the transition probabilities model 746."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that the probabilistic model is computed based at least in part on a weight matrix output by the one or more neural networks, as taught by HomChaudhuri. The modification provides an improved system and method for performing computer vision tasks using artificial intelligence, and more particularly a holistic architecture for saving memory power that can cut leakage by 50% and also adapt dynamically (HomChaudhuri, see par. [0009]).

Regarding claim 13: Manivasagam in view of Di Febbo and further in view of Jiang discloses claim 11 (as rejected above), but does not disclose wherein a registration transform is computed based at least in part on the probabilistic model. However, HomChaudhuri discloses this limitation (HomChaudhuri, see at least pars. [0346]-[0350]: "identifying, using a machine learning transition probabilities model, a predicted page predicted to be requested within a threshold amount of time after the page, [0347] fetching the page and prefetching the predicted page together from the host memory, and [0348] storing the page and the predicted page in the embedded memory. [0349] 72. The method of Clause 71, further comprising generating the machine learning transition probabilities model as a set of page transition probabilities, at least one of the page transition probabilities of the set of page transition probabilities representing a probability of requesting the predicted page within the threshold amount of time after the first page. [0350] 73. The method of Clause 72, wherein the threshold amount of time is one millisecond, the method further comprising identifying the predicted page based on the probability being at least 90%."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that a registration transform is computed based at least in part on the probabilistic model, as taught by HomChaudhuri, for the same reasons given for claim 12 (HomChaudhuri, see par. [0009]).

Regarding claim 14: Manivasagam in view of Di Febbo, further in view of Jiang and HomChaudhuri, discloses claim 11 (as rejected above), and Di Febbo further discloses wherein a registration error is back-propagated to the one or more neural networks during training (Di Febbo, see at least par. [0132], quoted above for claim 5).
Claims 17, 19-20, 23, 25-26, 29, 36 and 42 are rejected under 35 U.S.C. 103 as being unpatentable over Manivasagam et al. (US 20200301799 A1) in view of Di Febbo et al. (US 2018/0268256 A1), as applied to claims 15, 21, 33 and 39 above, and further in view of Lucas et al. (US 20190088004 A1).

Regarding claim 17: Manivasagam in view of Di Febbo discloses claim 15 (as rejected above), but does not disclose causing the one or more processors to at least: align the plurality of images based, at least in part, on a Gaussian mixture model. However, Lucas discloses this limitation (Lucas, see at least par. [0046]: "The space carving methods, however, usually require very accurate 2D segmentation masks that are hard to produce with automated techniques (e.g. graph cuts, motion segmentation, convolutional neural networks (CNNs), Gaussian mixture models (GMMs), and so forth). Thus, the space-carving methods produce artifacts in the presence of imperfect segmentation masks. Also, the space carving algorithms do not naturally enforce re-projections of the original images onto a 3D model that is necessarily photometrically consistent; or in other words, in alignment with stereo 3D reconstruction methods that match image features from different camera perspectives to form a 3D point cloud."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam to cause the one or more processors to align the plurality of images based, at least in part, on a Gaussian mixture model, as taught by Lucas. The modification provides an improved system and method for performing computer vision tasks using artificial intelligence and minimizing a distance between corresponding points in respective point clouds, thereby capturing all sides of an object in the scene; these images can then be used to generate depth maps and, in turn, point clouds that may be used to form 3D geometric or semantic models that accurately locate objects in a 3D space of the scene (Lucas, see par. [0001]).

Regarding claim 19: Manivasagam in view of Di Febbo and further in view of Lucas discloses claim 17 (as rejected above), and further discloses wherein a registration transform is computed based at least in part on the Gaussian mixture model (Lucas, see par. [0071]: "First, process 400 may include 'perform chroma keying-based segmentation' 408, which involves chroma-keying foreground and background colors. This may include 'separate background versus foreground colors' 410, 'classify regions' 412 which may be performed by constructing a Gaussian Mixture Model to classify regions, and then 'label high confidence regions' 414 to assign non-changing labels to regions where a high confidence exists as to being either foreground or background based on the color. See Gupta, L., et al., 'A Gaussian-mixture-based image segmentation algorithm', Pattern Recognition, Vol. 31.3, pp. 315-325 (1998); and Matsuyama, T., et al., 'Multi-camera systems for 3d video production', 3D Video and Its Applications, pp. 17-44, Springer, London (2012). Referring to FIG. 5, an image 500 shows results of a chroma-key tool for labeling foreground and background regions where the background is all one color such as red while the foreground is formed of other colors.").

Regarding claim 20: Manivasagam in view of Di Febbo and further in view of Lucas discloses claim 19 (as rejected above), and further discloses wherein a registration error is back-propagated to the one or more neural networks during training (Manivasagam, see at least par. [0079]: "In some implementations, the machine learning computing system 130 and/or the LiDAR synthesis computing system 102 can train the machine-learned models 110 and/or 140 through use of a model trainer 160. The model trainer 160 can train the machine-learned models 110 and/or 140 using one or more training or learning algorithms. One example training technique is backwards propagation of errors.").

Claims 23, 29, 36 and 42 perform the same steps as claim 17. Therefore, claims 23, 29, 36 and 42 are further rejected based on the same rationale as claim 17 set forth above and incorporated herein.

As to claim 25: the car of claim 25 performs the same steps as claim 19. Therefore, claim 25 is further rejected based on the same rationale as claim 19 set forth above and incorporated herein.

As to claim 26: Manivasagam in view of Di Febbo and further in view of Lucas discloses claim 25 (as rejected above), and further discloses wherein a registration error is back-propagated through the one or more neural networks during training (Manivasagam, see at least par. [0079], quoted above for claim 20).

Claims 18 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Manivasagam et al. (US 20200301799 A1) in view of Di Febbo et al. (US 2018/0268256 A1), further in view of Lucas et al. (US 20190088004 A1), as applied to claims 17 and 23 above, and further in view of KOMATSU et al. (US 20200152171 A1, hereinafter "KOMATSU").

Regarding claim 18: Manivasagam in view of Di Febbo and further in view of Lucas discloses claim 17 (as rejected above), but does not disclose wherein the Gaussian mixture model is computed based at least in part on a weight matrix output by the one or more neural networks.
However, KOMATSU teaches wherein the Gaussian mixture model is computed based at least in part on a weight matrix output by the one or more neural networks (KOMATSU, see at least par. [0061]: "The detection unit 204 receives, as input, weights transmitted, for example, as a weight matrix H from the analysis unit 103 … The detection unit 204 may detect which object signal source exists in each time frame of Y by using a discriminator using a value of each element of H as a feature value. As a training model of a discriminator, for example, a support vector machine (SVM) or a Gaussian mixture model (GMM) is applicable."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that the Gaussian mixture model is computed based at least in part on a weight matrix output by the one or more neural networks, as taught by KOMATSU. The modification provides an improved system and method for performing computer vision tasks using artificial intelligence, and more particularly a signal processing technique capable of acquiring information of an object signal component that is modeled at a low memory cost even when the variation of object signals is large, as discussed by KOMATSU (see par. [0007]).

Claim 24 performs the same steps as claim 18. Therefore, claim 24 is further rejected based on the same rationale as claim 18 set forth above and incorporated herein.

Claims 30-32 are rejected under 35 U.S.C. 103 as being unpatentable over Manivasagam et al. (US 20200301799 A1) in view of Di Febbo et al. (US 2018/0268256 A1), further in view of Lucas et al. (US 20190088004 A1), as applied to claim 29 above, and further in view of KOMATSU et al. (US 20200152171 A1, hereinafter "KOMATSU").

As to claim 30: Manivasagam in view of Di Febbo, further in view of Lucas, discloses claim 29 (as rejected above), but does not disclose wherein the Gaussian mixture model is computed based at least in part on a weight matrix output by the one or more neural networks. However, KOMATSU teaches this limitation (KOMATSU, see at least par. [0061], quoted above for claim 18). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that the Gaussian mixture model is computed based at least in part on a weight matrix output by the one or more neural networks, as taught by KOMATSU, for the same reasons given for claim 18 (KOMATSU, see par. [0007]).

Regarding claim 31: Manivasagam in view of Di Febbo, further in view of Lucas and KOMATSU, discloses claim 30 (as rejected above), and further discloses wherein a registration transform is computed based at least in part on the Gaussian mixture model (Lucas, see par. [0071], quoted above for claim 19).

Regarding claim 32: Manivasagam in view of Di Febbo, further in view of Lucas and KOMATSU, discloses claim 31 (as rejected above), and further discloses wherein a registration error is back-propagated through the one or more neural networks during training (Manivasagam, see at least par. [0079], quoted above for claim 20).

Claims 38 and 43 are rejected under 35 U.S.C. 103 as being unpatentable over Manivasagam et al. (US 20200301799 A1) in view of Di Febbo et al. (US 2018/0268256 A1), as applied to claims 33 and 39 above, and further in view of Lombardi et al. (US 20190213772 A1).
Regarding claim 38: Manivasagam in view of Di Febbo discloses the system of claim 33 (as rejected above), but does not disclose wherein the one or more neural networks are trained to comprise a latent encoding of a geometry of the object. However, Lombardi discloses this limitation (Lombardi, see at least par. [0040]: "The encoding module 108 may be configured to receive and jointly encode the texture information (e.g., the view-independent texture map) and the geometry information to provide a latent vector z. In certain embodiments, the building encoding module 108 may be configured to learn to compress the joint variation of texture and geometry into a latent encoding."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that the one or more neural networks are trained to comprise a latent encoding of a geometry of the object, as taught by Lombardi. The modification provides an improved system and method for performing computer vision tasks using artificial intelligence, and more particularly an autoencoder that provides a latent vector; the autoencoder may further be configured to infer, using the latent vector, an inferred geometry of the subject for a predicted viewpoint and an inferred view-dependent texture of the subject for the predicted viewpoint, and the rendering module may be configured to render a reconstructed image of the subject for the predicted viewpoint using the inferred geometry and the inferred view-dependent texture, as discussed by Lombardi (see par. [0004]).

Claim 43 is rejected for the same rationale as claim 38.

Claim 44 is rejected under 35 U.S.C. 103 as being unpatentable over Manivasagam et al. (US 20200301799 A1) in view of Di Febbo et al. (US 2018/0268256 A1), further in view of Lombardi et al. (US 20190213772 A1), as applied to claim 43 above, and further in view of GHAFFARZADEGAN et al. (US 20210042583 A1).

As to claim 44: Manivasagam in view of Di Febbo and further in view of Lombardi discloses the system of claim 43 (as rejected above), but does not disclose wherein the one or more neural networks are trained to perform a computer vision task based at least in part on the latent encoding. However, GHAFFARZADEGAN discloses this limitation (GHAFFARZADEGAN, see at least pars. [0045] and [0050]: "This process may continue and at each step the decoder 307 output and the residual transform the image according to the learned latent encoding … ResNets have achieved the state-of-the-art performance in various computer vision benchmarks"). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that the one or more neural networks are trained to perform a computer vision task based at least in part on the latent encoding, as taught by GHAFFARZADEGAN. The modification provides an improved system and method for performing computer vision tasks using artificial intelligence, and more particularly a method to generate a sequential reconstruction of the input data utilizing the decoder and at least the first latent variable, obtain a residual between the input data and the reconstruction utilizing a comparison of at least the first latent variable, and output a final reconstruction of the input data utilizing a plurality of residuals from a plurality of sequences, as discussed by GHAFFARZADEGAN (see par. [0005]).

Claims 47-48 are rejected under 35 U.S.C. 103 as being unpatentable over Manivasagam et al. (US 20200301799 A1) in view of Di Febbo et al. (US 2018/0268256 A1, hereinafter Di Febbo), as applied to claim 1 above, and further in view of Beck et al. (US 11080855 B1).

As to claim 47: Manivasagam in view of Di Febbo discloses the one or more processors of claim 1 (as rejected above), but does not disclose wherein the three-dimensional model generated by the one or more neural networks comprises parameters of a statistical model usable to generate a transform to align images of the object. However, Beck discloses this limitation (Beck, see at least: "retrieving a trained statistical model from at least one storage device, wherein: the trained statistical model is trained on a set of training patches and a corresponding set of patch annotations derived from at least one annotated pathology image of the plurality of annotated pathology images, each training patch of the set of training patches includes one or more values obtained from a subset of pixels in the at least one annotated pathology image and is associated with a corresponding patch annotation determined based on an annotation associated with the subset of pixels, and the trained statistical model comprises a convolutional neural network including a plurality of layers, wherein at least one layer of the plurality of layers is aligned such that (N−K)/S is an integer, wherein N represents a size of each input dimension of the at least one layer, wherein K represents a size of a convolution filter of the at least one layer, and wherein S represents a size of a stride of the at least one layer; defining a set of patches from the pathology image, wherein the pathology image is different from the plurality of annotated pathology images, wherein each patch of the set of patches includes a subset of pixels from the pathology image; processing, using the trained statistical model, the set of patches to predict one or more annotations for each patch of the set of patches; and storing the predicted one or more annotations for each patch of the set of patches on the at least one storage device."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that the three-dimensional model generated by the one or more neural networks comprises parameters of a statistical model usable to generate a transform to align images of the object, as taught by Beck. The modification provides an improved system and method for performing computer vision tasks using artificial intelligence, thereby improving the performance of the model: in some embodiments, the statistical model may be used to assign annotations to one or more portions of a pathology image; inaccurate annotations assigned by processing the pathology image using the statistical model may be identified, e.g., by a pathologist, and corresponding subsets of training data may be provided to the statistical model for retraining, so that the performance of the statistical model may be improved (Beck, see col. 2, lines 7-16).

Regarding claim 48: Manivasagam in view of Di Febbo discloses the one or more processors of claim 1 (as rejected above), but does not disclose wherein the three-dimensional model comprises parameters of a statistical model generated by the one or more neural networks. However, Beck discloses this limitation (Beck, see col. 2, lines 25-37: "once a statistical model is trained to predict tissue characteristic categories for a pathology image, the pathology image may be fully annotated by processing the image using the statistical model. The fully annotated pathology image may be analyzed to determine values for one or more features. These feature values and corresponding patient prognostic information may be provided as input training data to a model (e.g., a random forest, a support vector machine, regression, a neural network, or another suitable model) to predict prognostic information, such as patient survival time, from sample data for a patient."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method and system of Manivasagam so that the three-dimensional model comprises parameters of a statistical model generated by the one or more neural networks, as taught by Beck, for the same reasons given for claim 47 (Beck, see col. 2, lines 7-16).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KIM THANH THI TRAN, whose telephone number is (571) 270-1408. The examiner can normally be reached Monday-Friday, 8:00 am-5:00 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, ALICIA HARRINGTON, can be reached at (571) 272-2330. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KIM THANH T TRAN/
Examiner, Art Unit 2615

/JAMES A THOMPSON/
Primary Examiner, Art Unit 2615

Prosecution Timeline

Nov 05, 2019: Application Filed
Mar 27, 2021: Non-Final Rejection (§103)
Aug 02, 2021: Interview Requested
Aug 10, 2021: Applicant Interview (Telephonic)
Aug 11, 2021: Examiner Interview Summary
Sep 01, 2021: Response Filed
Dec 12, 2021: Final Rejection (§103)
May 26, 2022: Interview Requested
Jun 06, 2022: Applicant Interview (Telephonic)
Jun 06, 2022: Examiner Interview Summary
Jun 09, 2022: Request for Continued Examination
Jun 10, 2022: Response after Non-Final Action
Jun 17, 2022: Non-Final Rejection (§103)
Nov 10, 2022: Interview Requested
Nov 21, 2022: Applicant Interview (Telephonic)
Nov 21, 2022: Examiner Interview Summary
Dec 22, 2022: Notice of Allowance
Apr 21, 2023: Applicant Interview (Telephonic)
Apr 27, 2023: Examiner Interview Summary
Jun 26, 2023: Request for Continued Examination
Jun 29, 2023: Response after Non-Final Action
Aug 29, 2023: Non-Final Rejection (§103)
Dec 08, 2023: Applicant Interview (Telephonic)
Dec 12, 2023: Examiner Interview Summary
Mar 06, 2024: Response Filed
Jun 20, 2024: Final Rejection (§103)
Jul 03, 2024: Applicant Interview (Telephonic)
Jul 03, 2024: Examiner Interview Summary
Dec 24, 2024: Notice of Allowance
Apr 02, 2025: Examiner Interview Summary
Apr 02, 2025: Applicant Interview (Telephonic)
Apr 03, 2025: Request for Continued Examination
Apr 04, 2025: Response after Non-Final Action
Apr 11, 2025: Non-Final Rejection (§103)
May 02, 2025: Interview Requested
May 13, 2025: Applicant Interview (Telephonic)
May 13, 2025: Examiner Interview Summary
Jul 16, 2025: Response Filed
Nov 05, 2025: Final Rejection (§103)
Nov 18, 2025: Interview Requested
Dec 23, 2025: Response after Non-Final Action
Jan 05, 2026: Request for Continued Examination
Jan 07, 2026: Response after Non-Final Action
Jan 09, 2026: Non-Final Rejection (§103)
Feb 26, 2026: Interview Requested
Mar 05, 2026: Applicant Interview (Telephonic)
Mar 14, 2026: Examiner Interview Summary
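
The projections later in this report (8-9 expected OA rounds, "High" PTA risk) can be sanity-checked against this timeline. A small sketch with the dates transcribed from above; which events to count as OA rounds is an assumption.

```python
from datetime import date

filed = date(2019, 11, 5)
office_actions = [                 # rejection dates from the timeline above
    date(2021, 3, 27), date(2021, 12, 12), date(2022, 6, 17),
    date(2023, 8, 29), date(2024, 6, 20), date(2025, 4, 11),
    date(2025, 11, 5), date(2026, 1, 9),
]
latest = date(2026, 3, 14)         # most recent docketed event

print(f"OA rounds so far: {len(office_actions)}")      # 8, matching 'OA Round 8'
years = (latest - filed).days / 365.25
print(f"Pendency to date: {years:.1f} years")          # ~6.4 years and counting
```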

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12578912: CHIP AND RELATED ELECTRONIC DEVICE (granted Mar 17, 2026; 2y 5m to grant)
Patent 12572997: GRAPHICS PROCESSING UNIT PROCESSING AND CACHING IMPROVEMENTS (granted Mar 10, 2026; 2y 5m to grant)
Patent 12567124: Technologies for Improved Whole Slide Imaging (granted Mar 03, 2026; 2y 5m to grant)
Patent 12561887: Mapping Texture Point Samples to Lanes of a Filter Pipeline (granted Feb 24, 2026; 2y 5m to grant)
Patent 12548277: SYSTEMS AND METHODS FOR USING AUGMENTED REALITY TO PREVIEW FALSE EYELASHES (granted Feb 10, 2026; 2y 5m to grant)

Study what changed to get past this examiner. Based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 8-9
Grant Probability: 77%
With Interview: 99% (+24.1%)
Median Time to Grant: 2y 10m
PTA Risk: High
Based on 367 resolved cases by this examiner. Grant probability derived from career allow rate.
