Last updated: May 29, 2026
Application No. 18/547,628
MULTIRESOLUTION DEEP IMPLICIT FUNCTIONS FOR THREE-DIMENSIONAL SHAPE REPRESENTATION

Non-Final OA §103
Filed
Aug 23, 2023
Priority
Apr 30, 2021 — nonprovisional of PCTCN2021091706
Examiner
SALVUCCI, MATTHEW D
Art Unit
2613
Tech Center
2600 — Communications
Assignee
Google LLC
OA Round
3 (Non-Final)
Interview Optional

— +28.3% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 72% grant rate with +28.3% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 487 resolved cases, 2023–2026
Examiner Intelligence

SALVUCCI, MATTHEW D View full profile →
Grants 72% — above average
Career Allowance Rate
350 granted / 487 resolved
+9.9% vs TC avg
Strong +28% interview lift
Without
With
+28.3%
Interview Lift
resolved cases with interview
Typical timeline
2y 11m
Avg Prosecution
17 currently pending
Career history
504
Total Applications
across all art units
Statute-Specific Performance

§101
1.2%
-38.8% vs TC avg
§103
88.4%
+48.4% vs TC avg
§102
7.8%
-32.2% vs TC avg
§112
2.1%
-37.9% vs TC avg
Black line = Tech Center average estimate • Based on career data from 487 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 17 February 2026 has been entered.

Status of Claims
Applicant's amendments filed on 6 February 2026 have been entered.  Claims 1, 6, and 11-20 have been amended.  Claim 10 has been canceled.  No claims have been added.  Claims 1-9 and 11-20 are still pending in this application, with claims 1, 19, and 20 being independent.

Response to Arguments
Applicant's arguments filed 6 February 2026 have been fully considered but they are not persuasive. 
Applicant argues that the claims are allowable for incorporating subject matter from previously objected to claim 10 into independent claim 1. However, Examiner notes that the subject matter of claim 10, in its entirety, is not included in claim 1. Thus, the scope of claim 1 has changed in a way that had not been previously examined and is rejected as outlined below.
Examiner further notes that an examiner amendment was proposed but not agreed upon.

Allowable Subject Matter
Claim 18 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Objections
Applicant is advised that should claim 9 be found allowable, claim 12 will be objected to under 37 CFR 1.75 as being a substantial duplicate thereof. When two claims in an application are duplicates or else are so close in content that they both cover the same thing, despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim. See MPEP § 608.01(m).
Claim 12 is objected to under 37 CFR 1.75 as being a substantial duplicate of claim 9. When two claims in an application are duplicates or else are so close in content that they both cover the same thing, despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim. See MPEP § 608.01(m).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9 and 10-17 are rejected under 35 U.S.C. 103 as being unpatentable over Chibane et al. (NPL: Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion), hereinafter Chibane, in view of He et al. (US Pub. 2015/0381961), hereinafter He.
Regarding claim 1, Chibane discloses a method comprising: generating a first representation of an object that has three dimensions based on a first grid and a position that has three dimensions associated with the object (Section 3.1: learned implicit reconstruction from 3D input differ in their inference and their output shape representation (signed distance or binary occupancies), they are conceptually similar. Here, we describe the occupancy formulation of [43]. Note that the strengths and limitations of these methods are very similar. They all encode a 3D shape using a latent vector z ∈ Z ⊂ R m. Then a continuous representation of the shape is obtained by learning a neural function); generating a second grid based on a divided volume of a feature set, the feature set being generated based on the first representation of the object (Section 3.2: we construct a rich encoding of the data X through subsequently convolving it with learned 3D convo lutions. This requires the input to lie on a discrete voxel grid, i.e. X = RN×N×N, where N ∈ N denotes the input resolution. To process point clouds we simply discretize them first. The convolutions are followed by down scaling the input, creating growing receptive fields and channels but shrinking resolution, just like commonly done in 2D [34]. Applying this procedure recursively n times on the input data X, we create multi-scale deep feature grids F1,..,Fn,); generating a second representation of the object based on portion of the second grid and portion of an upsampled first grid (Fig. 3; Fig. 4; Section 4.2: as a second task, we apply our method to 3D superresolution. To effectively solve this task, our method needs to again preserve the input shape while reconstructing details not present in the input. Our results in side-by-side comparison with the baselines are depicted in Fig. 3 (botInput OccNet IMNET DMC Ours GT Figure 4. Qualitative results of sparse (323 , upper) and dense (1283 , lower) 3D voxel super-resolution on the Humans dataset. tom). While most baseline methods either hallucinate structure or completely fail, our method consistently produces accurate and highly detailed results. This is also reflected in the numerical comparison in Tab. 3, where we improve over the baselines in all metrics); decoding the first representation of the object to generate a first decoded representation of the object (Fig. 2; Section 3.2: we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm); decoding the second representation of the object to generate a second decoded representation of the object (Fig. 2; Section 3.2: we extract the learned deep features F1(p), .., Fn(p) from the feature grids at location p. This is only possible because our encoding has a 3D structure aligned with the input data. Since feature grids are discrete, we use trilinear interpolation to query continuous 3D points p ∈ R 3 . In order to encode information of the local neighborhood into the point encoding, even at early grids with small receptive fields (e.g. F1), we extract features at the location of a query point p itself and additionally at surrounding points in a distance d along the Cartesian axe); and generating a reconstructed volume representing the object based on the composite representation (Section 3.2: in this formulation, the network classifies the point based on local and global shape features, instead of point coordinates, which are arbitrary under rotation, translation, and articulation transformations. Furthermore, due to our multi-scale encoding, details can be preserved while reasoning about global shape is still possible; Section 3.4: the goal is to reconstruct a continuous and complete representation, given only a discrete and incomplete 3D input X. First, we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm).
Chibane does not explicitly disclose a second decoded representation including a residual of the object; generating a composite representation the object based on a sum of the first decoded representation and the second decoded representation.
However, He teaches 3D shape reconstruction (Abstract), further comprising a second decoded representation including a residual of the object (Abstract: recorder creating an encoded data stream comprising an encoded video stream and an encoded graphics stream, the video stream comprising an encoded 3D (three-dimensional) video object, and the graphics stream comprising at least a first encoded segment and a second encoded segment, the first segment comprising 2D (two-dimensional) graphics data and the second segment comprises a depth map for the 2D graphics data. A graphics decoder decoding the first and second encoded segments to form respective first and second decoded sequences. Outputting the first and second decoded sequences separately to a 3D display unit. The 3D display unit combining the first and second decoded sequences and rendering the combination as a 3D graphics image overlaying a 3D video image simultaneously rendered from a decoded 3D video object decoded from the encoded 3D video object); generating a composite representation the object based on a sum of the first decoded representation and the second decoded representation (Paragraphs [0025]-[0030]: a second aspect of the invention there is provided a method in a graphics system for decoding a data stream, wherein the data stream comprises at least first and second segments, the first segment comprising two-dimensional graphics data and the second segment comprising information relating to the two-dimensional graphics object, the method comprising: receiving the data stream; forming a first decoded data sequence from the first segment and a second decoded data sequence from the second segment; and outputting the first decoded data sequence and the second decoded data sequence to a display unit for rendering a three-dimensional graphics data by combining the first decoded data sequence and the second decoded data sequence). He teaches that this will allow for completeness in decoding data segments (Paragraphs [0002]-[0017]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chibane with the features of above as taught by He so as to allow for completeness as presented by He.
Regarding claim 2, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein the first grid corresponds to a global shape of the object (Section 3.2: The feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data).
Regarding claim 3, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein the second grid corresponds to a portion of the object (Fig. 2; Section 3.1: Then a continuous representation of the shape is obtained by learning a neural function f(z, p)… Applying this procedure recursively n times on the input data X, we create multi-scale deep feature grids F1, .., Fn…of decreasing resolution K = N 2 k−1 , and variable channel dimensionality Fk ∈ N at each stage Fk ⊂ R Fk . The feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder as g(X)).
Regarding claim 4, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein the second grid includes a first set of vectors and a second set of vectors each representing a portion of the object, the first set of vectors includes fewer vectors than the second set of vectors, and the first set of vectors includes fewer details associated with the object than the second set of vectors (Fig. 2; Section 3.1: Then a continuous representation of the shape is obtained by learning a neural function f(z, p)… Applying this procedure recursively n times on the input data X, we create multi-scale deep feature grids F1, .., Fn…of decreasing resolution K = N 2 k−1 , and variable channel dimensionality Fk ∈ N at each stage Fk ⊂ R Fk . The feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder as g(X); Section 3.2: we propose a novel encoding and decoding tandem capable of addressing the above limitations for the task of 3D reconstruction from point clouds or occupancy grids. Given such 3D input data X ∈ X of an object, where X denotes the space of the inputs, and a 3D point p ∈ R 3 , we want to predict if p lies inside or outside the object…feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder).
Regarding claim 5, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein the first representation is missing a portion of the object, and at least one of generating the first decoded representation and generating the second decoded representation includes at least partially completing the missing portion of the object (Fig. 2; Section 2: Implicit Functions for rigid objects: Recently, neural networks have been used to learn a continuous implicit function representing shape [43, 48, 10, 44, 35]. For this a neural network can be feed with a latent code and a query point (x-y-z) to predict the TSDF value [48] or the binary occupancy of the point [43, 10]. A recent method [76] achieved state-of-the-art results for 3D reconstruction from images combining 3D query point features with local image features; Section 3.2: we propose a novel encoding and decoding tandem capable of addressing the above limitations for the task of 3D reconstruction from point clouds or occupancy grids. Given such 3D input data X ∈ X of an object, where X denotes the space of the inputs, and a 3D point p ∈ R 3 , we want to predict if p lies inside or outside the object…feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder; Section 3.3: To train the multi-scale encoder gw(·) in Eq. (2), and decoder fw(·) in Eq. (4), parameterized with neural weights w, pairs {Xi , Si} T i=1 of 3D inputs Xi with corresponding 3D ground truth object surfaces Si are required, where i ∈ 1, . . . , T and T ∈ N denotes the number of such training examples; Section 3.4: we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm).
Regarding claim 6, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein generating the first decoded representation is performed by a decoder including a neural network, the neural network is trained using latent code including dropped-out vectors configured to simulate that the first representation is missing a portion of the object, and the neural network is trained to complete the missing portion of the object (Fig. 2; Section 2: Implicit Functions for rigid objects: Recently, neural networks have been used to learn a continuous implicit function representing shape [43, 48, 10, 44, 35]. For this a neural network can be feed with a latent code and a query point (x-y-z) to predict the TSDF value [48] or the binary occupancy of the point [43, 10]. A recent method [76] achieved state-of-the-art results for 3D reconstruction from images combining 3D query point features with local image features; Section 3.2: we propose a novel encoding and decoding tandem capable of addressing the above limitations for the task of 3D reconstruction from point clouds or occupancy grids. Given such 3D input data X ∈ X of an object, where X denotes the space of the inputs, and a 3D point p ∈ R 3 , we want to predict if p lies inside or outside the object…feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder; Section 3.3: To train the multi-scale encoder gw(·) in Eq. (2), and decoder fw(·) in Eq. (4), parameterized with neural weights w, pairs {Xi , Si} T i=1 of 3D inputs Xi with corresponding 3D ground truth object surfaces Si are required, where i ∈ 1, . . . , T and T ∈ N denotes the number of such training examples; Section 3.4: we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm).
Regarding claim 7, Chibane, in view of He teaches the method of claim 6, Chibane discloses wherein the neural network is trained using the reconstructed volume and a volume that the first representation is based on (Section 3.1: recent works [48, 43, 10] on learned implicit reconstruction from 3D input differ in their inference and their output shape representation (signed distance or binary occupancies), they are conceptually similar. Here, we describe the occupancy formulation of [43]. Note that the strengths and limitations of these methods are very similar. They all encode a 3D shape using a latent vector z ∈ Z ⊂ R m. Then a continuous representation of the shape is obtained by learning a neural function f(z, p): Z × R 3 7→ [0, 1], (1) which given a query point p ∈ R 3 , and the latent code z, classifies whether the point is inside (classification as 1) or outside (classification as 0) the surface).
Regarding claim 8, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein the second representation is generated based on a concatenation of a first sampled vector and a second sampled vector, the first sampled vector is a trilinear interpolation of the upsampled first grid, and the second sampled vector is a trilinear interpolation of the second grid (Fig. 2; Section 3.2: we extract the learned deep features F1(p), .., Fn(p) from the feature grids at location p. This is only possible because our encoding has a 3D structure aligned with the input data. Since feature grids are discrete, we use trilinear interpolation to query continuous 3D points p ∈ R 3 . In order to encode information of the local neighborhood into the point encoding, even at early grids with small receptive fields (e.g. F1), we extract features at the location of a query point p itself and additionally at surrounding points in a distance d along the Cartesian axe; Section 4.2: we hypothesize that the current methods are not suited for tasks where classification into shape prototypes is not sufficient. This is for example the case for humans as they come in various shapes and articulations. To verify our hypothesis, we additionally perform 3D super-resolution on our Humans dataset. Here the advantages are even more prominent: Our method is the only one that consistently reconstructs all limbs and produces highly detailed results. Implicit learning-based baselines produce truncated or completely missing limbs).
Regarding claim 9, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein a latent code includes a plurality of hierarchical layers, a first layer of the plurality of hierarchical layers includes the first grid, and a second layer of the plurality of hierarchical layers includes the second grid (Fig. 2; Section 3.4: the goal is to reconstruct a continuous and complete representation, given only a discrete and incomplete 3D input X. First, we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm).
Regarding claim 11, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein a number of iterations defines a resolution associated with the first decoded representation (Fig. 2; Sections 3.1-3.2: Then a continuous representation of the shape is obtained by learning a neural function f(z, p)… Applying this procedure recursively n times on the input data X, we create multi-scale deep feature grids F1, .., Fn…of decreasing resolution K = N 2 k−1 , and variable channel dimensionality Fk ∈ N at each stage Fk ⊂ R Fk . The feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder as g(X)).
Regarding claim 12, Chibane, in view of He teaches the method of claim 1, Chibane discloses further comprising a latent code that includes a plurality of hierarchical layers, wherein a first layer of the plurality of hierarchical layers includes the first grid, and a second layer of the plurality of hierarchical layers includes the second grid (Fig. 2; Section 3.4: the goal is to reconstruct a continuous and complete representation, given only a discrete and incomplete 3D input X. First, we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm).
Regarding claim 13, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein the feature set is generated using a neural network (Fig. 2; Section 2: Implicit Functions for rigid objects: Recently, neural networks have been used to learn a continuous implicit function representing shape [43, 48, 10, 44, 35]. For this a neural network can be feed with a latent code and a query point (x-y-z) to predict the TSDF value [48] or the binary occupancy of the point).
Regarding claim 14, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein the first representation is missing a portion of the object, the feature set is generated using a neural network, the neural network is trained to complete the missing portion of the object while generating the feature set (Fig. 2; Section 2: Implicit Functions for rigid objects: Recently, neural networks have been used to learn a continuous implicit function representing shape [43, 48, 10, 44, 35]. For this a neural network can be feed with a latent code and a query point (x-y-z) to predict the TSDF value [48] or the binary occupancy of the point [43, 10]. A recent method [76] achieved state-of-the-art results for 3D reconstruction from images combining 3D query point features with local image features; Section 3.2: we propose a novel encoding and decoding tandem capable of addressing the above limitations for the task of 3D reconstruction from point clouds or occupancy grids. Given such 3D input data X ∈ X of an object, where X denotes the space of the inputs, and a 3D point p ∈ R 3 , we want to predict if p lies inside or outside the object…feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder; Section 3.3: To train the multi-scale encoder gw(·) in Eq. (2), and decoder fw(·) in Eq. (4), parameterized with neural weights w, pairs {Xi , Si} T i=1 of 3D inputs Xi with corresponding 3D ground truth object surfaces Si are required, where i ∈ 1, . . . , T and T ∈ N denotes the number of such training examples; Section 3.4: we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm).
Regarding claim 15, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein the feature set is generated using a neural network, the neural network is trained using latent code including dropped-out vectors associated with the second grid, the dropped-out vectors simulating that the first representation is missing a portion of the object, and the neural network is trained to complete the missing portion of the object while generating the feature set (Fig. 2; Section 2: Implicit Functions for rigid objects: Recently, neural networks have been used to learn a continuous implicit function representing shape [43, 48, 10, 44, 35]. For this a neural network can be feed with a latent code and a query point (x-y-z) to predict the TSDF value [48] or the binary occupancy of the point [43, 10]. A recent method [76] achieved state-of-the-art results for 3D reconstruction from images combining 3D query point features with local image features; Section 3.2: we propose a novel encoding and decoding tandem capable of addressing the above limitations for the task of 3D reconstruction from point clouds or occupancy grids. Given such 3D input data X ∈ X of an object, where X denotes the space of the inputs, and a 3D point p ∈ R 3 , we want to predict if p lies inside or outside the object…feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder; Section 3.3: To train the multi-scale encoder gw(·) in Eq. (2), and decoder fw(·) in Eq. (4), parameterized with neural weights w, pairs {Xi , Si} T i=1 of 3D inputs Xi with corresponding 3D ground truth object surfaces Si are required, where i ∈ 1, . . . , T and T ∈ N denotes the number of such training examples; Section 3.4: we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm).
Regarding claim 16, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein the first grid is generated using a neural network, the neural network is trained using latent code including dropped-out vectors associated with the second grid, the dropped-out vectors simulating that the first representation is missing a portion of the object, and the neural network is trained to complete the missing portion of the object while generating the first-latent grid (Fig. 2; Section 2: Implicit Functions for rigid objects: Recently, neural networks have been used to learn a continuous implicit function representing shape [43, 48, 10, 44, 35]. For this a neural network can be feed with a latent code and a query point (x-y-z) to predict the TSDF value [48] or the binary occupancy of the point [43, 10]. A recent method [76] achieved state-of-the-art results for 3D reconstruction from images combining 3D query point features with local image features; Section 3.2: we propose a novel encoding and decoding tandem capable of addressing the above limitations for the task of 3D reconstruction from point clouds or occupancy grids. Given such 3D input data X ∈ X of an object, where X denotes the space of the inputs, and a 3D point p ∈ R 3 , we want to predict if p lies inside or outside the object…feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder; Section 3.3: To train the multi-scale encoder gw(·) in Eq. (2), and decoder fw(·) in Eq. (4), parameterized with neural weights w, pairs {Xi , Si} T i=1 of 3D inputs Xi with corresponding 3D ground truth object surfaces Si are required, where i ∈ 1, . . . , T and T ∈ N denotes the number of such training examples; Section 3.4: we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm).
Regarding claim 17, Chibane, in view of He teaches the method of claim 1, Chibane discloses wherein the second grid is generated using a neural network, the neural network is trained using latent code including dropped-out vectors associated with the second grid, the dropped-out vectors simulating that the second representation is missing a portion of the object, and the neural network is trained to complete the missing portion of the object while generating the second grid (Fig. 2; Section 2: Implicit Functions for rigid objects: Recently, neural networks have been used to learn a continuous implicit function representing shape [43, 48, 10, 44, 35]. For this a neural network can be feed with a latent code and a query point (x-y-z) to predict the TSDF value [48] or the binary occupancy of the point [43, 10]. A recent method [76] achieved state-of-the-art results for 3D reconstruction from images combining 3D query point features with local image features; Section 3.2: we propose a novel encoding and decoding tandem capable of addressing the above limitations for the task of 3D reconstruction from point clouds or occupancy grids. Given such 3D input data X ∈ X of an object, where X denotes the space of the inputs, and a 3D point p ∈ R 3 , we want to predict if p lies inside or outside the object…feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder; Section 3.3: To train the multi-scale encoder gw(·) in Eq. (2), and decoder fw(·) in Eq. (4), parameterized with neural weights w, pairs {Xi , Si} T i=1 of 3D inputs Xi with corresponding 3D ground truth object surfaces Si are required, where i ∈ 1, . . . , T and T ∈ N denotes the number of such training examples; Section 3.4: we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm).

Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kehl et al. (US Patent 11462023), hereinafter Kehl, in view of Chibane, and further in view of He.
Regarding claim 19, Kehl discloses a non-transitory computer-readable storage medium comprising instructions stored thereon that, when executed by at least one processor (Column 19, lines 63-67: vehicle 100 can include one or more modules, at least some of which are described herein. The modules can be implemented as computer-readable program code that, when executed by a processor 110, implement one or more of the various processes described herein), are configured to cause a computing system to: generate a first representation of an object that has three dimensions based on a first grid and a position that has three dimensions associated with a signed distance function (SDF) representing the object (Column 9, lines 34-54: embodiments employ a coordinate-space framework known in the literature as “DeepSDF” to embed (watertight) vehicle models into a joint, compact shape-space representation with a single neural network (the CSS network discussed above). The concept is to transform input models into SDFs where each value signifies the distance to the closest surface, with positive and negative values representing exterior and interior regions, respectively. The SDF representation is desirable because it is generally easy for a neural network to learn. Eventually, DeepSDF forms a shape space of implicit surfaces with a decoder f that can be queried at spatially-continuous 3D locations x={x1, . . . , xN} with a provided latent code z; Column 14, line 64-Column 15, line 18: optimization module 230 proceeds with the optimization stage (refer once again to FIG. 6). By concatenating the latent vector z (520) with the query 3D grid x (420), the input is formed for the DeepSDF network 620. The DeepSDF network 620 outputs SDF values for each query point on the query grid 420, which are used for the 0-isosurface projection, providing a dense surface-point cloud. The resulting point cloud is then transformed using the estimated pose and scale coming from the pose estimator 610. The points that should not be visible from the given camera view can be filtered using simple back-face culling, since surface normals have been already computed for 0-isosurface projection. At this stage, optimization module 230 can apply 3D losses between the resulting transformed point cloud and the input LIDAR frustum points. The surface point cloud is also used as an input to the differentiable renderer 660, which renders NOCS as RGB and applies 2D losses between the CSS Network's (330) NOCS prediction and the renderer's (660) output NOCS. The latent vector (520) and the pose (630) are then updated, and the process is repeated until termination); generating a second grid based on a divided volume of a feature set, the feature set being generated based on the first representation of the object (Section 3.2: we construct a rich encoding of the data X through subsequently convolving it with learned 3D convo lutions. This requires the input to lie on a discrete voxel grid, i.e. X = RN×N×N, where N ∈ N denotes the input resolution. To process point clouds we simply discretize them first. The convolutions are followed by down scaling the input, creating growing receptive fields and channels but shrinking resolution, just like commonly done in 2D [34]. Applying this procedure recursively n times on the input data X, we create multi-scale deep feature grids F1,..,Fn,); generate a second representation of the object based on the second grid and an upsampled first grid (Fig. 6; Column 14, lines 7-25: FIG. 6 illustrates a system architecture of a 3D autolabeling pipeline for a 3D object-detection system 170, in accordance with an illustrative embodiment of the invention. As mentioned above, in these embodiments, the CSS Network 330 includes a ResNet18 backbone architecture. In these embodiments, the decoders use bilinear interpolation as an upsampling operation rather than deconvolution to decrease the number of parameters and the required number of computations. Each upsampling is followed by concatenation of the output feature map with the feature map from the previous level and one convolutional layer. Since CSS Network 330 is trained on synthetic input data 260, it can be initialized with ImageNet weights, and the first five layers are frozen to prevent overfitting to peculiarities of the rendered data. Five heads 605 of CSS Network 330 are responsible for the output of U, V, and W channels of the NOCS as well as the object's mask (510) and its latent vector (shape vector 520), encoding its DeepSDF shape); decode the first representation of the object to generate a first SDF (Fig. 6; Column 14, lines 7-25: FIG. 6 illustrates a system architecture of a 3D autolabeling pipeline for a 3D object-detection system 170, in accordance with an illustrative embodiment of the invention. As mentioned above, in these embodiments, the CSS Network 330 includes a ResNet18 backbone architecture. In these embodiments, the decoders use bilinear interpolation as an upsampling operation rather than deconvolution to decrease the number of parameters and the required number of computations. Each upsampling is followed by concatenation of the output feature map with the feature map from the previous level and one convolutional layer. Since CSS Network 330 is trained on synthetic input data 260, it can be initialized with ImageNet weights, and the first five layers are frozen to prevent overfitting to peculiarities of the rendered data. Five heads 605 of CSS Network 330 are responsible for the output of U, V, and W channels of the NOCS as well as the object's mask (510) and its latent vector (shape vector 520), encoding its DeepSDF shape);.
Kehl does not explicitly disclose decode the second representation of the object to generate second SDF including a residual; generate a composite SDF based on a sum of the first SDF and the second SDF; and generate a reconstructed volume representing the object based on the composite SDF.
	However, Chibane teaches 3D shape reconstruction with implicit functions (Abstract), further comprising iteratively decode the second representation of the object to generate second SDF (Fig. 2; Section 2: , the approaches are complicated to implement, require multiple passes over the input, and are still limited to grids of size 2563 , which result in visible quantization artifacts. To smooth out noise, it is possible to represent shapes as Truncated Signed Distance functions [12] for learning [14, 36, 58, 64]. The resolution is however still bounded by the 3D grid storing the TSDF values; Section 3.2: we extract the learned deep features F1(p), .., Fn(p) from the feature grids at location p. This is only possible because our encoding has a 3D structure aligned with the input data. Since feature grids are discrete, we use trilinear interpolation to query continuous 3D points p ∈ R 3 . In order to encode information of the local neighborhood into the point encoding, even at early grids with small receptive fields (e.g. F1), we extract features at the location of a query point p itself and additionally at surrounding points in a distance d along the Cartesian axe); generate a composite SDF based on the first SDF and the second SDF (Fig. 2; Sections 3.1-3.2: Then a continuous representation of the shape is obtained by learning a neural function f(z, p)… Applying this procedure recursively n times on the input data X, we create multi-scale deep feature grids F1, .., Fn…of decreasing resolution K = N 2 k−1 , and variable channel dimensionality Fk ∈ N at each stage Fk ⊂ R Fk . The feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder as g(X)); and generate a reconstructed volume representing the object based on the composite SDF (Section 3.2: in this formulation, the network classifies the point based on local and global shape features, instead of point coordinates, which are arbitrary under rotation, translation, and articulation transformations. Furthermore, due to our multi-scale encoding, details can be preserved while reasoning about global shape is still possible; Section 3.4: the goal is to reconstruct a continuous and complete representation, given only a discrete and incomplete 3D input X. First, we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm). Chibane teaches that this will allow for predicting shape in continuous space potentially at any resolution (Section 2). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kehl with the features of above as taught by Chibane so as to allow for predicting shape in continuous space potentially at any resolution as presented by Chibane.
	While Chibane teaches to generate a second SDF, Chibane does not explicitly disclose to generate a residual; or generate a composite SDF based on a sum of the first SDF and the second SDF.
	However, He teaches 3D shape reconstruction (Abstract), further comprising a second decoded representation including a residual (Abstract: recorder creating an encoded data stream comprising an encoded video stream and an encoded graphics stream, the video stream comprising an encoded 3D (three-dimensional) video object, and the graphics stream comprising at least a first encoded segment and a second encoded segment, the first segment comprising 2D (two-dimensional) graphics data and the second segment comprises a depth map for the 2D graphics data. A graphics decoder decoding the first and second encoded segments to form respective first and second decoded sequences. Outputting the first and second decoded sequences separately to a 3D display unit. The 3D display unit combining the first and second decoded sequences and rendering the combination as a 3D graphics image overlaying a 3D video image simultaneously rendered from a decoded 3D video object decoded from the encoded 3D video object); and generate a composite SDF based on a sum of the first SDF and the second SDF (Paragraphs [0025]-[0030]: a second aspect of the invention there is provided a method in a graphics system for decoding a data stream, wherein the data stream comprises at least first and second segments, the first segment comprising two-dimensional graphics data and the second segment comprising information relating to the two-dimensional graphics object, the method comprising: receiving the data stream; forming a first decoded data sequence from the first segment and a second decoded data sequence from the second segment; and outputting the first decoded data sequence and the second decoded data sequence to a display unit for rendering a three-dimensional graphics data by combining the first decoded data sequence and the second decoded data sequence). He teaches that this will allow for completeness in decoding data segments (Paragraphs [0002]-[0017]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kehl, in view of Chibane with the features of above as taught by He so as to allow for completeness as presented by He.
Regarding claim 20, Kehl discloses a non-transitory computer-readable storage medium comprising instructions stored thereon that, when executed by at least one processor (Column 19, lines 63-67: vehicle 100 can include one or more modules, at least some of which are described herein. The modules can be implemented as computer-readable program code that, when executed by a processor 110, implement one or more of the various processes described herein), are configured to cause a computing system to: generate a feature set based on a representation of an object that has three dimensions (Fig. 6; Column 14, lines 7-25: FIG. 6 illustrates a system architecture of a 3D autolabeling pipeline for a 3D object-detection system 170, in accordance with an illustrative embodiment of the invention. As mentioned above, in these embodiments, the CSS Network 330 includes a ResNet18 backbone architecture. In these embodiments, the decoders use bilinear interpolation as an upsampling operation rather than deconvolution to decrease the number of parameters and the required number of computations. Each upsampling is followed by concatenation of the output feature map with the feature map from the previous level and one convolutional layer. Since CSS Network 330 is trained on synthetic input data 260, it can be initialized with ImageNet weights, and the first five layers are frozen to prevent overfitting to peculiarities of the rendered data. Five heads 605 of CSS Network 330 are responsible for the output of U, V, and W channels of the NOCS as well as the object's mask (510) and its latent vector (shape vector 520), encoding its DeepSDF shape); generate a first grid based on the feature set (Column 9, lines 34-54: embodiments employ a coordinate-space framework known in the literature as “DeepSDF” to embed (watertight) vehicle models into a joint, compact shape-space representation with a single neural network (the CSS network discussed above). The concept is to transform input models into SDFs where each value signifies the distance to the closest surface, with positive and negative values representing exterior and interior regions, respectively. The SDF representation is desirable because it is generally easy for a neural network to learn. Eventually, DeepSDF forms a shape space of implicit surfaces with a decoder f that can be queried at spatially-continuous 3D locations x={x1, . . . , xN} with a provided latent code z; Column 14, line 64-Column 15, line 18: optimization module 230 proceeds with the optimization stage (refer once again to FIG. 6). By concatenating the latent vector z (520) with the query 3D grid x (420), the input is formed for the DeepSDF network 620. The DeepSDF network 620 outputs SDF values for each query point on the query grid 420, which are used for the 0-isosurface projection, providing a dense surface-point cloud. The resulting point cloud is then transformed using the estimated pose and scale coming from the pose estimator 610. The points that should not be visible from the given camera view can be filtered using simple back-face culling, since surface normals have been already computed for 0-isosurface projection. At this stage, optimization module 230 can apply 3D losses between the resulting transformed point cloud and the input LIDAR frustum points. The surface point cloud is also used as an input to the differentiable renderer 660, which renders NOCS as RGB and applies 2D losses between the CSS Network's (330) NOCS prediction and the renderer's (660) output NOCS. The latent vector (520) and the pose (630) are then updated, and the process is repeated until termination); generate a plurality of hierarchical layers including a first layer that includes the first grid and a second layer of the plurality of hierarchical layers includes the second grid (Fig. 6; Column 14, lines 7-39: FIG. 6 illustrates a system architecture of a 3D autolabeling pipeline for a 3D object-detection system 170, in accordance with an illustrative embodiment of the invention. As mentioned above, in these embodiments, the CSS Network 330 includes a ResNet18 backbone architecture. In these embodiments, the decoders use bilinear interpolation as an upsampling operation rather than deconvolution to decrease the number of parameters and the required number of computations. Each upsampling is followed by concatenation of the output feature map with the feature map from the previous level and one convolutional layer. Since CSS Network 330 is trained on synthetic input data 260, it can be initialized with ImageNet weights, and the first five layers are frozen to prevent overfitting to peculiarities of the rendered data. Five heads 605 of CSS Network 330 are responsible for the output of U, V, and W channels of the NOCS as well as the object's mask (510) and its latent vector (shape vector 520), encoding its DeepSDF shape… pose estimation block 610 is based on 3D-3D correspondence estimation. The procedure is defined as follows: CSS Network 330 outputs NOCS, mapping each RGB pixel to a 3D location on the object's surface. The NOCS are back-projected onto the LIDAR frustum points 550 using the provided camera parameters. Additionally, CSS Network 330 outputs a latent vector (shape vector 520), which is then fed to the DeepSDF network 620 and transformed to a surface point cloud using 0-isosurface projection, as discussed above); generate a first representation of the object based on the first grid and a position that has three dimensions and is associated with the object (Column 9, lines 34-54: embodiments employ a coordinate-space framework known in the literature as “DeepSDF” to embed (watertight) vehicle models into a joint, compact shape-space representation with a single neural network (the CSS network discussed above). The concept is to transform input models into SDFs where each value signifies the distance to the closest surface, with positive and negative values representing exterior and interior regions, respectively. The SDF representation is desirable because it is generally easy for a neural network to learn. Eventually, DeepSDF forms a shape space of implicit surfaces with a decoder f that can be queried at spatially-continuous 3D locations x={x1, . . . , xN} with a provided latent code z; Column 14, line 64-Column 15, line 18: optimization module 230 proceeds with the optimization stage (refer once again to FIG. 6). By concatenating the latent vector z (520) with the query 3D grid x (420), the input is formed for the DeepSDF network 620. The DeepSDF network 620 outputs SDF values for each query point on the query grid 420, which are used for the 0-isosurface projection, providing a dense surface-point cloud. The resulting point cloud is then transformed using the estimated pose and scale coming from the pose estimator 610. The points that should not be visible from the given camera view can be filtered using simple back-face culling, since surface normals have been already computed for 0-isosurface projection. At this stage, optimization module 230 can apply 3D losses between the resulting transformed point cloud and the input LIDAR frustum points. The surface point cloud is also used as an input to the differentiable renderer 660, which renders NOCS as RGB and applies 2D losses between the CSS Network's (330) NOCS prediction and the renderer's (660) output NOCS. The latent vector (520) and the pose (630) are then updated, and the process is repeated until termination); generate a second representation of the object based on the second grid and an upsampled first grid (Fig. 6; Column 14, lines 7-25: FIG. 6 illustrates a system architecture of a 3D autolabeling pipeline for a 3D object-detection system 170, in accordance with an illustrative embodiment of the invention. As mentioned above, in these embodiments, the CSS Network 330 includes a ResNet18 backbone architecture. In these embodiments, the decoders use bilinear interpolation as an upsampling operation rather than deconvolution to decrease the number of parameters and the required number of computations. Each upsampling is followed by concatenation of the output feature map with the feature map from the previous level and one convolutional layer. Since CSS Network 330 is trained on synthetic input data 260, it can be initialized with ImageNet weights, and the first five layers are frozen to prevent overfitting to peculiarities of the rendered data. Five heads 605 of CSS Network 330 are responsible for the output of U, V, and W channels of the NOCS as well as the object's mask (510) and its latent vector (shape vector 520), encoding its DeepSDF shape); decode the first representation of the object to generate a first decoded representation of the 3D object (Fig. 6; Column 14, lines 7-25: FIG. 6 illustrates a system architecture of a 3D autolabeling pipeline for a 3D object-detection system 170, in accordance with an illustrative embodiment of the invention. As mentioned above, in these embodiments, the CSS Network 330 includes a ResNet18 backbone architecture. In these embodiments, the decoders use bilinear interpolation as an upsampling operation rather than deconvolution to decrease the number of parameters and the required number of computations. Each upsampling is followed by concatenation of the output feature map with the feature map from the previous level and one convolutional layer. Since CSS Network 330 is trained on synthetic input data 260, it can be initialized with ImageNet weights, and the first five layers are frozen to prevent overfitting to peculiarities of the rendered data. Five heads 605 of CSS Network 330 are responsible for the output of U, V, and W channels of the NOCS as well as the object's mask (510) and its latent vector (shape vector 520), encoding its DeepSDF shape).
	Kehl does not explicitly disclose generating a second grid based on a divided volume of the feature set; decode the second representation of the object to generate a second decoded representation of the object; generate a composite representation of the object based on a sum of the first decoded representation of the object and the second decoded representation of the object; and generate a reconstructed volume representing the 3D object based on the composite representation of the object.
	However, Chibane teaches 3D shape reconstruction with implicit functions (Abstract), further comprising generating a second grid based on a current iteration of the subdivided volume of the feature set (Fig. 3; Section 3.1: Then a continuous representation of the shape is obtained by learning a neural function f(z, p)… Applying this procedure recursively n times on the input data X, we create multi-scale deep feature grids F1, .., Fn…of decreasing resolution K = N 2 k−1 , and variable channel dimensionality Fk ∈ N at each stage Fk ⊂ R Fk . The feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder as g(X)); decode the second representation of the object to generate a second decoded representation of the object (Fig. 2; Section 3.2: we extract the learned deep features F1(p), .., Fn(p) from the feature grids at location p. This is only possible because our encoding has a 3D structure aligned with the input data. Since feature grids are discrete, we use trilinear interpolation to query continuous 3D points p ∈ R 3 . In order to encode information of the local neighborhood into the point encoding, even at early grids with small receptive fields (e.g. F1), we extract features at the location of a query point p itself and additionally at surrounding points in a distance d along the Cartesian axe); generate a composite representation of the object based on a sum of the first decoded representation of the object and the second decoded representation of the object (Fig. 2; Section 3.1: Then a continuous representation of the shape is obtained by learning a neural function f(z, p)… Applying this procedure recursively n times on the input data X, we create multi-scale deep feature grids F1, .., Fn…of decreasing resolution K = N 2 k−1 , and variable channel dimensionality Fk ∈ N at each stage Fk ⊂ R Fk . The feature grids Fk at the early stages (starting at k = 1) capture high frequencies (shape detail), whereas feature grids Fk at the late stages (ending at stage k = n) have a large receptive fields, which capture the global structure of the data. This enables to reason about missing or sparse data, while retaining detail when is present in the input. We denote the encoder as g(X)); and generate a reconstructed volume representing the 3D object based on the composite representation of the object (Section 3.2: in this formulation, the network classifies the point based on local and global shape features, instead of point coordinates, which are arbitrary under rotation, translation, and articulation transformations. Furthermore, due to our multi-scale encoding, details can be preserved while reasoning about global shape is still possible; Section 3.4: the goal is to reconstruct a continuous and complete representation, given only a discrete and incomplete 3D input X. First, we use the learned encoder network to construct the multi-scale feature grids g(X) = F1, .., Fn. Then, we use the point-wise decoder network f(g(X, p)) to create occupancy predictions at continuous point locations p ∈ R 3 (cf. Sec. 3.2). In order to construct a mesh, we evaluate the IF-Net on points on a grid of the desired resolution. Then, the resulting high resolution occupancy grid is transformed into a mesh using the classical marching cubes [42] algorithm). Chibane teaches that this will allow for predicting shape in continuous space potentially at any resolution (Section 2). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kehl with the features of above as taught by Chibane so as to allow for predicting shape in continuous space potentially at any resolution as presented by Chibane.
While Chibane teaches to generate a composite representation of the object based on the first decoded representation and the at least one second decoded representation, Chibane does not explicitly disclose to generate a composite representation of the object based on a sum of the first decoded representation and the at least one second decoded representation.
	However, He teaches 3D shape reconstruction (Abstract), further comprising to generate a composite representation of the object based on a sum of the first decoded representation and the at least one second decoded representation (Paragraphs [0025]-[0030]: a second aspect of the invention there is provided a method in a graphics system for decoding a data stream, wherein the data stream comprises at least first and second segments, the first segment comprising two-dimensional graphics data and the second segment comprising information relating to the two-dimensional graphics object, the method comprising: receiving the data stream; forming a first decoded data sequence from the first segment and a second decoded data sequence from the second segment; and outputting the first decoded data sequence and the second decoded data sequence to a display unit for rendering a three-dimensional graphics data by combining the first decoded data sequence and the second decoded data sequence). He teaches that this will allow for completeness in decoding data segments (Paragraphs [0002]-[0017]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kehl, in view of Chibane with the features of above as taught by He so as to allow for completeness as presented by He.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW D SALVUCCI whose telephone number is (571)270-5748. The examiner can normally be reached M-F: 7:30-4:00PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, XIAO WU can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MATTHEW SALVUCCI/Primary Examiner, Art Unit 2613
Read full office action
Prosecution Timeline

Show 5 earlier events
Oct 14, 2025
Response Filed
Nov 05, 2025
Examiner Interview (Telephonic)
Nov 18, 2025
Final Rejection mailed — §103
Feb 06, 2026
Response after Non-Final Action
Feb 17, 2026
Request for Continued Examination
Feb 25, 2026
Response after Non-Final Action
Mar 31, 2026
Examiner Interview (Telephonic)
Apr 02, 2026
Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/644,322
Patent 12639890
SAMPLING LIGHT DIRECTIONS ON NEURAL MATERIALS
2y 1m to grant Granted May 26, 2026
18/466,620
Patent 12632946
IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND IMAGE PROCESSING PROGRAM
2y 8m to grant Granted May 19, 2026
18/493,424
Patent 12633073
EFFICIENT AVATAR CREATION WITH MESH PENETRATION AVOIDANCE
2y 6m to grant Granted May 19, 2026
18/769,972
Patent 12626473
DYNAMIC VIRTUAL OBJECTS
1y 10m to grant Granted May 12, 2026
18/320,600
Patent 12620160
GRAPHICS LIBRARY EXTENSIONS
2y 11m to grant Granted May 05, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
72%
Grant Probability
99%
With Interview (+28.3%)
2y 11m (~2m remaining)
Median Time to Grant
High
PTA Risk
Based on 487 resolved cases by this examiner. Grant probability derived from career allowance rate.