Last updated: May 29, 2026
Application No. 17/828,326
POINT CLOUD COMPRESSION USING OCCUPANCY NETWORKS

Final Rejection §103§112
Filed
May 31, 2022
Priority
Jul 14, 2021 — provisional 63/221,552
Examiner
CHIO, TAT CHI
Art Unit
2486
Tech Center
2400 — Computer Networks
Assignee
Sony Corporation Of America
OA Round
8 (Final)
Interview Optional

— +17.1% interview lift. Examiner has a relatively high allowance rate (73%); +17.1% interview lift. A written response may suffice.
Based on 844 resolved cases, 2023–2026
Examiner Intelligence

CHIO, TAT CHI View full profile →
Grants 73% — above average
Career Allowance Rate
616 granted / 844 resolved
+15.0% vs TC avg
Strong +17% interview lift
Without
With
+17.1%
Interview Lift
resolved cases with interview
Typical timeline
3y 3m
Avg Prosecution
34 currently pending
Career history
888
Total Applications
across all art units
Statute-Specific Performance

§101
2.0%
-38.0% vs TC avg
§103
82.7%
+42.7% vs TC avg
§102
9.0%
-31.0% vs TC avg
§112
1.1%
-38.9% vs TC avg
Black line = Tech Center average estimate • Based on career data from 844 resolved cases
Office Action

§103 §112
DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
 Response to Arguments
Applicant's arguments filed 7/26/2025 have been fully considered but they are not persuasive.
 Applicant argues that the Present Specification provides support in page 8, lines 6-14 for the claimed limitation “…the threshold is 0.50.”
In response, the Examiner respectfully disagrees. Applicant cited two portions (p. 7, lines 4-6 and p. 8, lines 6-14) of the Present Specification to provide support for the limitations “repetitively decide based on a threshold whether data belongs inside or outside a 3D structure to define a surface of a volumetric representation, wherein the threshold is 0.50.”
 However, both of the cited portions do not provide adequate support for the above limitation. Regarding p. 7, lines 4-6, although this portion mentions a threshold of 0.50, the threshold is directed to the probabilities that indicate whether the position is being occupied or not. It is not directed to the threshold that decides whether or not data belongs inside or outside a 3D structure. Regarding p.8, lines 6-14, although this portion describes decides based on a boundary (threshold) whether a point belongs inside or outside a 3D structure, it does not specifically describe that threshold is 0.50. Thus, the Present Specification does not provide support for the claimed limitation “…the threshold is 0.50….”
Applicant argues that the combination of Ma, Mescheder, and Hur does not explicitly teach the one or more occupancy networks implicitly represent 3D surfaces using a continuous decision boundary based on a deep neural network classifier, and repetitively decide based on a threshold whether data belongs inside or outside a 3D structure to define a surface of a volumetric representation.
In response, the Examiner respectfully disagrees.  Mescheder teaches Occupancy Networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.  Abstract of Mescheder.
We would like to reason about the occupancy not only at fixed discrete 3D locations (as in voxel representations) but at every possible 3D point p ∈ R3. We call the resulting function
o : R3 → {0, 1} (1)
the occupancy function of the 3D object. Our key insight is that we can approximate this 3D function with a neural network that assigns to every location p ∈ R3 an occupancy probability between 0 and 1. Note that this network is equivalent to a neural network for binary classification, except that we are interested in the decision boundary which implicitly represents the object’s surface. When using such a network for 3D reconstruction of an object based on observations of that object (e.g., image, point cloud, etc.), we must condition it on the input. Fortunately, we can make use of the following simple functional equivalence: a function that takes an observation x ∈ X as input and has a function from p ∈ R3 to R as output can be equivalently described by a function that takes a pair (p, x) ∈ R3 × X as input and outputs a real number. The latter representation can be simply parameterized by a neural network fθ that takes a pair (p, x) as input and outputs a real number which represents the probability of occupancy: 
fθ : R3 ×X →[0, 1] (2)
We call this network the Occupancy Network.  Section 3.1 of Mescheder.
To learn the parameters θ of the neural network fθ(p, x), we randomly sample points in the 3D bounding volume of the object under consideration: for the i-th sample in a training batch we sample K points pij ∈ R3, j = 1, . . . , K. We then evaluate the mini-batch loss LB at those locations:

    PNG
    media_image1.png
    188
    735
    media_image1.png
    Greyscale

Here, xi is the i’th observation of batch B, oij ≡ o(pij) de- notes the true occupancy at point pij, and L(·, ·) is a cross- entropy classiﬁcation loss. The performance of our method depends on the sampling scheme that we employ for drawing the locations pij that are used for training. In practice, we found that sampling uniformly in- side the bounding box of the object with an additional small padding yields the best results. 
Our 3D representation can also be used for learning probabilistic latent variable models. Towards this goal, we introduce an encoder network gψ(·) that takes locations pij and occupancies oij as input and predicts mean μψ and standard deviation σψ of a Gaussian distribution qψ(z|(pij, oij)j=1:K) on latent z ∈ RL as output.  We optimize a lower bound [21, 39, 59] to the negative log-likelihood of the generative model
p((oij )j=1:K |(pij )j=1:K ):




    PNG
    media_image2.png
    211
    730
    media_image2.png
    Greyscale

where KL denotes the KL-divergence, p0(z) is a prior dis- tribution on the latent variable zi (typically Gaussian) and zi is sampled according to qψ(zi|(pij, oij)j=1:K).  Section 3.2 of Mescheder
For extracting the isosurface corresponding to a new ob- servation given a trained occupancy network, we introduce Multiresolution IsoSurface Extraction (MISE), a hierarchi- cal isosurface extraction algorithm (Fig. 2). By incremen- tally building an octree [30, 46, 66, 73], MISE enables us to extract high resolution meshes from the occupancy network without densely evaluating all points of a high-dimensional occupancy grid.
We ﬁrst discretize the volumetric space at an initial reso- lution and evaluate the occupancy network fθ(p, x) for all p in this grid. We mark all grid points p as occupied for which fθ(p, x) is bigger or equal to some threshold2 τ . Next, we mark all voxels as active for which at least two adjacent grid points have differing occupancy predictions. These are the voxels which would intersect the mesh if we applied the marching cubes algorithm at the current resolution. We subdivide all active voxels into 8 subvoxels and evaluate all new grid points which are introduced to the occupancy grid through this subdivision. We repeat these steps until the desired ﬁnal resolution is reached. At this ﬁnal resolution, we apply the Marching Cubes algorithm [44] to extract an approximate isosurface
{p ∈ R3 | fθ(p, x) = τ }.	(5)
Our algorithm converges to the correct mesh if the occu- pancy grid at the initial resolution contains points from ev- ery connected component of both the interior and the ex- terior of the mesh. It is hence important to take an initial resolution which is high enough to satisfy this condition. In practice, we found that an initial resolution of 323 was sufﬁcient in almost all cases.
The initial mesh extracted by the Marching Cubes algo- rithm can be further reﬁned. In a ﬁrst step, we simplify the mesh using the Fast-Quadric-Mesh-Simpliﬁcation algo- rithm3 [20]. Finally, we reﬁne the output mesh using ﬁrst and second order (i.e., gradient) information. Towards this goal, we sample random points pk from each face of the output mesh and minimize the loss

    PNG
    media_image3.png
    127
    726
    media_image3.png
    Greyscale


where n(pk) denotes the normal vector of the mesh at p . In practice, we set λ = 0.01.  Minimization of the second term in (6) uses second order gradient information and can be efficiently implemented using Double-Bakpropagation.  Section 3.3 of Mescheder.
Further, Figure 2 shows that we first mark all points at a given resolution which have already been evaluated as either occupied (red circles) or unoccupied (cyan diamonds). We then determine all voxels that have both occupied and unoccupied corners and mark them as active (light red) and subdivide them into 8 subvoxels each. Next, we evaluate all new grid points (empty circles) that have been introduced by the subdivision. The previous two steps are repeated until the desired output resolution is reached. Finally, we extract the mesh using the marching cubes algorithm [44], simplify and reﬁne the output mesh using ﬁrst and second order gradient information.

Applicant argues that the combination of Ma, Mescheder, and Hur does not explicitly teach progressively dividing the one or more samples of the 3D space into divisions of various sizes, wherein the dividing is limited by processing power and the non-transitory memory of the device.
In response, the Examiner respectfully disagrees.  Hur teaches the LOD generator 40009 according to the embodiments generates a level of detail (LOD) to perform prediction transform coding. The LOD according to the embodiments is a degree of detail of point cloud content. As the LOD value decrease, it indicates that the detail of the point cloud content is degraded. As the LOD value increases, it indicates that the detail of the point cloud content is enhanced. Points may be classified by the LOD.  [0105].
FIG. 25 illustrates an exemplary octree according to embodiments.  A point cloud encoder (e.g., the point cloud encoder of FIG. 4) and a point cloud decoder (e.g., the point cloud decoder of FIG. 11) according to the embodiments recursively partition a 3D space of point cloud content to create an octree structure.  The left part of FIG. 25 shows an occupancy code 2500 of the octree described with reference to FIGS. 5 and 6. An LOD configurator (e.g., the LOD configurator 2210) according to the embodiments may partition a point cloud region according to the LOD level determined based on the depth of the occupancy code 2500 in order to perform sampling.  The right part of FIG. 25 shows an example 2510 of the process of partitioning the point cloud region according to LOD level 0 (LOD.sub.0) corresponding to depth 2 of the occupancy code 2500. Since the number of occupancy code nodes of depth 2 according to the embodiments is 8, the LOD configurator may recursively partition the region 2520 in the point cloud corresponding to depth 2 into eight equal regions. The regions created by recursive partitioning have cube shapes or cuboid shapes having the same volume. Accordingly, the point cloud region is portioned into eight regions 2520 corresponding to depth 2. As shown in the figure, the occupancy code of the octree of depth 2 includes multiple nodes, and the recursively partitioned regions correspond to child nodes of depth 2. That is, each of the regions 2520 corresponding to depth 2 is partitioned again into eight equal regions, wherein the partitioned region 2530 corresponds to depth 3. Accordingly, one region 2520 corresponding to depth 2 includes eight regions 2530 corresponding to depth 3. The regions 2530 corresponding to depth 3 correspond to LOD level 1 (LOD.sub.1). The region 2530 corresponding to depth 3 is partitioned again into eight equal regions, wherein the partitioned regions 2540 correspond to depth 4. Accordingly, the region 2530 corresponding to depth 3 includes the eight regions 2540 corresponding to depth 4.  [0292] – [0295].
Components of the point cloud data processing devices according to the embodiments described with reference to FIGS. 1 to 39 may be implemented as hardware, software, firmware, or a combination thereof including one or more processors coupled with a memory. The components of the devices according to the embodiments may be implemented as a single chip, for example, a single hardware circuit. Alternatively, the components of the point cloud data processing devices according to the embodiments may be implemented as separate chips. In addition, at least one of the components of the point cloud data processing devices according to the embodiments may include one or more processors capable of executing one or more programs, wherein the one or more programs may include are instructions that execute or are configured to execute one or more of the operations/methods of the point cloud data processing devices described with reference to FIGS. 1 to 39.  [0408].
Because components of the point cloud data device of Hur may be implemented as hardware, software, firmware, or a combination thereof including one or more processors coupled with memory, the dividing process taught by Hur is limited by the processing power and the memory of the device.
Applicant argues that the combination of Ma, Mescheder, and Hur does not explicitly teach the threshold is 0.50.
In response, the Examiner respectfully disagrees.  Mescheder teaches we explain this by the fact that other sampling strategies introduce bias to the model: for example, when sampling an equal number of points inside and outside the mesh, we implicitly tell the model that every object has a volume of 0.5.  Section 4.6.
The volume of the object is the threshold that decides whether the points are inside or outside the mesh. For example, if a point is at a location that is inside the volume of 0.5, then the point is less than the threshold. If a point is at a location that is outside the volume of 0.5, then the point is greater than the threshold.



Applicant argues that the combination of Ma, Mescheder, and Hur does not explicitly teach the function represents a set of classes.
In response, the Examiner respectfully disagrees. Mescheder teaches We would like to reason about the occupancy not only at fixed discrete 3D locations (as in voxel representations) but at every possible 3D point p ∈ R3. We call the resulting function
o : R3 → {0, 1} (1)
the occupancy function of the 3D object. Our key insight is that we can approximate this 3D function with a neural network that assigns to every location p ∈ R3 an occupancy probability between 0 and 1. Note that this network is equivalent to a neural network for binary classification, except that we are interested in the decision boundary which implicitly represents the object’s surface. When using such a network for 3D reconstruction of an object based on observations of that object (e.g., image, point cloud, etc.), we must condition it on the input. Fortunately, we can make use of the following simple functional equivalence: a function that takes an observation x ∈ X as input and has a function from p ∈ R3 to R as output can be equivalently described by a function that takes a pair (p, x) ∈ R3 × X as input and outputs a real number. The latter representation can be simply parameterized by a neural network fθ that takes a pair (p, x) as input and outputs a real number which represents the probability of occupancy: 
fθ : R3 ×X →[0, 1] (2)
We call this network the Occupancy Network.  Section 3.1 of Mescheder.
To learn the parameters θ of the neural network fθ(p, x), we randomly sample points in the 3D bounding volume of the object under consideration: for the i-th sample in a training batch we sample K points pij ∈ R3, j = 1, . . . , K. We then evaluate the mini-batch loss LB at those locations:

    PNG
    media_image1.png
    188
    735
    media_image1.png
    Greyscale

Here, xi is the i’th observation of batch B, oij ≡ o(pij) de- notes the true occupancy at point pij, and L(·, ·) is a cross- entropy classiﬁcation loss. The performance of our method depends on the sampling scheme that we employ for drawing the locations pij that are used for training. In practice, we found that sampling uniformly in- side the bounding box of the object with an additional small padding yields the best results. 
Our 3D representation can also be used for learning probabilistic latent variable models. Towards this goal, we introduce an encoder network gψ(·) that takes locations pij and occupancies oij as input and predicts mean μψ and standard deviation σψ of a Gaussian distribution qψ(z|(pij, oij)j=1:K) on latent z ∈ RL as output.  We optimize a lower bound [21, 39, 59] to the negative log-likelihood of the generative model
p((oij )j=1:K |(pij )j=1:K ):




    PNG
    media_image2.png
    211
    730
    media_image2.png
    Greyscale

where KL denotes the KL-divergence, p0(z) is a prior dis- tribution on the latent variable zi (typically Gaussian) and zi is sampled according to qψ(zi|(pij, oij)j=1:K).  Section 3.2 of Mescheder
The occupancy function represents a set of classes and the classes being  any function that takes an observation x ∈ X as input and has a function from p ∈ R3 to R as output can be equivalently described by a function that takes a pair (p, x) ∈ R3 × X as input and outputs a real number.
Applicant argues that the combination of Ma, Mescheder, and Hur does not explicitly teach an object is recovered based on an input.
In response, the Examiner respectfully disagrees. Mescheder teaches when using such a network for 3D reconstruction of an object based on observations of that object (e.g., image, point cloud, etc.), we must condition it on the input. Fortunately, we can make use of the following simple functional equivalence: a function that takes an observation x ∈ X as input and has a function from p ∈ R3 to R as output can be equivalently described by a function that takes a pair (p, x) ∈ R3 × X as input and outputs a real number. The latter representation can be simply parameterized by a neural network fθ that takes a pair (p, x) as input and outputs a real number which represents the probability of occupancy: 
fθ : R3 ×X →[0, 1] (2)
We call this network the Occupancy Network.  Section 3.1 of Mescheder.

Applicant argues that the combination of Ma, Mescheder, and Hur does not explicitly teach aspects of the object have different amounts of refinement.
In response, the Examiner respectfully disagrees. Mescheder teaches We would like to reason about the occupancy not only at fixed discrete 3D locations (as in voxel representations) but at every possible 3D point p ∈ R3. We call the resulting function
o : R3 → {0, 1} (1)
the occupancy function of the 3D object. Our key insight is that we can approximate this 3D function with a neural network that assigns to every location p ∈ R3 an occupancy probability between 0 and 1. Note that this network is equivalent to a neural network for binary classification, except that we are interested in the decision boundary which implicitly represents the object’s surface. When using such a network for 3D reconstruction of an object based on observations of that object (e.g., image, point cloud, etc.), we must condition it on the input. Fortunately, we can make use of the following simple functional equivalence: a function that takes an observation x ∈ X as input and has a function from p ∈ R3 to R as output can be equivalently described by a function that takes a pair (p, x) ∈ R3 × X as input and outputs a real number. The latter representation can be simply parameterized by a neural network fθ that takes a pair (p, x) as input and outputs a real number which represents the probability of occupancy: 
fθ : R3 ×X →[0, 1] (2)
We call this network the Occupancy Network.  Section 3.1 of Mescheder.
To learn the parameters θ of the neural network fθ(p, x), we randomly sample points in the 3D bounding volume of the object under consideration: for the i-th sample in a training batch we sample K points pij ∈ R3, j = 1, . . . , K. We then evaluate the mini-batch loss LB at those locations:

    PNG
    media_image1.png
    188
    735
    media_image1.png
    Greyscale

Here, xi is the i’th observation of batch B, oij ≡ o(pij) de- notes the true occupancy at point pij, and L(·, ·) is a cross- entropy classiﬁcation loss. The performance of our method depends on the sampling scheme that we employ for drawing the locations pij that are used for training. In practice, we found that sampling uniformly in- side the bounding box of the object with an additional small padding yields the best results. 
Our 3D representation can also be used for learning probabilistic latent variable models. Towards this goal, we introduce an encoder network gψ(·) that takes locations pij and occupancies oij as input and predicts mean μψ and standard deviation σψ of a Gaussian distribution qψ(z|(pij, oij)j=1:K) on latent z ∈ RL as output.  We optimize a lower bound [21, 39, 59] to the negative log-likelihood of the generative model
p((oij )j=1:K |(pij )j=1:K ):




    PNG
    media_image2.png
    211
    730
    media_image2.png
    Greyscale

where KL denotes the KL-divergence, p0(z) is a prior dis- tribution on the latent variable zi (typically Gaussian) and zi is sampled according to qψ(zi|(pij, oij)j=1:K).  Section 3.2 of Mescheder
For extracting the isosurface corresponding to a new ob- servation given a trained occupancy network, we introduce Multiresolution IsoSurface Extraction (MISE), a hierarchi- cal isosurface extraction algorithm (Fig. 2). By incremen- tally building an octree [30, 46, 66, 73], MISE enables us to extract high resolution meshes from the occupancy network without densely evaluating all points of a high-dimensional occupancy grid.
We ﬁrst discretize the volumetric space at an initial reso- lution and evaluate the occupancy network fθ(p, x) for all p in this grid. We mark all grid points p as occupied for which fθ(p, x) is bigger or equal to some threshold2 τ . Next, we mark all voxels as active for which at least two adjacent grid points have differing occupancy predictions. These are the voxels which would intersect the mesh if we applied the marching cubes algorithm at the current resolution. We subdivide all active voxels into 8 subvoxels and evaluate all new grid points which are introduced to the occupancy grid through this subdivision. We repeat these steps until the desired ﬁnal resolution is reached. At this ﬁnal resolution, we apply the Marching Cubes algorithm [44] to extract an approximate isosurface
{p ∈ R3 | fθ(p, x) = τ }.	(5)
Our algorithm converges to the correct mesh if the occu- pancy grid at the initial resolution contains points from ev- ery connected component of both the interior and the ex- terior of the mesh. It is hence important to take an initial resolution which is high enough to satisfy this condition. In practice, we found that an initial resolution of 323 was sufﬁcient in almost all cases.
The initial mesh extracted by the Marching Cubes algorithm can be further reﬁned. In a ﬁrst step, we simplify the mesh using the Fast-Quadric-Mesh-Simpliﬁcation algo- rithm3 [20]. Finally, we reﬁne the output mesh using ﬁrst and second order (i.e., gradient) information. Towards this goal, we sample random points pk from each face of the output mesh and minimize the loss

    PNG
    media_image3.png
    127
    726
    media_image3.png
    Greyscale


where n(pk) denotes the normal vector of the mesh at p . In practice, we set λ = 0.01.  Minimization of the second term in (6) uses second order gradient information and can be efficiently implemented using Double-Bakpropagation.  Section 3.3 of Mescheder.
The refinement uses first and second order (i.e., gradient). The first and second gradient information varies based on the input. Random points pk from each face of the output mesh is sampled and the loss is minimized according the formula (6). The minimization of the second term in (6) uses second order gradient information. Because the sampled random points from each face vary and first gradient and second gradient are computed based on the sampled random points, first gradient and second gradient for each face of the output mesh vary as well. Thus, the amounts of refinements are different for each face of the output mesh.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-2, 4, 6-12, 14, 16-22, 24, 26-30 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Specifically, claims 1, 11, and 21 recite the limitation “…the threshold is 0.50.”  This threshold is directed to the criterion whether data belongs inside or outside a 3D structure to define a surface of a volumetric representation.  The only place the specification discloses a threshold of 0.50 is in [0027] of US 2023/0013421 A1 (the publication of the application).  However, this threshold is the specification is directed to probabilities indicating whether the position is likely occupied or not.  It is not directed to whether or not data belongs inside or outside a 3D structure to define a surface of a volumetric representation.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-2, 4, 6, 10-12, 14, 16,19-20, and 30 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (US 2023/0075442 A1) in view of L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin and A. Geiger, "Occupancy Networks: Learning 3D Reconstruction in Function Space," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 4455-4465, doi: 10.1109/CVPR.2019.00459. (hereinafter “Mescheder”) and Hur et al. (US 2022/0159284 A1).
Consider claim 1, Ma teaches a method (fig. 2) programmed in a non-transitory memory of a device comprising: receiving a bitstream  ([0032] and Fig. 2); determining a probability of a position in the bitstream being occupied with the one or more occupancy networks ([0089], [0106] – [0111], and [0139]); the bitstream comprises one or more samples of a 3D space to be used to generate a 3D object with the one or more occupancy networks ([0062] – [0074], [0087], [0136] – [0137], and [0140]).
However, Ma does not explicitly teach one or more occupancy networks and generating a function based on the probability of positions being occupied.
Mescheder teaches one or more occupancy networks (Section 3.1) and generating a function based on the probability of positions being occupied (Section 3.1 and 3.2), wherein the one or more occupancy networks implicitly represent 3D surfaces using a continuous decision boundary based on a deep neural network classifier (Abstract, Section 3.1 – 3.3, and Fig. 2), and repetitively decide based on a threshold whether data belongs inside or outside a 3D structure to define a surface of a volumetric representation (Abstract, Section 3.1 – 3.3, and Fig. 2), wherein the threshold is approximately 0.50 (when sampling an equal number of points inside and outside the mesh, we implicitly tell the model that every object has a volume of 0.5.  Section 4.6); the function represents a set of classes (Section 3.1 and 3.2), and an object is recovered based on an input (Section 3.1 and 3.2), wherein aspects of the object have different amounts of refinement (Section 3.1 – Section 3.3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of using occupancy networks because such incorporation would not be constrained by the discretization of the 3D space and be used to represent realistic high-resolution meshes.  Conclusion.
However, the combination of Ma and Mescheder does not explicitly teach progressively dividing the one or more samples of the 3D space into divisions of various sizes; and the divisions of various sizes enable outputting point clouds with varying degrees of detail.
Hur teaches progressively dividing the one or more samples of the 3D space into divisions of various sizes ([0105], [0292] – [0295], and Fig. 25), wherein the dividing is limited by processing power and the non-transitory memory of he device (Because components of the point cloud data device of Hur may be implemented as hardware, software, firmware, or a combination thereof including one or more processors coupled with memory, the dividing process taught by Hur is limited by the processing power and the memory of the device.  [0408]); and the divisions of various sizes enable outputting point clouds with varying degrees of detail ([0105], [0292] – [0295], and Fig. 25).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of progressively dividing the one or more samples of the 3D space into divisions of various sizes because such incorporation would help process point cloud data with high efficiency.  [0009].
	Consider claim 2, Ma teaches the bitstream comprises voxels, points, meshes, or projected images of 3D objects ([0045], [0058] – [0065]).
	Consider claim 4, Mescheder teaches the probability is determined using machine learning to implement implicit neural functions (Abstract, Section 3.1 and Section 3.2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of using occupancy networks because such incorporation would not be constrained by the discretization of the 3D space and be used to represent realistic high-resolution meshes.  Conclusion.
	Consider claim 6, Ma teaches the probability is determined based neighboring position classification information ([0089] and [0139]).
Consider claim 10, Mescheder teaches a size of the function is smaller than the bitstream (Section 3.1, Section 3.2, and Section 3.3.  A size of the function with points p as occupied for which fθ(p, x) is bigger or equal to some threshold is smaller than the bitstream.  See Fig. 2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of using occupancy networks because such incorporation would not be constrained by the discretization of the 3D space and be used to represent realistic high-resolution meshes.  Conclusion.
Consider claim 11, claim 11 recites an apparatus comprising: a non-transitory memory for storing an application ([0128] of Ma), the application for performing the method recited in claim 1 (see rejection for claim 1).
Consider claim 12, claim 12 recites the apparatus that performs the method recited in claim 2.  Thus, it is rejected for the same reasons.
Consider claim 14, claim 14 recites the apparatus that performs the method recited in claim 4.  Thus, it is rejected for the same reasons.
Consider claim 16, claim 16 recites the apparatus that performs the method recited in claim 6.  Thus, it is rejected for the same reasons.
Consider claim 20, claim 20 recites the apparatus that performs the method recited in claim 10.  Thus, it is rejected for the same reasons.
Consider claim 30, Mescheder teaches a point of a point cloud of the 3D object is marked active if at least two adjacent grid point have differing occupancy predictions (Section 3.3), and scalability is enabled by voxelizing a volumetric space of the volumetric representation an initial resolution (Section 3.3) and evaluating the one or more occupancy networks for all points in a grid (Section 3.3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of using occupancy networks because such incorporation would not be constrained by the discretization of the 3D space and be used to represent realistic high-resolution meshes.  Conclusion.
Claim(s) 7 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (US 2023/0075442 A1) in view of L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin and A. Geiger, "Occupancy Networks: Learning 3D Reconstruction in Function Space," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 4455-4465, doi: 10.1109/CVPR.2019.00459. (hereinafter “Mescheder”), Hur et al. (US 2022/0159284 A1), and Zhang et al. (US 2022/0385907 A1).
Consider claim 7, the combination of Ma and Mescheder teaches all the limitations in claim 1but does not explicitly teach the probability is used by an entropy encoder to define a code length of an occupancy code of points in 3D space.
Zhang teaches the probability is used by an entropy encoder to define a code length of an occupancy code of points in 3D space ([0099] – [0103]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of using the probability by an entropy encoder to define a code length of an occupancy code of points in 3D space because such incorporation would achieve more efficient coding due to the bitstream of variable length based on the accuracy of a prediction of code.  [0103].
Consider claim 17, Zhang teaches the probability is used by an entropy encoder to define a code length of an occupancy code of points in 3D space ([0099] – [0103]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of using the probability by an entropy encoder to define a code length of an occupancy code of points in 3D space because such incorporation would achieve more efficient coding due to the bitstream of variable length based on the accuracy of a prediction of code.  [0103]. 
Claim(s) 8, 18, 21-22, 26, and 28 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (US 2023/0075442 A1) in view of L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin and A. Geiger, "Occupancy Networks: Learning 3D Reconstruction in Function Space," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 4455-4465, doi: 10.1109/CVPR.2019.00459. (hereinafter “Mescheder”), Hur et al. (US 2022/0159284 A1), and Peng, Songyou, et al. "Convolutional occupancy networks." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16. Springer International Publishing, 2020. (hereinafter “Peng”).
Consider claim 8, the combination of Ma and Mescheder teaches all the limitations in claim 1 but does not explicitly teach the one or more occupancy networks learn the function to recover a specific shape based on a sparse input.
	Peng teaches the one or more occupancy networks learn the function to recover a specific shape based on a sparse input (Section 3.1 and 3.2).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of learning the function to recover a specific shape based on a sparse input because such incorporation would work well on synthetic scenes and allows for larger feature resolutions and save memory.  Conclusion.
Consider claim 18,	Peng teaches the one or more occupancy networks learn the function to recover a specific shape based on a sparse input (Section 3.1 and 3.2).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of learning the function to recover a specific shape based on a sparse input because such incorporation would work well on synthetic scenes and allows for larger feature resolutions and save memory.  Conclusion.
Consider claim 21, Ma teaches a system comprising: an encoder configured for: receiving a bitstream ([0032] and Fig. 2); determining a probability of a position in the bitstream being occupied with the one or more occupancy networks ([0089], [0106] – [0111], and [0139]); the bitstream comprises one or more samples of a 3D space to be used to generate a 3D object with the one or more occupancy networks ([0062] – [0074], [0087], [0136] – [0137], and [0140]).
However, Ma does not explicitly teach one or more occupancy networks and generating a function based on the probability of positions being occupied.
Mescheder teaches one or more occupancy networks (Section 3.1) and generating a function based on the probability of positions being occupied (Section 3.1 and 3.2), wherein the one or more occupancy networks implicitly represent 3D surfaces using a continuous decision boundary based on a deep neural network classifier (Abstract, Section 3.1 – 3.3, and Fig. 2), and repetitively decide based on a threshold whether data belongs inside or outside a 3D structure to define a surface of a volumetric representation (Abstract, Section 3.1 – 3.3, and Fig. 2), wherein the threshold is approximately 0.50 (when sampling an equal number of points inside and outside the mesh, we implicitly tell the model that every object has a volume of 0.5.  Section 4.6); wherein the probability is determined using machine learning to implement implicit neural functions (Abstract, Section 3.1 and Section 3.2), the function represents a set of classes (Section 3.1 and 3.2), and an object is recovered based on an input (Section 3.1 and 3.2), wherein aspects of the object have different amounts of refinement (Section 3.1 – Section 3.3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of using occupancy networks because such incorporation would not be constrained by the discretization of the 3D space and be used to represent realistic high-resolution meshes.  Conclusion.
However, the combination of Ma and Mescheder does not explicitly teach recovering an object based on the function and an input
Peng teaches recovering an object based on the function and an input (Section 3.1 and 3.2).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of learning the function to recover a specific shape based on a sparse input because such incorporation would work well on synthetic scenes and allows for larger feature resolutions and save memory.  Conclusion.
However, the combination of Ma and Mescheder does not explicitly teach progressively dividing the one or more samples of the 3D space into divisions of various sizes; and the divisions of various sizes enable outputting point clouds with varying degrees of detail.
Hur teaches progressively dividing the one or more samples of the 3D space into divisions of various sizes ([0105], [0292] – [0295], and Fig. 25); and the divisions of various sizes enable outputting point clouds with varying degrees of detail ([0105], [0292] – [0295], and Fig. 25).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of progressively dividing the one or more samples of the 3D space into divisions of various sizes because such incorporation would help process point cloud data with high efficiency.  [0009].
	Consider claim 22, Ma teaches the bitstream comprises voxels, points, meshes, or projected images of 3D objects ([0045], [0058] – [0065]).
	Consider claim 26, Ma teaches the probability is determined based neighboring position classification information ([0089] and [0139]).
Consider claim 28, Peng teaches the one or more occupancy networks learn the function to recover a specific shape based on a sparse input (Section 3.1 and 3.2).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of learning the function to recover a specific shape based on a sparse input because such incorporation would work well on synthetic scenes and allows for larger feature resolutions and save memory.  Conclusion.
Claim(s) 27 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (US 2023/0075442 A1) in view of L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin and A. Geiger, "Occupancy Networks: Learning 3D Reconstruction in Function Space," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 4455-4465, doi: 10.1109/CVPR.2019.00459. (hereinafter “Mescheder”), Hur et al. (US 2022/0159284 A1), Peng, Songyou, et al. "Convolutional occupancy networks." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16. Springer International Publishing, 2020. (hereinafter “Peng”), and Zhang et al. (US 2022/0385907 A1).
Consider claim 27, the combination of Ma, Mescheder, and Peng teaches all the limitations in claim 1but does not explicitly teach the probability is used by an entropy encoder to define a code length of an occupancy code of points in 3D space.
Zhang teaches the probability is used by an entropy encoder to define a code length of an occupancy code of points in 3D space ([0099] – [0103]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of using the probability by an entropy encoder to define a code length of an occupancy code of points in 3D space because such incorporation would achieve more efficient coding due to the bitstream of variable length based on the accuracy of a prediction of code.  [0103]. 
Claim(s) 29 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (US 2023/0075442 A1) in view of L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin and A. Geiger, "Occupancy Networks: Learning 3D Reconstruction in Function Space," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 4455-4465, doi: 10.1109/CVPR.2019.00459. (hereinafter “Mescheder”), Hur et al. (US 2022/0159284 A1), Wang et al. (US 2023/0216521 A1), and Cheong et al. (US 2015/0365674 A1).
Consider claim 29, the combination of Ma, Mescheder, and Hur teaches all the limitations in claim 1 but does not explicitly teach the bitstream comprises network coefficients and random samples of the 3D space.
Wang teaches the bitstream comprises network coefficient ([0050] – [0055], [0061], [0069] – [0071], [0114] – [0121]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of including network coefficients in the bitstream because such incorporation would allow memory space to be more efficiently used.  [0067].
Cheong teaches the bitstream comprises random samples of the 3D space ([0187]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of including random samples of the 3D space in the bitstream because such incorporation would improve performance by selecting the appropriate compression mode based on the data in the random access block.  [0112].
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAT CHI CHIO whose telephone number is (571)272-9563. The examiner can normally be reached Monday-Thursday 10am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JAMIE J ATALA can be reached at 571-272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TAT C CHIO/Primary Examiner, Art Unit 2486
Read full office action
Prosecution Timeline

Show 16 earlier events
Jan 10, 2025
Response Filed
Feb 26, 2025
Final Rejection mailed — §103, §112
Apr 07, 2025
Response after Non-Final Action
May 20, 2025
Request for Continued Examination
May 28, 2025
Response after Non-Final Action
Jun 27, 2025
Non-Final Rejection mailed — §103, §112
Jul 26, 2025
Response Filed
Sep 26, 2025
Final Rejection mailed — §103, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/216,206
Patent 12634427
SYSTEM AND METHOD FOR IMAGING A BODY OF A PERSON
2y 10m to grant Granted May 19, 2026
18/426,805
Patent 12634495
Encoding and Decoding Method, and Apparatus
2y 3m to grant Granted May 19, 2026
18/683,662
Patent 12627812
HISTORY-BASED RICE PARAMETER DERIVATIONS FOR WAVEFRONT PARALLEL PROCESSING IN VIDEO CODING
2y 2m to grant Granted May 12, 2026
18/769,745
Patent 12621477
OPTIMIZED POSITION AND CONNECTIVITY CODING FOR DUAL DEGREE MESH COMPRESSION
1y 9m to grant Granted May 05, 2026
18/504,284
Patent 12610128
METHOD AND SYSTEM FOR MEASURING COPLANAR POINT DISTANCES USING AN RGB-D CAMERA
2y 5m to grant Granted Apr 21, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

9-10
Expected OA Rounds
73%
Grant Probability
90%
With Interview (+17.1%)
3y 3m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 844 resolved cases by this examiner. Grant probability derived from career allowance rate.