Prosecution Insights
Last updated: April 19, 2026
Application No. 18/311,024

SYSTEMS AND METHODS FOR VISUOTACTILE OBJECT POSE ESTIMATION WITH SHAPE COMPLETION

Non-Final OA: §103, §112
Filed: May 02, 2023
Examiner: ESQUINO, CALEB LOGAN
Art Unit: 2677
Tech Center: 2600 — Communications
Assignee: Honda Motor Co. Ltd.
OA Round: 3 (Non-Final)
Grant Probability: 69% (Favorable)
Expected OA Rounds: 3-4
Median Time to Grant: 3y 0m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 69% (11 granted / 16 resolved; +6.8% vs TC avg, above average)
Interview Lift: +41.7% across resolved cases with interview (strong)
Typical Timeline: 3y 0m average prosecution; 27 applications currently pending
Career History: 43 total applications across all art units

Statute-Specific Performance

§101: 6.1% (-33.9% vs TC avg)
§103: 55.8% (+15.8% vs TC avg)
§102: 17.2% (-22.8% vs TC avg)
§112: 18.6% (-21.4% vs TC avg)

Tech Center averages are estimates; based on career data from 16 resolved cases.

Office Action

Grounds: §103, §112
DETAILED ACTION

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on March 5, 2026 has been entered.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments regarding the 112(a) and 112(b) rejections of claims 1-20 have been considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, claims 3, 10, and 17 also include the claim language of "conditional" input.

Applicant's arguments regarding independent claims 1, 8, and 15 and dependent claim 6, filed March 5, 2026, have been fully considered but they are not persuasive.

On page 10, paragraph 2 of "Remarks", applicant alleges that "Cai does not disclose receiving tactile sensor data, does not generate voxel grids from tactile-derived information, and does not encode tactile information into a latent representation used for shape completion." Cai alone is not relied upon for these teachings. Instead, Cai in view of Dikhale and Herman is relied upon, as their combined teachings suggest that a point cloud derived from tactile information from force or pressure measurements could be transformed into a voxel grid, which could then be used to encode information into a complete latent space for completion.

On page 10, paragraph 2 of "Remarks", applicant alleges that "Cai does not teach or suggest the amended claims' use of visual features as additional input to guide latent-space completion of a visuotactile volumetric representation." Examiner respectfully disagrees. Cai discloses using multiple partial point clouds to create a completed point cloud. Cai does not disclose exactly where these partial point clouds would have originated. However, Cai does disclose that their method aims to complete tasks such as robotics navigation and shape classification (Cai Section 1: "Point cloud completion aims at estimating the corresponding complete point cloud of a partial point cloud, which is an important task and can assist downstream applications such as shape classification [17, 26-28, 34], robotics navigation [12, 31] and scene understanding [1, 2, 10, 19], as raw point clouds are often noisy, sparse and partial."), which would suggest that their invention could be applied to a camera system that would use visual features to create at least one point cloud. Therefore, the system of Cai uses visual features as additional input, and the rejection is maintained.

On page 10, paragraph 3 through page 11, paragraph 1 of "Remarks", applicant alleges that "Incorporating Dikhale's point-level tactile fusion into Cai's visual-only latent completion framework would require substantial redesign of Cai's encoder architecture and training regime, which is neither taught nor suggested by either reference and would not constitute a predictable modification." Examiner respectfully disagrees. Dikhale teaches creating a point cloud from visual and tactile sensor data. Cai teaches receiving as input multiple partial point clouds.
Examiner believes there would be no significant change to the system of Cai required to incorporate the point cloud of Dikhale, as the point cloud of Dikhale could be used as input to the system of Cai. Nowhere in Dikhale is it suggested that the created point cloud is of a different form, and a point cloud is merely a sparse 3D representation of real-world data, which Cai may take as input. Therefore, the rejection is maintained.

On page 11, paragraph 2 of "Remarks", applicant alleges that "Herman does not disclose encoding voxel grids into a latent space, nor does Herman suggest using voxel grids generated from tactile sensor data to infer complete object shapes." Herman alone is not cited to teach encoding voxel grids into a latent space or using voxel grids generated from tactile sensor data to infer complete object shapes. Instead, Herman is cited to show that it is a known technique to transform point clouds into voxel grids. This technique would then be applied to the system of Cai in view of Dikhale to show that the input point clouds of Cai could be transformed into voxel grids and used to infer object shapes. Therefore, the rejection is maintained.

On page 11, paragraph 3 of "Remarks", applicant alleges that "The Office Action's proposed combination therefore relies on treating the cited references as interchangeable building blocks, but the amended independent claims require a specific ordering and integration of operations that the references do not disclose. In particular, the claims require that tactile-derived information be incorporated into the volumetric voxel grid generated from the visual and tactile sensor data and encoded into latent space for shape completion. Neither Cai nor Dikhale teaches or suggests encoding tactile information into a latent space used for completing occluded object geometry, and Herman's voxelization does not address inference or completion at all." Examiner respectfully disagrees. Dikhale teaches that a point cloud can be derived from visual and tactile data. This point cloud could then be used as input to the system of Cai and Herman, and would therefore include the visual and tactile data. Any further processing performed on this point cloud would be based on the visual and tactile data, and would therefore include the tactile information, which would be encoded into the latent space and used for completing occluded object geometry. Therefore, the rejection is maintained.

On page 11, paragraph 3 of "Remarks", applicant alleges that "Arriving at the claimed architecture would require impermissible hindsight reconstruction rather than routine substitution or predictable variation." It must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning. But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper. See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).

On page 12, paragraph 3 of "Remarks", applicant alleges that "Accordingly, even if one were to assume arguendo that incorporating tactile-derived geometric information into a latent-space shape completion pipeline were obvious, the cited references, including Ge, provide no teaching or suggestion to encode force or pressure information indicative of contact interaction into the latent representation, as required by amended claim 6." Examiner respectfully disagrees. The system of Cai in view of Dikhale and Herman as a whole suggests that a point cloud created from tactile information derived from force or pressure measurements (as taught by Dikhale) could be transformed into a voxel grid (as taught by Herman) and then used to create a partial-to-complete latent space (as taught by Cai). This would then suggest that when encoding the voxel grid, tactile information derived from force or pressure measurements would be incorporated into the partial latent vector. Therefore, the rejection is maintained.
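[Illustrative sketch; not part of the Office Action.] A minimal, hypothetical rendering of the pipeline the rejection attributes to the combination: a point cloud fused from visual and tactile (force/pressure) measurements in the manner of Dikhale, voxelized as in Herman, and encoded into a partial latent vector for completion as in Cai. Every name, shape, and the random-projection "encoder" below is an invented stand-in, not code from any cited reference.

```python
import numpy as np

def voxelize(points: np.ndarray, grid: int = 32) -> np.ndarray:
    """Convert an (N, 3) point cloud into a binary occupancy voxel grid
    (the point-cloud-to-voxel-map step attributed to Herman)."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    idx = ((points - mins) / np.maximum(maxs - mins, 1e-9) * (grid - 1)).astype(int)
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vox

# Hypothetical fused visuotactile cloud: depth-camera points plus contact
# points from force/pressure sensors (the point-level fusion attributed to Dikhale).
visual_pts = np.random.rand(500, 3)
tactile_pts = np.random.rand(40, 3)
fused = np.concatenate([visual_pts, tactile_pts], axis=0)

vox = voxelize(fused)

# Stand-in for an encoder in the role of Cai's E_p: any map from the voxel
# grid to a d-dimensional partial latent vector to be completed downstream.
rng = np.random.default_rng(0)
W = rng.standard_normal((vox.size, 128)).astype(np.float32)
z_partial = vox.reshape(-1).astype(np.float32) @ W
```

Under this reading, the tactile points shape the occupancy grid, so anything the encoder produces necessarily reflects them; whether that satisfies the claim language is exactly what the parties dispute.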
Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. Claims 1-2, 8-10, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over “Learning a Structured Latent Space for Unsupervised Point Cloud Completion” (herein after referred to by its primary author, Cai) in view of US20220084241 (herein after referred to by its primary author, Dikhale) and US20200167956 (herein after referred to by its primary author, Herman). In regards to claim 1, Cai teaches a system for Cai Section 4 “We use 8 TITAN GPUs to implement our experiments.” Examiner note: GPUs are known to contain a processor and memory.) that when executed by the processor cause the processor to: receive visual sensor data for a visualized area of an object as at least one point cloud representation; (Cai Figure 2 “Input”) encode the input Cai Figure 2 “Input, Unified Latent Space”; Section 1 “Point cloud completion aims at estimating the corresponding complete point cloud of a partial point cloud, which is an important task and can assist downstream applications such as shape classification [17,26–28,34], robotics navigation [12, 31] and scene understanding [1, 2, 10, 19], as raw point clouds are often noisy, sparse and partial.”; Section 3.1 “Specifically, as illustrated in Figure 2 (b), we map any partial point cloud P into a complete shape code z ∈ Rd and a corresponding occlusion code o ∈ Rd via a point cloud encoder Ep [46] consisting of EdgeConv [40] layers.” Examiner note: This reference shows inputting a point cloud, then using that point cloud to map to a partial latent space, and then fusing that latent space with others to form a unified latent space, which is analogous to a complete latent space. While this reference uses point clouds, voxel grids and point clouds are considered analogous, as they are both 3D representations. Furthermore, Cai takes as input multiple partial point clouds, and describes in section 1 that this disclosure would be applied to systems which attempt to understand camera scenes. 
Therefore, at least one partial point cloud input into their system would be from visual data, which include visual features.) estimate a complete shape of the object based on the complete latent space, wherein the complete shape includes the visualized area of the object and an occluded area of the object; and estimate a six degrees of freedom (6D) pose of the object based on the complete latent vector. (Cai Figure 2 “Completed Point Cloud” Examiner note: A six degrees of freedom pose is a representation of the object, where its orientation and position are known. This reference shows a completed point cloud, the point cloud is a 3D representation of the object in space. Therefore it is a representation of the objects position and orientation, and is therefore analogous to a six degrees of freedom pose.) Cai does not teach receiving tactile sensor data for a visualized area of an object as at least one point cloud representation, the tactile sensor data including force or pressure measurements; and transforming the at least one point cloud representation into an input voxel grid of the visualized area of the object, wherein the input voxel grid is a volumetric representation, and wherein transforming the at least one point cloud representation into the input voxel grid comprises generating the input voxel grid from the visual and tactile sensor data. However, Dikhale teaches a system for visuotactile object pose estimation and shape completion; and receiving visual and tactile sensor data for a visualized area of an object as at least one point cloud representation, the tactile sensor data including force or pressure measurements (Dikhale Figure 3; Paragraph [0061] “The tactile data 114 may be received from the force sensor 206. The force sensor 206 may include tensile force sensors, compressions force sensors, tensile and force compression sensors, or other measurement components.”). Dikhale is considered to be analogous to the claimed invention because they are both in the same field of visuotactile object pose estimation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cai to include the teachings of Dikhale, to provide the benefit of detecting occluded portions of an object grasped by a robot (Dikhale Paragraph [0017] “In the tactile-channel, the point cloud features from the depth image and the features from the tactile sensors are fused at a point level. Fusing the tactile point cloud with the point cloud from the depth image generates a surface point cloud, which allows the network to account for parts occluded by the robot's grippers. Moreover, tactile data also helps capture the object's surface geometry, otherwise self-occluded by the object.”) Furthermore, Herman teaches transforming the at least one point cloud representation into an input voxel grid of the visualized area of the object, wherein the input voxel grid is a volumetric representation, and wherein transforming the at least one point cloud representation into the input voxel grid comprises generating the input voxel grid from the visual and tactile sensor data (Herman Paragraph [0005] “The processor is also configured to generate a 3D point cloud from the image data using a structure-from-motion algorithm. The processor is further configured to remove temporal varying objects from the point cloud using semantic segmentation. 
Also, the processor is configured to convert the point cloud to a voxel map” Examiner note: This reference teaches transforming a point cloud into a voxel map. When considered in combination with Cai in view of Dikhale, the transformed voxel map would include the data from the visual and tactile sensors, as the point cloud which it is transformed from would be created from the tactile sensor data as taught by Dikhale, and any further processing performed on this point cloud would include the tactile sensor data.) Herman is considered to be analogous to the claimed invention because they are both in the same field of using 3D point clouds to model real world scenes. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cai in view of Dikhale to include the teachings of Herman to provide the advantage of reduced data transferred to a server, and a reduced file size. (Herman Paragraph [0055] “An algorithm may then convert the 3D point cloud into a voxel map 339 where key features may be identified. One effect of this step is to reduce the data transferred to a central server per each user. By converting temporally stable classified point cloud points into a voxel map (and later hashed), the process can dramatically reduce the file size.”) In regards to claim 2, Cai in view of Dikhale and Herman teaches the system of claim 1, wherein the mapping is based on visual features extracted from the sensor data. (Cai Figure 2 “Partial Input” Examiner note: The mapping of the unified latent space is based on the input that is passed through the encoder, wherein the input are visual representations of the object.) In regards to claim 8, Cai in view of Dikhale and Herman renders obvious the claim limitations as in the consideration of claim 1. In regards to claim 9, Cai in view of Dikhale and Herman renders obvious the claim limitations as in the consideration of claims 2 and 8. In regards to claim 10, Cai in view of Dikhale and Herman teaches the computer implemented method of claim 8, further comprising extracting visual features from the sensor data (Herman Paragraph [0005] “Also, the processor is configured to convert the point cloud to a voxel map, identify key voxel features“), wherein the predicting the complete latent vector is further based on the visual features as conditional input (Cai Figure 2 “Encoder and Partial Input”). In regards to claim 15, Cai in view of Dikhale and Herman renders obvious the claim limitations as in the consideration of claim 1. In regards to claim 16, Cai in view of Dikhale and Herman renders obvious the claim limitations as in the consideration of claims 2 and 15. In regards to claim 17, Cai in view of Dikhale and Herman renders obvious the claim limitations as in the consideration of claims 10 and 15. Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Cai in view of Dikhale and Herman, and further in view of US20200242736 (herein after referred to by its primary author, Liu) In regards to claim 3, Cai in view of Dikhale and Herman teaches the system of claim 2, wherein the system of claim 1 includes an autoencoder which performs generation (Cai Figure 2 “Encoder” Examiner note: As can be seen with the output in respect to the partial inputs, the NN of this disclosure generates new point cloud data.), and wherein visual features of the sensor data are input into the generator as conditional input (Cai Figure 2 “Partial Input and Encoder”). 
Cai in view of Dikhale and Herman does not teach the autoencoder having a generator. However, Liu teaches the autoencoder having a generator. (Liu Paragraph [0033] “One or more encoders of a generator of the network can extract 408 a class-invariant latent representation corresponding to the target pose from the source image or class.”) Liu is considered to be analogous to the claimed invention because they are both in the same field of using 3D point clouds to model real world scenes. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cai in view of Dikhale and Herman to include the teachings of Liu to provide the advantage of a system which can generate unknown objects, which in this case would be the occluded portions of an object. (Liu Paragraph [0024] “Using such a generator design, a class-invariant latent representation (e.g., an object pose) can be extracted using the content encoder, and a class-specific latent representation (e.g., an object appearance) can be extracted using the class encoder. By feeding the class latent code to the image decoder via the AdaIN layers, the class images are enabled to control the spatially invariant means and variances, while the content image determines the remaining information. At training time, the class encoder can learn to extract a class-specific latent representation from the images of the source classes. At testing or translation time, this generalizes to images of previously unseen class.”) Claims 4-5, 11-12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Cai in view of Dikhale and Herman, and further in view of US20220058827 (herein after referred to by its primary author, Montserrat). In regards to claim 4, Cai in view of Dikhale and Herman teaches the system of claim 1, but fails to teach a first neural network and a second neural network, and wherein the instructions further cause the processor to: provide the first neural network the complete latent vector to estimate a three-dimensional (3D) translation; and provide the second neural network the complete latent vector to estimate a 3D rotation in quaternion, wherein the pose is determined based on the 3D translation residual and the 3D rotation in the quaternion. 
However, Montserrat teaches a first neural network and a second neural network (Montserrat Figure 4A 460 & 470 Examiner note: These Linear Layers are considered separate networks, since they produce different outputs.), and wherein the instructions further cause the processor to: provide the first neural network the complete latent vector to estimate a three-dimensional (3D) translation residual (Montserrat Figure 4A 470; Paragraph [0026] “The output of the first fully-connected layer contains encoded pose parameters as a high-dimensional vector.” Examiner note: While this reference does not explicitly state a latent vector is provided to the linear layer, it can be inferred that since a NN is being used, and NN’s inherently perform latent space calculation, that the high dimensional vector of this disclosure is part of the latent space); and provide the second neural network the complete latent vector to estimate a 3D rotation in quaternion (Montserrat Figure 4A 460; Paragraph [0038] “In equation 1, p=[q|t] and phat=[qhat|that] are the target and estimation rotation quaternion and translation parameters, respectively.”), wherein the pose is determined based on the 3D translation residual and the 3D rotation in the quaternion. (Montserrat Paragraph [0026] “The initial pose estimate, intermediary pose estimates (e.g., refined pose estimates), and/or the final pose estimate may be expressed or defined as a combination of three-dimensional rotation parameters and three-dimensional translation parameters from the final fully-connected layer of the multi-view CNN 190.”) Montserrat is considered to be analogous to the claimed invention because they are both in the same field of determining an orientation and location of an object in an image. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cai in view of Dikhale and Herman to include the teachings of Montserrat to provide the advantage of pose estimation which can be refined iteratively (Montserrat Paragraph [0014] “The pose estimation system may continue to refine the pose via the single-view matching network any number of times or until the difference between the two most recently generated refined poses are sufficiently similar. The single-view matching network may stop the refinement process as being completed when the estimated rotation angle (as estimated by the single-view matching network) is below a threshold.”) In regards to claim 5, Cai in view of Dikhale, Herman, and Montserrat teaches the system of claim 4, wherein the first neural network and the second neural network are also provided visual features extracted from the sensor data. (Montserrat Figure 4A “Rendered and Observed Image” Examiner note: The two linear layers are provided with the images features that have been passed through the previous convolution and linear layers.) In regards to claim 11, Cai in view of Dikhale, Herman, and Montserrat renders obvious the claim limitations as in the consideration of claim 4. In regards to claim 12, Cai in view of Dikhale, Herman, and Montserrat renders obvious the claim limitations as in the consideration of claim 5. In regards to claim 18, Cai in view of Dikhale, Herman, and Montserrat renders obvious the claim limitations as in the consideration of claim 4. Claims 6, 13, and 19 are rejected under 35 U.S.C. 
103 as being unpatentable over Cai in view of Dikhale and Herman, and further in view of “Hand PointNet: 3D Hand Pose Estimation using Point Sets” (herein after referred to by its primary author, Ge). In regards to claim 6, Cai in view of Dikhale and Herman teaches the system of claim 1, wherein the tactile sensor data comprises force or pressure measurements indicative of contact interaction with the object, and wherein encoding the input voxel grid into the partial latent vector encodes information indicative of contact force or pressure associated with the object (Dikhale Figure 3; Paragraph [0061] “The tactile data 114 may be received from the force sensor 206. The force sensor 206 may include tensile force sensors, compressions force sensors, tensile and force compression sensors, or other measurement components.” Examiner note: Cai in view of Dikhale and Herman teaches that the input point cloud could be derived from tactile sensor data, as taught by Dikhale. Then, any further processing performed on this point cloud, such as encoding it into a voxel grid or encoding the voxel grid into the partial latent vector, would still include the tactile sensor data). Cai in view of Dikhale and Herman fails to teach wherein the at least one point cloud representation is normalized based on a centroid of the at least one point cloud representation and a farthest distance of the at least one point cloud representation from the centroid, and wherein the pose is a residual pose based on the centroid and the farthest distance. However, Ge teaches wherein the at least one point cloud representation is normalized based on a centroid of the at least one point cloud representation and a farthest distance of the at least one point cloud representation from the centroid (Ge Page 8420 Equation 1 Examiner note: The point cloud in this reference is normalized based on p-obb, which is defined as the centroid of the point cloud, and Lobb which is the maximum edge of the OBB. The maximum edge of the OBB is analogous to the farthest distance, as they both represent the farthest distance between two points in the point cloud.), and wherein the pose is a residual pose based on the centroid and the farthest distance. (Ge Page 8419 Figure 3) Ge is considered to be analogous to the claimed invention because they are both in the same field of determining an orientation and location of an object in an image. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cai in view of Dikhale and Herman to include the teachings of Ge to provide the advantage of more consistent pose estimation when using widely varying inputs (Ge Page 8418 Section 1 “In order to make our method robust to variations in hand global orientations, we propose to normalize the sampled 3D points in an oriented bounding box without applying any additional network to transform the hand point cloud. The normalized point clouds with more consistent global orientations make the PointNet easier to learn 3D hand articulations.”) In regards to claim 13, Cai in view of Dikhale, Herman, and Ge renders obvious the claim limitations as in the consideration of claim 6. In regards to claim 19, Cai in view of Dikhale, Herman, and Ge renders obvious the claim limitations as in the consideration of claim 6. Claims 7, 14, and 20 are rejected under 35 U.S.C. 
Claims 6, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Cai in view of Dikhale and Herman, and further in view of "Hand PointNet: 3D Hand Pose Estimation using Point Sets" (hereinafter referred to by its primary author, Ge).

In regards to claim 6, Cai in view of Dikhale and Herman teaches the system of claim 1, wherein the tactile sensor data comprises force or pressure measurements indicative of contact interaction with the object, and wherein encoding the input voxel grid into the partial latent vector encodes information indicative of contact force or pressure associated with the object (Dikhale Figure 3; Paragraph [0061]: "The tactile data 114 may be received from the force sensor 206. The force sensor 206 may include tensile force sensors, compressions force sensors, tensile and force compression sensors, or other measurement components." Examiner note: Cai in view of Dikhale and Herman teaches that the input point cloud could be derived from tactile sensor data, as taught by Dikhale. Then, any further processing performed on this point cloud, such as encoding it into a voxel grid or encoding the voxel grid into the partial latent vector, would still include the tactile sensor data.).

Cai in view of Dikhale and Herman fails to teach wherein the at least one point cloud representation is normalized based on a centroid of the at least one point cloud representation and a farthest distance of the at least one point cloud representation from the centroid, and wherein the pose is a residual pose based on the centroid and the farthest distance.

However, Ge teaches wherein the at least one point cloud representation is normalized based on a centroid of the at least one point cloud representation and a farthest distance of the at least one point cloud representation from the centroid (Ge Page 8420, Equation 1 Examiner note: The point cloud in this reference is normalized based on p_obb, which is defined as the centroid of the point cloud, and L_obb, which is the maximum edge of the OBB. The maximum edge of the OBB is analogous to the farthest distance, as they both represent the farthest distance between two points in the point cloud.), and wherein the pose is a residual pose based on the centroid and the farthest distance (Ge Page 8419, Figure 3). Ge is considered to be analogous to the claimed invention because they are both in the same field of determining an orientation and location of an object in an image. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cai in view of Dikhale and Herman to include the teachings of Ge to provide the advantage of more consistent pose estimation when using widely varying inputs (Ge Page 8418, Section 1: "In order to make our method robust to variations in hand global orientations, we propose to normalize the sampled 3D points in an oriented bounding box without applying any additional network to transform the hand point cloud. The normalized point clouds with more consistent global orientations make the PointNet easier to learn 3D hand articulations.").

In regards to claim 13, Cai in view of Dikhale, Herman, and Ge renders obvious the claim limitations as in the consideration of claim 6.

In regards to claim 19, Cai in view of Dikhale, Herman, and Ge renders obvious the claim limitations as in the consideration of claim 6.
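[Illustrative sketch; not part of the Office Action.] The claim-6 normalization, as claimed, uses the centroid and the farthest point distance from it; Ge's Equation 1 instead normalizes within an oriented bounding box (centroid p_obb, longest edge L_obb), which the rejection treats as analogous. A hypothetical sketch of the claimed variant:

```python
import numpy as np

def normalize_point_cloud(points: np.ndarray):
    """Center an (N, 3) point cloud on its centroid and scale by the
    farthest point distance from the centroid, so all points fall in
    the unit ball. Returns (centroid, scale) for later denormalization."""
    centroid = points.mean(axis=0)
    scale = np.linalg.norm(points - centroid, axis=1).max()
    return (points - centroid) / scale, centroid, scale

pts_n, centroid, scale = normalize_point_cloud(np.random.rand(200, 3))
# Any pose estimated in this normalized frame is a residual pose
# expressed relative to (centroid, scale).
```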
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /CALEB L ESQUINO/Examiner, Art Unit 2677 /ANDREW W BEE/Supervisory Patent Examiner, Art Unit 2677
Read full office action

Prosecution Timeline

May 02, 2023
Application Filed
Jul 25, 2025
Non-Final Rejection — §103, §112
Oct 07, 2025
Response Filed
Dec 05, 2025
Final Rejection — §103, §112
Feb 09, 2026
Response after Non-Final Action
Mar 05, 2026
Request for Continued Examination
Mar 09, 2026
Response after Non-Final Action
Mar 13, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602924
Method for Semantic Localization of an Unmanned Aerial Vehicle
2y 5m to grant Granted Apr 14, 2026
Patent 12602813
DEEP APERTURE
2y 5m to grant Granted Apr 14, 2026
Patent 12541857
SYNTHESIZING IMAGES FROM THE PERSPECTIVE OF THE DOMINANT EYE
2y 5m to grant Granted Feb 03, 2026
Patent 12530787
TECHNIQUES FOR DIGITAL IMAGE REGISTRATION
2y 5m to grant Granted Jan 20, 2026
Patent 12518425
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER READABLE MEDIUM
2y 5m to grant Granted Jan 06, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.
Powered by AI — typically takes 5-10 seconds

Prosecution Projections

3-4
Expected OA Rounds
69%
Grant Probability
99%
With Interview (+41.7%)
3y 0m
Median Time to Grant
High
PTA Risk
Based on 16 resolved cases by this examiner. Grant probability derived from career allow rate.

Sign in with your work email

Enter your email to receive a magic link. No password needed.

Personal email addresses (Gmail, Yahoo, etc.) are not accepted.

Free tier: 3 strategy analyses per month