Prosecution Insights
Last updated: April 19, 2026
Application No. 18/420,913

Method for Determining Pose of Target Object, and Computing Device Implementing the Same

Non-Final OA: §102, §103
Filed: Jan 24, 2024
Examiner: WAIT, CHRISTOPHER
Art Unit: 2683
Tech Center: 2600 — Communications
Assignee: Solomon Technology Corporation
OA Round: 1 (Non-Final)
Grant Probability: 76% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 4m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 76% (above average): 303 granted / 399 resolved, +13.9% vs TC avg
Interview Lift: +13.6% (moderate) for resolved cases with interview
Avg Prosecution: 2y 4m typical timeline; 12 currently pending
Total Applications: 411 across all art units
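As a sanity check, the headline figures above are internally consistent: the 76% career allow rate is 303 granted out of 399 resolved, and adding the +13.6 percentage-point interview lift recovers the roughly 90% with-interview figure. The arithmetic below is illustrative only, using just the numbers shown on this page:

```python
# Reconstruct the dashboard's headline percentages from the raw counts above.
granted = 303
resolved = 399

career_allow_rate = granted / resolved      # 0.7594 -> displayed as 76%
interview_lift = 0.136                      # +13.6 percentage points
with_interview = career_allow_rate + interview_lift

print(f"Career allow rate: {career_allow_rate:.1%}")  # 75.9%
print(f"With interview:    {with_interview:.1%}")     # 89.5%, displayed rounded to 90%
```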

Statute-Specific Performance

§101: 11.2% (-28.8% vs TC avg)
§103: 43.4% (+3.4% vs TC avg)
§102: 23.3% (-16.7% vs TC avg)
§112: 17.7% (-22.3% vs TC avg)
Tech Center averages are estimates. Based on career data from 399 resolved cases.
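The per-statute deltas above can be cross-checked against a single baseline: subtracting each stated delta from the examiner's rate recovers the same 40% Tech Center average for all four statutes. This is a consistency check on the page's own numbers, not independent USPTO data:

```python
# Back out the Tech Center average from each statute's rate and delta.
# rate = tc_avg + delta  =>  tc_avg = rate - delta
stats = {                       # statute: (examiner rate %, delta vs TC avg)
    "101": (11.2, -28.8),
    "103": (43.4, +3.4),
    "102": (23.3, -16.7),
    "112": (17.7, -22.3),
}

for statute, (rate, delta) in stats.items():
    tc_avg = rate - delta
    print(f"§{statute}: TC average ≈ {tc_avg:.1f}%")  # 40.0% in every case
```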

Office Action

Rejections: §102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 1/24/24, 3/25/25, and 6/24/25 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-2, 4-9, and 11-14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by LIU YUAN ET AL: "Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images", 11 November 2022 (2022-11-11), Springer International Publishing, XP047639542, ISBN 978-3-031-19824-3, vol. 13692, pages 298-315, DOI: https://doi.org/10.1007/978-3-031-19824-3_18.

Regarding claim 1:
LIU discloses a method for determining a pose of a target object, the method to be implemented by a computing device that stores a database related to a specific type to which the target object belongs, the database including a plurality of template images each containing a reference object that belongs to the specific type (page 299, [2]: "Given input reference images of an arbitrary object with known poses, Gen6D is able to directly predict its object pose in any query images"), the template images corresponding respectively to different deflection angles that are relative to a reference angle in which the reference object is captured (page 302, [4]: "Given Nr images of an object with known camera poses, called reference images, our target is to predict the pose of the object in a query image"; page 304, [4]: "To account for in-plane rotations, every reference image is rotated by Na predefined angles and all rotated versions are used in the element-wise product with the query image"), the method comprising:

obtaining an input image that contains the target object belonging to the specific type (page 302, [4]: "our target is to predict the pose of the object in a query image");

selecting a matching image that best matches with the input image from among the template images in the database based on an appearance of the target object in the input image (page 300, [3]: "compare the query image with every reference image to produce similarity scores and select the reference image with highest similarity score"), wherein the matching image is one of the template images in which an angle of the reference object shown in the template image is closest to an angle of the target object shown in the input image (page 304, [2]: "Viewpoint selection aims to select a reference image whose viewpoint is the nearest to the query image");

performing a keypoint matching procedure based on the input image and the matching image, so as to identify a plurality of first feature points that are shown in the input image and that are related to the appearance of the target object, and a plurality of second feature points that are shown in the matching image and that respectively match with the first feature points (page 305, [5]: "to construct the features on these vertices, we first select Nn = 6 reference images that are near to the input pose. We extract feature maps on these selected reference images by a 2D CNN. Then, these feature maps are unprojected into the 3D volume and we compute the mean and variance of features among all reference images as features for volume vertices. For the query image, we also extract its feature map by the same 2D CNN, unproject feature map into the 3D volume using the input pose and concatenate the unprojected query features with the mean and variance of reference image features"); and

generating a pose-determination result that indicates the pose of the target object based on relationships among the first feature points and the second feature points (page 305, [5]: "Finally, we apply a 3D CNN on the concatenated features of the volume to predict a pose residual to update the input pose").

Regarding claim 2:
LIU discloses the database further including a plurality of reference feature datasets that correspond respectively to the template images, each of the reference feature datasets indicating an appearance feature of the reference object at the angle shown in the corresponding one of the template images, wherein selecting a matching image includes: generating a target feature dataset that corresponds to the input image based on a plurality of feature parts of the target object shown in the input image, where the target feature dataset indicates an appearance feature of the target object at the angle shown in the input image; calculating, for each of the reference feature datasets, a degree of matching between the target feature dataset and the reference feature dataset; and selecting, as the matching image, one of the template images that corresponds to the reference feature dataset having a highest degree of matching with the target feature dataset among the template images (page 300, [3]: "use neural networks to pixel-wisely compare the query image with every reference image to produce similarity scores and select the reference image with highest similarity score. This pixel-wise comparison enables our selector to concentrate on object regions and reduces the influence of cluttered background. Furthermore, we add global normalization layers and self-attention layers to share similarity information cross different reference images. These two kinds of layers enable every reference images to commute with each other, which provides context information for the selector to select the most similar reference image"; page 304, [2,3,4]: "Viewpoint selection aims to select a reference image whose viewpoint is the nearest to the query image. Meanwhile, we will estimate an in-plane rotation between the query image and the selected reference image. We approximately regard the viewpoint of the selected reference image as the viewpoint of the query image, which along with the estimated in-plane rotation forms an initial rotation for the object pose. As shown in Fig. 5, we design a viewpoint selector to compare the query image with every reference image to compute similarity scores. Specifically, we first extract feature maps by applying a VGG [52]-11 on reference images and the query image. Then, for every feature map of reference images, we compute its element-wise product with the feature map of the query image to produce a correlation score map. Finally, the correlation score map is processed by a similarity network to produce a similarity score and a relative in-plane rotation to align the query image with the reference image. In our viewpoint selector, we have three special designs. In-Plane Rotation. To account for in-plane rotations, every reference image is rotated by Na predefined angles and all rotated versions are used in the element-wise product with the query image").

Regarding claim 4:

LIU discloses the template images including an original template image and a plurality of produced template images, the method further comprising, before obtaining an input image and selecting a matching image: obtaining the original template image; generating the reference feature dataset that corresponds to the original template image based on the original template image; generating the produced template images by rotating the original template image multiple times, respectively; and generating the reference feature datasets that correspond respectively to the produced template images based on the produced template images (page 302, [4]: "Given Nr images of an object with known camera poses, called reference images, our target is to predict the pose of the object in a query image.
The object pose here means a translation t and a rotation R that transform the object coordinate xobj to the camera coordinate xcam = Rxobj + t. All the intrinsics parameters of images are already known"; page 304, [2,3,4]: "Viewpoint selection aims to select a reference image whose viewpoint is the nearest to the query image. Meanwhile, we will estimate an in-plane rotation between the query image and the selected reference image. We approximately regard the viewpoint of the selected reference image as the viewpoint of the query image, which along with the estimated in-plane rotation forms an initial rotation for the object pose. As shown in Fig. 5, we design a viewpoint selector to compare the query image with every reference image to compute similarity scores. Specifically, we first extract feature maps by applying a VGG [52]-11 on reference images and the query image. Then, for every feature map of reference images, we compute its element-wise product with the feature map of the query image to produce a correlation score map. Finally, the correlation score map is processed by a similarity network to produce a similarity score and a relative in-plane rotation to align the query image with the reference image. In our viewpoint selector, we have three special designs. In-Plane Rotation. To account for in-plane rotations, every reference image is rotated by Na predefined angles and all rotated versions are used in the element-wise product with the query image").

Regarding claim 5:

LIU discloses wherein generating the produced template images includes generating the produced template images each by rotating the original template image at the corresponding one of the deflection angles that corresponds to the produced template images (page 302, [4]: "Given Nr images of an object with known camera poses, called reference images, our target is to predict the pose of the object in a query image. The object pose here means a translation t and a rotation R that transform the object coordinate xobj to the camera coordinate xcam = Rxobj + t. All the intrinsics parameters of images are already known"; page 304, [2,3,4]: "Viewpoint selection aims to select a reference image whose viewpoint is the nearest to the query image. Meanwhile, we will estimate an in-plane rotation between the query image and the selected reference image. We approximately regard the viewpoint of the selected reference image as the viewpoint of the query image, which along with the estimated in-plane rotation forms an initial rotation for the object pose. As shown in Fig. 5, we design a viewpoint selector to compare the query image with every reference image to compute similarity scores. Specifically, we first extract feature maps by applying a VGG [52]-11 on reference images and the query image. Then, for every feature map of reference images, we compute its element-wise product with the feature map of the query image to produce a correlation score map. Finally, the correlation score map is processed by a similarity network to produce a similarity score and a relative in-plane rotation to align the query image with the reference image. In our viewpoint selector, we have three special designs. In-Plane Rotation. To account for in-plane rotations, every reference image is rotated by Na predefined angles and all rotated versions are used in the element-wise product with the query image").

Regarding claim 6:
LIU discloses the database further including a plurality of reference pose datasets that correspond respectively to the template images, each of the reference pose datasets indicating a pose of the reference object shown in the corresponding one of the template images, wherein: performing a keypoint matching procedure further includes generating a calibration dataset based on the first feature points and the second feature points, where the calibration dataset indicates the relationships among the first feature points and the second feature points, and the pose-determination result is generated based on the reference pose dataset that corresponds to the matching image, and on the calibration dataset (page 305, [4,5]: "Specifically, since the objects are already normalized inside an unit sphere at the origin, we build a volume within the unit cube at the origin with Sv^3 = 32^3 vertices. As shown in Fig. 6 (a), to construct the features on these vertices, we first select Nn = 6 reference images that are near to the input pose. We extract feature maps on these selected reference images by a 2D CNN. Then, these feature maps are unprojected into the 3D volume and we compute the mean and variance of features among all reference images as features for volume vertices. For the query image, we also extract its feature map by the same 2D CNN, unproject feature map into the 3D volume using the input pose and concatenate the unprojected query features with the mean and variance of reference image features. Finally, we apply a 3D CNN on the concatenated features of the volume to predict a pose residual to update the input pose. Similarity Approximation. Instead of regressing the rigid pose residual directly, we approximate it with a similarity transformation, as shown in Fig. 6 (b). The approximate similarity transformation consists of a 2D in-plane offset, a scale factor and a residual 3D rotation. The reason of using this approximation is that it avoids direct regression of the 3D translation from the red circle to the solid green circle in Fig. 6, which is out of the scope of the feature volume. Instead, we regress a similarity transformation from red circle to dotted green circle, which can be easily inferred from the features defined in the volume. More details can be found in the supplementary materials. In our implementation, we apply the refiner iteratively 3 times by default.").

Regarding claim 7:

LIU discloses wherein each one of the first feature points is homogeneous with one of the second feature points in a one-to-one relationship, and the calibration dataset indicates, for each one of the first feature points, the relationship between the first feature point and the respective one of the second feature points using six degrees of freedom in three-dimensional space (page 305, [4,5]: "Specifically, since the objects are already normalized inside an unit sphere at the origin, we build a volume within the unit cube at the origin with Sv^3 = 32^3 vertices. As shown in Fig. 6 (a), to construct the features on these vertices, we first select Nn = 6 reference images that are near to the input pose. We extract feature maps on these selected reference images by a 2D CNN. Then, these feature maps are unprojected into the 3D volume and we compute the mean and variance of features among all reference images as features for volume vertices. For the query image, we also extract its feature map by the same 2D CNN, unproject feature map into the 3D volume using the input pose and concatenate the unprojected query features with the mean and variance of reference image features. Finally, we apply a 3D CNN on the concatenated features of the volume to predict a pose residual to update the input pose. Similarity Approximation. Instead of regressing the rigid pose residual directly, we approximate it with a similarity transformation, as shown in Fig. 6 (b). The approximate similarity transformation consists of a 2D in-plane offset, a scale factor and a residual 3D rotation. The reason of using this approximation is that it avoids direct regression of the 3D translation from the red circle to the solid green circle in Fig. 6, which is out of the scope of the feature volume. Instead, we regress a similarity transformation from red circle to dotted green circle, which can be easily inferred from the features defined in the volume. More details can be found in the supplementary materials. In our implementation, we apply the refiner iteratively 3 times by default."; page 306, [1]: "Discussion. The key difference between our volume-based refiner and other pose refiners [29,56,73] is that our pose refiner does not require rendering an image on the input pose, which thus is more suitable for the model-free pose estimation. Meanwhile, since the 3D volume is constructed by multiple reference images with different poses, our volume-based refiner is able to know the image features under different poses and infer how pose changes affect the image features for unseen objects. In comparison, previous pose refiners [29,56,73] only compare a rendered image with the input query image to compute a pose residual. Such a 2D image does not provide enough 3D structure information to infer how pose changes affect image patterns, especially for unseen objects. Thus, it is hard for these methods to predict correct pose residuals for unseen objects"; page 312, [1]: "we propose an easy-to-use 6-DoF pose estimator Gen6D for unseen objects. To predict poses for unseen objects, Gen6D does not require the object model but only needs some posed images of the object to predict its pose in arbitrary environments. In Gen6D, we design a novel viewpoint selector and a novel volume-based pose refiner. Experiments demonstrate the superior performance of Gen6D estimator in predicting poses for unseen objects in the model-free setting").

Regarding claim 8:
Claim 8 is rejected for the same reasons and rationale as provided above for claim 1.
Claim 9 is rejected for the same reasons and rationale as provided above for claim 2.
Claim 11 is rejected for the same reasons and rationale as provided above for claim 4.
Claim 12 is rejected for the same reasons and rationale as provided above for claim 5.
Claim 13 is rejected for the same reasons and rationale as provided above for claim 6.
Claim 14 is rejected for the same reasons and rationale as provided above for claim 7.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over LIU as applied to claim 1 above, and further in view of US PG Pub 2022/0189049 to Watson et al.

Regarding claim 3:
LIU does not disclose calculating a Minkowski distance between the target feature dataset and the reference feature dataset. However, Watson, in the same art of pose calculation/determination, discloses calculating a Minkowski distance between the target feature dataset and the reference feature dataset ("other distance metrics may be implemented (e.g., l.sub.2 distance, Manhattan distance, Minkowski distance, Hamming distance, etc.)", paragraph 61).

Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention to have modified LIU's pose estimation, by the teaching of Watson, to include calculating a Minkowski distance between the target feature dataset and the reference feature dataset, for the following reasons: (a) during training of the model, a secondary depth estimation network that takes a single image input may be used to guide the training process to reduce overfitting (paragraph 5, Watson); and (b) for one of ordinary skill in the art it would be a simple substitution of one known element, Minkowski distance, for another distance metric to obtain predictable results.

Regarding claim 10:

Claim 10 is rejected for the same reasons and rationale as provided above for claim 3.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US PG Pub 20130322767 to CHAO et al. discloses a method for estimating camera pose that includes: obtaining an image of a location captured via a camera, where the image includes a target object and edge line features outside of the target object; and calculating a pose of the camera with respect to the target object based on the edge line features.
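For reference, the Minkowski distance at issue in claims 3 and 10 is the p-norm family that Watson's paragraph 61 treats as interchangeable metrics: p=1 reduces to Manhattan distance and p=2 to Euclidean distance. A minimal sketch, not either reference's actual implementation:

```python
def minkowski_distance(x, y, p=2):
    """Minkowski (p-norm) distance between two equal-length feature vectors.

    p=1 gives Manhattan distance and p=2 gives Euclidean distance, which is
    why swapping one metric for another is a 'simple substitution'.
    """
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# A 3-4-5 right triangle: Euclidean (p=2) distance is 5.0, Manhattan (p=1) is 7.0.
print(minkowski_distance([0, 0], [3, 4], p=2))  # 5.0
print(minkowski_distance([0, 0], [3, 4], p=1))  # 7.0
```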
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER D. WAIT, Esq., whose telephone number is (571) 270-5976. The examiner can normally be reached Monday-Friday, 9:30-6:00. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Abderrahim Merouan, can be reached at (571) 270-5254. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHRISTOPHER WAIT/
Primary Examiner, Art Unit 2683
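To make the claim-1 mapping concrete: the "selecting a matching image" step the examiner reads onto LIU reduces to scoring the query image against every template and taking the argmax. The sketch below uses normalized pixel correlation purely for illustration; Gen6D itself compares learned CNN feature maps through a similarity network, and every name here is hypothetical:

```python
import numpy as np

def correlation_score(query, template):
    """Mean of the element-wise product of two z-scored images --
    a toy stand-in for Gen6D's learned similarity network."""
    q = (query - query.mean()) / (query.std() + 1e-8)
    t = (template - template.mean()) / (template.std() + 1e-8)
    return float((q * t).mean())

def select_matching_image(query, templates):
    """Return the index of the best-matching template, i.e. the one whose
    deflection angle is presumed closest to the target object's angle."""
    scores = [correlation_score(query, t) for t in templates]
    return int(np.argmax(scores))

# Toy usage: three synthetic "templates"; the query equals template 1,
# so the selector should pick index 1.
rng = np.random.default_rng(0)
templates = [rng.random((8, 8)) for _ in range(3)]
query = templates[1].copy()
print(select_matching_image(query, templates))
```

The argmax over per-template scores is the same structure as the quoted "select the reference image with highest similarity score"; only the scoring function differs.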

Prosecution Timeline

Jan 24, 2024: Application Filed
Dec 27, 2025: Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597085: Use of Imperfect Patterns to Encode Data on Surfaces (granted Apr 07, 2026; 2y 5m to grant)
Patent 12591964: Computational Method and System for Improved Identification of Breast Lesions (granted Mar 31, 2026; 2y 5m to grant)
Patent 12590797: Method to Requalify Die After Storage (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586148: Efficient Image Warping Based on User Input (granted Mar 24, 2026; 2y 5m to grant)
Patent 12585906: Image Forming Apparatus (granted Mar 24, 2026; 2y 5m to grant)

Study what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 76%
With Interview: 90% (+13.6%)
Median Time to Grant: 2y 4m
PTA Risk: Low
Based on 399 resolved cases by this examiner. Grant probability derived from career allow rate.
