Last updated: May 29, 2026
Application No. 18/475,720
BOUNDING BOX TRANSFORMATION FOR OBJECT DEPTH ESTIMATION IN A MULTI-CAMERA DEVICE

Non-Final OA §103
Filed: Sep 27, 2023
Priority: Aug 10, 2023 (GR 20230100669)
Examiner: BUDISALICH, ANDREW STEVEN
Art Unit: 2662
Tech Center: 2600 (Communications)
Assignee: Snap Inc.
OA Round: 3 (Non-Final)
Interview Optional

Interview lift (+11.0%) is below the 15.0% threshold. A written response is recommended.
Based on 52 resolved cases, 2023–2026
Examiner Intelligence

BUDISALICH, ANDREW STEVEN
Grants 79% — above average
Career Allowance Rate
41 granted / 52 resolved
+16.8% vs TC avg
Interview Lift: +11.0% (moderate; resolved cases with interview vs. without)
Typical timeline
2y 8m
Avg Prosecution
22 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
5.3%
-34.7% vs TC avg
§103
93.5%
+53.5% vs TC avg
§112
1.2%
-38.8% vs TC avg
Tech Center averages are estimates. Based on career data from 52 resolved cases.
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/06/2026 has been entered.

Status of Claims
Claims 1-20 are pending.

Response to Arguments
Applicant’s arguments, see p.9-18, filed 04/02/2026, with respect to the rejections of Claims 1-20 under 35 U.S.C. 103 have been fully considered but are moot in light of the newly presented analyses of the claims. The Request for Continued Examination has reset prosecution, and new grounds of rejection are presented below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 2, 6, 9, 10, 14, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 20190354746 A1) in view of Iqbal et al. (US 20210248772 A1) and Kim et al. (US 20230186512 A1).

Regarding Claim 1, Zhang teaches "A device comprising: a processor; a first image sensor; a second image sensor; and a memory storing instructions thereon, which, when executed by the processor, cause the device to perform operations comprising: obtaining a first image from the first image sensor; obtaining a second image from the second image sensor"; (Zhang, Paras. 30-31, teaches an apparatus comprising a processor and memory for storing instructions that are executed by the processor wherein the apparatus includes a first image sensor which is configured to capture the first image and a second image sensor configured to capture the second image);
"processing the first image with an object detector to identify coordinates of a first region of interest, the first region of interest indicating a position of an object depicted in the first image"; (Zhang, Paras. 6-7 and 59, teaches performing target object detection on a first image to obtain a first target region wherein key point detection may be performed on the first target region to obtain position information of the key points of the target object in the first image, i.e., process the first image with an object detector to identify coordinates of a first region of interest wherein the region indicates a position of the object in the first image);
"processing the area of the first image corresponding with the first region of interest with a landmark detector to determine two-dimensional (2-D) positional data of one or more landmarks associated with the object"; (Zhang, Para. 59, teaches key point detection being performed on the first target region to obtain position information of the key points of the target object in the first image wherein the key points include points at specified positions on the target object, i.e., process the area of the first image corresponding to the first region with a landmark detector as the key point detector to determine 2D positional data of the landmarks associated with the object);
"processing the area of the second image corresponding with the second region of interest with the landmark detector to determine 2-D positional data of one or more landmarks associated with the object"; (Zhang, Para. 59, teaches key point detection being performed on the second target region to obtain position information of the key points of the target object in the second image wherein the key points include points at specified positions on the target object, i.e., process the area of the second image corresponding to the second region with a landmark detector as the key point detector to determine 2D positional data of the landmarks associated with the object).
However, Zhang does not explicitly teach "deriving, with a monocular pose estimator comprising a neural network with convolutional layers configured to extract hierarchical visual features from image data, first three dimensional (3-D) positional data of the one or more landmarks, using as input to the monocular pose estimator at least the first image, the 2-D positional data of the one or more landmarks, and parameters of the device; using the 3-D positional data of the one or more landmarks, determining coordinates of a second region of interest, the second region of interest indicating a position of the object as depicted in the second image; and using a triangulation calculation to derive second 3-D positional data for the one or more landmarks using as input to the triangulation calculation i) the 2-D positional data of the one or more landmarks determined from processing the area of the first image corresponding with the first region of interest, ii) the 2-D positional data of the one or more landmarks determined from processing the area of the second image corresponding with the second region of interest, and iii) parameters of the device”.
In an analogous field of endeavor, Iqbal teaches "deriving, with a monocular pose estimator comprising a neural network with convolutional layers configured to extract hierarchical visual features from image data, first three dimensional (3-D) positional data of the one or more landmarks, using as input to the monocular pose estimator at least the first image, the 2-D positional data of the one or more landmarks, and parameters of the device"; (Iqbal, FIG. 2 and Abstract and Paras. 37-42, 46-49, and 74-75, teaches estimating a 3D pose of a body or object from a single 2D image wherein a convolutional neural network infers a 2.5D pose from a given input image, in which the final output is the 3D reconstruction as the scale normalized 3D pose which includes 3D locations of body joints with respect to the camera wherein the 2.5D pose representation has several key features, i.e., monocular pose estimator being the estimation of 3D pose from a single image comprises a neural network with convolutional layers to extract hierarchical visual features from the image data being the CNN which infers 2.5D pose with key features to derive 3D position data of landmarks being the 3D pose including 3D locations of joints, and wherein an RGB image is input to the neural network to output 2D heatmaps and depth maps which are converted to a vector of 2D pose coordinates wherein the 2D pose coordinates, relative depths, and intrinsic camera parameters are given and used to reconstruct the 3D pose using the second layer of the network, i.e., input the image, 2D position data of the landmarks being the 2D pose coordinates, and device parameters being the intrinsic camera parameters to the monocular pose estimator network).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Zhang by including the derivation of 3D positional data of the one or more landmarks using a monocular pose estimator comprising convolutional layers and using an image, 2D position data, and device parameters as input taught by Iqbal. One of ordinary skill in the art would be motivated to combine the references since it improves the generalizability of the model (Iqbal, Para. 71, teaches the motivation of combination to be to improve generalizability of the models to previously unseen environments).
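For orientation only, the reconstruction step attributed to Iqbal above (2D pose coordinates, depths, and intrinsic camera parameters yielding 3D joint locations) can be sketched as a standard pinhole back-projection. This is a minimal illustration and not Iqbal's network or the claimed monocular pose estimator: the function name `back_project`, the intrinsics `fx`, `fy`, `cx`, `cy`, the sample values, and the assumption that an absolute depth is already available for each landmark are all hypothetical.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a known depth to camera-frame 3-D coordinates.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy), all in pixels; lens distortion is ignored.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical 2-D landmarks (pixels) and per-landmark depths (metres).
landmarks_2d = [(320.0, 240.0), (420.0, 240.0)]
depths = [0.5, 0.5]

points_3d = [
    back_project(u, v, z, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    for (u, v), z in zip(landmarks_2d, depths)
]
```

A landmark at the principal point maps onto the optical axis, and a landmark 100 pixels to its right at the same depth lands about 0.1 m to the side under these assumed intrinsics.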
However, the combination of references of Zhang in view of Iqbal does not explicitly teach "using the 3-D positional data of the one or more landmarks, determining coordinates of a second region of interest, the second region of interest indicating a position of the object as depicted in the second image; and using a triangulation calculation to derive second 3-D positional data for the one or more landmarks using as input to the triangulation calculation i) the 2-D positional data of the one or more landmarks determined from processing the area of the first image corresponding with the first region of interest, ii) the 2-D positional data of the one or more landmarks determined from processing the area of the second image corresponding with the second region of interest, and iii) parameters of the device”.
In an analogous field of endeavor, Kim teaches "using the 3-D positional data of the one or more landmarks, determining coordinates of a second region of interest, the second region of interest indicating a position of the object as depicted in the second image"; (Kim, Paras. 10, 53, and 68, teaches obtaining a second region of interest from the second image based on the first skeleton data including 3D position coordinates of a key point to 2D position coordinates on the second image and identifying a block including the 2D position coordinates in the second image as the second region of interest wherein the regions of interest comprise an object and at least one key point of the object, i.e., use 3D positional data of the landmarks to determine coordinates of a second region of interest indicating the position of the object in the second image);
"and using a triangulation calculation to derive second 3-D positional data for the one or more landmarks using as input to the triangulation calculation i) the 2-D positional data of the one or more landmarks determined from processing the area of the first image corresponding with the first region of interest, ii) the 2-D positional data of the one or more landmarks determined from processing the area of the second image corresponding with the second region of interest, and iii) parameters of the device"; (Kim, FIG. 7 and Para. 73, teaches a triangulation method used to obtain 3D position coordinates by using two 2D position coordinates wherein the 2D position coordinates used are projected onto the plane of the first image and the 2D position coordinates obtained from the second image and wherein information about the relative position between the first camera and the second camera as well as their depth and focal length are used, i.e., triangulation calculation to derive second 3D position data of the key points using input of the 2D position data of the key points from the regions of interest of both the first and second image as well as parameters of the device). 
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Zhang and Iqbal, by including the triangulation of 3D position data of landmarks using 2D position data of landmarks of a first region of a first image with 2D position data of landmarks of a second region of a second image with parameters of the device taught by Kim. One of ordinary skill in the art would be motivated to combine the references since it reduces power consumption (Kim, Para. 6, teaches the motivation of combination to be to reduce the amount of power consumed when recognizing hand pose and gesture by a plurality of cameras).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
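As a hedged illustration of the triangulation limitation discussed above, the following sketch recovers a 3D landmark from its 2D positions in two images plus device parameters. It assumes a rectified stereo pair with a purely horizontal baseline and shared intrinsics, which is a simplification and is not stated in Kim or in the claims; the function name `triangulate_rectified` and all numeric values are hypothetical.

```python
def triangulate_rectified(u1, v1, u2, fx, fy, cx, cy, baseline):
    """Triangulate one landmark seen at (u1, v1) in the first image and at
    column u2 in the second image of a rectified stereo pair.

    fx, fy, cx, cy are shared intrinsics in pixels; baseline is the distance
    between the two camera centres in metres. Returns (x, y, z) in the first
    camera's coordinate system.
    """
    disparity = u1 - u2
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = fx * baseline / disparity          # depth from similar triangles
    x = (u1 - cx) * z / fx                 # back-project using the first view
    y = (v1 - cy) * z / fy
    return (x, y, z)


# Hypothetical observation of the same landmark in both images.
landmark_3d = triangulate_rectified(
    u1=370.0, v1=240.0, u2=320.0,
    fx=500.0, fy=500.0, cx=320.0, cy=240.0, baseline=0.05,
)
```

With a 50-pixel disparity, a 500-pixel focal length, and a 5 cm baseline, the landmark is placed roughly 0.5 m from the cameras. A general (non-rectified) device would instead use the full projection matrices of both sensors.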
 
Regarding Claim 2, the combination of references of Zhang in view of Iqbal and Kim teaches "The device of claim 1, further comprising: a display device"; (Zhang, Para. 172, teaches projecting the video or the picture onto a display screen, i.e., a display device);
"wherein the object is a hand, and the memory is storing additional instructions thereon, which, when executed by the processor, cause the device to perform additional operations comprising: tracking the position and the orientation of the hand using the second 3-D positional data for the one or more landmarks"; (Kim, Paras. 36, 44, and 115, teaches the object to be tracked being a user's hand wherein a key point or feature point may be included in the joints and wherein a 3D pose of the hand is recognized using the tracking technology for detecting the moving joints in which 3D hand joint coordinates are taken into account for stable hand position tracking and gesture recognition, i.e., the object is a hand wherein the position and orientation of the hand are tracked using 3D data for the landmarks).
The proposed combination as well as the motivation for combining the Zhang in view of Iqbal and Kim references presented in the rejection of Claim 1, applies to claim 2. Thus, the device recited in claim 2 is met by Zhang in view of Iqbal and Kim.

Regarding Claim 6, the combination of references of Zhang in view of Iqbal and Kim teaches "The device of claim 1, wherein determining the coordinates of the second region of interest using the 3-D positional data of the one or more landmarks, comprises: computing the smallest rectangle that encloses all of the one or more landmarks after projecting the landmarks from a coordinate system associated with the first image and first image sensor, to a coordinate system associated with the second image and second image sensor"; (Zhang, Paras. 114-115, 123, and 180, teaches the mapping position information of the key points in the second image being obtained according to the position information of the determined key points in the first image wherein the smallest rectangles according to the key points are determined after image pre-processing and frame selection operations are performed as well as key point detection with coordinate extraction from the face images, i.e., computing the smallest rectangle enclosing the landmarks after projecting the landmarks from a first image coordinate system to a second image coordinate system).
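The operation recited in Claim 6 (the smallest rectangle enclosing all landmarks after projection into the second image) can be illustrated with a short sketch. It assumes the landmarks have already been expressed in the second sensor's 3D coordinate system and that the second sensor follows an ideal pinhole model; the helper names `project` and `enclosing_rectangle`, the intrinsics, and the sample points are hypothetical and are not drawn from Zhang or the application.

```python
def project(point, fx, fy, cx, cy):
    """Project a camera-frame 3-D point to pixel coordinates (pinhole model)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def enclosing_rectangle(points_2d):
    """Return the smallest axis-aligned rectangle (u_min, v_min, u_max, v_max)
    that contains every 2-D point."""
    us = [u for u, _ in points_2d]
    vs = [v for _, v in points_2d]
    return (min(us), min(vs), max(us), max(vs))


# Hypothetical landmarks already expressed in the second camera's frame (metres).
landmarks_second_frame = [(0.0, 0.0, 0.5), (0.1, 0.05, 0.5), (-0.05, 0.1, 1.0)]

projected = [
    project(p, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    for p in landmarks_second_frame
]
second_roi = enclosing_rectangle(projected)
```

The rectangle is axis-aligned in pixel space; a rotated minimum-area rectangle would be a different computation.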

Claim 9 recites a method with steps corresponding to the elements of the system recited in Claim 1. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang in view of Iqbal and Kim references, presented in rejection of Claim 1, apply to this claim.

Claim 10 recites a method with steps corresponding to the elements of the system recited in Claim 2. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang in view of Iqbal and Kim references, presented in rejection of Claim 1, apply to this claim.

Claim 14 recites a method with steps corresponding to the elements of the system recited in Claim 6. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang in view of Iqbal and Kim references, presented in rejection of Claim 1, apply to this claim.

Claim 17 recites a system with elements corresponding to the steps recited in Claim 1. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang in view of Iqbal and Kim references, presented in rejection of Claim 1, apply to this claim.  Finally, the combination of the Zhang in view of Iqbal and Kim references discloses a processor, a memory, and image sensors (for example, see Zhang, Paragraphs 30-31).

Claim 18 recites a system with elements corresponding to the steps recited in Claim 2. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang in view of Iqbal and Kim references, presented in rejection of Claim 1, apply to this claim.  Finally, the combination of the Zhang in view of Iqbal and Kim references discloses a processor, a memory, and image sensors (for example, see Zhang, Paragraphs 30-31).

Claims 3, 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Iqbal, Kim, and Uguz et al. (US 12333760 B1).

Regarding Claim 3, the combination of references of Zhang in view of Iqbal and Kim does not explicitly teach "The device of claim 1, wherein deriving, with the monocular pose estimator, the first 3-D positional data of the one or more landmarks, further comprises: using as input to the monocular pose estimator a reference measurement representing an estimated length or distance between two specific landmarks; or using as input to the monocular pose estimator an estimated size of the object".
In an analogous field of endeavor, Uguz teaches "The device of claim 1, wherein deriving, with the monocular pose estimator, the first 3-D positional data of the one or more landmarks, further comprises: using as input to the monocular pose estimator a reference measurement representing an estimated length or distance between two specific landmarks; or using as input to the monocular pose estimator an estimated size of the object"; (Uguz, Col. 6 lines 55-60, Col. 9 lines 22-25, Col. 16 lines 40-45, and Col. 18 lines 35-41, teaches the input and calibration of the 3D pose estimation system being based on the lengths of the subject's body segments and combined shape wherein the pose estimation is monocular and wherein the 2D key points are visually overlaid on the body for the 3D reconstruction and to estimate the metrics, i.e., use length between landmarks or size/shape/volume as input to a monocular pose estimator).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Zhang in view of Iqbal and Kim by including the use of a length between landmarks as input to the pose estimator taught by Uguz. One of ordinary skill in the art would be motivated to combine the references since it improves accuracy (Uguz, Col. 9 lines 22-44, teaches the motivation of combination to be to further improve the pose estimation accuracy).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.

Claim 11 recites a method with steps corresponding to the elements of the system recited in Claim 3. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang in view of Iqbal, Kim, and Uguz references, presented in rejection of Claim 3, apply to this claim.

Claim 19 recites a system with elements corresponding to the steps recited in Claim 3. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang in view of Iqbal, Kim, and Uguz references, presented in rejection of Claim 3, apply to this claim.  Finally, the combination of the Zhang in view of Iqbal, Kim, and Uguz references discloses a processor, a memory, and image sensors (for example, see Zhang, Paragraphs 30-31).

Claims 4, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Iqbal, Kim, Uguz, and Ali et al. (US 20220277489 A1).

Regarding Claim 4, the combination of references of Zhang in view of Iqbal, Kim, and Uguz does not explicitly teach "The device of claim 3, wherein the estimated length or distance between two specific landmarks represents an estimated length of a bone having as endpoints the two specific landmarks, the estimated length derived from the second 3-D positional data".
In an analogous field of endeavor, Ali teaches "The device of claim 3, wherein the estimated length or distance between two specific landmarks represents an estimated length of a bone having as endpoints the two specific landmarks, the estimated length derived from the second 3-D positional data"; (Ali, Paras. 11 and 65, teaches the 3D mesh model of the object comprising 3D key points being used to calculate estimated lengths of 3D parts such as bone lengths and estimated orientations of the parts, i.e., estimated length represents estimated length of a bone having endpoints of two landmarks wherein the estimated length is derived from the 3D positional data of the mesh model).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Zhang in view of Iqbal, Kim, and Uguz, wherein second 3D positional data is used, by including the length between two landmarks being the estimated length of a bone having the two landmarks as endpoints, in which the estimated length is derived from the 3D positional data, as taught by Ali. One of ordinary skill in the art would be motivated to combine the references since it increases accuracy of the system (Ali, Para. 111, teaches the motivation of combination to be to increase an accuracy of the system).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
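The bone-length measurement at issue in Claim 4 reduces to a distance between two 3D landmarks. The sketch below shows that arithmetic only; the landmark names, coordinates, and the function name `bone_length` are hypothetical and are not taken from Ali or the application.

```python
import math


def bone_length(landmark_a, landmark_b):
    """Euclidean distance between two 3-D landmarks, used as an estimated
    bone length when the landmarks are the bone's endpoints."""
    return math.dist(landmark_a, landmark_b)


# Hypothetical triangulated joints of one finger bone, in metres.
knuckle = (0.00, 0.00, 0.50)
middle_joint = (0.03, 0.04, 0.50)

estimated_length = bone_length(knuckle, middle_joint)
```

In a pipeline like the one claimed, such a length derived from the triangulated (second) 3D data could then be supplied to the monocular estimator as the reference measurement of Claim 3.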

Claim 12 recites a method with steps corresponding to the elements of the system recited in Claim 4. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang in view of Iqbal, Kim, Uguz, and Ali references, presented in rejection of Claim 4, apply to this claim.

Claim 20 recites a system with elements corresponding to the steps recited in Claim 4. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang in view of Iqbal, Kim, Uguz, and Ali references, presented in rejection of Claim 4, apply to this claim.  Finally, the combination of the Zhang in view of Iqbal, Kim, Uguz, and Ali references discloses a processor, a memory, and image sensors (for example, see Zhang, Paragraphs 30-31).

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Iqbal, Kim, and Stanton et al. (US 20210236207 A1).

Regarding Claim 5, the combination of references of Zhang in view of Iqbal and Kim does not explicitly teach "The device of claim 1, wherein determining the coordinates of the second region of interest using the 3-D positional data of the one or more landmarks, comprises: using a rigid transformation matrix defined for the device to convert the 3-D positional data of the one or more landmarks from a coordinate system associated with the first image and first image sensor, to a coordinate system associated with the second image and second image sensor".
In an analogous field of endeavor, Stanton teaches "The device of claim 1, wherein determining the coordinates of the second region of interest using the 3-D positional data of the one or more landmarks, comprises: using a rigid transformation matrix defined for the device to convert the 3-D positional data of the one or more landmarks from a coordinate system associated with the first image and first image sensor, to a coordinate system associated with the second image and second image sensor"; (Stanton, Para. 36, teaches a first image dataset being registered to the common coordinate system as the second image dataset is the patient coordinate system wherein a rigid transformation may be performed to map each pixel of the first image data set into corresponding 3D coordinates of the common coordinate system, i.e., rigid transformation matrix to convert 3D position data from a coordinate system associated with a first image to a coordinate system associated with a second image).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Zhang, Iqbal, and Kim wherein the 3D position data includes landmarks and the first image is associated with a first sensor and the second image is associated with a second sensor by including the use of a rigid transformation matrix to convert 3D positional data of landmarks of a first image to coordinates of a second image taught by Stanton. One of ordinary skill in the art would be motivated to combine the references since it improves safety and ease of use (Stanton, Para. 2, teaches the motivation of combination to be to improve safety and ease of use of the system).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
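For readers weighing the Claim 5 limitation, a rigid transformation between two sensor coordinate systems can be sketched as a 4x4 homogeneous matrix applied to each 3D landmark. This is an illustrative example under stated assumptions, not Stanton's registration procedure: the matrix `FIRST_TO_SECOND` (identity rotation with a 6 cm horizontal offset), the function name `apply_rigid_transform`, and the sample landmark are hypothetical.

```python
def apply_rigid_transform(matrix, point):
    """Map a 3-D point through a 4x4 homogeneous rigid transform [R | t]."""
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    # Only the first three rows are needed; the last row is (0, 0, 0, 1).
    return tuple(
        sum(row[i] * homogeneous[i] for i in range(4)) for row in matrix[:3]
    )


# Hypothetical extrinsic calibration: no rotation, second sensor offset 6 cm along x.
FIRST_TO_SECOND = [
    [1.0, 0.0, 0.0, -0.06],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

landmark_in_first = (0.10, 0.00, 0.50)
landmark_in_second = apply_rigid_transform(FIRST_TO_SECOND, landmark_in_first)
```

Because the matrix is fixed by the device geometry, it would be defined once per device rather than per image.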
 
Claim 13 recites a method with steps corresponding to the elements of the system recited in Claim 5. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang, Iqbal, Kim, and Stanton references, presented in rejection of Claim 5, apply to this claim.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Iqbal, Kim, and Sarkar et al. (US 20200134056 A1).

Regarding Claim 7, the combination of references of Zhang in view of Iqbal and Kim does not explicitly teach "The device of claim 1, wherein determining the coordinates of the second region of interest using the 3-D positional data of the one or more landmarks, comprises: applying a scaling factor to the coordinates of the second region of interest that will enlarge the size of the second region of interest to account for inaccuracies that may have resulted from using the monocular pose estimator to derive the first 3-D positional data of the one or more landmarks".
In an analogous field of endeavor, Sarkar teaches "The device of claim 1, wherein determining the coordinates of the second region of interest using the 3-D positional data of the one or more landmarks, comprises: applying a scaling factor to the coordinates of the second region of interest that will enlarge the size of the second region of interest to account for inaccuracies that may have resulted from using the monocular pose estimator to derive the first 3-D positional data of the one or more landmarks"; (Sarkar, Para. 64, teaches expanding a size of the bounding box along one or more dimensions in order to increase accuracy in the image search, i.e., apply a scaling factor to a region of interest being the bounding box to enlarge the size of the region to account for inaccuracy).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Zhang, Iqbal, and Kim wherein accuracy of the model is dependent upon monocular pose estimation deriving 3D positional data of landmarks by including the enlarging of a region of interest to account for inaccuracies taught by Sarkar. One of ordinary skill in the art would be motivated to combine the references since it increases accuracy and computational efficiency (Sarkar, Para. 5, teaches the motivation of combination to be to increase accuracy and computation efficiency of the model).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
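The enlargement recited in Claim 7 can be pictured as scaling a bounding box about its centre. The sketch below is one plausible reading, offered only as an illustration: the function name `scale_box`, the box format `(u_min, v_min, u_max, v_max)`, and the factor of 1.2 are assumptions and appear in neither Sarkar nor the application.

```python
def scale_box(box, factor):
    """Scale a box (u_min, v_min, u_max, v_max) about its centre.

    A factor above 1.0 enlarges the box, leaving margin for landmark
    positions that were estimated imprecisely.
    """
    u_min, v_min, u_max, v_max = box
    centre_u = (u_min + u_max) / 2.0
    centre_v = (v_min + v_max) / 2.0
    half_width = (u_max - u_min) / 2.0 * factor
    half_height = (v_max - v_min) / 2.0 * factor
    return (
        centre_u - half_width,
        centre_v - half_height,
        centre_u + half_width,
        centre_v + half_height,
    )


# Hypothetical second region of interest, enlarged by 20 percent.
second_roi = (100.0, 100.0, 200.0, 180.0)
enlarged_roi = scale_box(second_roi, factor=1.2)
```

A practical implementation would likely also clamp the enlarged box to the image bounds, which is omitted here for brevity.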
 
Claim 15 recites a method with steps corresponding to the elements of the system recited in Claim 7. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements in its corresponding device claim.  Additionally, the rationale and motivation to combine the Zhang, Iqbal, Kim, and Sarkar references, presented in rejection of Claim 7, apply to this claim.

Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Iqbal, Kim, and Cai et al. (US 20210165997 A1).

Regarding Claim 8, the combination of references of Zhang in view of Iqbal and Kim does not explicitly teach "The device of claim 1, wherein processing the area of the first image corresponding with the first region of interest with the landmark detector to determine 2-D positional data of one or more landmarks associated with the object comprises identifying a single representative landmark via which the object can be transformed".
In an analogous field of endeavor, Cai teaches "The device of claim 1, wherein processing the area of the first image corresponding with the first region of interest with the landmark detector to determine 2-D positional data of one or more landmarks associated with the object comprises identifying a single representative landmark via which the object can be transformed"; (Cai, Para. 7, teaches obtaining 2D coordinates of one key point of a target object to be processed wherein a 3D detection body of the target object is constructed according to the 2D coordinates of the one key point, i.e., a single representative landmark with 2D positional data used to transform an object to 3D).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Zhang, Iqbal, and Kim by including the identification of a landmark to transform the object taught by Cai. One of ordinary skill in the art would be motivated to combine the references since it improves accuracy in constructing the 3D body (Cai, Para. 86, teaches the motivation of combination to be to improve accuracy in constructing the 3D detection body for the target object).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
 
Claim 16 recites a method with steps corresponding to the elements of the system recited in Claim 8. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements in its corresponding device claim. Additionally, the rationale and motivation to combine the Zhang, Iqbal, Kim, and Cai references, presented in the rejection of Claim 8, apply to this claim.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW STEVEN BUDISALICH whose telephone number is (703)756-5568. The examiner can normally be reached Monday - Friday 8:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amandeep Saini, can be reached at (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/ANDREW S BUDISALICH/Examiner, Art Unit 2662

/AMANDEEP SAINI/Supervisory Patent Examiner, Art Unit 2662
Prosecution Timeline

Sep 27, 2023: Application Filed
Sep 18, 2025: Non-Final Rejection mailed (§103)
Dec 17, 2025: Response Filed
Feb 06, 2026: Final Rejection mailed (§103)
Apr 06, 2026: Response after Non-Final Action
May 06, 2026: Request for Continued Examination
May 07, 2026: Response after Non-Final Action
May 19, 2026: Non-Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/923,181 (Patent 12626481): LIGHT FIELD RENDERING. 3y 6m to grant; granted May 12, 2026.
18/361,340 (Patent 12620119): VIDEO ANALYTICS ACCURACY USING TRANSFER LEARNING. 2y 9m to grant; granted May 05, 2026.
18/190,728 (Patent 12608970): GRAPH ALGORITHM AND MOTION CAPTURE FOR IMPROVING MANUFACTURING PROCESSES. 3y 0m to grant; granted Apr 21, 2026.
18/342,892 (Patent 12602820): METHOD AND APPARATUS WITH ATTENTION-BASED OBJECT ANALYSIS. 2y 9m to grant; granted Apr 14, 2026.
18/038,197 (Patent 12597106): METHOD AND APPARATUS FOR IDENTIFYING DEFECT GRADE OF BAD PICTURE, AND STORAGE MEDIUM. 2y 10m to grant; granted Apr 07, 2026.
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 79%
With Interview (+11.0%): 90%
Median Time to Grant: 2y 8m (~0m remaining)
PTA Risk: High
Based on 52 resolved cases by this examiner. Grant probability derived from career allowance rate.
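
The projection figures can be checked with a short calculation. The sketch below is only an illustration: it uses the 41 granted / 52 resolved counts reported in the examiner profile and assumes the interview lift is added as percentage points to the rounded allowance rate, which is consistent with the figures shown but is not stated on the page. The variable names are hypothetical.

```python
# Counts reported in the examiner profile on this page.
granted = 41
resolved = 52

# Interview lift as reported; treating it as additive percentage points
# is an assumption, not something the page states.
interview_lift_pts = 11.0

# Career allowance rate in percent, then rounded to a whole percent.
allowance_rate = 100 * granted / resolved
grant_probability = round(allowance_rate)

# Projected grant probability when an interview is held.
with_interview = grant_probability + interview_lift_pts

print(f"{allowance_rate:.1f}% -> {grant_probability}%, with interview {with_interview:.0f}%")
```

Under that assumption the result matches the 79% and 90% figures listed above.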