Prosecution Insights
Last updated: April 19, 2026
Application No. 18/790,201

METHOD AND SYSTEM FOR ESTIMATING 3D CAMERA POSE BASED ON 2D IMAGE FEATURES AND APPLICATION THEREOF

Status: Non-Final OA (§101, §103)
Filed: Jul 31, 2024
Examiner: LIU, GORDON G
Art Unit: 2618
Tech Center: 2600 (Communications)
Assignee: Edda Technology Inc.
OA Round: 1 (Non-Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 1-2
Estimated Time to Grant: 2y 4m
Grant Probability with Interview: 98%

Examiner Intelligence

Career Allow Rate: 83% (556 granted of 673 resolved), +20.6% vs Tech Center average; above average
Interview Lift: +15.1% among resolved cases with an interview (strong)
Typical Timeline: 2y 4m average prosecution; 29 applications currently pending
Career History: 702 total applications across all art units

Statute-Specific Performance

§101: 6.7% (-33.3% vs TC avg)
§103: 73.3% (+33.3% vs TC avg)
§102: 3.0% (-37.0% vs TC avg)
§112: 5.7% (-34.3% vs TC avg)
Comparisons are against the Tech Center average estimate. Based on career data from 673 resolved cases.

Office Action

§101, §103
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-21 are pending in this Office action.

Claim Rejections - 35 USC § 101

35 U.S.C. § 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 8-14 are rejected under 35 U.S.C. § 101 as not falling within one of the four statutory categories of invention because the claimed invention is directed to a computer program per se. See MPEP 2106(I). A claim directed toward a non-transitory computer readable medium having the program encoded thereon establishes a sufficient functional relationship between the program and a computer so as to remove it from the realm of “program per se”. See MPEP 2111.05(III). Hence, adding the limitation of “non-transitory” before “machine-readable medium” in claims 8-14 would resolve this issue.

Claim Objections

Claims 9-14 are objected to because of the following informalities: “The medium” should read “The machine-readable medium” or, better, “The non-transitory machine-readable medium” to overcome the § 101 computer-readable medium (CRM) rejection. Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-6, 8-9, 12-13, 15-16, and 19-20 are rejected under 35 U.S.C. § 103 as being unpatentable over Liu et al. (US 20120069167 A1) in view of Pheiffer et al. (US 20180150929 A1), further in view of Kurz et al. (US 20120219188 A1) and Birchfield et al. (US 20220277472 A1).

Regarding claim 1, Liu teaches that a method implemented on at least one processor, a memory, and a communication platform (See Liu: Fig. 7, and [0044], “Referring to FIG. 7, a system 400 for image-based registration between images is illustratively shown. The system 400 includes a computer tomography (CT) scanner 402 (or other pre-operative imager or scanner) although the scanner 402 is not needed as the CT images may be stored in memory 404 and transferred to the system 400 using storage media or network connections. The memory 404 and/or scanner are employed to store/collect CT images of a subject, such as a patient for surgery. An endoscope 406 includes a camera 408 for collecting real-time images during a procedure. The endoscope 406 includes a tracker system 410, e.g., an electromagnetic (EM) tracker for locating a tip of the endoscope. The tracker system 410 needs to have its coordinate system mapped or transformed into the CT coordinate system. The tracker system 410 employs an NDI field generator 411 to track the progress of the endoscope 406”), comprising: generating a plurality of three-dimensional (3D) virtual camera poses (See Liu: Figs. 2-3, and [0038], “Referring to FIGS.
2 and 3, a virtual image 20 is shown at a carina position of a lung. A camera pose at the virtual position (VB) is recorded as P.sub.V. The operator moves an endoscope 22 with a camera for collecting images close enough to match the image VB. The VB camera pose is known and stored in memory. When the operator is satisfied with the pose of the scope, the operator can start to acquire a series of images from pose P.sub.i to P.sub.i+N (or from P.sub.i-N). A mutual-information based registration method will be employed to find the most similar image whose pose is denoted as P.sub.R. The camera pose P.sub.R corresponds to the best match between VB and the selected RB. The transformation matrix between P.sub.V and P.sub.R is constructed and becomes the desired registration result. Image similarity may be determined using computer implemented software tools or may be performed by a human operator depending on the circumstances”, Note that the virtual camera positions P.sub.i to P.sub.i+N are mapped to the a plurality of three-dimensional (3D) virtual camera poses); with respect to each of the plurality of 3D virtual camera poses (See Liu: Figs. 2-3, and [0038], “Referring to FIGS. 2 and 3, a virtual image 20 is shown at a carina position of a lung. A camera pose at the virtual position (VB) is recorded as P.sub.V”, Note that the every camera virtual position P.sub.VB, a virtual image is generated, and this is mapped to “with respect to each of the plurality of 3D virtual camera pose”), projecting a 3D model for a target organ onto a (two-dimensional) 2D image plane determined based on the 3D virtual camera pose to generate a virtual 2D image of the target organ in a perspective corresponding to the 3D virtual camera pose (See Liu: Fig. 4, and [0027], “The present disclosure describes systems and methods for scope calibration and registration. A simple method for calibrating an electro-magnetic (EM) guided endoscopy system computes a transformation matrix for an offset between a camera coordinate and an endoscope tracker. The offset distance between a camera frame and an endoscope tracker frame is reflected in a disparity in 2D projection images between a real video image and a virtual fly-through image. Human eyes or a computer are used to differentiate this spatial difference and rebuild the spatial correspondence. The spatial offset becomes the calibration result”; and [0039], “Referring to FIG. 4, a relationship between an EM tracker coordinate system 40, a camera coordinate system 42 and a CT coordinate system 44 is illustratively depicted. The three local coordinate systems 40, 42 and 44 need to be interconnected to permit transformation between the camera coordinate system 42 (where the center of projection and optical axis are located), EM sensor coordinate system 40, and CT coordinate system 44”. Note that there are three coordinates systems: EM, camera, and CT coordinates systems, and there are transform matrixes among them; the projected 2D images are with respect to the camera coordinate system 42, which is mapped to “based on the 3D virtual camera pose to generate a virtual 2D image of the target organ in a perspective corresponding to the 3D virtual camera pose”. However, Liu does not teach that “projecting a 3D model for a target organ onto a (two-dimensional) 2D image plane”, and secondary art will be used to address this limitation), obtaining 2D features of the virtual 2D image (See Liu: Fig. 6, and [0042], “Referring to FIG. 
6, a method for image-based registration between images is illustratively shown in accordance with one illustrative embodiment. In block 302, computer tomography (CT) (or other pre-operative) images of a subject are collected or provided. Advantageously, no markers are needed in the CT images. In block 304, an anatomical reference or feature is located in a video image (e.g., a real-time image taken with a camera of an endoscope) which corresponds to a particular pre-operative image. This may include tracking an endoscope with electromagnetic tracking”. Note that the CT image reference or feature is mapped to the 2D features pf the virtual 2D images), and creating a pair representing a mapping (See Liu: Figs. 4-6, and [0035], “In accordance with the present principles, three local coordinate systems need to be inter-connected to permit a mapping of events therebetween. These include a camera coordinate system (where the center of projection and optical axis are located), EM sensor coordinate system, and CT coordinate system”) from the 2D features to the 3D virtual camera pose; obtaining a 2D feature-camera pose mapping model based on the pairs of 2D features and the plurality of 3D virtual camera poses; obtaining a 3D pose estimate of a laparoscopic camera by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate; and refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera via differential rendering of the 3D model with respect to the 3D pose estimate. However, Liu fails to explicitly disclose that projecting a 3D model for a target organ onto a (two-dimensional) 2D image plane; creating a pair representing a mapping from the 2D features to the 3D virtual camera pose; obtaining a 2D feature-camera pose mapping model based on the pairs of 2D features and the plurality of 3D virtual camera poses; obtaining a 3D pose estimate of a laparoscopic camera by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate; and refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera via differential rendering of the 3D model with respect to the 3D pose estimate. However, Pheiffer teaches that projecting a 3D model for a target organ (See Pheiffer: Fig. 1, and [0018], “According to an embodiment of the present invention, the sequence of intra-operative images can be acquired by a user (e.g., doctor, clinician, etc.) performing a complete scan of the target organ using the image acquisition device (e.g., laparoscope or endoscope). In this case the user moves the image acquisition device while the image acquisition device continually acquires images (frames), so that the frames of the intra-operative image sequence cover the complete surface of the target organ. This may be performed at a beginning of a surgical procedure to obtain a full picture of the target organ at a current deformation. A 3D stitching procedure may be performed to stitch together the intra-operative images to form an intra-operative 3D model of the target organ, such as the liver”) onto a (two-dimensional) 2D image plane (See Pheiffer: Fig. 
1, and [0019], “At step 106, the pre-operative 3D medical image volume is registered to the 2D/2.5D intra-operative images using the relative orientation measurements of the intra-operative images to constrain the registration. According to an embodiment of the present invention, this registration is performed by simulating camera projections from the pre-operative 3D volume using a parameters space defining the position and orientation of a virtual camera (e.g., virtual endoscope/laparoscope). The simulation of the projection images from the pre-operative 3D volume can include photorealistic rendering. The position and orientation parameters determine the appearance and well as the geometry of simulated 2D/2.5D projection images from the 3D medical image volume, which are directly compared to the observed 2D/2.5D intra-operative images via a similarity metric”. Note that the 3D model of tissues are projected to generate 2D/2.5D images, the 2D projected images are projected from the 3D volumes so it is projected on the 2D planes); creating a pair representing a mapping from the 2D features to the 3D virtual camera pose; obtaining a 2D feature-camera pose mapping model based on the pairs of 2D features and the plurality of 3D virtual camera poses; obtaining a 3D pose estimate of a laparoscopic camera (See Pheiffer: Fig. 1, and [0017], “At step 104, a sequence of intra-operative images is received along with corresponding relative orientation measurements. The sequence of intra-operative images can also be referred to as a video, with each intra-operative image being a frame of the video. For example, the intra-operative image sequence can be a laparoscopic image sequence acquired via a laparoscope or an endoscopic image sequence acquired via an endoscope”; and [0020], “An optimization framework is used to select the pose parameters for the virtual camera that maximize the similarity (or minimize the difference) between the simulated projection images and the received intra-operative images. That is, the optimization problem calculates position and orientation parameters that maximize a total similarity (or minimizes a total difference) between each 2D/2.5 intra-operative image and a corresponding simulated 2D/2.5D projection image from the pre-operative 3D volume over all of the intra-operative images”. Note that “select the pose parameters for the virtual camera” is mapped to “obtaining a 3D pose estimate of a laparoscopic camera”) by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate; and refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera (See Pheiffer: Fig. 1, and [0020], “According to an embodiment of the present invention, the similarity metric is calculated for the target organ in intra-operative images and the corresponding simulated projection images. This optimization problem can be performed using any similarity or difference metric and can be solved using any optimization algorithm. For example, the similarity metric can be cross correlation, mutual information, normalized mutual information, etc., and the similarity metric may be combined with a geometry fitting term for fitting the simulated 2.5D depth data to the observed 2.5D depth data based on the geometry of the target organ. 
As described above the orientation sensors mounted to the intra-operative image acquisition device (e.g., endoscope/laparoscope) provide relative orientations of the intra-operative images with respect to each other. These relative orientations are used to constrain the optimization problem. In particular, the relative orientations of the intra-operative images constrain the set of orientation parameters calculated for the corresponding simulated projection images. Additionally, the scaling is known due to metric 2.5D sensing, resulting in an optimization for pose refinement on the unit sphere. The optimization may be further constrained based on other a priori information from a known surgical plan used in the acquisition of the intra-operative images, such as a position of the operating room table, position of the patient on the operating room table, and a range of possible camera orientations”. Note that the virtual camera pose parameters are refined by the optimization process, and this is mapped to “refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera”) via differential rendering of the 3D model with respect to the 3D pose estimate. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention was effectively filed to modify Liu to have projecting a 3D model for a target organ onto a (two-dimensional) 2D image plane; obtaining a 3D pose estimate of a laparoscopic camera; and refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera as taught by Pheiffer in order to enable minimizing the difference between the simulated projection images and the received intra-operative images in an effective manner and improves the registration results (See Pheiffer: Fig. 1, and [0020], “An optimization framework is used to select the pose parameters for the virtual camera that maximize the similarity (or minimize the difference) between the simulated projection images and the received intra-operative images. That is, the optimization problem calculates position and orientation parameters that maximize a total similarity (or minimizes a total difference) between each 2D/2.5 intra-operative image and a corresponding simulated 2D/2.5D projection image from the pre-operative 3D volume over all of the intra-operative images”). Liu teaches a method and system for marker-free image-based registration between images locating a feature in a pre-operative image and comparing real-time images taken with a tracked scope with the pre-operative image taken of the feature to find a real-time image that closely matches the pre-operative image that may determine a transformation matrix between a position of the pre-operative image and a position of the real-time image provided by a tracker so that the determined transformation matrix can be used to determine the real-time image coordinates using the pre-operative image space and the transformation matrix; while Pheiffer teaches a system and method for registration of 2D/2.5D laparoscopic or endoscopic image data to 3D volumetric image data that may register the 3D medical image volume of the target organ to the plurality of the projected 2D/2.5D intra-operative images by calculating pose parameters to match simulated projection images of the 3D medical image volume to the plurality of 2D/2.5D intra-operative images and refine the 3D virtual camera pose parameters through an optimization process. 
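The projection step recited in claim 1, mapped above to Pheiffer's simulated camera projections from a pre-operative 3D volume, can be illustrated with a minimal sketch: project the surface points of a 3D organ model through a pinhole camera at one virtual pose to obtain a virtual 2D silhouette. This is a numpy-only toy under assumed intrinsics and a stand-in point-cloud model; the names project_to_image, K, R, and t are illustrative and are not taken from the application or the cited references.

```python
import numpy as np

def project_to_image(points_3d, R, t, K, image_size=(480, 640)):
    """Project 3D surface points (N, 3) of an organ model onto a 2D image
    plane for a virtual camera with rotation R (3x3), translation t (3,),
    and intrinsic matrix K (3x3). Returns a binary silhouette image and
    the pixel coordinates of the visible points."""
    # Transform model points from world (e.g., CT) coordinates to camera coordinates.
    pts_cam = (R @ points_3d.T).T + t
    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 1e-6]
    # Perspective division, then apply the intrinsics.
    uv = (K @ (pts_cam / pts_cam[:, 2:3]).T).T[:, :2]
    h, w = image_size
    silhouette = np.zeros((h, w), dtype=np.uint8)
    px = np.round(uv).astype(int)
    valid = (px[:, 0] >= 0) & (px[:, 0] < w) & (px[:, 1] >= 0) & (px[:, 1] < h)
    silhouette[px[valid, 1], px[valid, 0]] = 1
    return silhouette, uv[valid]

# Toy example: a spherical "organ" surface rendered from one virtual camera pose.
rng = np.random.default_rng(0)
sphere = rng.normal(size=(5000, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)   # unit-sphere surface points
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)                      # virtual camera orientation
t = np.array([0.0, 0.0, 4.0])      # model placed 4 units in front of the camera
virtual_image, _ = project_to_image(sphere, R, t, K)
print("projected pixels:", int(virtual_image.sum()))
```

Repeating this for many sampled virtual poses yields one virtual 2D image per pose, which is the input to the feature extraction and pairing steps recited in the claim.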
Therefore, it is obvious to one of ordinary skill in the art to modify Liu by Pheiffer to obtain the pre-operative images by projecting the 3D models of the target organs into 2D images and register the 3D mode with the 2D projected images. The motivation to modify Liu by Pheiffer is “Use of known technique to improve similar devices (methods, or products) in the same way”. However, Liu, modified by Pheiffer, fails to explicitly disclose that creating a pair representing a mapping from the 2D features to the 3D virtual camera pose; obtaining a 2D feature-camera pose mapping model based on the pairs of 2D features and the plurality of 3D virtual camera poses; by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate; and via differential rendering of the 3D model with respect to the 3D pose estimate. However, Kurz teaches that creating a pair representing a mapping from the 2D features to the 3D virtual camera pose (See Kurz: Fig. 3, and [0044], “According to another embodiment of the invention, as the capturing device a range data capturing device may be used, wherein pixels of images taken with any kind of range data capturing device, such as laser scanners, time-of-flight cameras, or stereo cameras may have associated 3D coordinates. In this case any orientation in a common coordinate system for a particular feature point can be computed from the 3D positions of the neighboring pixels of the feature point”; and [0015], “In FIG. 3, there is shown a standard approach for creating a feature descriptor. In step S1, an image is captured by a capturing device, e.g. a camera, or loaded from a storage medium. In step S2, feature points are extracted from the image and stored in a 2-dimensional description (parameters u, v). In step S3, an orientation assignment is performed as described above with respect to FIG. 2, to add to the parameters u, v an orientation angle a. Thereafter, a neighborhood normalization step S4 is performed, as described above with respect to FIG. 2 to gain normalized neighborhood pixel intensities i[ ]. In the final step S5, a feature descriptor in the form of a descriptor vector d[ ] is created for the respective extracted feature as a function of the normalized neighborhood pixel intensities i[ ]. Approaches exist that may assign multiple orientation angles to a feature in step S3 and consequently carry out the steps S4 and S5 for each orientation resulting in one descriptor per assigned orientation”. Note that the features a 2-dimensional description (parameters u, v) is mapped to the 2D features, and the feature descriptor has a feature and orientation of the camera capturing the image is mapped to the pair of feature and camera pose); obtaining a 2D feature-camera pose mapping model based on the pairs of 2D features and the plurality of 3D virtual camera poses (See Kurz: Fig. 4, and [0043], “In step SI 4, an orientation assignment is performed to add to the parameters u, v an orientation angle a based on the feature orientation angle a determined in step SI 3. Thereafter, a neighborhood normalization step SI 5 is performed to gain normalized neighborhood pixel intensities i[ ]. 
In the final step SI 6, a feature descriptor in the form of a descriptor vector d[ ] is created for the respective extracted feature depending on a parameter which is indicative of an orientation of the extracted feature, particularly resulting from the orientation assignment in step SI 4”. Note that the a descriptor vector d[ ] is mapped to t a 2D feature-camera pose mapping model, because each feature descriptor has a pair of 2D image feature and camera pose, and a full set of feature descriptors is a model that provides a mapping between features and the camera poses); by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate (See Kurz: Figs. 5A-B, and [0057], “In the first image IMI, a real static object ROI as shown in FIG. 5A is captured by a camera (not shown). In the image IMI features of the real object ROI are extracted, such as shown in FIG. 5A by features F51. In a following step, descriptors may be computed for every extracted feature F51 in accordance with the method of the invention. These features F51 are then matched with features F52 extracted in the second image IM2. The second image IM2 is depicting a real object R02 which corresponds with real object ROI under a different viewpoint, wherein for the features F52 also a respective descriptor is determined. Particularly, if the descriptors of features F51 and F52 are relatively close in terms of a certain similarity measure, they are matched. For example, if every descriptor is written as a vector of numbers, when comparing two descriptors, one can use the Euclidian distance between two corresponding vectors as similarity measure”; and [0053], “Camera tracking describes the process of computing the pose (position and orientation) of a camera given one or more camera images. Features in the camera image are either matched against reference features with known 3D positions to compute an absolute pose or against features from the previous frame to compute the relative change in position and orientation”. Note that the computed feature descriptors with known camera pose are matched to the real image feature descriptors, and the camera pose for the real image is estimated/determined, and this is mapped to “obtaining a 3D pose estimate of a laparoscopic camera by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate”); and via differential rendering of the 3D model with respect to the 3D pose estimate. 
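The Kurz-style mapping relied on above (descriptor vectors paired with known camera poses, matched by Euclidean distance) can be sketched as a plain nearest-neighbour lookup. The sketch below uses toy random vectors in place of real image descriptors, and build_mapping_model and estimate_pose are illustrative names, not functions disclosed in the application or the references.

```python
import numpy as np

def build_mapping_model(descriptor_pose_pairs):
    """Stack (descriptor, pose) pairs generated from virtual renderings into a
    simple lookup table: one row per descriptor, with the associated 3D
    virtual camera pose stored alongside it."""
    descriptors = np.stack([d for d, _ in descriptor_pose_pairs])
    poses = np.stack([p for _, p in descriptor_pose_pairs])
    return descriptors, poses

def estimate_pose(model, query_descriptors):
    """Match each query descriptor (from a real image) to its nearest stored
    descriptor by Euclidean distance and return the mean of the matched
    poses as a coarse 3D pose estimate."""
    descriptors, poses = model
    dists = np.linalg.norm(
        query_descriptors[:, None, :] - descriptors[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    return poses[nearest].mean(axis=0)

# Toy data: 64-D descriptors paired with 6-D poses (3 rotation + 3 translation parameters).
rng = np.random.default_rng(1)
pairs = [(rng.normal(size=64), rng.normal(size=6)) for _ in range(200)]
model = build_mapping_model(pairs)
query = rng.normal(size=(10, 64))   # descriptors extracted from a live frame
print("coarse pose estimate:", np.round(estimate_pose(model, query), 3))
```

In practice the raw nearest-neighbour matches would typically be filtered (ratio test, RANSAC) before the associated poses are aggregated into a single estimate.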
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention was effectively filed to modify Liu to have creating a pair representing a mapping from the 2D features to the 3D virtual camera pose; obtaining a 2D feature-camera pose mapping model based on the pairs of 2D features and the plurality of 3D virtual camera poses; by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate as taught by Kurz in order to enable performing a tracking process to evaluate the information regarding correspondence between attributes, which are associated with the real object, and corresponding characteristics of the real object in order to obtain the conclusion of the pose of the camera, thus determining the pose of the camera relative to the real object of the real environment with increased robustness against changing environmental conditions (See Kurz: Fig. 1, and [0053], “Camera tracking describes the process of computing the pose (position and orientation) of a camera given one or more camera images. Features in the camera image are either matched against reference features with known 3D positions to compute an absolute pose or against features from the previous frame to compute the relative change in position and orientation”). Liu teaches a method and system for marker-free image-based registration between images locating a feature in a pre-operative image and comparing real-time images taken with a tracked scope with the pre-operative image taken of the feature to find a real-time image that closely matches the pre-operative image that may determine a transformation matrix between a position of the pre-operative image and a position of the real-time image provided by a tracker so that the determined transformation matrix can be used to determine the real-time image coordinates using the pre-operative image space and the transformation matrix; while Kurz teaches a system and method of providing a descriptor for at least one feature of an image that may generate a feature descriptor with 2D image features and the camera pose mapping, extract features of the real images, match the extracted feature descriptor with the pre-computed feature descriptor to determine the camera pose estimate. Therefore, it is obvious to one of ordinary skill in the art to modify Liu by Kurz to generate the 2D feature-camera mapping model and determine the camera pose based on the mapping model and the real image features. The motivation to modify Liu by Kurz is “Use of known technique to improve similar devices (methods, or products) in the same way”. However, Liu, modified by Pheiffer and Kurz, fails to explicitly disclose that via differential rendering of the 3D model with respect to the 3D pose estimate. However, Birchfield teaches that via differential rendering of the 3D model with respect to the 3D pose estimate (See Birchfield: Fig. 8, and [0123], “A system for object pose estimation may perform a single-stage method category-level 6-DoF pose prediction of previously unseen object instances. A system for object pose estimation may not require various 3D models of instances at training and/or test time, and synthetic data may not be required for training. 
For 2D keypoint detection, a system for object pose estimation may utilize a combined representation of both displacements and heatmaps to mitigate uncertainty. A system for object pose estimation may estimate the relative dimensions of the 3D bounding cuboid. To further improve accuracy, a system for object pose estimation may utilize a convGRU sequential feature association. A system for object pose estimation may be evaluated in connection with one or more other systems using a dataset such as an Objectron dataset. A system for object pose estimation may be utilized in various contexts, including robotic grasping tasks and various other real-world applications. A system for object pose estimation may be specific to one or more categories, incorporate differential rendering, and leverage iterative post refinement”. Note that the pose estimate may be refined by differential rendering or iterative post refinement, and this is mapped to “refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera via differential rendering of the 3D model with respect to the 3D pose estimate because Liu modified by Pheiffer teaches to refine the 3D camera pose estimate by some optimization process, and Birchfield teaches that the pose estimate can be refined by differential rendering of the 3D object, and combining with them will arrive at exactly the current cited limitations). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention was effectively filed to modify Liu to have via differential rendering of the 3D model with respect to the 3D pose estimate as taught by Birchfield in order to improve the range of object pose estimation by providing the system that determines a six DoF pose of an object and the object's relative dimensions from a single RGB image representing the object (See Birchfield: Fig. 1, and [0067], “Techniques described and suggested in the present disclosure improve the field of object pose estimation, by providing a system that determines a 6-DoF pose of an object and relative dimensions of the object from a single RGB image depicting the object”). Liu teaches a method and system for marker-free image-based registration between images locating a feature in a pre-operative image and comparing real-time images taken with a tracked scope with the pre-operative image taken of the feature to find a real-time image that closely matches the pre-operative image that may determine a transformation matrix between a position of the pre-operative image and a position of the real-time image provided by a tracker so that the determined transformation matrix can be used to determine the real-time image coordinates using the pre-operative image space and the transformation matrix; while Birchfield teaches a system and method that may determine a pose and relative dimensions of an object from an image with differential rendering of the 3D object to refine the pose estimation of the 3D object. Therefore, it is obvious to one of ordinary skill in the art to modify Liu by Birchfield to refine the pose estimation by differential renderings of the target organ. The motivation to modify Liu by Birchfield is “Use of known technique to improve similar devices (methods, or products) in the same way”. Regarding claim 2, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 1 as outlined above. 
Further, Liu, Pheiffer, and Kurz teach that the method of claim 1, wherein the 2D features include one or more of: intensity features (See Kurz: Fig. 2, and [0014], “A variety of local feature descriptors exist, wherein a good overview and comparison is given in Krystian Mikolajczyk and Cordelia Schmid, "A performance evaluation of local descriptors", IEEE Transactions on Pattern Analysis & Machine Intelligence, 10, 27 (2005), pp. 1615-1630. Most of them are based on the creation of histograms of either intensity values of the normalized local neighborhood pixels or of functions of them, such as gradients. The final descriptor is expressed as an n-dimensional vector (as shown in FIG. 2 on the right) and can be compared to other descriptors using a similarity measure such as the Euclidian distance”) characterizing the appearance of the 3D model (See Pheiffer: Fig. 1, and [0019], “The simulation of the projection images from the pre-operative 3D volume can include photorealistic rendering. The position and orientation parameters determine the appearance and well as the geometry of simulated 2D/2.5D projection images from the 3D medical image volume, which are directly compared to the observed 2D/2.5D intra-operative images via a similarity metric”) when projected to the 2D image plane (See Kurz: Fig. 4, and [0042], “A very simple way to gain the orientation for all features is to transform the gravitational force to a coordinate system attached to the capturing device using the Euler angles first and then project it onto the image plane. Thereby, the direction of the gravitational force in the image is computed and used for all features in the image. This technique assumes orthogonal projection which is generally not the case. Incorporating the intrinsic parameters of the camera relaxes this assumption but still all techniques based on 2D images assume everything visible in the image to lie on a plane and therefore are approximations”); and geometric features (See Liu: Fig. 3, and [0007], “Generally speaking, calibration is an offline procedure: the calibration parameters can be obtained by imaging an EM-tracked phantom (with a calibration pattern such as a checkerboard) that has known geometric properties, using an EM-tracked endoscope. This involves a cumbersome engineering procedure”) characterizing the shape of the projected 3D model (See Liu: Fig. 3, and [0030], “It should be understood that the present invention will be described in terms of a bronchoscope; however, the teachings of the present invention are much broader and are applicable to any optical scope that can be employed in internal viewing of branching, curved, coiled or other shaped systems (e.g., digestive systems, circulatory systems, piping systems, passages, mines, caverns, etc.)”) in the 2D image plane (See Kurz: Fig. 4, and [0042], “A very simple way to gain the orientation for all features is to transform the gravitational force to a coordinate system attached to the capturing device using the Euler angles first and then project it onto the image plane. Thereby, the direction of the gravitational force in the image is computed and used for all features in the image. This technique assumes orthogonal projection which is generally not the case. Incorporating the intrinsic parameters of the camera relaxes this assumption but still all techniques based on 2D images assume everything visible in the image to lie on a plane and therefore are approximations”). 
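The refinement limitation of claim 1, mapped above to Pheiffer's similarity-driven optimization and Birchfield's differential rendering, amounts to adjusting the pose so that a rendering of the 3D model better matches the observed image. The sketch below is a heavily simplified stand-in: a toy silhouette renderer and finite-difference gradients take the place of a true differentiable renderer, and every name and parameter is an assumption for illustration only.

```python
import numpy as np

def render_silhouette(pose, size=64):
    """Toy 'renderer': a filled circle whose centre and radius are controlled
    by the pose (x, y, depth). Stands in for rendering the 3D organ model at
    a candidate camera pose."""
    x, y, z = pose
    r = 12.0 / max(z, 0.1)                      # farther away -> smaller silhouette
    yy, xx = np.mgrid[0:size, 0:size]
    return ((xx - x) ** 2 + (yy - y) ** 2 <= r ** 2).astype(float)

def photometric_loss(pose, observed):
    return np.mean((render_silhouette(pose) - observed) ** 2)

def refine_pose(initial_pose, observed, steps=300, eps=0.5):
    """Refine a coarse pose estimate by gradient descent on the image
    difference, with finite differences standing in for the analytic
    gradients a differentiable renderer would provide."""
    step = np.array([60.0, 60.0, 1.5])          # per-parameter step sizes
    pose = np.asarray(initial_pose, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(pose)
        for i in range(pose.size):
            delta = np.zeros_like(pose)
            delta[i] = eps
            grad[i] = (photometric_loss(pose + delta, observed)
                       - photometric_loss(pose - delta, observed)) / (2 * eps)
        pose -= step * grad
    return pose

true_pose = np.array([40.0, 24.0, 1.0])
observed = render_silhouette(true_pose)          # stands in for the real-time 2D image
coarse = np.array([30.0, 30.0, 1.4])             # output of the mapping model
print("refined pose:", np.round(refine_pose(coarse, observed), 2), "true pose:", true_pose)
```

With an actual differentiable renderer, the finite-difference loop would be replaced by analytic gradients propagated through the rendering function.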
Regarding claim 5, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 1 as outlined above. Further, Birchfield teaches that the method of claim 1, wherein the step of obtaining the 2D feature-camera pose mapping model comprises: generating training data based on the pairs (See Birchfield: Fig. 1, and [0097], “One or more systems, such as a training framework, may train one or more neural networks of a system for object pose estimation 102. One or more systems may obtain training data from one or more datasets or other systems and utilize the training data to train one or more neural networks of a system for object pose estimation 102. Training data may comprise images with 6-DoF poses and relative dimensions of bounding cuboids of objects depicted in the images indicated. Training data may comprise images and ground truth data (e.g., indicating 6-DoF poses and relative dimensions of bounding cuboids of objects depicted in the images). Training data may comprise images in which each image is associated with 2D points (e.g., vertices) of a bounding cuboid of an object depicted in the image, a centroid, and relative dimensions”. Note that “Training data may comprise images with 6-DoF poses” is mapped to “generating training data based on the pairs”); and obtaining, via machine learning, the 2D feature-camera pose mapping model capable of mapping input 2D features to a 3D camera pose (See Birchfield: Fig. 1, and [0100], “One or more systems may compute overall loss, denoted by custom-character.sub.all, for one or more neural networks of a system for object pose estimation 102, which may be a weighted combination of one or more loss terms, which can be denoted by a following formula, although any variations thereof can be utilized”. Note that the neural network model after training to estimate pose from the image is mapped to “2D feature-camera pose mapping model” because the neural network can generate/estimate (map) the object pose based on the input image features through analyzing the input images; in addition, the term “mapping” is broad, which may be a relationship, a function, a look-up table, or a neural network model, etc.). Regarding claim 6, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 1 as outlined above. Further, Liu and Birchfield teach that the method of claim 1, wherein the input 2D features are obtained by: acquiring, during a surgery via the laparoscopic camera positioned at a 3D camera pose (See Liu: Figs. 1-3, and [0029], “In particularly useful embodiments, the scope may include a bronchoscope or any scope for pulmonary, digestive system, or other minimally invasive surgical viewing. In other embodiments, an endoscope or the like is employed for other medical procedures as well. These procedures may include minimally invasive endoscopic pituitary surgery, endoscopic skull base tumor surgery, intraventricular neurosurgery, arthroscopic surgery, laparoscopic surgery, etc. Other scoping applications are also contemplated”), the real time 2D image of the target organ (See Liu: Figs. 4-7, and [0043], “In block 306, a series of video images are collected around the feature to attempt to replicate the pose of the virtual or pre-operative image”; and [0044], “Referring to FIG. 7, a system 400 for image-based registration between images is illustratively shown. 
The system 400 includes a computer tomography (CT) scanner 402 (or other pre-operative imager or scanner) although the scanner 402 is not needed as the CT images may be stored in memory 404 and transferred to the system 400 using storage media or network connections. The memory 404 and/or scanner are employed to store/collect CT images of a subject, such as a patient for surgery. An endoscope 406 includes a camera 408 for collecting real-time images during a procedure. The endoscope 406 includes a tracker system 410, e.g., an electromagnetic (EM) tracker for locating a tip of the endoscope. The tracker system 410 needs to have its coordinate system mapped or transformed into the CT coordinate system. The tracker system 410 employs an NDI field generator 411 to track the progress of the endoscope 406”); processing the real time 2D image to generate a segmentation of the target organ (See Birchfield: Fig. 41, and [0598], “In addition, deployment pipeline 4010A may include additional processing tasks or applications that may be implemented to prepare data for use by applications (e.g., DICOM adapter 4002B and DICOM reader 4106 may be used in deployment pipeline 4010A to prepare data for use by CT reconstruction 4108, organ segmentation 4110, etc.). In at least one embodiment, deployment pipeline 4010A may be customized or selected for consistent deployment, one time use, or for another frequency or interval. In at least one embodiment, a user may desire to have CT reconstruction 4108 and organ segmentation 4110 for several subjects over a specific interval, and thus may deploy pipeline 4010A for that period of time. In at least one embodiment, a user may select, for each request from system 4000, applications that a user wants to perform processing on that data for that request. In at least one embodiment, deployment pipeline 4010A may be adjusted at any interval and, because of adaptability and scalability of a container structure within system 4000, this may be a seamless process”); extracting input 2D features of the target organ as it appears in the real time 2D image (See Birchfield: Fig. 41, and [0061], “In an embodiment, a system for object pose estimation obtains an RGB (red-green-blue) image depicting an object of a particular category. A category may refer to a classification or class of objects, in which objects belonging to the category may be similar, such as, for example, a category referred to as “mugs” may include instances of mugs with different colors, sizes, with or without handles, and/or variations thereof. The system may utilize a neural network to extract features from the image, and calculate various outputs based at least in part on the extracted features”). Regarding claim 8, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 1 as outlined above. Further, Liu, Pheiffer, Kurz, and Birchfield teach that a machine-readable medium having information recorded thereon, wherein the information, when read by the machine, causes the machine to perform the following (See Liu: Fig. 7, and [0044], “Referring to FIG. 7, a system 400 for image-based registration between images is illustratively shown. The system 400 includes a computer tomography (CT) scanner 402 (or other pre-operative imager or scanner) although the scanner 402 is not needed as the CT images may be stored in memory 404 and transferred to the system 400 using storage media or network connections. 
The memory 404 and/or scanner are employed to store/collect CT images of a subject, such as a patient for surgery. An endoscope 406 includes a camera 408 for collecting real-time images during a procedure. The endoscope 406 includes a tracker system 410, e.g., an electromagnetic (EM) tracker for locating a tip of the endoscope. The tracker system 410 needs to have its coordinate system mapped or transformed into the CT coordinate system. The tracker system 410 employs an NDI field generator 411 to track the progress of the endoscope 406”) steps: generating a plurality of three-dimensional (3D) virtual camera poses (See Liu: Figs. 2-3, and [0038], “Referring to FIGS. 2 and 3, a virtual image 20 is shown at a carina position of a lung. A camera pose at the virtual position (VB) is recorded as P.sub.V. The operator moves an endoscope 22 with a camera for collecting images close enough to match the image VB. The VB camera pose is known and stored in memory. When the operator is satisfied with the pose of the scope, the operator can start to acquire a series of images from pose P.sub.i to P.sub.i+N (or from P.sub.i-N). A mutual-information based registration method will be employed to find the most similar image whose pose is denoted as P.sub.R. The camera pose P.sub.R corresponds to the best match between VB and the selected RB. The transformation matrix between P.sub.V and P.sub.R is constructed and becomes the desired registration result. Image similarity may be determined using computer implemented software tools or may be performed by a human operator depending on the circumstances”, Note that the virtual camera positions P.sub.i to P.sub.i+N are mapped to the a plurality of three-dimensional (3D) virtual camera poses); with respect to each of the plurality of 3D virtual camera poses (See Liu: Figs. 2-3, and [0038], “Referring to FIGS. 2 and 3, a virtual image 20 is shown at a carina position of a lung. A camera pose at the virtual position (VB) is recorded as P.sub.V”, Note that the every camera virtual position P.sub.VB, a virtual image is generated, and this is mapped to “with respect to each of the plurality of 3D virtual camera pose”), projecting a 3D model for a target organ (See Pheiffer: Fig. 1, and [0018], “According to an embodiment of the present invention, the sequence of intra-operative images can be acquired by a user (e.g., doctor, clinician, etc.) performing a complete scan of the target organ using the image acquisition device (e.g., laparoscope or endoscope). In this case the user moves the image acquisition device while the image acquisition device continually acquires images (frames), so that the frames of the intra-operative image sequence cover the complete surface of the target organ. This may be performed at a beginning of a surgical procedure to obtain a full picture of the target organ at a current deformation. A 3D stitching procedure may be performed to stitch together the intra-operative images to form an intra-operative 3D model of the target organ, such as the liver”) onto a (two-dimensional) 2D image plane (See Pheiffer: Fig. 1, and [0019], “At step 106, the pre-operative 3D medical image volume is registered to the 2D/2.5D intra-operative images using the relative orientation measurements of the intra-operative images to constrain the registration. 
According to an embodiment of the present invention, this registration is performed by simulating camera projections from the pre-operative 3D volume using a parameters space defining the position and orientation of a virtual camera (e.g., virtual endoscope/laparoscope). The simulation of the projection images from the pre-operative 3D volume can include photorealistic rendering. The position and orientation parameters determine the appearance and well as the geometry of simulated 2D/2.5D projection images from the 3D medical image volume, which are directly compared to the observed 2D/2.5D intra-operative images via a similarity metric”. Note that the 3D model of tissues are projected to generate 2D/2.5D images, the 2D projected images are projected from the 3D volumes so it is projected on the 2D planes) determined based on the 3D virtual camera pose to generate a virtual 2D image of the target organ in a perspective corresponding to the 3D virtual camera pose (See Liu: Fig. 4, and [0027], “The present disclosure describes systems and methods for scope calibration and registration. A simple method for calibrating an electro-magnetic (EM) guided endoscopy system computes a transformation matrix for an offset between a camera coordinate and an endoscope tracker. The offset distance between a camera frame and an endoscope tracker frame is reflected in a disparity in 2D projection images between a real video image and a virtual fly-through image. Human eyes or a computer are used to differentiate this spatial difference and rebuild the spatial correspondence. The spatial offset becomes the calibration result”; and [0039], “Referring to FIG. 4, a relationship between an EM tracker coordinate system 40, a camera coordinate system 42 and a CT coordinate system 44 is illustratively depicted. The three local coordinate systems 40, 42 and 44 need to be interconnected to permit transformation between the camera coordinate system 42 (where the center of projection and optical axis are located), EM sensor coordinate system 40, and CT coordinate system 44”. Note that there are three coordinates systems: EM, camera, and CT coordinates systems, and there are transform matrixes among them; the projected 2D images are with respect to the camera coordinate system 42, which is mapped to “based on the 3D virtual camera pose to generate a virtual 2D image of the target organ in a perspective corresponding to the 3D virtual camera pose”. However, Liu does not teach that “projecting a 3D model for a target organ onto a (two-dimensional) 2D image plane”, and secondary art will be used to address this limitation), obtaining 2D features of the virtual 2D image (See Liu: Fig. 6, and [0042], “Referring to FIG. 6, a method for image-based registration between images is illustratively shown in accordance with one illustrative embodiment. In block 302, computer tomography (CT) (or other pre-operative) images of a subject are collected or provided. Advantageously, no markers are needed in the CT images. In block 304, an anatomical reference or feature is located in a video image (e.g., a real-time image taken with a camera of an endoscope) which corresponds to a particular pre-operative image. This may include tracking an endoscope with electromagnetic tracking”. Note that the CT image reference or feature is mapped to the 2D features pf the virtual 2D images), and creating a pair representing a mapping (See Liu: Figs. 
4-6, and [0035], “In accordance with the present principles, three local coordinate systems need to be inter-connected to permit a mapping of events therebetween. These include a camera coordinate system (where the center of projection and optical axis are located), EM sensor coordinate system, and CT coordinate system”) from the 2D features to the 3D virtual camera pose (See Kurz: Fig. 3, and [0044], “According to another embodiment of the invention, as the capturing device a range data capturing device may be used, wherein pixels of images taken with any kind of range data capturing device, such as laser scanners, time-of-flight cameras, or stereo cameras may have associated 3D coordinates. In this case any orientation in a common coordinate system for a particular feature point can be computed from the 3D positions of the neighboring pixels of the feature point”; and [0015], “In FIG. 3, there is shown a standard approach for creating a feature descriptor. In step S1, an image is captured by a capturing device, e.g. a camera, or loaded from a storage medium. In step S2, feature points are extracted from the image and stored in a 2-dimensional description (parameters u, v). In step S3, an orientation assignment is performed as described above with respect to FIG. 2, to add to the parameters u, v an orientation angle a. Thereafter, a neighborhood normalization step S4 is performed, as described above with respect to FIG. 2 to gain normalized neighborhood pixel intensities i[ ]. In the final step S5, a feature descriptor in the form of a descriptor vector d[ ] is created for the respective extracted feature as a function of the normalized neighborhood pixel intensities i[ ]. Approaches exist that may assign multiple orientation angles to a feature in step S3 and consequently carry out the steps S4 and S5 for each orientation resulting in one descriptor per assigned orientation”. Note that the features a 2-dimensional description (parameters u, v) is mapped to the 2D features, and the feature descriptor has a feature and orientation of the camera capturing the image is mapped to the pair of feature and camera pose); obtaining a 2D feature-camera pose mapping model based on the pairs of 2D features and the plurality of 3D virtual camera poses (See Kurz: Fig. 4, and [0043], “In step SI 4, an orientation assignment is performed to add to the parameters u, v an orientation angle a based on the feature orientation angle a determined in step SI 3. Thereafter, a neighborhood normalization step SI 5 is performed to gain normalized neighborhood pixel intensities i[ ]. In the final step SI 6, a feature descriptor in the form of a descriptor vector d[ ] is created for the respective extracted feature depending on a parameter which is indicative of an orientation of the extracted feature, particularly resulting from the orientation assignment in step SI 4”. Note that the a descriptor vector d[ ] is mapped to t a 2D feature-camera pose mapping model, because each feature descriptor has a pair of 2D image feature and camera pose, and a full set of feature descriptors is a model that provides a mapping between features and the camera poses); obtaining a 3D pose estimate of a laparoscopic camera (See Pheiffer: Fig. 1, and [0017], “At step 104, a sequence of intra-operative images is received along with corresponding relative orientation measurements. The sequence of intra-operative images can also be referred to as a video, with each intra-operative image being a frame of the video. 
For example, the intra-operative image sequence can be a laparoscopic image sequence acquired via a laparoscope or an endoscopic image sequence acquired via an endoscope”; and [0020], “An optimization framework is used to select the pose parameters for the virtual camera that maximize the similarity (or minimize the difference) between the simulated projection images and the received intra-operative images. That is, the optimization problem calculates position and orientation parameters that maximize a total similarity (or minimizes a total difference) between each 2D/2.5 intra-operative image and a corresponding simulated 2D/2.5D projection image from the pre-operative 3D volume over all of the intra-operative images”. Note that “select the pose parameters for the virtual camera” is mapped to “obtaining a 3D pose estimate of a laparoscopic camera”) by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate (See Kurz: Figs. 5A-B, and [0057], “In the first image IMI, a real static object ROI as shown in FIG. 5A is captured by a camera (not shown). In the image IMI features of the real object ROI are extracted, such as shown in FIG. 5A by features F51. In a following step, descriptors may be computed for every extracted feature F51 in accordance with the method of the invention. These features F51 are then matched with features F52 extracted in the second image IM2. The second image IM2 is depicting a real object R02 which corresponds with real object ROI under a different viewpoint, wherein for the features F52 also a respective descriptor is determined. Particularly, if the descriptors of features F51 and F52 are relatively close in terms of a certain similarity measure, they are matched. For example, if every descriptor is written as a vector of numbers, when comparing two descriptors, one can use the Euclidian distance between two corresponding vectors as similarity measure”; and [0053], “Camera tracking describes the process of computing the pose (position and orientation) of a camera given one or more camera images. Features in the camera image are either matched against reference features with known 3D positions to compute an absolute pose or against features from the previous frame to compute the relative change in position and orientation”. Note that the computed feature descriptors with known camera pose are matched to the real image feature descriptors, and the camera pose for the real image is estimated/determined, and this is mapped to “obtaining a 3D pose estimate of a laparoscopic camera by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate”); and refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera (See Pheiffer: Fig. 1, and [0020], “According to an embodiment of the present invention, the similarity metric is calculated for the target organ in intra-operative images and the corresponding simulated projection images. This optimization problem can be performed using any similarity or difference metric and can be solved using any optimization algorithm. 
For example, the similarity metric can be cross correlation, mutual information, normalized mutual information, etc., and the similarity metric may be combined with a geometry fitting term for fitting the simulated 2.5D depth data to the observed 2.5D depth data based on the geometry of the target organ. As described above the orientation sensors mounted to the intra-operative image acquisition device (e.g., endoscope/laparoscope) provide relative orientations of the intra-operative images with respect to each other. These relative orientations are used to constrain the optimization problem. In particular, the relative orientations of the intra-operative images constrain the set of orientation parameters calculated for the corresponding simulated projection images. Additionally, the scaling is known due to metric 2.5D sensing, resulting in an optimization for pose refinement on the unit sphere. The optimization may be further constrained based on other a priori information from a known surgical plan used in the acquisition of the intra-operative images, such as a position of the operating room table, position of the patient on the operating room table, and a range of possible camera orientations”. Note that the virtual camera pose parameters are refined by the optimization process, and this is mapped to “refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera”) via differential rendering of the 3D model with respect to the 3D pose estimate (See Birchfield: Fig. 8, and [0123], “A system for object pose estimation may perform a single-stage method category-level 6-DoF pose prediction of previously unseen object instances. A system for object pose estimation may not require various 3D models of instances at training and/or test time, and synthetic data may not be required for training. For 2D keypoint detection, a system for object pose estimation may utilize a combined representation of both displacements and heatmaps to mitigate uncertainty. A system for object pose estimation may estimate the relative dimensions of the 3D bounding cuboid. To further improve accuracy, a system for object pose estimation may utilize a convGRU sequential feature association. A system for object pose estimation may be evaluated in connection with one or more other systems using a dataset such as an Objectron dataset. A system for object pose estimation may be utilized in various contexts, including robotic grasping tasks and various other real-world applications. A system for object pose estimation may be specific to one or more categories, incorporate differential rendering, and leverage iterative post refinement”. Note that the pose estimate may be refined by differential rendering or iterative post refinement, and this is mapped to “refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera via differential rendering of the 3D model with respect to the 3D pose estimate because Liu modified by Pheiffer teaches to refine the 3D camera pose estimate by some optimization process, and Birchfield teaches that the pose estimate can be refined by differential rendering of the 3D object, and combining with them will arrive at exactly the current cited limitations). Regarding claim 9, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 8 as outlined above. Further, Liu, Pheiffer, and Kurz teach that the medium of claim 8, wherein the 2D features include one or more of: intensity features (See Kurz: Fig. 
2, and [0014], “A variety of local feature descriptors exist, wherein a good overview and comparison is given in Krystian Mikolajczyk and Cordelia Schmid, "A performance evaluation of local descriptors", IEEE Transactions on Pattern Analysis & Machine Intelligence, 10, 27 (2005), pp. 1615-1630. Most of them are based on the creation of histograms of either intensity values of the normalized local neighborhood pixels or of functions of them, such as gradients. The final descriptor is expressed as an n-dimensional vector (as shown in FIG. 2 on the right) and can be compared to other descriptors using a similarity measure such as the Euclidian distance”) characterizing the appearance of the 3D model (See Pheiffer: Fig. 1, and [0019], “The simulation of the projection images from the pre-operative 3D volume can include photorealistic rendering. The position and orientation parameters determine the appearance and well as the geometry of simulated 2D/2.5D projection images from the 3D medical image volume, which are directly compared to the observed 2D/2.5D intra-operative images via a similarity metric”) when projected to the 2D image plane (See Kurz: Fig. 4, and [0042], “A very simple way to gain the orientation for all features is to transform the gravitational force to a coordinate system attached to the capturing device using the Euler angles first and then project it onto the image plane. Thereby, the direction of the gravitational force in the image is computed and used for all features in the image. This technique assumes orthogonal projection which is generally not the case. Incorporating the intrinsic parameters of the camera relaxes this assumption but still all techniques based on 2D images assume everything visible in the image to lie on a plane and therefore are approximations”); and geometric features (See Liu: Fig. 3, and [0007], “Generally speaking, calibration is an offline procedure: the calibration parameters can be obtained by imaging an EM-tracked phantom (with a calibration pattern such as a checkerboard) that has known geometric properties, using an EM-tracked endoscope. This involves a cumbersome engineering procedure”) characterizing the shape of the projected 3D model (See Liu: Fig. 3, and [0030], “It should be understood that the present invention will be described in terms of a bronchoscope; however, the teachings of the present invention are much broader and are applicable to any optical scope that can be employed in internal viewing of branching, curved, coiled or other shaped systems (e.g., digestive systems, circulatory systems, piping systems, passages, mines, caverns, etc.)”) in the 2D image plane (See Kurz: Fig. 4, and [0042], “A very simple way to gain the orientation for all features is to transform the gravitational force to a coordinate system attached to the capturing device using the Euler angles first and then project it onto the image plane. Thereby, the direction of the gravitational force in the image is computed and used for all features in the image. This technique assumes orthogonal projection which is generally not the case. Incorporating the intrinsic parameters of the camera relaxes this assumption but still all techniques based on 2D images assume everything visible in the image to lie on a plane and therefore are approximations”). Regarding claim 12, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 8 as outlined above. 
Further, Birchfield teaches that the medium of claim 8, wherein the step of obtaining the 2D feature-camera pose mapping model comprises: generating training data based on the pairs (See Birchfield: Fig. 1, and [0097], “One or more systems, such as a training framework, may train one or more neural networks of a system for object pose estimation 102. One or more systems may obtain training data from one or more datasets or other systems and utilize the training data to train one or more neural networks of a system for object pose estimation 102. Training data may comprise images with 6-DoF poses and relative dimensions of bounding cuboids of objects depicted in the images indicated. Training data may comprise images and ground truth data (e.g., indicating 6-DoF poses and relative dimensions of bounding cuboids of objects depicted in the images). Training data may comprise images in which each image is associated with 2D points (e.g., vertices) of a bounding cuboid of an object depicted in the image, a centroid, and relative dimensions”. Note that “Training data may comprise images with 6-DoF poses” is mapped to “generating training data based on the pairs”); and obtaining, via machine learning, the 2D feature-camera pose mapping model capable of mapping input 2D features to a 3D camera pose (See Birchfield: Fig. 1, and [0100], “One or more systems may compute overall loss, denoted by custom-character.sub.all, for one or more neural networks of a system for object pose estimation 102, which may be a weighted combination of one or more loss terms, which can be denoted by a following formula, although any variations thereof can be utilized”. Note that the neural network model after training to estimate pose from the image is mapped to “2D feature-camera pose mapping model” because the neural network can generate/estimate (map) the object pose based on the input image features through analyzing the input images; in addition, the term “mapping” is broad, which may be a relationship, a function, a look-up table, or a neural network model, etc.). Regarding claim 13, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 8 as outlined above. Further, Liu and Birchfield teach that the medium of claim 8, wherein the input 2D features are obtained by: acquiring, during a surgery via the laparoscopic camera positioned at a 3D camera pose (See Liu: Figs. 1-3, and [0029], “In particularly useful embodiments, the scope may include a bronchoscope or any scope for pulmonary, digestive system, or other minimally invasive surgical viewing. In other embodiments, an endoscope or the like is employed for other medical procedures as well. These procedures may include minimally invasive endoscopic pituitary surgery, endoscopic skull base tumor surgery, intraventricular neurosurgery, arthroscopic surgery, laparoscopic surgery, etc. Other scoping applications are also contemplated”), the real time 2D image of the target organ (See Liu: Figs. 4-7, and [0043], “In block 306, a series of video images are collected around the feature to attempt to replicate the pose of the virtual or pre-operative image”; and [0044], “Referring to FIG. 7, a system 400 for image-based registration between images is illustratively shown. The system 400 includes a computer tomography (CT) scanner 402 (or other pre-operative imager or scanner) although the scanner 402 is not needed as the CT images may be stored in memory 404 and transferred to the system 400 using storage media or network connections. 
The memory 404 and/or scanner are employed to store/collect CT images of a subject, such as a patient for surgery. An endoscope 406 includes a camera 408 for collecting real-time images during a procedure. The endoscope 406 includes a tracker system 410, e.g., an electromagnetic (EM) tracker for locating a tip of the endoscope. The tracker system 410 needs to have its coordinate system mapped or transformed into the CT coordinate system. The tracker system 410 employs an NDI field generator 411 to track the progress of the endoscope 406”); processing the real time 2D image to generate a segmentation of the target organ (See Birchfield: Fig. 41, and [0598], “In addition, deployment pipeline 4010A may include additional processing tasks or applications that may be implemented to prepare data for use by applications (e.g., DICOM adapter 4002B and DICOM reader 4106 may be used in deployment pipeline 4010A to prepare data for use by CT reconstruction 4108, organ segmentation 4110, etc.). In at least one embodiment, deployment pipeline 4010A may be customized or selected for consistent deployment, one time use, or for another frequency or interval. In at least one embodiment, a user may desire to have CT reconstruction 4108 and organ segmentation 4110 for several subjects over a specific interval, and thus may deploy pipeline 4010A for that period of time. In at least one embodiment, a user may select, for each request from system 4000, applications that a user wants to perform processing on that data for that request. In at least one embodiment, deployment pipeline 4010A may be adjusted at any interval and, because of adaptability and scalability of a container structure within system 4000, this may be a seamless process”); extracting input 2D features of the target organ as it appears in the real time 2D image (See Birchfield: Fig. 41, and [0061], “In an embodiment, a system for object pose estimation obtains an RGB (red-green-blue) image depicting an object of a particular category. A category may refer to a classification or class of objects, in which objects belonging to the category may be similar, such as, for example, a category referred to as “mugs” may include instances of mugs with different colors, sizes, with or without handles, and/or variations thereof. The system may utilize a neural network to extract features from the image, and calculate various outputs based at least in part on the extracted features”). Regarding claim 15, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 1 as outlined above. Further, Liu, Pheiffer, Kurz, and Birchfield teach that a system (See Liu: Fig. 7, and [0044], “Referring to FIG. 7, a system 400 for image-based registration between images is illustratively shown. The system 400 includes a computer tomography (CT) scanner 402 (or other pre-operative imager or scanner) although the scanner 402 is not needed as the CT images may be stored in memory 404 and transferred to the system 400 using storage media or network connections. The memory 404 and/or scanner are employed to store/collect CT images of a subject, such as a patient for surgery. An endoscope 406 includes a camera 408 for collecting real-time images during a procedure. The endoscope 406 includes a tracker system 410, e.g., an electromagnetic (EM) tracker for locating a tip of the endoscope. The tracker system 410 needs to have its coordinate system mapped or transformed into the CT coordinate system. 
The tracker system 410 employs an NDI field generator 411 to track the progress of the endoscope 406”) comprising: a camera pose generator (See Liu: Fig. 7, and [0045], “A computer implemented program 412 is stored in memory 404 of a computer device 414. The program 412 includes a module 416 configured to compare a real-time video image 452 taken by the camera 408 with CT images 450 to find a closest match between the real-time images and the CT image. The program 412 includes an optimization module 422 configured to find a maximum similarity to determine the closest match CT image. The program 412 is configured to register a closest matched real-time image to a pre-operative image in CT space to find a transformation matrix 420 between the CT space and image tracking space such that the transformation matrix 420 is based solely on image registration, is operator independent, and free of any external markers or anatomic landmarks to perform the registration. The transformation matrix 420 is employed to register coordinates of the CT images to electromagnetic tracking coordinates during an endoscopic procedure. A display 456 may be employed to view the real-time and/or virtual/pre-operative images during the procedure. The display 456 is configured to show endoscope progression in pre-operative image space. The marker-free registration process assumes a calibration process is employed beforehand to determine the relationship between the camera coordinate system and the tracking (EM) coordinate system”. Note that the software portion of “an optimization module 422” and “transformation matrix 420” to determine the camera coordination system or pose is mapped to the camera pose generator) implemented by a processor (See Liu: Fig. 7, and [0033], “A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The processor or processing system may be provided with the scope system or provided independently of the scope system”) and configured for generating a plurality of three-dimensional (3D) virtual camera poses (See Liu: Figs. 2-3, and [0038], “Referring to FIGS. 2 and 3, a virtual image 20 is shown at a carina position of a lung. A camera pose at the virtual position (VB) is recorded as P.sub.V. The operator moves an endoscope 22 with a camera for collecting images close enough to match the image VB. The VB camera pose is known and stored in memory. When the operator is satisfied with the pose of the scope, the operator can start to acquire a series of images from pose P.sub.i to P.sub.i+N (or from P.sub.i-N). A mutual-information based registration method will be employed to find the most similar image whose pose is denoted as P.sub.R. The camera pose P.sub.R corresponds to the best match between VB and the selected RB. The transformation matrix between P.sub.V and P.sub.R is constructed and becomes the desired registration result. Image similarity may be determined using computer implemented software tools or may be performed by a human operator depending on the circumstances”; Note that the virtual camera positions P.sub.i to P.sub.i+N are mapped to the a plurality of three-dimensional (3D) virtual camera poses); a two-dimensional (2D) feature-camera pose mapping model generator implemented by a processor (See Liu: Fig. 7, and [0045], “A computer implemented program 412 is stored in memory 404 of a computer device 414. 
The program 412 includes a module 416 configured to compare a real-time video image 452 taken by the camera 408 with CT images 450 to find a closest match between the real-time images and the CT image. The program 412 includes an optimization module 422 configured to find a maximum similarity to determine the closest match CT image. The program 412 is configured to register a closest matched real-time image to a pre-operative image in CT space to find a transformation matrix 420 between the CT space and image tracking space such that the transformation matrix 420 is based solely on image registration, is operator independent, and free of any external markers or anatomic landmarks to perform the registration. The transformation matrix 420 is employed to register coordinates of the CT images to electromagnetic tracking coordinates during an endoscopic procedure. A display 456 may be employed to view the real-time and/or virtual/pre-operative images during the procedure. The display 456 is configured to show endoscope progression in pre-operative image space. The marker-free registration process assumes a calibration process is employed beforehand to determine the relationship between the camera coordinate system and the tracking (EM) coordinate system”. Note that the software portions of “a module 416 configured to compare”, “an optimization module 422”, and “The transformation matrix 420” to determine the camera coordination system or pose based on the input images is mapped to the camera pose generator) and configured for, with respect to each of the plurality of 3D virtual camera poses (See Liu: Figs. 2-3, and [0038], “Referring to FIGS. 2 and 3, a virtual image 20 is shown at a carina position of a lung. A camera pose at the virtual position (VB) is recorded as P.sub.V”, Note that the every camera virtual position P.sub.VB, a virtual image is generated, and this is mapped to “with respect to each of the plurality of 3D virtual camera pose”), projecting a 3D model for a target organ (See Pheiffer: Fig. 1, and [0018], “According to an embodiment of the present invention, the sequence of intra-operative images can be acquired by a user (e.g., doctor, clinician, etc.) performing a complete scan of the target organ using the image acquisition device (e.g., laparoscope or endoscope). In this case the user moves the image acquisition device while the image acquisition device continually acquires images (frames), so that the frames of the intra-operative image sequence cover the complete surface of the target organ. This may be performed at a beginning of a surgical procedure to obtain a full picture of the target organ at a current deformation. A 3D stitching procedure may be performed to stitch together the intra-operative images to form an intra-operative 3D model of the target organ, such as the liver”) onto a 2D image plane (See Pheiffer: Fig. 1, and [0019], “At step 106, the pre-operative 3D medical image volume is registered to the 2D/2.5D intra-operative images using the relative orientation measurements of the intra-operative images to constrain the registration. According to an embodiment of the present invention, this registration is performed by simulating camera projections from the pre-operative 3D volume using a parameters space defining the position and orientation of a virtual camera (e.g., virtual endoscope/laparoscope). The simulation of the projection images from the pre-operative 3D volume can include photorealistic rendering. 
The position and orientation parameters determine the appearance and well as the geometry of simulated 2D/2.5D projection images from the 3D medical image volume, which are directly compared to the observed 2D/2.5D intra-operative images via a similarity metric”. Note that the 3D model of tissues are projected to generate 2D/2.5D images, the 2D projected images are projected from the 3D volumes so it is projected on the 2D planes) determined based on the 3D virtual camera pose to generate a virtual 2D image of the target organ in a perspective corresponding to the 3D virtual camera pose (See Liu: Fig. 4, and [0027], “The present disclosure describes systems and methods for scope calibration and registration. A simple method for calibrating an electro-magnetic (EM) guided endoscopy system computes a transformation matrix for an offset between a camera coordinate and an endoscope tracker. The offset distance between a camera frame and an endoscope tracker frame is reflected in a disparity in 2D projection images between a real video image and a virtual fly-through image. Human eyes or a computer are used to differentiate this spatial difference and rebuild the spatial correspondence. The spatial offset becomes the calibration result”; and [0039], “Referring to FIG. 4, a relationship between an EM tracker coordinate system 40, a camera coordinate system 42 and a CT coordinate system 44 is illustratively depicted. The three local coordinate systems 40, 42 and 44 need to be interconnected to permit transformation between the camera coordinate system 42 (where the center of projection and optical axis are located), EM sensor coordinate system 40, and CT coordinate system 44”. Note that there are three coordinates systems: EM, camera, and CT coordinates systems, and there are transform matrixes among them; the projected 2D images are with respect to the camera coordinate system 42, which is mapped to “based on the 3D virtual camera pose to generate a virtual 2D image of the target organ in a perspective corresponding to the 3D virtual camera pose”. However, Liu does not teach that “projecting a 3D model for a target organ onto a (two-dimensional) 2D image plane”, and secondary art will be used to address this limitation), obtaining 2D features of the virtual 2D image (See Liu: Fig. 6, and [0042], “Referring to FIG. 6, a method for image-based registration between images is illustratively shown in accordance with one illustrative embodiment. In block 302, computer tomography (CT) (or other pre-operative) images of a subject are collected or provided. Advantageously, no markers are needed in the CT images. In block 304, an anatomical reference or feature is located in a video image (e.g., a real-time image taken with a camera of an endoscope) which corresponds to a particular pre-operative image. This may include tracking an endoscope with electromagnetic tracking”. Note that the CT image reference or feature is mapped to the 2D features pf the virtual 2D images), and creating a pair representing a mapping (See Liu: Figs. 4-6, and [0035], “In accordance with the present principles, three local coordinate systems need to be inter-connected to permit a mapping of events therebetween. These include a camera coordinate system (where the center of projection and optical axis are located), EM sensor coordinate system, and CT coordinate system”) from the 2D features to the 3D virtual camera pose (See Kurz: Fig. 
3, and [0044], “According to another embodiment of the invention, as the capturing device a range data capturing device may be used, wherein pixels of images taken with any kind of range data capturing device, such as laser scanners, time-of-flight cameras, or stereo cameras may have associated 3D coordinates. In this case any orientation in a common coordinate system for a particular feature point can be computed from the 3D positions of the neighboring pixels of the feature point”; and [0015], “In FIG. 3, there is shown a standard approach for creating a feature descriptor. In step S1, an image is captured by a capturing device, e.g. a camera, or loaded from a storage medium. In step S2, feature points are extracted from the image and stored in a 2-dimensional description (parameters u, v). In step S3, an orientation assignment is performed as described above with respect to FIG. 2, to add to the parameters u, v an orientation angle a. Thereafter, a neighborhood normalization step S4 is performed, as described above with respect to FIG. 2 to gain normalized neighborhood pixel intensities i[ ]. In the final step S5, a feature descriptor in the form of a descriptor vector d[ ] is created for the respective extracted feature as a function of the normalized neighborhood pixel intensities i[ ]. Approaches exist that may assign multiple orientation angles to a feature in step S3 and consequently carry out the steps S4 and S5 for each orientation resulting in one descriptor per assigned orientation”. Note that the features a 2-dimensional description (parameters u, v) is mapped to the 2D features, and the feature descriptor has a feature and orientation of the camera capturing the image is mapped to the pair of feature and camera pose); obtaining a 2D feature-camera pose mapping model based on the pairs of 2D features and the plurality of 3D virtual camera poses (See Kurz: Fig. 4, and [0043], “In step SI 4, an orientation assignment is performed to add to the parameters u, v an orientation angle a based on the feature orientation angle a determined in step SI 3. Thereafter, a neighborhood normalization step SI 5 is performed to gain normalized neighborhood pixel intensities i[ ]. In the final step SI 6, a feature descriptor in the form of a descriptor vector d[ ] is created for the respective extracted feature depending on a parameter which is indicative of an orientation of the extracted feature, particularly resulting from the orientation assignment in step SI 4”. Note that the a descriptor vector d[ ] is mapped to t a 2D feature-camera pose mapping model, because each feature descriptor has a pair of 2D image feature and camera pose, and a full set of feature descriptors is a model that provides a mapping between features and the camera poses); and a camera pose estimator implemented by a processor (See Liu: Fig. 7, and [0045], “A computer implemented program 412 is stored in memory 404 of a computer device 414. The program 412 includes a module 416 configured to compare a real-time video image 452 taken by the camera 408 with CT images 450 to find a closest match between the real-time images and the CT image. The program 412 includes an optimization module 422 configured to find a maximum similarity to determine the closest match CT image. 
The program 412 is configured to register a closest matched real-time image to a pre-operative image in CT space to find a transformation matrix 420 between the CT space and image tracking space such that the transformation matrix 420 is based solely on image registration, is operator independent, and free of any external markers or anatomic landmarks to perform the registration. The transformation matrix 420 is employed to register coordinates of the CT images to electromagnetic tracking coordinates during an endoscopic procedure. A display 456 may be employed to view the real-time and/or virtual/pre-operative images during the procedure. The display 456 is configured to show endoscope progression in pre-operative image space. The marker-free registration process assumes a calibration process is employed beforehand to determine the relationship between the camera coordinate system and the tracking (EM) coordinate system”. Note that the software portion of “transformation matrix 420” without the optimization module 422 to determine the camera coordination system or pose is mapped to a camera pose estimator) and configured for obtaining a 3D pose estimate of a laparoscopic camera (See Pheiffer: Fig. 1, and [0017], “At step 104, a sequence of intra-operative images is received along with corresponding relative orientation measurements. The sequence of intra-operative images can also be referred to as a video, with each intra-operative image being a frame of the video. For example, the intra-operative image sequence can be a laparoscopic image sequence acquired via a laparoscope or an endoscopic image sequence acquired via an endoscope”; and [0020], “An optimization framework is used to select the pose parameters for the virtual camera that maximize the similarity (or minimize the difference) between the simulated projection images and the received intra-operative images. That is, the optimization problem calculates position and orientation parameters that maximize a total similarity (or minimizes a total difference) between each 2D/2.5 intra-operative image and a corresponding simulated 2D/2.5D projection image from the pre-operative 3D volume over all of the intra-operative images”. Note that “select the pose parameters for the virtual camera” is mapped to “obtaining a 3D pose estimate of a laparoscopic camera”) by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate (See Kurz: Figs. 5A-B, and [0057], “In the first image IMI, a real static object ROI as shown in FIG. 5A is captured by a camera (not shown). In the image IMI features of the real object ROI are extracted, such as shown in FIG. 5A by features F51. In a following step, descriptors may be computed for every extracted feature F51 in accordance with the method of the invention. These features F51 are then matched with features F52 extracted in the second image IM2. The second image IM2 is depicting a real object R02 which corresponds with real object ROI under a different viewpoint, wherein for the features F52 also a respective descriptor is determined. Particularly, if the descriptors of features F51 and F52 are relatively close in terms of a certain similarity measure, they are matched. 
For example, if every descriptor is written as a vector of numbers, when comparing two descriptors, one can use the Euclidian distance between two corresponding vectors as similarity measure”; and [0053], “Camera tracking describes the process of computing the pose (position and orientation) of a camera given one or more camera images. Features in the camera image are either matched against reference features with known 3D positions to compute an absolute pose or against features from the previous frame to compute the relative change in position and orientation”. Note that the computed feature descriptors with known camera pose are matched to the real image feature descriptors, and the camera pose for the real image is estimated/determined, and this is mapped to “obtaining a 3D pose estimate of a laparoscopic camera by mapping, via the 2D feature-camera pose mapping model, input 2D features extracted from a real-time 2D image of the target organ acquired by the laparoscopic camera to the 3D camera estimate”), and refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera (See Pheiffer: Fig. 1, and [0020], “According to an embodiment of the present invention, the similarity metric is calculated for the target organ in intra-operative images and the corresponding simulated projection images. This optimization problem can be performed using any similarity or difference metric and can be solved using any optimization algorithm. For example, the similarity metric can be cross correlation, mutual information, normalized mutual information, etc., and the similarity metric may be combined with a geometry fitting term for fitting the simulated 2.5D depth data to the observed 2.5D depth data based on the geometry of the target organ. As described above the orientation sensors mounted to the intra-operative image acquisition device (e.g., endoscope/laparoscope) provide relative orientations of the intra-operative images with respect to each other. These relative orientations are used to constrain the optimization problem. In particular, the relative orientations of the intra-operative images constrain the set of orientation parameters calculated for the corresponding simulated projection images. Additionally, the scaling is known due to metric 2.5D sensing, resulting in an optimization for pose refinement on the unit sphere. The optimization may be further constrained based on other a priori information from a known surgical plan used in the acquisition of the intra-operative images, such as a position of the operating room table, position of the patient on the operating room table, and a range of possible camera orientations”. Note that the virtual camera pose parameters are refined by the optimization process, and this is mapped to “refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera”) via differential rendering of the 3D model with respect to the 3D pose estimate (See Birchfield: Fig. 8, and [0123], “A system for object pose estimation may perform a single-stage method category-level 6-DoF pose prediction of previously unseen object instances. A system for object pose estimation may not require various 3D models of instances at training and/or test time, and synthetic data may not be required for training. For 2D keypoint detection, a system for object pose estimation may utilize a combined representation of both displacements and heatmaps to mitigate uncertainty. 
A system for object pose estimation may estimate the relative dimensions of the 3D bounding cuboid. To further improve accuracy, a system for object pose estimation may utilize a convGRU sequential feature association. A system for object pose estimation may be evaluated in connection with one or more other systems using a dataset such as an Objectron dataset. A system for object pose estimation may be utilized in various contexts, including robotic grasping tasks and various other real-world applications. A system for object pose estimation may be specific to one or more categories, incorporate differential rendering, and leverage iterative post refinement”. Note that the pose estimate may be refined by differential rendering or iterative post refinement, and this is mapped to "refining the 3D pose estimate to derive an estimated 3D camera pose of the laparoscopic camera via differential rendering of the 3D model with respect to the 3D pose estimate" because Liu modified by Pheiffer teaches to refine the 3D camera pose estimate by some optimization process, and Birchfield teaches that the pose estimate can be refined by differential rendering of the 3D object, and combining them will arrive at exactly the cited limitations). Regarding claim 16, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 15 as outlined above. Further, Liu, Pheiffer, and Kurz teach that the system of claim 15, wherein the 2D features include one or more of: intensity features (See Kurz: Fig. 2, and [0014], “A variety of local feature descriptors exist, wherein a good overview and comparison is given in Krystian Mikolajczyk and Cordelia Schmid, "A performance evaluation of local descriptors", IEEE Transactions on Pattern Analysis & Machine Intelligence, 10, 27 (2005), pp. 1615-1630. Most of them are based on the creation of histograms of either intensity values of the normalized local neighborhood pixels or of functions of them, such as gradients. The final descriptor is expressed as an n-dimensional vector (as shown in FIG. 2 on the right) and can be compared to other descriptors using a similarity measure such as the Euclidian distance”) characterizing the appearance of the 3D model (See Pheiffer: Fig. 1, and [0019], “The simulation of the projection images from the pre-operative 3D volume can include photorealistic rendering. The position and orientation parameters determine the appearance and well as the geometry of simulated 2D/2.5D projection images from the 3D medical image volume, which are directly compared to the observed 2D/2.5D intra-operative images via a similarity metric”) when projected to the 2D image plane (See Kurz: Fig. 4, and [0042], “A very simple way to gain the orientation for all features is to transform the gravitational force to a coordinate system attached to the capturing device using the Euler angles first and then project it onto the image plane. Thereby, the direction of the gravitational force in the image is computed and used for all features in the image. This technique assumes orthogonal projection which is generally not the case. Incorporating the intrinsic parameters of the camera relaxes this assumption but still all techniques based on 2D images assume everything visible in the image to lie on a plane and therefore are approximations”); and geometric features (See Liu: Fig.
3, and [0007], “Generally speaking, calibration is an offline procedure: the calibration parameters can be obtained by imaging an EM-tracked phantom (with a calibration pattern such as a checkerboard) that has known geometric properties, using an EM-tracked endoscope. This involves a cumbersome engineering procedure”) characterizing the shape of the projected 3D model (See Liu: Fig. 3, and [0030], “It should be understood that the present invention will be described in terms of a bronchoscope; however, the teachings of the present invention are much broader and are applicable to any optical scope that can be employed in internal viewing of branching, curved, coiled or other shaped systems (e.g., digestive systems, circulatory systems, piping systems, passages, mines, caverns, etc.)”) in the 2D image plane (See Kurz: Fig. 4, and [0042], “A very simple way to gain the orientation for all features is to transform the gravitational force to a coordinate system attached to the capturing device using the Euler angles first and then project it onto the image plane. Thereby, the direction of the gravitational force in the image is computed and used for all features in the image. This technique assumes orthogonal projection which is generally not the case. Incorporating the intrinsic parameters of the camera relaxes this assumption but still all techniques based on 2D images assume everything visible in the image to lie on a plane and therefore are approximations”). Regarding claim 19, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 15 as outlined above. Further, Birchfield teaches that the system of claim 15, wherein the step of obtaining the 2D feature-camera pose mapping model comprises: generating training data based on the pairs (See Birchfield: Fig. 1, and [0097], “One or more systems, such as a training framework, may train one or more neural networks of a system for object pose estimation 102. One or more systems may obtain training data from one or more datasets or other systems and utilize the training data to train one or more neural networks of a system for object pose estimation 102. Training data may comprise images with 6-DoF poses and relative dimensions of bounding cuboids of objects depicted in the images indicated. Training data may comprise images and ground truth data (e.g., indicating 6-DoF poses and relative dimensions of bounding cuboids of objects depicted in the images). Training data may comprise images in which each image is associated with 2D points (e.g., vertices) of a bounding cuboid of an object depicted in the image, a centroid, and relative dimensions”. Note that “Training data may comprise images with 6-DoF poses” is mapped to “generating training data based on the pairs”); and obtaining, via machine learning, the 2D feature-camera pose mapping model capable of mapping input 2D features to a 3D camera pose (See Birchfield: Fig. 1, and [0100], “One or more systems may compute overall loss, denoted by custom-character.sub.all, for one or more neural networks of a system for object pose estimation 102, which may be a weighted combination of one or more loss terms, which can be denoted by a following formula, although any variations thereof can be utilized”. 
Note that the neural network model after training to estimate pose from the image is mapped to “2D feature-camera pose mapping model” because the neural network can generate/estimate (map) the object pose based on the input image features through analyzing the input images; in addition, the term “mapping” is broad, which may be a relationship, a function, a look-up table, or a neural network model, etc.). Regarding claim 20, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 15 as outlined above. Further, Liu and Birchfield teach that the system of claim 15, wherein the input 2D features are obtained by: acquiring, during a surgery via the laparoscopic camera positioned at a 3D camera pose (See Liu: Figs. 1-3, and [0029], “In particularly useful embodiments, the scope may include a bronchoscope or any scope for pulmonary, digestive system, or other minimally invasive surgical viewing. In other embodiments, an endoscope or the like is employed for other medical procedures as well. These procedures may include minimally invasive endoscopic pituitary surgery, endoscopic skull base tumor surgery, intraventricular neurosurgery, arthroscopic surgery, laparoscopic surgery, etc. Other scoping applications are also contemplated”), the real time 2D image of the target organ (See Liu: Figs. 4-7, and [0043], “In block 306, a series of video images are collected around the feature to attempt to replicate the pose of the virtual or pre-operative image”; and [0044], “Referring to FIG. 7, a system 400 for image-based registration between images is illustratively shown. The system 400 includes a computer tomography (CT) scanner 402 (or other pre-operative imager or scanner) although the scanner 402 is not needed as the CT images may be stored in memory 404 and transferred to the system 400 using storage media or network connections. The memory 404 and/or scanner are employed to store/collect CT images of a subject, such as a patient for surgery. An endoscope 406 includes a camera 408 for collecting real-time images during a procedure. The endoscope 406 includes a tracker system 410, e.g., an electromagnetic (EM) tracker for locating a tip of the endoscope. The tracker system 410 needs to have its coordinate system mapped or transformed into the CT coordinate system. The tracker system 410 employs an NDI field generator 411 to track the progress of the endoscope 406”); processing the real time 2D image to generate a segmentation of the target organ (See Birchfield: Fig. 41, and [0598], “In addition, deployment pipeline 4010A may include additional processing tasks or applications that may be implemented to prepare data for use by applications (e.g., DICOM adapter 4002B and DICOM reader 4106 may be used in deployment pipeline 4010A to prepare data for use by CT reconstruction 4108, organ segmentation 4110, etc.). In at least one embodiment, deployment pipeline 4010A may be customized or selected for consistent deployment, one time use, or for another frequency or interval. In at least one embodiment, a user may desire to have CT reconstruction 4108 and organ segmentation 4110 for several subjects over a specific interval, and thus may deploy pipeline 4010A for that period of time. In at least one embodiment, a user may select, for each request from system 4000, applications that a user wants to perform processing on that data for that request. 
In at least one embodiment, deployment pipeline 4010A may be adjusted at any interval and, because of adaptability and scalability of a container structure within system 4000, this may be a seamless process”); extracting input 2D features of the target organ as it appears in the real time 2D image (See Birchfield: Fig. 41, and [0061], “In an embodiment, a system for object pose estimation obtains an RGB (red-green-blue) image depicting an object of a particular category. A category may refer to a classification or class of objects, in which objects belonging to the category may be similar, such as, for example, a category referred to as “mugs” may include instances of mugs with different colors, sizes, with or without handles, and/or variations thereof. The system may utilize a neural network to extract features from the image, and calculate various outputs based at least in part on the extracted features”). Claims 3, 10, and 17, are rejected under 35 U.S.C. 103 as being unpatentable over Liu, etc. (US 20120069167 A1) in view of Pheiffer, etc. (US 20180150929 A1), further in view of Kurz, etc. (US 20120219188 A1), Birchfield, etc. (US 20220277472 A1), and Kang, etc. (US 20110103657 A1). Regarding claim 3, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 2 as outlined above. Further, Birchfield teaches that the method of claim 2, wherein the step of obtaining 2D features comprises: processing the 2D image to obtain a segmentation of the target organ (See Birchfield: Fig. 41, and [0602], “Although illustrated as consecutive application in deployment pipeline 4010A, CT reconstruction 4108 and organ segmentation 4110 applications may be processed in parallel in at least one embodiment. In at least one embodiment, where applications do not have dependencies on one another, and data is available for each application (e.g., after DICOM reader 4106 extracts data), applications may be executed at a same time, substantially at a same time, or with some overlap. In at least one embodiment, where two or more applications require similar services 3920, a scheduler of system 4000 may be used to load balance and distribute compute or processing resources between and among various applications. In at least one embodiment, in some embodiments, parallel computing platform 4030 may be used to perform parallel processing for applications to decrease run-time of deployment pipeline 4010A to provide real-time results”); computing the intensity features within the segmentation; and determining the geometric features of the target organ based on the segmentation (See Birchfield: Fig. 1, and [0073], “A feature extraction 106 may implement various neural network processes to generate feature maps from an input image 104. A feature extraction 106 may perform any suitable feature extraction process, such as those including operations such as convolution, pooling, upsampling, downsampling, aggregation, concatenation, skip-connections, activation functions, projection, interpolation, normalization, and/or variations thereof, to generate a feature map. In an embodiment, a feature map, which can be referred to as an activation map, is a set of data that indicates output activations for a given filter. A feature map may indicate an output of one or more operations applied to an input. A feature map may indicate various features of an image. In some examples, channels of a feature map correspond to various features and/or aspects of the feature map. 
In at least one embodiment, each channel of a feature map represents an aspect of information, such as particular features (e.g., edges, corners), particular colors (e.g., red, blue, green), and/or variations thereof. A feature extraction 106 may perform any suitable processes to determine a feature map from an input image 104. A feature map may be denoted as Φ(I)∈custom-character.sup.H/4×W/4×64”. Note that the edges, corners, etc., are mapped to the geometric features). However, Liu, modified by Pheiffer, Kurz, and Birchfield, fails to explicitly disclose that computing the intensity features within the segmentation. However, Kang teaches that computing the intensity features within the segmentation (See Kang: Figs. 3-4, and [0068], “The density distribution along the longitudinal axis of the cylinder (i.e., into and out of the page in FIG. 3B) is substantially uniform and does not vary substantially and may be modeled as a constant function of the cross-sectional distribution along the longitudinal axis, that is, as a constant function of the radial distance d from the center of the distribution. FIG. 4 illustrates schematically a cylindrical vessel segment intensity distribution model. In particular, the model of the cylindrical vessel segment has a maximum density at the center that decays exponentially to the boundary of the vessel as a function of the radial distance d, from the center. At each distance d, the density is uniform along the z-axis. For example, the density at d=0 is the density maximum along the length of the vessel. This density maximum shown by line 405 is referred to as a ridge, and corresponds to the centerline of a vessel”. Note that the cylindrical vessel segment intensity distribution is mapped to computing the intensity features within the segmentation). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention was effectively filed to modify Liu to have computing the intensity features within the segmentation as taught by Kang in order to perform extraction of geometry from image of blood vessel for analyzing biological tubular structure more efficiently (See Kang: Fig. 1, and [0002], “Aspects of the present invention relate to extracting geometry from one or more images for use in analyzing biological tubular structures for diagnostic and therapeutic applications in animals. In particular, aspects of the invention relate to extracting geometry from images of blood vessels to identify structural features useful for detecting, monitoring, and/or treating diseases, and/or for evaluating and validating new therapies”). Liu teaches a method and system for marker-free image-based registration between images locating a feature in a pre-operative image and comparing real-time images taken with a tracked scope with the pre-operative image taken of the feature to find a real-time image that closely matches the pre-operative image that may determine a transformation matrix between a position of the pre-operative image and a position of the real-time image provided by a tracker so that the determined transformation matrix can be used to determine the real-time image coordinates using the pre-operative image space and the transformation matrix; while Kang teaches a system and method of obtaining geometry from images that may detect features associated with a blood vessel in the image using matching filter and segmentation analysis to analyze the image intensity features. 
Therefore, it is obvious to one of ordinary skill in the art to modify Liu by Kang to detect 2D image features using segmentation process and intensity feature analysis. The motivation to modify Liu by Kang is “Use of known technique to improve similar devices (methods, or products) in the same way”. Regarding claim 10, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 9 as outlined above. Further, Birchfield and Kang teach that the medium of claim 9, wherein the step of obtaining 2D features comprises: processing the 2D image to obtain a segmentation of the target organ (See Birchfield: Fig. 41, and [0602], “Although illustrated as consecutive application in deployment pipeline 4010A, CT reconstruction 4108 and organ segmentation 4110 applications may be processed in parallel in at least one embodiment. In at least one embodiment, where applications do not have dependencies on one another, and data is available for each application (e.g., after DICOM reader 4106 extracts data), applications may be executed at a same time, substantially at a same time, or with some overlap. In at least one embodiment, where two or more applications require similar services 3920, a scheduler of system 4000 may be used to load balance and distribute compute or processing resources between and among various applications. In at least one embodiment, in some embodiments, parallel computing platform 4030 may be used to perform parallel processing for applications to decrease run-time of deployment pipeline 4010A to provide real-time results”); computing the intensity features within the segmentation (See Kang: Figs. 3-4, and [0068], “The density distribution along the longitudinal axis of the cylinder (i.e., into and out of the page in FIG. 3B) is substantially uniform and does not vary substantially and may be modeled as a constant function of the cross-sectional distribution along the longitudinal axis, that is, as a constant function of the radial distance d from the center of the distribution. FIG. 4 illustrates schematically a cylindrical vessel segment intensity distribution model. In particular, the model of the cylindrical vessel segment has a maximum density at the center that decays exponentially to the boundary of the vessel as a function of the radial distance d, from the center. At each distance d, the density is uniform along the z-axis. For example, the density at d=0 is the density maximum along the length of the vessel. This density maximum shown by line 405 is referred to as a ridge, and corresponds to the centerline of a vessel”. Note that the cylindrical vessel segment intensity distribution is mapped to computing the intensity features within the segmentation); and determining the geometric features of the target organ based on the segmentation (See Birchfield: Fig. 1, and [0073], “A feature extraction 106 may implement various neural network processes to generate feature maps from an input image 104. A feature extraction 106 may perform any suitable feature extraction process, such as those including operations such as convolution, pooling, upsampling, downsampling, aggregation, concatenation, skip-connections, activation functions, projection, interpolation, normalization, and/or variations thereof, to generate a feature map. In an embodiment, a feature map, which can be referred to as an activation map, is a set of data that indicates output activations for a given filter. A feature map may indicate an output of one or more operations applied to an input. 
A feature map may indicate various features of an image. In some examples, channels of a feature map correspond to various features and/or aspects of the feature map. In at least one embodiment, each channel of a feature map represents an aspect of information, such as particular features (e.g., edges, corners), particular colors (e.g., red, blue, green), and/or variations thereof. A feature extraction 106 may perform any suitable processes to determine a feature map from an input image 104. A feature map may be denoted as Φ(I)∈custom-character.sup.H/4×W/4×64”. Note that the edges, corners, etc., are mapped to the geometric features). Regarding claim 17, Liu, Pheiffer, Kurz, and Birchfield teach all the features with respect to claim 16 as outlined above. Further, Birchfield and Kang teach that the system of claim 16, wherein the step of obtaining 2D features comprises: processing the 2D image to obtain a segmentation of the target organ (See Birchfield: Fig. 41, and [0602], “Although illustrated as consecutive application in deployment pipeline 4010A, CT reconstruction 4108 and organ segmentation 4110 applications may be processed in parallel in at least one embodiment. In at least one embodiment, where applications do not have dependencies on one another, and data is available for each application (e.g., after DICOM reader 4106 extracts data), applications may be executed at a same time, substantially at a same time, or with some overlap. In at least one embodiment, where two or more applications require similar services 3920, a scheduler of system 4000 may be used to load balance and distribute compute or processing resources between and among various applications. In at least one embodiment, in some embodiments, parallel computing platform 4030 may be used to perform parallel processing for applications to decrease run-time of deployment pipeline 4010A to provide real-time results”); computing the intensity features within the segmentation (See Kang: Figs. 3-4, and [0068], “The density distribution along the longitudinal axis of the cylinder (i.e., into and out of the page in FIG. 3B) is substantially uniform and does not vary substantially and may be modeled as a constant function of the cross-sectional distribution along the longitudinal axis, that is, as a constant function of the radial distance d from the center of the distribution. FIG. 4 illustrates schematically a cylindrical vessel segment intensity distribution model. In particular, the model of the cylindrical vessel segment has a maximum density at the center that decays exponentially to the boundary of the vessel as a function of the radial distance d, from the center. At each distance d, the density is uniform along the z-axis. For example, the density at d=0 is the density maximum along the length of the vessel. This density maximum shown by line 405 is referred to as a ridge, and corresponds to the centerline of a vessel”. Note that the cylindrical vessel segment intensity distribution is mapped to computing the intensity features within the segmentation); and determining the geometric features of the target organ based on the segmentation (See Birchfield: Fig. 1, and [0073], “A feature extraction 106 may implement various neural network processes to generate feature maps from an input image 104. 
A feature extraction 106 may perform any suitable feature extraction process, such as those including operations such as convolution, pooling, upsampling, downsampling, aggregation, concatenation, skip-connections, activation functions, projection, interpolation, normalization, and/or variations thereof, to generate a feature map. In an embodiment, a feature map, which can be referred to as an activation map, is a set of data that indicates output activations for a given filter. A feature map may indicate an output of one or more operations applied to an input. A feature map may indicate various features of an image. In some examples, channels of a feature map correspond to various features and/or aspects of the feature map. In at least one embodiment, each channel of a feature map represents an aspect of information, such as particular features (e.g., edges, corners), particular colors (e.g., red, blue, green), and/or variations thereof. A feature extraction 106 may perform any suitable processes to determine a feature map from an input image 104. A feature map may be denoted as Φ(I)∈custom-character.sup.H/4×W/4×64”. Note that the edges, corners, etc., are mapped to the geometric features). Allowable Subject Matter Claims 4, 11, and 18, are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The best arts searched do not teach the cited limitation of “The method of claim 1, wherein the step of obtaining the 2D feature-camera pose mapping model comprises: constructing a look-up table (LUT) based on the pairs, wherein the LUT represents relationships between 2D features extracted from 2D images and 3D camera poses.” Claims 7, 14, and 21, are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The best arts searched do not teach the cited limitation of “The method of claim 1, wherein the step of refining the 3D pose estimate by: generating a perturbed 3D camera pose based on the 3D pose estimate; creating a differential rendering of the 3D model based on the perturbed 3D camera pose; computing a loss based on the discrepancy between the real time 2D image and the differential rendering; outputting the perturbed 3D camera pose as the estimated 3D camera pose of the laparoscopic camera, if the loss satisfies a convergence condition; and repeating the steps of generating, creating, computing, and outputting until the perturbed 3D camera pose yields a differential rendering that satisfies the convergence condition.” Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Devona E Faulk can be reached at 571-272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /GORDON G LIU/Primary Examiner, Art Unit 2618
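The §103 mapping above relies on Pheiffer's optimization framework, which selects virtual-camera pose parameters by maximizing a similarity metric (e.g., cross correlation, mutual information, normalized mutual information) between simulated projection images and the received intra-operative frames. For a concrete feel for that kind of score, below is a minimal normalized cross-correlation sketch; the function name and toy arrays are hypothetical and are not taken from the application or from any cited reference.

```python
# Minimal sketch (illustrative only): a normalized cross-correlation score between a
# simulated 2D projection and an observed intra-operative frame, the kind of similarity
# metric the cited optimization maximizes when selecting virtual-camera pose parameters.
import numpy as np

def normalized_cross_correlation(simulated, observed):
    """Return NCC in [-1, 1]; higher means the two images look more alike."""
    a = simulated - simulated.mean()
    b = observed - observed.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

# Toy usage: an image correlates perfectly with itself and near zero with unrelated noise.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
print(normalized_cross_correlation(frame, frame))                  # 1.0
print(normalized_cross_correlation(frame, rng.random((64, 64))))   # close to 0
```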
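Claims 4, 11, and 18 are indicated as allowable if rewritten in independent form because the searched art does not teach constructing a look-up table (LUT) representing relationships between 2D features extracted from 2D images and 3D camera poses. A minimal sketch of that idea follows, assuming the table is simply a set of stored (feature, pose) pairs queried by nearest-neighbor search; the helper names, array shapes, and 6-DoF pose encoding are illustrative assumptions, not the applicant's implementation.

```python
# Minimal sketch (assumptions noted above): a feature-to-pose look-up table built from
# (2D feature vector, 3D camera pose) pairs and queried by nearest-neighbor search.
import numpy as np

def build_pose_lut(feature_vectors, camera_poses):
    """Store the pairs as parallel arrays: (N, D) descriptors and (N, 6) poses."""
    return np.asarray(feature_vectors, dtype=float), np.asarray(camera_poses, dtype=float)

def lookup_pose(lut, query_features):
    """Return the stored pose whose features are closest (Euclidean) to the query."""
    features, poses = lut
    distances = np.linalg.norm(features - query_features, axis=1)
    return poses[np.argmin(distances)]

# Toy usage: 100 synthetic (feature, pose) pairs, then a single query.
rng = np.random.default_rng(0)
lut = build_pose_lut(rng.normal(size=(100, 32)), rng.normal(size=(100, 6)))
print("3D pose estimate:", np.round(lookup_pose(lut, rng.normal(size=32)), 3))
```

In the claimed method the stored pairs would come from the virtual 2D images rendered at the plurality of 3D virtual camera poses, so at query time the table effectively inverts the rendering step.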
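Claims 7, 14, and 21 are likewise indicated as allowable for the recited refinement loop: generate a perturbed 3D camera pose, create a differential rendering of the 3D model at that pose, compute a loss from the discrepancy with the real-time 2D image, and output the perturbed pose once the loss satisfies a convergence condition. The sketch below mirrors only that loop structure; it substitutes a toy synthetic renderer and random perturbations for a true differentiable renderer with gradient updates, and every function name and constant is a hypothetical placeholder rather than the claimed method.

```python
# Minimal sketch (placeholder renderer, random-search perturbations): the
# perturb -> render -> score -> accept-on-convergence loop recited in claims 7/14/21.
import numpy as np

def render_model(pose, shape=(64, 64)):
    """Stand-in renderer: a smooth synthetic image parameterized by a 6-DoF pose."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]] / float(shape[0])
    tx, ty, tz, rx, ry, rz = pose
    return np.sin(xs * (3.0 + rx) + tx) * np.cos(ys * (3.0 + ry) + ty) + 0.1 * tz + 0.05 * rz

def refine_pose(initial_pose, real_image, loss_tol=1e-4, max_iters=500, step=0.02):
    """Perturb the pose estimate, keep improvements, stop when the loss converges."""
    best_pose = np.asarray(initial_pose, dtype=float)
    best_loss = float(np.mean((render_model(best_pose) - real_image) ** 2))
    rng = np.random.default_rng(1)
    for _ in range(max_iters):
        perturbed = best_pose + rng.normal(scale=step, size=best_pose.shape)
        loss = float(np.mean((render_model(perturbed) - real_image) ** 2))  # discrepancy term
        if loss < best_loss:
            best_pose, best_loss = perturbed, loss
        if best_loss < loss_tol:  # convergence condition (else stop at the iteration budget)
            break
    return best_pose, best_loss

# Toy usage: recover a pose from an image rendered at a nearby ground-truth pose.
true_pose = np.array([0.30, -0.20, 0.10, 0.05, -0.04, 0.02])
observed = render_model(true_pose)            # stands in for the real-time 2D image
estimate, loss = refine_pose(true_pose + 0.15, observed)
print("refined pose:", np.round(estimate, 3), "final loss:", round(loss, 6))
```

A production implementation would instead differentiate the rendering with respect to the pose parameters and take gradient steps, which is what the "differential rendering" language in the claim points to.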

Prosecution Timeline

Jul 31, 2024: Application Filed
Mar 19, 2026: Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602846: GENERATING REALISTIC MACHINE LEARNING-BASED PRODUCT IMAGES FOR ONLINE CATALOGS. Granted Apr 14, 2026 (2y 5m to grant).
Patent 12602840: IMAGE PROCESSING SYSTEM, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM. Granted Apr 14, 2026 (2y 5m to grant).
Patent 12602871: MESH TOPOLOGY GENERATION USING PARALLEL PROCESSING. Granted Apr 14, 2026 (2y 5m to grant).
Patent 12592022: INTEGRATION CACHE FOR THREE-DIMENSIONAL (3D) RECONSTRUCTION. Granted Mar 31, 2026 (2y 5m to grant).
Patent 12586330: DISPLAYING A VIRTUAL OBJECT IN A REAL-LIFE SCENE. Granted Mar 24, 2026 (2y 5m to grant).
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview (+15.1%): 98%
Median Time to Grant: 2y 4m
PTA Risk: Low
Based on 673 resolved cases by this examiner. Grant probability derived from career allow rate.
