DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant’s Response to Restriction Requirement
In the Office Action of October 15, 2025, claims 1-20 were restricted to Group I, consisting of claims 1-15, and Group II, consisting of claims 16-20. In the response filed on December 12, 2025, Applicant elected, without traverse, claims 1-15 for prosecution on the merits.
Claim Interpretation
The claims in this application are given their broadest reasonable interpretation (BRI) using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The BRI of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification.
In the following, some of the terms in the claims have been given BRIs in light of the specification. These BRIs are used for purposes of searching for prior art and examining the claims, but are not incorporated into the claims. Should Applicant believe that different interpretations are appropriate, Applicant should point to the portions of the specification that clearly support a different interpretation.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
Claim 2 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 2 recites that the relative pose of claim 1 is a first relative pose and recites steps of generating second and third relative poses of the reference camera relative to the camera of the device and of the camera of the device relative to the reference camera, respectively, and determining an updated relative pose based on the first relative pose and the third relative pose. The present specification does not provide explicit or implicit support for this combination of limitations in such a way as to reasonably convey to one skilled in the relevant art that the inventors, at the time the application was filed, had possession of the claimed invention.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claim 2 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 2 recites, inter alia, applying the query image and the reference image to the relative pose regression network to generate a second relative pose indicating a pose of the reference camera relative to the camera of the device. However, claim 1 recites applying the query image and the reference image to the relative pose regression network to generate a relative pose indicating a pose of the camera of the device relative to the reference camera. It is unclear whether the query image and the reference image that are applied in order to generate the relative pose recited in claim 1 are the same as or different from the query image and the reference image that are applied in order to generate the second relative pose recited in claim 2. It is also unclear whether the applying step recited in claim 2 is the same as or different from the applying step recited in claim 1. Furthermore, it is unclear what is meant by the step recited in claim 2 of determining an updated relative pose based on the first relative pose and the third relative pose. For these reasons, claim 2 is indefinite.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-5 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over an article entitled “Learning to Localize in New Environments from Synthetic Training data”, by Winkelbauer et al., published May 31, 2021-June 4, 2021 in IEEE International Conference on Robotics and Automation (ICRA 2021) (hereinafter referred to as “Winkelbauer”) in view of an article entitled “Paying Attention to Activation Maps in Camera Pose Regression”, by Shavit et al., published April 11, 2021 in arXiv:2103.11477v2 [cs.CV] (hereinafter referred to as “Shavit”).
Regarding claim 1, Winkelbauer discloses a computer-implemented method for providing map-free relocalization of a device (Section III, Problem Statement: “[i]nstead of estimating the absolute pose directly, relative pose estimation methods like our approach estimate the relative pose T_{i→q} between the query image I_q and one or multiple reference images I_i and combine these with the known absolute poses of the reference images to get an estimate for the absolute pose of the query image”), the method comprising:
obtaining a reference image of an environment captured by a reference camera at a reference pose (Section V. Experiments; in the experiment documented in Winkelbauer, the reference images are of the environment, but they are not captured by a reference camera; rather, the reference images are synthetically-generated realistic images of the environment (e.g., rooms with furniture) that are obtained from the SUNCG dataset; these synthetic images are associated with respective ground truth absolute virtual camera poses);
receiving a query image taken by a camera of the device (Section IV. Pose Estimation; the query image whose pose is to be determined in Winkelbauer is acquired using some camera of some device);
applying the reference image and the query image to a relative pose regression network to output a relative pose of the camera of the device relative to the reference camera in the environment (Section IV. Pose Estimation Pipeline; the reference image and the query image are applied to the inputs of the relative pose estimation network shown in Fig. 2; the network outputs the relative pose estimation of the query image), the relative pose regression network comprising:
a Siamese network configured to receive the reference image to generate a first set of feature maps, and receive the query image to generate a second set of feature maps (Fig. 2 shows a Siamese network that receives the reference image and the query image and generates first and second feature maps based on the reference image and the query image, respectively; Section IV. B discusses the configuration of the Siamese network and the extraction of the respective sets of feature maps);
a correlation network configured to receive the first set of feature maps and the second set of feature maps as input to generate a set of global features (the BRI for this limitation, based on Fig. 4A and para. [0048] of the present disclosure, is a module that receives the sets of feature maps and generates a set of global features from the first and second feature map sets; the example given in the present disclosure for this limitation is taking the dot product of the first and second feature map sets; Section IV. B, equation 1 of Winkelbauer discloses that a correlation layer, represented by the block labeled “Ext. corr. Layer” in Fig. 2, generates an 8x8x64 set of 3D global features by taking the dot product of the first and second sets of feature maps);
a residual network configured to receive the set of global features as input to generate a global feature vector (the BRI for this limitation, based on para. [0050] and Fig. 4B, is a residual network module that receives the set of global features and processes it to generate a global features vector; Section IV. B, the guided correlation layer labeled “Guided corr. layer” in Fig. 2 of Winkelbauer is a residual network that performs feature matching guided by feature matching in previous layers of the network and generates a global feature vector, which is shown in Fig. 2 as a 32x32x144 feature vector); and
a multilayer perceptron network configured to receive the global feature vector as input to determine the relative pose of the camera of the device relative to the reference camera in the environment (Winkelbauer does not explicitly disclose that the global feature vector output from the guided correlation layer residual network is received and processed by a multilayer perceptron (MLP) network to determine the relative pose of the camera of the device relative to the reference camera in the environment; however, the network that follows the guided correlation layer and processes the global feature vector to determine the relative pose of the camera comprises a residual stack of layers, a global average pooling layer, a fully-connected layer, a dropout layer and additional fully-connected layers; the term “multilayer perceptron network” is not defined in the specification, and therefore its BRI is its plain meaning as would be understood by one of ordinary skill in the art, namely, a neural network having at least three layers: an input layer, a hidden layer and a fully-connected output layer; the network layers that follow the guided correlation layer shown in Fig. 2 of Winkelbauer comprise a multilayer perceptron because they comprise input residual layers, fully-connected output layers and hidden layers in between the residual layers and the fully-connected output layers); and
determining a pose of the camera of the device based on the relative pose of the camera of the device and the reference pose of the reference camera (Section IV. B describes the result of the operations performed by the relative pose estimation network shown in Fig. 2 as being a determination of the relative pose of the query image relative to the reference image; since the absolute pose of the virtual camera associated with the reference image is known, the determination of the pose of the query image relative to the reference image constitutes a determination of the pose of the camera that captured the query image relative to the pose of the virtual camera associated with the reference image).
As indicated above, in the experiment documented in Winkelbauer, the reference images are of the environment, but they are not captured by a reference camera. Rather, the reference images are synthetically-generated realistic images of the environment (e.g., rooms with furniture) that are obtained from the SUNCG dataset. These synthetic images are associated with respective ground truth absolute virtual camera poses.
Shavit, also in the field of relative pose regression, discloses using the Cambridge Landmarks dataset as the database from which the reference images are retrieved for processing with the query image to determine the relative pose of the query image relative to the pose of the reference image (Section 4 discusses the Cambridge Landmarks database; Section 4.2 discusses determining the relative pose of the camera that captured the query image relative to the pose of the camera that captured the reference image). As is known in the art, the images contained in the Cambridge Landmarks dataset are images of landmarks around the United Kingdom that are captured by a camera of a device such as a handheld smartphone.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the present disclosure, to modify the system and method of Winkelbauer to use reference images from a database such as the Cambridge Landmarks database that are captured by cameras of devices as taught by Shavit. One of ordinary skill in the art would have been motivated to make the modification to avoid domain gap, i.e., the difference in appearance between rendered and real-world images, and take advantage of the superior accuracy and generalization that real images provide in terms of lighting, texture and environmental details. The modification could have been made by one of ordinary skill in the art before the effective filing date of the present disclosure with a reasonable expectation of success because making the modification merely involves combining prior art elements according to known methods to yield predictable results (replacing the image database that is used for the reference images).
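For illustration only, the final determining step mapped above, recovering the absolute pose of the query camera by composing the known absolute reference pose with the predicted relative pose, can be sketched as follows. This is a generic sketch, not code from Winkelbauer or Shavit; the function name and the use of 4x4 homogeneous transforms are the examiner's illustrative assumptions.

```python
import numpy as np

def compose_absolute_pose(T_ref, T_rel):
    """Compose the known absolute pose of the reference camera (T_ref) with
    the predicted relative pose of the query camera relative to the reference
    camera (T_rel) to estimate the query camera's absolute pose.

    Poses are represented as 4x4 homogeneous transforms (rotation + translation).
    """
    return T_ref @ T_rel

# Example: reference camera at the world origin; the query camera is predicted
# to be translated one unit along x relative to the reference camera.
T_ref = np.eye(4)
T_rel = np.eye(4)
T_rel[0, 3] = 1.0
T_query = compose_absolute_pose(T_ref, T_rel)
```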
Regarding claim 3, Winkelbauer discloses that the Siamese network comprises a first deep residual UNET and a second deep residual UNET, each of which is configured to receive the reference image or the query image (Section IV. Pose Estimation Pipeline, subsection B, the feature extraction performed by the Siamese network is a ResNet50 configured in “UNet-like” fashion).
Regarding claim 4, Winkelbauer does not explicitly disclose that the relative pose regression network includes a second multilayer perceptron (MLP). Shavit discloses a relative pose regression network that includes first and second MLPs that receive the global feature vector where the second MLP is configured to generate an angular error indicating a confidence level of the determined relative pose (Section 3.2, Camera Pose Loss, and Fig. 1, the second MLP generates the angular error, which is referred to as the orientation loss, Lq, in Shavit, and performs regression to minimize the orientation loss).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the present disclosure, to modify the system of Winkelbauer to include the second MLP of Shavit for generating the angular error indicating a confidence level of the determined relative pose as taught by Shavit. One of ordinary skill in the art would have been motivated to make the modification to provide separate MLPs that are separately trained to regress the positional loss and the orientation loss to better balance the two losses as taught by Shavit. The modification could have been made by one of ordinary skill in the art before the effective filing date of the present disclosure with a reasonable expectation of success because making the modification merely involves combining prior art elements according to known methods to yield predictable results (using the second MLP of Shavit in parallel with the MLP of Winkelbauer).
Regarding claim 5, in Shavit, the angular error is determined with respect to a ground truth relative pose (Section 3.2 discloses determining the angular error, i.e., the orientation loss, with respect to a ground truth relative pose because the second MLP is disclosed as being optimized to minimize the orientation loss Lq by minimizing the deviation between the ground truth pose associated with the camera that captured the reference image and the predicted pose of the camera that captured the query image).
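For illustration only, an angular error with respect to a ground-truth pose, as discussed for claim 5, is commonly computed as the angle of the residual rotation between the predicted and ground-truth orientations. The following generic sketch is not the exact loss formulation of Shavit; the function name and rotation-matrix representation are illustrative assumptions.

```python
import numpy as np

def angular_error(R_pred, R_gt):
    """Angle (radians) of the residual rotation between a predicted rotation
    matrix and a ground-truth rotation matrix:
    theta = arccos((trace(R_gt^T @ R_pred) - 1) / 2)."""
    cos_theta = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
    # Clip guards against floating-point values slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# A 90-degree rotation about z versus the identity yields an error of pi/2.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
err = angular_error(Rz, np.eye(3))
```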
Regarding claim 7, the BRI for this limitation, based on para. [0048] and Fig. 4A of the present disclosure, is that the 4-dimensional correlation volume is computed by taking the dot product of the height (H) x width (W) feature maps of the first set of feature maps and the H x W feature maps of the second set of feature maps. As indicated above in the rejection of claim 1, Section IV. B, equation 1 of Winkelbauer discloses that a correlation layer, represented by the block labeled “Ext. corr. Layer” in Fig. 2, generates a set of global features by taking the dot product of the first and second sets of feature maps obtained by the Siamese network.
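For illustration only, the dot-product construction of a 4-dimensional correlation volume described in the BRI above can be sketched generically as follows. This is an illustration of the interpretation applied, not code from Winkelbauer; the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def correlation_volume(f1, f2):
    """4-dimensional correlation volume for two HxWxC feature maps:
    corr[i, j, k, l] is the dot product of the C-dimensional feature at
    f1[i, j] with the C-dimensional feature at f2[k, l], yielding an
    HxWxHxW volume."""
    return np.einsum('ijc,klc->ijkl', f1, f2)

# Tiny example: 2x2 feature maps with 3 channels each.
rng = np.random.default_rng(0)
f1 = rng.standard_normal((2, 2, 3))
f2 = rng.standard_normal((2, 2, 3))
corr = correlation_volume(f1, f2)  # shape (2, 2, 2, 2)
```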
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Winkelbauer in view of Shavit as applied to claims 1, 3-5 and 7 and further in view of an article entitled “Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses”, by Brachmann et al., published in arXiv:1905.04132v2 [cs.CV] on July 31, 2019 (hereinafter referred to as “Brachmann”).
The combined teachings of Winkelbauer and Shavit do not explicitly teach that the second MLP network is trained based on a soft clamping function describing a ground truth error and a network prediction. Brachmann, also in the field of relative pose regression, discloses training MLPs based on a soft clamping function describing a ground truth and a network prediction (Section 5. A. Initialization Procedure discloses using a parameter σ in a soft clamping function to control the softness of the target distribution during training of the MLPs).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the present disclosure, to further modify the system of Winkelbauer as modified by Shavit based on the teachings of Brachmann to train the second MLP based on the soft clamping function of Brachmann. One of ordinary skill in the art would have been motivated to make the modification to train the second MLP to reduce the influence of outliers and generalize across different scenes. The modification could have been made by one of ordinary skill in the art before the effective filing date of the present disclosure with a reasonable expectation of success because making the modification merely involves combining prior art elements according to known methods to yield predictable results (using a soft clamping function during training of the second MLP).
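For illustration only, the general notion of soft clamping a training error, passing small errors through nearly unchanged while smoothly saturating large ones to limit the influence of outliers, can be sketched as follows. This is a generic tanh-based sketch, not Brachmann's specific formulation; the function name and the choice of tanh are illustrative assumptions.

```python
import numpy as np

def soft_clamp(error, tau=1.0):
    """Generic soft clamp of a training error: approximately linear for
    errors much smaller than tau, and smoothly saturating toward tau for
    large errors, so outliers contribute a bounded loss."""
    return tau * np.tanh(error / tau)

small = soft_clamp(0.01)   # small errors pass through almost unchanged
large = soft_clamp(100.0)  # large errors saturate near tau
```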
Allowable Subject Matter
Claims 8-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Regarding claim 8, none of the art of record teaches or suggests, in combination with the other limitations, that the correlation network is further configured to use the 4-dimensional correlation volume to warp the second set of feature maps and a regular grid of coordinates.
Regarding claims 9-15, these claims recite allowable subject matter due to their direct or indirect dependencies from claim 8.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL J SANTOS whose telephone number is (571)272-2867. The examiner can normally be reached M-F 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matt Bella can be reached at (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DANIEL J. SANTOS/Examiner, Art Unit 2667
/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667