Prosecution Insights
Last updated: April 19, 2026
Application No. 18/648,212

METHOD AND DEVICE WITH DETERMINING POSE OF TARGET OBJECT IN QUERY IMAGE

Non-Final OA (§102, §103)

Filed: Apr 26, 2024
Examiner: BEKELE, MEKONEN T
Art Unit: 2699
Tech Center: 2600 — Communications
Assignee: Samsung Electronics Co., Ltd.
OA Round: 1 (Non-Final)
Grant Probability: 79% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 79%, above average (599 granted / 757 resolved; +17.1% vs TC avg)
Interview Lift: +13.1% (moderate lift; resolved cases with interview)
Typical Timeline: 2y 11m avg prosecution (23 currently pending)
Career History: 780 total applications across all art units

Statute-Specific Performance

§101: 12.8% (-27.2% vs TC avg)
§103: 42.2% (+2.2% vs TC avg)
§102: 27.5% (-12.5% vs TC avg)
§112: 9.8% (-30.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 757 resolved cases.
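The headline figures above are simple derived statistics. A minimal sketch of how such a dashboard might compute them from the examiner's career totals (the additive interview adjustment is an assumption about this tool's methodology, not a documented formula):

```python
granted, resolved = 599, 757          # examiner's career totals (from above)
interview_lift = 0.131                # observed lift with interview

allow_rate = granted / resolved       # career allow rate
with_interview = min(allow_rate + interview_lift, 1.0)  # capped at 100%

print(f"{allow_rate:.1%}")      # -> 79.1%
print(f"{with_interview:.1%}")  # -> 92.2%
```

After rounding, these match the 79% and 92% figures shown on this page.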

Office Action

Rejections: §102, §103
Detailed Action

1. Claims 1-20 are pending in this application.

Notice of Pre-AIA or AIA Status

2. The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless - (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

3. Claims 1-5 and 11-14 are rejected under 35 U.S.C. 102(a)(1)/102(a)(2) as being anticipated by Hajime et al. (hereafter Hajime), "Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization," ICCV 2019, IEEE Xplore.

As to claim 1, Hajime teaches a method performed by an electronic device, comprising: obtaining a query image (Fig. 2: given a set of camera pose estimates for a query image); obtaining reference images corresponding to the query image (Section 3.1, candidate location retrieval: InLoc uses the NetVLAD [1] descriptor to identify the 100 database images most visually similar to the query), wherein the reference images are obtained based on having respective reference objects therein that have a same object type as an object type of an object in the query image (as discussed above, InLoc uses the NetVLAD [1] descriptor to identify the 100 database images most visually similar to the query image); determining a first semantic feature and first information corresponding to the query image, wherein the first information comprises first geometric information of the query image or first positional information of the query image; determining second semantic features and second pieces of information of the respectively corresponding reference images, wherein the second pieces of information each comprise second geometric information or second positional information of their respectively corresponding reference images, each reference image having a corresponding second semantic feature and second piece of information (Abstract; Section 1, left col., 2nd par.; Fig. 1; Section 3, Geometric-Semantic Pose Verification: the authors propose multiple approaches for pose verification based on the combination of appearance, scene geometry, and semantic information. They integrate their approach into the InLoc pipeline [72], a state-of-the-art visual localization approach for large-scale indoor scenes. The approach verifies the estimated pose by comparing the geometric-semantic pose of the query image and database images; specifically, it compares the semantics and surface normals extracted from the query (d, j) and database (f, l)); and determining a pose of the target object based on (i) the first semantic feature and the first information and (ii) the second semantic features and the second pieces of information (as discussed above, the approach verifies the estimated pose by comparing the semantics and surface normals extracted from the query (d, j) and database (f, l)).

As to claim 2, Hajime teaches that the determining of the pose of the target object comprises: generating a first association feature of the query image based on the first semantic feature and the first geometric information of the query image (Abstract; Section 1, left col., 2nd par.; Fig. 1: geometric and semantic information is extracted from the query image QD, where the geometric information of QD is the surface normal extracted from the query Q (d, j)); generating a second association feature of the query image based on the second semantic features and the second pieces of geometric information of the reference images (similarly, geometric and semantic information is extracted from the database image Q, where the geometric information of Q is the surface normal extracted from the query Q (f, l)); and determining the pose of the target object based on the first association feature and the second association feature (the estimated pose is calculated by comparing the semantics and surface normals extracted from the query (d, j) and database (f, l)).

As to claim 3, Hajime teaches that the obtaining of the reference images corresponding to the query image comprises: based on determining that the target object in the query image is an object registered in a database, obtaining the reference images from the database (Section 3.1, right col., 2nd par., candidate location retrieval: InLoc uses the NetVLAD [1] descriptor to identify the 100 database images most visually similar to the query).

As to claim 4, Hajime teaches that the determining of the pose of the target object based on the first association feature and the second association feature comprises: generating correlation matrixes of correlation between the query image and each of the respectively corresponding reference images based on the first association feature and the second association feature, wherein each correlation matrix represents a relative position of a first pixel block of the query image with respect to a positionally-corresponding second pixel block of its corresponding reference image (Section 3.1, right col., 2nd par.: InLoc's dense pose verification stage densely extracts RootSIFT [2, 43] descriptors from both the synthetic and the real query image. It then evaluates the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position. Let [Eq. 1, image omitted] be the local descriptor similarity function between RootSIFT descriptors extracted at pixel position (x, y) in Q and QD; this gives the similarity score between Q and QD); and determining the pose of the target object based on the correlation matrixes (as discussed above, the pose is determined by evaluating the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position, given by equation 1).

As to claim 5, Hajime teaches that the generating of one of the correlation matrixes comprises: inputting the first association feature and the second association feature corresponding to the one of the correlation matrixes into an attention network (Section 4, Trainable Pose Verification, network architecture for pose verification: the network design follows an approach similar to that of DensePV, where given the original Q and a synthetic query image QD, dense feature descriptors d(Q, x, y) and d(QD, x, y) are first extracted using a fully convolutional network. Then a descriptor similarity score map is computed by the cosine similarity [Eq. 7, image omitted]. Finally, the 2D descriptor similarity score map given by Eq. 7 is processed by a score-regression CNN that estimates the agreement between Q and QD, resulting in a scalar score).
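The DensePV scoring cited for claims 4-5 takes the median of inverse Euclidean distances between dense descriptors at matching pixel positions. A minimal sketch of that step; the exact inverse-distance normalization is not reproduced in the Office Action, so the `1 / (1 + d)` form here is an assumption:

```python
import numpy as np

def dense_pv(desc_q, desc_db, valid):
    """Sketch of InLoc-style dense pose verification (DensePV).

    desc_q, desc_db: H x W x D dense descriptor maps for the real query
    and the synthetic view rendered at the estimated pose.
    valid: H x W boolean mask; pixels into which no 3D point projects
    are excluded, as in Eq. 2 of the cited paper.
    """
    dist = np.linalg.norm(desc_q - desc_db, axis=-1)  # per-pixel L2 distance
    sim = 1.0 / (1.0 + dist)                          # inverse-distance similarity (assumed form)
    return float(np.median(sim[valid]))               # median: robust to outliers

# Identical descriptor maps give the maximum score of 1.0.
q = np.random.default_rng(0).normal(size=(4, 4, 8))
print(dense_pv(q, q, np.ones((4, 4), dtype=bool)))  # -> 1.0
```

The median (rather than the mean) is the detail the examiner leans on: a few badly matched pixels do not drag the score down.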
As to claim 11, Hajime teaches that the determining of the pose of the target object comprises: selecting a target reference image from among the reference images based on a semantic feature corresponding to the query image, semantic features corresponding to each of the respective reference images, and similarity information associated with positional information between the query image and each of the reference images; and determining the pose of the target object based on the query image and the target reference image (the limitations of this claim are discussed in claims 4 and 5 above; for example, as discussed in claim 5, the trainable pose verification network follows an approach similar to that of DensePV, where given the original Q and a synthetic query image QD, the network first extracts dense feature descriptors d(Q, x, y) and d(QD, x, y) using a fully convolutional network, then computes a descriptor similarity score map by cosine similarity [Eq. 7, image omitted], and finally processes the 2D similarity score map with a score-regression CNN that estimates the agreement between Q and QD, resulting in a scalar score).

As to claim 12, Hajime teaches that the determining of the target reference image from among the reference images comprises: for a first reference image of the reference images, determining a second pixel of the first reference image that is most similar to a first pixel of the query image from among pixels of the first reference image corresponding to a first position range with respect to the first pixel of the query image, based on the semantic feature of the query image and a semantic feature of the first reference image (Section 3.1, right col., 2nd par.: InLoc's dense pose verification stage densely extracts RootSIFT [2, 43] descriptors from both the synthetic and the real query image, then evaluates the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position; let [Eq. 1, image omitted] be the local descriptor similarity function between RootSIFT descriptors extracted at pixel position (x, y) in Q and QD, giving the similarity score between Q and QD); for the first reference image, determining a third pixel of the first reference image that is most similar to the second pixel of the first reference image from among pixels of the query image corresponding to a second position range with respect to the second pixel of the first reference image, based on the semantic feature of the query image and the semantic feature of the first reference image; and determining the target reference image from among the reference images based on the first pixel, the second pixel, and the third pixel (Section 3.1, right col., 2nd par. to page 4376, right col., 1st par.: the Euclidean distance between descriptors corresponding to the same pixel position, given by equation 1 [image omitted], produces a similarity score between Q and QD on a pixel-by-pixel basis. The similarity score between Q and QD is then given by [Eq. 2, image omitted]; the median is used instead of the mean as it is more robust to outliers. Invalid pixels, i.e., pixels into which no 3D point projects, are not considered in Eq. 2. InLoc finally selects the pose estimated using the database image D that maximizes DensePV(Q, QD)).
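The trainable verification cited for claims 5 and 11 first computes a per-pixel cosine-similarity score map between dense descriptor maps (the Eq. 7 step) before handing it to a score-regression CNN. A minimal sketch of just the score-map step (the CNN itself is omitted):

```python
import numpy as np

def cosine_score_map(d_q, d_qd, eps=1e-8):
    """Per-pixel cosine similarity between dense descriptor maps
    d(Q, x, y) and d(QD, x, y), each H x W x D. Returns an H x W
    score map in [-1, 1] that a score-regression CNN would then
    reduce to a scalar agreement score."""
    num = (d_q * d_qd).sum(axis=-1)
    den = np.linalg.norm(d_q, axis=-1) * np.linalg.norm(d_qd, axis=-1) + eps
    return num / den
```

Unlike the Euclidean form in DensePV, cosine similarity ignores descriptor magnitude, which is why the score map is bounded and convenient as CNN input.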
As to claim 14, Hajime teaches that the determining of the pose of the target object based on the query image and the target reference image comprises: generating a similarity matrix based on the first semantic feature of the query image and a second target semantic feature of the target reference image (Section 3.1, right col., 2nd par.: the Euclidean distance between descriptors corresponding to the same pixel position, given by equation 1 [image omitted], produces a similarity score between Q and QD on a pixel-by-pixel basis); optimizing the similarity matrix based on first saliency information of the query image, second target saliency information of the target reference image, first geometric consistency information of the query image, or second target geometric consistency information of the target reference image (as discussed above, the similarity score between Q and QD is then given by [Eq. 2, image omitted]; the median is used instead of the mean as it is more robust to outliers; invalid pixels, i.e., pixels into which no 3D point projects, are not considered in Eq. 2; and InLoc finally selects the pose estimated using the database image D that maximizes DensePV(Q, QD)); and determining the pose of the target object based on the optimized similarity matrix, a depth image corresponding to the query image, and a target depth image corresponding to the target reference image (Section 3.1, right col., 2nd par.: the dense 2D-2D matches between the query image and a retrieved database image define a set of 2D-3D matches when taking the depth map of the database image into account; the pose is then estimated using standard P3P-RANSAC [25]).
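For claim 14, the cited passage lifts dense 2D-2D matches to 2D-3D matches via the database image's depth map before P3P-RANSAC. A sketch of that lifting step under a standard pinhole model; the intrinsics `K` and the match format are assumptions, and the RANSAC solver itself is omitted:

```python
import numpy as np

def lift_matches(matches_2d2d, depth_db, K):
    """Back-project database pixels through the depth map to get
    2D-3D correspondences (query pixel, database 3D point), the
    input expected by a P3P-RANSAC pose solver.

    matches_2d2d: iterable of ((xq, yq), (xd, yd)) pixel pairs.
    depth_db: H x W depth map of the database image.
    K: 3x3 pinhole intrinsics of the database camera.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    out = []
    for (xq, yq), (xd, yd) in matches_2d2d:
        z = depth_db[yd, xd]
        if z <= 0:  # invalid pixel: no 3D point projects here
            continue
        X = (xd - cx) * z / fx  # standard pinhole back-projection
        Y = (yd - cy) * z / fy
        out.append(((xq, yq), (X, Y, float(z))))
    return out
```

Dropping zero-depth pixels mirrors the paper's treatment of invalid pixels: only matches with a real 3D point behind them constrain the pose.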
As to claim 13, Hajime teaches that the determining of the target reference image from among the reference images based on the first pixel, the second pixel, and the third pixel comprises: for each reference image, determining a preset number of second pixel pairs from among first pixel pairs for a corresponding reference image, in order of similarity, wherein each of the first pixel pairs comprises the first pixel and the third pixel corresponding to the first pixel, and each of the second pixel pairs comprises the first pixel and the second pixel corresponding to the first pixel; fusing similarities of the second pixel pairs; and determining the target reference image from among the reference images based on the fused similarity of the second pixel pairs for each reference image (Section 3.1, right col., 2nd par.: as discussed above, InLoc's dense pose verification stage densely extracts RootSIFT [2, 43] descriptors from both the synthetic and the real query image, then evaluates the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position, where the Euclidean distance is given by equation 1 [image omitted]. Equation 1 calculates the (dis)similarity between the two images on a pixel-by-pixel basis; specifically, it calculates the straight-line distance between descriptors at corresponding pixel positions of the two images, where a lower distance indicates higher similarity).

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4. Claims 15-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over Hajime, "Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization," in view of Simek et al. (hereafter Simek), US 2018/0139431 A1, published 05/17/2018.

Regarding claim 15, while Hajime teaches the limitations of claim 1, it fails to teach the limitations of claim 15. On the other hand, in the same field of endeavor (a method of measuring similarity of images based on Euclidean distance), Simek teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1 (see claim 29). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the technique of storing code that causes a computer to execute the steps, as taught by Simek, in order to store and execute the method of claim 1 of Hajime. The suggestion/motivation for doing so would have been to transfer the method of Hajime to remote locations over the internet or to store it on removable computer-readable media, thus maximizing the electronic transferability and portability of the method taught by Hajime. Therefore, it would have been obvious to combine Simek with Hajime to obtain the invention as specified in claim 1.

As to claim 16, Simek teaches an electronic device comprising: one or more processors; and a memory storing instructions configured to cause the one or more processors (see claim 29 and Fig. 12). Regarding the remaining limitations of claim 16, all the claim limitations are set forth and rejected as per the discussion for claim 1.

Regarding claim 17, all the claim limitations are set forth and rejected as per the discussion for claims 16 and 2.

As to claim 18, the combination of Hajime and Simek teaches the electronic device of claim 16, wherein the instructions are further configured to cause the one or more processors (Simek: claim 29 and Fig. 12) to: based on determining that the target object in the query image is not registered in a database, obtain, as the reference images, images of the target object having respective poses through an image acquisition device (Hajime, Section 3.3, Projective Semantic Consistency (PSC): semantic consistency is computed by counting the number of matching labels between the query and the synthetic image; in case of a mismatch, it would have been obvious to one of ordinary skill in the art at the time of filing to capture an image to overcome the inconsistency).

Regarding claim 19, all the claim limitations are set forth and rejected as per the discussion for claims 16 and 4.

Regarding claim 20, all the claim limitations are set forth and rejected as per the discussion for claims 16 and 1.

Allowable Subject Matter

5. Claims 6-10 are objected to as being dependent upon a rejected base claim but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

6. Regarding dependent claim 6, no prior art is found to anticipate or render the following limitation obvious: "generating a first self-correlation feature of the query image and a second self-correlation feature of each of the reference images by inputting the first association feature and the second association feature into the first self-attention units, respectively; generating a first cross-correlation feature of the query image and a second cross-correlation feature of each of the reference images by inputting the first self-correlation feature and the second self-correlation feature into the first cross-attention unit; generating a third self-correlation feature of the query image and a fourth self-correlation feature of each of the reference images by inputting the first cross-correlation feature and the second cross-correlation feature into the second self-attention units, respectively; and generating the correlation matrix between the query image and each of the reference images based on the third self-correlation feature and the fourth self-correlation feature."

7. Claims 7-10 are objected to since they depend on objected claim 6.

Prior Art of Record Not Applied in the Rejection

"Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization, Supplementary Material," ICCV 2019, IEEE Xplore, to Hajime et al., disclosed: this supplementary material provides additional details that could not be included in the paper due to space constraints. Sec. A describes the construction of the image-scan graph in more detail (cf. Sec. 3.2 in the paper). Sec. B shows that avoiding reduction of the field of view of a camera before extracting surface normals improves performance. Sec. C provides details on the construction of the "super classes" (cf. Sec. 3.3 in the paper) and justifies the design choice made in the paper. Sec. D details the construction of the training sets used by the trainable verification approach (cf. Sec. 4 in the paper). Finally, Sec. E shows qualitative results (cf. Fig. 4 in the paper).

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mekonen Bekele, whose telephone number is (469) 295-9077. The examiner can normally be reached Monday-Friday from 9:00 AM to 6:50 PM Eastern Time. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, George Eng, can be reached at (571) 272-7495. The fax number for the organization where the application or proceeding is assigned is 571-237-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/MEKONEN T BEKELE/
Primary Examiner, Art Unit 2699

Prosecution Timeline

Apr 26, 2024
Application Filed
Feb 15, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602744: IMAGE PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE AND MEDIUM (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602897: FACE DETECTION BASED FILTERING FOR IMAGE PROCESSING (granted Apr 14, 2026; 2y 5m to grant)
Patent 12586244: COMPOSITE IMAGE CAPTURE WITH TWO DEGREES OF FREEDOM CAMERA CAPTURING OVERLAPPING IMAGE FRAMES (granted Mar 24, 2026; 2y 5m to grant)
Patent 12561941: VIDEO SHOOTING METHOD AND ELECTRONIC DEVICE (granted Feb 24, 2026; 2y 5m to grant)
Patent 12561761: PROGRESSIVE REFINEMENT VIDEO ENHANCEMENT (granted Feb 24, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

1-2
Expected OA Rounds
79%
Grant Probability
92%
With Interview (+13.1%)
2y 11m
Median Time to Grant
Low
PTA Risk
Based on 757 resolved cases by this examiner. Grant probability derived from career allow rate.
