Detailed Action
1. Claims 1-20 are pending in this Application.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless -
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
3. Claims 1-5 and 11-14 are rejected under 35 U.S.C. 102(a)(1)/102(a)(2) as being anticipated by Hajime et al. (hereafter Hajime), “Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization,” ICCV 2019, IEEE Xplore.
As to claim 1, Hajime teaches A method performed by an electronic device, comprising:
obtaining a query image (Fig. 2: given a set of camera pose estimates for a query image);
obtaining reference images corresponding to the query image (Section 3.1, Candidate location retrieval: InLoc uses the NetVLAD [1] descriptor to identify the 100 database images most visually similar to the query), wherein the reference images are obtained based on having respective reference objects therein that have a same object type as an object type of an object in the query image (as discussed above, InLoc uses the NetVLAD [1] descriptor to identify the 100 database images most visually similar to the query image);
determining a first semantic feature and first information corresponding to the query image, wherein the first information comprises first geometric information of the query image or first positional information of the query image; determining second semantic features and second pieces of information of the respectively corresponding reference images, wherein the second pieces of information each comprise second geometric information or second positional information of their respectively corresponding reference images, each reference image having a corresponding second semantic feature and second piece of information (Abstract, Section 1, left col., 2nd par., Fig. 1, Section 3 Geometric-Semantic Pose Verification: the authors propose multiple approaches for pose verification based on the combination of appearance, scene geometry, and semantic information. They integrate their approach into the InLoc pipeline [72], a state-of-the-art visual localization approach for large-scale indoor scenes. The approach verifies the estimated pose by comparing the geometric-semantic pose of the query image and the database images. Specifically, the approach verifies the estimated pose by comparing the semantics and surface normals extracted from the query (d, j) and database (f, l)); and
determining a pose of the target object based on (i) the first semantic feature and the first information and (ii) the second semantic features and the second pieces of information (as discussed above, the approach verifies the estimated pose by comparing the semantics and surface normals extracted from the query (d, j) and database (f, l)).
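As context for the NetVLAD-based candidate retrieval step cited above (selecting the 100 database images most visually similar to the query), the retrieval can be sketched as a nearest-neighbor search over global image descriptors. This is a minimal illustration with random placeholder descriptors, not the implementation of record:

```python
import numpy as np

def retrieve_candidates(query_desc, db_descs, k=100):
    """Return indices of the k database images whose global image
    descriptors are closest (in Euclidean distance) to the query's."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    return np.argsort(dists)[: min(k, len(db_descs))]

# Toy stand-ins for NetVLAD descriptors (real ones are high-dimensional).
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 8))              # 5 database images
q = db[2] + 0.01 * rng.normal(size=8)     # query resembles database image 2
ranked = retrieve_candidates(q, db, k=3)  # ranked[0] should be 2
```

In InLoc, the retrieved candidates then proceed to pose estimation and the verification steps discussed in the rejections below.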
As to claim 2, Hajime teaches the determining of the pose of the target object comprises: generating a first association feature of the query image based on the first semantic feature and the first geometric information of the query image (Abstract, Section 1, left col., 2nd par., Fig. 1: geometric and semantic information are extracted from the query image Q, where the geometric information is the surface normals extracted from the query Q (d, j));
generating a second association feature of the query image based on the second semantic features and the second pieces of geometric information of the reference images (Abstract, Section 1, left col., 2nd par., Fig. 1: similarly, geometric and semantic information are extracted from the database image QD, where the geometric information is the surface normals extracted from the database image QD (f, l)); and
determining the pose of the target object based on the first association feature and the second association feature (Abstract, Section 1, left col., 2nd par., Fig. 1: the estimated pose is verified by comparing the semantics and surface normals extracted from the query (d, j) and database (f, l)).
As to claim 3, Hajime teaches the obtaining of the reference images corresponding to the query image comprises: based on determining that the target object in the query image is an object registered in a database, obtaining the reference images from the database (Section 3.1, right col., 2nd par., Candidate location retrieval: InLoc uses the NetVLAD [1] descriptor to identify the 100 database images most visually similar to the query).
As to claim 4, Hajime teaches the determining of the pose of the target object based on the first association feature and the second association feature comprises: generating correlation matrixes of correlation between the query image and each of the respectively corresponding reference images based on the first association feature and the second association feature, wherein each correlation matrix represents a relative position of a first pixel block of the query image with respect to a positionally-corresponding second pixel block of its corresponding reference image (Section 3.1, right col., 2nd par.: InLoc’s dense pose verification stage then densely extracts RootSIFT [2,43] descriptors from both the synthetic and the real query image. It then evaluates the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position. Let
fs(x, y) = 1 / (1 + ||d(Q, x, y) - d(QD, x, y)||)    (Equation 1)
be the local descriptor similarity function between RootSIFT descriptors extracted at pixel position (x, y) in Q and QD. The similarity score between Q and QD is then the median of these per-pixel similarities); and
determining the pose of the target object based on the correlation matrixes (as discussed above, the pose is verified by evaluating the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position, given by Equation 1).
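The DensePV scoring quoted in this rejection (per-pixel inverse Euclidean distance between dense descriptors, aggregated by the median) can be sketched as follows. The descriptor maps here are placeholder arrays; this illustrates the cited scoring rule, not the InLoc implementation itself:

```python
import numpy as np

def dense_pv(desc_q, desc_qd):
    """Median over pixels of the inverse Euclidean distance between
    descriptors at the same pixel position (cf. Equations 1 and 2).
    desc_q, desc_qd: (H, W, D) dense descriptor maps."""
    dist = np.linalg.norm(desc_q - desc_qd, axis=-1)  # per-pixel distance
    sim = 1.0 / (1.0 + dist)                          # inverse distance
    return float(np.median(sim))                      # median, robust to outliers

identical = np.zeros((4, 4, 3))
perfect = dense_pv(identical, identical)        # identical maps score 1.0
worse = dense_pv(identical, identical + 1.0)    # differing maps score less
```

The median (rather than the mean) matches the prior art's stated rationale of robustness to outlier pixels.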
As to claim 5, Hajime teaches the generating of one of the correlation matrixes comprises: inputting the first association feature and the second association feature corresponding to the one of the correlation matrixes into an attention network (Section 4, Trainable Pose Verification, Network architecture for pose verification: the network design follows an approach similar to that of DensePV, where given the original Q and a synthetic query image QD we first extract dense feature descriptors d(Q, x, y) and d(QD, x, y) using a fully convolutional network. Then, a descriptor similarity score map is computed by the cosine similarity
s(x, y) = (d(Q, x, y) · d(QD, x, y)) / (||d(Q, x, y)|| ||d(QD, x, y)||)    (Equation 7)
Finally, the 2D descriptor similarity score map given by Eq. 7 is processed by a score regression CNN that estimates the agreement between Q and QD, resulting in a scalar score.)
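The cosine-similarity score map of Eq. 7 quoted above can be sketched as follows. This is illustrative only; the fully convolutional descriptor extractor and the score-regression CNN that consumes the map are omitted:

```python
import numpy as np

def cosine_score_map(desc_q, desc_qd, eps=1e-8):
    """Per-pixel cosine similarity between two (H, W, D) dense
    descriptor maps, producing a 2D score map as in Eq. 7."""
    dot = np.sum(desc_q * desc_qd, axis=-1)
    norms = np.linalg.norm(desc_q, axis=-1) * np.linalg.norm(desc_qd, axis=-1)
    return dot / (norms + eps)  # eps guards against zero-norm descriptors

x = np.ones((2, 3, 4))
score_map = cosine_score_map(x, 2.0 * x)  # parallel descriptors score ~1.0
```

The resulting (H, W) score map is what the cited score-regression network would reduce to a scalar agreement score.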
As to claim 11, Hajime teaches the determining of the pose of the target object comprises: selecting a target reference image from among the reference images based on a semantic feature corresponding to the query image, semantic features corresponding to each of the respective reference images, and similarity information associated with positional information between the query image and each of the reference images; and determining the pose of the target object based on the query image and the target reference image (the limitations of this claim are discussed in claims 4 and 5 above. For example, as discussed in claim 5, the trainable pose verification network architecture follows an approach similar to that of DensePV: given the original Q and a synthetic query image QD, the network first extracts dense feature descriptors d(Q, x, y) and d(QD, x, y) using a fully convolutional network. Then, a descriptor similarity score map is computed by the cosine similarity
s(x, y) = (d(Q, x, y) · d(QD, x, y)) / (||d(Q, x, y)|| ||d(QD, x, y)||)    (Equation 7)
Finally, the 2D descriptor similarity score map given by Eq. 7 is processed by a score regression CNN that estimates the agreement between Q and QD, resulting in a scalar score.)
As to claim 12, Hajime teaches the determining of the target reference image from among the reference images comprises: for a first reference image of the reference images, determining a second pixel of the first reference image that is most similar to a first pixel of the query image from among pixels of the first reference image corresponding to a first position range with respect to the first pixel of the query image, based on the semantic feature of the query image and a semantic feature of the first reference image (Section 3.1, right col., 2nd par.: InLoc’s dense pose verification stage then densely extracts RootSIFT [2,43] descriptors from both the synthetic and the real query image. It then evaluates the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position. Let
fs(x, y) = 1 / (1 + ||d(Q, x, y) - d(QD, x, y)||)    (Equation 1)
be the local descriptor similarity function between RootSIFT descriptors extracted at pixel position (x, y) in Q and QD. The similarity score between Q and QD is then the median of these per-pixel similarities);
for the first reference image, determining a third pixel of the first reference image that is most similar to the second pixel of the first reference image from among pixels of the query image corresponding to a second position range with respect to the second pixel of the first reference image, based on the semantic feature of the query image and the semantic feature of the first reference image; and determining the target reference image from among the reference images based on the first pixel, the second pixel, and the third pixel (Section 3.1, right col., 2nd par. - page 4376, right col., 1st par.: the Euclidean distance between descriptors corresponding to the same pixel position is given by Equation 1
fs(x, y) = 1 / (1 + ||d(Q, x, y) - d(QD, x, y)||)    (Equation 1)
which yields a similarity score between Q and QD on a pixel-by-pixel basis. The similarity score between Q and QD is then given by
DensePV(Q, QD) = median over all valid pixels (x, y) of fs(x, y)    (Equation 2)
The median is used instead of the mean as it is more robust to outliers. Invalid pixels, i.e., pixels into which no 3D point projects, are not considered in Eq. 2. InLoc finally selects the pose estimated using the database image D that maximizes DensePV(Q, QD).)
As to claim 14, Hajime teaches the determining of the pose of the target object based on the query image and the target reference image comprises: generating a similarity matrix based on the first semantic feature of the query image and a second target semantic feature of the target reference image (Section 3.1, right col., 2nd par.: the Euclidean distance between descriptors corresponding to the same pixel position is given by Equation 1
fs(x, y) = 1 / (1 + ||d(Q, x, y) - d(QD, x, y)||)    (Equation 1)
which yields a similarity score between Q and QD on a pixel-by-pixel basis);
optimizing the similarity matrix based on first saliency information of the query image, second target saliency information of the target reference image, first geometric consistency information of the query image, or second target geometric consistency information of the target reference image (as discussed above, the similarity score between Q and QD is then given by
DensePV(Q, QD) = median over all valid pixels (x, y) of fs(x, y)    (Equation 2)
The median is used instead of the mean as it is more robust to outliers. Invalid pixels, i.e., pixels into which no 3D point projects, are not considered in Eq. 2. InLoc finally selects the pose estimated using the database image D that maximizes DensePV(Q, QD)); and
determining the pose of the target object based on the optimized similarity matrix, a depth image corresponding to the query image, and a target depth image corresponding to the target reference image (Section 3.1, right col., 2nd par.: the dense 2D-2D matches between the query image and a retrieved database image define a set of 2D-3D matches when taking the depth map of the database image into account. The pose is then estimated using standard P3P-RANSAC [25].)
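The depth-based lifting of 2D-2D matches into 2D-3D matches described in the citation above relies on standard pinhole back-projection, sketched below. The intrinsics and depth values are hypothetical placeholders, and the subsequent P3P-RANSAC pose estimation step is not shown:

```python
import numpy as np

def backproject(u, v, depth_map, fx, fy, cx, cy):
    """Lift a database-image pixel (u, v) with known depth into a 3D
    point in the camera frame using the pinhole camera model."""
    z = depth_map[v, u]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

depth = np.full((4, 4), 2.0)  # hypothetical flat scene 2 m from the camera
point = backproject(2, 1, depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
```

Applying this to every matched database-image pixel converts each 2D-2D correspondence into the 2D-3D correspondence consumed by the cited P3P-RANSAC solver.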
As to claim 13, Hajime teaches the determining of the target reference image from among the reference images based on the first pixel, the second pixel, and the third pixel comprises: for each reference image, determining a preset number of second pixel pairs from among first pixel pairs for a corresponding reference image, in order of similarity, wherein each of the first pixel pairs comprises the first pixel and the third pixel corresponding to the first pixel, and each of the second pixel pairs comprises the first pixel and the second pixel corresponding to the first pixel; fusing similarities of the second pixel pairs; and determining the target reference image from among the reference images, based on the fused similarity of the second pixel pairs for each reference image (Section 3.1, right col., 2nd par.: as discussed above, InLoc’s dense pose verification stage densely extracts RootSIFT [2,43] descriptors from both the synthetic and the real query image. It then evaluates the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position, where the Euclidean distance is given by
fs(x, y) = 1 / (1 + ||d(Q, x, y) - d(QD, x, y)||)    (Equation 1)
where Equation (1) computes the (dis)similarity between the two images on a pixel-by-pixel basis. Specifically, Equation (1) calculates the Euclidean (straight-line) distance between the descriptors extracted at corresponding pixel positions of the two images; a lower distance indicates higher similarity.).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
4. Claims 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hajime, “Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization,” in view of Simek et al. (hereafter Simek), US 2018/0139431 A1, published 05/17/2018.
Regarding claim 15, while Hajime teaches the limitations of claim 1, Hajime fails to teach the limitations of claim 15.
On the other hand, in the same field of endeavor (a method of measuring similarity of images based on Euclidean distance), Simek teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1 (see claim 29).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the technique, taught by Simek, of storing code that causes a computer to execute the method steps, in order to store and execute the method of claim 1 of Hajime.
The suggestion/motivation for doing so would have been to transfer the method of Hajime to remote locations over the internet or to store it on removable computer-readable media, thus maximizing the electronic transferability and portability of the method taught by Hajime. Therefore, it would have been obvious to combine Simek with Hajime to obtain the invention as specified in claim 15.
As to claim 16, Simek teaches an electronic device, comprising: one or more processors; and a memory storing instructions configured to cause the one or more processors (see claim 29 and Fig. 12).
Regarding the remaining limitations of claim 16, all the claim limitations are set forth and rejected as per the discussion for claim 1.
Regarding claim 17, all the claim limitations are set forth and rejected as per discussion for claims 16 and 2.
As to claim 18, the combination of Hajime and Simek teaches the electronic device of claim 16, wherein the instructions are further configured to cause the one or more processors (Simek: claim 29 and Fig. 12) to: based on determining that the target object in the query image is not registered in a database, obtain, as the reference images, images of the target object having respective poses through an image acquisition device (Hajime, Section 3.3, Projective Semantic Consistency (PSC): semantic consistency is then computed by counting the number of matching labels between the query and the synthetic image. In case of a mismatch, it would have been obvious to one of ordinary skill in the art at the time of filing to capture an image to overcome the inconsistency.).
Regarding claim 19, all the claim limitations are set forth and rejected as per discussion for claims 16 and 4.
Regarding claim 20, all the claim limitations are set forth and rejected as per discussion for claims 16 and 1.
Allowable Subject Matter
5. Claims 6-10 are objected to as being dependent upon a rejected base claim but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claim.
6. Regarding dependent claim 6, no prior art is found that anticipates or renders obvious the following limitation:
“generating a first self-correlation feature of the query image and a second self-correlation feature of each of the reference images by inputting the first association feature and the second association feature into the first self-attention units, respectively; generating a first cross-correlation feature of the query image and a second cross-correlation feature of each of the reference images by inputting the first self-correlation feature and the second self-correlation feature into the first cross-attention unit; generating a third self-correlation feature of the query image and a fourth self-correlation feature of each of the reference images by inputting the first cross-correlation feature and the second cross-correlation feature into the second self-attention units, respectively; and generating the correlation matrix between the query image and each of the reference images based on the third self-correlation feature and the fourth self-correlation feature.”
7. Claims 7-10 are objected to since they depend on objected claim 6.
Prior art of record but not applied in the rejection
“Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization, Supplementary Material,” ICCV 2019, IEEE Xplore, to Hajime et al., discloses:
This supplementary material provides additional details that could not be included in the paper due to space constraints: Sec. A describes the construction of the image-scan graph in more detail (cf. Sec. 3.2 in the paper). Sec. B shows that avoiding reduction of the field-of-view of a camera before extracting surface normals improves performance. Sec. C provides details on the construction of the “super classes” (cf. Sec. 3.3 in the paper) and justifies the design choice made in the paper. Sec. D details the construction of the training sets used by our trainable verification approach (cf. Sec. 4 in the paper). Finally, Sec. E shows qualitative results (cf. Fig. 4 in the paper).
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mekonen Bekele, whose telephone number is (469) 295-9077. The examiner can normally be reached Monday-Friday from 9:00 AM to 6:50 PM Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Eng, George, can be reached at (571) 272-7495. The fax phone number for the organization where the application or proceeding is assigned is 571-237-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR.
Status information for unpublished applications is available through Private PAIR only.
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/MEKONEN T BEKELE/Primary Examiner, Art Unit 2699