Detailed Action
1. Claims 1-20 are pending in this Application.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless -
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
3. Claims 1-5 and 11-14 are rejected under 35 U.S.C. 102(a)(1)/102(a)(2) as being anticipated by Hajime et al. (hereafter Hajime), “Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization,” ICCV 2019, IEEE Xplore.
As to claim 1, Hajime teaches A method performed by an electronic device, comprising:
obtaining a query image (Fig. 2: given a set of camera pose estimates for a query image);
obtaining reference images corresponding to the query image (Section 3.1, Candidate location retrieval: InLoc uses the NetVLAD [1] descriptor to identify the 100 database images most visually similar to the query), wherein the reference images are obtained based on having respective reference objects therein that have a same object type as an object type of an object in the query image (as discussed above, InLoc uses the NetVLAD [1] descriptor to identify the 100 database images most visually similar to the query image);
determining a first semantic feature and first information corresponding to the query image, wherein the first information comprises first geometric information of the query image or first positional information of the query image; determining second semantic features and second pieces of information of the respectively corresponding reference images, wherein the second pieces of information each comprise second geometric information or second positional information of their respectively corresponding reference images, each reference image having a corresponding second semantic feature and second piece of information (Abstract, Section 1, left col., 2nd par., Fig. 1, Section 3 Geometric-Semantic Pose Verification: the authors propose multiple approaches for pose verification based on the combination of appearance, scene geometry, and semantic information. They integrate their approach into the InLoc pipeline [72], a state-of-the-art visual localization approach for large-scale indoor scenes. The approach verifies the estimated pose by comparing the geometric-semantic pose of the query image and the database images. Specifically, the approach verifies the estimated pose by comparing the semantics and surface normals extracted from the query (d, j) and database (f, l)); and
determining a pose of the target object based on (i) the first semantic feature and the first information and (ii) the second semantic features and the second pieces of information (as discussed above, the approach verifies the estimated pose by comparing the semantics and surface normals extracted from the query (d, j) and database (f, l)).
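As context for the NetVLAD-based candidate retrieval step cited above (selecting the 100 database images most visually similar to the query), the retrieval can be sketched as a nearest-neighbor search over global image descriptors. This is a minimal illustration with random placeholder descriptors, not the implementation of record:

```python
import numpy as np

def retrieve_candidates(query_desc, db_descs, k=100):
    """Return indices of the k database images whose global image
    descriptors are closest (in Euclidean distance) to the query's."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    return np.argsort(dists)[: min(k, len(db_descs))]

# Toy stand-ins for NetVLAD descriptors (real ones are high-dimensional).
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 8))              # 5 database images
q = db[2] + 0.01 * rng.normal(size=8)     # query resembles database image 2
ranked = retrieve_candidates(q, db, k=3)  # ranked[0] should be 2
```

In InLoc, the retrieved candidates then proceed to pose estimation and the verification steps discussed in the rejections below.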
As to claim 2, Hajime teaches the determining of the pose of the target object comprises: generating a first association feature of the query image based on the first semantic feature and the first geometric information of the query image (Abstract, Section 1, left col., 2nd par., Fig. 1: geometric and semantic information are extracted from the query image Q, where the geometric information is the surface normals extracted from the query Q (d, j));
generating a second association feature of the query image based on the second semantic features and the second pieces of geometric information of the reference images (Abstract, Section 1, left col., 2nd par., Fig. 1: similarly, geometric and semantic information are extracted from the database image QD, where the geometric information is the surface normals extracted from the database image QD (f, l)); and
determining the pose of the target object based on the first association feature and the second association feature (Abstract, Section 1, left col., 2nd par., Fig. 1: the estimated pose is verified by comparing the semantics and surface normals extracted from the query (d, j) and database (f, l)).
As to claim 3, Hajime teaches the obtaining of the reference images corresponding to the query image comprises: based on determining that the target object in the query image is an object registered in a database, obtaining the reference images from the database (Section 3.1, right col., 2nd par., Candidate location retrieval: InLoc uses the NetVLAD [1] descriptor to identify the 100 database images most visually similar to the query).
As to claim 4, Hajime teaches the determining of the pose of the target object based on the first association feature and the second association feature comprises: generating correlation matrixes of correlation between the query image and each of the respectively corresponding reference images based on the first association feature and the second association feature, wherein each correlation matrix represents a relative position of a first pixel block of the query image with respect to a positionally-corresponding second pixel block of its corresponding reference image (Section 3.1, right col., 2nd par.: InLoc’s dense pose verification stage then densely extracts RootSIFT [2,43] descriptors from both the synthetic and the real query image. It then evaluates the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position. Let
fs(x, y) = 1 / (1 + ||d(Q, x, y) - d(QD, x, y)||)    (Equation 1)
be the local descriptor similarity function between RootSIFT descriptors extracted at pixel position (x, y) in Q and QD. The similarity score between Q and QD is then the median of these per-pixel similarities); and
determining the pose of the target object based on the correlation matrixes (as discussed above, the pose is verified by evaluating the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position, given by Equation 1).
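The DensePV scoring quoted in this rejection (per-pixel inverse Euclidean distance between dense descriptors, aggregated by the median) can be sketched as follows. The descriptor maps here are placeholder arrays; this illustrates the cited scoring rule, not the InLoc implementation itself:

```python
import numpy as np

def dense_pv(desc_q, desc_qd):
    """Median over pixels of the inverse Euclidean distance between
    descriptors at the same pixel position (cf. Equations 1 and 2).
    desc_q, desc_qd: (H, W, D) dense descriptor maps."""
    dist = np.linalg.norm(desc_q - desc_qd, axis=-1)  # per-pixel distance
    sim = 1.0 / (1.0 + dist)                          # inverse distance
    return float(np.median(sim))                      # median, robust to outliers

identical = np.zeros((4, 4, 3))
perfect = dense_pv(identical, identical)        # identical maps score 1.0
worse = dense_pv(identical, identical + 1.0)    # differing maps score less
```

The median (rather than the mean) matches the prior art's stated rationale of robustness to outlier pixels.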
As to claim 5, Hajime teaches the generating of one of the correlation matrixes comprises: inputting the first association feature and the second association feature corresponding to the one of the correlation matrixes into an attention network (Section 4, Trainable Pose Verification, Network architecture for pose verification: the network design follows an approach similar to that of DensePV, where given the original Q and a synthetic query image QD we first extract dense feature descriptors d(Q, x, y) and d(QD, x, y) using a fully convolutional network. Then, a descriptor similarity score map is computed by the cosine similarity
s(x, y) = (d(Q, x, y) · d(QD, x, y)) / (||d(Q, x, y)|| ||d(QD, x, y)||)    (Equation 7)
Finally, the 2D descriptor similarity score map given by Eq. 7 is processed by a score regression CNN that estimates the agreement between Q and QD, resulting in a scalar score.)
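The cosine-similarity score map of Eq. 7 quoted above can be sketched as follows. This is illustrative only; the fully convolutional descriptor extractor and the score-regression CNN that consumes the map are omitted:

```python
import numpy as np

def cosine_score_map(desc_q, desc_qd, eps=1e-8):
    """Per-pixel cosine similarity between two (H, W, D) dense
    descriptor maps, producing a 2D score map as in Eq. 7."""
    dot = np.sum(desc_q * desc_qd, axis=-1)
    norms = np.linalg.norm(desc_q, axis=-1) * np.linalg.norm(desc_qd, axis=-1)
    return dot / (norms + eps)  # eps guards against zero-norm descriptors

x = np.ones((2, 3, 4))
score_map = cosine_score_map(x, 2.0 * x)  # parallel descriptors score ~1.0
```

The resulting (H, W) score map is what the cited score-regression network would reduce to a scalar agreement score.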
As to claim 11, Hajime teaches the determining of the pose of the target object comprises: selecting a target reference image from among the reference images based on a semantic feature corresponding to the query image, semantic features corresponding to each of the respective reference images, and similarity information associated with positional information between the query image and each of the reference images; and determining the pose of the target object based on the query image and the target reference image (the limitations of this claim are discussed in claims 4 and 5 above. For example, as discussed in claim 5, the trainable pose verification network architecture follows an approach similar to that of DensePV: given the original Q and a synthetic query image QD, the network first extracts dense feature descriptors d(Q, x, y) and d(QD, x, y) using a fully convolutional network. Then, a descriptor similarity score map is computed by the cosine similarity
s(x, y) = (d(Q, x, y) · d(QD, x, y)) / (||d(Q, x, y)|| ||d(QD, x, y)||)    (Equation 7)
Finally, the 2D descriptor similarity score map given by Eq. 7 is processed by a score regression CNN that estimates the agreement between Q and QD, resulting in a scalar score.)
As to claim 12, Hajime teaches the determining of the target reference image from among the reference images comprises: for a first reference image of the reference images, determining a second pixel of the first reference image that is most similar to a first pixel of the query image from among pixels of the first reference image corresponding to a first position range with respect to the first pixel of the query image, based on the semantic feature of the query image and a semantic feature of the first reference image (Section 3.1, right col., 2nd par.: InLoc’s dense pose verification stage then densely extracts RootSIFT [2,43] descriptors from both the synthetic and the real query image. It then evaluates the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position. Let
fs(x, y) = 1 / (1 + ||d(Q, x, y) - d(QD, x, y)||)    (Equation 1)
be the local descriptor similarity function between RootSIFT descriptors extracted at pixel position (x, y) in Q and QD. The similarity score between Q and QD is then the median of these per-pixel similarities);
for the first reference image, determining a third pixel of the first reference image that is most similar to the second pixel of the first reference image from among pixels of the query image corresponding to a second position range with respect to the second pixel of the first reference image, based on the semantic feature of the query image and the semantic feature of the first reference image; and determining the target reference image from among the reference images based on the first pixel, the second pixel, and the third pixel (Section 3.1, right col., 2nd par. - page 4376, right col., 1st par.: the Euclidean distance between descriptors corresponding to the same pixel position is given by Equation 1
fs(x, y) = 1 / (1 + ||d(Q, x, y) - d(QD, x, y)||)    (Equation 1)
which yields a similarity score between Q and QD on a pixel-by-pixel basis. The similarity score between Q and QD is then given by
DensePV(Q, QD) = median over all valid pixels (x, y) of fs(x, y)    (Equation 2)
The median is used instead of the mean as it is more robust to outliers. Invalid pixels, i.e., pixels into which no 3D point projects, are not considered in Eq. 2. InLoc finally selects the pose estimated using the database image D that maximizes DensePV(Q, QD).)
As to claim 14, Hajime teaches the determining of the pose of the target object based on the query image and the target reference image comprises: generating a similarity matrix based on the first semantic feature of the query image and a second target semantic feature of the target reference image (Section 3.1, right col., 2nd par.: the Euclidean distance between descriptors corresponding to the same pixel position is given by Equation 1
fs(x, y) = 1 / (1 + ||d(Q, x, y) - d(QD, x, y)||)    (Equation 1)
which yields a similarity score between Q and QD on a pixel-by-pixel basis);
optimizing the similarity matrix based on first saliency information of the query image, second target saliency information of the target reference image, first geometric consistency information of the query image, or second target geometric consistency information of the target reference image (as discussed above, the similarity score between Q and QD is then given by
DensePV(Q, QD) = median over all valid pixels (x, y) of fs(x, y)    (Equation 2)
The median is used instead of the mean as it is more robust to outliers. Invalid pixels, i.e., pixels into which no 3D point projects, are not considered in Eq. 2. InLoc finally selects the pose estimated using the database image D that maximizes DensePV(Q, QD)); and
determining the pose of the target object based on the optimized similarity matrix, a depth image corresponding to the query image, and a target depth image corresponding to the target reference image (Section 3.1, right col., 2nd par.: the dense 2D-2D matches between the query image and a retrieved database image define a set of 2D-3D matches when taking the depth map of the database image into account. The pose is then estimated using standard P3P-RANSAC [25].)
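The depth-based lifting of 2D-2D matches into 2D-3D matches described in the citation above relies on standard pinhole back-projection, sketched below. The intrinsics and depth values are hypothetical placeholders, and the subsequent P3P-RANSAC pose estimation step is not shown:

```python
import numpy as np

def backproject(u, v, depth_map, fx, fy, cx, cy):
    """Lift a database-image pixel (u, v) with known depth into a 3D
    point in the camera frame using the pinhole camera model."""
    z = depth_map[v, u]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

depth = np.full((4, 4), 2.0)  # hypothetical flat scene 2 m from the camera
point = backproject(2, 1, depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
```

Applying this to every matched database-image pixel converts each 2D-2D correspondence into the 2D-3D correspondence consumed by the cited P3P-RANSAC solver.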
As to claim 13, Hajime teaches the determining of the target reference image from among the reference images based on the first pixel, the second pixel, and the third pixel comprises: for each reference image, determining a preset number of second pixel pairs from among first pixel pairs for a corresponding reference image, in order of similarity, wherein each of the first pixel pairs comprises the first pixel and the third pixel corresponding to the first pixel, and each of the second pixel pairs comprises the first pixel and the second pixel corresponding to the first pixel; fusing similarities of the second pixel pairs; and determining the target reference image from among the reference images, based on the fused similarity of the second pixel pairs for each reference image (Section 3.1, right col., 2nd par.: as discussed above, InLoc’s dense pose verification stage densely extracts RootSIFT [2,43] descriptors from both the synthetic and the real query image. It then evaluates the (dis)similarity between the two images as the median of the inverse Euclidean distance between descriptors corresponding to the same pixel position, where the Euclidean distance is given by
fs(x, y) = 1 / (1 + ||d(Q, x, y) - d(QD, x, y)||)    (Equation 1)
where Equation (1) computes the (dis)similarity between the two images on a pixel-by-pixel basis. Specifically, Equation (1) calculates the Euclidean (straight-line) distance between the descriptors extracted at corresponding pixel positions of the two images; a lower distance indicates higher similarity.).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
4. Claims 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hajime, “Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization,” in view of Simek et al. (hereafter Simek), US 2018/0139431 A1, published 05/17/2018.
Regarding claim 15, while Hajime teaches the limitations of claim 1, Hajime fails to teach the limitations of claim 15.
On the other hand, in the same field of endeavor (a method of measuring similarity of images based on Euclidean distance), Simek teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1 (see claim 29).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the technique, taught by Simek, of storing code that causes a computer to execute the method steps, in order to store and execute the method of claim 1 of Hajime.
The suggestion/motivation for doing so would have been to transfer the method of Hajime to remote locations over the internet or to store it on removable computer-readable media, thus maximizing the electronic transferability and portability of the method taught by Hajime. Therefore, it would have been obvious to combine Simek with Hajime to obtain the invention as specified in claim 15.
As to claim 16, Simek teaches an electronic device, comprising: one or more processors; and a memory storing instructions configured to cause the one or more processors (see claim 29 and Fig. 12).
Regarding the remaining limitations of claim 16, all the claim limitations are set forth and rejected as per the discussion for claim 1.
Regarding claim 17, all the claim limitations are set forth and rejected as per discussion for claims 16 and 2.
As to claim 18, the combination of Hajime and Simek teaches the electronic device of claim 16, wherein the instructions are further configured to cause the one or more processors (Simek: claim 29 and Fig. 12) to: based on determining that the target object in the query image is not registered in a database, obtain, as the reference images, images of the target object having respective poses through an image acquisition device (Hajime, Section 3.3, Projective Semantic Consistency (PSC): semantic consistency is then computed by counting the number of matching labels between the query and the synthetic image. In case of a mismatch, it would have been obvious to one of ordinary skill in the art at the time of filing to capture an image to overcome the inconsistency.).
Regarding claim 19, all the claim limitations are set forth and rejected as per discussion for claims 16 and 4.
Regarding claim 20, all the claim limitations are set forth and rejected as per discussion for claims 16 and 1.
Allowable Subject Matter
5. Claims 6-10 are objected to as being dependent upon a rejected base claim but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claim.
6. Regarding dependent claim 6, no prior art is found that anticipates or renders obvious the following limitation:
“generating a first self-correlation feature of the query image and a second self-correlation feature of each of the reference images by inputting the first association feature and the second association feature into the first self-attention units, respectively; generating a first cross-correlation feature of the query image and a second cross-correlation feature of each of the reference images by inputting the first self-correlation feature and the second self-correlation feature into the first cross-attention unit; generating a third self-correlation feature of the query image and a fourth self-correlation feature of each of the reference images by inputting the first cross-correlation feature and the second cross-correlation feature into the second self-attention units, respectively; and generating the correlation matrix between the query image and each of the reference images based on the third self-correlation feature and the fourth self-correlation feature.”
7. Claims 7-10 are objected to since they depend on objected claim 6.
Prior art of record but not applied in the rejection
“Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization, Supplementary Material,” ICCV 2019, IEEE Xplore, to Hajime et al., discloses:
This supplementary material provides additional details that could not be included in the paper due to space constraints: Sec. A describes the construction of the image-scan graph in more detail (cf. Sec. 3.2 in the paper). Sec. B shows that avoiding reduction of the field-of-view of a camera before extracting surface normals improves performance. Sec. C provides details on the construction of the “super classes” (cf. Sec. 3.3 in the paper) and justifies the design choice made in the paper. Sec. D details the construction of the training sets used by our trainable verification approach (cf. Sec. 4 in the paper). Finally, Sec. E shows qualitative results (cf. Fig. 4 in the paper).
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mekonen Bekele, whose telephone number is (469) 295-9077. The examiner can normally be reached Monday-Friday from 9:00 AM to 6:50 PM Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Eng, George, can be reached at (571) 272-7495. The fax phone number for the organization where the application or proceeding is assigned is 571-237-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR.
Status information for unpublished applications is available through Private PAIR only.
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/MEKONEN T BEKELE/Primary Examiner, Art Unit 2699