DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-4, 6-8, 11-18, and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-5 and 10 of U.S. Patent No. 12,169,518 B2 (hereinafter ‘518). Although the claims at issue are not identical, they are not patentably distinct from each other. See the table below for more detail.
Claim chart: each claim of the instant application (App. No. 18/938,587) is followed by the corresponding claim(s) of U.S. Patent No. 12,169,518 B2 (‘518).
Instant Claim 1: A method, performed by one or more computing devices, the method comprising:
obtaining an image representation comprising two or more segmentations of an image;
generating one or more latent space representations for each of a first segmentation and a second segmentation of the two or more segmentations;
generating first one or more feature vectors for the first segmentation and second one or more feature vectors the second segmentation based on the one or more latent space representations;
determining that the first segmentation is a dominant segmentation in the image representation and the second segmentation is a non-dominant segmentation in the image representation;
assigning a greater weight to the first one or more feature vectors and a lower weight to the second one or more feature vectors to generate a weighted set of feature vectors;
comparing the weighted set of feature vectors to one or more other feature vectors, the one or more other feature vectors generated from one or more latent space representations based on one or more other image representations; and
retrieving, based on the comparing of the weighted set of feature vectors to one or more other feature vectors, at least one image representation from the one or more other image representations that has a greater similarity to the dominant segmentation than the non-dominant segmentation.
Similarly, claims 8 and 15.
‘518 Claim 1: A method, performed by one or more computing devices, the method comprising:
obtaining an image representation comprising two or more segmentations of an image;
generating one or more latent space representations for each of a first segmentation and a second segmentation of the two or more segmentations;
generating first one or more feature vectors for the first segmentation and second one or more feature vectors the second segmentation based on the one or more latent space representations;
determining that the first segmentation is a dominant segmentation in the image representation and the second segmentation is a non-dominant segmentation in the image representation;
assigning a greater weight to the first one or more feature vectors and a lower weight to the second one or more feature vectors to generate a weighted set of feature vectors;
comparing the weighted set of feature vectors to one or more other feature vectors, the one or more other feature vectors generated from one or more latent space representations based on one or more other image representations; and
retrieving, based on the comparing the weighted set of feature vectors to one or more other feature vectors, at least one image representation from the one or more other image representations that has a greater similarity to the dominant segmentation than the non-dominant segmentation;
wherein the generating the one or more latent space representations for each of the first segmentation and the second segmentation comprises:
determining an anchor image representation from the respective one of the first segmentation or the second segmentation;
selecting a positive image representation;
selecting a negative image representation;
calculating a first vector representation between the anchor image representation and the positive image representation;
calculating a second vector representation between the anchor image representation and the negative image representation; and
generating the one or more latent space representations based on the anchor image representation, the positive image representation, and the negative image representation.
Similarly, ‘518 claim 5.
Instant Claim 2: The method of claim 1, further comprising analyzing a visual focus of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the visual focus of each of the first segmentation and the second segmentation within the image representation.
Similarly, claims 11 and 16.
‘518 Claim 3: The method of claim 1, further comprising analyzing a visual focus of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based on the analyzing of the visual focus of each of the first segmentation and the second segmentation within the image representation.
Instant Claim 3: The method of claim 1, further comprising identifying that the second segmentation is a text element within the image representation and identifying that the first segmentation is a non-text element within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the identifying that the second segmentation is the text element and the first segmentation is the non-text element.
Similarly, claims 12 and 17.
‘518 Claim 4: The method of claim 1, further comprising identifying that the second segmentation is a text element within the image representation and identifying that the first segmentation is a non-text element within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based on the identifying that the second segmentation is the text element and the first segmentation is the non-text element.
Instant Claim 4: The method of claim 1, further comprising analyzing a location of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the location of each of the first segmentation and the second segmentation within the image representation.
Similarly, claims 13 and 18.
‘518 Claim 2: The method of claim 1, further comprising analyzing a location of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based on the analyzing of the location of each of the first segmentation and the second segmentation within the image representation.
Instant Claim 5: The method of claim 1, further comprising:
analyzing a classification of each of the first segmentation and the second segmentation within the image representation; and
annotating the first segmentation with a first classification and the second segmentation with a second classification, wherein the determining that the first segmentation is the dominant segmentation is based at least on the first classification.
Similarly, claims 9 and 19.
Instant Claim 6: The method of claim 1, wherein the generating the one or more latent space representations for each of the first segmentation and the second segmentation comprises:
determining an anchor image representation from the respective one of the first segmentation or the second segmentation;
selecting a positive image representation;
selecting a negative image representation;
calculating a first vector representation between the anchor image representation and the positive image representation;
calculating a second vector representation between the anchor image representation and the negative image representation; and
generating the one or more latent space representations based on the anchor image representation, the positive image representation, and the negative image representation.
Similarly, claims 14 and 20.
‘518 Claim 1 (continued): … wherein the generating the one or more latent space representations for each of the first segmentation and the second segmentation comprises:
determining an anchor image representation from the respective one of the first segmentation or the second segmentation;
selecting a positive image representation;
selecting a negative image representation;
calculating a first vector representation between the anchor image representation and the positive image representation;
calculating a second vector representation between the anchor image representation and the negative image representation; and
generating the one or more latent space representations based on the anchor image representation, the positive image representation, and the negative image representation.
Instant Claim 7: The method of claim 1, wherein the at least one image representation from the one or more other image representations comprises a set of external search results that is ranked in order of similarity to the dominant segmentation.
‘518 Claim 10: The method of claim 5, wherein the set of external search results is a ranked set of search results ranked in a first order based on at least one of a text search or a metadata search, and wherein the modifying the order of the set of external search results comprises reranking the set of external search results to a second order, wherein the second order is different from the first order, wherein a first image representation in the set of search results ranked in the second order is more similar to the dominant segmentation than the non-dominant segmentation.
Instant Claim 10: The method of claim 8, further comprising, based at least on first segmentation being a dominant segmentation, extracting the first segmentation from the image representation.
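For context only, the anchor/positive/negative procedure recited in ‘518 claim 1 (and in instant claim 6) follows the general pattern of triplet-style representation learning. The sketch below is purely illustrative of that general technique and is not part of the claims or the record; every function and variable name is hypothetical:

```python
import numpy as np

def triplet_latent_vectors(anchor, positive, negative, margin=0.2):
    """Illustrative triplet-style computation for shaping a latent space.

    anchor/positive/negative: 1-D embedding vectors (hypothetical).
    Returns the anchor->positive and anchor->negative difference vectors
    and a margin-based triplet loss that pulls the anchor toward the
    positive and pushes it away from the negative.
    """
    v_pos = positive - anchor  # "first vector representation" (anchor to positive)
    v_neg = negative - anchor  # "second vector representation" (anchor to negative)
    # The latent space is shaped so the anchor sits closer to the positive
    # than to the negative by at least `margin`.
    loss = max(0.0, np.linalg.norm(v_pos) - np.linalg.norm(v_neg) + margin)
    return v_pos, v_neg, loss

anchor = np.array([1.0, 0.0])
positive = np.array([1.1, 0.1])    # similar to the anchor
negative = np.array([-1.0, 2.0])   # dissimilar to the anchor
v_pos, v_neg, loss = triplet_latent_vectors(anchor, positive, negative)
# Here the anchor is already much closer to the positive, so the loss is 0.
```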
Claims 5, 9, 10, and 19 are rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of ‘518 in view of JIN et al., US 2021/0319243 A1 (hereinafter “Jin”; as cited in the IDS filed 6 November 2024).
Claim 5: As noted in the table above, ‘518 does not explicitly teach the method of claim 1, further comprising: analyzing a classification of each of the first segmentation and the second segmentation within the image representation; and annotating the first segmentation with a first classification and the second segmentation with a second classification, wherein the determining that the first segmentation is the dominant segmentation is based at least on the first classification.
However, Jin teaches this (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the annotations of ‘518 with the region annotations in an image of Jin according to known methods (i.e., annotating regions of an image based on salient objects). The motivation for doing so is that this provides more appropriate and more accurate image feature encoding and greatly improves image retrieval performance (Jin, [0099]).
Claim 9: As noted in the table above, ‘518 does not explicitly teach the method of claim 8, further comprising annotating the first segmentation with a first classification and annotating the second segmentation with a second classification, wherein the generating one or more latent space representations for the first segmentation is further based on the first classification.
However, Jin teaches this (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the annotations of ‘518 with the region annotations in an image of Jin according to known methods (i.e., annotating regions of an image based on salient objects). The motivation for doing so is that this provides more appropriate and more accurate image feature encoding and greatly improves image retrieval performance (Jin, [0099]).
Claim 10: As noted in the table above, ‘518 does not explicitly teach the method of claim 8, further comprising, based at least on first segmentation being a dominant segmentation, extracting the first segmentation from the image representation.
However, Jin teaches this (Jin, [0043] note the target regions may be weighted according to the feature vectors of the target regions in the image, so that a non-salient region in the image may be weakened, and a salient region in the image may be highlighted, thereby effectively improving accuracy and appropriateness (or quality) of the generated feature vector of the image, and improving an image processing effect, for example, improving an image retrieval effect and accuracy in image recognition, [0061] note generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the extraction of ‘518 with the region feature vector generation based on weights of target regions of Jin according to known methods (i.e., generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions). The motivation for doing so is that a non-salient region in the image may be weakened and a salient region in the image may be highlighted, thereby effectively improving the accuracy and appropriateness (or quality) of the generated feature vector of the image (Jin, [0043]).
Claim 19: As noted in the table above, ‘518 does not explicitly teach the one or more computing devices of claim 15, wherein the operations further comprise: analyzing a classification of each of the first segmentation and the second segmentation within the image representation; and annotating the first segmentation with a first classification and the second segmentation with a second classification, wherein the determining that the first segmentation is the dominant segmentation is based at least on the first classification.
However, Jin teaches this (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the annotations of ‘518 with the region annotations in an image of Jin according to known methods (i.e., annotating regions of an image based on salient objects). The motivation for doing so is that this provides more appropriate and more accurate image feature encoding and greatly improves image retrieval performance (Jin, [0099]).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4, 5, 7-11, 13, 15, 16, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jin in view of Liu et al., US 2021/0049202 A1 (hereinafter “Liu”; as cited in the IDS filed 6 November 2024).
Claim 1: Jin teaches a method, performed by one or more computing devices, the method comprising:
obtaining an image representation comprising two or more segmentations of an image (Jin, [0042] note a user may specify a to-be-processed image (or target image) by using a terminal device (the smart phone 101, the tablet computer 102, or the portable computer 103 shown in FIG. 1). For example, the user transmits a target image to the server 105 by using the terminal device, [0043] note a plurality of target regions… a non-salient region in the image may be weakened, and a salient region in the image may be highlighted);
generating one or more representations for each of a first segmentation and a second segmentation of the two or more segmentations (Jin, [0043] note after determining the target image, the server 105 may extract a feature map of the target image. For example, the server may extract a feature map of the target image by using any convolutional layer in a convolutional neural network (CNN) model. After the feature map of the target image is extracted, the feature map may be divided into a plurality of target regions);
generating first one or more feature vectors for the first segmentation and second one or more feature vectors the second segmentation based on the one or more representations (Jin, [0061] note generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions);
determining that the first segmentation is a dominant segmentation in the image representation and the second segmentation is a non-dominant segmentation in the image representation (Jin, [0043] note a non-salient region in the image… a salient region in the image; i.e. salient reads on dominant and non-salient reads on non-dominant);
assigning a greater weight to the first one or more feature vectors and a lower weight to the second one or more feature vectors to generate a weighted set of feature vectors (Jin, [0043] note the target regions may be weighted according to the feature vectors of the target regions in the image, so that a non-salient region in the image may be weakened, and a salient region in the image may be highlighted, thereby effectively improving accuracy and appropriateness (or quality) of the generated feature vector of the image, and improving an image processing effect, for example, improving an image retrieval effect and accuracy in image recognition);
comparing the weighted set of feature vectors to one or more other feature vectors, the one or more other feature vectors generated from one or more representations based on one or more other image representations (Jin, [0066] note after the feature vector of the target image is obtained, an image matching the target image may be retrieved according to the feature vector of the target image, [0097] note image retrieval model may further include a similarity determining module, configured to determine a similarity between different images based on feature vectors of the images, to determine similar images based on the similarity); and
retrieving, based on the comparing of the weighted set of feature vectors to one or more other feature vectors, at least one image representation from the one or more other image representations that has a greater similarity to the dominant segmentation than the non-dominant segmentation (Jin, [Fig. 10], [0102] note after feature vectors of images (or to-be-retrieved images) are extracted according to the technical solution of an example embodiment of the disclosure, retrieval may be performed according to the extracted feature vectors, and then retrieved images are sequentially returned in descending order based on similarity).
Jin does not explicitly teach a latent space.
However, Liu teaches this (Liu, [0020] note images are initially mapped to base descriptors that characterize the images as vectors in a latent space. The first layer of the graph neural network may be configured to receive base descriptors as input. The base descriptors for images in the repository and the query image may be generated by applying a machine-learned model, such as an artificial neural network model (ANN), a convolutional neural network model (CNN), or other models that are configured to map an image to a base descriptor in the latent space such that images with similar content are closer to each other in the latent space, [0051] note content retrieval system 130 identifies 510 images relevant to the query image by selecting a relevant subset of image nodes. The image descriptors for the relevant subset of image nodes have above a similarity threshold with the query descriptor. The content retrieval system 130 returns 512 the images represented by the relevant subset of image nodes as a query result to the client device).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the feature map of Jin with the latent space of Liu according to known methods (i.e., characterizing images in a latent space). The motivation for doing so is that this can effectively learn a new descriptor space that improves retrieval accuracy while maintaining computational efficiency (Liu, [0007]).
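As a purely illustrative aside (not part of the record), the scheme mapped above (per-region feature vectors weighted toward the salient, dominant region, then retrieval ranked by similarity in an embedding space) can be sketched as follows; all names are hypothetical and are not taken from Jin or Liu:

```python
import numpy as np

def weighted_image_vector(region_vecs, weights):
    """Combine per-region feature vectors into one image vector,
    up-weighting the salient (dominant) region, in the spirit of
    region-weighted feature encoding."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize weights
    return (w[:, None] * np.asarray(region_vecs)).sum(axis=0)

def retrieve(query_vec, gallery):
    """Rank (name, vector) gallery entries by cosine similarity to the query."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(gallery, key=lambda item: cos(query_vec, item[1]), reverse=True)

# Two regions: a dominant (salient) one and a non-dominant background one.
dominant = np.array([1.0, 0.0, 0.0])
background = np.array([0.0, 1.0, 0.0])
query = weighted_image_vector([dominant, background], weights=[0.9, 0.1])

gallery = [
    ("matches_dominant", np.array([1.0, 0.05, 0.0])),
    ("matches_background", np.array([0.05, 1.0, 0.0])),
]
ranked = retrieve(query, gallery)
# The image resembling the dominant region ranks first.
```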
Claim 2: Jin and Liu teach the method of claim 1, further comprising analyzing a visual focus of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the visual focus of each of the first segmentation and the second segmentation within the image representation (Jin, [0108] note the division unit 1104 is configured to: divide the feature map in a predetermined region division manner, to obtain the plurality of target regions; or perform an ROI pooling operation on the feature map, to map ROIs to the feature map to obtain the plurality of target regions, [0009] note a region of interest (ROI) pooling operation).
Claim 4: Jin and Liu teach the method of claim 1, further comprising analyzing a location of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the location of each of the first segmentation and the second segmentation within the image representation (Jin, [0089] note the image may be divided in the three manners shown in FIG. 7, part (1) to part (3), to obtain 14 regions R1 to R14. Then a max pooling operation is performed in each region according to a coordinate position of the each region, to determine a feature vector v of the each region).
Claim 5: Jin and Liu teach the method of claim 1, further comprising:
analyzing a classification of each of the first segmentation and the second segmentation within the image representation; and annotating the first segmentation with a first classification and the second segmentation with a second classification, wherein the determining that the first segmentation is the dominant segmentation is based at least on the first classification (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
Claim 7: Jin and Liu teach the method of claim 1, wherein the at least one image representation from the one or more other image representations comprises a set of external search results that is ranked in order of similarity to the dominant segmentation (Jin, [Fig. 10], [0102] note after feature vectors of images (or to-be-retrieved images) are extracted according to the technical solution of an example embodiment of the disclosure, retrieval may be performed according to the extracted feature vectors, and then retrieved images are sequentially returned in descending order based on similarity).
Claim 8: Jin teaches a method, performed by one or more computing devices, the method comprising:
obtaining an image representation comprising two or more segmentations of an image, the two or more segmentations comprising a first segmentation and a second segmentation (Jin, [0042] note a user may specify a to-be-processed image (or target image) by using a terminal device (the smart phone 101, the tablet computer 102, or the portable computer 103 shown in FIG. 1). For example, the user transmits a target image to the server 105 by using the terminal device, [0043] note a plurality of target regions… a non-salient region in the image may be weakened, and a salient region in the image may be highlighted);
determining that the first segmentation is a dominant segmentation in the image representation and the second segmentation is a non-dominant segmentation in the image representation (Jin, [0043] note a non-salient region in the image… a salient region in the image; i.e. salient reads on dominant and non-salient reads on non-dominant);
based at least on the first segmentation being the dominant segmentation, generating one or more representations for the first segmentation (Jin, [0043] note after determining the target image, the server 105 may extract a feature map of the target image. For example, the server may extract a feature map of the target image by using any convolutional layer in a convolutional neural network (CNN) model. After the feature map of the target image is extracted, the feature map may be divided into a plurality of target regions);
generating one or more feature vectors for the first segmentation based at least on the one or more representations (Jin, [0061] note generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions);
comparing the one or more feature vectors to one or more other feature vectors, the one or more other feature vectors generated from one or more representations based on one or more other image representations (Jin, [0066] note after the feature vector of the target image is obtained, an image matching the target image may be retrieved according to the feature vector of the target image, [0097] note image retrieval model may further include a similarity determining module, configured to determine a similarity between different images based on feature vectors of the images, to determine similar images based on the similarity); and
retrieving, based on the comparing of the one or more feature vectors to the one or more other feature vectors, at least one image representation from the one or more other image representations that has similarity to the dominant segmentation (Jin, [Fig. 10], [0102] note after feature vectors of images (or to-be-retrieved images) are extracted according to the technical solution of an example embodiment of the disclosure, retrieval may be performed according to the extracted feature vectors, and then retrieved images are sequentially returned in descending order based on similarity).
Jin does not explicitly teach latent space.
However, Liu teaches this (Liu, [0020] note images are initially mapped to base descriptors that characterize the images as vectors in a latent space. The first layer of the graph neural network may be configured to receive base descriptors as input. The base descriptors for images in the repository and the query image may be generated by applying a machine-learned model, such as an artificial neural network model (ANN), a convolutional neural network model (CNN), or other models that are configured to map an image to a base descriptor in the latent space such that images with similar content are closer to each other in the latent space, [0051] note content retrieval system 130 identifies 510 images relevant to the query image by selecting a relevant subset of image nodes. The image descriptors for the relevant subset of image nodes have above a similarity threshold with the query descriptor. The content retrieval system 130 returns 512 the images represented by the relevant subset of image nodes as a query result to the client device).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the feature map of Jin with the latent space of Liu according to known methods (i.e., characterizing image features in a latent space). The motivation for doing so is that the combination can effectively learn a new descriptor space that improves retrieval accuracy while maintaining computational efficiency (Liu, [0007]).
Claim 9: Jin and Liu teach the method of claim 8, further comprising annotating the first segmentation with a first classification and annotating the second segmentation with a second classification, wherein the generating one or more latent space representations for the first segmentation is further based on the first classification (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
Claim 10: Jin and Liu teach the method of claim 8, further comprising, based at least on the first segmentation being a dominant segmentation, extracting the first segmentation from the image representation (Jin, [0043] note the target regions may be weighted according to the feature vectors of the target regions in the image, so that a non-salient region in the image may be weakened, and a salient region in the image may be highlighted, thereby effectively improving accuracy and appropriateness (or quality) of the generated feature vector of the image, and improving an image processing effect, for example, improving an image retrieval effect and accuracy in image recognition, [0061] note generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions).
Claim 11: Jin and Liu teach the method of claim 8, further comprising analyzing a visual focus of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the visual focus of each of the first segmentation and the second segmentation within the image representation (Jin, [0108] note the division unit 1104 is configured to: divide the feature map in a predetermined region division manner, to obtain the plurality of target regions; or perform an ROI pooling operation on the feature map, to map ROIs to the feature map to obtain the plurality of target regions, [0009] note a region of interest (ROI) pooling operation).
Claim 13: Jin and Liu teach the method of claim 8, further comprising analyzing a location of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the location of each of the first segmentation and the second segmentation within the image representation (Jin, [0089] note the image may be divided in the three manners shown in FIG. 7, part (1) to part (3), to obtain 14 regions R1 to R14. Then a max pooling operation is performed in each region according to a coordinate position of the each region, to determine a feature vector v of the each region).
Claim 15: Jin teaches one or more computing devices comprising: one or more processors; and memory having a plurality of computer-executable instructions stored thereon; wherein the computer-executable instructions are configured to, when executed by the one or more processors, cause the one or more computing devices to perform a plurality of operations, the operations comprising:
obtaining an image representation comprising two or more segmentations of an image (Jin, [0042] note a user may specify a to-be-processed image (or target image) by using a terminal device (the smart phone 101, the tablet computer 102, or the portable computer 103 shown in FIG. 1). For example, the user transmits a target image to the server 105 by using the terminal device, [0043] note a plurality of target regions… a non-salient region in the image may be weakened, and a salient region in the image may be highlighted);
generating one or more representations for each of a first segmentation and a second segmentation of the two or more segmentations (Jin, [0043] note after determining the target image, the server 105 may extract a feature map of the target image. For example, the server may extract a feature map of the target image by using any convolutional layer in a convolutional neural network (CNN) model. After the feature map of the target image is extracted, the feature map may be divided into a plurality of target regions);
generating first one or more feature vectors for the first segmentation and second one or more feature vectors for the second segmentation based on the one or more representations (Jin, [0061] note generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions);
determining that the first segmentation is a dominant segmentation in the image representation and the second segmentation is a non-dominant segmentation in the image representation (Jin, [0043] note a non-salient region in the image… a salient region in the image; i.e. salient reads on dominant and non-salient reads on non-dominant);
assigning a greater weight to the first one or more feature vectors and a lower weight to the second one or more feature vectors to generate a weighted set of feature vectors (Jin, [0043] note the target regions may be weighted according to the feature vectors of the target regions in the image, so that a non-salient region in the image may be weakened, and a salient region in the image may be highlighted, thereby effectively improving accuracy and appropriateness (or quality) of the generated feature vector of the image, and improving an image processing effect, for example, improving an image retrieval effect and accuracy in image recognition);
comparing the weighted set of feature vectors to one or more other feature vectors, the one or more other feature vectors generated from one or more representations based on one or more other image representations (Jin, [0066] note after the feature vector of the target image is obtained, an image matching the target image may be retrieved according to the feature vector of the target image, [0097] note image retrieval model may further include a similarity determining module, configured to determine a similarity between different images based on feature vectors of the images, to determine similar images based on the similarity); and
retrieving, based on the comparing of the weighted set of feature vectors to one or more other feature vectors, at least one image representation from the one or more other image representations that has a greater similarity to the dominant segmentation than the non-dominant segmentation (Jin, [Fig. 10], [0102] note after feature vectors of images (or to-be-retrieved images) are extracted according to the technical solution of an example embodiment of the disclosure, retrieval may be performed according to the extracted feature vectors, and then retrieved images are sequentially returned in descending order based on similarity).
Jin does not explicitly teach latent space.
However, Liu teaches this (Liu, [0020] note images are initially mapped to base descriptors that characterize the images as vectors in a latent space. The first layer of the graph neural network may be configured to receive base descriptors as input. The base descriptors for images in the repository and the query image may be generated by applying a machine-learned model, such as an artificial neural network model (ANN), a convolutional neural network model (CNN), or other models that are configured to map an image to a base descriptor in the latent space such that images with similar content are closer to each other in the latent space, [0051] note content retrieval system 130 identifies 510 images relevant to the query image by selecting a relevant subset of image nodes. The image descriptors for the relevant subset of image nodes have above a similarity threshold with the query descriptor. The content retrieval system 130 returns 512 the images represented by the relevant subset of image nodes as a query result to the client device).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the feature map of Jin with the latent space of Liu according to known methods (i.e., characterizing images in a latent space). The motivation for doing so is that the combination can effectively learn a new descriptor space that improves retrieval accuracy while maintaining computational efficiency (Liu, [0007]).
Claim 16: Jin and Liu teach the one or more computing devices of claim 15, wherein the operations further comprise analyzing a visual focus of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the visual focus of each of the first segmentation and the second segmentation within the image representation (Jin, [0108] note the division unit 1104 is configured to: divide the feature map in a predetermined region division manner, to obtain the plurality of target regions; or perform an ROI pooling operation on the feature map, to map ROIs to the feature map to obtain the plurality of target regions, [0009] note a region of interest (ROI) pooling operation).
Claim 18: Jin and Liu teach the one or more computing devices of claim 15, wherein the operations further comprise analyzing a location of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the location of each of the first segmentation and the second segmentation within the image representation (Jin, [0089] note the image may be divided in the three manners shown in FIG. 7, part (1) to part (3), to obtain 14 regions R1 to R14. Then a max pooling operation is performed in each region according to a coordinate position of the each region, to determine a feature vector v of the each region).
Claim 19: Jin and Liu teach the one or more computing devices of claim 15, wherein the operations further comprise: analyzing a classification of each of the first segmentation and the second segmentation within the image representation; and annotating the first segmentation with a first classification and the second segmentation with a second classification, wherein the determining that the first segmentation is the dominant segmentation is based at least on the first classification (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
Claims 3, 12 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jin and Liu in further view of Fan et al., US 2012/0263352 A1 (hereinafter “Fan”).
Claim 3: Jin and Liu do not explicitly teach the method of claim 1, further comprising identifying that the second segmentation is a text element within the image representation and identifying that the first segmentation is a non-text element within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the identifying that the second segmentation is the text element and the first segmentation is the non-text element.
However, Fan teaches this (Fan, [0034] note In addition to applying different filtering/processing techniques to each region, different weighting factors can be applied to the results as part of the overall matching process. These weights could be tuned according to a number of factors including: training data, expert prior knowledge, or adaptive online optimization. By associating different weights with the matching results from both the text and non-text regions, it is possible to further optimize overall system performance for both accuracy and yield).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the weighted regions of Jin and Liu with the different weights for text and non-text regions of Fan according to known methods (i.e., associating different weights based on text and non-text regions). The motivation for doing so is that it is possible to further optimize overall system performance for both accuracy and yield (Fan, [0034]).
Claim 12: Jin and Liu do not explicitly teach the method of claim 8, further comprising identifying that the second segmentation is a text element within the image representation and identifying that the first segmentation is a non-text element within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the identifying that the second segmentation is the text element and the first segmentation is the non-text element.
However, Fan teaches this (Fan, [0034] note In addition to applying different filtering/processing techniques to each region, different weighting factors can be applied to the results as part of the overall matching process. These weights could be tuned according to a number of factors including: training data, expert prior knowledge, or adaptive online optimization. By associating different weights with the matching results from both the text and non-text regions, it is possible to further optimize overall system performance for both accuracy and yield).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the weighted regions of Jin and Liu with the different weights for text and non-text regions of Fan according to known methods (i.e., associating different weights based on text and non-text regions). The motivation for doing so is that it is possible to further optimize overall system performance for both accuracy and yield (Fan, [0034]).
Claim 17: Jin and Liu do not explicitly teach the one or more computing devices of claim 15, wherein the operations further comprise identifying that the second segmentation is a text element within the image representation and identifying that the first segmentation is a non-text element within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the identifying that the second segmentation is the text element and the first segmentation is the non-text element.
However, Fan teaches this (Fan, [0034] note In addition to applying different filtering/processing techniques to each region, different weighting factors can be applied to the results as part of the overall matching process. These weights could be tuned according to a number of factors including: training data, expert prior knowledge, or adaptive online optimization. By associating different weights with the matching results from both the text and non-text regions, it is possible to further optimize overall system performance for both accuracy and yield).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the weighted regions of Jin and Liu with the different weights for text and non-text regions of Fan according to known methods (i.e., associating different weights based on text and non-text regions). The motivation for doing so is that it is possible to further optimize overall system performance for both accuracy and yield (Fan, [0034]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Lee et al., US 2021/0390700 A1 – Image segmentation based on identifying an image mask corresponding to one or more objects among a plurality of objects in an image based on a natural language expression.
Medoff et al., US 2019/0180446 A1 – Causing a minimum bounding region to be submitted to an image search engine to identify similar objects in the corpus of the image search engine.
Wang et al., US 8798362 B2 – Image segmentation including dominant color extraction.
Keating et al., US 2006/0015492 A1 – Comparison between visual images done by comparing the feature vectors of the most prominent regions (determined in any of a variety of ways, e.g., by size or shape) in each visual image.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Giuseppi Giuliani whose telephone number is (571)270-7128. The examiner can normally be reached Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley can be reached at (571)272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GIUSEPPI GIULIANI/Primary Examiner, Art Unit 2153