DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-4, 6-8, 11-18, and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-5 and 10 of U.S. Patent No. 12,169,518 B2 (hereinafter ‘518). Although the claims at issue are not identical, they are not patentably distinct from each other. See the table below for more detail.
Claim chart: each claim of the instant application (App. No. 18/938,587) is followed by the corresponding claim(s) of U.S. Patent No. 12,169,518 B2 (‘518).
Instant Claim 1: A method, performed by one or more computing devices, the method comprising:
obtaining an image representation comprising two or more segmentations of an image;
generating one or more latent space representations for each of a first segmentation and a second segmentation of the two or more segmentations;
generating first one or more feature vectors for the first segmentation and second one or more feature vectors the second segmentation based on the one or more latent space representations;
determining that the first segmentation is a dominant segmentation in the image representation and the second segmentation is a non-dominant segmentation in the image representation;
assigning a greater weight to the first one or more feature vectors and a lower weight to the second one or more feature vectors to generate a weighted set of feature vectors;
comparing the weighted set of feature vectors to one or more other feature vectors, the one or more other feature vectors generated from one or more latent space representations based on one or more other image representations; and
retrieving, based on the comparing of the weighted set of feature vectors to one or more other feature vectors, at least one image representation from the one or more other image representations that has a greater similarity to the dominant segmentation than the non-dominant segmentation.
Similarly, claims 8 and 15.
‘518 Claim 1: A method, performed by one or more computing devices, the method comprising:
obtaining an image representation comprising two or more segmentations of an image;
generating one or more latent space representations for each of a first segmentation and a second segmentation of the two or more segmentations;
generating first one or more feature vectors for the first segmentation and second one or more feature vectors the second segmentation based on the one or more latent space representations;
determining that the first segmentation is a dominant segmentation in the image representation and the second segmentation is a non-dominant segmentation in the image representation;
assigning a greater weight to the first one or more feature vectors and a lower weight to the second one or more feature vectors to generate a weighted set of feature vectors;
comparing the weighted set of feature vectors to one or more other feature vectors, the one or more other feature vectors generated from one or more latent space representations based on one or more other image representations; and
retrieving, based on the comparing the weighted set of feature vectors to one or more other feature vectors, at least one image representation from the one or more other image representations that has a greater similarity to the dominant segmentation than the non-dominant segmentation;
wherein the generating the one or more latent space representations for each of the first segmentation and the second segmentation comprises:
determining an anchor image representation from the respective one of the first segmentation or the second segmentation;
selecting a positive image representation;
selecting a negative image representation;
calculating a first vector representation between the anchor image representation and the positive image representation;
calculating a second vector representation between the anchor image representation and the negative image representation; and
generating the one or more latent space representations based on the anchor image representation, the positive image representation, and the negative image representation.
Similarly, ‘518 claim 5.
Instant Claim 2: The method of claim 1, further comprising analyzing a visual focus of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the visual focus of each of the first segmentation and the second segmentation within the image representation.
Similarly, claims 11 and 16.
‘518 Claim 3: The method of claim 1, further comprising analyzing a visual focus of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based on the analyzing of the visual focus of each of the first segmentation and the second segmentation within the image representation.
Instant Claim 3: The method of claim 1, further comprising identifying that the second segmentation is a text element within the image representation and identifying that the first segmentation is a non-text element within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the identifying that the second segmentation is the text element and the first segmentation is the non-text element.
Similarly, claims 12 and 17.
‘518 Claim 4: The method of claim 1, further comprising identifying that the second segmentation is a text element within the image representation and identifying that the first segmentation is a non-text element within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based on the identifying that the second segmentation is the text element and the first segmentation is the non-text element.
Instant Claim 4: The method of claim 1, further comprising analyzing a location of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the location of each of the first segmentation and the second segmentation within the image representation.
Similarly, claims 13 and 18.
‘518 Claim 2: The method of claim 1, further comprising analyzing a location of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based on the analyzing of the location of each of the first segmentation and the second segmentation within the image representation.
Instant Claim 5: The method of claim 1, further comprising:
analyzing a classification of each of the first segmentation and the second segmentation within the image representation; and
annotating the first segmentation with a first classification and the second segmentation with a second classification, wherein the determining that the first segmentation is the dominant segmentation is based at least on the first classification.
Similarly, claims 9 and 19.
Instant Claim 6: The method of claim 1, wherein the generating the one or more latent space representations for each of the first segmentation and the second segmentation comprises:
determining an anchor image representation from the respective one of the first segmentation or the second segmentation;
selecting a positive image representation;
selecting a negative image representation;
calculating a first vector representation between the anchor image representation and the positive image representation;
calculating a second vector representation between the anchor image representation and the negative image representation; and
generating the one or more latent space representations based on the anchor image representation, the positive image representation, and the negative image representation.
Similarly, claims 14 and 20.
‘518 Claim 1 (continued): … wherein the generating the one or more latent space representations for each of the first segmentation and the second segmentation comprises:
determining an anchor image representation from the respective one of the first segmentation or the second segmentation;
selecting a positive image representation;
selecting a negative image representation;
calculating a first vector representation between the anchor image representation and the positive image representation;
calculating a second vector representation between the anchor image representation and the negative image representation; and
generating the one or more latent space representations based on the anchor image representation, the positive image representation, and the negative image representation.
Instant Claim 7: The method of claim 1, wherein the at least one image representation from the one or more other image representations comprises a set of external search results that is ranked in order of similarity to the dominant segmentation.
‘518 Claim 10: The method of claim 5, wherein the set of external search results is a ranked set of search results ranked in a first order based on at least one of a text search or a metadata search, and wherein the modifying the order of the set of external search results comprises reranking the set of external search results to a second order, wherein the second order is different from the first order, wherein a first image representation in the set of search results ranked in the second order is more similar to the dominant segmentation than the non-dominant segmentation.
Instant Claim 10: The method of claim 8, further comprising, based at least on first segmentation being a dominant segmentation, extracting the first segmentation from the image representation.
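For context only, the anchor/positive/negative procedure recited in ‘518 claim 1 (and in instant claim 6) follows the general pattern of triplet-style representation learning. The sketch below is purely illustrative of that general technique and is not part of the claims or the record; every function and variable name is hypothetical:

```python
import numpy as np

def triplet_latent_vectors(anchor, positive, negative, margin=0.2):
    """Illustrative triplet-style computation for shaping a latent space.

    anchor/positive/negative: 1-D embedding vectors (hypothetical).
    Returns the anchor->positive and anchor->negative difference vectors
    and a margin-based triplet loss that pulls the anchor toward the
    positive and pushes it away from the negative.
    """
    v_pos = positive - anchor  # "first vector representation" (anchor to positive)
    v_neg = negative - anchor  # "second vector representation" (anchor to negative)
    # The latent space is shaped so the anchor sits closer to the positive
    # than to the negative by at least `margin`.
    loss = max(0.0, np.linalg.norm(v_pos) - np.linalg.norm(v_neg) + margin)
    return v_pos, v_neg, loss

anchor = np.array([1.0, 0.0])
positive = np.array([1.1, 0.1])    # similar to the anchor
negative = np.array([-1.0, 2.0])   # dissimilar to the anchor
v_pos, v_neg, loss = triplet_latent_vectors(anchor, positive, negative)
# Here the anchor is already much closer to the positive, so the loss is 0.
```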
Claims 5, 9, 10, and 19 are rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of ‘518 in view of JIN et al., US 2021/0319243 A1 (hereinafter “Jin”; as cited in the IDS filed 6 November 2024).
Claim 5: As noted in the table above, ‘518 does not explicitly teach the method of claim 1, further comprising: analyzing a classification of each of the first segmentation and the second segmentation within the image representation; and annotating the first segmentation with a first classification and the second segmentation with a second classification, wherein the determining that the first segmentation is the dominant segmentation is based at least on the first classification.
However, Jin teaches this (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the annotations of ‘518 with the region annotations in an image of Jin according to known methods (i.e., annotating regions of an image based on salient objects). The motivation for doing so is that this provides more appropriate and more accurate image feature encoding and greatly improves image retrieval performance (Jin, [0099]).
Claim 9: As noted in the table above, ‘518 does not explicitly teach the method of claim 8, further comprising annotating the first segmentation with a first classification and annotating the second segmentation with a second classification, wherein the generating one or more latent space representations for the first segmentation is further based on the first classification.
However, Jin teaches this (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the annotations of ‘518 with the region annotations in an image of Jin according to known methods (i.e., annotating regions of an image based on salient objects). The motivation for doing so is that this provides more appropriate and more accurate image feature encoding and greatly improves image retrieval performance (Jin, [0099]).
Claim 10: As noted in the table above, ‘518 does not explicitly teach the method of claim 8, further comprising, based at least on first segmentation being a dominant segmentation, extracting the first segmentation from the image representation.
However, Jin teaches this (Jin, [0043] note the target regions may be weighted according to the feature vectors of the target regions in the image, so that a non-salient region in the image may be weakened, and a salient region in the image may be highlighted, thereby effectively improving accuracy and appropriateness (or quality) of the generated feature vector of the image, and improving an image processing effect, for example, improving an image retrieval effect and accuracy in image recognition, [0061] note generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the extraction of ‘518 with the region feature vector generation based on weights of target regions of Jin according to known methods (i.e., generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions). The motivation for doing so is that a non-salient region in the image may be weakened and a salient region in the image may be highlighted, thereby effectively improving the accuracy and appropriateness (or quality) of the generated feature vector of the image (Jin, [0043]).
Claim 19: As noted in the table above, ‘518 does not explicitly teach the one or more computing devices of claim 15, wherein the operations further comprise: analyzing a classification of each of the first segmentation and the second segmentation within the image representation; and annotating the first segmentation with a first classification and the second segmentation with a second classification, wherein the determining that the first segmentation is the dominant segmentation is based at least on the first classification.
However, Jin teaches this (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the annotations of ‘518 with the region annotations in an image of Jin according to known methods (i.e., annotating regions of an image based on salient objects). The motivation for doing so is that this provides more appropriate and more accurate image feature encoding and greatly improves image retrieval performance (Jin, [0099]).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4, 5, 7-11, 13, 15, 16, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jin in view of Liu et al., US 2021/0049202 A1 (hereinafter “Liu”; as cited in the IDS filed 6 November 2024).
Claim 1: Jin teaches a method, performed by one or more computing devices, the method comprising:
obtaining an image representation comprising two or more segmentations of an image (Jin, [0042] note a user may specify a to-be-processed image (or target image) by using a terminal device (the smart phone 101, the tablet computer 102, or the portable computer 103 shown in FIG. 1). For example, the user transmits a target image to the server 105 by using the terminal device, [0043] note a plurality of target regions… a non-salient region in the image may be weakened, and a salient region in the image may be highlighted);
generating one or more representations for each of a first segmentation and a second segmentation of the two or more segmentations (Jin, [0043] note after determining the target image, the server 105 may extract a feature map of the target image. For example, the server may extract a feature map of the target image by using any convolutional layer in a convolutional neural network (CNN) model. After the feature map of the target image is extracted, the feature map may be divided into a plurality of target regions);
generating first one or more feature vectors for the first segmentation and second one or more feature vectors the second segmentation based on the one or more representations (Jin, [0061] note generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions);
determining that the first segmentation is a dominant segmentation in the image representation and the second segmentation is a non-dominant segmentation in the image representation (Jin, [0043] note a non-salient region in the image… a salient region in the image; i.e. salient reads on dominant and non-salient reads on non-dominant);
assigning a greater weight to the first one or more feature vectors and a lower weight to the second one or more feature vectors to generate a weighted set of feature vectors (Jin, [0043] note the target regions may be weighted according to the feature vectors of the target regions in the image, so that a non-salient region in the image may be weakened, and a salient region in the image may be highlighted, thereby effectively improving accuracy and appropriateness (or quality) of the generated feature vector of the image, and improving an image processing effect, for example, improving an image retrieval effect and accuracy in image recognition);
comparing the weighted set of feature vectors to one or more other feature vectors, the one or more other feature vectors generated from one or more representations based on one or more other image representations (Jin, [0066] note after the feature vector of the target image is obtained, an image matching the target image may be retrieved according to the feature vector of the target image, [0097] note image retrieval model may further include a similarity determining module, configured to determine a similarity between different images based on feature vectors of the images, to determine similar images based on the similarity); and
retrieving, based on the comparing of the weighted set of feature vectors to one or more other feature vectors, at least one image representation from the one or more other image representations that has a greater similarity to the dominant segmentation than the non-dominant segmentation (Jin, [Fig. 10], [0102] note after feature vectors of images (or to-be-retrieved images) are extracted according to the technical solution of an example embodiment of the disclosure, retrieval may be performed according to the extracted feature vectors, and then retrieved images are sequentially returned in descending order based on similarity).
Jin does not explicitly teach a latent space.
However, Liu teaches this (Liu, [0020] note images are initially mapped to base descriptors that characterize the images as vectors in a latent space. The first layer of the graph neural network may be configured to receive base descriptors as input. The base descriptors for images in the repository and the query image may be generated by applying a machine-learned model, such as an artificial neural network model (ANN), a convolutional neural network model (CNN), or other models that are configured to map an image to a base descriptor in the latent space such that images with similar content are closer to each other in the latent space, [0051] note content retrieval system 130 identifies 510 images relevant to the query image by selecting a relevant subset of image nodes. The image descriptors for the relevant subset of image nodes have above a similarity threshold with the query descriptor. The content retrieval system 130 returns 512 the images represented by the relevant subset of image nodes as a query result to the client device).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the feature map of Jin with the latent space of Liu according to known methods (i.e., characterizing images in a latent space). The motivation for doing so is that this can effectively learn a new descriptor space that improves retrieval accuracy while maintaining computational efficiency (Liu, [0007]).
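As a purely illustrative aside (not part of the record), the scheme mapped above (per-region feature vectors weighted toward the salient, dominant region, then retrieval ranked by similarity in an embedding space) can be sketched as follows; all names are hypothetical and are not taken from Jin or Liu:

```python
import numpy as np

def weighted_image_vector(region_vecs, weights):
    """Combine per-region feature vectors into one image vector,
    up-weighting the salient (dominant) region, in the spirit of
    region-weighted feature encoding."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize weights
    return (w[:, None] * np.asarray(region_vecs)).sum(axis=0)

def retrieve(query_vec, gallery):
    """Rank (name, vector) gallery entries by cosine similarity to the query."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(gallery, key=lambda item: cos(query_vec, item[1]), reverse=True)

# Two regions: a dominant (salient) one and a non-dominant background one.
dominant = np.array([1.0, 0.0, 0.0])
background = np.array([0.0, 1.0, 0.0])
query = weighted_image_vector([dominant, background], weights=[0.9, 0.1])

gallery = [
    ("matches_dominant", np.array([1.0, 0.05, 0.0])),
    ("matches_background", np.array([0.05, 1.0, 0.0])),
]
ranked = retrieve(query, gallery)
# The image resembling the dominant region ranks first.
```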
Claim 2: Jin and Liu teach the method of claim 1, further comprising analyzing a visual focus of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the visual focus of each of the first segmentation and the second segmentation within the image representation (Jin, [0108] note the division unit 1104 is configured to: divide the feature map in a predetermined region division manner, to obtain the plurality of target regions; or perform an ROI pooling operation on the feature map, to map ROIs to the feature map to obtain the plurality of target regions, [0009] note a region of interest (ROI) pooling operation).
Claim 4: Jin and Liu teach the method of claim 1, further comprising analyzing a location of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the location of each of the first segmentation and the second segmentation within the image representation (Jin, [0089] note the image may be divided in the three manners shown in FIG. 7, part (1) to part (3), to obtain 14 regions R1 to R14. Then a max pooling operation is performed in each region according to a coordinate position of the each region, to determine a feature vector v of the each region).
Claim 5: Jin and Liu teach the method of claim 1, further comprising:
analyzing a classification of each of the first segmentation and the second segmentation within the image representation; and annotating the first segmentation with a first classification and the second segmentation with a second classification, wherein the determining that the first segmentation is the dominant segmentation is based at least on the first classification (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
Claim 7: Jin and Liu teach the method of claim 1, wherein the at least one image representation from the one or more other image representations comprises a set of external search results that is ranked in order of similarity to the dominant segmentation (Jin, [Fig. 10], [0102] note after feature vectors of images (or to-be-retrieved images) are extracted according to the technical solution of an example embodiment of the disclosure, retrieval may be performed according to the extracted feature vectors, and then retrieved images are sequentially returned in descending order based on similarity).
Claim 8: Jin teaches a method, performed by one or more computing devices, the method comprising:
obtaining an image representation comprising two or more segmentations of an image, the two or more segmentations comprising a first segmentation and a second segmentation (Jin, [0042] note a user may specify a to-be-processed image (or target image) by using a terminal device (the smart phone 101, the tablet computer 102, or the portable computer 103 shown in FIG. 1). For example, the user transmits a target image to the server 105 by using the terminal device, [0043] note a plurality of target regions… a non-salient region in the image may be weakened, and a salient region in the image may be highlighted);
determining that the first segmentation is a dominant segmentation in the image representation and the second segmentation is a non-dominant segmentation in the image representation (Jin, [0043] note a non-salient region in the image… a salient region in the image; i.e. salient reads on dominant and non-salient reads on non-dominant);
based at least on the first segmentation being the dominant segmentation, generating one or more representations for the first segmentation (Jin, [0043] note after determining the target image, the server 105 may extract a feature map of the target image. For example, the server may extract a feature map of the target image by using any convolutional layer in a convolutional neural network (CNN) model. After the feature map of the target image is extracted, the feature map may be divided into a plurality of target regions);
generating one or more feature vectors for the first segmentation based at least on the one or more representations (Jin, [0061] note generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions);
comparing the one or more feature vectors to one or more other feature vectors, the one or more other feature vectors generated from one or more representations based on one or more other image representations (Jin, [0066] note after the feature vector of the target image is obtained, an image matching the target image may be retrieved according to the feature vector of the target image, [0097] note image retrieval model may further include a similarity determining module, configured to determine a similarity between different images based on feature vectors of the images, to determine similar images based on the similarity); and
retrieving, based on the comparing of the one or more feature vectors to the one or more other feature vectors, at least one image representation from the one or more other image representations that has similarity to the dominant segmentation (Jin, [Fig. 10], [0102] note after feature vectors of images (or to-be-retrieved images) are extracted according to the technical solution of an example embodiment of the disclosure, retrieval may be performed according to the extracted feature vectors, and then retrieved images are sequentially returned in descending order based on similarity).
Jin does not explicitly teach latent space.
However, Liu teaches this (Liu, [0020] note images are initially mapped to base descriptors that characterize the images as vectors in a latent space. The first layer of the graph neural network may be configured to receive base descriptors as input. The base descriptors for images in the repository and the query image may be generated by applying a machine-learned model, such as an artificial neural network model (ANN), a convolutional neural network model (CNN), or other models that are configured to map an image to a base descriptor in the latent space such that images with similar content are closer to each other in the latent space, [0051] note content retrieval system 130 identifies 510 images relevant to the query image by selecting a relevant subset of image nodes. The image descriptors for the relevant subset of image nodes have above a similarity threshold with the query descriptor. The content retrieval system 130 returns 512 the images represented by the relevant subset of image nodes as a query result to the client device).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the feature map of Jin with the latent space of Liu according to known methods (i.e., characterizing image features in a latent space). The motivation for doing so is that the combination can effectively learn a new descriptor space that improves retrieval accuracy while maintaining computational efficiency (Liu, [0007]).
Claim 9: Jin and Liu teach the method of claim 8, further comprising annotating the first segmentation with a first classification and annotating the second segmentation with a second classification, wherein the generating one or more latent space representations for the first segmentation is further based on the first classification (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
Claim 10: Jin and Liu teach the method of claim 8, further comprising, based at least on the first segmentation being a dominant segmentation, extracting the first segmentation from the image representation (Jin, [0043] note the target regions may be weighted according to the feature vectors of the target regions in the image, so that a non-salient region in the image may be weakened, and a salient region in the image may be highlighted, thereby effectively improving accuracy and appropriateness (or quality) of the generated feature vector of the image, and improving an image processing effect, for example, improving an image retrieval effect and accuracy in image recognition, [0061] note generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions).
Claim 11: Jin and Liu teach the method of claim 8, further comprising analyzing a visual focus of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the visual focus of each of the first segmentation and the second segmentation within the image representation (Jin, [0108] note the division unit 1104 is configured to: divide the feature map in a predetermined region division manner, to obtain the plurality of target regions; or perform an ROI pooling operation on the feature map, to map ROIs to the feature map to obtain the plurality of target regions, [0009] note a region of interest (ROI) pooling operation).
Claim 13: Jin and Liu teach the method of claim 8, further comprising analyzing a location of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the location of each of the first segmentation and the second segmentation within the image representation (Jin, [0089] note the image may be divided in the three manners shown in FIG. 7, part (1) to part (3), to obtain 14 regions R1 to R14. Then a max pooling operation is performed in each region according to a coordinate position of the each region, to determine a feature vector v of the each region).
Claim 15: Jin teaches one or more computing devices comprising: one or more processors; and memory having a plurality of computer-executable instructions stored thereon; wherein the computer-executable instructions are configured to, when executed by the one or more processors, cause the one or more computing devices to perform a plurality of operations, the operations comprising:
obtaining an image representation comprising two or more segmentations of an image (Jin, [0042] note a user may specify a to-be-processed image (or target image) by using a terminal device (the smart phone 101, the tablet computer 102, or the portable computer 103 shown in FIG. 1). For example, the user transmits a target image to the server 105 by using the terminal device, [0043] note a plurality of target regions… a non-salient region in the image may be weakened, and a salient region in the image may be highlighted);
generating one or more representations for each of a first segmentation and a second segmentation of the two or more segmentations (Jin, [0043] note after determining the target image, the server 105 may extract a feature map of the target image. For example, the server may extract a feature map of the target image by using any convolutional layer in a convolutional neural network (CNN) model. After the feature map of the target image is extracted, the feature map may be divided into a plurality of target regions);
generating first one or more feature vectors for the first segmentation and second one or more feature vectors for the second segmentation based on the one or more representations (Jin, [0061] note generating a feature vector of a target image according to weights of target regions and feature vectors of the target regions);
determining that the first segmentation is a dominant segmentation in the image representation and the second segmentation is a non-dominant segmentation in the image representation (Jin, [0043] note a non-salient region in the image… a salient region in the image; i.e. salient reads on dominant and non-salient reads on non-dominant);
assigning a greater weight to the first one or more feature vectors and a lower weight to the second one or more feature vectors to generate a weighted set of feature vectors (Jin, [0043] note the target regions may be weighted according to the feature vectors of the target regions in the image, so that a non-salient region in the image may be weakened, and a salient region in the image may be highlighted, thereby effectively improving accuracy and appropriateness (or quality) of the generated feature vector of the image, and improving an image processing effect, for example, improving an image retrieval effect and accuracy in image recognition);
comparing the weighted set of feature vectors to one or more other feature vectors, the one or more other feature vectors generated from one or more representations based on one or more other image representations (Jin, [0066] note after the feature vector of the target image is obtained, an image matching the target image may be retrieved according to the feature vector of the target image, [0097] note image retrieval model may further include a similarity determining module, configured to determine a similarity between different images based on feature vectors of the images, to determine similar images based on the similarity); and
retrieving, based on the comparing of the weighted set of feature vectors to one or more other feature vectors, at least one image representation from the one or more other image representations that has a greater similarity to the dominant segmentation than the non-dominant segmentation (Jin, [Fig. 10], [0102] note after feature vectors of images (or to-be-retrieved images) are extracted according to the technical solution of an example embodiment of the disclosure, retrieval may be performed according to the extracted feature vectors, and then retrieved images are sequentially returned in descending order based on similarity).
Jin does not explicitly teach latent space.
However, Liu teaches this (Liu, [0020] note images are initially mapped to base descriptors that characterize the images as vectors in a latent space. The first layer of the graph neural network may be configured to receive base descriptors as input. The base descriptors for images in the repository and the query image may be generated by applying a machine-learned model, such as an artificial neural network model (ANN), a convolutional neural network model (CNN), or other models that are configured to map an image to a base descriptor in the latent space such that images with similar content are closer to each other in the latent space, [0051] note content retrieval system 130 identifies 510 images relevant to the query image by selecting a relevant subset of image nodes. The image descriptors for the relevant subset of image nodes have above a similarity threshold with the query descriptor. The content retrieval system 130 returns 512 the images represented by the relevant subset of image nodes as a query result to the client device).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the feature map of Jin with the latent space of Liu according to known methods (i.e., characterizing images in a latent space). The motivation for doing so is that the combination can effectively learn a new descriptor space that improves retrieval accuracy while maintaining computational efficiency (Liu, [0007]).
Claim 16: Jin and Liu teach the one or more computing devices of claim 15, wherein the operations further comprise analyzing a visual focus of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the visual focus of each of the first segmentation and the second segmentation within the image representation (Jin, [0108] note the division unit 1104 is configured to: divide the feature map in a predetermined region division manner, to obtain the plurality of target regions; or perform an ROI pooling operation on the feature map, to map ROIs to the feature map to obtain the plurality of target regions, [0009] note a region of interest (ROI) pooling operation).
Claim 18: Jin and Liu teach the one or more computing devices of claim 15, wherein the operations further comprise analyzing a location of each of the first segmentation and the second segmentation within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the analyzing of the location of each of the first segmentation and the second segmentation within the image representation (Jin, [0089] note the image may be divided in the three manners shown in FIG. 7, part (1) to part (3), to obtain 14 regions R1 to R14. Then a max pooling operation is performed in each region according to a coordinate position of the each region, to determine a feature vector v of the each region).
Claim 19: Jin and Liu teach the one or more computing devices of claim 15, wherein the operations further comprise: analyzing a classification of each of the first segmentation and the second segmentation within the image representation; and annotating the first segmentation with a first classification and the second segmentation with a second classification, wherein the determining that the first segmentation is the dominant segmentation is based at least on the first classification (Jin, [0099] note FIG. 9 shows a schematic diagram of weights of regions in an image according to an embodiment of the disclosure. For ease of description of the effects of the technical solution of an example embodiment of the disclosure, weights of the regions are annotated in the image, illustratively shown in FIG. 9. A box denoted as “GT” shown in FIG. 9 represents a region in which a salient object is located in each image. It can be seen from FIG. 9 that a weight of a region including a salient object is generally relatively large, and a weight of a region including no salient object is generally relatively small. In this way, a feature of a foreground region may be strengthened, and a feature of a background region may be weakened, thereby implementing more appropriate and more accurate image feature encoding, and greatly improving image retrieval performance).
Claims 3, 12 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jin and Liu in further view of Fan et al., US 2012/0263352 A1 (hereinafter “Fan”).
Claim 3: Jin and Liu do not explicitly teach the method of claim 1, further comprising identifying that the second segmentation is a text element within the image representation and identifying that the first segmentation is a non-text element within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the identifying that the second segmentation is the text element and the first segmentation is the non-text element.
However, Fan teaches this (Fan, [0034] note In addition to applying different filtering/processing techniques to each region, different weighting factors can be applied to the results as part of the overall matching process. These weights could be tuned according to a number of factors including: training data, expert prior knowledge, or adaptive online optimization. By associating different weights with the matching results from both the text and non-text regions, it is possible to further optimize overall system performance for both accuracy and yield).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the weighted regions of Jin and Liu with the different weights for text and non-text regions of Fan according to known methods (i.e., associating different weights based on text and non-text regions). The motivation for doing so is that it is possible to further optimize overall system performance for both accuracy and yield (Fan, [0034]).
Claim 12: Jin and Liu do not explicitly teach the method of claim 8, further comprising identifying that the second segmentation is a text element within the image representation and identifying that the first segmentation is a non-text element within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the identifying that the second segmentation is the text element and the first segmentation is the non-text element.
However, Fan teaches this (Fan, [0034] note In addition to applying different filtering/processing techniques to each region, different weighting factors can be applied to the results as part of the overall matching process. These weights could be tuned according to a number of factors including: training data, expert prior knowledge, or adaptive online optimization. By associating different weights with the matching results from both the text and non-text regions, it is possible to further optimize overall system performance for both accuracy and yield).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the weighted regions of Jin and Liu with the different weights for text and non-text regions of Fan according to known methods (i.e., associating different weights based on text and non-text regions). The motivation for doing so is that it is possible to further optimize overall system performance for both accuracy and yield (Fan, [0034]).
Claim 17: Jin and Liu do not explicitly teach the one or more computing devices of claim 15, wherein the operations further comprise identifying that the second segmentation is a text element within the image representation and identifying that the first segmentation is a non-text element within the image representation, wherein the determining that the first segmentation is the dominant segmentation is based at least on the identifying that the second segmentation is the text element and the first segmentation is the non-text element.
However, Fan teaches this (Fan, [0034] note In addition to applying different filtering/processing techniques to each region, different weighting factors can be applied to the results as part of the overall matching process. These weights could be tuned according to a number of factors including: training data, expert prior knowledge, or adaptive online optimization. By associating different weights with the matching results from both the text and non-text regions, it is possible to further optimize overall system performance for both accuracy and yield).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the weighted regions of Jin and Liu with the different weights for text and non-text regions of Fan according to known methods (i.e., associating different weights based on text and non-text regions). The motivation for doing so is that it is possible to further optimize overall system performance for both accuracy and yield (Fan, [0034]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Lee et al., US 2021/0390700 A1 – Image segmentation based on identifying an image mask corresponding to one or more objects among a plurality of objects in an image based on a natural language expression.
Medoff et al., US 2019/0180446 A1 – Causing a minimum bounding region to be submitted to an image search engine to identify similar objects in the corpus of the image search engine.
Wang et al., US 8798362 B2 – Image segmentation including dominant color extraction.
Keating et al., US 2006/0015492 A1 – Comparison between visual images done by comparing the feature vectors of the most prominent regions (determined in any of a variety of ways, e.g., by size or shape) in each visual image.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Giuseppi Giuliani whose telephone number is (571)270-7128. The examiner can normally be reached Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley can be reached at (571)272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GIUSEPPI GIULIANI/Primary Examiner, Art Unit 2153