DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 8-12 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claim 8, the limitation “the latent space” lacks antecedent basis. Please amend the limitation to recite “the joint latent space” to maintain consistency with claim 1. Similar reasons apply to claim 20 (i.e., claim 13 recites “joint latent space” while claim 20 recites “the latent space”).
Regarding claim 9, the limitation “performing an image analysis task using the mask and the determined probability” renders the claim indefinite. Specifically, it is unclear what image analysis task this limitation refers to, such that one of ordinary skill in the art could not ascertain the metes and bounds of the claimed invention. Furthermore, the claim recites “segmentation mode,” which appears to be a typographical error for “segmentation model.” Please amend the claim for clarification.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3, 5-8, 13-15, and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang et al. (“Learning two-branch neural networks for image-text matching tasks”).
Regarding claim 1, Wang discloses:
embedding training images, from a plurality of training datasets having differing label spaces, in a joint latent space to generate first features (see sections 3.2 and 4.1, and fig 1, embedding training images xi, from a training dataset in non-joint embedding spaces, into a joint embedding space to generate image features);
embedding textual labels of the training images in the joint latent space to generate second features (see sections 3.2 and 4.1, and fig 1, embedding texts Yi+ and Yi- of the training images xi into the joint embedding space to generate text features); and
training a segmentation model using the first features and the second features (see sections 3.2 and 4.1, and fig 1, training an embedding network using the image features and the text features).
Regarding claim 2, Wang further discloses:
wherein the plurality of training datasets include a panoptic segmentation dataset, which includes class labels for individual image pixels (see fig 1, pixels in the blue bounding box are labeled with class “negative”), and
an object detection dataset, which includes a class label for a bounding box (see fig 1, pixels in the purple bounding box are labeled with class “positive”).
Regarding claim 3, Wang further discloses: wherein training the segmentation model uses a loss function that weights contributions from the panoptic segmentation dataset and the object detection dataset differently (see section 3.2.2, weights are varied during training based on a loss function, so that contributions from the class “positive” are higher than contributions from the class “negative”).
Regarding claim 5, Wang further discloses: wherein the joint latent space represents a visual object and a textual description of the visual object as vectors that are similar to one another according to a distance metric (see section 3.1 and fig 1, the image and text features are vectors; and see section 3.1, similarity of the image and text features is indicated by a cosine distance between them in the joint embedding space).
Regarding claim 6, Wang further discloses: comparing the first features to the second features using a distance metric in the joint latent space (see section 3.2.2, determining similarity of the image and text features; and see section 3.1, the similarity is determined by a cosine distance between them).
Regarding claim 7, Wang further discloses: wherein the distance metric is a cosine distance (see rejection of claim 6, cosine distance).
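For illustration only (not part of the record of this application), the cosine distance referenced in the rejections of claims 6 and 7 can be sketched as follows; the function name and example vectors are hypothetical:

```python
import numpy as np

def cosine_distance(u, v):
    # Cosine distance between two embedding vectors: 1 - cos(angle between u and v).
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Identical directions give distance 0; orthogonal vectors give distance 1.
image_feature = np.array([1.0, 0.0])
text_feature = np.array([1.0, 0.0])
d_same = cosine_distance(image_feature, text_feature)
d_orth = cosine_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

A smaller cosine distance between an image feature and a text feature in the joint embedding space indicates a closer image-text match.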
Regarding claim 8, Wang further discloses:
wherein the segmentation model includes an image branch having an image embedding layer that embeds images into the latent space (see section 3.2 and fig 1, the image features are extracted by a first branch of an embedding network via pre-trained VGG networks) and
a text branch having a text embedding layer that embeds text labels into the latent space (see section 3.2 and fig 1, the text features are extracted by a second branch of the embedding network via Fisher Vector encoding).
Regarding claims 13-15 and 17-20, Wang discloses everything claimed as applied above (see rejection of claims 1 and 5-8), and further discloses: a hardware processor; and a memory that stores a computer program which, when executed by the hardware processor, causes the hardware processor (see abstract, an inherent computer for computer vision).
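For illustration only (not part of the record of this application), the two-branch arrangement described in the rejection of claim 8 can be sketched as follows; all dimensions and projection matrices are hypothetical stand-ins for Wang's learned branches:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned projections into a shared 512-dimensional joint space.
W_image = rng.normal(size=(4096, 512))  # image branch: e.g. VGG features -> joint space
W_text = rng.normal(size=(300, 512))    # text branch: e.g. text encoding -> joint space

def embed(features, W):
    z = features @ W
    return z / np.linalg.norm(z)  # L2-normalize so similarity is a plain dot product

image_vec = embed(rng.normal(size=4096), W_image)
text_vec = embed(rng.normal(size=300), W_text)
similarity = float(image_vec @ text_vec)  # cosine similarity, in [-1, 1]
```

The point of the two branches is that images and texts, arriving from different feature spaces, are projected into a single joint space where one distance metric compares them.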
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 9-12 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Rong et al. (“Unambiguous text localization, retrieval, and recognition for cluttered scenes”).
Regarding claim 9, Wang discloses:
embedding an image using a segmentation model that includes an image branch having an image embedding layer that embeds images into a joint latent space (see sections 3.2 and 4.4 and fig 1, embedding a testing image using an embedding network that includes a first branch that embeds images into a joint embedding space to generate image features);
embedding a textual query term using the segmentation model, wherein the segmentation model further includes a text branch having a text embedding layer that embeds text into the joint latent space (see sections 3.2 and 4.4 and fig 1, embedding a text query using the embedding network that includes a second branch that embeds texts into the joint embedding space to generate text features);
generating a mask for an object within the image using the segmentation model (see section 4.4 and fig 2, generating a bounding box over an object within the testing image using the embedding network); and
determining a probability that the object matches the textual query term using the segmentation mode (see sections 3.2 and 4.4, determining a similarity score indicating that the object matches the text query).
However, Wang does not disclose: performing an image analysis task using the mask and the determined probability (i.e., Wang discloses determining, using the similarity score, that the object within the test image matches the text query, but does not disclose further performing an image analysis task using the object).
In a similar field of endeavor of performing phrase localization in an image based on a text query, Rong discloses: performing an image analysis task using the mask and the determined probability (see fig 2, a bounding box is generated around an object within an image that matches a text query; and section 3.4, text recognition is further performed on the object in the bounding box).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Wang with Rong, and perform phrase localization to generate a bounding box around an object within an image that matches a text query, as disclosed by Wang, and further perform text recognition on the object in the bounding box, as disclosed by Rong, for the purpose of providing literal-level awareness (see Rong section 3.4).
Regarding claim 10, Wang further discloses: wherein the joint latent space represents a visual object and a textual description of the visual object as vectors that are similar to one another according to a distance metric (see section 3.1 and fig 1, the image and text features are vectors; and see section 3.1, similarity of the image and text features is indicated by a cosine distance between them in the joint embedding space).
Regarding claim 11, Wang further discloses: wherein determining the probability includes comparing the first features to the second features using a distance metric in the joint latent space (see sections 3.2 and 4.4, the similarity score is determined by a cosine distance).
Regarding claim 12, Wang and Rong further disclose: wherein the distance metric is a cosine distance (see rejection of claim 11, cosine distance).
Allowable Subject Matter
Claims 4 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. None of the cited prior art references discloses the subject matter recited in claims 4 or 16.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Each of the following references discloses achieving effective phrase localization by embedding images and texts into a joint space where similarity is measured by distance: Lev-Tov et al. (USPN 10,459,995), Hussein et al. (USPN 10,496,885), Jin et al. (USPN 11,238,362), Chen et al. (“MSRC: Multimodal spatial regression with semantic context for phrase grounding”), and Plummer et al. (“Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models”).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SJ PARK whose telephone number is (571)270-3569. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ANDREW MOYER can be reached at 571-272-9523. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SJ Park/Primary Examiner, Art Unit 2675