Prosecution Insights
Last updated: April 19, 2026
Application No. 18/178,821

PANOPTIC SEGMENTATION WITH MULTI-DATABASE TRAINING USING MIXED EMBEDDING

Non-Final OA: §102, §103, §112
Filed: Mar 06, 2023
Examiner: PARK, SOO JIN
Art Unit: 2675
Tech Center: 2600 (Communications)
Assignee: NEC Laboratories America Inc.
OA Round: 1 (Non-Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 8m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 82%, above average (589 granted / 720 resolved; +19.8% vs TC avg)
Interview Lift: +17.3% among resolved cases with interview
Typical Timeline: 2y 8m average prosecution; 15 currently pending
Career History: 735 total applications across all art units

Statute-Specific Performance

§101: 9.0% (-31.0% vs TC avg)
§102: 26.3% (-13.7% vs TC avg)
§103: 37.3% (-2.7% vs TC avg)
§112: 19.3% (-20.7% vs TC avg)
Based on career data from 720 resolved cases; TC averages are estimates.

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 8, 9-12, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claim 8, the limitation “the latent space” lacks antecedent basis. Please amend the limitation to recite “the joint latent space” for consistency with claim 1. Similar reasons apply to claim 20 (i.e., claim 13 recites “joint latent space” while claim 20 recites “the latent space”).

Regarding claim 9, the limitation “performing an image analysis task using the mask and the determined probability” renders the claim indefinite. Specifically, it is unclear what image analysis tasks this limitation refers to, and one of ordinary skill in the art could not ascertain the metes and bounds of the claimed invention. Furthermore, the claim recites “segmentation mode,” which appears to be a typographical error for “segmentation model.” Please amend the claim for clarification.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3, 5-8, 13-15, and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang et al. (“Learning two-branch neural networks for image-text matching tasks”).

Regarding claim 1, Wang discloses: embedding training images, from a plurality of training datasets having differing label spaces, in a joint latent space to generate first features (see sections 3.2 and 4.1, and fig 1, embedding training images xi, from a training dataset in non-joint embedding spaces, into a joint embedding space to generate image features); embedding textual labels of the training images in the joint latent space to generate second features (see sections 3.2 and 4.1, and fig 1, embedding texts Yi+ and Yi- of the training images xi into the joint embedding space to generate text features); and training a segmentation model using the first features and the second features (see sections 3.2 and 4.1, and fig 1, training an embedding network using the image features and the text features).

Regarding claim 2, Wang further discloses: wherein the plurality of training datasets include a panoptic segmentation dataset, which includes class labels for individual image pixels (see fig 1, pixels in the blue bounding box are labeled with class “negative”), and an object detection dataset, which includes a class label for a bounding box (see fig 1, pixels in the purple bounding box are labeled with class “positive”).

Regarding claim 3, Wang further discloses: wherein training the segmentation model uses a loss function that weights contributions from the panoptic segmentation dataset and the object detection dataset differently (see section 3.2.2, weights are varied during training based on a loss function, so that contributions from the class “positive” are higher than contributions from the class “negative”).

Regarding claim 5, Wang further discloses: wherein the joint latent space represents a visual object and a textual description of the visual object as vectors that are similar to one another according to a distance metric (see section 3.1 and fig 1, the image and text features are vectors; and see section 3.1, similarity of the image and text features is indicated by a cosine distance between them in the joint embedding space).

Regarding claim 6, Wang further discloses: comparing the first features to the second features using a distance metric in the joint latent space (see section 3.2.2, determining similarity of the image and text features; and see section 3.1, the similarity is determined by a cosine distance between them).

Regarding claim 7, Wang further discloses: wherein the distance metric is a cosine distance (see rejection of claim 6, cosine distance).

Regarding claim 8, Wang further discloses: wherein the segmentation model includes an image branch having an image embedding layer that embeds images into the latent space (see section 3.2 and fig 1, the image features are extracted by a first branch of an embedding network via pre-trained VGG networks) and a text branch having a text embedding layer that embeds text labels into the latent space (see section 3.2 and fig 1, the text features are extracted by a second branch of the embedding network via Fisher Vector encoding).

Regarding claims 13-15 and 17-20, Wang discloses everything claimed as applied above (see rejection of claims 1 and 5-8), and further discloses: a hardware processor; and a memory that stores a computer program which, when executed by the hardware processor, causes the hardware processor (see abstract, an inherent computer for computer vision).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 9-12 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Rong et al. (“Unambiguous text localization, retrieval, and recognition for cluttered scenes”).
Regarding claim 9, Wang discloses: embedding an image using a segmentation model that includes an image branch having an image embedding layer that embeds images into a joint latent space (see sections 3.2 and 4.4 and fig 1, embedding a testing image using an embedding network that includes a first branch that embeds images into a joint embedding space to generate image features); embedding a textual query term using the segmentation model, wherein the segmentation model further includes a text branch having a text embedding layer that embeds text into the joint latent space (see sections 3.2 and 4.4 and fig 1, embedding a text query using the embedding network that includes a second branch that embeds texts into the joint embedding space to generate text features); generating a mask for an object within the image using the segmentation model (see section 4.4 and fig 2, generating a bounding box over an object within the testing image using the embedding network); and determining a probability that the object matches the textual query term using the segmentation mode (see sections 3.2 and 4.4, determining a similarity score indicating that the object matches the text query).

However, Wang does not disclose: performing an image analysis task using the mask and the determined probability (i.e., Wang discloses determining the object within the test image and the text query as a match using the similarity score, but does not disclose further performing an image analysis task using the object).

In a similar field of endeavor of performing phrase localization in an image based on a text query, Rong discloses: performing an image analysis task using the mask and the determined probability (see fig 2, a bounding box is generated around an object within an image that matches a text query; and section 3.4, text recognition is further performed on the object in the bounding box).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Wang with Rong, and perform phrase localization to generate a bounding box around an object within an image that matches a text query, as disclosed by Wang, and further perform text recognition on the object in the bounding box, as disclosed by Rong, for the purpose of providing literal-level awareness (see Rong section 3.4).

Regarding claim 10, Wang further discloses: wherein the joint latent space represents a visual object and a textual description of the visual object as vectors that are similar to one another according to a distance metric (see section 3.1 and fig 1, the image and text features are vectors; and see section 3.1, similarity of the image and text features is indicated by a cosine distance between them in the joint embedding space).

Regarding claim 11, Wang further discloses: wherein determining the probability includes comparing the first features to the second features using a distance metric in the joint latent space (see sections 3.2 and 4.4, the similarity score is determined by a cosine distance).

Regarding claim 12, Wang and Rong further disclose: wherein the distance metric is a cosine distance (see rejection of claim 11, cosine distance).

Allowable Subject Matter

Claims 4 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. None of the cited prior art references discloses the subject matter recited in claims 4 or 16.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following references each disclose achieving effective phrase localization by embedding images and texts into a joint space where similarity is measured by distance: Lev-Tov et al. (USPN 10,459,995), Hussein et al. (USPN 10,496,885), Jin et al. (USPN 11,238,362), Chen et al. (“MSRC: Multimodal spatial regression with semantic context for phrase grounding”), and Plummer et al. (“Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models”).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SJ PARK, whose telephone number is (571) 270-3569. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, ANDREW MOYER, can be reached at 571-272-9523. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SJ Park/
Primary Examiner, Art Unit 2675
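The rejections above all turn on one mechanism: image features and text features live in a joint latent space, match quality is a cosine distance (claims 5-7, 11-12), and a match probability is derived from that distance (claim 9). The sketch below illustrates that mechanism only; the toy vectors are hypothetical stand-ins for learned embeddings and reproduce neither Wang's two-branch network nor the application's segmentation model.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length feature vectors
    # in the joint latent space.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_probabilities(image_feat, text_feats):
    # Softmax over cosine similarities: a probability that the image
    # region matches each candidate text label (cf. claim 9's
    # "determining a probability that the object matches the
    # textual query term").
    sims = [cosine_similarity(image_feat, t) for t in text_feats]
    exps = [math.exp(s) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical embedded features (stand-ins for the image-branch and
# text-branch outputs; not taken from either reference).
image_region = [0.9, 0.1, 0.2]
labels = {"cat": [0.8, 0.2, 0.1], "road": [0.1, 0.9, 0.4]}

probs = match_probabilities(image_region, list(labels.values()))
best = max(zip(labels, probs), key=lambda kv: kv[1])
print(best[0])  # the label whose embedding is nearest by cosine similarity
```

Note the design point the §112 rejection implicitly relies on: the comparison is only well defined in "the joint latent space" of claim 1, since cosine distance between vectors from unrelated embedding spaces carries no meaning.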

Prosecution Timeline

Mar 06, 2023: Application Filed
Nov 17, 2025: Non-Final Rejection under §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602779: IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD AND STORAGE MEDIUM (granted Apr 14, 2026; 2y 5m to grant)
Patent 12597481: SYSTEM, MOBILE TERMINAL DEVICE, PROGRAM, AND METHOD (granted Apr 07, 2026; 2y 5m to grant)
Patent 12585700: VIDEO RETRIEVAL METHOD AND APPARATUS BASED ON KEY FRAME DETECTION (granted Mar 24, 2026; 2y 5m to grant)
Patent 12586402: MACHINE-LEARNING MODELS FOR IMAGE PROCESSING (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579829: APPLICATION DEVELOPMENT ENVIRONMENT FOR BIOLOGICAL SAMPLE ASSESSMENT PROCESSING (granted Mar 17, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview: 99% (+17.3%)
Median Time to Grant: 2y 8m
PTA Risk: Low
Based on 720 resolved cases by this examiner. Grant probability derived from career allow rate.
