DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 8-12 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claim 8, the limitation “the latent space” lacks antecedent basis. Please amend the limitation to recite “the joint latent space” to maintain consistency with claim 1. Similar reasons apply to claim 20 (i.e., claim 13 recites “joint latent space” while claim 20 recites “the latent space”).
Regarding claim 9, the limitation “performing an image analysis task using the mask and the determined probability” renders the claim indefinite. Specifically, it is unclear what image analysis task this limitation refers to, such that one of ordinary skill in the art could not ascertain the metes and bounds of the claimed invention. Furthermore, the claim recites “segmentation mode,” which appears to be a typographical error for “segmentation model.” Please amend the claim for clarification.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3, 5-8, 13-15, and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang et al. (“Learning two-branch neural networks for image-text matching tasks”).
Regarding claim 1, Wang discloses:
embedding training images, from a plurality of training datasets having differing label spaces, in a joint latent space to generate first features (see sections 3.2 and 4.1, and fig 1, embedding training images xi, from a training dataset in non-joint embedding spaces, into a joint embedding space to generate image features);
embedding textual labels of the training images in the joint latent space to generate second features (see sections 3.2 and 4.1, and fig 1, embedding texts Yi+ and Yi- of the training images xi into the joint embedding space to generate text features); and
training a segmentation model using the first features and the second features (see sections 3.2 and 4.1, and fig 1, training an embedding network using the image features and the text features).
Regarding claim 2, Wang further discloses:
wherein the plurality of training datasets include a panoptic segmentation dataset, which includes class labels for individual image pixels (see fig 1, pixels in the blue bounding box are labeled with class “negative”), and
an object detection dataset, which includes a class label for a bounding box (see fig 1, pixels in the purple bounding box are labeled with class “positive”).
Regarding claim 3, Wang further discloses: wherein training the segmentation model uses a loss function that weights contributions from the panoptic segmentation dataset and the object detection dataset differently (see section 3.2.2, weights are varied during training based on a loss function, so that contributions from the class “positive” are higher than contributions from the class “negative”).
Regarding claim 5, Wang further discloses: wherein the joint latent space represents a visual object and a textual description of the visual object as vectors that are similar to one another according to a distance metric (see section 3.1 and fig 1, the image and text features are vectors; and see section 3.1, similarity of the image and text features is indicated by a cosine distance between them in the joint embedding space).
Regarding claim 6, Wang further discloses: comparing the first features to the second features using a distance metric in the joint latent space (see section 3.2.2, determining similarity of the image and text features; and see section 3.1, the similarity is determined by a cosine distance between them).
Regarding claim 7, Wang further discloses: wherein the distance metric is a cosine distance (see rejection of claim 6, cosine distance).
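For illustration only (not part of the record of this application), the cosine distance referenced in the rejections of claims 6 and 7 can be sketched as follows; the function name and example vectors are hypothetical:

```python
import numpy as np

def cosine_distance(u, v):
    # Cosine distance between two embedding vectors: 1 - cos(angle between u and v).
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Identical directions give distance 0; orthogonal vectors give distance 1.
image_feature = np.array([1.0, 0.0])
text_feature = np.array([1.0, 0.0])
d_same = cosine_distance(image_feature, text_feature)
d_orth = cosine_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

A smaller cosine distance between an image feature and a text feature in the joint embedding space indicates a closer image-text match.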
Regarding claim 8, Wang further discloses:
wherein the segmentation model includes an image branch having an image embedding layer that embeds images into the latent space (see section 3.2 and fig 1, the image features are extracted by a first branch of an embedding network via pre-trained VGG networks) and
a text branch having a text embedding layer that embeds text labels into the latent space (see section 3.2 and fig 1, the text features are extracted by a second branch of the embedding network via Fisher Vector encoding).
Regarding claims 13-15 and 17-20, Wang discloses everything claimed as applied above (see rejection of claims 1 and 5-8), and further discloses: a hardware processor; and a memory that stores a computer program which, when executed by the hardware processor, causes the hardware processor (see abstract, an inherent computer for computer vision).
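For illustration only (not part of the record of this application), the two-branch arrangement described in the rejection of claim 8 can be sketched as follows; all dimensions and projection matrices are hypothetical stand-ins for Wang's learned branches:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned projections into a shared 512-dimensional joint space.
W_image = rng.normal(size=(4096, 512))  # image branch: e.g. VGG features -> joint space
W_text = rng.normal(size=(300, 512))    # text branch: e.g. text encoding -> joint space

def embed(features, W):
    z = features @ W
    return z / np.linalg.norm(z)  # L2-normalize so similarity is a plain dot product

image_vec = embed(rng.normal(size=4096), W_image)
text_vec = embed(rng.normal(size=300), W_text)
similarity = float(image_vec @ text_vec)  # cosine similarity, in [-1, 1]
```

The point of the two branches is that images and texts, arriving from different feature spaces, are projected into a single joint space where one distance metric compares them.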
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 9-12 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Rong et al. (“Unambiguous text localization, retrieval, and recognition for cluttered scenes”).
Regarding claim 9, Wang discloses:
embedding an image using a segmentation model that includes an image branch having an image embedding layer that embeds images into a joint latent space (see sections 3.2 and 4.4 and fig 1, embedding a testing image using an embedding network that includes a first branch that embeds images into a joint embedding space to generate image features);
embedding a textual query term using the segmentation model, wherein the segmentation model further includes a text branch having a text embedding layer that embeds text into the joint latent space (see sections 3.2 and 4.4 and fig 1, embedding a text query using the embedding network that includes a second branch that embeds texts into the joint embedding space to generate text features);
generating a mask for an object within the image using the segmentation model (see section 4.4 and fig 2, generating a bounding box over an object within the testing image using the embedding network); and
determining a probability that the object matches the textual query term using the segmentation mode (see sections 3.2 and 4.4, determining a similarity score indicating that the object matches the text query).
However, Wang does not disclose: performing an image analysis task using the mask and the determined probability (i.e., Wang discloses determining, using the similarity score, that the object within the test image matches the text query, but does not disclose further performing an image analysis task using the object).
In a similar field of endeavor of performing phrase localization in an image based on a text query, Rong discloses: performing an image analysis task using the mask and the determined probability (see fig 2, a bounding box is generated around an object within an image that matches a text query; and section 3.4, text recognition is further performed on the object in the bounding box).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Wang with Rong, and perform phrase localization to generate a bounding box around an object within an image that matches a text query, as disclosed by Wang, and further perform text recognition on the object in the bounding box, as disclosed by Rong, for the purpose of providing literal-level awareness (see Rong section 3.4).
Regarding claim 10, Wang further discloses: wherein the joint latent space represents a visual object and a textual description of the visual object as vectors that are similar to one another according to a distance metric (see section 3.1 and fig 1, the image and text features are vectors; and see section 3.1, similarity of the image and text features is indicated by a cosine distance between them in the joint embedding space).
Regarding claim 11, Wang further discloses: wherein determining the probability includes comparing the first features to the second features using a distance metric in the joint latent space (see sections 3.2 and 4.4, the similarity score is determined by a cosine distance).
Regarding claim 12, Wang and Rong further disclose: wherein the distance metric is a cosine distance (see rejection of claim 11, cosine distance).
Allowable Subject Matter
Claims 4 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. None of the cited prior art references discloses the subject matter recited in claims 4 or 16.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Each of the following references discloses achieving effective phrase localization by embedding images and texts into a joint space where similarity is measured by distance: Lev-Tov et al. (USPN 10,459,995), Hussein et al. (USPN 10,496,885), Jin et al. (USPN 11,238,362), Chen et al. (“MSRC: Multimodal spatial regression with semantic context for phrase grounding”), and Plummer et al. (“Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models”).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SJ PARK whose telephone number is (571)270-3569. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ANDREW MOYER can be reached at 571-272-9523. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SJ Park/Primary Examiner, Art Unit 2675