Last updated: April 19, 2026

Application No. 18/634,560

METHOD, APPARATUS, DEVICE AND STORAGE MEDIUM FOR IMAGE SEGMENTATION

Non-Final OA §101 §102 §103

Filed

Apr 12, 2024

Examiner

SHIN, SOO JUNG

Art Unit

2667

Tech Center

2600 — Communications

Assignee

BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD.

OA Round

1 (Non-Final)

Interview Optional

+16.0% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 604 resolved cases, 2023–2026

Examiner Intelligence

SHIN, SOO JUNG

Grants 87% — above average

Career Allow Rate

527 granted / 604 resolved

+25.3% vs TC avg

Strong +16% interview lift

Interview Lift: +16.0% (resolved cases with interview vs. without)

Typical timeline

2y 4m

Avg Prosecution

28 currently pending

Career history

632

Total Applications

across all art units
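
The career figures above can be cross-checked with simple arithmetic. The sketch below uses only the numbers shown on this page. The variable names are illustrative, and reading the total as resolved plus pending cases is an observation about these figures, not a documented definition.

```python
# Figures copied from the examiner statistics on this page.
granted = 527
resolved = 604
pending = 28

# Career allow rate = granted / resolved.
allow_rate = granted / resolved

# Observation (not a documented definition): resolved + pending
# matches the "Total Applications" figure of 632.
total_applications = resolved + pending

print(f"Career allow rate: {allow_rate:.1%}")
print(f"Resolved + pending: {total_applications}")
```

The ratio rounds to the 87% grant rate quoted above, and the sum equals the 632 total applications listed.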

Statute-Specific Performance

§101: 7.6% (-32.4% vs TC avg)
§103: 37.5% (-2.5% vs TC avg)
§102: 19.9% (-20.1% vs TC avg)
§112: 24.2% (-15.8% vs TC avg)

Tech Center averages are estimates. Based on career data from 604 resolved cases.

Office Action

§101 §102 §103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

35 U.S.C. 101 requires that a claimed invention must fall within one of the four eligible categories of invention (i.e., process, machine, manufacture, or composition of matter) and must not be directed to subject matter encompassing a judicially recognized exception as interpreted by the courts. MPEP 2106. The four eligible categories of invention include: (1) process, which is an act, or a series of acts or steps; (2) machine, which is a concrete thing, consisting of parts, or of certain devices and combination of devices; (3) manufacture, which is an article produced from raw or prepared materials by giving to these materials new forms, qualities, properties, or combinations, whether by hand labor or by machinery; and (4) composition of matter, which is all compositions of two or more substances and all composite articles, whether they be the results of chemical union, or of mechanical mixture, or whether they be gases, fluids, powders or solids. MPEP 2106(I).
Claim 20 is rejected under 35 U.S.C. 101 as not falling within one of the four statutory categories of invention because the broadest reasonable interpretation of the instant claims in light of the specification encompasses transitory signals ([0121] of the specification recites that “[t]he electronic device 800 may further include additional removable/non-removable, transitory/non-transitory, volatile/non-volatile storage medium” emphasis added). Transitory signals are not within one of the four statutory categories (i.e. non-statutory subject matter). See MPEP 2106(I). Claims directed toward a non-transitory computer readable medium may qualify as a manufacture and make the claim patent-eligible subject matter. MPEP 2106(I). Therefore, amending the claims to recite a “non-transitory computer-readable medium” would resolve this issue.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 2, 3, 5, 7, 10, 12, 13, 15, 17, 19, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Morita et al. (“Interactive Image Manipulation with Complex Text Instructions,” arXiv:2211.15352v1 [cs.CV] 25 Nov 2022), hereinafter referred to as Morita.
Regarding claim 1, Morita teaches a method for image segmentation, comprising:
extracting an image feature representation of a target image using a trained image encoder (Morita pg. 5 left column: “we upsample the image feature encoded Itr_SR by the pre-trained VGG-16 … we refine ã with residual block according to [14] to obtain the image feature which edit the descriptive information”; Morita Fig. 2: “Image encoder”);
for each of a plurality of classes,
generating, using a trained text encoder, a text feature representation corresponding to a name of the class (Morita pg. 5 left column: “The text features consist of the word-visual vector tv and the word-instruction vector ti encoded by a pretrained RNN”; Morita Fig. 2: “Text encoder”), and
determining a candidate segmentation map for the target image and a class confidence of the class based on the image feature representation and the text feature representation, the candidate segmentation map indicating whether respective pixels in the target image are classified into the class (Morita pg. 5 left column: “the segmentation map Iseg from the pre-processing network … during the generation process in TRDCM, the segmentation map is used to guide the generation of text-relevant region manipulation result that is consistent with the size of the segmentation map”; Morita Fig. 2: “allows users to edit the segmentation map automatically”; Morita pg. 6, §3.5 Loss function and Training: “the generator loss LG consists of an adversarial loss Ladv, a perceptual loss Lper, a text-image matching loss LDAMSM, and a regularization loss Lreg … The discriminator loss LD is defined”; Morita Table 1);
selecting, from the plurality of classes, at least one class related to the target image based on a plurality of class confidences determined respectively for the plurality of classes (Morita Figs. 4-10; Morita Eq. (6); Morita pg. 7 left column: “Our model (auto seg.) means a model that performs image manipulation using a segmentation map automatically detected by the segmentation network … In the quantitative experiment, we evaluate the IS [23], NIMA [25], and FID [10] on randomly selected images … this means that our segmentation network produces highly discriminative images by considering the text-relevant and text-irrelevant content”); and
determining a target segmentation map for the target image based on the at least one candidate segmentation map and the at least one class confidence determined for the at least one selected class, the target segmentation map indicating whether respective pixels in the target image are classified into a class amongst the at least one class (Morita Figs. 9-10).
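
For orientation, the class-confidence and class-selection limitations of claim 1 can be pictured with a minimal sketch. It assumes the class confidence is the cosine similarity between the image feature representation and each class's text feature representation, and that selection uses a fixed cutoff. Neither choice is stated in the claim as quoted here or in the cited passages of Morita, and all function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def class_confidences(image_feature, text_features):
    """Assumption: confidence = cosine(image feature, class text feature)."""
    return {name: cosine(image_feature, feat) for name, feat in text_features.items()}

def select_classes(confidences, min_confidence):
    """Assumption: keep classes whose confidence reaches a fixed cutoff."""
    return [name for name, c in confidences.items() if c >= min_confidence]

# Toy features standing in for the outputs of trained encoders.
image_feature = [1.0, 0.0]
text_features = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "grass": [1.0, 1.0]}

conf = class_confidences(image_feature, text_features)
selected = select_classes(conf, min_confidence=0.5)
print(selected)  # ['cat', 'grass']
```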

Regarding claim 2, Morita teaches the method of claim 1, wherein generating the text feature representation for each of the plurality of classes comprises:
generating at least one text sequence containing the name of the class (Morita Fig. 2: “Input: this bird has a black back and sides with a white belly”);
extracting, using the text encoder, at least one sequence feature of the at least one text sequence respectively (Morita Fig. 2: extracted features are bolded, see “black back and sides” and “white belly”); and
generating the text feature representation by aggregating the at least one sequence feature (Morita Fig. 2: see the manipulation and combination phases; Morita Fig. 3).

Regarding claim 3, Morita teaches the method of claim 2, wherein the at least one text sequence comprises a plurality of different text sequences with each text sequence containing the name of the class (see Morita Figs. 2-3 discussed above).

Regarding claim 5, Morita teaches the method of claim 1, wherein determining the candidate segmentation map for each of the plurality of classes comprises:
determining an attention map based on the text feature representation and the image feature representation, the attention map indicating a plurality of correlations between the class and a plurality of image blocks in the target image (Morita pg. 5 left column: “the word-visual vector tv is input to the spatial and channel-wise attention [13] to create the attention features, which are then combined with hlast to obtain the intermediate feature a”; Morita pg. 6 right column: “the discriminator loss LD consists of an adversarial loss Ladv and the text-image correlation loss Lcor”); and
generating the candidate segmentation map by processing the attention map (Morita Figs. 9-10; Morita pg. 4 left column: “The second task is to acquire the candidate class label W of objects”).
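
The two limitations of claim 5 can be illustrated with a small sketch. It assumes the correlation between a class and an image block is a cosine similarity, and that "processing the attention map" is a plain threshold. Both are illustrative assumptions, not details taken from the claim or from Morita, and the names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def attention_map(text_feature, block_features):
    """One correlation per image block; block_features is a 2D grid of vectors."""
    return [[cosine(text_feature, f) for f in row] for row in block_features]

def candidate_map(attn, threshold=0.5):
    """Assumption: a block belongs to the class when its correlation exceeds threshold."""
    return [[1 if a > threshold else 0 for a in row] for row in attn]

# Toy 2x2 grid of block features and one class text feature.
text_feature = [1.0, 0.0]
block_features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.0, 1.0]],
]

attn = attention_map(text_feature, block_features)
cand = candidate_map(attn)
print(cand)  # [[1, 0], [1, 0]]
```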

Regarding claim 7, Morita teaches the method of claim 5, wherein extracting the image feature representation comprises:
extracting a plurality of image features for the plurality of image blocks of the target image using the image encoder (Morita Fig. 2: “Image encoder”; Morita pg. 5 left column: “obtain the image feature … concatenate the image feature”); and
determining the image feature representation by aggregating the plurality of image features (Morita Fig. 2: see D1-D3), and
wherein determining the attention map comprises: determining the attention map based on the text feature representation and the plurality of image features (Morita Fig. 2; Morita pg. 5-6, §3.3. Combination phase).

Regarding claim 10, Morita teaches the method of claim 1, wherein the at least one selected class comprises at least two classes (Morita Fig. 2), and wherein determining the target segmentation map comprises:
for each pixel in the target image, determining that the pixel is classified into a target class amongst the at least two classes based on at least two class confidences corresponding to the at least two classes (Morita Figs. 4, 6-10).
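
The per-pixel decision recited in claim 10 could look like the following sketch. It assumes each pixel is scored per class as the class confidence multiplied by that class's binary candidate-map value, and that the highest score wins. This scoring rule is an assumption made for illustration, and the names are hypothetical.

```python
def target_segmentation(candidate_maps, confidences):
    """Assign each pixel the selected class with the highest score, or None.

    Assumption: score = class confidence * candidate-map value (0 or 1).
    """
    names = list(candidate_maps)
    rows = len(candidate_maps[names[0]])
    cols = len(candidate_maps[names[0]][0])
    target = []
    for r in range(rows):
        row = []
        for c in range(cols):
            scores = {n: confidences[n] * candidate_maps[n][r][c] for n in names}
            best = max(scores, key=scores.get)
            # A pixel claimed by no candidate map stays unlabeled.
            row.append(best if scores[best] > 0 else None)
        target.append(row)
    return target

candidate_maps = {
    "cat": [[1, 1], [0, 0]],
    "grass": [[0, 1], [1, 0]],
}
confidences = {"cat": 0.9, "grass": 0.7}

target = target_segmentation(candidate_maps, confidences)
print(target)  # [['cat', 'cat'], ['grass', None]]
```

The pixel in the top-right corner is claimed by both candidate maps and goes to the class with the higher confidence.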

Regarding claim 12, Morita teaches an electronic device, comprising:
at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions executable by the at least one processing unit, which, when executed by the at least one processing unit, cause the electronic device to perform acts (Morita Abstract: “multimedia processing and computer vision”) described in claim 1. 
Therefore, claim 12 is rejected using the same rationale as applied to claim 1 discussed above.

Claim 13 is rejected using the same rationale as applied to claim 2 discussed above.

Claim 15 is rejected using the same rationale as applied to claim 5 discussed above.

Claim 17 is rejected using the same rationale as applied to claim 7 discussed above.

Claim 19 is rejected using the same rationale as applied to claim 10 discussed above.

Regarding claim 20, Morita teaches a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, causing the processor to implement acts (Morita Abstract: “multimedia processing and computer vision”) comprising the steps described in claim 1. Therefore, claim 20 is rejected using the same rationale as applied to claim 1 discussed above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Morita et al. (“Interactive Image Manipulation with Complex Text Instructions,” arXiv:2211.15352v1 [cs.CV] 25 Nov 2022) in view of Yu et al. (US 2024/0013504 A1), hereinafter referred to as Morita and Yu, respectively.
Regarding claim 6, Morita teaches the method of claim 5, wherein generating the candidate segmentation map by processing the attention map comprises:
upsampling the attention map to a size corresponding to the target image, to obtain an upsampled attention map (Morita pg. 5 left column: “we upsample the image feature encoded Itr_SR by the pre-trained VGG-16”).
However, Morita does not appear to explicitly teach generating the candidate segmentation map by applying a conditional random field (CRF) process to the upsampled attention map.
Pertaining to the same field of endeavor, Yu teaches generating the candidate segmentation map by applying a conditional random field (CRF) process to the upsampled attention map (Yu ¶0020: “the training can include minimizing a loss function that includes a multiple instance learning (MIL) loss term and a conditional random field (CRF) loss term.”; Yu ¶0047: “a conditional random field (CRF) loss module 424 that computes, for the image segmentation 402, a CRF loss 432”; Yu ¶0049: “The CRF loss 432, computed by the CRF loss module 424, is used to smooth and refine the segmentation masks generated by the referring image segmentation model 150”).
Morita and Yu are considered to be analogous art because they are directed to image processing using machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method and system for interactive image manipulation with complex text instructions (as taught by Morita) to use CRF (as taught by Yu) because the combination can minimize the loss function to smooth and refine the segmentation masks (Yu ¶0049).
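
The upsampling limitation of claim 6 can be sketched as follows. The example assumes nearest-neighbour interpolation, which is only one possible upsampling method and is not specified in the claim as quoted. The CRF step is not shown, and the function name is hypothetical.

```python
def upsample_nearest(attn, out_h, out_w):
    """Nearest-neighbour upsampling of a 2D attention map to out_h x out_w."""
    in_h, in_w = len(attn), len(attn[0])
    return [
        [attn[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

# A 2x2 block-level attention map upsampled to a 4x4 "image" size.
attn = [[0.1, 0.9], [0.4, 0.6]]
upsampled = upsample_nearest(attn, 4, 4)
for row in upsampled:
    print(row)
```

Each block-level value is repeated over the pixels that the block covers, so the output has the same size as the target image.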

Claim 16 is rejected using the same rationale as applied to claim 6 discussed above.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Morita et al. (“Interactive Image Manipulation with Complex Text Instructions,” arXiv:2211.15352v1 [cs.CV] 25 Nov 2022) in view of Dekel et al. (US 2024/0419382 A1), hereinafter referred to as Morita and Dekel, respectively.
Regarding claim 11, Morita teaches the method of claim 1, wherein the image encoder and the text encoder are trained based on training data, the training data comprising sample image-text pairs (Morita pg. 4 left column: “This network allows linking the text with the corresponding region information in the image. It takes the input image I and the text instructions S as the input, then links text-region information by performing two tasks simultaneously … We detect all objects with retrained Deeplab3 [4] from scratch on CUB and COCO datasets”).
However, Morita does not appear to explicitly teach that the training data is unlabeled.
Pertaining to the same field of endeavor, Dekel teaches using unlabeled training data (Dekel ¶0072: “When trained on a set of examples without supervision, a DBN can learn to probabilistically reconstruct its inputs. The layers then act as feature detectors … This composition leads to a fast, layer-by-layer unsupervised training procedure, where contrastive divergence is applied to each sub-network in turn, starting from the ‘lowest’ pair of layers (the lowest visible layer is a training set)”).
Morita and Dekel are considered to be analogous art because they are directed to image processing using machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method and system for interactive image manipulation with complex text instructions (as taught by Morita) to use unlabeled training data (as taught by Dekel) because the combination leads to a fast, layer-by-layer unsupervised training (Dekel ¶0072).

Allowable Subject Matter
Claims 4, 8, 9, 14, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter:  
Regarding claim 4, the prior art of record teaches that it was known at the time the application was filed to use the method of claim 2, but does not appear to teach or suggest generating the at least one text sequence by filling the name of the class into at least one prompt template.
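
The prompt-template limitation identified for claim 4, together with the aggregation step of claim 2, might be sketched as below. The template strings are hypothetical, and averaging is assumed as the aggregation method; neither is taken from the application or from the prior art of record.

```python
# Hypothetical prompt templates; the application's actual templates are not quoted here.
TEMPLATES = ["a photo of a {}.", "a picture of the {}."]

def build_text_sequences(class_name, templates=TEMPLATES):
    """Fill the class name into each prompt template."""
    return [t.format(class_name) for t in templates]

def aggregate(sequence_features):
    """Assumption: aggregate per-sequence features by element-wise mean."""
    n = len(sequence_features)
    return [sum(vals) / n for vals in zip(*sequence_features)]

sequences = build_text_sequences("bird")
print(sequences)  # ['a photo of a bird.', 'a picture of the bird.']

# Toy vectors standing in for text-encoder outputs of the two sequences.
text_feature = aggregate([[1.0, 0.0], [0.0, 1.0]])
print(text_feature)  # [0.5, 0.5]
```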

Regarding claim 8, the prior art of record teaches that it was known at the time the application was filed to use the method of claim 1, but does not appear to explicitly teach or suggest selecting a first number of classes from the plurality of classes based on the plurality of class confidences determined respectively for the plurality of classes; determining a threshold confidence based on the first number of class confidences corresponding to the first number of classes respectively; and selecting, from the first number of classes, the at least one class with a corresponding class confidence exceeding the threshold confidence.
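
The three-step selection identified for claim 8 can be illustrated with a short sketch. It assumes the threshold confidence is the mean of the top-ranked confidences. The claim as quoted only says the threshold is determined "based on" those confidences, so the mean is an illustrative choice and the function name is hypothetical.

```python
def select_by_adaptive_threshold(confidences, first_number):
    """Keep the top `first_number` classes, then those above a derived threshold.

    Assumption: threshold = mean confidence of the top `first_number` classes.
    """
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:first_number]
    if not top:
        return [], None
    threshold = sum(c for _, c in top) / len(top)
    selected = [name for name, c in top if c > threshold]
    return selected, threshold

confidences = {"cat": 0.9, "grass": 0.7, "dog": 0.2, "car": 0.1}
selected, threshold = select_by_adaptive_threshold(confidences, first_number=3)
print(selected)  # ['cat', 'grass']
```

With these toy values the top three classes are cat, grass, and dog, the derived threshold is about 0.6, and only cat and grass exceed it.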

Claim 9 is objected to for the same reason as claim 8 due to dependency. 

Claim 14 is objected to for the same reason as claim 4 discussed above.

Claim 18 is objected to for the same reason as claim 8 discussed above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOO J SHIN whose telephone number is (571)272-9753. The examiner can normally be reached M-F; 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Soo Shin/
Primary Examiner, Art Unit 2667
571-272-9753
soo.shin@uspto.gov

Prosecution Timeline

Apr 12, 2024

Application Filed

Feb 18, 2026

Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/283,372

Patent 12602768

SURFACE DEFECT DETECTION MODEL TRAINING METHOD, AND SURFACE DEFECT DETECTION METHOD AND SYSTEM

2y 5m to grant Granted Apr 14, 2026

18/187,749

Patent 12586411

TARGET IDENTIFICATION DEVICE, ELECTRONIC DEVICE, TARGET IDENTIFICATION METHOD, AND STORAGE MEDIUM

2y 5m to grant Granted Mar 24, 2026

18/363,832

Patent 12586204

Detecting Optical Discrepancies In Captured Images

2y 5m to grant Granted Mar 24, 2026

18/368,111

Patent 12586216

METHOD OF DETERMINING A MOTION OF A HEART WALL

2y 5m to grant Granted Mar 24, 2026

18/124,362

Patent 12573021

ULTRASONIC DEFECT DETECTION AND CLASSIFICATION SYSTEM USING MACHINE LEARNING

2y 5m to grant Granted Mar 10, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

87%

Grant Probability

99%

With Interview (+16.0%)

2y 4m

Median Time to Grant

Low

PTA Risk

Based on 604 resolved cases by this examiner. Grant probability derived from career allow rate.
