Prosecution Insights
Last updated: April 19, 2026
Application No. 18/216,918

IMAGE SEGMENTATION MODEL TRAINING METHOD AND APPARATUS, IMAGE SEGMENTATION METHOD AND APPARATUS, AND DEVICE

Non-Final OA — §102, §103
Filed: Jun 30, 2023
Examiner: CONNER, SEAN M
Art Unit: 2663
Tech Center: 2600 — Communications
Assignee: Tencent Technology (Shenzhen) Company Limited
OA Round: 3 (Non-Final)

Grant Probability: 79% (Favorable)
OA Rounds: 3-4
To Grant: 2y 9m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 79% (357 granted / 454 resolved; +16.6% vs TC avg), above average
Interview Lift: +27.1%, a strong lift (resolved cases with interview vs. without)
Typical Timeline: 2y 9m avg prosecution; 22 applications currently pending
Career History: 476 total applications across all art units

Statute-Specific Performance

§101: 11.5% (-28.5% vs TC avg)
§103: 47.9% (+7.9% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§112: 21.1% (-18.9% vs TC avg)
Comparisons are against a Tech Center average estimate • Based on career data from 454 resolved cases

Office Action

§102 §103
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.

The Amendment filed 23 January 2026 (hereinafter “the Amendment”) has been entered and considered. Claims 1 and 19-20 have been amended. Claims 1-20, all the claims pending in the application, are rejected.

Response to Amendment

Claim Rejections - 35 USC § 102 (Part 1) based on Xu

On page 13 of the Amendment, Applicant asserts that Xu does not teach or suggest “updating a model parameter of a decoder of the second network model by applying an exponential moving average to a model parameter of a decoder of the first network model”, as recited in the amended independent claims. In support of this assertion, Applicant acknowledges that Xu discloses a decoder “Dec” and another decoder “EMA Dec”, but confusingly contends that there is no disclosure or suggestion that a moving average is applied to “Dec” to update a model parameter of “EMA Dec”.

The Examiner respectfully disagrees and maintains that Xu teaches the limitation in question. For example, the caption of Figure 2 (reproduced and annotated below) discloses that “Enc and Dec represent the student model, while EMA Enc and EMA Dec denote the self-ensembling teacher model updated as the exponential moving average (EMA) of student weights” while illustrating this concept with an arrow that indicates the transfer of weights from the first encoder to the second (emphasis added):

[Figure 2 of Xu, reproduced and annotated (image omitted)]

Similarly, Section III(A) of Xu discloses the following:

[Excerpt from Section III(A) of Xu (image omitted)]

Clearly, the “teacher model” whose weights ˜θt are being updated is the “EMA Enc – EMA Dec” from Fig. 2, and the “student model” whose weights θt are subject to the EMA process is the “Enc-Dec” from Fig. 2. That is, Xu clearly discloses updating a model parameter (˜θt) of a decoder of the second network model (teacher model EMA Dec) by applying an exponential moving average (EMA) to a model parameter (θt) of a decoder of the first network model (student model Dec), as claimed, contrary to Applicant’s assertions.

Importantly, Xu’s equation (2) is identical to Formula (1) in the specification of the subject application:

[Formula (1) of the subject specification, reproduced (image omitted)]

That is, not only does Xu teach the claim limitation in question, Xu appears to teach the limitation in an identical manner as the disclosed invention. For all the foregoing reasons, the prior art rejections based on Xu are maintained.

Claim Rejections - 35 USC § 103 (Part 2) based on Yang, Ouyang and Li

On pages 14-15 of the Amendment, Applicant asserts that Li does not teach or suggest “updating a model parameter of a decoder of the second network model by applying an exponential moving average to a model parameter of a decoder of the first network model”, as recited in the amended independent claims.
In support of this assertion, Applicant acknowledges that Li discloses parallel encoder-decoder networks and that the teacher model weights are updated as an EMA of the student weights, but confusingly contends that there is no disclosure or suggestion that the weights of one decoder are updated by applying an exponential moving average to weights of another decoder.

The Examiner respectfully disagrees and maintains that Li teaches the limitation in question. For example, Fig. 2 of Li (annotated and reproduced below) discloses “teacher and student models”, wherein “the weight of the teacher model is the EMA of the student model”:

[Figure 2 of Li, reproduced and annotated (image omitted)]

Similarly, Section III(A) of Li discloses the following:

[Excerpt from Section III(A) of Li (image omitted)]

Clearly, Li teaches a “teacher model” whose weights θ’t are being updated according to the EMA of the weights θt of the “student model” from Fig. 2. Since Li’s models are encoder-decoder models, as acknowledged by the Applicant, Li clearly discloses updating a model parameter (θ’t) of a decoder of the second network model (teacher model EMA Dec) by applying an exponential moving average (EMA) to a model parameter (θt) of a decoder of the first network model (student model Dec), as claimed, contrary to Applicant’s assertions.

Importantly, Li’s equation reproduced above is identical to Formula (1) in the specification of the subject application:

[Formula (1) of the subject specification, reproduced (image omitted)]

That is, not only does Li teach the claim limitation in question, Li appears to teach the limitation in an identical manner as the disclosed invention. For all the foregoing reasons, the prior art rejections based on the combination of Yang, Ouyang, and Li are maintained.

Claim Rejections - 35 USC § 102 (Part 1)

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “All-Around Real Label Supervision: Cyclic Prototype Consistency Learning for Semi-supervised Medical Image Segmentation” by Xu et al. (hereinafter “Xu”).

According to MPEP § 2153.01(a), “If…the application names fewer joint inventors than a publication (e.g., the application names as joint inventors A and B, and the publication names as authors A, B and C), it would not be readily apparent from the publication that it is an inventor-originated disclosure and the publication would be treated as prior art under AIA 35 U.S.C. 102(a)(1)”. The Examiner notes that Xu appears to have four authors in common with the inventors listed in the subject application (XU, Zhe; LU, Donghuan; MA, Kai; and ZHENG, Yefeng).
However, five additional people (Yixin Wang, Lequan Yu, Jiangpeng Yan, Jie Luo, and Raymond Kai-yu Tong) are listed as co-authors of the Xu publication. Therefore, it is not readily apparent from the Xu publication that it is an inventor-originated disclosure. Based upon the earlier publication date of the Xu reference (9/28/21) relative to the effective filing date of the subject application (11/10/21), according to MPEP § 2153.01(a), Xu constitutes prior art under 35 U.S.C. 102(a)(1). Further, the publication date of the reference falls within the one year grace period of the effective filing date of the subject application. Accordingly, this rejection under 35 U.S.C. 102(a)(1) might be overcome by: (1) a showing under 37 CFR 1.130(a) that the subject matter disclosed in the reference was obtained directly or indirectly from the inventor of this application and is thus not prior art in accordance with 35 U.S.C.102(b)(1)(A); or (2) a showing under 37 CFR 1.130(b) of a prior public disclosure under 35 U.S.C. 102(b)(1)(B). See MPEP § 717.01. As to independent claim 1, Xu discloses an image segmentation model training method, performed by an electronic device (Abstract discloses that Xu is directed to “a novel cyclic prototype consistency learning (CPCL) framework” which is used, for example, for “brain tumor segmentation from MRI and kidney segmentation from CT images”; Section IV(A)(4) discloses that the framework is “implemented in Python with PyTorch, using NVIDIA GeForce RTX 3090 GPU with 24GB memory”), the image segmentation model training method comprising: acquiring a first image (Xl), a second image (Xu), and a labeled image (Yl), the labeled image being an image segmentation result obtained by labeling the first image (Section III(A) discloses a training set comprising labeled dataset which includes a labeled image Xl and a “ground-truth segmentation label” Yl of the labeled image Xl, wherein the training set further comprises an unlabeled set which includes unlabeled image Xu; see also Fig. 2, reproduced and annotated below); acquiring a first predicted image (Pl) according to a first network model (Enc-Dec), the first predicted image being an image segmentation result obtained by predicting the first image (Fig. 2 shows a “student model” comprising encoder Enc and decoder Dec which predicts a segmentation result Pl of the labeled image Xl); acquiring a second predicted image (Pu) according to a second network model (EMA Enc – EMA Dec), the second predicted image being an image segmentation result obtained by predicting the second image (Fig. 2 shows a “self-ensembling teacher model” comprising encoder EMA Enc and decoder EMA Dec which predicts a segmentation result Pu of the unlabeled image Xu; see also Section III); determining a reference image (Pl2u) of the second image based on the second image (Xu) and the labeled image (Yl), the reference image being an image segmentation result obtained by calculating the second image (Fig. 
2 shows probability map Pl2u is obtained based on the “unlabeled features” of the unlabeled image Xu (such features being calculated, as claimed, by EMA Enc extracting the unlabeled features from the unlabeled image Xu) and an expanded “labeled prototype” that is generated based on the “labeled features” Fl of the labeled image Xl combined with “ground-truth segmentation label” Yl; see also Section III); updating a model parameter of the first network model based on the first predicted image (Pl), the labeled image (Yl), the second predicted image (Pu), and the reference image (Pl2u) to obtain an image segmentation model (Section III and Fig. 2 discloses that the CPCL framework (including the “student model” corresponding to the claimed first network) is trained to update “student weights” according to loss function L in equation 11, which is based on segmentation result Pl and ground-truth label Yl (see equation 12) and segmentation result Pu and probability map Pl2u (see equation 6)); and updating a model parameter of a decoder of the second network model by applying an exponential moving average to a model parameter of a decoder of the first network model (Fig. 2 and Section III(A) disclose that “Enc and Dec represent the student model, while EMA Enc and EMA Dec denote the self-ensembling teacher model updated as the exponential moving average (EMA) of student weights” and that “we update the teacher model’s weights ˜θt at the training step t by means of the exponential moving average (EMA) approach”; see also Equation (2)).

[Figure 2 of Xu, reproduced and annotated (image omitted)]

As to claim 2, Xu further discloses that the determining comprises: determining a foreground prototype of the first image and a background prototype of the first image based on the labeled image, the foreground prototype being a reference feature of a first region in the first image, and the background prototype being a reference feature of another region except the first region in the first image (Section III and Fig. 2 discloses determining a foreground prototype pl(fg) and a background prototype pl(bg) (See “labeled prototype” in Fig. 2) based on the “labeled features” extracted from the labeled image Xl combined with the ground-truth segmentation label Yl; see equations 3 and 4); and determining the reference image based on the foreground prototype, the background prototype, and the second image (Section III and Fig. 2 discloses that probability map Pl2u is obtained based on the “unlabeled features” of the unlabeled image Xu and the “labeled prototype”; see equation 5).

As to claim 3, Xu further discloses that determining the foreground prototype and the background prototype comprises: acquiring a feature map of the first image, the feature map for characterizing semantic information of the first image; and determining the foreground prototype and the background prototype based on the feature map and the labeled image (Section III and Fig. 2 discloses “feature map” Fl which is “extracted by the encoder for the labeled image” Xl and which necessarily carries semantic information of the image, wherein equations 3 and 4 disclose that the foreground and background prototypes are calculated based on the feature map Fl).
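For illustration, the exponential-moving-average update mapped to the last limitation of claim 1 above (teacher weights updated as an EMA of student weights) can be sketched in a few lines of PyTorch-style Python. This is a minimal sketch of the general mean-teacher rule, not code from Xu or the subject application; the names ema_update, teacher, and student, and the value of alpha, are illustrative assumptions.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, alpha: float = 0.99) -> None:
    """Mean-teacher rule: teacher_param <- alpha * teacher_param + (1 - alpha) * student_param.

    Applied to a decoder, this updates the second (teacher) network's decoder
    parameters from the first (student) network's decoder parameters.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)

# Hypothetical usage after each optimizer step on the student decoder:
# ema_update(teacher_decoder, student_decoder, alpha=0.99)
```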
As to claim 4, Xu further discloses that determining the foreground and the background prototype based on the feature map and the labeled image comprises: determining, in the feature map, a first voxel feature of a first voxel with a spatial position located in the first region and a second voxel feature of a second voxel with a spatial position located in the another region based on the labeled image; calculating an average value of the first voxel feature as the foreground prototype; and calculating an average value of the second voxel feature as the background prototype (Section III and Fig. 2 discloses that the foreground and background prototypes are calculated via “masked average pooling” according to “the spatial location for each voxel” (x, y, z) and the features in the feature map Fl at that spatial location based on the ground-truth segmentation label Yl; see equations 3 and 4). As to claim 5, Xu further discloses that the determining the reference image based on the foreground prototype, the background prototype, and the second image comprises: acquiring a feature map of the second image, the feature map for characterizing semantic information of the second image; and determining the reference image based on the foreground prototype, the background prototype, and the feature map (Section III and Fig. 2 discloses acquiring “unlabeled feature map” Fu from teacher encoder EMA Enc which extracts the unlabeled features from the unlabeled image Xu, the unlabeled feature necessarily carrying semantic information, wherein the probability map Pl2u is calculated based on the foreground prototype pl(fg), background prototype pl(bg), and unlabeled feature map Fu; see equations 3-5). As to claim 6, Xu further discloses that the determining the reference image based on the foreground prototype, the background prototype, and the feature map comprises: determining, for a voxel in the second image, a voxel feature of the voxel based on the feature map; calculating a foreground similarity between the voxel feature of the voxel and the foreground prototype and a background similarity between the voxel feature of the voxel and the background prototype; determining a foreground probability that the voxel belongs to a foreground region based on the foreground similarity; determining a background probability that the voxel belongs to a background region based on the background similarity; determining a segmentation result of the voxel based on the foreground probability and the background probability; and determining the reference image based on the segmentation result of the voxel (Section III, equation 5, and Fig. 2 discloses calculating the “cosine distance” between a feature in the unlabeled feature map Fu at the spatial location (x, y, z) of a voxel in the unlabeled image Xu and the each of the foreground and background prototypes pl “to produce the probability map Pl2u” over the foreground and background classes). As to claim 7, Xu further discloses that updating the model parameter comprises: determining a loss value of the first network model based on the first predicted image, the labeled image, the second predicted image, and the reference image; and updating the model parameter based on the loss value to obtain the image segmentation model (Section III and Fig. 
2 discloses that the CPCL framework (including the “student model” corresponding the claimed first network) is trained to update “student weights” according to loss function L in equation 11 which is based on segmentation result Pl and ground-truth label Yl (see equation 12) and segmentation result Pu and probability map Pl2u (see equation 6)). As to claim 8, Xu further discloses that determining the loss value of the first network model comprises: determining a first loss value based on the first predicted image and the labeled image; determining a second loss value based on the second predicted image and the reference image; and determining the loss value of the first network model based on the first loss value and the second loss value (Section III and Fig. 2 discloses that the total loss function L in equation 11 is calculated based on a “supervised loss” Ls, and “forward and backward prototype consistency loss[es]” Lfpc and Lbpc, wherein the supervised loss Ls is based on segmentation result Pl and ground-truth label Yl (see equation 12) and the forward prototype consistency loss Lfpc is based on segmentation result Pu and probability map Pl2u (see equation 6)). As to claim 9, Xu further discloses that determining the loss value of the first network model comprises: determining the loss value of the first network model based on the model parameter of the first network model, a second model parameter of the second network model, the first predicted image, the labeled image, the second predicted image, and the reference image (Section III and Fig. 2 discloses that the CPCL framework is trained according to loss function L in equation 11, wherein the loss function is determined according to “the weights of the student model and the teacher model” (θ and θ˜, respectively); see equation 1; Section III discloses that the loss function of equation 11 is further based on segmentation result Pl and ground-truth label Yl (see equation 12) and segmentation result Pu and probability map Pl2u (see equation 6)). As to claim 10, Xu further discloses that the second model parameter is determined based on the model parameter of the first network model (Fig. 2 discloses that the weights of the “self-ensembling teacher model [are] updated as the exponential moving average (EMA) of student weights”). As to claim 11, Xu further discloses determining a first weight (α) of a third model parameter of a third network model (θ~t-l) and a second weight (1-α) of the model parameter of the first network model (θt); and performing weighted summation on the third model parameter and the model parameter of the first network model based on the first weight and the second weight to obtain the second model parameter (θ~t) of the second network model (See equation 2 of Section III). As to claim 12, Xu further discloses that after acquiring the second predicted image, the image segmentation model training method further comprises: determining a first reference image of the first image based on the first image and the second predicted image, the first reference image being an image segmentation result obtained by calculating the first image (Section III and Fig. 
2 shows probability map Pu2l which is a segmentation result obtained based on the “labeled features” Fl of the labeled image Xl (such features being calculated, as claimed, by Enc extracting the labeled features from the labeled image Xl) and an expanded “unlabeled prototype” that is generated based on the “unlabeled features” Fu of the unlabeled image Xu combined with the unlabeled segmentation result Pu); and wherein the determining the loss value of the first network model comprises: determining the loss value of the first network model based on the first predicted image, the labeled image, the first reference image, the second predicted image, and the reference image of the second image (Section III and Fig. 2 discloses that the CPCL framework is trained to update “student weights” according to total loss function L in equation 11 which is based on segmentation result Pl and ground-truth label Yl (see equation 12) and segmentation result Pu and probability map Pl2u (see equation 6) and the probability map Pu2l (see equation 10)). As to claim 13, Xu further discloses that the determining the first reference image comprises: determining a foreground prototype of the second image and a background prototype of the second image based on the second predicted image, the foreground prototype being a reference feature of a second region in the second image, and the background prototype being a reference feature of another region except the second region in the second image; and determining the first reference image based on the foreground prototype, the background prototype, and the first image (Section III and Fig. 2 discloses calculating “the foreground and background holistic prototypes for unlabeled data” Xu based on segmentation result Pu (See equations 7 and 8), wherein the foreground and background regions are necessarily mutually exclusive, wherein probability map Pu2l is calculated based on the foreground and background holistic prototypes and features Fl of the labeled image Xl (see equation 9)). As to claim 14, Xu further discloses that the determining the foreground prototype and the background prototype comprises: acquiring a second feature map of the second image; and determining the foreground prototype and the background prototype based on the second feature map and the second predicted image (Section III and Fig. 2 discloses acquiring feature map Fu of the unlabeled image Xu and determining the foreground and background holistic prototypes based thereon (see equation 8)). As to claim 15, Xu further discloses that the determining the foreground prototype and the background prototype based on the second feature map of and the second predicted image comprises: determining, in the second feature map, a voxel feature of a third voxel with a spatial position located in the second region and a voxel feature of a fourth voxel with a spatial position located in the another region based on the second predicted image; and calculating an average value of the voxel feature of the third voxel as the foreground prototype; and calculating an average value of the voxel feature of the fourth voxel as the background prototype (Section III and Fig. 
2 discloses that the foreground and background holistic prototypes are respectively calculated via “masked average pooling” according to the spatial location for each voxel (x, y, z) and the feature in the feature map Fu at that spatial location based on an unlabeled binary prediction mask calculated according to the predicted segmentation result Pu of the unlabeled image Xu; see equations 7 and 8). As to claim 16, Xu further discloses that determining the first reference image based on the foreground prototype, the background prototype, and the first image comprises: acquiring a feature map of the first image; and determining the first reference image based on the foreground prototype, the background prototype, and the feature map of the first image (Section III and Fig. 2 discloses acquiring feature map Fl of the labeled image Xl and determining the probability map Pu2l based on the foreground and background holistic prototypes and the feature map Fl (see equation 9)). As to claim 17, Xu further discloses that the determining the first reference image of the first image based on the foreground prototype, the background prototype, and the feature map of the first image comprises: determining, for a voxel in the first image, a voxel feature of the voxel based on the feature map of the first image; calculating a foreground similarity between the voxel feature of the voxel and the foreground prototype and a background similarity between the voxel feature of the voxel and the background prototype; determining a foreground probability that the voxel belongs to a foreground region based on the foreground similarity; determining a background probability that the voxel belongs to a background region based on the background similarity; determining a segmentation result of the voxel based on the foreground probability and the background probability; and determining the first reference image based on the segmentation result of the voxel in the first image (Section III, equation 9, and Fig. 2 discloses calculating the “cosine distances” between a feature in the labeled feature map Fl at the spatial location (x, y, z) of a voxel in the labeled image Xl and the each of the foreground and background holistic prototypes pu “to produce the U2L probability map Pu2l” over the foreground and background classes). As to claim 18, Xu further discloses that the determining the loss value of the first network model comprises: determining a first loss value based on the first predicted image and the labeled image (equation 12 shows a supervised loss Ls calculated based on the predicted segmentation result Pl and the ground-truth segmentation label Yl); determining a second loss value based on the second predicted image and the reference image (equation 6 discloses forward prototype consistency loss Lfpc calculated based on predicated segmentation result Pu and probability map Pl2u); determining a third loss value based on the labeled image and the first reference image (equation 10 discloses backward prototype consistency loss Lbpc calculated based on the ground-truth segmentation label Yl and probability map Pu2l); and determining the loss value of the first network model based on the first loss value, the second loss value, and the third loss value (equation 11 discloses that the total loss is calculated based on each of the above-discussed losses). 
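For illustration, the prototype mechanics mapped to claims 2-6 and 13-17 above (masked average pooling of a feature map under a binary mask, then per-voxel cosine similarity to the foreground and background prototypes) can be sketched as follows. This is a hedged sketch under assumed tensor shapes; the temperature tau and the exact normalization are illustrative choices, not taken from Xu.

```python
import torch
import torch.nn.functional as F

def masked_average_prototypes(features: torch.Tensor, mask: torch.Tensor):
    """Masked average pooling: features (C, D, H, W), binary mask (D, H, W) with 1 = foreground."""
    fg_mask = (mask > 0.5).float()
    bg_mask = 1.0 - fg_mask
    fg_proto = (features * fg_mask).sum(dim=(1, 2, 3)) / fg_mask.sum().clamp(min=1.0)
    bg_proto = (features * bg_mask).sum(dim=(1, 2, 3)) / bg_mask.sum().clamp(min=1.0)
    return fg_proto, bg_proto  # two length-C prototype vectors

def prototype_probability_map(features: torch.Tensor, fg_proto: torch.Tensor,
                              bg_proto: torch.Tensor, tau: float = 20.0) -> torch.Tensor:
    """Per-voxel probabilities (2, D, H, W) from cosine similarity to the two prototypes."""
    feats = features.permute(1, 2, 3, 0)                                   # (D, H, W, C)
    fg_sim = F.cosine_similarity(feats, fg_proto.view(1, 1, 1, -1).expand_as(feats), dim=-1)
    bg_sim = F.cosine_similarity(feats, bg_proto.view(1, 1, 1, -1).expand_as(feats), dim=-1)
    return torch.softmax(tau * torch.stack([bg_sim, fg_sim], dim=0), dim=0)
```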
Independent claim 19 recites an image segmentation model training apparatus, comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code (Section IV(A)(4) of Xu discloses that the framework is “implemented in Python with PyTorch, using NVIDIA GeForce RTX 3090 GPU with 24GB memory”, wherein the Python code is necessarily stored in memory), the program code comprising code configured to cause at least one of the at least one processor to perform the steps recited in the method of independent claim 1. Accordingly, claim 19 is rejected for reasons analogous to those discussed above in conjunction with claim 1 mutatis mutandis. Independent claim 20 recites a non-transitory computer-readable storage medium storing computer code which, when executed by at least one processor, causes the at least one processor to (Section IV(A)(4) of Xu discloses that the framework is “implemented in Python with PyTorch, using NVIDIA GeForce RTX 3090 GPU with 24GB memory”, wherein the Python code is necessarily stored on some medium) at least perform the steps recited in the method of independent claim 1. Accordingly, claim 20 is rejected for reasons analogous to those discussed above in conjunction with claim 1 mutatis mutandis. Claim Rejections - 35 USC § 103 (Part 2) In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. 
Claims 1-8 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over “Mutual-Prototype Adaptation for Cross-Domain Polyp Segmentation” by Yang et al. (hereinafter “Yang”) in view of “Self-supervision with Superpixels: Training Few-Shot Medical Image Segmentation Without Annotation” by Ouyang et al. (hereinafter “Ouyang”) and further in view of “Transformation-Consistent Self-Ensembling Model for Semisupervised Medical Image Segmentation” by Li et al. (hereinafter “Li”). According to MPEP § 904.03, “The best reference should always be the one used in rejecting the claims. Sometimes the best reference will have a publication date less than a year prior to the application filing date, hence it will be open to being overcome under 37 CFR 1.130 or 1.131. In such circumstances, if a second reference exists which cannot be so overcome and which, though inferior, is an adequate basis for rejection, the claims should be additionally rejected thereon”. In this instance, the above Xu reference is the best prior art for rejecting the present claims. Since the Xu reference is open to being overcome under 37 CFR 1.130 or 1.131, as discussed above in the prior art rejections falling under the heading “Part 1”, the following rejection and the set of rejections that follow (i.e., rejections falling under the heading “Part 2”) are additionally made in view of the high likelihood that the Xu publication will be disqualified as a prior art reference by way of affidavit or declaration. As to independent claim 1, Yang discloses an image segmentation model training method, performed by an electronic device (Abstract discloses that Yang is directed to “A novel self-supervised FSS framework for…medical image segmentation”; Section 4 discloses that “The network is implemented with PyTorch…on a single Nvidia RTX 2080Ti GPU, consuming 2.8 GBs of memory”), the image segmentation model training method comprising: acquiring a first image (xs), a second image (xt), and a labeled image (ys), the labeled image being an image segmentation result obtained by labeling the first image (Section III and Fig. 3 discloses that the framework is given “labeled samples” Ds = (xs, ys) and “unlabeled samples” xt; Figs. 4-5 show that the ground truth label is an image segmentation result); acquiring a first predicted image (pfs) according to a first network model (C2), the first predicted image being an image segmentation result obtained by predicting the first image (Section III and Fig. 3 discloses fine classifier C2 that obtains “fine segmentation” pfs of the first image xs); acquiring a second predicted image (pct) according to a second network model (C1), the second predicted image being an image segmentation result obtained by predicting the second image (Section III and Fig. 3 discloses coarse classifier C1 that obtains “coarse segmentation” pct of the second image xt); determining a reference image (pft) of the second image based on the second image, the reference image being an image segmentation result obtained by calculating the second image (Section III and Fig. 3 discloses determining a “fine segmentation” pft of the second image xt based on “deep features” ft of the second image xt); and updating a model parameter of the first network model based on the first predicted image (pfs), the labeled image (ys), the second predicted image (pct), and the reference image (pft) to obtain an image segmentation model (Section III and Fig. 
3 discloses that the model is trained according to a total loss Ltotal in equation 15 which includes a cross entropy loss Lsseg based on the fine segmentation pfs of the first image xs and the ground-truth label ys and loss Ltseg in equation 8 based on fine segmentation pft of the second image xt which is derived from the predicted coarse segmentation pct of the second image xt; Algorithm 1 shows that the training involves updating “parameters” of the model).

Yang does not expressly disclose that the reference image is determined based on the labeled image.

Ouyang, like Yang, is directed to a framework for “medical image segmentation” in which training is performed on a “support set” S of “labeled images” composed of images xs and corresponding ground truth binary masks ys and a “query set” Q of unlabeled images xq using representation prototypes (Abstract and Section 3). Ouyang discloses that the binary mask support label ys is average-pooled in order to generate foreground and background prototypes upon which the query prediction mask y^q is predicted (Section 3 and Fig. 2).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Yang to use the ground truth binary mask ys in the generation of foreground and background prototypes upon which the query prediction mask y^q is generated from query image xq, as taught by Ouyang, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have generated a “representation prototype ensemble” that “preserves more intra-class local distinctions by explicitly representing local regions into separate prototypes” (Section 3 of Ouyang).

Although each of Yang and Ouyang discloses parallel encoder-decoder networks that perform segmentation on labeled and unlabeled images, respectively (Fig. 3 of Yang and Fig. 2 of Ouyang), the proposed combination of references does not expressly disclose updating a model parameter of a decoder of the second network model by applying an exponential moving average to a model parameter of a decoder of the first network model.

Li, like Yang and Ouyang, is directed to a “deep learning” framework for “medical image segmentation” using “labeled and unlabeled data”, wherein the framework includes parallel branches with respective encoder-decoder networks for segmenting the labeled and unlabeled images (Abstract and Fig. 2, reproduced and annotated below). Li discloses that the parallel encoder-decoder networks are “teacher and student models” which “share the same architecture”, wherein “the weights of the student model have been updated with gradient descent” and “the teacher model weights are updated as an EMA of the student weights” (Fig. 2 and Section III).

[Figure 2 of Li, reproduced and annotated (image omitted)]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Yang and Ouyang such that the parallel branch encoder-decoder networks are trained such that the teacher model weights are updated as an EMA of the student weights, as taught by Li, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results.
It is predictable that the proposed modification would have avoided “misrecognition” while constructing “better targets”, as taught by Li (Last paragraph of Section I). As to claim 2, the proposed combination of Yang, Ouyang and Li further teaches that the determining comprises: determining a foreground prototype of the first image and a background prototype of the first image based on the labeled image, the foreground prototype being a reference feature of a first region in the first image, and the background prototype being a reference feature of another region except the first region in the first image; and determining the reference image based on the foreground prototype, the background prototype, and the second image (Section 3 and Fig. 2 of Ouyang discloses determining foreground and background prototypes of the support image xs based on the support label ys, wherein the foreground and background classes are mutually exclusive by virtue of their binary nature, wherein the query binary mask prediction is generated using the foreground and background prototypes and the query image xq; the reasons for combining the references are the same as those discussed above in conjunction with claim 1). As to claim 3, the proposed combination of Yang, Ouyang and Li further teaches that determining the foreground prototype and the background prototype comprises: acquiring a feature map of the first image, the feature map for characterizing semantic information of the first image; and determining the foreground prototype and the background prototype based on the feature map and the labeled image (Section 3 and Fig. 2 of Ouyang discloses acquiring “support feature map” fθ(xs) of the support image xs, wherein the feature map necessarily includes semantic information of the support image xs since the segmentation is performed for “semantic classes”, and the foreground and background prototypes are determined according to this support feature map fθ(xs) and the support label ys; the reasons for combining the references are the same as those discussed above in conjunction with claim 1). As to claim 4, the proposed combination of Yang, Ouyang and Li further teaches that determining the foreground prototype and the background prototype based on the feature map and the labeled image comprises: determining, in the feature map, a first voxel feature of a first voxel with a spatial position located in the first region and a second voxel feature of a second voxel with a spatial position located in the another region based on the labeled image; calculating an average value of the first voxel feature as the foreground prototype; and calculating an average value of the second voxel feature as the background prototype (Section 3 and Fig. 2 of Ouyang discloses that the foreground and background “prototypes are calculated by spatially averaging support feature maps within pooling windows” according to a threshold T that distinguishes foreground from background in the 3D feature map; the reasons for combining the references are the same as those discussed above in conjunction with claim 1). 
As to claim 5, the proposed combination of Yang, Ouyang and Li further teaches that the determining the reference image based on the foreground prototype, the background prototype, and the second image comprises: acquiring a feature map of the second image, the feature map for characterizing semantic information of the second image; and determining the reference image based on the foreground prototype, the background prototype, and the feature map (Section 3 and Fig. 2 of Ouyang discloses acquiring “query feature map” fθ(xq) of the query image xq, wherein the feature map necessarily includes semantic information of the query image xq since the segmentation is performed for “semantic classes”, and the query prediction mask y^q is predicted based on the foreground and background prototypes and this query feature map fθ(xq); the reasons for combining the references are the same as those discussed above in conjunction with claim 1). As to claim 6, the proposed combination of Yang, Ouyang and Li further teaches that the determining the reference image based on the foreground prototype, the background prototype, and the feature map comprises: determining, for a voxel in the second image, a voxel feature of the voxel based on the feature map; calculating a foreground similarity between the voxel feature of the voxel and the foreground prototype and a background similarity between the voxel feature of the voxel and the background prototype; determining a foreground probability that the voxel belongs to a foreground region based on the foreground similarity; determining a background probability that the voxel belongs to a background region based on the background similarity; determining a segmentation result of the voxel based on the foreground probability and the background probability; and determining the reference image based on the segmentation result of the voxel (Section 3 and Fig. 2 of Ouyang further teaches that the 3D query feature map fθ(xq) and the prototype ensemble is input to a similarity operator to “compute local similarity maps” therebetween at each “spatial location”, and the query prediction mask y^q is predicted based on a softmax function (which necessarily outputs a probability) applied to the foreground and background prototypes at each spatial location; the reasons for combining the references are the same as those discussed above in conjunction with claim 1). As to claim 7, Yang as modified above further teaches that updating the model parameter comprises: determining a loss value of the first network model based on the first predicted image, the labeled image, the second predicted image, and the reference image; and updating the model parameter based on the loss value to obtain the image segmentation model (Section III and Fig. 3 of Yang discloses that the model is trained according to a total loss Ltotal in equation 15 (corresponding to the claimed loss value) which includes a cross entropy loss Lsseg based on the fine segmentation pfs of the first image xs and the ground-truth label ys and loss Ltseg in equation 8 based on fine segmentation pft of the second image xt which is derived from the predicted coarse segmentation pct of the second image xt; Algorithm 1 shows that the training involves updating “parameters” of the model). 
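For illustration, the loss composition discussed for claim 7 (and for Xu's equation 11 earlier) reduces to a weighted sum of a supervised term on the labeled prediction and a consistency term tying the unlabeled prediction to the prototype-derived reference map. A minimal sketch follows; the specific loss functions (cross-entropy, MSE) and the weight lam are illustrative assumptions rather than the exact formulations in Yang or Xu.

```python
import torch.nn.functional as F

def training_loss(pred_labeled, ground_truth, pred_unlabeled, reference_map, lam: float = 0.1):
    """Combined objective: supervised loss on labeled data plus a weighted consistency loss.

    pred_labeled:   logits for the labeled image (the first predicted image)
    ground_truth:   integer label map for the labeled image
    pred_unlabeled: probabilities for the unlabeled image (the second predicted image)
    reference_map:  prototype-based probability map for the unlabeled image (the reference image)
    """
    supervised = F.cross_entropy(pred_labeled, ground_truth)   # cf. Ls in Xu, Lsseg in Yang
    consistency = F.mse_loss(pred_unlabeled, reference_map)    # cf. Lfpc in Xu, Ltseg in Yang
    return supervised + lam * consistency
```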
As to claim 8, Yang as modified above further teaches that determining the loss value of the first network model comprises: determining a first loss value based on the first predicted image and the labeled image; determining a second loss value based on the second predicted image and the reference image; and determining the loss value of the first network model based on the first loss value and the second loss value (Section III and Fig. 3 of Yang discloses that the model is trained according to a total loss Ltotal in equation 15 which includes a cross entropy loss Lsseg (corresponding to the claimed first loss value) based on the fine segmentation pfs of the first image xs and the ground-truth label ys and loss Ltseg in equation 8 (corresponding to the claimed second loss value) based on fine segmentation pft of the second image xt which is derived from the predicted coarse segmentation pct of the second image xt; Algorithm 1 shows that the training involves updating “parameters” of the model). Independent claim 19 recites an image segmentation model training apparatus, comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code (Section 4 of Yang discloses that “The network is implemented with PyTorch…on a single Nvidia RTX 2080Ti GPU, consuming 2.8 GBs of memory”, wherein the code in PyTorch is necessarily stored in memory), the program code comprising code configured to cause at least one of the at least one processor to perform the steps recited in the method of independent claim 1. Accordingly, claim 19 is rejected for reasons analogous to those discussed above in conjunction with claim 1 mutatis mutandis. Independent claim 20 recites a non-transitory computer-readable storage medium storing computer code which, when executed by at least one processor, causes the at least one processor to (Section 4 of Yang discloses that “The network is implemented with PyTorch…on a single Nvidia RTX 2080Ti GPU, consuming 2.8 GBs of memory”, wherein the code in PyTorch is necessarily stored on some medium) at least perform the steps recited in the method of independent claim 1. Accordingly, claim 20 is rejected for reasons analogous to those discussed above in conjunction with claim 1 mutatis mutandis. Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Yang in view of Ouyang and Li and further in view of “Self-supervised Mean Teacher for Semi-supervised Chest X-ray Classification” by Liu et al. (hereinafter “Liu”). As to claim 9, Yang as modified above further teaches that determining the loss value of the first network model comprises: determining the loss value of the first network model based on the model parameter of the first network model, the first predicted image, the labeled image, the second predicted image, and the reference image (Section III and Fig. 3 of Yang discloses that the model is trained according to a total loss Ltotal in equation 15 which includes a cross entropy loss Lsseg based on the fine segmentation pfs of the first image xs and the ground-truth label ys and loss Ltseg in equation 8 based on fine segmentation pft of the second image xt which is derived from the predicted coarse segmentation pct of the second image xt; Algorithm 1 shows that the training involves updating “parameters” of the model). 
The proposed combination of Yang, Ouyang and Li does not expressly disclose that determining the loss value is based on a second model parameter of the second network model. Liu, like Yang and Ouyang, is directed to medical image analysis using semi-supervised learning of a labeled dataset and an unlabeled dataset (Abstract, Section 3, and Fig. 1). In particular, Liu discloses a “mean teacher” framework comprising a teacher model θ’ that operates on the unlabeled dataset and a student model θ that operates on the labeled dataset, wherein the “teacher model parameter is updated with exponential moving average (EMA)” of the student parameter “with θ’(t) = αθ’ (t−1) + (1−α)θ(t)” (Section 3 and Fig. 1). That is, Liu discloses that determining the loss value of the first network model comprises: determining the loss value of the first network model based on the model parameter of the first network model (θ) and a second model parameter of the second network model (θ’), wherein the model parameters are related by the EMA and α. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Yang, Ouyang and Li to substitute Yang’s feature extractor and corresponding classifiers with a mean teacher model comprising a teacher model θ’ that operates on the unlabeled dataset and a student model θ that operates on the labeled dataset such that the loss value of the respective models are related, as taught by Liu, to arrive at the claimed invention discussed above. Such a modification is the result of simple substitution of one known element for another producing a predictable result. More specifically, Yang’s models and Liu’s perform the same general and predictable function, the predictable function being feature extraction and classification of labeled and unlabeled data. Since each individual element and its function are shown in the prior art, albeit shown in separate references, the difference between the claimed subject matter and the prior art rests not on any individual element or function but in the very combination itself - that is in the substitution of Yang’s models by replacing them with Liu’s. Thus, the simple substitution of one known element for another producing a predictable result renders the claim obvious. An ordinarily skilled artisan would have appreciated that Liu’s mean-teacher framework would have “work[ed] better in multi-label semi-supervised tasks than other SSL methods” such as the one taught by Yang (Section 1 of Liu). As to claim 10, the proposed combination of Yang, Ouyang, Li and Liu further teaches that the second model parameter is determined based on the model parameter of the first network model (Section 3 and Fig. 1 of Liu discloses that the “teacher model parameter is updated with exponential moving average (EMA)” of the student parameter “with θ’(t) = αθ’ (t−1) + (1−α)θ(t)”; the reasons for combining the references are the same as those discussed above in conjunction with claim 9). 
As to claim 11, the proposed combination of Yang, Ouyang, Li and Liu further teaches determining a first weight (α) of a third model parameter of a third network model (θ’(t-1)) and a second weight (1-α) of the model parameter of the first network model (θ(t)); and performing weighted summation on the third model parameter and the model parameter of the first network model based on the first weight and the second weight to obtain the second model parameter of the second network model (Section 3 and Fig. 1 of Liu discloses that the “teacher model parameter is updated with exponential moving average (EMA)” of the student parameter “with θ’(t) = αθ’(t−1) + (1−α)θ(t)”; the reasons for combining the references are the same as those discussed above in conjunction with claim 9).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN M CONNER whose telephone number is (571) 272-1486. The examiner can normally be reached 10 AM - 6 PM Monday through Friday, and some Saturday afternoons.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Greg Morse, can be reached at (571) 272-3838. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SEAN M CONNER/
Primary Examiner, Art Unit 2663

Prosecution Timeline

Jun 30, 2023
Application Filed
Jul 05, 2025
Non-Final Rejection — §102, §103
Oct 07, 2025
Examiner Interview Summary
Oct 07, 2025
Applicant Interview (Telephonic)
Oct 09, 2025
Response Filed
Nov 22, 2025
Final Rejection — §102, §103
Jan 23, 2026
Response after Non-Final Action
Feb 17, 2026
Request for Continued Examination
Feb 22, 2026
Response after Non-Final Action
Mar 17, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586374
MULTIMODAL VIDEO SUMMARIZATION
Granted Mar 24, 2026 • 2y 5m to grant
Patent 12586412
USING TWO-DIMENSIONAL IMAGES AND MACHINE LEARNING TO IDENTIFY INFORMATION PERTAINING TO EYE SHAPE
Granted Mar 24, 2026 • 2y 5m to grant
Patent 12585862
Training Data for Training Artificial Intelligence Agents to Automate Multimodal Software Usage
Granted Mar 24, 2026 • 2y 5m to grant
Patent 12579778
Pattern Matching Device, Pattern Measuring System, Pattern Matching Program
Granted Mar 17, 2026 • 2y 5m to grant
Patent 12573180
COLLECTION OF IMAGE DATA FOR USE IN TRAINING A MACHINE-LEARNING MODEL
Granted Mar 10, 2026 • 2y 5m to grant
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 79%
With Interview: 99% (+27.1%)
Median Time to Grant: 2y 9m
PTA Risk: High
Based on 454 resolved cases by this examiner. Grant probability derived from career allow rate.
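As a quick check of how the headline figure follows from the counts shown above, assuming the grant probability is simply the career allow rate as the note states:

```python
granted, resolved = 357, 454
allow_rate = granted / resolved                 # 0.7863...
print(f"Career allow rate: {allow_rate:.1%}")   # 78.6%, displayed rounded as 79%
```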
