Office Action Analysis: 17662868 — APPARATUS OF MACHINE LEARNING, MACHINE LEARNING METHOD, AND INFERENCE APPARATUS

Examiner Intelligence

LAHAM BAUZO, ALVARO SALIM
Career Allowance Rate: 25% (grants only 25% of cases; 1 granted / 4 resolved), -30.0% vs TC avg
Interview Lift: +100.0% (strong), resolved cases with interview vs. without
Typical timeline: 4y 1m avg prosecution; 17 currently pending
Career history: 30 total applications across all art units

Statute-Specific Performance
§101: 3.5% (-36.5% vs TC avg)
§103: 96.6% (+56.6% vs TC avg)
TC avg = Tech Center average estimate. Based on career data from 4 resolved cases.
Office Action

§103 §112
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Amendments
This Office Action is in response to the amendment filed on March 9, 2026. 
Claims 1, 6-13, 15, and 17 have been amended. 
Claim 5 has been cancelled.
No new claims have been added. 
The objections and rejections from the prior correspondence that are not restated herein are withdrawn.
No prior art was found that anticipates or renders Claim 13 obvious.

Response to Arguments
Applicant's arguments filed on March 9, 2026 have been fully considered.
Applicant’s arguments regarding the 35 U.S.C. 103 rejections of the previous office action have been fully considered but are not persuasive. Applicant argues:
“Applicant respectfully submits that the Zhang reference fails to disclose processing circuitry configured to train, by using a first calibration model that receives, as input, first processing data and a first processing label assigned by a first user to the first processing data, and outputs calibration data relating to calibration of individual characteristics in label assignment by the first user, based on first input training data, a first training label assigned by the first user to the first input training data, and calibration training data; and train, by using the first calibration model, a target model based on at least the first processing data and the calibration data or a calibration label which is a label calibrated in the individual characteristics included in the first processing label using the calibration data, as recited in amended Claim 1.”
Examiner respectfully disagrees. ZHANG teaches all of the limitations above. Specifically, ZHANG teaches:
An apparatus of machine learning comprising: processing circuitry configured to (ZHANG [page 21, section B Implementation details] teaches: "All of the models were trained on a NVIDIA RTX 208".)
train, by using a first calibration model that receives, as input, first processing data and a first processing label assigned by a first user to the first processing data (ZHANG [page 1, Abstract] teaches: "In this work, we present a method for jointly learning, from purely noisy observations alone, the reliability of individual annotators and the true segmentation label distributions, using two coupled CNNs." ZHANG [page 4, Figure 1] teaches: "An architecture schematic in the presence of 3 annotators of varying characteristics (over-segmentation, under-segmentation and confusing between two classes, red and blue). The model consists of two parts: (1) segmentation network parametrised by $\theta$ that generates an estimate of the unobserved true segmentation probabilities, $p_\theta(x)$; (2) annotator network, parametrised by $\phi$, that estimates the pixelwise confusion matrices (CMs), $\{A_\phi^{(r)}(x)\}_{r=1}^{3}$ of the annotators for the given input image x.” ZHANG [page 5, Section 3.3] teaches: "Given training $X = \{x_n\}_{n=1}^{N}$ and noisy labels $\tilde{Y}^{(r)} = \{\tilde{y}_n^{(r)} : r \in S(x_n)\}_{n=1}^{N}$ for $r = 1, \dots, R$, we optimize the parameters $\{\theta, \phi\}$ by minimizing the negative log-likelihood (NLL), $-\log p(\tilde{Y}^{(1)}, \dots, \tilde{Y}^{(R)} \mid X)$. From eqs. (1) and (2), this optimization objective equates to the sum of cross-entropy losses between the observed noisy segmentations and the estimated annotator label distributions:

[Equation image from ZHANG, Section 3.3 (loss function), not reproduced]

ZHANG [page 2, Section 1 Introduction] teaches: "In this work, we introduce the first instance of an end-to-end supervised segmentation method that jointly estimates, from noisy labels alone, the reliability of multiple human annotators (i.e., first processing label assigned by a first user) and true segmentation labels. The proposed architecture (Fig. 1) consists of two coupled CNNs where one estimates the true segmentation probabilities and the other models the characteristics of individual annotators (e.g., tendency to over-segmentation, mix-up between different classes, etc.) by estimating the pixel-wise confusion matrices (CMs) on a per image basis." Examiner's note: under BRI, configured to train [...] can be interpreted as the training, where the joint network parameters are optimized via a loss function and stochastic gradient descent. The target model can be interpreted as the segmentation network parametrized by $\theta$ that generates an estimate of the unobserved true segmentation probabilities. A first calibration model can be reasonably interpreted as the "annotator network, parametrized by $\phi$" which outputs the confusion matrices. Additionally, that receives, as input, first processing data and a first processing label assigned by a first user to the first processing data can be reasonably interpreted as the annotator network taking as input image $x_n$ (i.e., first processing data) to produce confusion matrices. The training process also uses noisy segmentation labels (i.e., first processing label) provided by the annotator for an image $x_n$ in the loss function (ZHANG [Eq. 4 and Figure 1]).)
and outputs calibration data relating to calibration of individual characteristics in label assignment by the first user, based on first input training data, a first training label assigned by the first user to the first input training data, and calibration training data; (Examiner’s note: Under broadest reasonable interpretation, outputs calibration data relating to calibration of individual characteristics in label assignment by the first user can be interpreted as ZHANG’s confusion matrices (CMs), which model the characteristics of individual annotators (e.g., tendency to over-segmentation, mix-up between different classes, etc.) (see ZHANG [page 2, section 1. Introduction]). Moreover, based on first input training data […] and calibration training data can be interpreted as the input image $x_n$ used by the segmentation network, as shown in ZHANG [page 4, Figure 1], to estimate the true label distribution that is multiplied by the confusion matrices, and based on […] a first training label assigned by the first user to the first input training data can be interpreted as the noisy labels $\tilde{Y}^{(r)} = \{\tilde{y}_n^{(r)} : r \in S(x_n)\}_{n=1}^{N}$ provided by human annotators. Furthermore, as per the claim objection, first input training data and calibration training data refer to the same training data used for outputting the calibration data (i.e., confusion matrices).)
train, by using the first calibration model, a target model based on at least the first processing data and the calibration data or a calibration label which is a label calibrated in the individual characteristics included in the first processing label using the calibration data (ZHANG [page 5, section 3.3] teaches the loss function that is minimized to train the annotator network (i.e., by using the first calibration model) and the segmentation network (i.e., a target model). Under broadest reasonable interpretation, training the target model based on at least the calibration data can be interpreted as training based on ZHANG’s confusion matrices (CMs) $\hat{A}_\phi^{(r)}(x_n)$, which model the individual characteristics of each annotator (e.g., tendency to over-segmentation, mix-up between different classes, etc.), and based on at least the first processing data can be interpreted as the input image $x_n$.)
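To make the mapping above more concrete, the following minimal Python sketch illustrates, under simplifying assumptions, how the estimated annotator label distribution described in ZHANG (a per-pixel confusion matrix multiplied by the estimated true label probabilities) enters a summed cross-entropy loss against the annotators' noisy labels. The data layout, the function names `annotator_distribution` and `coupled_nll`, and the toy numbers are hypothetical and are not taken from ZHANG. The two CNNs are omitted (their outputs are passed in as plain lists), and ZHANG's actual objective may contain additional terms that this sketch leaves out.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy layout (an assumption, not ZHANG's code):
#   true_probs[k]      -> estimated true class probabilities p_theta(x) at pixel k
#   cms[r][k][i][j]    -> confusion matrix of annotator r at pixel k,
#                         entry = P(annotator r assigns class i | true class is j)
#   noisy_labels[r][k] -> class index that annotator r assigned to pixel k
Vector = List[float]
Matrix = List[List[float]]


def annotator_distribution(cm: Matrix, p: Sequence[float]) -> Vector:
    """Estimated label distribution of one annotator at one pixel, i.e. A @ p."""
    return [sum(cm[i][j] * p[j] for j in range(len(p))) for i in range(len(cm))]


def coupled_nll(
    true_probs: List[Vector],
    cms: Dict[int, List[Matrix]],
    noisy_labels: Dict[int, List[int]],
    eps: float = 1e-12,
) -> float:
    """Sum, over the annotators who labelled the image and over its pixels, of
    the cross-entropy between each observed noisy label and A^(r) p."""
    total = 0.0
    for r, labels in noisy_labels.items():  # annotators r in S(x)
        for k, observed in enumerate(labels):
            q = annotator_distribution(cms[r][k], true_probs[k])
            total -= math.log(max(q[observed], eps))  # eps guards log(0)
    return total


# Toy usage: two pixels, two classes (0 = background, 1 = foreground).
p = [[0.9, 0.1], [0.2, 0.8]]
identity = [[1.0, 0.0], [0.0, 1.0]]   # a perfectly reliable annotator
over_seg = [[0.7, 0.0], [0.3, 1.0]]   # tends to mark background as foreground
cms = {0: [identity, identity], 1: [over_seg, over_seg]}
labels = {0: [0, 1], 1: [1, 1]}
print(round(coupled_nll(p, cms, labels), 4))
```

In this toy case, the second hypothetical annotator's confusion matrix moves probability mass from background to foreground, loosely mimicking the over-segmentation tendency ZHANG describes. Because a loss of this form depends on both $\theta$ (through $p_\theta(x)$) and $\phi$ (through $A_\phi^{(r)}(x)$), minimizing it in an actual implementation would update both networks together, which corresponds to the joint training discussed in the arguments.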
Applicant further argues:
“Applicant respectfully submits that the Zhang reference fails to disclose training a target model in which the calibration label, for example, is used as ground truth data. Rather, the Zhang reference discloses that the annotation CNN and the segmentation CNN are trained through an end-to-end supervised learning process, wherein both CNN models are concurrently trained so as to minimize a value of a loss function. 
Thus, the Zhang reference clearly fails to disclose training the target model after the calibration model has been trained, as required by amended Claim 1.
Moreover, Zhang discloses training both models simultaneously so as to minimize the loss function, and therefore the segmentation CNN of Zhang is not trained by supervised learning using calibrated labels as ground truth, as is the target model recited in Claim 1.”
Examiner respectfully disagrees. The argument is directed to the claim limitation of training the target model based on the first processing data and the calibration data based on the generated calibration label or the calibration parameter and the first processing label, which is taught by the newly cited reference VEIT, as shown in the 103 rejections below. Additionally, under BRI, claim 1 does not require that the calibration model be fully trained prior to training the target model. Claim 1 recites:
(1) train, by using a first calibration model that receives, as input, first processing data and a first processing label assigned by a first user to the first processing data, and outputs calibration data relating to calibration of individual characteristics in label assignment by the first user, based on first input training data, a first training label assigned by the first user to the first input training data, and calibration training data;
However, it is unclear what is being trained, and therefore the claim is rendered indefinite under 35 U.S.C. 112(b). For purposes of examination, the examiner will construe limitation (1) above as training [a target model] by using a calibration model.
The next limitation of claim 1 recites:
(2) train, by using the first calibration model, a target model based on at least the first processing data and the calibration data or a calibration label which is a label calibrated in the individual characteristics included in the first processing label using the calibration data
However, neither limitation (1) nor limitation (2) above restricts the order in which the training is executed. Even if limitation (1) recited “training a first calibration model that receives […]”, the broadest reasonable interpretation would still cover jointly training the calibration model and the target model, as disclosed in both the ZHANG and VEIT references.
Applicant further argues:
“Further, the Zhang reference discloses that the annotation CNN outputs a confusion matrix A from the input image, but does not disclose a calibration model that outputs calibration parameters from the real data and the assigned labels (corresponding to Y in Zhang), which is a clear difference from amended Claim 1.
Further, Zhang discloses computing an estimated annotator distribution P by multiplying the confusion matrix A output from the annotation CNN by the estimated true label P output from the segmentation CNN. On the contrary, the invention recited in Claim 1 does not require such computation of pseudo-assigned labels corresponding to the estimated annotator distribution P of Zhang.”
ZHANG is not relied upon to teach the limitations of claim 1 regarding a calibration model that outputs calibration parameters from the real data and the assigned labels, but VEIT teaches these limitations, as shown in the 103 rejections below.
Applicant’s arguments regarding the 35 U.S.C. 101 rejections of the previous office action have been fully considered and are persuasive. The 35 U.S.C. 101 rejections are withdrawn.

Claim Objections
Claim 1 is objected to because, according to page 11, lines 18-20, the specification suggests that "first input training data" appears to be "data for use in a label assignment is an MR image in which a measurement voxel of a magnetic resonance imaging apparatus is set"; claim 8 also supports this interpretation. Additionally, page 15, lines 10-19 state that "The trial data is the calibration training data assigned a label for use in machine learning of the calibration model. The following description assumes that the trial data is an MR image which may be generated by actually imaging a patient or phantom with a magnetic resonance imaging apparatus, or may be a pseudo-MR image artificially generated through image processing or prediction calculation." Therefore, under broadest reasonable interpretation, there is no substantial difference between "calibration training data" and "first input training data" because the specification states that they both can be an MR image to which a label is assigned. For purposes of examination, "first input training data" and "calibration training data" will be treated as equivalent terms.

Claim 17 is objected to for the same reasons described in the objection to claim 1.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-4 and 6-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding Claim 1, the claim recites a first "train […] based on first input training data, a first training label assigned by the first user to the first input training data, and calibration training data", but it is not clear what is actually being trained. The second "train" recited in claim 1 clearly indicates that "a target model" is being trained "based on at least the first processing data and the calibration data or a calibration label which is a label calibrated in the individual characteristics included in the first processing label using the calibration data." For the above stated reasons, the claim is indefinite under 112(b). For purposes of examination, the first "train" recited in claim 1 will be construed as "train […] a target model".

Regarding Claims 2-4 and 6-16, the dependent claims inherit the deficiencies of their respective parent claims and are likewise rejected.

Regarding Claim 17, the claim has issues similar to those of Claim 1, and is thus rejected under 112(b) for similar reasons and using similar rationale.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-7, 9-12, 14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over ZHANG (“Disentangling Human Error from the Ground Truth in Segmentation of Medical Images”) in view of VEIT ("Learning From Noisy Large-Scale Datasets With Minimal Supervision"), hereafter ZHANG and VEIT respectively.

Regarding Claim 1:
ZHANG teaches:
An apparatus of machine learning comprising: processing circuitry configured to (ZHANG [page 21, section B Implementation details] teaches: "All of the models were trained on a NVIDIA RTX 208".)
train, by using a first calibration model that receives, as input, first processing data and a first processing label assigned by a first user to the first processing data (ZHANG [page 1, Abstract] teaches: "In this work, we present a method for jointly learning, from purely noisy observations alone, the reliability of individual annotators and the true segmentation label distributions, using two coupled CNNs." ZHANG [page 4, Figure 1] teaches: "An architecture schematic in the presence of 3 annotators of varying characteristics (over-segmentation, under-segmentation and confusing between two classes, red and blue). The model consists of two parts: (1) segmentation network parametrised by $\theta$ that generates an estimate of the unobserved true segmentation probabilities, $p_\theta(x)$; (2) annotator network, parametrised by $\phi$, that estimates the pixelwise confusion matrices (CMs), $\{A_\phi^{(r)}(x)\}_{r=1}^{3}$ of the annotators for the given input image x.” ZHANG [page 5, Section 3.3] teaches: "Given training $X = \{x_n\}_{n=1}^{N}$ and noisy labels $\tilde{Y}^{(r)} = \{\tilde{y}_n^{(r)} : r \in S(x_n)\}_{n=1}^{N}$ for $r = 1, \dots, R$, we optimize the parameters $\{\theta, \phi\}$ by minimizing the negative log-likelihood (NLL), $-\log p(\tilde{Y}^{(1)}, \dots, \tilde{Y}^{(R)} \mid X)$. From eqs. (1) and (2), this optimization objective equates to the sum of cross-entropy losses between the observed noisy segmentations and the estimated annotator label distributions:

[Equation image from ZHANG, Section 3.3 (loss function), not reproduced]

ZHANG [page 2, Section 1 Introduction] teaches: "In this work, we introduce the first instance of an end-to-end supervised segmentation method that jointly estimates, from noisy labels alone, the reliability of multiple human annotators (i.e., first processing label assigned by a first user) and true segmentation labels. The proposed architecture (Fig. 1) consists of two coupled CNNs where one estimates the true segmentation probabilities and the other models the characteristics of individual annotators (e.g., tendency to over-segmentation, mix-up between different classes, etc.) by estimating the pixel-wise confusion matrices (CMs) on a per image basis." Examiner's note: under BRI, configured to train [...] can be interpreted as the training, where the joint network parameters are optimized via a loss function and stochastic gradient descent. The target model can be interpreted as the segmentation network parametrized by $\theta$ that generates an estimate of the unobserved true segmentation probabilities. A first calibration model can be reasonably interpreted as the "annotator network, parametrized by $\phi$" which outputs the confusion matrices. Additionally, that receives, as input, first processing data and a first processing label assigned by a first user to the first processing data can be reasonably interpreted as the annotator network taking as input image $x_n$ (i.e., first processing data) to produce confusion matrices. The training process also uses noisy segmentation labels (i.e., first processing label) provided by the annotator for an image $x_n$ in the loss function (ZHANG [Eq. 4 and Figure 1]).)
and outputs calibration data relating to calibration of individual characteristics in label assignment by the first user, based on first input training data, a first training label assigned by the first user to the first input training data, and calibration training data; (Examiner’s note: Under broadest reasonable interpretation, outputs calibration data relating to calibration of individual characteristics in label assignment by the first user can be interpreted as ZHANG’s confusion matrices (CMs), which model the characteristics of individual annotators (e.g., tendency to over-segmentation, mix-up between different classes, etc.) (see ZHANG [page 2, section 1. Introduction]). Moreover, based on first input training data [...] and calibration training data can be interpreted as the input image $x_n$ used by the segmentation network, as shown in ZHANG [page 4, Figure 1], to estimate the true label distribution that is multiplied by the confusion matrices, and based on […] a first training label assigned by the first user to the first input training data can be interpreted as the noisy labels $\tilde{Y}^{(r)} = \{\tilde{y}_n^{(r)} : r \in S(x_n)\}_{n=1}^{N}$ provided by human annotators. Furthermore, as per the claim objection, first input training data and calibration training data refer to the same training data used for outputting the calibration data (i.e., confusion matrices).)
train, by using the first calibration model, a target model based on at least the first processing data and the calibration data or a calibration label which is a label calibrated in the individual characteristics included in the first processing label using the calibration data (ZHANG [page 5, section 3.3] teaches the loss function whose minimization trains (i.e., train) the annotator network (i.e., by using the first calibration model) and the segmentation network (i.e., a target model). Under broadest reasonable interpretation, the calibration data on which the training of the target model is based can be interpreted as ZHANG’s confusion matrices (CMs) $\hat{A}_{\phi}^{(r)}(x_n)$, which model the individual characteristics of each annotator (e.g., tendency to over-segmentation, mix-up between different classes, etc.), and based on at least the first processing data can be interpreted as the input image $x_n$.)
ZHANG is not relied upon for teaching:
wherein the processing circuitry is further configured to generate, as the calibration data, a calibration parameter with respect to the calibration label or the first processing label by inputting the first processing data and the first processing label to the first calibration model; and
train the target model based on the first processing data and the calibration data based on the generated calibration label or the calibration parameter and the first processing label.
However, VEIT teaches: wherein the processing circuitry is further configured to generate, as the calibration data, a calibration parameter with respect to the calibration label or the first processing label by inputting the first processing data and the first processing label to the first calibration model; (VEIT [page 844, section 4.4. Training Details] teaches: “We trained the baseline model on 50 NVIDIA K40 GPUs using the noisy labels from the Open Images training set. We stopped training after 49 million mini-batches (with 32 images each). This network is the starting point for all model variants.” VEIT [page 840, section 1. Introduction] teaches: "We propose an alternative approach: instead of using the small clean dataset to learn visual representations directly, we use it to learn a mapping between noisy and clean annotations. We argue that this mapping not only learns the patterns of noise, but it also captures the structure in the label space. The learned mapping between noisy and clean annotations allows to clean the noisy dataset and fine-tune the network using both the clean and the full dataset with reduced noise." VEIT [page 841, section 3. Our Approach] teaches: "Formally, we have a very large training dataset $T$ comprising tuples of noisy labels $y$ and images $I$, $T = \{(y_i, I_i), \dots\}$, and a small dataset $V$ of triplets of verified labels $v$, noisy labels $y$ and images $I$, $V = \{(v_i, y_i, I_i), \dots\}$.” VEIT [page 842, section 3.1. Multi-Task Label Cleaning Architecture] teaches: "The first classifier is a label cleaning network denoted as $g$ that models the structure in the label space and learns a mapping from the noisy labels $y$ to the human verified labels $v$, conditional on the input image. We denote the cleaned labels output by $g$ as $\hat{c}$ so that $\hat{c} = g(y, I)$. [...] In order to model the label structure and noise conditional on the image, the network has two separate inputs, the noisy labels $y$ as well as the visual features $f_I$. […] Denoting the residual cleaning module as $g'$, the label cleaning network $g$ computes cleaned labels $\hat{c} = \mathrm{clip}(y + g'(y, f_I), [0, 1])$."
Examiner's note: Under broadest reasonable interpretation, the first calibration model can be interpreted as the label cleaning network $g$, which learns the mapping between the noisy annotations (i.e., with respect to the […] first processing label) and clean annotations. The label cleaning network takes as input the noisy labels and images (i.e., by inputting the first processing data and the first processing label to the calibration model) to learn the mapping. Additionally, generate […] a calibration parameter can be interpreted as the output of the residual cleaning module $g'$, which computes the residual (i.e., difference) between the noisy annotations and clean annotations, for generating the cleaned label $\hat{c}$.)
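The cleaning step quoted above, $\hat{c} = \mathrm{clip}(y + g'(y, f_I), [0, 1])$, can be sketched as follows. This is a hedged pure-Python illustration rather than VEIT's network: `toy_residual` is a hypothetical stand-in for the learned residual module $g'$, and the visual features $f_I$ are assumed, for simplicity, to be one score per label. Under the mapping in the examiner's note, the residual corresponds to the calibration parameter and the clipped output to the cleaned (calibration) label.

```python
def clip01(values):
    """Clip every entry into the interval [0, 1]."""
    return [min(max(v, 0.0), 1.0) for v in values]


def clean_labels(y, f_image, residual_fn):
    """c_hat = clip(y + g'(y, f_I), [0, 1]) for a multi-label vector y."""
    residual = residual_fn(y, f_image)  # output of the residual module g'
    return clip01([yi + ri for yi, ri in zip(y, residual)])


def toy_residual(y, f_image):
    """Hypothetical g': move each noisy label halfway toward an image score."""
    return [0.5 * (fi - yi) for yi, fi in zip(y, f_image)]


noisy = [1.0, 0.0, 1.0]   # noisy multi-label annotation y
scores = [0.2, 0.9, 1.0]  # assumed per-label image evidence (stand-in for f_I)
print(clean_labels(noisy, scores, toy_residual))
```

The residual form means that $g'$ only has to learn a correction to the noisy label, and the clip keeps the corrected label a valid per-label probability.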
and train the target model based on the first processing data and the calibration data based on the generated calibration label or the calibration parameter and the first processing label. (VEIT [page 842, section 3.1. Multi-Task Label Cleaning Architecture] teaches: “The second classifier is an image classifier denoted as $h$ that learns to annotate images by imitating the first classifier $g$ by using $g$’s predictions as ground truth targets. We denote the predicted labels output by $h$ as $\hat{p}$ so that $\hat{p} = h(I)$ (i.e., train the target model based on the first processing data).” VEIT [page 842, section 3.2. Model Training] teaches: “For the image classifier (i.e., the target model), the supervision (i.e., train) depends on the source of the training sample. For all samples $j$ from the noisy dataset $T$, the classifier is supervised by the cleaned labels $\hat{c}_j$ produced by the label cleaning network (i.e., based on […] the calibration data based on the generated calibration label or the calibration parameter).” Examiner’s note: Under broadest reasonable interpretation, first processing data can be interpreted as the image $I$ in each tuple sample $j$ from dataset $T$. The label cleaning network takes as input the noisy labels and images (i.e., by inputting the first processing data and the first processing label to the calibration model) to learn the mapping. Additionally, generate […] a calibration parameter can be interpreted as the output of the residual cleaning module $g'$, which computes the residual (i.e., difference) between the noisy annotations and clean annotations, for generating the cleaned label $\hat{c}$.)
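The supervision of the image classifier $h$ by the cleaned labels $\hat{c}_j$ can be sketched as a per-label binary cross-entropy with soft targets. This is an assumption-laden illustration, not VEIT's code: the names `soft_target_bce` and `classifier_target` are hypothetical, the loss form is assumed, and the branch that uses verified labels for samples drawn from $V$ is an assumption about the quoted statement that the supervision "depends on the source of the training sample."

```python
import math


def soft_target_bce(p_hat, target, eps=1e-12):
    """Binary cross-entropy summed over labels, with soft targets in [0, 1]."""
    loss = 0.0
    for p, t in zip(p_hat, target):
        p = min(max(p, eps), 1.0 - eps)  # keep both logarithms finite
        loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return loss


def classifier_target(sample, clean_fn):
    """Choose the supervision for h according to the sample's source (assumed)."""
    if sample.get("verified") is not None:  # assumed: sample from V with human labels
        return sample["verified"]
    # Sample j from the noisy dataset T: supervise with c_hat_j = g(y_j, I_j).
    return clean_fn(sample["noisy"], sample["image"])


# Usage with a hypothetical cleaner that returns fixed soft labels.
sample_j = {"noisy": [1.0, 0.0], "image": "I_j"}
target = classifier_target(sample_j, lambda y, image: [0.9, 0.2])
print(soft_target_bce([0.9, 0.2], target))
```

For a fixed soft target, this loss is smallest when the classifier's prediction equals the target, so $h$ is driven toward imitating $g$'s cleaned labels.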
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of ZHANG and VEIT before them, to include VEIT’s label cleaning network in ZHANG’s segmentation method. One would have been motivated to make such a combination in order to use clean labels to reduce noise in the large dataset before fine-tuning the network (VEIT [page 846, section 5. Conclusion]).

Regarding Claim 2:
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. VEIT further teaches:
wherein the processing circuitry is further configured to: generate, as the calibration data, the calibration parameter with respect to the first processing label by inputting the first processing data and the first processing label to the first calibration model; (VEIT [page 844, section 4.4. Training Details] teaches: “We trained the baseline model on 50 NVIDIA K40 GPUs using the noisy labels from the Open Images training set. We stopped training after 49 million mini-batches (with 32 images each). This network is the starting point for all model variants.” VEIT [page 840, section 1. Introduction] teaches: "We propose an alternative approach: instead of using the small clean dataset to learn visual representations directly, we use it to learn a mapping between noisy and clean annotations. We argue that this mapping not only learns the patterns of noise, but it also captures the structure in the label space. The learned mapping between noisy and clean annotations allows to clean the noisy dataset and fine-tune the network using both the clean and the full dataset with reduced noise." VEIT [page 841, section 3. Our Approach] teaches: "Formally, we have a very large training dataset $T$ comprising tuples of noisy labels $y$ and images $I$, $T = \{(y_i, I_i), \dots\}$, and a small dataset $V$ of triplets of verified labels $v$, noisy labels $y$ and images $I$, $V = \{(v_i, y_i, I_i), \dots\}$.” VEIT [page 842, section 3.1. Multi-Task Label Cleaning Architecture] teaches: "The first classifier is a label cleaning network denoted as $g$ that models the structure in the label space and learns a mapping from the noisy labels $y$ to the human verified labels $v$, conditional on the input image. We denote the cleaned labels output by $g$ as $\hat{c}$ so that $\hat{c} = g(y, I)$. [...] In order to model the label structure and noise conditional on the image, the network has two separate inputs, the noisy labels $y$ as well as the visual features $f_I$. […] Denoting the residual cleaning module as $g'$, the label cleaning network $g$ computes cleaned labels $\hat{c} = \mathrm{clip}(y + g'(y, f_I), [0, 1])$."
Examiner's note: Under broadest reasonable interpretation, the first calibration model can be interpreted as the label cleaning network $g$, which learns the mapping between the noisy annotations (i.e., with respect to the […] first processing label) and clean annotations. The label cleaning network takes as input the noisy labels and images (i.e., by inputting the first processing data and the first processing label to the calibration model) to learn the mapping. Additionally, generate […] a calibration parameter can be interpreted as the output of the residual cleaning module $g'$, which computes the residual (i.e., difference) between the noisy annotations and clean annotations, for generating the cleaned label $\hat{c}$.)
VEIT further teaches: generate the calibration label by applying the calibration parameter to the first processing label; (VEIT [page 842, section 3.1. Multi-Task Label Cleaning Architecture] teaches: "Denoting the residual cleaning module as $g'$, the label cleaning network $g$ computes cleaned labels $\hat{c} = \mathrm{clip}(y + g'(y, f_I), [0, 1])$."
Examiner’s note: Under broadest reasonable interpretation, generate […] the calibration label can be interpreted as $\hat{c}$, which results from applying the output of the residual cleaning module $g'$ to the noisy label $y$ (i.e., first processing label). The residual cleaning module computes the residual (i.e., difference) between the noisy annotations and clean annotations, for generating the cleaned label $\hat{c}$.)
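A small worked example may make this mapping concrete. The numbers below are hypothetical and chosen only for illustration: for a three-label noisy annotation $y$ and an assumed residual output $g'(y, f_I)$ (the calibration parameter under the interpretation above), the calibration label $\hat{c}$ follows by addition and clipping.

```latex
\begin{aligned}
y &= (1,\ 0,\ 1), \qquad g'(y, f_I) = (-0.8,\ 0.3,\ 0.4) \\
\hat{c} &= \operatorname{clip}\big(y + g'(y, f_I),\ [0, 1]\big)
         = \operatorname{clip}\big((0.2,\ 0.3,\ 1.4),\ [0, 1]\big)
         = (0.2,\ 0.3,\ 1.0)
\end{aligned}
```

The third entry shows the role of the clip: the residual would push that label above 1, so it is truncated to 1.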
train the target model based on the first processing data and the first calibration label. (VEIT [page 842, section 3.1. Multi-Task Label Cleaning Architecture] teaches: “The second classifier is an image classifier denoted as $h$ that learns to annotate images by imitating the first classifier $g$ by using $g$’s predictions as ground truth targets. We denote the predicted labels output by $h$ as $\hat{p}$ so that $\hat{p} = h(I)$ (i.e., train the target model based on the first processing data).” VEIT [page 842, section 3.2. Model Training] teaches: “For the image classifier (i.e., the target model), the supervision (i.e., train) depends on the source of the training sample. For all samples $j$ from the noisy dataset $T$, the classifier is supervised by the cleaned labels $\hat{c}_j$ produced by the label cleaning network (i.e., based on […] the first calibration label).” Examiner’s note: Under broadest reasonable interpretation, first processing data can be interpreted as the image $I$ in each tuple sample $j$ from dataset $T$. The label cleaning network takes as input the noisy labels and images (i.e., by inputting the first processing data and the first processing label to the calibration model) to learn the mapping. Additionally, generate […] a calibration parameter can be interpreted as the output of the residual cleaning module $g'$, which computes the residual (i.e., difference) between the noisy annotations and clean annotations, for generating the cleaned label $\hat{c}$.)
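To make the mapping above concrete, the following is a minimal Python sketch of the cleaning step VEIT describes, $\hat{c} = \mathrm{clip}(y + g'(y, f_I), [0, 1])$, for a single image. It assumes multi-label vectors of per-class scores. The learned residual cleaning module $g'$ is replaced by an arbitrary callable (`toy_residual` is hypothetical and returns fixed corrections), so the sketch illustrates only the residual-plus-clip arithmetic, not VEIT's network or its training.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for VEIT's learned residual cleaning module g'(y, f_I):
# any callable mapping (noisy labels, image features) to one correction per class.
Residual = Callable[[Sequence[float], Sequence[float]], Sequence[float]]


def clip01(values: Sequence[float]) -> List[float]:
    """Clamp every entry to the closed interval [0, 1]."""
    return [min(1.0, max(0.0, v)) for v in values]


def clean_labels(noisy: Sequence[float], features: Sequence[float],
                 residual_fn: Residual) -> List[float]:
    """Compute c_hat = clip(y + g'(y, f_I), [0, 1]) for one image."""
    residual = residual_fn(noisy, features)
    if len(residual) != len(noisy):
        raise ValueError("the residual must have one entry per class")
    # The module is "residual" because g' predicts only a correction that is
    # added to the noisy label y, rather than the full clean label.
    return clip01([y + r for y, r in zip(noisy, residual)])


if __name__ == "__main__":
    def toy_residual(y, f):
        # Fixed corrections chosen only for illustration; f is ignored.
        return [-0.8, 0.6, 0.3]

    print(clean_labels([1.0, 0.0, 1.0], [0.5, 0.1], toy_residual))
```

With the toy corrections, the cleaned scores come out at approximately 0.2, 0.6 and 1.0; the third class is clipped down from 1.3, which is the role of the clip in the formula.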

Regarding Claim 3:
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. VEIT further teaches:
wherein the processing circuitry is further configured to: generate, as the calibration data, the calibration label by inputting the first processing data and the first processing label to the first calibration model; (VEIT [page 844, section 4.4. Training Details] teaches: “We trained the baseline model on 50 NVIDIA K40 GPUs (i.e., processing circuitry) using the noisy labels from the Open Images training set. We stopped training after 49 million mini-batches (with 32 images each). This network is the starting point for all model variants.” VEIT [page 840, section 1. Introduction] teaches: “We propose an alternative approach: instead of using the small clean dataset to learn visual representations directly, we use it to learn a mapping between noisy and clean annotations. We argue that this mapping not only learns the patterns of noise, but it also captures the structure in the label space. The learned mapping between noisy and clean annotations allows to clean the noisy dataset and fine-tune the network using both the clean and the full dataset with reduced noise.” VEIT [page 841, section 3. Our Approach] teaches: “Formally, we have a very large training dataset $T$ comprising tuples of noisy labels $y$ and images $I$, $T = \{(y_i, I_i), \ldots\}$, and a small dataset $V$ of triplets of verified labels $v$, noisy labels $y$ and images $I$, $V = \{(v_i, y_i, I_i), \ldots\}$.” VEIT [page 842, section 3.1. Multi-Task Label Cleaning Architecture] teaches: “The first classifier is a label cleaning network denoted as $g$ that models the structure in the label space and learns a mapping from the noisy labels $y$ to the human verified labels $v$, conditional on the input image. We denote the cleaned labels output by $g$ as $\hat{c}$ so that $\hat{c} = g(y, I)$. […] In order to model the label structure and noise conditional on the image, the network has two separate inputs, the noisy labels $y$ as well as the visual features $f_I$. […] Denoting the residual cleaning module as $g'$, the label cleaning network $g$ computes cleaned labels
$$\hat{c} = \mathrm{clip}\big(y + g'(y, f_I), [0, 1]\big)$$
Examiner's note: Under broadest reasonable interpretation, the first calibration model can be interpreted as the label cleaning network $g$, which learns the mapping between the noisy annotations (i.e., with respect to the […] first processing label) and clean annotations. The label cleaning network takes as input the noisy labels and images (i.e., by inputting the first processing data and the first processing label to the calibration model) to learn the mapping. Additionally, generate […] the calibration label can be interpreted as $\hat{c}$, which results from applying the output of the residual cleaning module $g'$ to the noisy label $y$ (i.e., first processing label). The residual cleaning module computes the residual (i.e., difference) between the noisy annotations and clean annotations, for generating the cleaned label $\hat{c}$.)
train the target model based on the first processing data and the generated calibration label. (VEIT [page 842, section 3.1. Multi-Task Label Cleaning Architecture] teaches: “The second classifier is an image classifier denoted as $h$ that learns to annotate images by imitating the first classifier $g$ by using $g$’s predictions as ground truth targets. We denote the predicted labels output by $h$ as $\hat{p}$ so that $\hat{p} = h(I)$ (i.e., train the target model based on the first processing data).” VEIT [page 842, section 3.2. Model Training] teaches: “For the image classifier (i.e., the target model), the supervision (i.e., train) depends on the source of the training sample. For all samples $j$ from the noisy dataset $T$, the classifier is supervised by the cleaned labels $\hat{c}_j$ produced by the label cleaning network (i.e., based on […] the generated calibration label).” Examiner’s note: Under broadest reasonable interpretation, first processing data can be interpreted as the image $I$ in each tuple sample $j$ from dataset $T$. The label cleaning network takes as input the noisy labels and images to learn the mapping for generating the cleaned label $\hat{c}$.)
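The supervision of the image classifier $h$ by the cleaned labels can likewise be sketched. This is a hedged illustration that assumes a per-class binary cross-entropy between $\hat{p} = h(I)$ and $\hat{c}_j$ used as soft targets; the function name `cleaned_label_loss` and the example values are hypothetical, and the sketch only evaluates the loss for one sample rather than computing gradients or training $h$.

```python
import math
from typing import Sequence


def cleaned_label_loss(predicted: Sequence[float], cleaned: Sequence[float],
                       eps: float = 1e-7) -> float:
    """Mean per-class binary cross-entropy of p_hat = h(I) against c_hat.

    predicted: classifier outputs in (0, 1), one per class (e.g. after a sigmoid).
    cleaned:   soft targets in [0, 1] produced by the label cleaning network.
    """
    if len(predicted) != len(cleaned) or not predicted:
        raise ValueError("predicted and cleaned must be non-empty and equal length")
    total = 0.0
    for p, c in zip(predicted, cleaned):
        p = min(1.0 - eps, max(eps, p))  # keep log() finite at 0 and 1
        total -= c * math.log(p) + (1.0 - c) * math.log(1.0 - p)
    return total / len(predicted)


if __name__ == "__main__":
    c_hat = [0.2, 0.6, 1.0]  # cleaned labels for one image (illustrative)
    print(cleaned_label_loss([0.25, 0.55, 0.9], c_hat))  # close to c_hat: lower loss
    print(cleaned_label_loss([0.9, 0.1, 0.2], c_hat))    # far from c_hat: higher loss
```

Because the targets are soft, the loss is smallest when each prediction equals the corresponding cleaned score, not when it reaches exactly 0 or 1.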

Regarding Claim 4:
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. Further, ZHANG teaches:
wherein the processing circuitry is further configured to: output, as the calibration data, a reliability with respect to the first processing label by inputting the first processing data and the first processing label to the first calibration model; (ZHANG [page 5, section 3.3] teaches: “[…] and $\mathrm{tr}(A)$ denotes the trace of matrix $A$. The mean trace represents the average probability that a randomly selected annotator provides an accurate label (i.e., reliability).” Additionally, ZHANG [page 4, Figure 1] teaches a term of the loss (i.e., parameter of a loss function)
$$[\ldots] + \sum_{r=1}^{3} \mathrm{tr}\big(\hat{A}^{(r)}_{\theta}(x)\big) = \mathcal{L}_{\mathrm{total}}(\theta, \phi)$$
where the mean trace $\mathrm{tr}\big(\hat{A}^{(r)}_{\theta}(x)\big)$ (i.e., reliability) denotes the average probability that a randomly selected annotator provides an accurate label (i.e., with respect to the first processing label) to an image $x_n$. Furthermore, under BRI, the annotator network (i.e., first calibration model) receives both the input image $x_n$ (i.e., first processing data) and, through the loss, the noisy segmentation label (i.e., first processing label) when computing the mean trace.)
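The reliability reading of the mean trace can be checked numerically with a short Python sketch. It assumes ZHANG's convention that entry $(i, j)$ of $\hat{A}^{(r)}$ is the probability that annotator $r$ assigns class $i$ when the true class is $j$ (so each column sums to 1), and it treats a single pixel. Dividing the trace by the number of classes $L$ and averaging over annotators, so that the result reads as a probability under a uniform class prior, is an assumption of this sketch, as are the names `mean_normalized_trace` and `check_confusion`.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def check_confusion(a: Matrix, tol: float = 1e-9) -> None:
    """Require a square matrix whose columns each sum to 1.

    Assumed convention: a[i][j] = P(annotator labels class i | true class j).
    """
    n = len(a)
    if n == 0 or any(len(row) != n for row in a):
        raise ValueError("confusion matrix must be square and non-empty")
    for j in range(n):
        if abs(sum(a[i][j] for i in range(n)) - 1.0) > tol:
            raise ValueError(f"column {j} does not sum to 1")


def mean_normalized_trace(confusions: Sequence[Matrix]) -> float:
    """Average of tr(A^(r)) / L over annotators r, for L classes.

    With a uniform prior over true classes, tr(A^(r)) / L is the probability
    that annotator r labels a randomly chosen class correctly; averaging over r
    gives that probability for a randomly selected annotator.
    """
    if not confusions:
        raise ValueError("need at least one annotator")
    num_classes = len(confusions[0])
    for a in confusions:
        check_confusion(a)
        if len(a) != num_classes:
            raise ValueError("all confusion matrices must have the same size")
    total_trace = sum(sum(a[i][i] for i in range(num_classes)) for a in confusions)
    return total_trace / (len(confusions) * num_classes)


if __name__ == "__main__":
    careful = [[0.9, 0.1], [0.1, 0.9]]
    over_segmenter = [[0.6, 0.0], [0.4, 1.0]]  # often labels class 0 as class 1
    print(mean_normalized_trace([careful, over_segmenter]))
```

For these two illustrative annotators the result is approximately 0.85; a perfectly accurate annotator (identity matrix) would give 1.0.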
train the target model based on the first processing data and the first processing label by using the reliability as a parameter of a loss function. (ZHANG [page 5, section 3.3] teaches minimizing a loss function (i.e., train) with respect to the annotator network and the segmentation network (i.e., target model). The first processing data can be interpreted as the input image $x_n$. Furthermore, ZHANG [page 5, Eq (4)] teaches noisy labels $\tilde{Y}^{(r)} = \{\tilde{y}^{(r)}_n : r \in S(x_n)\}_{n=1}^{N}$, which are the labels assigned (i.e., first processing label) by the annotators. Additionally, ZHANG [page 4, Figure 1] teaches the loss
$$[\ldots] + \sum_{r=1}^{3} \mathrm{tr}\big(\hat{A}^{(r)}_{\theta}(x)\big) = \mathcal{L}_{\mathrm{total}}(\theta, \phi)$$
where the mean trace $\mathrm{tr}\big(\hat{A}^{(r)}_{\theta}(x)\big)$ (i.e., reliability), a term in the loss function (i.e., parameter of a loss function), denotes the average probability that a randomly selected annotator provides an accurate label.)
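How the trace term enters the total loss can be sketched for a single pixel. The part elided as `[…]` in the figure is assumed here to be a cross-entropy between the annotator-specific predicted distribution $\hat{A}^{(r)}\hat{p}$ (the segmentation network's estimated true-class probabilities, called `true_probs` here, passed through annotator $r$'s confusion matrix) and annotator $r$'s one-hot label. `trace_weight` is a hypothetical weighting factor not shown in the figure, and the sketch only evaluates the loss rather than minimizing it.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def apply_confusion(a: Matrix, true_probs: Sequence[float]) -> List[float]:
    """Distribution of annotator r's label: (A^(r) p)[i] = sum_j a[i][j] * p[j]."""
    return [sum(a[i][j] * true_probs[j] for j in range(len(true_probs)))
            for i in range(len(a))]


def total_loss(true_probs: Sequence[float], confusions: Sequence[Matrix],
               annotator_labels: Sequence[int], trace_weight: float = 1.0,
               eps: float = 1e-12) -> float:
    """Single-pixel sketch: sum_r CE(A^(r) p, y~^(r)) + trace_weight * sum_r tr(A^(r))."""
    if len(confusions) != len(annotator_labels):
        raise ValueError("need exactly one label per annotator")
    data_term = 0.0
    trace_term = 0.0
    for a, label in zip(confusions, annotator_labels):
        noisy_probs = apply_confusion(a, true_probs)
        # Cross-entropy with a one-hot label reduces to -log of that class's probability.
        data_term -= math.log(max(noisy_probs[label], eps))
        trace_term += sum(a[i][i] for i in range(len(a)))
    return data_term + trace_weight * trace_term


if __name__ == "__main__":
    p_hat = [0.3, 0.7]  # estimated true-class probabilities for one pixel
    annotators = [
        [[0.9, 0.1], [0.1, 0.9]],  # mostly accurate
        [[0.6, 0.0], [0.4, 1.0]],  # tends to over-segment class 1
        [[1.0, 0.3], [0.0, 0.7]],  # tends to under-segment class 1
    ]
    print(total_loss(p_hat, annotators, annotator_labels=[1, 1, 0]))
```

The three matrices mirror the three annotators summed over in the figure ($r = 1, \ldots, 3$); in ZHANG the matrices are pixel-wise outputs of the annotator network, whereas here they are fixed illustrative values.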

Regarding Claim 6:
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. ZHANG further teaches:
wherein the processing circuitry is further configured to assign the first training label to the first input training data in accordance with an instruction by the first user. (ZHANG [page 3, section 3 Method] teaches: “Specifically, we consider a scenario where set of images (i.e., first input training data) […] are assigned with noisy segmentation labels (i.e., assign the first training label) […] from multiple annotators where $\tilde{y}^{(r)}_n$ denotes the label from annotator $r \in \{1, \ldots, R\}$ […]”.)

Regarding Claim 7:
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. ZHANG further teaches:
wherein the first input training data is medical data generated by a medical device. (ZHANG [page 1, Abstract] teaches: “We then demonstrate the utility of the method on three public medical imaging segmentation datasets with simulated (when necessary) and real diverse annotations: 1) MSLSC (multiple-sclerosis lesions); 2) BraTS (brain tumours); 3) LIDC-IDRI (lung abnormalities).” Examiner’s note: MSLSC and BraTS are magnetic resonance imaging (MR or magnetic resonance) datasets, and LIDC-IDRI is a computed tomography (CT) dataset; each consists of medical images generated by a medical imaging device (i.e., medical data generated by a medical device).)

Regarding Claim 9:
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. ZHANG further teaches:
calculate, from the first training label, a first calibration parameter for calibrating individual characteristics in label assignment by the first user; (ZHANG [page 4, Figure 1] teaches the annotator network that outputs confusion matrices (i.e., a calibration parameter) based on the input image $x_n$. Additionally, the confusion matrices, as described in Figure 1, contain estimates of each annotator's individual characteristics, i.e., the annotator's individual tendencies when performing segmentation (i.e., first training label).)
train, based on the first input training data, the first training label, and the calibration parameter, the first calibration model configured to receive input data as input, and to output as the calibration data the calibration parameter corresponding to the input data. (ZHANG [page 5, Eq (4)] teaches training by minimizing a loss function that includes the annotator network (i.e., first calibration model) given (i.e., based on) training input $X = \{x_n\}_{n=1}^{N}$ (i.e., first input training data), noisy labels (i.e., first training label) $\tilde{Y}^{(r)} = \{\tilde{y}^{(r)}_n : r \in S(x_n)\}_{n=1}^{N}$, which are the labels assigned by the annotators, and confusion matrices (i.e., calibration parameter) based on the annotator’s individual tendencies when performing segmentation.)
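To illustrate what the entries of such a calibration parameter express, the following simplified Python sketch counts an annotator's confusion matrix against reference labels. This is not ZHANG's method: ZHANG estimates the matrices with the annotator network from the input image, whereas this stand-in assumes that reference labels are available. The function `empirical_confusion` and the pixel values are hypothetical.

```python
from typing import List, Sequence


def empirical_confusion(reference: Sequence[int], annotator: Sequence[int],
                        num_classes: int) -> List[List[float]]:
    """Column-normalized counts: entry [i][j] estimates P(annotator says i | reference class j)."""
    if len(reference) != len(annotator):
        raise ValueError("reference and annotator labels must align pixel by pixel")
    counts = [[0] * num_classes for _ in range(num_classes)]
    for ref, ann in zip(reference, annotator):
        if not (0 <= ref < num_classes and 0 <= ann < num_classes):
            raise ValueError("label out of range")
        counts[ann][ref] += 1
    matrix = [[0.0] * num_classes for _ in range(num_classes)]
    for j in range(num_classes):
        column_total = sum(counts[i][j] for i in range(num_classes))
        for i in range(num_classes):
            # A class that never occurs in the reference leaves its column at zero.
            matrix[i][j] = counts[i][j] / column_total if column_total else 0.0
    return matrix


if __name__ == "__main__":
    reference = [0, 0, 0, 0, 1, 1]  # 0 = background, 1 = lesion
    annotator = [0, 1, 1, 0, 1, 1]  # marks two background pixels as lesion
    for row in empirical_confusion(reference, annotator, num_classes=2):
        print(row)
```

The example prints the rows `[0.5, 0.0]` and `[0.5, 1.0]`: the annotator labels half of the background pixels as lesion, which shows up as 0.5 in the off-diagonal entry. That is the kind of individual tendency the confusion matrices are described as capturing.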

Regarding Claim 10:
	
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. ZHANG further teaches:
to train, based on the first input training data, the first training label, […] the first calibration model that receives, as input, input data and a label assigned by the first user to the input data (ZHANG [page 5, Eq (4)] teaches the training using a loss function to be minimized, which includes the annotator network (i.e., first calibration model) given (i.e., based on) training input $X = \{x_n\}_{n=1}^{N}$ (i.e., first input training data), noisy labels (i.e., first training label) $\tilde{Y}^{(r)} = \{\tilde{y}_n^{(r)} : r \in S(x_n)\}_{n=1}^{N}$, which are the labels assigned by the annotators, and confusion matrices based on the annotator’s individual tendencies when performing segmentation.)
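For illustration only, the confusion-matrix mechanism that the mapping above relies on can be sketched in NumPy. The sketch assumes ZHANG's formulation in which each annotator r has a per-pixel, column-stochastic confusion matrix whose entry (i, j) is the probability that the annotator assigns class i when the true class is j, so that the annotator's noisy-label distribution is that matrix applied to the estimated true-class probabilities. The function name `noisy_label_distribution` and the array shapes are hypothetical and are not taken from ZHANG or from the application.

```python
import numpy as np


def noisy_label_distribution(cm: np.ndarray, p_true: np.ndarray) -> np.ndarray:
    """Probability that one annotator assigns each class, per pixel.

    cm:     (H, W, L, L) confusion matrices; cm[h, w, i, j] is the probability
            that the annotator says class i when the true class is j
    p_true: (H, W, L) estimated true-class probabilities
    returns (H, W, L) distribution over the annotator's (noisy) labels
    """
    return np.einsum("hwij,hwj->hwi", cm, p_true)


# Toy 1 x 1 image with 2 classes: this annotator marks 20% of true
# foreground (class 1) pixels as background (class 0).
cm = np.array([[[[1.0, 0.2],
                 [0.0, 0.8]]]])
p_true = np.array([[[0.0, 1.0]]])  # the pixel truly belongs to class 1
print(noisy_label_distribution(cm, p_true))  # [[[0.2 0.8]]]
```

Because each column of the matrix sums to one, the resulting label distribution also sums to one at every pixel.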
VEIT further teaches: train, based on […] a correct label with respect to the first input training data, the first calibration model (VEIT [page 842, section 3.1. Multi-Task Label Cleaning Architecture] teaches: "The first classifier is a label cleaning network denoted as $g$ that models the structure in the label space and learns a mapping from the noisy labels $y$ to the human verified labels $v$, conditional on the input image. We denote the cleaned labels output by $g$ as $\hat{c}$ so that $\hat{c} = g(y, I)$. [...] In order to model the label structure and noise conditional on the image, the network has two separate inputs, the noisy labels $y$ as well as the visual features $f_I$. […] Denoting the residual cleaning module as $g'$, the label cleaning network $g$ computes cleaned labels
$\hat{c} = \mathrm{clip}(y + g'(y, f_I), [0, 1])$"
VEIT [page 842, section 3.2. Model Training] teaches: “The label cleaning network is supervised (i.e., train […] the first calibration model) by the verified labels of all samples $i$ in the human rated set $V$ (i.e., based on a correct label with respect to the first input training data). The cleaning loss is based on the difference between the cleaned labels $\hat{c}_i$ and the corresponding ground truth verified labels $v_i$,
$L_{\mathrm{clean}} = \sum_{i \in V} |\hat{c}_i - v_i|$
We choose the absolute distance as error measure, since the label vectors are very sparse.”)
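As an illustrative sketch only, the two VEIT expressions quoted above (the clipped residual cleaning and the absolute-distance cleaning loss) can be written in NumPy as follows. In VEIT the residual module g' is a learned network; here it is replaced by a hypothetical fixed linear map (`toy_residual`) purely so that the example runs. The function names are likewise hypothetical.

```python
import numpy as np


def clean_labels(y, f_img, residual_fn):
    """c_hat = clip(y + g'(y, f_I), [0, 1]), the residual cleaning quoted from VEIT."""
    return np.clip(y + residual_fn(y, f_img), 0.0, 1.0)


def cleaning_loss(c_hat, v):
    """L_clean = sum over verified samples i in V of |c_hat_i - v_i|."""
    return float(np.sum(np.abs(c_hat - v)))


def toy_residual(y, f_img):
    # Hypothetical stand-in for the learned module g'; it ignores y and maps the
    # 2-dimensional visual features to a 3-class label correction.
    return f_img @ np.array([[0.5, -0.5, 0.0],
                             [0.0, 0.0, 0.3]])


y = np.array([[1.0, 1.0, 0.0]])   # noisy labels for one image, 3 classes
f_img = np.array([[0.4, 1.0]])    # visual features f_I
v = np.array([[1.0, 0.0, 0.0]])   # human-verified labels
c_hat = clean_labels(y, f_img, toy_residual)
print(c_hat)                      # values 1.0, 0.8, 0.3
print(cleaning_loss(c_hat, v))    # approximately 1.1
```

The clip keeps the cleaned labels in [0, 1] even when the residual pushes a label past 1, as happens for the first class here.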
and outputs, as the calibration data, a correct label for the input data. (VEIT [page 842, section 3.1. Multi-Task Label Cleaning Architecture] teaches: "The second classifier is an image classifier denoted as $h$ that learns to annotate images by imitating the first classifier $g$ by using $g$’s predictions as ground truth targets. […] Denoting the residual cleaning module as $g'$, the label cleaning network $g$ computes cleaned labels
$\hat{c} = \mathrm{clip}(y + g'(y, f_I), [0, 1])$"
Examiner’s note: Under broadest reasonable interpretation, the calibration data can be interpreted as $g$’s predictions as ground truth targets, which are corrected labels using the residual (difference) between the noisy annotations and clean annotations of the image $I$.)

Regarding Claim 11:
	
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. ZHANG further teaches:
wherein the processing circuitry is further configured to: determine a reliability of the first training label with respect to the first input training data; (ZHANG [page 5, section 3.3] teaches: "[…] and $\mathrm{tr}(A)$ denotes the trace of matrix $A$. The mean trace represents the average probability that a randomly selected annotator provides an accurate label (i.e., reliability)." Additionally, ZHANG [page 4, Figure 1] teaches a term of the loss (i.e., parameter of a loss function)
$[\ldots] + \sum_{r=1}^{3} \mathrm{tr}(\hat{A}_{\theta}^{(r)}(x)) = L_{\mathrm{total}}(\theta, \phi)$
where the mean trace $\mathrm{tr}(\hat{A}_{\theta}^{(r)}(x))$ (i.e., reliability) denotes the average probability that a randomly selected annotator provides an accurate label (i.e., with respect to the first processing label) to an image $x_n$.)
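A minimal NumPy sketch of the mean-trace reliability measure discussed above, assuming one L x L column-stochastic confusion matrix per annotator. Dividing the trace by the number of classes L is an assumption made here so that the result reads as a probability (it corresponds to a uniform prior over true classes); the function name is hypothetical.

```python
import numpy as np


def mean_trace_reliability(cms: np.ndarray) -> float:
    """Average per-class probability of a correct label across annotators.

    cms: (R, L, L) array, one column-stochastic confusion matrix per annotator,
         cms[r, i, j] = P(annotator r says class i | true class j).
    Dividing each trace by L is an assumption (uniform prior over true classes)
    so that the result lies in [0, 1].
    """
    num_classes = cms.shape[-1]
    traces = np.trace(cms, axis1=1, axis2=2)  # one trace per annotator
    return float(traces.mean() / num_classes)


cms = np.array([
    [[1.0, 0.0], [0.0, 1.0]],    # annotator 1: always correct
    [[0.75, 0.5], [0.25, 0.5]],  # annotator 2: makes errors in both classes
])
print(mean_trace_reliability(cms))  # 0.8125
```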
train the first calibration model that receives, as input, input data and a label assigned by the first user to the input data based on the first input training data, the first training label, and the reliability of the first training label, and outputs a reliability of the label with respect to the input data. (ZHANG [Figure 1; page 5, Eq (4)] teaches the training of the annotator network (i.e., train the first calibration model) via a loss function. The loss includes an input training image $x_n$ (i.e., first input training data), a noisy label assigned by the annotator (i.e., first training label), and the mean trace, which represents the average probability that an annotator provides an accurate label (i.e., reliability).)

Regarding Claim 12:
	
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. ZHANG further teaches:
wherein the processing circuitry is further configured to train the target model based on a combination of the first processing data, the first processing label, and the first calibration model and a combination of second processing data, a second processing label assigned by a second user to the second processing data, and a second calibration model for calibrating individual characteristics in label assignment by the second user. (ZHANG [page 4, Figure 1] teaches both the annotator network (i.e., first/second calibration model) and the segmentation network (i.e., target model) being trained via a loss function by using noisy labels from multiple annotators (i.e., combination), with each annotator providing a noisy label (i.e., first/second processing label) for an input image (i.e., first/second processing data). Under BRI, the “second calibration model” can be interpreted as the model configured with parameters different from those of the first annotator when receiving input data (i.e., labels) and learning the varying characteristics of another annotator’s labeling style, as described in ZHANG’s Figure 1. Furthermore, ZHANG [page 2, Section 1 Introduction] teaches: "In this work, we introduce the first instance of an end-to-end supervised segmentation method that jointly estimates, from noisy labels alone, the reliability of multiple human annotators (i.e., a second processing label assigned by a second user) and true segmentation labels. The proposed architecture (Fig. 1) consists of two coupled CNNs where one estimates the true segmentation probabilities and the other models the characteristics of individual annotators (e.g., tendency to over-segmentation, mix-up between different classes, etc.) by estimating the pixel-wise confusion matrices (CMs) on a per image basis.")
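To make the "combination" reading concrete, the following sketch assumes a ZHANG-style objective: for each annotator who labelled an image, a cross-entropy between that annotator's confusion matrix applied to the segmentation network's output and that annotator's noisy label, plus a weighted trace term over the annotators' confusion matrices, matching the trace term quoted for Figure 1. The office action quotes only the trace term, so the cross-entropy term, the pixel averaging, the weight `lam`, and all names here are assumptions made for illustration.

```python
import numpy as np


def zhang_style_loss(p_true, cms, labels, lam=0.01):
    """Hypothetical multi-annotator loss in the style of ZHANG Eq. (4), one image.

    p_true: (H, W, L) segmentation-network probabilities p_hat(x_n)
    cms:    {annotator r: (H, W, L, L) confusion matrices A_hat^(r)(x_n)}
    labels: {annotator r: (H, W) integer noisy label map}; only annotators in
            S(x_n), i.e. those who actually labelled x_n, appear here
    """
    eps = 1e-12
    total = 0.0
    for r, y_r in labels.items():
        # distribution over annotator r's labels: A_hat^(r)(x_n) @ p_hat(x_n)
        q = np.einsum("hwij,hwj->hwi", cms[r], p_true)
        picked = np.take_along_axis(q, y_r[..., None], axis=-1)[..., 0]
        total += -np.log(picked + eps).mean()  # pixel-averaged cross-entropy
    # trace term summed over all annotators, averaged over pixels
    total += lam * sum(np.trace(cm, axis1=2, axis2=3).mean() for cm in cms.values())
    return float(total)


p_true = np.array([[[0.0, 1.0]]])  # 1 x 1 image, 2 classes, true class 1
cms = {
    1: np.array([[[[1.0, 0.0], [0.0, 1.0]]]]),  # annotator 1: perfect
    2: np.array([[[[1.0, 0.5], [0.0, 0.5]]]]),  # annotator 2: half of class 1 -> class 0
}
labels = {1: np.array([[1]]), 2: np.array([[0]])}
print(zhang_style_loss(p_true, cms, labels))  # about log(2) + 0.035, i.e. 0.728
```

Each annotator contributes through its own confusion matrix while all terms share the same segmentation output, which is the sense in which the per-annotator data, labels, and calibration models are combined in one training objective.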

Regarding Claim 14:
	
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. ZHANG further teaches:
wherein the target model is a machine learning model trained in such a manner as to receive input processing data as input and to output prediction data corresponding to the input processing data. (ZHANG [page 4, Figure 1] teaches: “At test time, the output of the segmentation network, $\hat{p}_{\theta}(x)$ is used to yield the prediction.” Examiner’s note: $x$ is the input image (i.e., input processing data).)

Regarding Claim 17:
The claim recites similar limitations as corresponding claim 1 and is rejected for similar reasons as claim 1 using similar teachings and rationale.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over ZHANG in view of VEIT as applied to claim 1 above, and further in view of SHAMIR (US 20210196207 A1), hereafter SHAMIR.

Regarding Claim 8:
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. ZHANG further teaches:
wherein the first input training data is an MR image (ZHANG [page 1, Abstract] teaches: “We then demonstrate the utility of the method on three public medical imaging segmentation datasets with simulated (when necessary) and real diverse annotations: 1) MSLSC (multiple-sclerosis lesions); 2) BraTS (brain tumours); 3) LIDC-IDRI (lung abnormalities).” Examiner’s note: MSLSC, BraTS, and LIDC-IDRI are all examples of MRI scan datasets (i.e., MR or magnetic resonance).)
However, ZHANG is not relied upon for teaching, but SHAMIR teaches: in which a measurement voxel by MR spectroscopy with a magnetic resonance imaging apparatus is set, (SHAMIR [0068] teaches: “The medical images may include images derived (i.e., set) from magnetic resonance imaging (MRI), radiography, ultrasound, elastography, photoacoustic imaging, positron emission tomography, echocardiography, magnetic particle imaging, functional near-infrared spectroscopy, and/or any other imaging technique.”)
and the first training label is a mark indicative of a position of the measurement voxel. (SHAMIR [0068] teaches: “In some instances, features extracted from each image may indicate a specific area of a patient's body, such as a head or torso, where voxels are labeled to indicate segmentation of a tumor and different tissue types, such as: combined skin and muscle tissue, skull/bone, cerebrospinal fluid, gray matter, and white matter. […] For example, a marker in an image, such as a voxel, pixel, and/or the like may be correlated to a point in a patient's anatomy from which a tissue sample is excised.”)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of ZHANG, VEIT, and SHAMIR before them, to include SHAMIR’s MR spectroscopy training data in ZHANG and VEIT's segmentation method. One would have been motivated to make such a combination in order to improve the accuracy of the model during training (SHAMIR [0086-0087]).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over ZHANG in view of VEIT as applied to claim 1 above, and further in view of HOULSBY (“Parameter-Efficient Transfer Learning for NLP”), hereafter HOULSBY.

Regarding Claim 15:
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. ZHANG further teaches:
[…] model corresponding to a user other than the first user […] (Under BRI, the “model corresponding to a user other than the first user” can be interpreted as the annotator network configured with different parameters $\phi$ than the parameters of a first user based on the received input data (i.e., labels) and learned characteristics of the annotator’s labeling style, as described in ZHANG’s Figure 1, which shows different confusion matrices for each individual annotator.)
based on the first input training data, the first training label, and the calibration data. (ZHANG [page 5, Section 3.3] teaches input training image $x_n$ (i.e., first input training data), noisy labels assigned by annotators (i.e., first training label), and confusion matrix, or mean trace (i.e., calibration data).)
However, ZHANG in view of VEIT is not relied upon for teaching, but HOULSBY teaches: train an untrained first calibration model by copying a training parameter of a trained calibration […] to the untrained first calibration model, and training a training parameter of the untrained first calibration model (HOULSBY [page 1, section 1 Introduction] teaches: “The embeddings are then fed to custom downstream models (i.e., untrained first calibration models). Fine-tuning involves copying the weights (i.e., training parameters) from a pre-trained network and tuning (i.e., train an untrained first calibration model) them (i.e., training parameter) on the downstream task.”)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of ZHANG, VEIT, and HOULSBY before them, to include HOULSBY’s transfer learning in ZHANG and VEIT's segmentation method. One would have been motivated to make such a combination in order to build a system that performs well, without having to build the system from scratch (HOULSBY [page 1, section 1 Introduction]).
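The fine-tuning step quoted from HOULSBY (copy the weights of a pre-trained network, then tune them on the downstream task) can be sketched with a toy linear model. This illustrates plain fine-tuning as described in the quoted sentence, not HOULSBY's adapter modules; the parameter dictionary, the squared-error objective, and the function names are hypothetical.

```python
import copy

import numpy as np


def fine_tune(pretrained_params, x, y, lr=0.1, steps=100):
    """Copy pre-trained weights into a new model, then keep training them.

    pretrained_params: {"w": (D,) array, "b": float} from an already-trained model
    x, y: (N, D) downstream inputs and (N,) downstream targets
    """
    params = copy.deepcopy(pretrained_params)  # the source model is left untouched
    for _ in range(steps):
        err = x @ params["w"] + params["b"] - y  # linear-model residuals
        # gradient step on the half mean squared error
        params["w"] = params["w"] - lr * (x.T @ err) / len(y)
        params["b"] = params["b"] - lr * err.mean()
    return params


def mse(params, x, y):
    return float(np.mean((x @ params["w"] + params["b"] - y) ** 2))


pretrained = {"w": np.array([1.0, 0.0]), "b": 0.0}
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = x @ np.array([1.0, 0.5]) + 0.2  # downstream task differs from the source task
tuned = fine_tune(pretrained, x, y)
print(mse(pretrained, x, y), mse(tuned, x, y))  # the second value is smaller
```

Starting from copied weights rather than from scratch is the design point the quoted motivation relies on; the deep copy makes explicit that the trained source model itself is not modified.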

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over ZHANG in view of VEIT as applied to claim 1 above, and further in view of GUAN (“Who Said What: Modeling Individual Labelers Improves Classification”), hereafter GUAN.

Regarding Claim 16:
ZHANG in view of VEIT teaches the elements of claim 1 as outlined above. However, ZHANG in view of VEIT is not relied upon for teaching, but GUAN teaches:
wherein the first calibration model includes a plurality of calibration models respectively corresponding to a plurality of users other than the first user, (GUAN [page 4, Summary of Procedure] teaches: “We learn a doctor model for each doctor.”)
and an addition layer that weight-adds a plurality of calibration data pieces from the calibration models (GUAN [page 4, Model Architecture] teaches: “Weighted Doctor Net (WDN): Fixed DN with averaging weights for combining the predictions of the doctor models learned on top, one weight per doctor model.”)
in accordance with a weight trained for calibration of label assignment by the first user, and outputs the calibration data pieces. (GUAN [page 5, Summary of Procedure] teaches: “During evaluation of WDN, the prediction made for our model is a linear combination of the doctor models predictions (i.e., label assignment) where the coefficients are the averaging weights learned (i.e., trained for calibration) in Phase 2 of training.” Examiner’s note: under BRI, each weight in the linear combination can be interpreted as the calibration data pieces.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of ZHANG, VEIT, and GUAN before them, to include GUAN’s models for each doctor in ZHANG and VEIT’s segmentation method. One would have been motivated to make such a combination in order to “make more effective use of noisy labels when every example is labeled by a subset of a larger pool of experts.” (GUAN [page 8, Conclusion]).
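A minimal sketch of the weighted combination quoted from GUAN, in which the prediction is a linear combination of fixed per-doctor model predictions with one learned weight per doctor model. Passing the learned parameters through a softmax so that the averaging weights are positive and sum to one is an assumption made here, not a detail taken from the quoted passage; the names are hypothetical.

```python
import numpy as np


def weighted_doctor_prediction(doctor_probs, weight_params):
    """Linear combination of fixed per-doctor predictions, one weight per doctor.

    doctor_probs:  (R, N, C) class probabilities from R fixed doctor models
    weight_params: (R,) learned parameters; the softmax below (an assumption)
                   turns them into positive averaging weights that sum to one
    """
    w = np.exp(weight_params - weight_params.max())
    w = w / w.sum()
    return np.tensordot(w, doctor_probs, axes=1)  # (N, C)


doctor_probs = np.array([
    [[0.9, 0.1]],  # doctor model 1, one case, two classes
    [[0.5, 0.5]],  # doctor model 2
])
print(weighted_doctor_prediction(doctor_probs, np.array([0.0, 0.0])))  # [[0.7 0.3]]
```

Raising one doctor's parameter shifts the combined prediction toward that doctor model, which is how the learned weights act on the per-doctor outputs.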

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Alvaro S Laham Bauzo whose telephone number is (571)272-5650. The examiner can normally be reached Mon-Fri 7:30 AM - 11:00 AM | 1:00 PM - 5:30 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/A.S.L./Examiner, Art Unit 2146

/USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146
Prosecution Timeline

May 11, 2022: Application Filed
Jun 24, 2025: Non-Final Rejection mailed (§103, §112)
Oct 24, 2025: Response Filed
Dec 08, 2025: Final Rejection mailed (§103, §112)
Mar 09, 2026: Request for Continued Examination
Mar 14, 2026: Response after Non-Final Action
Mar 27, 2026: Non-Final Rejection mailed (§103, §112) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/646,082
Patent 12632705
ADVERSARIAL 3D DEFORMATIONS LEARNING
4y 4m to grant Granted May 19, 2026
17/758,166
Patent 12475388
MACHINE LEARNING MODEL SEARCH METHOD, RELATED APPARATUS, AND DEVICE
3y 4m to grant Granted Nov 18, 2025
Study what changed to get past this examiner. Based on 2 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 25%
With Interview: 99% (+100.0%)
Median Time to Grant: 4y 1m (~1m remaining)
PTA Risk: High
Based on 4 resolved cases by this examiner. Grant probability derived from career allowance rate.
This examiner grants 25% of cases after interview
