Prosecution Insights
Last updated: April 19, 2026
Application No. 18/532,915

HAIR COLOR CLASSIFICATION USING CONSISTENCY REGULARIZATION AND ANNOTATION CONFUSION MATRICES

Final Rejection — §103, §DP
Filed
Dec 07, 2023
Examiner
CHEN, JOSHUA NMN
Art Unit
2665
Tech Center
2600 — Communications
Assignee
L'Oréal
OA Round
2 (Final)
85%
Grant Probability
Favorable
3-4
OA Rounds
2y 11m
To Grant
99%
With Interview

Examiner Intelligence

Grants 85% — above average
85%
Career Allow Rate
34 granted / 40 resolved
+23.0% vs TC avg
Strong +26% interview lift
+26.1%
Interview Lift
among resolved cases with an interview
Typical timeline
2y 11m
Avg Prosecution
20 currently pending
Career history
60
Total Applications
across all art units
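The headline figures above are straightforward aggregates over the examiner's resolved cases. Here is a minimal sketch of the arithmetic using hypothetical case records — the page's underlying dataset is not shown, and the numbers below are chosen only to roughly match the figures displayed:

```python
# Hypothetical resolved-case records: (granted?, had_interview?).
# 40 resolved, 34 granted -> 85% career allow rate, as shown above.
cases = [(True, True)] * 16 + [(True, False)] * 18 + [(False, False)] * 6

def allow_rate(rows):
    """Fraction of cases in `rows` that were granted."""
    return sum(granted for granted, _ in rows) / len(rows)

with_iv = [c for c in cases if c[1]]
without_iv = [c for c in cases if not c[1]]

print(f"career allow rate: {allow_rate(cases):.0%}")   # 85%
# Interview lift = allow rate with an interview minus allow rate without one.
print(f"interview lift: {allow_rate(with_iv) - allow_rate(without_iv):+.1%}")
```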

Statute-Specific Performance

§101
18.7%
-21.3% vs TC avg
§103
52.0%
+12.0% vs TC avg
§102
15.7%
-24.3% vs TC avg
§112
12.0%
-28.0% vs TC avg
Tech Center averages shown for comparison are estimates • Based on career data from 40 resolved cases

Office Action

§103 §DP
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 02/26/2026 was filed and is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Terminal Disclaimer

The terminal disclaimer filed on 26 February 2026, disclaiming the terminal portion of any patent granted on this application which would extend beyond the expiration date of patent application number 18/532,834, has been reviewed and is accepted. The terminal disclaimer has been recorded.

Response to Amendment

Applicant's specification amendment, see P. 1, filed 02/26/2026, is acknowledged by the examiner. The specification objection of 11/26/2025 is withdrawn.

Applicant's response to the double patenting rejection and the filed terminal disclaimer, see P. 1, filed 02/26/2026, is acknowledged by the examiner. The double patenting rejection of claims 1, 5-6, 8, 17, and 20 of 11/26/2025 is withdrawn.

Applicant's arguments, see P. 1 - P. 3, filed 02/26/2026, with respect to independent claims 1, 8, and 17 have been fully considered but are not found convincing. The 35 U.S.C. 103 rejection of 11/26/2025 has NOT been withdrawn.

Regarding claims 1, 8, and 17, applicant argues that Wilson (US 2025/0157181 A1, hereinafter Wilson) fails to teach a "hair reflectance classifier that outputs a hair reflectance value". The examiner respectfully disagrees, as Wilson in Para [0055] teaches "The machine-learning model 252A is trained to receive an image of hair as input and determine hair condition 300A as output. The image of hair may user input data 370A received from a client device of a user. The hair condition 300A may include (but is not limited to) a current shade 310A, a current tone 320A, a current texture 330A, a current porosity 340A, and/or a percentage of gray 350A. The machine-learning model 252A encodes the hair condition into a set of values or a vector. For example, each of the shade 310A, tone 320A, texture 330A, porosity 340A, and percentage of gray 350A corresponds to a numerical value." The examiner believes that both tone and percentage of gray, alone or combined, can be interpreted as hair reflectance under the broadest reasonable interpretation. As such, the previous 35 U.S.C. 103 rejection is maintained.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
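To make the examiner's reading of Para [0055] concrete before turning to the detailed rejection: Wilson's model encodes the predicted hair condition as a vector of numerical attribute values, with tone and percentage of gray being the components argued to read on "hair reflectance". A minimal illustrative sketch — the class and field names below are ours, not Wilson's:

```python
# Illustrative only, not from Wilson: a hair condition encoded as the kind of
# numeric vector Para [0055] describes. Tone and percent_gray are the values
# the examiner argues map onto the claimed "hair reflectance".
from dataclasses import dataclass

@dataclass
class HairCondition:
    shade: float         # current shade level
    tone: float          # argued to read on reflectance
    texture: float
    porosity: float
    percent_gray: float  # also argued to read on reflectance

    def to_vector(self) -> list[float]:
        """Encode the condition as a set of numerical values (a vector)."""
        return [self.shade, self.tone, self.texture,
                self.porosity, self.percent_gray]

cond = HairCondition(shade=6.0, tone=3.0, texture=0.4,
                     porosity=0.7, percent_gray=12.0)
print(cond.to_vector())  # [6.0, 3.0, 0.4, 0.7, 12.0]
```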
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4, 8-9, 14-15, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wilson et al. (US 2025/0157181 A1, hereinafter Wilson) in view of Yu et al. (US 2021/0406996 A1, hereinafter Yu) and Sun et al. (US 2022/0012901 A1, hereinafter Sun).

Regarding claims 1, 8, and 17, Wilson discloses:

Claim 1: A computing device comprising a processor coupled to a storage device storing instructions that when executed by the processor cause the computing device to perform the claimed steps (Para [0130]: "FIG. 7 is a block diagram illustrating components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller).").

Claim 8: A method comprising: receiving an input image; and providing the hair shade and hair reflectance (Para [0055], quoted above: "The machine-learning model 252A is trained to receive an image of hair as input and determine hair condition 300A as output."). Step b. is rejected below, as the claim language of step b. is similar to that of claims 1 and 17.

Claim 17: A computer program product comprising a non-transient storage device storing computer executable instructions (Para [0032]: "In various embodiments, the online hair color formulation system 140 includes a memory 144 comprising a non-transitory storage medium on which instructions are encoded.") that, when executed by a processor of a computing device, cause the computing device to: classify hair shade and hair reflectance of hair in an input image using a neural network, i) a shade classifier that outputs a hair shade value and ii) a reflectance classifier that outputs a hair reflectance value (Para [0055], quoted above), the neural network having been defined through training with a sum of shade and reflectance cross entropy losses (Para [0108]: "The machine-learning model is applied to the training dataset. Parameters of the machine-learning model are modified during the training process to minimize a loss function indicating a difference between a prediction of an image and a label of the image in the training dataset. For a given hair image, the machine-learning model is trained to determine a set of values for a set of attributes associated with hair condition.") determined from i) outputs of the hair shade values and hair reflectance values, and ii) target labels for hair shade and hair reflectance, the target labels prepared for training images from respective expert votes by a plurality of experts (Para [0054]: "Referring to FIG. 3A, in some embodiments, the training data 242A includes images of hair and their corresponding hair conditions."; Para [0108], quoted above).

However, Wilson does not explicitly disclose the neural network comprising an encoding backbone coupled to a pair of classifiers, each of the pair of classifiers comprising a linear classifier; the neural network having been defined through training with a sum of shade and reflectance cross entropy losses; or target labels for hair shade and hair reflectance prepared for training images from respective expert votes by a plurality of experts.

Yu teaches the neural network comprising an encoding backbone coupled to a pair of classifiers (Fig. 3, Para [0036]: "FIG. 3 is a block diagram illustrating the deep learning network architecture. The 300 network comprises a convolutional neural network (CNN) 302 for processing a source image at an input layer 304. In an embodiment, CNN 302 is configured using a residual network based backbone having a plurality of residual blocks 306, to extract the shared feature."; Para [0037]: "A flattened feature vector 308 (for example, using average pooling) is obtained from the backbone net 302."; Para [0038]: "The feature vector 308 is duplicated for processing (e.g. in parallel) by a plurality (K) of classifiers 310 for each of the K facial attributes."), each of the pair of classifiers comprising a linear classifier (Para [0038]: "Each individual classifier (e.g. 3121, 3122, ... 312K) comprises one or a plurality of fully connected linear layers (3141, 3142, ... 314K) and a prediction block (3161, 3162, ... 316K) to output a final prediction for each attribute."), and target labels prepared for training images from respective expert votes by a plurality of experts (Para [0022]: "For example, for a hair color annotation, four people may say brown but two may say dark brown or even black."; TABLE 1: "16 Hair Color: Bald, White, Grey, Light Blonde, Blonde, Brown, Dark Brown, Black, Strawberry, Copper, Red, Auburn, Other"; Para [0041]: "As noted, the annotations for a particular attribute are not identical by each annotator. A 'truth' resolving rule is used to select among the annotation values (soft labels). One rule is a most common prediction rule choosing the most commonly predicted attribute from the set of predictions. One rule is a top two most common prediction rule."; Para [0101]: "At step 1204, operations store soft labels to the data set for each image of the plurality of images, the soft labels comprising respective attribute values, per attribute, as determined by a plurality of respective human image annotators acting independently.").
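The architecture the examiner maps onto the claims — Yu's shared encoding backbone feeding per-attribute linear classifier heads — can be sketched as follows. This is a hedged illustration: the ResNet-18 backbone and the ten-class head sizes are assumptions for concreteness, not details taken from Yu or from the claims.

```python
# Sketch of an encoding backbone coupled to a pair of linear classifiers,
# in the style of Yu Fig. 3. Backbone choice and class counts are assumed.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ShadeReflectanceNet(nn.Module):
    def __init__(self, n_shades: int = 10, n_reflectances: int = 10):
        super().__init__()
        backbone = resnet18(weights=None)      # residual backbone
        feat_dim = backbone.fc.in_features     # flattened feature dimension (512)
        backbone.fc = nn.Identity()            # expose the feature vector
        self.backbone = backbone
        self.shade_head = nn.Linear(feat_dim, n_shades)             # linear classifier i)
        self.reflectance_head = nn.Linear(feat_dim, n_reflectances) # linear classifier ii)

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)               # shared feature, fed to both heads
        return self.shade_head(feats), self.reflectance_head(feats)

model = ShadeReflectanceNet()
logits_shade, logits_refl = model(torch.randn(2, 3, 224, 224))
print(logits_shade.shape, logits_refl.shape)   # torch.Size([2, 10]) torch.Size([2, 10])
```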
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wilson with the neural network architecture of Yu — specifically, a backbone model outputting to multiple linear classifiers, with ground truth labels decided by multiple annotators for training the model — to effectively increase accuracy when determining facial features. In addition, Para [0056] of Wilson suggests that the CNN model 252A that determines hair condition can be multiple models trained to determine multiple attributes associated with the user's hair condition. As such, the 252A model of Wilson can simply be substituted with the Fig. 3 neural network architecture of Yu — a backbone net and multiple classifiers, each for a different attribute — to yield predictable results.

However, Wilson in view of Yu does not explicitly teach the neural network having been defined through training with a sum of shade and reflectance cross entropy losses.

Sun teaches the neural network having been defined through training with a sum of cross entropy losses (Para [0029]: "When training such networks, the cross entropy loss function may be used."; Para [0059]: "In detail, such a network structure would assign motion codes to a series of segmented actions as seen in demonstration videos; rather than learning to detect and output a single motion code, an ensemble of classifiers that can separately identify parts of the hierarchy in FIG. 1 can be used to build substrings that could then be appended together as a single, representative string"; Para [0079]: "The overall structure of the network 700 is shown in FIG. 7. All objective functions are cross-entropy losses. The motion code embedding model were trained separately before integrating it into action recognition model. The objective function of the motion code embedding model is defined as a linear combination of individual losses: ...").

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wilson in view of Yu with Sun's approach of determining an individual cross entropy loss for each classifier and combining the cross entropy losses for training, to effectively increase the learnability of models trained with multiple ground truth labels.

Regarding claim 4, dependent upon claim 1, Wilson in view of Yu and Sun teaches every element of claim 1. Yu further teaches the target labels comprise soft labels for at least some of the training images, the soft label determined from an empirical distribution of the expert votes over the respective classes (Para [0022], Para [0041], and Para [0101] of Yu, quoted above).
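The two training-side teachings just discussed — soft labels built as an empirical distribution over annotator votes (Yu) and a training loss that is the sum of per-classifier cross entropies (the Sun-style linear combination) — combine naturally. A minimal sketch with made-up votes and logits; probability-valued targets for F.cross_entropy assume PyTorch 1.10 or later:

```python
# Soft labels from expert votes + summed shade/reflectance cross-entropy loss.
# All vote data below is fabricated for illustration.
import torch
import torch.nn.functional as F

def soft_label(votes: list[int], n_classes: int) -> torch.Tensor:
    """Empirical distribution of the expert votes over the classes."""
    counts = torch.bincount(torch.tensor(votes), minlength=n_classes).float()
    return counts / counts.sum()

# e.g. five experts voting on shade and on reflectance (10 classes each)
shade_target = soft_label([5, 5, 6, 5, 4], n_classes=10)
refl_target = soft_label([3, 3, 3, 2, 3], n_classes=10)

# stand-ins for the two heads' outputs
logits_shade = torch.randn(1, 10, requires_grad=True)
logits_refl = torch.randn(1, 10, requires_grad=True)

# total loss = shade cross-entropy + reflectance cross-entropy
loss = (F.cross_entropy(logits_shade, shade_target.unsqueeze(0))
        + F.cross_entropy(logits_refl, refl_target.unsqueeze(0)))
loss.backward()
```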
Regarding claims 9 and 18, dependent upon claims 8 and 17 respectively, Wilson in view of Yu and Sun teaches every element of claims 8 and 17. Wilson further discloses the hair shade and hair reflectance are provided for training a generative model to simulate hair color (Para [0060]: "The machine-learning model 252B is trained to receive a current hair condition and desired hair condition pair as input 370B, and generate a hair color treatment 300B as output. The hair color treatment 300B may include one or more attributes of hair treatment, such as a shade level 310B, a tone level 320B, and/or a duration 330B for applying the level of shade 310A and the level of tone 320B to the hair. In some embodiments, the level of shade indicates how much a shade of the current hair needs to be darkened or lightened, and/or the level of tone indicates how much a tone of the current hair needs to be changed.").

Regarding claim 14, dependent upon claim 9, Wilson in view of Yu and Sun teaches every element of claim 9. Yu further teaches the target labels comprise soft labels for at least some of the training images, the soft label determined from an empirical distribution of the expert votes over the respective classes (Para [0022], Para [0041], and Para [0101] of Yu, quoted above).

Regarding claim 15, dependent upon claim 14, Wilson in view of Yu and Sun teaches every element of claim 14. Yu further teaches the input image is associated with at least one of the target labels and wherein step b. is performed to train the neural network using the input image (Para [0101]-[0102]: "At step 1202, operations store an attribute data set with which to train an attribute classifier that predicts attributes from an inference time image, the attribute data set comprising a plurality of images showing a plurality of attributes, each of the attributes having a plurality of respective attribute values. At step 1204, operations store soft labels to the data set for each image of the plurality of images, the soft labels comprising respective attribute values, per attribute, as determined by a plurality of respective human image annotators acting independently. At step 1206, operations provide the attribute data set for training the attribute classifier. Storing operations, in an embodiment, store to a data store such as, but not limited to, a database. In an embodiment, operations 1200 may further comprise (e.g. at step 1208) training the attribute classifier using the attribute data set. In an embodiment, when training, the method comprises using a 'truth' resolving rule to select a truth from among the soft labels").

Claims 2-3, 10, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wilson in view of Yu and Sun, and further in view of Neat (HOW TO DECODE THE HAIR COLOR NUMBERING SYSTEM, hereinafter Neat).

Regarding claim 2, dependent upon claim 1, Wilson in view of Yu and Sun teaches every element of claim 1. However, Wilson in view of Yu and Sun does not explicitly teach the neural network is configured to classify the hair shade and hair reflectance in accordance with a plurality of respective classes to follow an industry standard for hair color. Neat teaches the neural network is configured to classify the hair shade and hair reflectance in accordance with a plurality of respective classes to follow an industry standard for hair color (pp. 3-5: hair color numbering charts; images not reproduced).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wilson in view of Yu and Sun with Neat's industry standard numbering system for hair color as the output when classifying hair color in an image, with outputs that are numerical values of shade and reflectance (tone). In particular, at Para [0055] of Wilson, the machine learning model 252A receives an image of hair as input and outputs hair conditions 300A, which include a current shade 310A and a current tone 320A, as numerical values. The modification is simply substituting the numerical value output of Wilson with the industry standard hair color values of Neat, to yield predictable results that are understandable to industry professionals.

Regarding claim 3, dependent upon claim 1, Wilson in view of Yu and Sun teaches every element of claim 1. However, Wilson in view of Yu and Sun does not explicitly teach wherein the hair reflectance comprises a primary reflectance component and a secondary reflectance component. Neat teaches wherein the hair reflectance comprises a primary reflectance component and a secondary reflectance component (pp. 3-5: hair color numbering charts; images not reproduced).

Regarding claims 10 and 19, dependent upon claims 9 and 17 respectively, Wilson in view of Yu and Sun teaches every element of claims 9 and 17. However, Wilson in view of Yu and Sun does not explicitly teach the neural network is configured to classify the hair shade and hair reflectance in accordance with a plurality of respective classes to follow an industry standard for hair color, and wherein the hair reflectance comprises a primary reflectance component and a secondary reflectance component. Neat teaches both limitations (pp. 3-5: hair color numbering charts; images not reproduced).
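The numbering convention Neat describes, as commonly used in the industry, places a depth level before the point and primary/secondary reflect digits after it. A small illustrative formatter; the tone-name comments reflect general industry convention and are not quoted from Neat:

```python
# Format classifier outputs as an industry-style hair color code, e.g. "7.31":
# depth level 1-10, then primary and optional secondary reflect digits.
from typing import Optional

def color_code(shade_level: int, primary_reflect: int,
               secondary_reflect: Optional[int] = None) -> str:
    """Return e.g. '7.3' or '7.31' from predicted class indices."""
    code = f"{shade_level}.{primary_reflect}"
    if secondary_reflect is not None:
        code += str(secondary_reflect)
    return code

print(color_code(7, 3, 1))  # '7.31': level 7, primary reflect 3, secondary 1
```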
Claims 5, 11, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wilson in view of Yu and Sun, and further in view of Tanno et al. (Learning From Noisy Labels By Regularized Estimation Of Annotator Confusion, hereinafter Tanno).

Regarding claims 5, 11, and 16, dependent upon claims 1, 9, and 15 respectively, Wilson in view of Yu and Sun teaches every element of claims 1, 9, and 15. However, Wilson in view of Yu and Sun does not explicitly teach the neural network is trained with respective annotator confusion matrices comprising a plurality (n) of shade annotator confusion matrices and a plurality (n) of reflectance annotator confusion matrices, wherein n is defined from a total number of experts providing the expert votes; wherein the output from the shade classifier is multiplied by a respective one of the plurality of shade confusion matrices and the output from the reflectance classifier is multiplied by a respective one of the plurality of reflectance confusion matrices to predict the vote of a respective one of the plurality of experts; and wherein the neural network and respective annotator confusion matrices are trained together.

Tanno teaches each of these limitations (Figure 1, P. 3: "Figure 1: General schematic of the model (eq. 2) in the presence of 4 annotators. Given input image x, the classifier parametrised by θ generates an estimate of the ground truth class probabilities, p_θ(x). Then, the class probabilities of respective annotators p^(r)(x) := A^(r) p_θ(x) for r ∈ {1, 2, 3, 4} are computed. The model parameters {θ, A^(1), A^(2), A^(3), A^(4)} are optimized to minimize the sum of four cross-entropy losses between each estimated annotator distribution p^(r)(x) and the noisy labels ỹ^(r) observed from each annotator. The probability that each annotator provides accurate labels can be estimated by taking the average diagonal elements of the associated confusion matrix (CM), which we refer to as the 'skill level' of the annotator."; P. 3, Para. 2: "At inference time, we use the most confident class in p̂_θ(x) as the final classification output." and "Next, we describe our optimization algorithm for jointly learning the parameters of the base classifier, θ and the CMs.").

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wilson in view of Yu and Sun with Tanno's use of confusion matrices to model the annotators' labels, training the matrices along with the model at the same time, to effectively increase robustness and accuracy when training a machine learning model.
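The jointly trained scheme from Tanno's Figure 1 can be sketched compactly: one trainable confusion matrix per expert and per task, applied to the classifier's output distribution to predict that expert's vote, with the classifier and the matrices optimized together. An illustrative sketch — the sizes are assumptions, and the row-stochastic matrix convention below is the transpose of the paper's column convention:

```python
# Per-expert confusion matrices applied to shade and reflectance outputs,
# trained jointly with the classifier. Counts/sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_experts, n_classes = 4, 10
# One (C x C) matrix per expert per task, initialised near identity.
shade_cms = nn.Parameter(torch.eye(n_classes).repeat(n_experts, 1, 1))
refl_cms = nn.Parameter(torch.eye(n_classes).repeat(n_experts, 1, 1))

def annotator_loss(logits, cms, votes):
    """Sum over experts of CE between predicted and observed votes,
    with p_r(x)[i] = sum_j p_theta(x)[j] * A_r[j, i],
    A_r[j, i] ~ P(expert r votes i | true class j)."""
    p = F.softmax(logits, dim=-1)       # p_theta(x), shape (B, C)
    a = F.softmax(cms, dim=-1)          # keep each row a distribution
    loss = 0.0
    for r in range(cms.shape[0]):
        p_r = p @ a[r]                  # expert r's predicted vote distribution
        loss = loss + F.nll_loss(torch.log(p_r + 1e-8), votes[:, r])
    return loss

logits_shade = torch.randn(2, n_classes, requires_grad=True)
logits_refl = torch.randn(2, n_classes, requires_grad=True)
shade_votes = torch.randint(0, n_classes, (2, n_experts))  # one vote per expert
refl_votes = torch.randint(0, n_classes, (2, n_experts))

total = (annotator_loss(logits_shade, shade_cms, shade_votes)
         + annotator_loss(logits_refl, refl_cms, refl_votes))
total.backward()  # gradients reach the classifier outputs AND the matrices
```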
Regarding claim 20, dependent upon claim 17, Wilson in view of Yu and Sun teaches every element of claim 17. However, Wilson in view of Yu and Sun does not explicitly teach: a. execution of the instructions causes the computing device to provide the neural network for classification, wherein the neural network has been trained with respective annotator confusion matrices comprising a plurality (n) of shade annotator confusion matrices and a plurality (n) of reflectance annotator confusion matrices, wherein n is defined from a total number of experts providing the expert votes, and wherein the output from the shade classifier is multiplied by a respective one of the plurality of shade confusion matrices and the output from the reflectance classifier is multiplied by a respective one of the plurality of reflectance confusion matrices to predict the vote of a respective one of the plurality of experts; and wherein the neural network and respective annotator confusion matrices are trained together ... (Examiner points to the language "one or both of" in the preamble of claim 20 and the use of "or" between step a. and step b. of claim 20 as the reasoning for omitting the remainder of claim 20.)

Tanno teaches each of these limitations for the same reasons given above for claims 5, 11, and 16 (Figure 1, P. 3, quoted above; P. 3, Para. 2: "At inference time, we use the most confident class in p̂_θ(x) as the final classification output." and "Next, we describe our optimization algorithm for jointly learning the parameters of the base classifier, θ and the CMs.").

Claims 6-7 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Wilson in view of Yu and Sun, and further in view of Tarvainen et al. (Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, hereinafter Tarvainen).

Regarding claims 6 and 12, dependent upon claims 1 and 9 respectively, Wilson in view of Yu and Sun teaches every element of claims 1 and 9. However, Wilson in view of Yu and Sun does not explicitly teach the neural network is trained in accordance with a mean teacher framework. Tarvainen teaches the neural network is trained in accordance with a mean teacher framework (Figure 2, P. 3: "Figure 2: The Mean Teacher method. The figure depicts a training batch with a single labeled example. Both the student and the teacher model evaluate the input applying noise (η, η') within their computation. The softmax output of the student model is compared with the one-hot label using classification cost and with the teacher output using consistency cost. After the weights of the student model have been updated with gradient descent, the teacher model weights are updated as an exponential moving average of the student weights. Both model outputs can be used for prediction, but at the end of the training the teacher prediction is more likely to be correct. A training step with an unlabeled example would be similar, except no classification cost would be applied.").
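The mean teacher mechanics that passage describes — an exponential-moving-average teacher and a consistency cost between student and teacher predictions under different noise — can be sketched as follows. The linear model, alpha value, and MSE consistency cost are stand-ins for illustration:

```python
# Mean-teacher sketch: teacher weights are an EMA of the student's, and a
# consistency cost pulls the student toward the teacher under differing noise.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Linear(16, 10)           # stand-in classifier
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)           # teacher is never updated by gradients

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, alpha: float = 0.99):
    """teacher <- alpha * teacher + (1 - alpha) * student, per parameter."""
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(alpha).add_(sp, alpha=1 - alpha)

x = torch.randn(8, 16)
noisy_a = x + 0.1 * torch.randn_like(x)   # noise eta for the student
noisy_b = x + 0.1 * torch.randn_like(x)   # noise eta' for the teacher

consistency = F.mse_loss(F.softmax(student(noisy_a), dim=-1),
                         F.softmax(teacher(noisy_b), dim=-1))
consistency.backward()  # step the student with an optimizer here, then:
ema_update(teacher, student)
```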
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wilson in view of Yu and Sun with Tarvainen's mean teacher framework for training the classification model, to effectively increase the accuracy of the model.

Regarding claims 7 and 13, dependent upon claims 6 and 12 respectively, Wilson in view of Yu, Sun, and Tarvainen teaches every element of claims 6 and 12. Tarvainen further teaches the neural network comprises a teacher network obtained from the mean teacher framework (Figure 2, P. 3, quoted above).

Relevant Prior Art Directed to the State of the Art

Fu et al. (US 10,339,685 B2, hereinafter Fu) is prior art not applied in the rejection(s) above. Fu discloses a system and method to detect, analyze, and digitally remove makeup from an image of a face. An autoencoder-based framework is provided to extract attractiveness-aware features to perform an assessment of facial beauty.

Jiang et al. (US 2020/0170564 A1, hereinafter Jiang) is prior art not applied in the rejection(s) above. Jiang discloses a deep learning based system and method for skin diagnostics, as well as testing metrics showing that such a deep learning based system outperforms human experts on the task of apparent skin diagnostics. Also shown and described is a system and method of monitoring a skin treatment regime using a deep learning based system and method for skin diagnostics.

Ponlawan et al. (Guideline of Personalized Facial Makeup Using a Hierarchical Cascade Classifier, hereinafter Ponlawan) is prior art not applied in the rejection(s) above. Ponlawan discloses a hierarchical cascade classifier to develop a guideline for personalized facial makeup.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSHUA CHEN, whose telephone number is (703) 756-5394.
The examiner can normally be reached M-Th 9:30 am - 4:30 pm ET, F 9:30 am - 2:30 pm ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, STEPHEN R KOZIOL, can be reached at (408) 918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J. C./
Examiner, Art Unit 2665

/Stephen R Koziol/
Supervisory Patent Examiner, Art Unit 2665

Prosecution Timeline

Dec 07, 2023
Application Filed
Nov 26, 2025
Non-Final Rejection — §103, §DP
Feb 26, 2026
Response Filed
Mar 13, 2026
Final Rejection — §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602747
METHOD AND APPARATUS FOR DENOISING A LOW-LIGHT IMAGE
2y 5m to grant · Granted Apr 14, 2026
Patent 12592090
COMPENSATION OF INTENSITY VARIANCES IN IMAGES USED FOR COLONY ENUMERATION
2y 5m to grant · Granted Mar 31, 2026
Patent 12579614
IMAGING DEVICE
2y 5m to grant · Granted Mar 17, 2026
Patent 12579678
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT
2y 5m to grant · Granted Mar 17, 2026
Patent 12573065
VISION SENSING DEVICE AND METHOD
2y 5m to grant · Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
85%
Grant Probability
99%
With Interview (+26.1%)
2y 11m
Median Time to Grant
Moderate
PTA Risk
Based on 40 resolved cases by this examiner. Grant probability derived from career allow rate.
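The page does not spell out how the projection figures are combined; a plausible reading, consistent with the numbers shown, is the career allow rate plus the interview lift, capped below 100%:

```python
# Hedged guess at the projection arithmetic (the page's formula is not shown):
# base probability from the career allow rate, plus interview lift, capped.
base = 34 / 40                            # career allow rate -> 85%
lift = 0.261                              # interview lift
print(f"{base:.0%}")                      # 85%  (Grant Probability)
print(f"{min(base + lift, 0.99):.0%}")    # 99%  (With Interview, capped)
```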
