Office Action Analysis: 17896747 — ORDINAL CLASSIFICATION THROUGH NETWORK DECOMPOSITION

Examiner Intelligence

Examiner: BASOM, BLAINE T
Career allowance rate: 43% of resolved cases (140 granted / 326 resolved), -12.1% vs TC avg
Interview lift: +22.7% (resolved cases with interview)
Typical timeline: 4y 6m average prosecution; 23 currently pending
Career history: 364 total applications across all art units
Statute-Specific Performance

§101: 1.1% (-38.9% vs TC avg)
§103: 85.8% (+45.8% vs TC avg)
§102: 1.0% (-39.0% vs TC avg)
§112: 2.6% (-37.4% vs TC avg)
Based on career data from 326 resolved cases
Office Action

§103 §112
DETAILED ACTION
This Office Action is responsive to the Applicant’s submission, filed on January 2, 2026, amending claims 1-8, 11-18 and 20. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 5 and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
In particular, there is no antecedent basis for “each nominal classifiers,” which is recited in each of claims 5 and 15. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 6, 7, 9, 11, 16, 17, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the article entitled, “A Neural Approach to Ordinal Regression for the Preventive Assessment of Developmental Dyslexia” by Martinez-Murcia et al. (“Murcia”), and also over U.S. Patent Application Publication No. 2019/0156155 to Wang et al. (“Wang”).
Regarding claims 1, 11 and 20, Murcia “propose[s] a mixed neural model that calculates risk levels of dyslexia from tests that can be completed at the age of 5 years.”  Like claimed, Murcia particularly teaches:
learning, by an encoder neural network, compact representations of input data (Murcia teaches that the mixed neural model comprises an encoder and an ordinal neural regressor:
In this paper, we present a novel methodology to predict the risk of DD [developmental dyslexia] in 5 year old individuals based on the outcomes of tests designed by expert psychologists. These subjects were followed over 4 years (from 5 to 8 years old), until a consistent DD risk evaluation was performed at age 7. We apply autoencoders for obtaining a feature modelling of the test outcomes and then a ordinal neural regressor that tries to predict the risk levels using the data at age 7.
(Section 1 “Introduction;” emphasis added).

The resulting model is a combination of the encoder part of a DAE [denoising autoencoder] and an ordinal neural regressor, a 3-layer feed-forward network that uses the CORAL framework. The model architecture is displayed in more detail at Figure 1.a), b) and c).
(Section 2.5 “Full Model: Architecture and Training;” emphasis added).

[Image: media_image1.png]

Murcia particularly discloses that the encoder reduces the dimensionality of input data, and is trained alongside a corresponding decoder that then increases the dimensionality to reconstruct the input data:
Autoencoders (AEs) are a specific type of neural encoder-decoder architecture. It consists of a feed-forward neural network that reduces dimensionality (encoder), directly connected to a inverse network (a decoder, usually symmetric with the encoder) that increases dimensionality to reconstruct the original shape. Then, the network is trained to minimize the error between the input and the output. A typical variation is the denoising AE (DAE), in which the input is corrupted with noise and the network is expected to provide the original input, without noise, which is sometimes considered a regularization procedure. No further regularization was used.
In this work, we propose a hybrid model that trains the autoencoder and then reuses the encoder part to perform dimensionality reduction, as in [10]. The precise architecture uses symmetric encoder and decoder modules. There are three layers of N, 64 and 3 neurons for the encoder and 3, 64 and N for the decoder (where N is the number of tests included). We used 3 neurons in the Z-layer to favour a visual interpretation of the results in a three-dimensional space, and 64 neurons in the intermediate layers of the encoder and decoder were chosen after a careful systematic test of accuracy and visualization, in powers of 2. A higher number of neurons led to overfitting and lower explainability of the representation, and a smaller number of neurons yielded lower performance. Batch normalization is used for speeding up the convergence and the activation function for layers 1, 2, 4 and 5 is ELU. The intermediate layer (usually known as Z-layer) and output layers have linear activation. For training we use the Mean Squared Error (MSE) between the input and output data as loss, and the Adamax optimizer [7].
(Section 2.3 “Denoising Autoencoder;” emphasis added).

The dimensionally-reduced input data learned by the encoder via the training is considered “compact neural representations” like claimed.);
freezing the encoder neural network for downstream tasks (Murcia teaches that, in some embodiments, the encoder can be pretrained and then locked while training the ordinal neural regressor:
The cost functions for the encoder and the regressor are defined at Sections 2.3 and 2.4, and Adam with lr = 0.01 is used to train the whole system (with or without locking the encoder). We applied early stopping in both cases, that is, the training was stopped after 150 epochs if there was no improvement in validation loss, in order to retain the best model.
(Section 2.5 “Full Model: Architecture and Training;” emphasis added).

Finally, we define the following models to be compared in our work:
PCA.  A model composed of a decomposition of the dataset using Principal Component Analysis and a CORAL regression (Fig. 1b) on the component scores for each subject. Note that only the training subset is used to create the PCA model and project the test set.
Pretraining.  The proposed model in which the autoencoder (Fig. 1.a) is first trained, and then the model is built with the neural regressor (Fig. 1.b) and the pre-trained encoder (Fig. 1.c). The encoder is locked and only the neural regressor is trained.
Retraining.  The AE is pre-trained as in the previous model (Fig. 1.a), but this time, the full model (Fig. 1.c) including the encoder and the neural regressor are trained simultaneously.
(Section 2.6 “Evaluation;” emphasis added).

In such embodiments, the encoder is considered to be frozen for downstream tasks, i.e. for the downstream, ordinal neural regressor.);
training, by a hardware processor, K-1 binary classifiers on top of the compact neural representations to obtain trained K-1 binary classifiers (As noted above, Murcia teaches that the encoder can be pretrained and then locked while training the ordinal neural regressor.  Murcia further discloses that the ordinal neural regressor provides a series of K-1 binary classifiers:
Many ordinal regression methods have been proposed. The most widespread consists on dividing the grading problem in a series of binary classifiers, each of which indicates if a certain threshold has been surpassed [9,13]. However, most of these systems deal with inconsistency in the classifier when the training complexity increases. That is, some binary classifiers may indicate the grade is above a given threshold, whereas others may not. In [2], the authors propose a Consistent Rank Logits (CORAL) ordinal regression to implement the binary classifiers with parameter sharing in the weights of the last layers, but with individual biases in each neuron, accomplishing theoretical classifier consistence.
(Section 1 “Introduction;” emphasis added).

To perform ordinal regression, we use the Consistent Rank Logits (CORAL) approach proposed in [2]. CORAL is devised to create an ordinal regression framework with theoretical guarantees for classifier consistency, in contrast to other methods in the literature [13]. The procedure consists of two major contributions. First, a label extension, by which the rank level $y_i$ is extended into K − 1 binary labels $\{t_{i,0}, \ldots, t_{i,K-1}\}$ such that $t_{i,j} \in \{0,1\}$ indicates whether $y_i$ exceeds a given rank $(y_i > r_k)$ as in [13]. This is implemented at the output layer of the regression network, via a layer with K − 1 binary neuron classifiers sharing the same weight parameter but independent bias units, which according to [2] solves the inconsistency problem among predicted binary responses. The predicted rank is obtained as:

$$r_i = \sum_{j=0}^{K-2} o_{i,j} \qquad (1)$$

where $o_{i,j}$ is the output (linear activation) of the jth neuron for the ith subject, also known as logit.
The second key aspect of the CORAL regression is the loss function. To calculate the loss between $o_{i,j}$ and the target level $t_{i,j}$, the authors propose:

$$L(o, 1) = \sum_{n} \sum_{j} t_{n,j} \log s(o_{n,j}) + (1 - t_{n,j}) \left( \log s(o_{n,j}) - o_{n,j} \right) \qquad (2)$$

where $s(\cdot)$ is the sigmoid function. An optional feature importance variable could multiply the second term to adjust for label prevalence, although adding it did not increased the performance significantly. Furthermore, since it also implied making assumptions about the real distribution of subjects, we chose not to use this importance term.
(Section 2.4 “Ordinal Neural Regression;” emphasis added).

K-1 binary classifiers are thus trained, necessarily with a hardware processor, on top of the frozen pretrained encoder, i.e. on top of the compact neural representations provided thereby, to obtain trained K-1 binary classifiers.); and 
generating, by the hardware processor, a predicted ordinal label by aggregating the trained K-1 binary classifiers (Murcia discloses that a predicted rank is obtained by summing the outputs of the K-1 binary neuron classifiers:

To perform ordinal regression, we use the Consistent Rank Logits (CORAL) approach proposed in [2]. CORAL is devised to create an ordinal regression framework with theoretical guarantees for classifier consistency, in contrast to other methods in the literature [13]. The procedure consists of two major contributions. First, a label extension, by which the rank level $y_i$ is extended into K − 1 binary labels $\{t_{i,0}, \ldots, t_{i,K-1}\}$ such that $t_{i,j} \in \{0,1\}$ indicates whether $y_i$ exceeds a given rank $(y_i > r_k)$ as in [13]. This is implemented at the output layer of the regression network, via a layer with K − 1 binary neuron classifiers sharing the same weight parameter but independent bias units, which according to [2] solves the inconsistency problem among predicted binary responses. The predicted rank is obtained as:

$$r_i = \sum_{j=0}^{K-2} o_{i,j} \qquad (1)$$

where $o_{i,j}$ is the output (linear activation) of the jth neuron for the ith subject, also known as logit.
(Section 2.4 “Ordinal Neural Regression;” emphasis added).

Accordingly, the predicted ordinal label, i.e. rank, is generated by aggregating, necessarily with a hardware processor, the output of the trained K-1 binary classifiers.).
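The mapping of Murcia onto the claimed steps can be made concrete with a small sketch. It is illustrative only and is not Murcia's or the Applicant's implementation: `frozen_encoder` is a hypothetical fixed projection standing in for a pretrained, locked encoder, the toy data and all function names are assumptions, the loss is the negated form of the sum quoted from Section 2.4 so that it can be minimized, and each output is thresholded at 0.5 before summing (the equation as quoted sums the outputs directly).

```python
import math

def extend_labels(y, num_ranks):
    # Label extension: rank y in {0, ..., K-1} becomes K-1 binary targets,
    # with t_j = 1 when y exceeds threshold rank j.
    return [1 if y > j else 0 for j in range(num_ranks - 1)]

def sigmoid(o):
    return 1.0 / (1.0 + math.exp(-o))

def log_sigmoid(o):
    # Numerically stable log(sigmoid(o)).
    return min(o, 0.0) - math.log1p(math.exp(-abs(o)))

def frozen_encoder(x):
    # Hypothetical stand-in for a pretrained, locked encoder: a fixed
    # 2-D -> 1-D projection whose parameters are never updated below.
    return [0.5 * x[0] + 0.5 * x[1]]

def logits(z, w, biases):
    # K-1 binary classifiers sharing one weight vector, one bias each.
    shared = sum(wi * zi for wi, zi in zip(w, z))
    return [shared + b for b in biases]

def coral_loss(data, w, biases, num_ranks):
    # Negated form of the quoted sum, so that lower is better (assumption).
    total = 0.0
    for x, y in data:
        o = logits(frozen_encoder(x), w, biases)
        t = extend_labels(y, num_ranks)
        total -= sum(tj * log_sigmoid(oj) + (1 - tj) * (log_sigmoid(oj) - oj)
                     for oj, tj in zip(o, t))
    return total

def train_heads(data, num_ranks, epochs=500, lr=0.1):
    # Only w and biases are updated; frozen_encoder is left untouched.
    w = [0.0] * len(frozen_encoder(data[0][0]))
    biases = [0.0] * (num_ranks - 1)
    for _ in range(epochs):
        for x, y in data:
            z = frozen_encoder(x)
            t = extend_labels(y, num_ranks)
            # Gradient of the loss w.r.t. each logit is sigmoid(o_j) - t_j.
            errs = [sigmoid(oj) - tj for oj, tj in zip(logits(z, w, biases), t)]
            w = [wi - lr * sum(errs) * zi for wi, zi in zip(w, z)]
            biases = [b - lr * e for b, e in zip(biases, errs)]
    return w, biases

def predict_rank(x, w, biases):
    # Aggregation: count the binary classifiers whose probability exceeds 0.5.
    o = logits(frozen_encoder(x), w, biases)
    return sum(1 for oj in o if sigmoid(oj) > 0.5)

# Toy data (assumed): four inputs whose rank grows with the encoded value.
TOY_DATA = [((0.0, 0.0), 0), ((1.0, 1.0), 1), ((2.0, 2.0), 2), ((3.0, 3.0), 3)]
w, biases = train_heads(TOY_DATA, num_ranks=4)
print(coral_loss(TOY_DATA, [0.0], [0.0, 0.0, 0.0], 4),
      coral_loss(TOY_DATA, w, biases, 4))
```

The sketch keeps the three stages separate (fixed representation, K-1 heads trained on top of it, aggregation into one rank), which is the structure the rejection reads onto the claim language.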
Murcia thus teaches a computer-implemented method similar to that of claim 1, which is for ordinal classification of input data.  These teachings are necessarily implemented via computer program instructions stored on and executed by a computer comprising memory for storing the computer program instructions and at least one processor for executing the instructions.  The memory of such a computer comprising instructions to perform the above-described tasks taught by Murcia is considered a computer program product similar to that of claim 11.  Such a computer comprising a memory and a processor for performing the above-described tasks taught by Murcia is considered a computer processing system similar to that of claim 20.  Murcia, however, does not explicitly teach that input data of a same class are positioned apart below a first threshold, and that input data of different class are positioned apart above the first threshold, as is required by each of claims 1, 11 and 20.
Wang nevertheless teaches learning, by an encoder neural network, compact representations (i.e. feature vectors) of input data, including by optimizing the encoder neural network such that input data belonging to a same class are positioned apart below a first threshold (i.e. below an intra-class distance threshold), and input data of a different class are positioned apart above the first threshold (i.e. above twice the intra-class distance threshold) (see e.g. paragraphs 0007, 0028 and 0031-0036).
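As a rough illustration of the distance constraint the Examiner attributes to Wang, the sketch below penalizes same-class pairs that are farther apart than a threshold and different-class pairs that are closer than twice that threshold. The hinge form, the Euclidean metric, and the names `pairwise_threshold_loss` and `tau` are assumptions made for this example; they are not taken from Wang.

```python
import math

def pairwise_threshold_loss(embeddings, labels, tau):
    # Same-class pairs should lie within tau of each other;
    # different-class pairs should lie more than 2 * tau apart.
    loss = 0.0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = math.dist(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                loss += max(0.0, d - tau)        # too far for the same class
            else:
                loss += max(0.0, 2.0 * tau - d)  # too close for different classes
    return loss

# Two class-0 points close together and one distant class-1 point.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
print(pairwise_threshold_loss(points, [0, 0, 1], tau=1.0))
```

A loss of zero means every pair already satisfies both thresholds; an encoder optimized against such a penalty would be pushed toward that arrangement.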
It would have been obvious to one of ordinary skill in the art, having the teachings of Murcia and Wang before the effective filing date of the claimed invention, to modify the method, computer program product and computer processing system taught by Murcia such that the compact representations are learned by optimizing the encoder neural network so that input data of a same class are positioned apart below a first threshold, and input data of a different class are positioned apart above the first threshold, as is taught by Wang.  It would have been advantageous to one of ordinary skill to utilize such a combination because it can improve the classification functionality provided by a neural network, as is suggested by Wang (see e.g. paragraphs 0003 and 0028).  Accordingly, Murcia and Wang are considered to teach, to one of ordinary skill in the art, a computer-implemented method like that of claim 1, a computer program product like that of claim 11, and a computer processing system like that of claim 20.
As per claims 6 and 16, it would have been obvious, as is described above, to modify the method and computer program product taught by Murcia such that the compact representations are learned by optimizing the encoder neural network so that input data of a same class are positioned apart below a first threshold, and input data of a different class are positioned apart above the first threshold, as is taught by Wang.  Wang particularly teaches that the learning comprises optimizing the encoder neural network such that (a) input data belonging to a same class is close in an encoded space by a same class threshold amount, and (b) input data belonging to a different class is far in the encoded space by a different class threshold amount (see e.g. paragraphs 0007, 0028 and 0031-0035).  Accordingly, the above-described combination of Murcia and Wang further teaches a computer-implemented method like that of claim 6 and a computer program product like that of claim 16.
As per claims 7 and 17, it would have been obvious, as is described above, to modify the method and computer program product taught by Murcia such that the compact representations are learned by optimizing the encoder neural network so that input data of a same class are positioned apart below a first threshold (i.e. within a same class threshold amount), and input data of a different class are positioned apart above the first threshold (i.e. above a different class threshold amount), as is taught by Wang.  Wang particularly teaches that the different class threshold amount is greater than twice the same class threshold amount (see e.g. paragraph 0007), and so it is apparent that input data belonging to different classes would not overlap in the encoded space.  Accordingly, the above-described combination of Murcia and Wang further teaches a computer-implemented method like that of claim 7 and a computer program product like that of claim 17.
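The non-overlap inference can be spelled out with a short derivation. It assumes, for illustration only, a distance metric $d$ on the encoded space and a same-class threshold $\tau$, with different-class encodings more than $2\tau$ apart; these symbols are introduced here and do not come from the Office Action or from Wang.

```latex
Let $a$ and $b$ be encodings of inputs from different classes, so that
$d(a,b) > 2\tau$. Suppose some point $x$ lay within $\tau$ of both, i.e.
$d(a,x) < \tau$ and $d(x,b) < \tau$. By the triangle inequality,
\[
  d(a,b) \;\le\; d(a,x) + d(x,b) \;<\; \tau + \tau \;=\; 2\tau ,
\]
which contradicts $d(a,b) > 2\tau$. Hence the balls $B(a,\tau)$ and
$B(b,\tau)$ are disjoint. Since every same-class encoding lies within
$\tau$ of $a$ (respectively $b$), the two classes occupy disjoint regions
of the encoded space.
```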
As per claims 9 and 19, Murcia teaches that the training step trains the K-1 binary classifiers such that a kth binary classifier is given by $z_k$ and is defined as:

$$z_k(f(x_i)) = \begin{cases} 1, & \text{if } y_i > k \\ 0, & \text{otherwise} \end{cases},$$

where $x_i$ denotes the ith input, $y_i$ denotes the ordinal label for $x_i$, and k denotes the number of the classifier being considered (Murcia teaches that the K-1 binary classifiers are trained such that a jth binary classifier for an ith input is given by $t_{i,j}$ and is defined as:

$$t_{i,j} = \begin{cases} 1, & \text{if } y_i > r_k \\ 0, & \text{otherwise} \end{cases}$$

where $y_i$ denotes the ordinal label, i.e. a rank level, for the ith input, and $r_k$ denotes the number, i.e. a rank, associated with the classifier being considered:
To perform ordinal regression, we use the Consistent Rank Logits (CORAL) approach proposed in [2]. CORAL is devised to create an ordinal regression framework with theoretical guarantees for classifier consistency, in contrast to other methods in the literature [13]. The procedure consists of two major contributions. First, a label extension, by which the rank level $y_i$ is extended into K − 1 binary labels $\{t_{i,0}, \ldots, t_{i,K-1}\}$ such that $t_{i,j} \in \{0, 1\}$ indicates whether $y_i$ exceeds a given rank $(y_i > r_k)$ as in [13]. This is implemented at the output layer of the regression network, via a layer with K − 1 binary neuron classifiers sharing the same weight parameter but independent bias units, which according to [2] solves the inconsistency problem among predicted binary responses. The predicted rank is obtained as:

$$r_i = \sum_{j=0}^{K-2} o_{i,j} \qquad (1)$$
where $o_{i,j}$ is the output (linear activation) of the jth neuron for the ith subject, also known as logit.
The second key aspect of the CORAL regression is the loss function. To calculate the loss between $o_{i,j}$ and the target level $t_{i,j}$, the authors propose:

$$L(o, 1) = \sum_{n} \sum_{j} t_{n,j} \log s(o_{n,j}) + (1 - t_{n,j}) \left( \log s(o_{n,j}) - o_{n,j} \right) \qquad (2)$$
where $s(\cdot)$ is the sigmoid function. An optional feature importance variable could multiply the second term to adjust for label prevalence, although adding it did not increased the performance significantly. Furthermore, since it also implied making assumptions about the real distribution of subjects, we chose not to use this importance term.
(Section 2.4 “Ordinal Neural Regression;” emphasis added).
Murcia is thus considered to teach an equation like that of claims 9 and 19.  Accordingly, the above-described combination of Murcia and Wang further teaches a computer-implemented method like that of claim 9 and a computer program product like that of claim 19.
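
The label extension and the double sum of equation (2) quoted from Murcia can be illustrated with a minimal Python sketch. The function names (`extend_label`, `log_sigmoid`, `coral_objective`) are hypothetical, and the rank thresholds are assumed to be r_j = j for ranks numbered 0 to K − 1; neither the cited reference nor the claims are stated to use this exact form.

```python
import math

def extend_label(rank, num_ranks):
    """Extend an ordinal rank y_i in {0, ..., K-1} into K-1 binary labels.

    Assumption: the rank thresholds are r_j = j, so t_{i,j} = 1 when y_i > j.
    """
    return [1 if rank > j else 0 for j in range(num_ranks - 1)]

def log_sigmoid(x):
    """Numerically stable log(s(x)), where s is the sigmoid function."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def coral_objective(logits, targets):
    """Evaluate the double sum of equation (2) exactly as quoted.

    logits[n][j] plays the role of o_{n,j}; targets[n][j] of t_{n,j}.
    """
    total = 0.0
    for o_row, t_row in zip(logits, targets):
        for o, t in zip(o_row, t_row):
            # log s(o) - o equals log(1 - s(o)), the "label is 0" term.
            total += t * log_sigmoid(o) + (1 - t) * (log_sigmoid(o) - o)
    return total

# Rank 2 out of K = 4 levels becomes three binary labels.
labels = extend_label(2, 4)
# With all logits at 0, every term contributes log(0.5).
value = coral_objective([[0.0, 0.0, 0.0]], [labels])
```

Under the assumed thresholds, summing the binary labels recovers the original rank. Equation (2) as quoted carries no leading minus sign, so a training procedure that minimizes a loss would presumably use the negative of this sum; that reading is an assumption and is not stated in the quoted passage.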

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Murcia and Wang, which is described above, and also over the article entitled “Vector Embeddings with Subvector Permutation Invariance Using a Triplet Enhanced Autoencoder” by Mark Alan Matties (“Matties”).
Regarding claims 2 and 12, Murcia and Wang teach a computer-implemented method like that of claim 1 and a computer program product like that of claim 11, as is described above, which entail learning compact representations of input data by an encoder neural network.  Murcia and Wang, however, do not explicitly disclose that the learning step uses a triplet loss, as is required by claims 2 and 12.
Matties nevertheless generally teaches learning compact representations (i.e. embedding representations) of input data by an encoder neural network, and particularly wherein the learning of the compact representations uses a triplet loss (see e.g. section 2 “Background and Related Work”). 
It would have been obvious to one of ordinary skill in the art, having the teachings of Murcia, Wang and Matties before the effective filing date of the claimed invention, to modify the method and computer program product taught by Murcia and Wang such that the learning of the compact neural representations of the input data uses a triplet loss like taught by Matties.  It would have been advantageous to one of ordinary skill to utilize such a triplet loss because it can reduce the distance between two inputs that are similar, and increase the distance between two samples that are dissimilar, as is taught by Matties (see e.g. section 2.2 “Triplet Loss”).  Accordingly, Murcia, Wang and Matties are considered to teach, to one of ordinary skill in the art, a computer-implemented method like that of claim 2 and a computer program product like that of claim 12.
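
The behavior attributed to the triplet loss above (pulling similar inputs together and pushing dissimilar inputs apart) can be sketched in Python as follows. This is the commonly used hinge form with a Euclidean distance; the function name `triplet_loss`, the `margin` parameter and the choice of distance are illustrative assumptions, not a reproduction of the formulation in Matties.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors (illustrative form).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`; otherwise it is positive.
    """
    d_pos = math.dist(anchor, positive)  # distance to a same-class sample
    d_neg = math.dist(anchor, negative)  # distance to a different-class sample
    return max(0.0, d_pos - d_neg + margin)

# Well-separated triplet: positive at distance 1, negative at distance 5.
separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Violating triplet: the roles of the two samples are swapped.
violating = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Minimizing this quantity over many triplets reduces the anchor-positive distance and increases the anchor-negative distance, which is the effect the rejection relies on.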

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Murcia and Wang, which is described above, and also over the article entitled “Automatic Features Extraction Using Autoencoder in Intrusion Detection System” by Kunang et al. (“Kunang”).
Regarding claims 3 and 13, Murcia and Wang teach a computer-implemented method like that of claim 1 and a computer program product like that of claim 11, as is described above, which entail learning compact representations of input data by an encoder neural network.  Murcia and Wang, however, do not explicitly disclose that the learning step uses a cross-entropy loss, as is required by claims 3 and 13.
Kunang nevertheless generally teaches learning compact representations (i.e. performing feature extraction) of input data by an encoder neural network, and particularly wherein the learning of the compact representations uses a cross-entropy loss (see e.g. the Abstract and section III.A “Autoencoder”). 
It would have been obvious to one of ordinary skill in the art, having the teachings of Murcia, Wang and Kunang before the effective filing date of the claimed invention, to modify the method and computer program product taught by Murcia and Wang such that the learning of the compact neural representations of the input data uses a cross-entropy loss like taught by Kunang.  It would have been advantageous to one of ordinary skill to utilize such a cross-entropy loss because it can result in more accurate classifier performance, as is taught by Kunang (see e.g. the Abstract and section IV.A “Model Hyperparameter”).  Accordingly, Murcia, Wang and Kunang are considered to teach, to one of ordinary skill in the art, a computer-implemented method like that of claim 3 and a computer program product like that of claim 13.

Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Murcia and Wang, which is described above, and also over the article entitled “Dimensionality Reduction by Learning an Invariant Mapping” by Hadsell et al. (“Hadsell”).
Regarding claims 4 and 14, Murcia and Wang teach a computer-implemented method like that of claim 1 and a computer program product like that of claim 11, as is described above, which entail learning compact representations of input data by an encoder neural network.  Murcia and Wang, however, do not explicitly disclose that the learning step uses a contrastive loss, as is required by claims 4 and 14.
Hadsell nevertheless generally teaches learning compact representations of input data by an encoder (i.e. by a function that maps a high dimensional input pattern to a low-dimensional output), and particularly wherein the learning of the compact representations uses a contrastive loss (see e.g. section 2 “Learning the Low Dimensional Mapping”).
It would have been obvious to one of ordinary skill in the art, having the teachings of Murcia, Wang and Hadsell before the effective filing date of the claimed invention, to modify the method and computer program product taught by Murcia and Wang such that the learning of the compact neural representations of the input data uses a contrastive loss like taught by Hadsell.  It would have been advantageous to one of ordinary skill to utilize such a contrastive loss because it would provide for a mapping of similar inputs to nearby points in a latent space, as is taught by Hadsell (see e.g. section 2 “Learning the Low Dimensional Mapping”).  Accordingly, Murcia, Wang and Hadsell are considered to teach, to one of ordinary skill in the art, a computer-implemented method like that of claim 4 and a computer program product like that of claim 14.
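
For comparison, a contrastive loss of the kind attributed to Hadsell operates on pairs rather than triplets. The Python sketch below uses the widely cited form in which a pair label of 0 marks similar inputs and 1 marks dissimilar inputs; the function name `contrastive_loss` and the `margin` default are illustrative assumptions.

```python
import math

def contrastive_loss(x1, x2, dissimilar, margin=1.0):
    """Pairwise contrastive loss on two embedding vectors (illustrative form).

    dissimilar = 0: the pair is similar, so any distance is penalized.
    dissimilar = 1: the pair is dissimilar, so only distances inside
    the margin are penalized.
    """
    d = math.dist(x1, x2)
    similar_term = 0.5 * d ** 2
    dissimilar_term = 0.5 * max(0.0, margin - d) ** 2
    return (1 - dissimilar) * similar_term + dissimilar * dissimilar_term

# A similar pair that sits far apart (distance 5) is penalized.
far_similar = contrastive_loss([0.0, 0.0], [3.0, 4.0], dissimilar=0)
# A dissimilar pair beyond the margin costs nothing.
far_dissimilar = contrastive_loss([0.0, 0.0], [3.0, 4.0], dissimilar=1)
# A dissimilar pair inside the margin (distance 0.5) is penalized.
near_dissimilar = contrastive_loss([0.0], [0.5], dissimilar=1)
```

The similar-pair term is what maps similar inputs to nearby points in the latent space, the advantage cited in the rejection.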

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Murcia and Wang, which is described above, and also over the article entitled “Discriminative Auto-Encoding for Classification and Representation Learning Problems” by Bansal et al. (“Bansal”).
Regarding claims 5 and 15, Murcia and Wang teach a computer-implemented method like that of claim 1 and a computer program product like that of claim 11, as is described above, which entail training K-1 ordinal classifiers on top of compact neural representations to obtain trained K-1 ordinal classifiers.  Murcia and Wang, however, do not explicitly disclose that the training step comprises discarding a last classification layer of each nominal classifier responsive to the compact neural representations having at least some overlap, as is required by claims 5 and 15.
Bansal nevertheless teaches adding a classifier (i.e. a discriminator) to train an encoder, whereby the encoder learns discriminative features of the input data: the compact neural (i.e. latent space) representations produced by the encoder for the input data for different classes thereby become non-overlapping (see e.g. the abstract, section II “Approach” and FIGS. 1 and 2).
It would have been obvious to one of ordinary skill in the art, having the teachings of Murcia, Wang and Bansal before the effective filing date of the claimed invention, to modify the method and computer program product taught by Murcia and Wang such that, if the compact neural representations produced by the encoder are not discriminative (i.e. have at least some overlap), a classifier is added to train the encoder like taught by Bansal.  It would have been advantageous to one of ordinary skill to utilize such a combination because it can improve predictive tasks, as is taught by Bansal (see e.g. the abstract).  Murcia teaches that, once the encoder is trained, the discriminator used to train the encoder is discarded (see e.g. section 2.3 “Denoising Autoencoder,” which states “we propose a hybrid model that trains the autoencoder and then reuses the encoder part to perform dimensionality reduction”).  It thus follows that the classifier used to train the encoder would likewise be discarded.  Murcia, Wang and Bansal are therefore considered to teach using and then discarding a classifier (including a last classification layer) responsive to the compact neural representations produced by the encoder having at least some overlap.  The classifier can be considered a nominal classifier.  Accordingly, Murcia, Wang and Bansal are considered to teach, to one of ordinary skill in the art, a computer-implemented method like that of claim 5 and a computer program product like that of claim 15.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Murcia and Wang, which is described above, and also over the article entitled “Data-driven Prognostics with Predictive Uncertainty Estimation using Ensemble of Deep Ordinal Regression Models” by TV et al. (“TV”).
Regarding claim 8, Murcia and Wang teach a computer-implemented method like that of claim 1, as is described above, which entails learning, by an encoder neural network, compact neural representations of input data.  Murcia and Wang, however, do not teach that the given input is a time series and that the encoder neural network comprises at least one Long Short-Term Memory (LSTM), as is required by claim 8.
TV generally teaches applying ordinal regression to provide Remaining Useful Life (RUL) estimates from multi-sensor time series data (see e.g. the abstract).  TV suggests that, to generate such estimates, compact representations of the time-series input data are learned by an encoder neural network, wherein the encoder neural network comprises at least one LSTM to generate the compact neural representations (see e.g. the abstract, section 3 “Background: Deep LSTM Networks,” section 4.2 “LSTM-based Ordinal Regression” and FIG. 1(b)).
It would have been obvious to one of ordinary skill in the art, having the teachings of Murcia, Wang and TV before the effective filing date of the claimed invention, to apply the method taught by Murcia and Wang to time series input data like taught by TV, wherein the encoder neural network comprises at least one LSTM like taught by TV to generate the compact neural representations.  It would have been advantageous to one of ordinary skill to utilize such a combination because the resulting model could be used for additional applications, such as the remaining useful life estimation taught by TV (see e.g. the abstract).  Accordingly, Murcia, Wang and TV are considered to teach, to one of ordinary skill in the art, a computer-implemented method like that of claim 8.
Regarding claim 18, Murcia and Wang teach a computer program product like that of claim 11, as is described above, which entails learning, by an encoder neural network, compact neural representations of input data.  Murcia and Wang, however, do not teach that the encoder neural network comprises at least one Long Short-Term Memory (LSTM), as is required by claim 18.
As noted above, TV generally teaches applying ordinal regression to provide Remaining Useful Life (RUL) estimates from multi-sensor time series data (see e.g. the abstract).  TV suggests that, to generate such estimates, compact representations of the time-series input data are learned by an encoder neural network, wherein the encoder neural network comprises at least one LSTM to generate the compact neural representations (see e.g. the abstract, section 3 “Background: Deep LSTM Networks,” section 4.2 “LSTM-based Ordinal Regression” and FIG. 1(b)).
It would have been obvious to one of ordinary skill in the art, having the teachings of Murcia, Wang and TV before the effective filing date of the claimed invention, to apply the computer program product taught by Murcia and Wang to time series input data like taught by TV, wherein the encoder comprises at least one LSTM like taught by TV to generate the compact neural representations.  It would have been advantageous to one of ordinary skill to utilize such a combination because the resulting model could be used for additional applications, such as the remaining useful life estimation taught by TV (see e.g. the abstract).  Accordingly, Murcia, Wang and TV are considered to teach, to one of ordinary skill in the art, a computer program product like that of claim 18.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Murcia and Wang, which is described above, and also over the article entitled “A Cluster-then-label Semi-supervised Learning Approach for Pathology Image Classification” by Peikari et al. (“Peikari”).
Regarding claim 10, Murcia and Wang teach a computer-implemented method like that of claim 1, as is described above, which entails learning compact neural representations of input data, and generating a predicted ordinal label by aggregating trained K-1 ordinal classifiers.  Murcia and Wang, however, do not explicitly teach performing a semi-supervised ordinal classification task by clustering unlabeled data to at least some of the compact representations, as is required by claim 10.
Peikari nevertheless teaches performing a semi-supervised classification task by clustering unlabeled data to at least some compact representations associated with labeled data (see e.g. “Proposed Method” on pages 3 and 4).
It would have been obvious to one of ordinary skill in the art, having the teachings of Murcia, Wang and Peikari before the effective filing date of the claimed invention, to modify the computer-implemented method taught by Murcia and Wang so as to perform a semi-supervised classification task (i.e. an ordinal classification task) by clustering unlabeled data to at least some of the compact representations like taught by Peikari.  It would have been advantageous to one of ordinary skill to utilize such a combination because it would enable learning from fewer labeled data points, as is taught by Peikari (see e.g. page 1).  Accordingly, Murcia, Wang and Peikari are considered to teach, to one of ordinary skill in the art, a computer-implemented method like that of claim 10.

Response to Arguments
The Examiner acknowledges the Applicant’s amendments to claims 1-8, 11-18 and 20.  In response to these amendments, the objections presented in the previous Office Action to claims 1-20 are respectfully withdrawn, as are the 35 U.S.C. § 112(a) rejections presented in the previous Office Action to claims 2, 4, 5, 12, 14 and 15.   The Examiner respectfully notes, however, that the Applicant’s amendments have necessitated the new 35 U.S.C. § 112(b) or 35 U.S.C. § 112 (pre-AIA ), second paragraph, rejections presented above with respect to claims 5 and 15.
Regarding the prior art rejections, the Applicant generally argues that the references cited in the previous Office Action fail to teach or suggest at least “learning, by an encoder neural network, compact neural representations of the input data” and “wherein input data of a same class are positioned apart below a first threshold, and input data of a different class positioned apart above the first threshold,” as is now recited in each of independent claims 1, 11 and 20.
The Examiner, however, respectfully disagrees and submits that the combination of Murcia and Wang provides such a teaching.  In particular, Murcia teaches “learning, by an encoder neural network, compact neural representations of input data” (see e.g. Section 2.3 “Denoising Autoencoder,” which recites “[i]n this work, we propose a hybrid model that trains the autoencoder and then reuses the encoder part to perform dimensionality reduction.”).  As noted above, Murcia does not explicitly teach that “input data of a same class are positioned apart below a first threshold, and input data of a different class positioned apart above the first threshold.”
Nevertheless, like further noted above, Wang provides such a teaching.  In particular, Wang teaches learning, by an encoder neural network, compact representations (i.e. feature vectors) of input data, including by optimizing the encoder neural network such that input data belonging to a same class are positioned apart below a first threshold (i.e. below an intra-class distance threshold), and input data of a different class are positioned apart above the first threshold (i.e. above twice the intra-class distance threshold) (see e.g. paragraphs 0007, 0028 and 0031-0036).
Like noted above, it would have been obvious to one of ordinary skill in the art, having the teachings of Murcia and Wang before the effective filing date of the claimed invention, to modify the method, computer program product and computer processing system taught by Murcia such that the compact representations are learned by optimizing the encoder neural network so that input data of a same class are positioned apart below a first threshold, and input data of a different class are positioned apart above the first threshold, as is taught by Wang.  It would have been advantageous to one of ordinary skill to utilize such a combination because it can improve the classification functionality provided by a neural network, as is suggested by Wang (see e.g. paragraphs 0003 and 0028).  Accordingly, the Examiner respectfully maintains that Murcia and Wang teach “learning, by an encoder neural network, compact neural representations of the input data” and “wherein input data of a same class are positioned apart below a first threshold, and input data of a different class positioned apart above the first threshold,” as is now claimed.
The Applicant’s arguments regarding the prior art rejections have thus been fully considered, but are not persuasive.
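
The disputed limitation (same-class inputs positioned apart below a first threshold, different-class inputs positioned apart above it) can be made concrete with a small Python check over a set of embeddings. This is only an illustrative sketch: the function name `satisfies_threshold`, the Euclidean distance and the strict inequalities are assumptions, and the sketch does not reproduce the optimization described in Wang or in the application.

```python
import math
from itertools import combinations

def satisfies_threshold(embeddings, labels, threshold):
    """Return True if every same-class pair lies closer than `threshold`
    and every different-class pair lies farther than `threshold`.
    """
    pairs = combinations(zip(embeddings, labels), 2)
    for (u, label_u), (v, label_v) in pairs:
        distance = math.dist(u, v)
        if label_u == label_v and not distance < threshold:
            return False  # same class but too far apart
        if label_u != label_v and not distance > threshold:
            return False  # different class but too close
    return True

# Two class-0 points near the origin and one class-1 point far away.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
classes = [0, 0, 1]
separated = satisfies_threshold(points, classes, threshold=1.0)
not_separated = satisfies_threshold(points, classes, threshold=10.0)
```

With a threshold of 1.0 the example embeddings meet the condition; with a threshold of 10.0 they do not, because the different-class pairs are then closer than the threshold.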

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BLAINE T BASOM whose telephone number is (571)272-4044. The examiner can normally be reached Monday-Friday, 9:00 am - 5:30 pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matt Ell can be reached at (571)270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BTB/
5/9/2026

/MATTHEW ELL/Supervisory Patent Examiner, Art Unit 2141
Prosecution Timeline

Aug 26, 2022: Application Filed
Oct 01, 2025: Non-Final Rejection mailed (§103, §112)
Dec 18, 2025: Interview Requested
Jan 02, 2026: Response Filed
May 14, 2026: Final Rejection mailed (§103, §112; current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/644,425; Patent 12632794; METHOD AND SYSTEM FOR CROSS-CHAIN CONSENSUS ORIENTED TO FEDERATED LEARNING; 4y 5m to grant; Granted May 19, 2026
17/806,556; Patent 12608647; MULTIMODAL DATA INFERENCE; 3y 10m to grant; Granted Apr 21, 2026
17/334,697; Patent 12566981; METHOD AND SYSTEM FOR EVENT PREDICTION BASED ON TIME-DOMAIN BOOTSTRAPPED MODELS; 4y 9m to grant; Granted Mar 03, 2026
16/817,836; Patent 12487727; Sensory Adjustment Mechanism; 5y 8m to grant; Granted Dec 02, 2025
17/649,045; Patent 12443420; Automatic Image Conversion; 3y 8m to grant; Granted Oct 14, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 43%
With Interview (+22.7%): 66%
Median Time to Grant: 4y 6m (~9m remaining)
PTA Risk: Moderate
Based on 326 resolved cases by this examiner. Grant probability derived from career allowance rate.