DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.
Claim Rejections - 35 USC § 101
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an
abstract idea without significantly more.
Claim 1
Step 1: The claim recites a method; therefore, it is directed to the statutory category of
processes.
Step 2A Prong 1: The claim recites, inter alia:
[G]enerating… a sharing loss value for the first machine learning model and the second machine learning model based on a measured dissimilarity between the first machine learning model and the second machine learning model: This limitation recites a mathematical concept because it involves using mathematical functions to calculate the dissimilarity between the models.
[G]enerating… and using a loss function and a training dataset, a prediction loss value for the first machine learning model: This limitation recites a mathematical concept because it involves applying mathematical operations to a loss function and a training dataset to compute the prediction loss value.
[G]enerating…an aggregated loss value for the first machine learning model based on the code similarity value, the sharing loss value, and the prediction loss value: This limitation recites a mathematical concept because it involves using mathematical operations to generate the aggregated loss value. See Paragraph 37 which states, “…the aggregated loss value may be the weighted sum of prediction loss matrix (e.g., the predictive performance of the machine learning models) and the sharing-similarity loss matrix (e.g., the sharing loss matrix, D, scaled through multiplication by similarity matrix, S).”
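For clarity, the weighted-sum relationship quoted from Paragraph 37 may be sketched as follows (the symbols L_agg, α, and β are illustrative notation, not taken from the specification; P denotes the prediction loss matrix, D the sharing loss matrix, and S the similarity matrix):

\[
L_{\mathrm{agg}} \;=\; \alpha\, P \;+\; \beta\, (S \odot D)
\]

where \(\odot\) denotes elementwise multiplication, i.e., the sharing loss matrix D "scaled through multiplication by similarity matrix, S."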
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:
[R]eceiving… a similarity matrix corresponding to a plurality of machine learning models, wherein: a first machine learning model of the plurality of machine learning models… a second machine learning model of the plurality of machine learning models… and the similarity matrix comprises a code similarity value corresponding to the first machine learning model and the second machine learning model that is based on a textual similarity between the first code and the second code: This limitation amounts to mere data gathering recited at a high level of generality, and is thus insignificant extra-solution activity (MPEP 2106.05(g)).
…is trained to generate a first class-specific predictive output based on a first code… is trained to generate a second class-specific predictive output based on a second code that is different from the first code: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).
…by one or more processors…and updating… the first machine learning model based on the aggregated loss value to generate an updated machine learning model with improved accuracy relative to the first machine learning model: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly
more than the judicial exception because the additional elements are as follows:
[R]eceiving… a similarity matrix corresponding to a plurality of machine learning models, wherein: a first machine learning model of the plurality of machine learning models… a second machine learning model of the plurality of machine learning models… and the similarity matrix comprises a code similarity value corresponding to the first machine learning model and the second machine learning model that is based on a textual similarity between the first code and the second code: The additional element of “receiving” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. As discussed above with respect to integration of the abstract idea into a practical application, the receiving step amounts to no more than mere data gathering. This element amounts to receiving data over a network, which is well-understood, routine, and conventional activity. See MPEP 2106.05(d), subsection II(i). This cannot provide an inventive concept.
…is trained to generate a first class-specific predictive output based on a first code… is trained to generate a second class-specific predictive output based on a second code that is different from the first code: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself which cannot provide inventive concept (MPEP 2106.05(h)).
…by one or more processors… and updating… the first machine learning model based on the aggregated loss value to generate an updated machine learning model with improved accuracy relative to the first machine learning model: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea and cannot provide inventive concept (MPEP 2106.05(f)).
The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes and mathematical concepts for calculating similarity values, generating loss values, and aggregating those values for model updating). The claim merely describes a process of applying known mathematical operations (computing dissimilarity measures, generating sharing and prediction loss values, and calculating a weighted aggregated loss value) to data associated with machine learning models, and performing standard data processing steps (receiving a similarity matrix, training models to produce outputs, and updating the model based on the calculated loss). The recitation of one or more processors and machine learning models to generate class-specific predictive outputs merely indicates a technological environment in which the abstract ideas are applied, without improving the functioning of a computer or of a machine learning model itself.
Therefore, the claim as a whole remains focused on the abstract idea and fails Step 2B of the eligibility
analysis.
Claim 2
Step 1: A process, as above.
Step 2A Prong 1: The claim recites, inter alia:
Updating… the similarity matrix based on the aggregated loss value: This limitation recites a mathematical concept because it involves manipulating numerical values within a matrix using a computed loss value.
Step 2A Prong 2 and Step 2B: The claim does not include additional elements beyond those already addressed. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2, and the claim does not include additional elements sufficient to amount to significantly more than the judicial exception under Step 2B. Even when considered in combination, the additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 3
Step 1: A process, as above.
Step 2A Prong 1: The claim recites, inter alia:
The first code is a first medical code associated with a first textual description, and the second code is a second medical code associated with a second textual description: This limitation recites a mental process because associating textual descriptions with medical codes is an observation or evaluation that can be performed in the human mind.
Step 2A Prong 2 and Step 2B: The claim does not include additional elements beyond those already addressed. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2, and the claim does not include additional elements sufficient to amount to significantly more than the judicial exception under Step 2B. Even when considered in combination, the additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 4
Step 1: A process, as above.
Step 2A Prong 1: The claim recites, inter alia:
the measured dissimilarity comprises at least one of: (i) a distance between one or more coefficients of the first machine learning model and the second machine learning model or (ii) an output difference between one or more outputs of the first machine learning model and the second machine learning model: This limitation recites a mathematical concept because it involves using mathematical functions to calculate a distance or output difference between the models.
Step 2A Prong 2 and Step 2B: The claim does not include additional elements beyond those already addressed. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2, and the claim does not include additional elements sufficient to amount to significantly more than the judicial exception under Step 2B. Even when considered in combination, the additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 5
Step 1: A process, as above.
Step 2A Prong 1: The claim recites, inter alia:
the sharing loss value is represented by a sharing loss matrix comprising a respective sharing loss value…: This limitation recites a mathematical concept because it represents sharing loss values in a matrix.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:
…for each pair of machine learning models of the plurality of machine learning models: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:
…for each pair of machine learning models of the plurality of machine learning models: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea and cannot provide inventive concept (MPEP 2106.05(f)).
Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 6
Step 1: A process, as above.
Step 2A Prong 1: The claim recites, inter alia:
generating the aggregated loss value comprises: generating… a sharing-similarity loss matrix based on the sharing loss matrix and the similarity matrix, wherein (a) the sharing-similarity loss matrix comprises a sharing-similarity loss value for the first machine learning model and the second machine learning model and (b) the sharing-similarity loss value comprises the sharing loss value scaled by the code similarity value: This limitation recites a mathematical concept because it involves using a mathematical formula to calculate the sharing-similarity loss value used to obtain the aggregated loss value.
Step 2A Prong 2 and Step 2B: The claim does not include additional elements beyond those already addressed. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2, and the claim does not include additional elements sufficient to amount to significantly more than the judicial exception under Step 2B. Even when considered in combination, the additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 7
Step 1: A process, as above.
Step 2A Prong 1: The claim recites, inter alia:
the aggregated loss value for the particular first machine learning model is representative of a joint loss for each of the plurality of machine learning models, and wherein the aggregated loss value comprises a weighted sum of (a) a prediction loss matrix comprising a respective prediction loss value for each machine learning model of the plurality of machine learning models and (b) the sharing-similarity loss matrix: This limitation recites a mathematical concept because it involves computing a weighted sum of the other loss values to attain the aggregated loss value.
Step 2A Prong 2 and Step 2B: The claim does not include additional elements beyond those already addressed. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2, and the claim does not include additional elements sufficient to amount to significantly more than the judicial exception under Step 2B. Even when considered in combination, the additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 8
Step 1: A process, as above.
Step 2A Prong 1: The claim recites, inter alia:
the prediction loss value for the first machine learning model and the second machine learning model is represented by a prediction loss matrix comprising a respective prediction loss value for each machine learning model of the plurality of machine learning models: This limitation is a mathematical concept because it involves mathematical representations of the models using a matrix.
Step 2A Prong 2 and Step 2B: The claim does not include additional elements beyond those already addressed. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2, and the claim does not include additional elements sufficient to amount to significantly more than the judicial exception under Step 2B. Even when considered in combination, the additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 9
Step 1: A process, as above.
Step 2A Prong 1: The claim recites, inter alia:
the plurality of… models are represented by a model matrix: This limitation is a mathematical concept since it deals with representing models using matrices.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:
and wherein updating the first machine learning model comprises: updating… the model matrix for the plurality of machine learning models to optimize the aggregated loss value: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly
more than the judicial exception because the additional elements are as follows:
and wherein updating the first machine learning model comprises: updating… the model matrix for the plurality of machine learning models to optimize the aggregated loss value: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea and cannot provide inventive concept (MPEP 2106.05(f)).
Even when considered in combination, these additional elements represent mere instructions to apply
an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 10
Step 1: A process, as above.
Step 2A Prong 1: The claim recites, inter alia:
the model matrix is indicative of a set of coefficients…: This limitation is a mathematical concept because it represents model parameters as a matrix of coefficients.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:
…for each of the plurality of machine learning models: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:
…for each of the plurality of machine learning models: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea and cannot provide inventive concept (MPEP 2106.05(f)).
Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 11
Step 1: The claim recites a system; therefore, it is directed to the statutory category of machines.
Step 2A Prong 1: The claim recites, inter alia:
[G]enerating… a sharing loss value for the first machine learning model and the second machine learning model based on a measured dissimilarity between the first machine learning model and the second machine learning model: This limitation recites a mathematical concept because it involves using mathematical functions to calculate the dissimilarity between the models.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:
A system comprising: one or more processors; and one or more memories storing processor-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly
more than the judicial exception because the additional elements are as follows:
A system comprising: one or more processors; and one or more memories storing processor-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea and cannot provide inventive concept (MPEP 2106.05(f)).
Even when considered in combination, these additional elements represent mere instructions to apply
an exception and therefore do not provide an inventive concept. The claim is ineligible.
The remainder of claim 11 recites identical limitations to claim 1. Therefore, claim 11 is rejected using the same rationale as claim 1.
Claims 12-14 recite identical limitations to claims 2-4. Therefore, claims 12-14 are rejected using the same rationale as claims 2-4.
Claim 15
Step 1: The claim recites one or more non-transitory computer-readable media; therefore, it is directed to the statutory category of articles of manufacture.
Step 2A Prong 1: The claim recites, inter alia:
[G]enerating… a sharing loss value for the first machine learning model and the second machine learning model based on a measured dissimilarity between the first machine learning model and the second machine learning model: This limitation recites a mathematical concept because it involves using mathematical functions to calculate the dissimilarity between the models.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:
One or more non-transitory computer-readable media storing processor-executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly
more than the judicial exception because the additional elements are as follows:
One or more non-transitory computer-readable media storing processor-executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea and cannot provide inventive concept (MPEP 2106.05(f)).
The remainder of claim 15 recites identical limitations to claim 1. Therefore, claim 15 is rejected using the same rationale as claim 1.
Claims 16-20 recite identical limitations to claims 5-9. Therefore, claims 16-20 are rejected using the same rationale as claims 5-9.
Claim Rejections - 35 USC § 103
Claims 1-2, 4-6, 8-12, 14-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Duan (US 20220121934 A1) in view of Sun (CN 1135553582 A), Li (US 20210233239 A1), Malakouti (“Not All Samples Are Equal: Class Dependent Hierarchical Multi-Task Learning for Patient Diagnosis Classification”, 2020), and Wu (CN 111680136 A).
Regarding claim 1,
Duan teaches [a] computer-implemented method comprising: receiving, by one or more processors, a similarity matrix corresponding to a plurality of machine learning models, wherein… (Paragraph 0003, “This specification describes a method and corresponding system for automatically, that is without supervision, identifying a computer-implemented neural network which is able to generate a disentangled latent variable representation of an input data item”, Paragraph 0004, “In implementations determining the measure of similarity between the sets of latent representations of the trained neural networks is performed in parallel between pairs or groups of the trained neural networks that is the determining is performed as a set of parallel tasks, optionally on a distributed computing system”, See also Para 0019, “That is the latent representation of the (each) first neural network may be compared with the latent representation of each second neural network. The comparison may employ a similarity matrix in which each entry is a pairwise comparison of latent representations.”, Paragraph 95 of Duan, “code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.”
Duan teaches generating a similarity matrix containing pairwise similarity values between latent representations of different neural networks where each entry represents a similarity value corresponding to the models, which corresponds to the code similarity value.)
Duan does not teach generating, by the one or more processors, a sharing loss value for the at least two machine learning models, wherein the sharing loss value is based at least in part on a measured dissimilarity between the at least two machine learning models.
Sun, in the same field of endeavor, teaches generating… a sharing loss value for the first machine learning model and the second machine learning model based on a measured dissimilarity between the first machine learning model and the second machine learning model (Page 4 of Sun, “…using the improved hamming distance algorithm, judging the similarity of different mathematical matrix corresponding to different local model updating respectively as the gradient descending direction similarity.”
Sun teaches generating a sharing loss value based on a measured dissimilarity between machine learning models by applying a Hamming-distance metric to compare the model update matrices of different models.)
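The sign-based comparison quoted from Sun can be illustrated with a brief sketch (the function names and the exact loss formulation are assumptions for illustration, not taken from Sun): two model-update matrices are compared entrywise by sign, and a dissimilarity-based sharing loss is taken as one minus the fraction of matching signs.

```python
# Illustrative sketch of the Hamming-style comparison quoted from Sun:
# two local model updates are compared entrywise by the sign of each
# weight value (the "gradient descending direction"), and a
# dissimilarity-based sharing loss is one minus the fraction of
# matching signs. Function names are hypothetical, not taken from Sun.

def _sign(value):
    """Return -1, 0, or +1 according to the sign of value."""
    return (value > 0) - (value < 0)

def sign_similarity(update_a, update_b):
    """Fraction of entries whose signs agree between two updates."""
    signs_a = [_sign(v) for v in update_a]
    signs_b = [_sign(v) for v in update_b]
    matches = sum(sa == sb for sa, sb in zip(signs_a, signs_b))
    return matches / len(signs_a)

def sharing_loss(update_a, update_b):
    """Dissimilarity: higher when updates point in different directions."""
    return 1.0 - sign_similarity(update_a, update_b)
```

For example, updates (1.0, -2.0, 3.0) and (0.5, 2.0, 3.0) agree in two of three signs, giving a sharing loss of 1/3.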
generating… using a loss function and a training dataset, a prediction loss value for the first machine learning model (Page 5 of Sun, “wherein dk is the local data set of the k-th client, fi (w) = α (xi, yi, w) is a loss function…the loss function is inversely proportional to the model precision; therefore, the
optimization of the target function of the machine learning generally is to make the loss function reach
the minimum value.”
Sun teaches generating a prediction loss value using a loss function computed on a training dataset where the loss function measures the difference between the predicted outputs and training data and is minimized to improve the model accuracy.);
Duan and Sun do not teach generating, by the one or more processors, an aggregated loss value for the particular machine learning model based at least in part on the similarity [value], the … loss value, and the prediction loss value; and updating, by the one or more processors, the particular machine learning model based on the aggregated loss value.
Li, in the same field of endeavor, teaches generating… an aggregated loss value for the first machine learning model based on the… similarity value, the… loss value, and the… loss value; and updating… the particular machine learning model based on the aggregated loss value (Paragraph 20 of Li, “The style similarity estimator 110, the clinical quality estimator 112, and the image content regularizer 120, determine a similarity loss 114, a clinical quality loss 116, and a content loss 122, respectively, which are aggregated to form a cumulative loss 118. The parameters of style transfer network 106 are then iteratively updated based on the cumulative loss 118 during the training process”, Paragraph 29 of Li, “Image content regularizer may comprise one or more differentiable functions, which take medical images 102 and style transferred medical images 108 as input, and produce content loss 122 as output.”
Li teaches aggregating multiple loss components including a similarity loss and other loss values into a cumulative loss to update the parameters of a machine learning model during training. Li’s content loss corresponds to the prediction loss value because it measures the error between the model output and the desired output characteristics during training.)
Duan, Sun, and Li do not teach a first machine learning model of the plurality of machine learning models is trained to generate a first class-specific predictive output based on a first code, a second machine learning model of the plurality of machine learning models is trained to generate a second class-specific predictive output based on a second code that is different from the first code… to generate an updated machine learning model with improved accuracy relative to the first machine learning model.
Malakouti, in the same field of endeavor, teaches a first machine learning model of the plurality of machine learning models is trained to generate a first class-specific predictive output based on a first code, a second machine learning model of the plurality of machine learning models is trained to generate a second class-specific predictive output based on a second code that is different from the first code ("Assume we have T diagnoses and diagnostic categories, each covered by a separate binary classification task... Our objective is to learn T discriminant functions f1,f2,...,fT in which ft : RD → R. Hence, the predicted score of the discriminant function ft can be mapped to one of the binary labels 0,1 using a task specific threshold..."
Malakouti teaches learning multiple task-specific functions (f_1,f_2…f_T) where each function operates on a patient representation to generate its own classification score using a task specific threshold, which corresponds to separate machine learning models that produce different class-specific outputs.),
…to generate an updated machine learning model with improved accuracy relative to the first machine learning model (Page 2 Introduction, “Finally, we show our method can learn models with improved classification performance and analyze the difference between model adaptation from parent diagnostic categories for positive and negative classes.”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Duan’s teaching with Malakouti’s task-specific multi-model training framework in order to enable Duan’s similarity-based relationships between models to be applied in a multi-task learning environment to improve the classification performance of multiple machine learning models (Introduction of Malakouti).
Duan, Sun, Li, and Malakouti do not teach a code similarity… corresponding to the first machine learning model and the second machine learning model that is based on a textual similarity between the first code and the second code.
Wu, in the same field of endeavor, teaches a code similarity… corresponding to the first machine learning model and the second machine learning model that is based on a textual similarity between the first code and the second code (Page 5 Paragraph 5 of Wu, “…the first preset model is different from the second preset model; using different preset models to judge the similarity of the text to be matched and the matching text in different ways. a first preset model or a second preset model; it can adopt depth semantic model DSSM, neural network depth semantic model CNN-DSSM or long-term memory network depth semantic LSTM-DSSM… mask weight reuse extracting the similarity maximum value in each data dimension of the first model code, and the similarity maximum value in each data dimension of the second model code. the first similar dimension refers to the similarity of the text to be matched and the matching text; the second similar dimension refers to the similarity of the text to be matched and the matching text… if the first model code is (3.89, 9.23)”
Wu teaches generating a first model code and a second model code using different preset models (e.g., DSSM, CNN-DSSM, LSTM-DSSM), where each model code contains dimensions representing the semantic similarity between a text to be matched and a matching text. Wu further teaches extracting similarity dimensions from the respective model codes, where the extracted dimensions represent the textual similarity between the texts. The model codes are generated vector representations that are compared to determine textual similarity.);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Duan and Malakouti with Wu’s textual code similarity in order to enable more accurate inter-relationships between codes (Paragraphs 1-2, Page 2 of Wu).
Regarding claim 2,
Duan teaches generating a similarity matrix based on the … value [Para 0019, “That is the latent representation of the (each) first neural network may be compared with the latent representation of each second neural network. The comparison may employ a similarity matrix in which each entry is a pairwise comparison of latent representations”, Paragraph 10 of Duan, “For example a similarity measure or disentanglement score for each of the P pairwise comparisons may be combined or aggregated to determine a disentanglement score for the trained neural network. The aggregation may be performed by averaging, e.g. by determining a median disentanglement score.”]
Duan does not teach [t]he computer-implemented method of claim 1 further comprising:
updating, by the one or more processors, the similarity matrix based on the aggregated loss value.
Li, in the same field of endeavor, teaches updating… based on the aggregated loss value [Para 0020, “The style similarity estimator 110, the clinical quality estimator 112, and the image content regularizer 120, determine a similarity loss 114, a clinical quality loss 116, and a content loss 122, respectively, which are aggregated to form a cumulative loss 118. The parameters of style transfer network 106 are then iteratively updated based on the cumulative loss 118 during the training process.”]
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the claimed invention of Duan to incorporate the aggregated loss approach of Li, since both references aim to improve model performance through iterative optimization (Para 0026 of Duan, Abstract of Li). Applying the aggregated loss value to update the similarity matrix would have been obvious because minimizing the cumulative loss simultaneously minimizes the similarity loss.
Regarding claim 4,
Duan does not teach the measured dissimilarity comprises at least one of: (i) a distance between one or more coefficients… the first machine learning model and the second machine learning model.
Sun, in the same field of endeavor, teaches the measured dissimilarity comprises at least one of: (i) a distance between one or more coefficients between the first machine learning model and the second machine learning model
(Page 4 of Sun, “…using the improved hamming distance algorithm, judging the similarity of different mathematical matrix corresponding to different local model updating respectively as the gradient descending direction similarity”, Claim 4 of Sun, “determining whether sign marks of corresponding bit weight values in the different mathematical matrixes are the same to obtain a similarity matrix…”, Page 7 Paragraph 4 of Sun, “…if the server compares the positive sign similarity percentage between two model updates satisfies the condition (e.g., reaches the threshold value); Well, it is believed that these two models update the attack from the sybil.”, and Claim 1 of Sun, “…obtaining different mathematical matrixes corresponding to respectively different local models uploaded by the client”.
Sun teaches computing a distance between parameters (coefficients) of two machine learning models by comparing the weight values in the model update matrices using the Hamming distance algorithm.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Duan’s teaching with Sun’s distance metric between the coefficients of two machine learning models in order to improve predictive accuracy and control over the prediction result, thereby improving model performance (Sun’s Background).
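By way of illustration only, the sign-based comparison described by Sun may be sketched as follows. This is a hypothetical example constructed for explanatory purposes and is not drawn from any cited reference; the function name and values are illustrative assumptions, not part of the record.

```python
import numpy as np

def sign_similarity(update_a, update_b):
    """Fraction of positions where two model updates share the same sign
    (an illustrative analogue of Sun's Hamming-distance sign comparison)."""
    return float(np.mean(np.sign(update_a) == np.sign(update_b)))

# Two hypothetical local model updates (gradient directions).
u1 = np.array([0.5, -1.2, 0.3, -0.7])
u2 = np.array([0.4, -0.9, -0.1, -0.6])
similarity = sign_similarity(u1, u2)  # 3 of 4 signs agree -> 0.75
```

Under this sketch, a high sign-agreement percentage between two model updates would indicate similar gradient descending directions, consistent with Sun’s cited teaching.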
Regarding claim 5,
Duan does not disclose the sharing loss value is represented by a sharing loss matrix comprising a respective sharing loss value for each pair of machine learning models of the plurality of machine learning models.
Sun, in the same field of endeavor, teaches the sharing loss value is represented by a sharing loss matrix comprising a respective sharing loss value for each pair of machine learning models of the plurality of machine learning models [Pg. 4, “Wherein, the improved hamming distance algorithm comprises: judging the number of the sign of the sign of the corresponding bit between the two matrixes.” See also Pg. 8, Para 4, “if the comparison discovery of the positive and negative similarity percentage between two model updating satisfies the condition (e.g., to reach the threshold value), then it is considered that the two models update from malicious (e.g., sybil attack)”] to identify similar updates between two different models and to further support generating distance-based similarities between models.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the claimed invention of Duan with the calculation of distance similarity between two models of Sun, because such a modification would allow comparison of output similarities between two different machine learning models to improve model performance (See Sun’s Background, See Para 0084 in Duan).
Regarding claim 6, Duan discloses the similarity matrix [Para 0019, “That is the latent representation of the (each) first neural network may be compared with the latent representation of each second neural network. The comparison may employ a similarity matrix in which each entry is a pairwise comparison of latent representations.” See also Para 0095, “The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers”]
Duan does not teach generating the aggregated loss value comprises: generating, by the one or more processors, a sharing-similarity loss matrix based on the sharing loss matrix and the similarity matrix, wherein (a) the sharing-similarity loss matrix comprises a sharing-similarity loss value for the first machine learning model and the second machine learning model, and (b) the sharing-similarity loss value comprises the sharing loss value scaled by the code similarity value.
Sun, in the same field of endeavor, teaches (a) the sharing- similarity loss matrix comprises a sharing-similarity loss value for the first machine learning model and the second machine learning model [Pg. 4, “Wherein, the improved hamming distance algorithm comprises: judging the number of the sign of the sign of the corresponding bit between the two matrixes.” See also Pg. 8, Para 4, “if the comparison discovery of the positive and negative similarity percentage between two model updating satisfies the condition (e.g., to reach the threshold value), then it is considered that the two models update from malicious (e.g., sybil attack)”]
(b) the sharing-similarity loss value comprises the sharing loss value scaled by the code similarity value [Pg. 4, “using the improved hamming distance algorithm, judging the similarity of different mathematical matrix corresponding to different local model updating respectively as the gradient descending direction similarity”] to generate a similarity matrix between pairs or groups of models, where each entry of the similarity matrix corresponds to the claimed code similarity value.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Duan with Sun’s similarity computation in order to improve the evaluation of similarities between multiple machine learning models using a sharing loss value.
Duan and Sun do not teach an aggregated loss value comprising the sharing loss value and the sharing-similarity loss matrix.
Li, in the same field of endeavor, teaches the aggregated loss value [Para 0020, “The style similarity estimator 110, the clinical quality estimator 112, and the image content regularizer 120, determine a similarity loss 114, a clinical quality loss 116, and a content loss 122, respectively, which are aggregated to form a cumulative loss 118. The parameters of style transfer network 106 are then iteratively updated based on the cumulative loss 118 during the training process”] to further enhance the evaluation of medical images (Abstract).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Duan and Sun’s teachings of a similarity matrix and a sharing loss value scaled by the code similarity value with the aggregated loss value of Li in order to improve model optimization across multiple machine learning models (See Sun’s Background, See Para 0084 in Duan, and See the Abstract of Li). Combining Duan’s generation of a similarity matrix with Sun’s distance-based similarity between multiple machine learning models would yield the inputs to the aggregated loss value taught by Li. Such a combination would predictably enhance coordinated model learning by jointly considering similarity and dissimilarity losses.
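For illustration only, the claimed scaling of a sharing loss matrix by a similarity matrix (see Paragraph 37 of the instant specification, describing the sharing loss matrix D scaled through multiplication by similarity matrix S) may be sketched as an elementwise product. The matrices below are hypothetical values chosen solely for illustration and are not taken from any cited reference.

```python
import numpy as np

# Hypothetical pairwise sharing loss matrix D (measured dissimilarity)
# and similarity matrix S (code similarity values) for three models.
D = np.array([[0.0, 0.8, 0.5],
              [0.8, 0.0, 0.9],
              [0.5, 0.9, 0.0]])
S = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.4],
              [0.2, 0.4, 1.0]])

# Sharing-similarity loss matrix: each sharing loss value scaled by the
# corresponding code similarity value (elementwise product).
sharing_similarity = D * S  # entry (0, 1) = 0.8 * 0.6 = 0.48
```

Each entry of the resulting matrix is a sharing-similarity loss value for a pair of models, consistent with limitation (b) as mapped above.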
Regarding claim 8,
Duan teaches for the first machine learning model and the second machine learning model (Paragraph 75 of Duan, “In some implementations making a pairwise comparison between two sets of latent representations involves comparing each individual dimension or component of one latent variable representation to each individual dimension or component of another latent variable representation.”, Paragraph 81 of Duan, “To compare a pair of trained neural networks i,j the set of evaluation data items is processed by first and second trained neural networks of the pair”)
Duan does not teach the prediction loss value for the at least two machine learning models is represented by a prediction loss matrix comprising a respective prediction loss value for each machine learning model of the plurality of machine learning models.
Sun, in the same field of endeavor, teaches the prediction loss value… is represented by a prediction loss matrix comprising a respective prediction loss value for each machine learning model of the plurality of machine learning models [Pg. 4, using the improved hamming distance algorithm, judging the similarity of different mathematical matrix corresponding to different local model updating respectively as the gradient descending direction similarity; See also Pg. 5, the loss function is inversely proportional to the model precision; therefore, the optimization of the target function of the machine learning generally is to make the loss function reach the minimum value]
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the claimed invention of Duan with the prediction loss value generated for different machine learning models of Sun, because such a modification would allow improvement of predictive accuracy and control over the prediction result, thereby improving model performance (See Sun’s Background, See Para 0084 in Duan, and See the Abstract of Li).
Regarding claim 9,
Duan teaches the plurality of machine learning models are represented by a model matrix [Para 0004, “In implementations determining the measure of similarity between the sets of latent representations of the trained neural networks is performed in parallel between pairs or groups of the trained neural networks that is the determining is performed as a set of parallel tasks, optionally on a distributed computing system”; See also Para 0019, “That is the latent representation of the (each) first neural network may be compared with the latent representation of each second neural network. The comparison may employ a similarity matrix in which each entry is a pairwise comparison of latent representations.” See also Para 0095, “The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers”]
Li, in the same field of endeavor, teaches updating the first machine learning model comprises: updating, by the one or more processors, the model matrix for the plurality of machine learning models to optimize the aggregated loss value [Para 0020, The style similarity estimator 110, the clinical quality estimator 112, and the image content regularizer 120, determine a similarity loss 114, a clinical quality loss 116, and a content loss 122, respectively, which are aggregated to form a cumulative loss 118. The parameters of style transfer network 106 are then iteratively updated based on the cumulative loss 118 during the training process]
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the similarity matrix of Duan with the updating of a machine learning model using the aggregated loss from Li, because such a modification would improve the prediction results of the machine learning model and improve model performance (See Sun’s Background, See Para 0084 in Duan, and See the Abstract of Li).
Regarding claim 10,
Duan teaches the model matrix is indicative of a set of coefficients for each of the plurality of machine learning models [Para 0004, “In implementations determining the measure of similarity between the sets of latent representations of the trained neural networks is performed in parallel between pairs or groups of the trained neural networks that is the determining is performed as a set of parallel tasks, optionally on a distributed computing system”; See also Para 0019, “That is the latent representation of the (each) first neural network may be compared with the latent representation of each second neural network. The comparison may employ a similarity matrix in which each entry is a pairwise comparison of latent representations.” See also Para 0095, “The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers”] to generate a similarity matrix using multiple machine learning models.
Regarding claim 11,
Duan teaches [a] system comprising: one or more processors; and one or more memories storing processor-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising (Paragraph 93, “For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions.”, Paragraph 95, “The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. ”)
The remainder of claim 11 recites identical limitations as claim 1. Therefore, claim 11 is rejected using the same rationale as claim 1.
Claims 12-14 recite identical limitations to claims 2-4. Therefore, claims 12-14 are rejected using the same rationale as claims 2-4.
Regarding claim 15,
Duan teaches [o]ne or more non-transitory computer-readable media storing processor-executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: (Paragraph 19, “One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations”).
The remainder of claim 15 recites identical limitations to claim 1. Therefore, claim 15 is rejected using the same rationale as claim 1.
Claims 16-20 recite identical limitations to claims 5-9. Therefore, claims 16-20 are rejected using the same rationale as claims 5-9.
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over US 20220121934 A1 (hereinafter “Duan”) in view of CN 1135553582 A (hereinafter “Sun”), US 20210233239 A1 (hereinafter “Li”), and further in view of Malakouti (“Not All Samples Are Equal: Class Dependent Hierarchical Multi-Task Learning for Patient Diagnosis Classification”, 2020), Wu (CN 111680136 A), and “Multimodal Machine Learning for Automated ICD Coding” (hereinafter “Xu”).
Regarding claim 3,
Duan does not teach first code is a first medical code associated with a first textual description, and the second code is a second medical code associated with a second textual description.
Wu, in the same field of endeavor, teaches first code is… associated with a first textual description, and the second code is… associated with a second textual description (Page 5 Paragraph 5, “…the first preset model is different from the second preset model; using different preset models to judge the similarity of the text to be matched and the matching text in different ways. a first preset model or a second preset model; it can adopt depth semantic model DSSM, neural network depth semantic model CNN-DSSM or long-term memory network depth semantic LSTM-DSSM… mask weight reuse extracting the similarity maximum value in each data dimension of the first model code, and the similarity maximum value in each data dimension of the second model code. the first similar dimension refers to the similarity of the text to be matched and the matching text; the second similar dimension refers to the similarity of the text to be matched and the matching text… if the first model code is (3.89, 9.23)”
Wu teaches generating a first model code and a second model code using different preset models (e.g., DSSM, CNN-DSSM, LSTM-DSSM), where each model code contains dimensions representing the semantic similarity between a text to be matched and a matching text. Wu also teaches extracting similarity dimensions from the model codes, where the extracted dimensions represent textual similarity, corresponding to a code similarity between the first code and the second code based on the generated textual similarity.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Duan and Malakouti’s teaching with Wu’s textual code similarity in order to enable more accurate inter relationships between codes (Paragraphs 1-2 Page 2 of Wu).
Duan and Wu do not teach first medical code and second medical code.
Xu, in the same field of endeavor, teaches first medical code… second medical code… (Pg. 9 Para 3, “measure the overlap between two sets, which are our extracted evidence (ICD Code Prediction) and physicians’ annotations” to compare results between the physicians’ medical code descriptions and the model’s predictive output of medical code descriptions.
Here, the first medical code corresponds to the physician’s medical code and the second medical code corresponds to the model’s output medical code.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Duan and Wu’s teaching with Xu’s textual similarity between medical code descriptions in order to accurately compare medical codes to improve the model performance (Introduction of Xu).
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over US 20220121934 A1 (hereinafter “Duan”) in view of CN 1135553582 A (hereinafter “Sun”), US 20210233239 A1 (hereinafter “Li”), and further in view of Malakouti (“Not All Samples Are Equal: Class Dependent Hierarchical Multi-Task Learning for Patient Diagnosis Classification”, 2020), Wu (CN 111680136 A), and US 20210357687 A1 (hereinafter “Gao”).
Regarding claim 7,
Duan does not teach the aggregated loss value for the particular machine learning model is representative of a joint loss for each of the plurality of machine learning models, and wherein the aggregated loss value comprises a weighted sum of (a) a prediction loss matrix comprising a respective prediction loss value for each machine learning model of the plurality of machine learning models and (b) the sharing-similarity loss matrix.
Gao, in the same field of endeavor, teaches the aggregated loss value for the first machine learning model is representative of a joint loss for each of the plurality of machine learning models, and wherein the aggregated loss value comprises a weighted sum [Claim 9, wherein the loss metric is computed by a weighted sum of an online action recognizer loss and a temporal proposal generator loss, wherein the online action recognizer loss includes a sum of the frame loss and the start loss, and wherein the temporal proposal generator loss includes a multiple instance learning loss and a pair-wise co-activity similarity loss. See also Para 0019, As used herein, the term “network” may comprise any hardware or software-based framework that includes any artificial intelligence network or system, neural network or system and/or any training or learning models implemented thereon or therewith].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the claimed invention of Duan, with the aggregated loss computed by a weighted sum of Gao, because such a modification would allow updating the model based on the loss values calculated to improve the training process (See Abstract in Duan and Abstract of Gao).
Duan and Gao do not teach the sharing-similarity loss matrix and the sharing loss value.
Sun, in the same field of endeavor, teaches (a) a prediction loss matrix comprising a respective prediction loss value for each machine learning model of the plurality of machine learning models and (Pg. 4, using the improved hamming distance algorithm, judging the similarity of different mathematical matrix corresponding to different local model updating respectively as the gradient descending direction similarity; See also Pg. 5, the loss function is inversely proportional to the model precision; therefore, the optimization of the target function of the machine learning generally is to make the loss function reach the minimum value)
(b) the sharing-similarity loss matrix [Pg. 4, using the improved hamming distance algorithm, judging the similarity of different mathematical matrix corresponding to different local model updating respectively as the gradient descending direction similarity] to calculate the similarity between different models.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the claimed invention of Duan and Gao with the calculation of distance similarity between two models of Sun, because such a modification would allow updating the model based on the calculated loss values to improve model performance (See Sun’s Background, See Para 0084 in Duan, and Abstract of Gao).
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 3, 4, 11, and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
35 U.S.C. § 103 Response
Applicant’s arguments with respect to claim(s) 1, 3, 11, and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant argues that none of the cited references teach or suggest a similarity matrix comprising a code similarity value corresponding to the first and second machine learning models that is based on textual similarity between the first and second code (Page 2 of Applicant’s Remarks). Applicant’s argument is not persuasive because the new combination cited above teaches the amended limitation of a similarity matrix that comprises a code similarity value between the first and second machine learning models that is based on textual similarity between the first code and the second code. Malakouti teaches having multiple task functions, where each task function corresponds to a binary classifier, with the goal of improving the accuracy of the model. Wu teaches generating code similarities based on textual similarity between different codes of machine learning models, and Duan already teaches the similarity matrix limitation. The cited references collectively teach the amended limitation.
Applicant argues that none of the cited references teach generating an aggregated loss value for the first machine learning model based on the code similarity value, the sharing loss value, and prediction loss value (Page 2 of Remarks). Applicant’s argument is not persuasive. Sun teaches generating both (i) a sharing loss value based on a measured dissimilarity between machine learning models (e.g., via a Hamming-distance similarity metric) and (ii) a prediction loss value based on a training dataset and loss function for a particular model. Duan teaches generating similarity values between models (e.g., via a similarity matrix corresponding to a plurality of models). Li teaches aggregating multiple loss terms including similarity loss and other task-specific losses into a single cumulative loss to iteratively update model parameters during training. It would have been obvious to one of ordinary skill in the art to aggregate Duan’s similarity-based value and Sun’s sharing and prediction loss values with Li’s combined loss function in order to optimize the model performance.
35 U.S.C. § 101 Response
Step 2A Prong 1:
Under the 35 U.S.C. §101 rejection, Applicant argues that the amended claim 1 does not recite a mathematical concept under MPEP § 2106.04(a)(2)(I) (Page 14 of Remarks). Examiner respectfully disagrees.
Amended claim 1 recites, inter alia, “generating… a sharing loss value for the first machine learning model and the second machine learning model based on a measured dissimilarity between the first machine learning model and the second machine learning model.” The limitation does not merely involve mathematics in a general sense; rather, it requires performing a mathematical function. See paragraph 31 from the instant specification, which explains that the sharing loss matrix represents a measured dissimilarity that may be determined using L1, L2, or Euclidean distance measures between model outputs. Such “measured dissimilarity” requires mathematical calculations to quantify differences between numerical values.
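By way of illustration only, the L1, L2, and Euclidean distance measures referenced in paragraph 31 of the instant specification are direct mathematical calculations over numerical model outputs; the sketch below uses hypothetical output values chosen solely for illustration.

```python
import numpy as np

# Hypothetical output vectors of two machine learning models.
out_a = np.array([0.2, 0.7, 0.1])
out_b = np.array([0.1, 0.6, 0.3])

# L1 distance: sum of absolute differences.
l1_distance = float(np.sum(np.abs(out_a - out_b)))        # 0.4
# L2 (Euclidean) distance: square root of summed squared differences.
l2_distance = float(np.sqrt(np.sum((out_a - out_b) ** 2)))
```

Quantifying the “measured dissimilarity” in this manner consists entirely of mathematical operations, supporting the characterization of the limitation as a mathematical concept.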
Amended claim 1 further recites “generating… and using a loss function and a training dataset, a prediction loss value for the first machine learning model.” This limitation recites a mathematical concept because it requires applying a loss function to model outputs. Paragraph 36 of the instant specification explains how prediction loss values may be determined using loss functions such as binary cross-entropy. Applying a loss function to numerical outputs constitutes a mathematical calculation under MPEP § 2106.04(a)(2)(I).
Claim 1 additionally recites “generating…an aggregated loss value for the first machine learning model based on the code similarity value, the sharing loss value, and the prediction loss value”. Paragraph 37 of the instant specification explains that the aggregated loss value may be a weighted sum of several losses. Computing such a weighted sum constitutes a mathematical calculation under MPEP § 2106.04(a)(2)(I).
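By way of illustration only, the weighted sum described in paragraph 37 of the instant specification may be sketched as follows; the per-model loss values and the weight are hypothetical values chosen solely for illustration.

```python
import numpy as np

# Hypothetical per-model prediction losses and sharing-similarity losses.
prediction_loss = np.array([0.30, 0.45, 0.25])
sharing_similarity_loss = np.array([0.48, 0.36, 0.10])

alpha = 0.7  # illustrative weight on predictive performance
# Aggregated loss value: weighted sum of the two loss terms per model.
aggregated = alpha * prediction_loss + (1 - alpha) * sharing_similarity_loss
# e.g., aggregated[0] = 0.7 * 0.30 + 0.3 * 0.48 = 0.354
```

The computation consists solely of multiplications and additions over numerical loss values, i.e., mathematical operations.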
Step 2A Prong 2:
Applicant argues that, even if claim 1 recites the abstract idea of mathematical concepts, the claim provides a practical application because it improves machine learning model accuracy, relying on Ex parte Desjardins (Page 15 of Remarks). The argument is not persuasive for at least the following reasons.
The claim is primarily directed to generating and aggregating mathematical loss values based on similarity calculations, which constitute abstract mathematical concepts. The recitation of “updating the machine learning model” appears only at the end of the claim and merely applies the calculated loss values in the field of model training. Unlike in Desjardins, the amended claim still does not recite specific training mechanisms or modifications to the model itself, but instead focuses on mathematical loss calculations and aggregations. In other words, the training step only uses the calculated loss values to update the model, so the training is merely an incidental use of the loss calculations rather than the focus of the claimed improvement. This amounts to an improvement to the abstract idea rather than an improvement to the technology. See MPEP § 2106.05(a)(I).
Additionally, the statement in the claim that the updated machine learning model has ‘improved accuracy’ merely recites an intended result of the updating.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAJD MAHER HADDAD whose telephone number is (571) 272-2265. The examiner can normally be reached Monday through Friday, 8:00 am to 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.M.H./Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125