Prosecution Insights
Last updated: April 19, 2026
Application No. 18/194,603

DEFENSE FROM MEMBERSHIP INFERENCE ATTACKS IN TRANSFER LEARNING

Non-Final OA: §101, §103
Filed: Mar 31, 2023
Examiner: HADDAD, MAJD MAHER
Art Unit: 2125
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)
Grant Probability: Favorable
Estimated OA Rounds: 1-2
Estimated Time to Grant: 3y 3m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% in resolved cases with an interview (minimal)
Typical Timeline: 3y 3m average prosecution
Career History: 21 total applications across all art units; 21 currently pending
Statute-Specific Performance

§101: 36.1% (-3.9% vs TC avg)
§103: 44.6% (+4.6% vs TC avg)
§102: 1.2% (-38.8% vs TC avg)
§112: 10.8% (-29.2% vs TC avg)
Tech Center averages are estimates based on available career data.

Office Action

Rejections under §101 and §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are presented for examination.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on April 15, 2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections

Claims 1, 11, 12, 19 and their respective dependent claims are objected to because of the following informalities:
Change “a private data” to “private data”.
Change “updating a model parameters” to “updating model parameters”.
Remove the extra period at the end of claim 12.
Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 19 and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because claim 19 recites a computer program product comprising “one or more computer-readable storage devices” storing instructions executable by a processor. While non-transitory computer-readable storage media are considered a statutory manufacture, the claim as written uses the term “devices,” which could encompass transitory signals, and therefore does not fall under any of the four statutory categories of machine, manufacture, composition of matter, or process. The specification at paragraph 64 only describes “computer-readable storage media” as non-transitory media. Claims 19 and 20 are therefore rejected as non-statutory. Applicant may overcome this rejection by amending “devices” to “non-transitory computer-readable storage media” to clearly claim a statutory manufacture.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1

Step 1: The claim recites a method; therefore, it is directed to the statutory category of processes.

Step 2A Prong 1: The claim recites, inter alia:

…to prevent data leakage from membership inference attacks…: This limitation is seen as a mental process because it recites an intended result or objective of protecting data privacy.

[C]omputing an initial loss distribution LINIT of a plurality of loss values: This limitation is a mathematical concept because it deals with performing mathematical calculations to derive LINIT. See Paragraph 65, which states, “The modeling of the LINIT is performed by computing the mean and variance of the logit-scaled loss values. The mean and the variance aid in providing a Gaussian variable. The logit scaling also aids in obtaining a Gaussian random variable of the minibatch losses to make an MIA attack more difficult for an adversary.”

[C]omputing a batch loss of a minibatch from the private data after beginning a fine-tuning operation to transform the pre-trained model into a fine-tuned model: This is a mathematical concept because the batch loss is computed by applying a loss function over the minibatch, which applies math equations in order to compute the loss.
[A]nd computing a loss distribution LBATCH of the batch loss: This limitation is a mathematical concept because it deals with calculating the mean and variance between the batches. See Paragraph 9, which states, “…the modeling of the LBATCH as a second Gaussian Random Variable is performed by computing the mean and variance of the logit-scaled batch loss.”

[C]omputing a divergence metric between LINIT and LBATCH: This limitation is viewed as a mathematical concept because it deals with applying the Kullback-Leibler (KL) divergence method to measure the distance between two probability distributions, dealing with math calculations. See Paragraph 10, which states, “In one or more embodiments, the computing of the divergence metric between LINIT and LBATCH includes using a Kullback-Leibler divergence to measure a distance between LINIT and LBATCH. Kullback-Leibler (KL) divergence is a method useful to measure a distance between two probability distributions.”

[M]ultiplying an output of the divergence metric with a pre-defined hyperparameter λ to obtain a result: This limitation is viewed as a mathematical concept since it deals with multiplying two different numbers to get a result.

[A]dding the result to the batch loss as a regularizer: This limitation is a mathematical concept dealing with the addition of the result from the previous step and the batch loss to obtain the regularizer.

…computing backpropagation on the regularized batch loss: This limitation is a mathematical concept because it involves computing derivatives and loss functions during the backpropagation process.

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:

[T]raining a machine learning model… updating a model parameters by…: Adding the words “apply it” (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

[R]eceiving a pre-trained model and a pre-defined hyperparameter λ as an input for machine learning: Mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).

[A]nd outputting the fine-tuned model: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

[A]pplying a forward pass by querying the pre-trained model with a private data: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

[T]raining a machine learning model… updating a model parameters by…: Adding the words “apply it” (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).

[R]eceiving a pre-trained model and a pre-defined hyperparameter λ as an input for machine learning: The additional element of “receiving” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
As discussed above with respect to integration of the abstract idea into a practical application, the receiving step amounts to no more than mere data gathering. This element amounts to receiving data over a network and is well-understood, routine, conventional activity. See MPEP 2106.05(d), subsection II(i). This cannot provide an inventive concept.

[A]nd outputting the fine-tuned model: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity; see MPEP 2106.05(d)(II)(vi).

[A]pplying a forward pass by querying the pre-trained model with a private data: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself and cannot provide an inventive concept (MPEP 2106.05(h)).

The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes and mathematical concepts for preventing data leakage from membership inference attacks). The claim merely describes applying known mathematical operations to machine learning training data, including computing loss distributions, calculating statistical parameters such as mean and variance, measuring divergence between probability distributions, and performing backpropagation to update model parameters. The recitation of training and fine-tuning a machine learning model, applying a forward pass, receiving a pre-trained model and hyperparameters, and outputting a fine-tuned model merely describes generic computer implementation and routine data gathering and output, without improving the functioning of a computer or the machine learning model itself.

Claim 2

Step 1: A process, as above.

Step 2A Prong 1: The claim recites, inter alia: applying logit-scaling to the plurality of loss values: This limitation is seen as a mathematical concept because logit-scaling is a math operation applied to numerical values.

Step 2A Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. The claim is ineligible.

Claim 3

Step 1: A process, as above.

Step 2A Prong 1: The claim recites, inter alia: performing a modeling of the LINIT as a first Gaussian Random Variable by computing a mean and a variance of the logit-scaled loss values: This limitation is seen as a mathematical concept because it recites the application of statistical formulas to numeric data, including calculating the mean and variance.

Step 2A Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
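For orientation, the computations recited in claims 1-3 can be illustrated with a minimal PyTorch-style sketch. The helper names, the use of cross-entropy, and the Gaussian modeling of per-example confidences are illustrative assumptions assembled from the claim language and the quoted specification paragraphs, not text from the application:

    import torch
    import torch.nn.functional as F

    def logit_scale(p, eps=1e-6):
        # phi(p) = log(p / (1 - p)), applied per example (cf. quoted Paragraph 65).
        p = p.clamp(eps, 1 - eps)
        return torch.log(p / (1 - p))

    def fit_gaussian(values):
        # Model logit-scaled values as a Gaussian by estimating mean and variance.
        return values.mean(), values.var()

    def kl_gaussian(mu1, var1, mu2, var2):
        # Closed-form KL divergence between two univariate Gaussians.
        return 0.5 * (torch.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

    def regularized_step(model, optimizer, batch, mu_init, var_init, lam):
        # One fine-tuning step: batch loss plus a lambda-scaled divergence regularizer.
        # mu_init, var_init: Gaussian fit of the logit-scaled confidences from the
        # pre-trained model's forward pass on the private data, computed once
        # before fine-tuning begins (the claimed L_INIT).
        x, y = batch
        logits = model(x)                          # forward pass on the private minibatch
        batch_loss = F.cross_entropy(logits, y)
        conf = torch.softmax(logits, dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
        mu_b, var_b = fit_gaussian(logit_scale(conf))             # L_BATCH as a Gaussian
        reg = lam * kl_gaussian(mu_b, var_b, mu_init, var_init)   # divergence vs. L_INIT
        loss = batch_loss + reg                    # regularized batch loss
        optimizer.zero_grad()
        loss.backward()                            # backpropagation on the regularized loss
        optimizer.step()                           # update model parameters
        return loss.item()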
Claim 4

Step 1: A process, as above.

Step 2A Prong 1: The claim recites, inter alia: applying logit-scaling to the batch loss, and wherein the modeling of the LBATCH as a second Gaussian Random Variable is performed by computing a mean and a variance of the logit-scaled batch loss: This limitation is seen as a mathematical concept because it recites performing logit-scaling and computing a mean and variance, which involve math operations and statistical formulas.

Step 2A Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 5

Step 1: A process, as above.

Step 2A Prong 1: The claim recites, inter alia: computing the divergence metric between LINIT and LBATCH comprises using a Kullback-Leibler divergence to measure a distance between LINIT and LBATCH: This limitation is seen as a mathematical concept because it recites performing a statistical comparison between two probability distributions.

Step 2A Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
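Where claim 5 specifies the Kullback-Leibler divergence, the metric has a closed form once LINIT and LBATCH are modeled as Gaussians, as the quoted specification paragraphs suggest. The following identity is a standard statistical fact, not language from the application:

    KL(N(µ1, σ1²) ‖ N(µ2, σ2²)) = log(σ2/σ1) + (σ1² + (µ1 − µ2)²) / (2σ2²) − 1/2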
Claim 6

Step 1: A process, as above.

Step 2A Prong 1: The claim recites, inter alia: …computing the backpropagation is performed by using an update rule, and the method further comprising: This limitation is a mathematical concept because it involves math operations during the backpropagation and update phase. [D]etermining, prior to outputting, that the fine-tuned model meets a termination criterion: This limitation is seen as a mental process because it involves a determination that a model meets a certain criterion.

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows: the updating of the model parameters by…: Adding the words “apply it” (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows: the updating of the model parameters by…: Adding the words “apply it” (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)). Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 7

Step 1: A process, as above.

Step 2A Prong 1: The claim recites, inter alia: computing of the backpropagation to update the model parameters is performed using a Stochastic Gradient Descent (SGD): This limitation is seen as a mathematical concept because it recites an algorithmic procedure for adjusting numeric model parameters based on computed gradients.

Step 2A Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 8

Step 1: A process, as above.

Step 2A Prong 1: The claim recites, inter alia: the computing of the backpropagation to update the model parameters is performed using an adaptive learning rate method (Adam): This limitation is a mathematical concept because it recites an algorithmic procedure for updating model parameters using gradients and other mathematical operations.

Step 2A Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
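Claims 6-8 recite an update rule, a termination criterion, and the SGD and Adam optimizers. A minimal sketch, assuming the regularized_step helper from the sketch above and a hypothetical loss-threshold termination criterion:

    import torch

    def make_optimizer(model, rule="sgd", lr=1e-3):
        # Claim 7 names SGD; claim 8 names Adam. Both are standard torch.optim rules.
        if rule == "sgd":
            return torch.optim.SGD(model.parameters(), lr=lr)
        return torch.optim.Adam(model.parameters(), lr=lr)

    def fine_tune(model, batches, mu_init, var_init, lam, rule="sgd",
                  loss_floor=0.05, max_steps=1000):
        opt = make_optimizer(model, rule)
        for _ in range(max_steps):
            losses = [regularized_step(model, opt, b, mu_init, var_init, lam)
                      for b in batches]
            if sum(losses) / len(losses) < loss_floor:  # termination criterion (claim 6)
                break
        return model  # output the fine-tuned model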
Claim 9

Step 1: A process, as above.

Step 2A Prong 1: This claim does not recite an additional abstract idea, but the claim depends on claim 2, which depends on claim 1, which recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows: the fine-tuning operation of the pre-trained model occurs in a federated learning setting, and the method further comprising: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)). providing the output of the fine-tuned model to an aggregation server: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows: the fine-tuning operation of the pre-trained model occurs in a federated learning setting, and the method further comprising: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself and cannot provide an inventive concept (MPEP 2106.05(h)). providing the output of the fine-tuned model to an aggregation server: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity; see MPEP 2106.05(d)(II)(vi). Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 10

Step 1: A process, as above.

Step 2A Prong 1: The claim recites, inter alia: varying a pre-defined value of the hyperparameter λ to change a privacy-robustness values of the training model: This limitation is seen as a mathematical concept because it recites adjusting a numeric parameter that influences the outcome of a mathematical training algorithm.

Step 2A Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 11

Step 1: The claim recites an apparatus; therefore, it is directed to the statutory category of apparatus.

Step 2A Prong 1: The claim recites, inter alia:

…and prevent data leakage from membership inference attacks…: This limitation is seen as a mental process because it recites an intended result or objective of protecting data privacy.

[C]omputing an initial loss distribution LINIT of a plurality of loss values: This limitation is a mathematical concept because it deals with performing mathematical calculations to derive LINIT. See Paragraph 65, which states, “The modeling of the LINIT is performed by computing the mean and variance of the logit-scaled loss values. The mean and the variance aid in providing a Gaussian variable. The logit scaling also aids in obtaining a Gaussian random variable of the minibatch losses to make an MIA attack more difficult for an adversary.”

[C]omputing a batch loss of a minibatch from the private data after beginning a fine-tuning operation to transform the pre-trained model into a fine-tuned model…: This is a mathematical concept because the batch loss is computed by applying a loss function over the minibatch, which applies math equations in order to compute the loss.

[A]nd computing a loss distribution LBATCH of the batch loss: This limitation is a mathematical concept because it deals with calculating the mean and variance between the batches. See Paragraph 9, which states, “…the modeling of the LBATCH as a second Gaussian Random Variable is performed by computing the mean and variance of the logit-scaled batch loss.”

[C]omputing a divergence metric between LINIT and LBATCH: This limitation is viewed as a mathematical concept because it deals with applying the Kullback-Leibler (KL) divergence method to measure the distance between two probability distributions, dealing with math calculations. See Paragraph 10, which states, “In one or more embodiments, the computing of the divergence metric between LINIT and LBATCH includes using a Kullback-Leibler divergence to measure a distance between LINIT and LBATCH. Kullback-Leibler (KL) divergence is a method useful to measure a distance between two probability distributions.”
[M]ultiplying an output of the divergence metric with a pre-defined hyperparameter λ to obtain a result: This limitation is viewed as a mathematical concept since it deals with multiplying two different numbers to get a result.

[A]dding the result to the batch loss as a regularizer: This limitation is a mathematical concept dealing with the addition of the result from the previous step and the batch loss to obtain the regularizer.

…computing backpropagation of the regularized batch loss: This limitation is a mathematical concept because it involves computing derivatives and loss functions during the backpropagation process.

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:

A computing device configured to train a machine learning model… the computing device comprising: a processor; and a memory coupled to the processor, the memory storing instructions to cause the processor to perform acts comprising… updating a model parameters by…: Adding the words “apply it” (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

[R]eceiving a pre-trained model and a pre-defined hyperparameter λ as an input: Mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).

[A]pplying a forward pass by querying the pre-trained model with a private data: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).

[A]nd outputting the fine-tuned model: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

A computing device configured to train a machine learning model… the computing device comprising: a processor; and a memory coupled to the processor, the memory storing instructions to cause the processor to perform acts comprising… updating a model parameters by…: Adding the words “apply it” (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).

[R]eceiving a pre-trained model and a pre-defined hyperparameter λ as an input: The additional element of “receiving” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. As discussed above, the receiving step amounts to no more than mere data gathering. This element amounts to receiving data over a network and is well-understood, routine, conventional activity. See MPEP 2106.05(d), subsection II(i). This cannot provide an inventive concept.

[A]pplying a forward pass by querying the pre-trained model with a private data: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception.
This does not amount to significantly more than the exception itself and cannot provide an inventive concept (MPEP 2106.05(h)).

[A]nd outputting the fine-tuned model: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity; see MPEP 2106.05(d)(II)(vi).

The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes and mathematical concepts for preventing data leakage from membership inference attacks). The claim merely describes applying known mathematical operations to machine learning training data, including computing loss distributions, calculating statistical parameters such as mean and variance, measuring divergence between probability distributions, and performing backpropagation to update model parameters. The recitation of training and fine-tuning a machine learning model, applying a forward pass, receiving a pre-trained model and hyperparameters, and outputting a fine-tuned model merely describes generic computer implementation and routine data gathering and output, without improving the functioning of a computer or the machine learning model itself.

Claim 12 is an apparatus claim that recites identical limitations to claim 2. Therefore, claim 12 is rejected using the same rationale as claim 2.

Claim 13

Step 1: An apparatus, as above.

Step 2A Prong 1: The claim recites, inter alia: …applying logit-scaling to the batch loss: This limitation is seen as a mathematical concept because logit-scaling is a math operation applied to numerical values. performing a modeling of the LINIT as a first Gaussian Random Variable by computing a mean and a variance of the logit-scaled loss values: This limitation is seen as a mathematical concept because it recites the application of statistical formulas to numeric data, including calculating the mean and variance. and performing a modeling of the LBATCH as a second Gaussian Random Variable by computing a mean and a variance of the logit-scaled batch loss: This limitation is seen as a mathematical concept because it recites performing logit-scaling and computing a mean and variance, which involve math operations and statistical formulas.

Step 2A Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 14 is an apparatus claim that recites identical limitations to claim 5. Therefore, claim 14 is rejected using the same rationale as claim 5. Claim 15 is an apparatus claim that recites identical limitations to claim 6. Therefore, claim 15 is rejected using the same rationale as claim 6.

Claim 16

Step 1: An apparatus, as above.
Step 2A Prong 1: The claim recites, inter alia: the update rule for the computing of the backpropagation to update the model parameters is performed by a Stochastic Gradient Descent (SGD) or an adaptive learning rate method (Adam): This limitation is seen as a mathematical concept because it recites an algorithmic procedure for adjusting numeric model parameters based on computed gradients.

Step 2A Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 17

Step 1: An apparatus, as above.

Step 2A Prong 1: This claim does not recite an additional abstract idea, but the claim depends on claim 16, which recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows: training the fine-tuned model…: Adding the words “apply it” (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). …in a federated learning setting: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)). and providing the output of the fine-tuned model to an aggregation server that shares an update of fine-tuned model: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows: training the fine-tuned model…: Adding the words “apply it” (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)). …in a federated learning setting: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself and cannot provide an inventive concept (MPEP 2106.05(h)). and providing the output of the fine-tuned model to an aggregation server that shares an update of fine-tuned model: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity; see MPEP 2106.05(d)(II)(vi). Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 18 is an apparatus claim that recites identical limitations to claim 10. Therefore, claim 18 is rejected using the same rationale as claim 10.

Claim 19

Step 1: The claim is assumed to be directed to a statutory category for the purposes of the abstract idea analysis.
Step 2A Prong 1: The claim recites, inter alia:

…to compute an initial loss distribution LINIT of a plurality of the loss values: This limitation is a mathematical concept because it deals with performing mathematical calculations to derive LINIT. See Paragraph 65, which states, “The modeling of the LINIT is performed by computing the mean and variance of the logit-scaled loss values. The mean and the variance aid in providing a Gaussian variable. The logit scaling also aids in obtaining a Gaussian random variable of the minibatch losses to make an MIA attack more difficult for an adversary.”

…to compute a batch loss of a minibatch from the private data after beginning a fine-tuning operation to transform the pre-trained model into a fine-tuned model: This is a mathematical concept because the batch loss is computed by applying a loss function over the minibatch, which applies math equations in order to compute the loss.

and to compute a loss distribution LBATCH of the batch loss: This limitation is a mathematical concept because it deals with calculating the mean and variance between the batches. See Paragraph 9, which states, “…the modeling of the LBATCH as a second Gaussian Random Variable is performed by computing the mean and variance of the logit-scaled batch loss.”

…to compute a divergence metric between LINIT and LBATCH: This limitation is viewed as a mathematical concept because it deals with applying the Kullback-Leibler (KL) divergence method to measure the distance between two probability distributions, dealing with math calculations. See Paragraph 10, which states, “In one or more embodiments, the computing of the divergence metric between LINIT and LBATCH includes using a Kullback-Leibler divergence to measure a distance between LINIT and LBATCH. Kullback-Leibler (KL) divergence is a method useful to measure a distance between two probability distributions.”

[M]ultiply an output of the divergence metric with a pre-defined hyperparameter λ to obtain a result: This limitation is viewed as a mathematical concept since it deals with multiplying two different numbers to get a result.

and to add the result to the batch loss as a regularizer: This limitation is a mathematical concept dealing with the addition of the result from the previous step and the batch loss to obtain the regularizer.

…by computing backpropagation on the regularized batch loss: This limitation is a mathematical concept because it involves computing derivatives and loss functions during the backpropagation process.

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:

A computer program product comprising: one or more computer-readable storage devices and program instructions stored on at least one of the one or more computer-readable storage devices, the program instructions executable by a processor, the program instructions comprising: program instructions… to update a model parameters: Adding the words “apply it” (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

…to receive a pre-trained model and a pre-defined hyperparameter λ as an input: Mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).
…to apply a forward pass by querying the pre-trained model with a private data: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).

…to output the fine-tuned model: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

A computer program product comprising: one or more computer-readable storage devices and program instructions stored on at least one of the one or more computer-readable storage devices, the program instructions executable by a processor, the program instructions comprising: program instructions… to update a model parameters: Adding the words “apply it” (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).

…to receive a pre-trained model and a pre-defined hyperparameter λ as an input: The additional element of “receiving” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. As discussed above, the receiving step amounts to no more than mere data gathering. This element amounts to receiving data over a network and is well-understood, routine, conventional activity. See MPEP 2106.05(d), subsection II(i). This cannot provide an inventive concept.

…to apply a forward pass by querying the pre-trained model with a private data: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself and cannot provide an inventive concept (MPEP 2106.05(h)).

…to output the fine-tuned model: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity; see MPEP 2106.05(d)(II)(vi).

The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes and mathematical concepts for preventing data leakage from membership inference attacks). The claim merely describes applying known mathematical operations to machine learning training data, including computing loss distributions, calculating statistical parameters such as mean and variance, measuring divergence between probability distributions, and performing backpropagation to update model parameters. The recitation of training and fine-tuning a machine learning model, applying a forward pass, receiving a pre-trained model and hyperparameters, and outputting a fine-tuned model merely describes generic computer implementation and routine data gathering and output, without improving the functioning of a computer or the machine learning model itself.

Claim 20

Step 1: As with claim 19, the claim is assumed to be directed to a statutory category for the purposes of the abstract idea analysis.
Step 2A Prong 1: The claim recites, inter alia:

to apply logit-scaling to the loss values obtained from the forward pass: This limitation is seen as a mathematical concept because logit-scaling is a math operation applied to numerical values.

…to perform a modeling of the LINIT as a first Gaussian Random Variable by computing a mean and a variance of the logit-scaled loss values: This limitation is seen as a mathematical concept because it recites the application of statistical formulas to numeric data, including calculating the mean and variance.

…to apply logit-scaling to the batch loss, and to perform a modeling of the LBATCH as a second Gaussian Random Variable by computing a mean and variance of the logit-scaled batch loss: This limitation is seen as a mathematical concept because it recites performing logit-scaling and computing a mean and variance, which involve math operations and statistical formulas.

…to compute the backpropagation to update the model parameters by using an update rule: This limitation is a mathematical concept because it involves math operations during the backpropagation and update phase.

…prior to output of the fine-tuned model to determine that the fine-tuned model meets a termination criterion: This limitation is seen as a mental process because it involves a determination that a model meets a certain criterion.

Step 2A Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors.
In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-5, 7-8, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Carlini (“Membership Inference Attacks From First Principles”, 2022) in view of Chen (“RelaxLoss: Defending Membership Inference Attacks Without Losing Utility”, 2022) and Zhang (US 20240064160 A1).

Regarding claim 1, Carlini teaches [a] computer-implemented method of training a machine learning model to prevent data leakage from membership inference attacks, the method comprising (Section 2 Page 2 of Carlini, “The field of training data privacy constructs attacks that leak data, develops techniques to prevent memorization, and measures the privacy of proposed defenses.”; Section 3 Page 2, “The objective of a membership inference attack (MIA) [60] is to predict if a specific training example was, or was not, used as training data in a particular model. This makes MIAs the simplest and most widely deployed attack for auditing training data privacy… This section formalizes the membership inference attack security game (§III-A), and introduces our membership inference evaluation methodology”);

computing an initial loss distribution LINIT of a plurality of loss values (Section IV Page 4, “We formalize this by considering two distributions over models… is the distribution of models trained on datasets containing (x,y), and then Qout(x,y)={f←T(D∖{(x,y)})∣D←D}… To simplify the situation, we instead define Qin and Qout as the distributions of losses on (x,y) for models either trained, or not trained, on this example.” Carlini establishes Q_out as the baseline distribution of losses for data points that the model has not yet been trained on, which corresponds to the initial loss distribution LINIT.);

…the private data after beginning a fine-tuning operation to transform the pre-trained model into a fine-tuned model, and computing a loss distribution LBATCH of the batch loss (Section 1 Introduction Page 1, “Neural networks are now trained on increasingly sensitive datasets, and so it is necessary to ensure that trained models are privacy-preserving. In order to empirically verify if a model is in fact private, membership inference attacks [60] have become the de facto standard [42], [63] because of their simplicity. A membership inference attack receives as input a trained model and an example from the data distribution, and predicts if that example was used to train the model.”; Section IV Page 4, “We formalize this by considering two distributions over models: Qin(x,y)={f←T(D∪{(x,y)})∣D←D} is the distribution of models trained on datasets containing (x,y)… To simplify the situation, we instead define Qin and Qout as the distributions of losses on (x,y) for models either trained, or not trained, on this example.” Carlini defines Q_in as a distribution of losses for models trained on datasets containing a specific training example (x,y), thereby characterizing how the inclusion of private data changes the trained model’s loss behavior.
Because fine-tuning involves updating an existing model’s parameters using additional training data, Carlini’s method encompasses computing a loss distribution derived from private data after the model parameter updates.);

computing a divergence metric between LINIT and LBATCH (Section IV Page 4, “…the best hypothesis test… is obtained by thresholding the Likelihood-ratio Test between the two hypotheses: [equation reproduced as an image in the original; not shown]” The likelihood-ratio test maps onto the divergence metric because both are mathematical methods used to quantify the “distance” or statistical difference between the two distributions Q_in (LBATCH) and Q_out (LINIT). The divergence metric determines how much the model’s behavior has shifted toward the trained state. If the result of the likelihood ratio is a high value, then it means that the model’s behavior has shifted away from the untrained baseline and now belongs to the trained distribution.);

Carlini does not teach receiving a pre-trained model and a pre-defined hyperparameter λ as an input for machine learning; applying a forward pass by querying the pre-trained model with a private data; and multiplying an output of the divergence metric with a pre-defined hyperparameter λ to obtain a result; adding the result to the batch loss as a regularizer; updating a model parameters by computing backpropagation on the regularized batch loss; and outputting the fine-tuned model.

Chen, in the same field of endeavor, teaches receiving a pre-trained model and a pre-defined hyperparameter λ as an input for machine learning (Section 7 Page 9 of Chen, “Our method involves a single hyperparameter α that controls the trade-off between privacy and utility. A fine-grained grid search on a validation set (i.e., first estimating the privacy-utility trade-off with varying value of α, and subsequently selecting the α corresponding to the desired privacy/utility level) allows precise control over the expected privacy/utility level of the target model.”; Adaptive Attack Section C.4 Page 20, “And for the NN-based attack, we use the complete logits prediction from the pre-trained shadow models as features to train the adaptive attack models (modeled as a NN).”);

applying a forward pass by querying the pre-trained model with a private data (Page 3 Preliminaries Section, “We consider the standard setting of MIA: the attacker has access to a query set S = {(z_i, m_i)} (i = 1, …, N) containing both member (training) and non-member (testing) samples drawn from the same data distribution Pdata, where m_i is the membership attribute (m_i = 1 if z_i is a member). The task is to infer the value of the membership attribute m_i associated with each query sample z_i… which predicts m_i for a given query sample z_i and a target model parametrized by θ.”; Page 17 Model Architectures, “…we adopt the same architecture as used in Nasr et al.
(2018): a 4-layer fully-connected neural network”; see Algorithm 1, “for epoch in {1, ..., E} do for batch_index in {1, ..., K} do Get sample batch {(x_i, y_i)} (i = 1, …, B) Perform forward pass: p_i = f(x_i; θ)”. The forward pass in Algorithm 1 simulates the attacker’s query by processing private training data to generate output predictions. The algorithm then applies backpropagation to fine-tune the model’s weights. The membership attribute m_i identifies whether a queried sample z_i is private training data, and Algorithm 1 explicitly performs a forward pass on such samples.);

and multiplying an output of the divergence metric with a pre-defined hyperparameter λ to obtain a result; adding the result to the… loss as a regularizer (Defense Methods Section B.5 Page 17, “Label-smoothing prevents overconfident predictions by incorporating a regularization term into the training objective that penalizes the distance (measured by the KL-divergence) between the model predictions and the uniform distribution. The objective is formularized as follows [equation reproduced as an image in the original; not shown]” The equation defines a regularized loss by scaling the KL-divergence (divergence metric) by the hyperparameter α to penalize overconfident predictions. This mathematical result is added to the standard cross-entropy loss to force the model toward a uniform distribution.);

updating a model parameters by computing backpropagation on the regularized… loss; and outputting the fine-tuned model (See Algorithm 1 on Page 4 [reproduced as an image in the original; not shown]. The algorithm uses a forward pass to query the model with private data, then calculates a regularized loss where the hyperparameter α (from Equation 14) scales the divergence metric that penalizes overconfident predictions. Performing backpropagation on this result to update the weights fine-tunes the model into a privacy-preserving state that balances classification utility.).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini’s teaching that membership inference vulnerability is measured by the divergence between loss distributions of member and non-member data with Chen’s teaching of a training algorithm that applies a hyperparameter-scaled divergence penalty as a regularizer during backpropagation in order to defend against attacks and minimize the measurable privacy risk (Introduction of Chen).

Carlini and Chen do not teach computing a batch loss of a minibatch. Zhang, in the same field of endeavor, teaches computing a batch loss of a minibatch… (Paragraph 111 of Zhang, “Under mini-batch SGD, the mini-batch loss of the l-th selected vehicle can be calculated by averaging losses across all selected message graphs:” Zhang discloses computing a mini-batch loss by averaging losses of multiple training samples under mini-batch SGD, which corresponds to computing a batch loss of a minibatch.).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini and Chen’s teachings with Zhang’s teaching of computing a mini-batch loss in order to fine-tune a model in a privacy-preserving manner while leveraging mini-batch SGD techniques used in deep learning (Paragraph 97 of Zhang).
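To make the Carlini mapping above concrete: the likelihood-ratio test (reproduced only as an image in the original) compares a queried example's logit-scaled confidence under the IN and OUT Gaussian models. The following is a self-contained illustrative sketch, not code from any cited reference:

    import math

    def gaussian_pdf(x, mu, var):
        # Density of N(mu, var) at x.
        return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

    def membership_score(conf_logit, mu_in, var_in, mu_out, var_out):
        # Likelihood-ratio test in the style of Carlini's attack: a large ratio
        # suggests the queried example behaves like member (training) data.
        return gaussian_pdf(conf_logit, mu_in, var_in) / gaussian_pdf(conf_logit, mu_out, var_out)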
Regarding claim 2, Carlini teaches further comprising applying logit-scaling to the plurality of loss values (Page 6 Under Algorithm 1, “We first train N shadow models [60] on random samples from the data distribution D, so that half of these models are trained on the target point (x, y), and half are not (we call these respectively IN and OUT models for (x, y)).”; Page 10, Logit scaling the loss function, “The first step of our attack projects the model’s confidences to a logit scale to ensure that the distributions that we work with are approximately normal… we find that using the model’s confidence f(x)y∈[0,1], or its logarithm (the cross-entropy loss), leads to poor performance of the attack since these statistics do not behave like Gaussians (recall from Figure 4). Our logit rescaling performs best, but the exact numerical computation of the logit function ϕ(p)=log(p/(1−p)) matters.” Logit-scaling is performed on the plurality of model confidences to convert them into Gaussian distributions.).

Regarding claim 3, Carlini teaches performing a modeling of the LINIT as a first Gaussian Random Variable by computing a mean and a variance of the logit-scaled loss values (Page 4 Under Equation 3, “To minimize the number of shadow models necessary, we assume Q̃in/out is a Gaussian distribution, reducing our attack to estimating just four parameters: the mean and variance of each distribution.”; Page 6 Under Algorithm 1, “In this case, we fit m-dimensional spherical Gaussians [distribution reproduced as an image in the original] to the losses collected from querying the shadow models m times per example, and compute a standard likelihood-ratio test between two multivariate normal distributions.” Carlini teaches this claim by instructing the user to collect a plurality of scores from shadow models, transform them to a logit scale, and calculate the mean and variance of Q_out (LINIT) to fit the Gaussian model.).

Regarding claim 4, Carlini teaches applying logit-scaling to the batch loss (Page 6 Under Algorithm 1, “We then fit two Gaussians to the confidences of the IN and OUT models on (x, y) (in logit scale).”; Page 10, Logit scaling the loss function section, “The first step of our attack projects the model’s confidences to a logit scale to ensure that the distributions that we work with are approximately normal.”), and wherein the modeling of the LBATCH as a second Gaussian Random Variable is performed… (Page 4 Under Equation 3, “we assume Q̃in/out is a Gaussian distribution, reducing our attack to estimating just four parameters: the mean and variance of each distribution.”; Algorithm 1 (lines 10-13) explicitly calculates the mean and variance to define the Gaussian parameters.), by computing a mean and a variance of the logit-scaled batch loss (See lines 10, 12, and 15 of Algorithm 1 on Page 6, “µ_in ← mean(confs_in)… σ²_in ← var(confs_in)… [likelihood-ratio computation reproduced as an image in the original]” Carlini’s method instructs the user to model the Q_in (LBATCH) distribution as a Gaussian by calculating the mean (µ_in) and variance (σ²_in) from the logit-scaled confidence scores. Carlini teaches using these Gaussian parameters to perform a statistical test to determine membership.).
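The claim 2-4 mappings turn on fitting Gaussians to logit-scaled confidences. A brief illustrative sketch of that estimation step, whose output can feed the membership_score helper above (the function name is the editor's, not Carlini's):

    import math
    import statistics

    def fit_logit_gaussian(confidences):
        # Fit a Gaussian to logit-scaled confidences phi(p) = log(p / (1 - p)),
        # estimating the mean and variance as in the quoted passages.
        scaled = [math.log(p / (1.0 - p)) for p in confidences]
        return statistics.fmean(scaled), statistics.variance(scaled)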
Regarding claim 5, Carlini teaches computing the divergence metric between LINIT and LBATCH… to measure a distance between LINIT and LBATCH (See Equation 2 on Page 4 of Carlini [reproduced as an image in the original; not shown]. The likelihood-ratio test maps onto the divergence metric because both are mathematical methods used to quantify the “distance” or statistical difference between the two distributions Q_in (LBATCH) and Q_out (LINIT). The divergence metric determines how much the model’s behavior has shifted toward the trained state.). Carlini does not teach comprises using a Kullback-Leibler divergence. Chen, in the same field of endeavor, teaches comprises using a Kullback-Leibler divergence… (Page 17 Under Section B.5, “Label-smoothing prevents overconfident predictions by incorporating a regularization term into the training objective that penalizes the distance (measured by the KL-divergence) [equation reproduced as an image in the original; not shown]”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini’s teaching of measuring membership inference vulnerability by computing a divergence metric between loss distributions with Chen’s teaching of using the Kullback-Leibler (KL) divergence metric within a training regularization objective in order to implement a known statistical divergence method between loss distributions to defend against privacy attacks (Page 17 Section B.5).

Regarding claim 7, Carlini teaches the computing of the backpropagation to update the model parameters is performed using a Stochastic Gradient Descent (SGD) (Page 2 Section A, Machine learning notation, “Neural networks are trained via stochastic gradient descent [32] to minimize some loss function: [equation reproduced as an image in the original; not shown]”; Page 12, Mismatched training procedures section, “In Figure 11b we fix the architecture to a WRN28-10, and vary the training optimizer: SGD, SGDM (SGD with momentum) or Adam.” Equation 1 explicitly represents the parameter update step of the backpropagation algorithm, where gradients are used to minimize the loss via Stochastic Gradient Descent.).

Regarding claim 8, Carlini teaches the computing of the backpropagation to update the model parameters is performed using an adaptive learning rate method (Adam) (Page 2 Section A, Machine learning notation, “to minimize some loss function: [equation reproduced as an image in the original; not shown]”; Page 12, Mismatched training procedures section, “In Figure 11b we fix the architecture to a WRN28-10, and vary the training optimizer: SGD, SGDM (SGD with momentum) or Adam.” Carlini explicitly identifies Adam as a training optimizer used to minimize the loss function by dynamically adjusting learning rates, which serves as a functional implementation of the parameter update phase in the backpropagation process.).
Regarding claim 7, Carlini teaches wherein the computing of the backpropagation to update the model parameters is performed using a Stochastic Gradient Descent (SGD) (Page 2, Section A, Machine learning notation, “Neural networks are trained via stochastic gradient descent [32] to minimize some loss function: [equation image]”; Page 12, Mismatched training procedures section, “In Figure 11b we fix the architecture to a WRN28-10, and vary the training optimizer: SGD, SGDM (SGD with momentum) or Adam.” Equation 1 explicitly represents the parameter update step of the backpropagation algorithm, where gradients are used to minimize the loss via Stochastic Gradient Descent.).

Regarding claim 8, Carlini teaches wherein the computing of the backpropagation to update the model parameters is performed using an adaptive learning rate method (Adam) (Page 2, Section A, and Page 12, as quoted for claim 7. Carlini explicitly identifies Adam as a training optimizer used to minimize the loss function by dynamically adjusting learning rates, which serves as a functional implementation of the parameter update phase in the backpropagation process.).
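The SGD and Adam update rules cited for claims 7 and 8 correspond to standard optimizer steps. A minimal PyTorch sketch (the linear model and random batch are placeholders, not the claimed network):

```python
import torch

model = torch.nn.Linear(16, 2)                       # stand-in network
opt = torch.optim.SGD(model.parameters(), lr=0.01)   # or torch.optim.Adam(...)

x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))  # toy minibatch
loss = torch.nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()  # backpropagation: compute gradients of the loss
opt.step()       # SGD: theta <- theta - lr * grad; Adam adapts per-parameter rates
```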
Regarding claim 10, Carlini does not teach varying a pre-defined value of the hyperparameter λ to change a privacy-robustness value of the training model. Chen, in the same field of endeavor, teaches varying a pre-defined value of the hyperparameter λ to change a privacy-robustness value of the training model (Page 9, Under Practicality, “Our method involves a single hyperparameter α that controls the trade-off between privacy and utility. A fine-grained grid search on a validation set (i.e., first estimating the privacy-utility trade-off with varying value of α, and subsequently selecting the α corresponding to the desired privacy/utility level) allows precise control over the expected privacy/utility level of the target model.”; Page 18, Under Distillation, “To determine the hyper-parameter that best describes the privacy-utility trade-off, we conduct preliminary experiments and investigate the effect of α and T independently.” Chen discloses a predefined hyperparameter that is intentionally varied to control the privacy-utility (privacy-robustness) trade-off of the trained model.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini’s teaching of a framework for measuring and defending against membership inference attacks with Chen’s teaching of using a hyperparameter tuned to control the trade-off between privacy robustness and utility in order to enable control over the final model’s privacy-robustness level for optimization (Page 9, Practicality section of Chen).
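The grid search Chen describes for selecting the trade-off hyperparameter can be illustrated as follows; this is a hedged sketch in which the training and evaluation results are toy stand-ins, not Chen’s procedure or numbers:

```python
def train_and_eval(lam):
    # Stand-in for fine-tuning at regularization strength lam and measuring
    # the trade-off; real code would train a model and run an MIA evaluation.
    attack_auc = max(0.5, 0.75 - 0.1 * lam)  # toy: more regularization, less leakage
    test_accuracy = 0.90 - 0.02 * lam        # toy: utility degrades slightly
    return attack_auc, test_accuracy

candidates = [0.0, 0.5, 1.0, 2.0]
results = {lam: train_and_eval(lam) for lam in candidates}
# Select the lambda meeting a desired privacy level with the best utility:
acceptable = [lam for lam in candidates if results[lam][0] <= 0.55]
best = max(acceptable, key=lambda lam: results[lam][1]) if acceptable else max(candidates)
```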
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Carlini (“Membership Inference Attacks From First Principles”, 2022) in view of Chen (“RELAXLOSS: DEFENDING MEMBERSHIP INFERENCE ATTACKS WITHOUT LOSING UTILITY”, 2022) and Nasr (“Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning”, 2019).

Regarding claim 9, Carlini does not teach the fine-tuning operation of the pre-trained model occurs in a federated learning setting, and the method further comprising: providing the output of the fine-tuned model to an aggregation server. Nasr, in the same field of endeavor, teaches the fine-tuning operation of the pre-trained model occurs in a federated learning setting (Page 5, Stand-alone fine-tuning section, “At a later stage it is updated to f△ after being fine-tuned using a new dataset D△…. The model for inference attacks against fine-tuned models is a special case of our membership inference model for attacking federated learning.”), and the method further comprising: providing the output of the fine-tuned model to an aggregation server (Page 5, Federated Learning section, “A central server keeps the latest version of the parameters W for the global model… In each epoch of training, each participant downloads the global parameters, updates them locally using SGD algorithm on their local training data, and uploads them back to the server.” Nasr teaches fine-tuning a pre-trained model in a federated learning setting where participants locally update the model and provide the resulting fine-tuned model outputs (parameter updates) to a central aggregation server for global model aggregation.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini’s teaching of a method for fine-tuning a model to prevent membership inference attacks by measuring and minimizing loss distribution divergence with Nasr’s teaching of fine-tuning operations within a federated learning environment in order to apply the privacy-preserving fine-tuning method within a federated learning framework to protect sensitive data across distributed participants (Page 5, Federated Learning section of Nasr).

Claims 11-16 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Carlini (“Membership Inference Attacks From First Principles”, 2022) in view of Chen (“RELAXLOSS: DEFENDING MEMBERSHIP INFERENCE ATTACKS WITHOUT LOSING UTILITY”, 2022), Zhang (US 20240064160 A1), Sadeghi (US 12537178 B2), and Nasr (“Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning”, 2019).

Regarding claim 11, Carlini teaches …configured to train a machine learning model and prevent data leakage from membership inference attacks (Section 1, Introduction, Page 1, “Neural networks are now trained on increasingly sensitive datasets, and so it is necessary to ensure that trained models are privacy-preserving. In order to empirically verify if a model is in fact private, membership inference attacks [60] have become the de facto standard [42], [63] because of their simplicity. A membership inference attack receives as input a trained model and an example from the data distribution, and predicts if that example was used to train the model.”; Section IV, Page 4, “We formalize this by considering two distributions over models: Q_in(x,y) = {f ← T(D ∪ {(x,y)}) | D ← D} is the distribution of models trained on datasets containing (x,y)… To simplify the situation, we instead define Q_in and Q_out as the distributions of losses on (x,y) for models either trained, or not trained, on this example.” Carlini’s definition of Q_in as the distribution for models trained on datasets containing (x,y) maps to the “private data” limitation because it describes how a model’s behavior is fundamentally altered by the inclusion of specific, sensitive training samples. Since a fine-tuned model’s loss distribution (LBATCH) is derived from its interaction with this private data, it functions the same as Q_in.); computing an initial loss distribution LINIT of a plurality of loss values (Section IV, Page 4, “We formalize this by considering two distributions over models… is the distribution of models trained on datasets containing (x,y), and then Q_out(x,y) = {f ← T(D ∖ {(x,y)}) | D ← D}… To simplify the situation, we instead define Q_in and Q_out as the distributions of losses on (x,y) for models either trained, or not trained, on this example.” Carlini establishes Q_out as the baseline distribution of losses for data points that the model has not yet been trained on, which corresponds to the initial loss distribution LINIT.); …the private data after beginning a fine-tuning operation to transform the pre-trained model into a fine-tuned model, and computing a loss distribution LBATCH of the batch loss (Section 1, Introduction, Page 1, and Section IV, Page 4, as quoted above. Since a fine-tuned model’s loss distribution (LBATCH) is derived from its interaction with this private data, it functions the same as Q_in.); computing a divergence metric between LINIT and LBATCH (Section IV, Page 4, “…the best hypothesis test… is obtained by thresholding the Likelihood-ratio Test between the two hypotheses: [equation image]”. The Likelihood-ratio test maps onto the divergence metric because both are mathematical methods used to quantify the “distance,” or statistical difference, between the two distributions Q_in (LBATCH) and Q_out (LINIT). The divergence metric determines how much the model’s behavior has shifted toward the trained state; a high likelihood-ratio value means that the model’s behavior has shifted away from the untrained baseline and now belongs to the trained distribution.).

Carlini does not teach receiving a pre-trained model and a pre-defined hyperparameter λ as an input for machine learning; applying a forward pass by querying the pre-trained model with a private data; multiplying an output of the divergence metric with the pre-defined hyperparameter λ to obtain a result; adding the result to the batch loss as a regularizer; updating model parameters by computing backpropagation on the regularized batch loss; and outputting the fine-tuned model.

Chen, in the same field of endeavor, teaches receiving a pre-trained model and a pre-defined hyperparameter λ as an input for machine learning (Section 7, Page 9, as quoted for claim 10; Adaptive Attack Section C.4, Page 20, “And for the NN-based attack, we use the complete logits prediction from the pre-trained shadow models as features to train the adaptive attack models (modeled as a NN).”); applying a forward pass by querying the pre-trained model with a private data (Page 3, Preliminaries section, “We consider the standard setting of MIA: the attacker has access to a query set S = {(z_i, m_i)}_{i=1}^N containing both member (training) and non-member (testing) samples drawn from the same data distribution P_data, where m_i is the membership attribute (m_i = 1 if z_i is a member). The task is to infer the value of the membership attribute m_i associated with each query sample z_i… which predicts m_i for a given query sample z_i and a target model parametrized by θ.”; Page 17, Model Architectures, “…we adopt the same architecture as used in Nasr et al. (2018): a 4-layer fully-connected neural network”; see Algorithm 1, “for epoch in {1, …, E} do; for batch_index in {1, …, K} do; Get sample batch {(x_i, y_i)}_{i=1}^B; Perform forward pass: p_i = f(x_i; θ)”. The forward pass in Algorithm 1 simulates the attacker’s query by processing private training data to generate output predictions, and the algorithm then applies backpropagation to fine-tune the model’s weights. The membership attribute m_i identifies whether a queried sample z_i is private training data, and Algorithm 1 explicitly performs a forward pass on such samples.);
multiplying an output of the divergence metric with the pre-defined hyperparameter λ to obtain a result; adding the result to the… loss as a regularizer (Defense Methods Section B.5, Page 17, “Label-smoothing prevents overconfident predictions by incorporating a regularization term into the training objective that penalizes the distance (measured by the KL-divergence) between the model predictions and the uniform distribution. The objective is formularized as follows [equation image]” The equation defines a regularized loss by scaling the KL-divergence (the divergence metric) by the hyperparameter α to penalize overconfident predictions. This mathematical result is added to the standard cross-entropy loss to force the model toward a uniform distribution.); updating model parameters by computing backpropagation on the regularized batch loss; and outputting the fine-tuned model (see Algorithm 1 on Page 4, “[equation image]” The algorithm uses a forward pass to query the model with private data, then calculates a regularized loss in which the hyperparameter α (from Equation 14) scales the divergence metric that penalizes overconfident predictions. Performing backpropagation on this result to update the weights fine-tunes the model into a privacy-preserving state that balances classification utility.).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini’s teaching that membership inference vulnerability is measured by the divergence between loss distributions of member and non-member data with Chen’s teaching of a training algorithm that applies a hyperparameter-scaled divergence penalty as a regularizer during backpropagation in order to defend against such attacks and minimize the measurable privacy risk (Introduction of Chen).
Carlini and Chen do not teach computing a batch loss of a minibatch. Zhang, in the same field of endeavor, teaches computing a batch loss of a minibatch… (Paragraph 111 of Zhang, “Under mini-batch SGD, the mini-batch loss of the l-th selected vehicle can be calculated by averaging losses across all selected message graphs:” Zhang discloses computing a mini-batch loss by averaging the losses of multiple training samples under mini-batch SGD, which corresponds to computing a batch loss of a minibatch.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini and Chen’s teaching with Zhang’s teaching of computing a mini-batch loss in order to fine-tune a model in a privacy-preserving manner while leveraging mini-batch SGD techniques used in deep learning (Paragraph 97 of Zhang).
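Taken together, the mapped operations (Carlini’s logit-scaled Gaussian modeling of loss distributions, Chen’s hyperparameter-scaled divergence added as a regularizer before backpropagation, and Zhang’s averaged mini-batch loss) can be illustrated in one training step. This is a minimal sketch, not the claimed method: the loss-to-confidence conversion, the fixed mu_init/var_init summarizing LINIT, and the name lam for the claimed hyperparameter λ are all assumptions:

```python
import torch

def fine_tune_step(model, opt, x, y, mu_init, var_init, lam):
    per_example = torch.nn.functional.cross_entropy(model(x), y, reduction="none")
    batch_loss = per_example.mean()            # batch loss of the minibatch (averaged)
    # Cross-entropy is -log p_y, so exp(-loss) recovers the correct-class
    # confidence; torch.logit applies phi(p) = log(p / (1 - p)).
    scaled = torch.logit(torch.exp(-per_example).clamp(1e-6, 1 - 1e-6))
    mu_b, var_b = scaled.mean(), scaled.var()  # model L_BATCH as a Gaussian
    # Closed-form KL( N(mu_b, var_b) || N(mu_init, var_init) ).
    kl = 0.5 * (torch.log(var_init / var_b)
                + (var_b + (mu_b - mu_init) ** 2) / var_init - 1.0)
    loss = batch_loss + lam * kl               # divergence added as a regularizer
    opt.zero_grad()
    loss.backward()                            # backprop on the regularized batch loss
    opt.step()
    return loss.item()
```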
Carlini, Chen, and Zhang do not teach a computing device configured to train a machine learning model and prevent data leakage from membership inference attacks, the computing device comprising: a processor; and a memory coupled to the processor, the memory storing instructions to cause the processor to perform acts. Sadeghi, in the same field of endeavor, teaches a computing device… comprising: a processor; and a memory coupled to the processor, the memory storing instructions to cause the processor to perform acts (Paragraph 38 of Sadeghi, “In other features, the processor is configured to communicate with a remote computing device via a network, and the instructions cause the processor to allow the remote computing device to control the substrate processing system and to disallow manual control of the substrate processing system while the remote computing device controls the substrate processing system via the network.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini and Chen’s teaching of training a model to prevent data leakage using a divergence-based regularizer with Sadeghi’s teaching of a generic computing device in order to implement a system configured to perform Carlini and Chen’s method to prevent data leakage from membership inference attacks (Paragraph 38 of Sadeghi).

Claim 12 is an apparatus claim that recites limitations identical to those of claim 2. Therefore, claim 12 is rejected using the same rationale as claim 2.

Regarding claim 13, Carlini teaches applying logit-scaling to the batch loss (Page 6, Under Algorithm 1, and Page 10, Logit scaling the loss function section, as quoted for claim 2; logit-scaling is performed on the plurality of model confidences to convert them into approximately Gaussian distributions.); performing a modeling of the LINIT as a first Gaussian Random Variable by computing a mean and a variance of the logit-scaled loss values (Page 4, Under Equation 3, and Page 6, Under Algorithm 1, as quoted for claim 3; Carlini teaches this limitation by instructing the user to collect a plurality of scores from shadow models, transform them to a logit scale, and calculate the mean and variance of Q_out (LINIT) to fit the Gaussian model.); and performing a modeling of the LBATCH as a second Gaussian Random Variable by computing a mean and a variance of the logit-scaled batch loss (Page 4, Under Equation 3, as quoted for claim 3; Algorithm 1 (lines 10-13) explicitly calculates the mean and variance that define the Gaussian parameters.).

Carlini does not teach the instructions cause the processor to perform additional acts. Sadeghi, in the same field of endeavor, teaches the instructions cause the processor to perform additional acts (Paragraph 38 of Sadeghi, as quoted above). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini and Chen’s teaching of training a model to prevent data leakage using a divergence-based regularizer with Sadeghi’s teaching of a generic computing device in order to implement a system configured to perform Carlini and Chen’s method to prevent data leakage from membership inference attacks (Paragraph 38 of Sadeghi).

Claim 14 is an apparatus claim that recites limitations identical to those of claim 5. Therefore, claim 14 is rejected using the same rationale as claim 5.
Claim 15 is an apparatus claim that recites limitations identical to those of claim 6. Therefore, claim 15 is rejected using the same rationale as claim 6.

Regarding claim 16, Carlini teaches wherein the update rule for the computing of the backpropagation to update the model parameters is performed by a Stochastic Gradient Descent (SGD) or an adaptive learning rate method (Adam) (Page 2, Section A, Machine learning notation, and Page 12, Mismatched training procedures section, as quoted for claim 7. Equation 1 explicitly represents the parameter update step of the backpropagation algorithm, where gradients are used to minimize the loss via Stochastic Gradient Descent, and Carlini identifies Adam as an alternative training optimizer.).

Claim 18 is an apparatus claim that recites limitations identical to those of claim 10. Therefore, claim 18 is rejected using the same rationale as claim 10.

Regarding claim 19, Carlini teaches …to compute an initial loss distribution LINIT of a plurality of the loss values (Section IV, Page 4, as quoted for claim 11. Carlini establishes Q_out as the baseline distribution of losses for data points that the model has not yet been trained on, which corresponds to the initial loss distribution LINIT.);
…the private data after beginning a fine-tuning operation to transform the pre-trained model into a fine-tuned model, and to compute a loss distribution LBATCH of the batch loss (Section 1, Introduction, Page 1, and Section IV, Page 4, as quoted for claim 11. Since a fine-tuned model’s loss distribution (LBATCH) is derived from its interaction with this private data, it functions the same as Q_in.); …to compute a divergence metric between LINIT and LBATCH… (Section IV, Page 4, “…the best hypothesis test… is obtained by thresholding the Likelihood-ratio Test between the two hypotheses: [equation image]”. The Likelihood-ratio test maps onto the divergence metric for the reasons given for claim 11.).

Carlini does not teach …receive a pre-trained model and a pre-defined hyperparameter λ as an input; …apply a forward pass by querying the pre-trained model with a private data; …multiply an output of the divergence metric with the pre-defined hyperparameter λ to obtain a result, and to add the result to the batch loss as a regularizer; …update model parameters by computing backpropagation on the regularized batch loss; and …output the fine-tuned model.

Chen, in the same field of endeavor, teaches …receive a pre-trained model and a pre-defined hyperparameter λ as an input (Section 7, Page 9, and Adaptive Attack Section C.4, Page 20, as quoted for claim 11); …apply a forward pass by querying the pre-trained model with a private data (Page 3, Preliminaries section; Page 17, Model Architectures; and Algorithm 1, as quoted for claim 11. The forward pass in Algorithm 1 simulates the attacker’s query by processing private training data to generate output predictions, and the algorithm then applies backpropagation to fine-tune the model’s weights. The membership attribute m_i identifies whether a queried sample z_i is private training data, and Algorithm 1 explicitly performs a forward pass on such samples.);
…multiply an output of the divergence metric with the pre-defined hyperparameter λ to obtain a result, and to add the result to the batch loss as a regularizer (Defense Methods Section B.5, Page 17, as quoted for claim 11. The equation defines a regularized loss by scaling the KL-divergence (the divergence metric) by the hyperparameter α to penalize overconfident predictions, and this result is added to the standard cross-entropy loss.); …to update model parameters by computing backpropagation on the regularized batch loss; and …to output the fine-tuned model (see Algorithm 1 on Page 4 of Chen, as cited for claim 11. Performing backpropagation on the regularized loss to update the weights fine-tunes the model into a privacy-preserving state that balances classification utility.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini’s teaching that membership inference vulnerability is measured by the divergence between loss distributions of member and non-member data with Chen’s teaching of a training algorithm that applies a hyperparameter-scaled divergence penalty as a regularizer during backpropagation in order to defend against such attacks and minimize the measurable privacy risk (Introduction of Chen).

Carlini and Chen do not teach a computer program product comprising: one or more computer-readable storage devices and program instructions stored on at least one of the one or more computer-readable storage devices, the program instructions executable by a processor, the program instructions comprising. Sadeghi, in the same field of endeavor, teaches a computer program product comprising: one or more computer-readable storage devices and program instructions stored on at least one of the one or more computer-readable storage devices, the program instructions executable by a processor, the program instructions comprising (Paragraph 107 of Sadeghi, “The terms server and client device are to be understood broadly as representing computing devices with one or more processors and memory configured to execute machine readable instructions. The terms application and computer program are to be understood broadly as representing machine readable instructions executable by the computing devices.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini and Chen’s teaching of training a model to prevent data leakage using a divergence-based regularizer with Sadeghi’s teaching of a generic computing device in order to implement a computer program product configured to perform Carlini and Chen’s method to prevent data leakage from membership inference attacks (Paragraph 107 of Sadeghi).
Claims 6, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Carlini (“Membership Inference Attacks From First Principles”, 2022) in view of Chen (“RELAXLOSS: DEFENDING MEMBERSHIP INFERENCE ATTACKS WITHOUT LOSING UTILITY”, 2022), Sadeghi (US 12537178 B2), and Nasr (“Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning”, 2019).

Regarding claim 6, Carlini teaches wherein the updating of the model parameters by computing the backpropagation is performed by using an update rule, and the method further comprising (Page 2, Section A, Machine learning notation, “Neural networks are trained via stochastic gradient descent [32] to minimize some loss function: [equation image]”). Carlini does not teach determining, prior to outputting, that the … model meets a termination criterion. Sadeghi, in the same field of endeavor, teaches determining, prior to outputting, that the … model meets a termination criterion (Paragraph 223, “Control returns to the 2952 if the model does not meet the predetermined training criteria. Control ends if the model meets the predetermined training criteria.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini’s teaching of updating model parameters via a stochastic gradient descent (SGD) update rule during training with Sadeghi’s teaching of applying a predetermined termination criterion to control the training of the model in order to provide an automated mechanism that terminates training based on a threshold (Paragraph 223 of Sadeghi). Carlini and Sadeghi do not teach a fine-tuned model. Nasr, in the same field of endeavor, teaches the fine-tuned model (Page 5, Section B, “In addition to such attack setting, the attacker might observe an updated version of the model after fine-tuning, for instance, which is very common in deep learning. Besides, in the case of federated learning, the attacker can be an entity who participates in the training. The settings of fine-tuning and federated learning are depicted in Table I.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini and Sadeghi’s teaching with Nasr’s teaching of training on specific data to fine-tune a model in order to produce a practical method that yields a fine-tuned model as the standard output of the privacy-preserving training process (Page 5, Section B of Nasr).
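A termination criterion of the kind Sadeghi is cited for can be illustrated as an early-stopping check before the fine-tuned model is output. The loss-improvement test and the step_fn signature below are hypothetical stand-ins for Sadeghi’s “predetermined training criteria,” where step_fn is any per-batch training step returning a loss value:

```python
def train_until_converged(model, opt, loader, step_fn, max_epochs=50, tol=1e-4):
    prev = float("inf")
    for _ in range(max_epochs):
        epoch_loss = sum(step_fn(model, opt, x, y) for x, y in loader)
        if prev - epoch_loss < tol:  # termination criterion met
            break
        prev = epoch_loss
    return model  # only now is the fine-tuned model output
```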
Regarding claim 17, Carlini does not teach the instructions cause the processor to perform additional acts comprising: training the fine-tuned model in a federated learning setting, and providing the output of the fine-tuned model to an aggregation server that shares an update of the fine-tuned model. Nasr, in the same field of endeavor, teaches training the fine-tuned model in a federated learning setting (Page 5, Stand-alone fine-tuning section, “At a later stage it is updated to f△ after being fine-tuned using a new dataset D△…. The model for inference attacks against fine-tuned models is a special case of our membership inference model for attacking federated learning.”); and providing the output of the fine-tuned model to an aggregation server that shares an update of the fine-tuned model (Page 5, Federated Learning section, “A central server keeps the latest version of the parameters W for the global model… In each epoch of training, each participant downloads the global parameters, updates them locally using SGD algorithm on their local training data, and uploads them back to the server.”; Page 5, Federated Learning, “In this setting, N participants… collaborate to train a global model.” Nasr teaches fine-tuning a pre-trained model in a federated learning setting where participants locally update the model and provide the resulting fine-tuned model outputs (parameter updates) to a central aggregation server for global model aggregation.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini’s method of fine-tuning a model to prevent data leakage with Nasr’s teaching of performing fine-tuning in a federated learning setting in order to extend the privacy-preserving training method to a distributed, federated environment (Page 5, Federated Learning section of Nasr). Carlini and Nasr do not teach the instructions cause the processor to perform additional acts. Sadeghi teaches the instructions cause the processor to perform additional acts (Paragraph 38 of Sadeghi, as quoted for claim 11). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini and Nasr’s teaching with Sadeghi’s teaching of a processor executing instructions stored in memory to perform certain acts in order to implement the federated, privacy-preserving fine-tuning method on a standard computing device (Paragraph 38 of Sadeghi).
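Nasr’s aggregation-server step, in which participants upload locally fine-tuned parameters and the server averages them into new global parameters, can be sketched as a FedAvg-style mean; the equal-weight averaging over state dicts is an assumption, not Nasr’s exact protocol:

```python
import torch

def aggregate(client_state_dicts):
    # Average the parameter tensors uploaded by each participant and
    # return new global parameters to share back with the participants.
    return {
        name: torch.stack([sd[name].float() for sd in client_state_dicts]).mean(dim=0)
        for name in client_state_dicts[0]
    }
```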
Regarding claim 20, Carlini teaches …to apply logit-scaling to the loss values obtained from the forward pass (Page 6, Under Algorithm 1, and Page 10, Logit scaling the loss function section, as quoted for claim 2; the logit function ϕ(p) = log(p/(1−p)) projects the model’s confidences to a scale on which the distributions are approximately normal.); …to perform a modeling of the LINIT as a first Gaussian Random Variable by computing a mean and a variance of the logit-scaled loss values (Page 4, Under Equation 3, and Page 6, Under Algorithm 1, as quoted for claim 3; Carlini instructs the user to collect a plurality of scores from shadow models, transform them to a logit scale, and calculate the mean and variance of Q_out (LINIT) to fit the Gaussian model.); …to apply logit-scaling to the batch loss (Page 6, Under Algorithm 1, “We then fit two Gaussians to the confidences of the IN and OUT models on (x, y) (in logit scale).”), and to perform a modeling of the LBATCH as a second Gaussian Random Variable by computing a mean and a variance of the logit-scaled batch loss (Page 4, Under Equation 3, as quoted for claim 3; Algorithm 1 (lines 10-13) explicitly calculates the mean and variance that define the Gaussian parameters.); and …to compute the backpropagation to update the model parameters by using an update rule (Page 2, Section A, Machine learning notation, as quoted for claim 7).

Carlini does not teach …prior to output of the fine-tuned model, to determine that the fine-tuned model meets a termination criterion, and program instructions. Sadeghi teaches …prior to output of the… model, to determine that the… model meets a termination criterion (Paragraph 223, “Control returns to the 2952 if the model does not meet the predetermined training criteria. Control ends if the model meets the predetermined training criteria.”) and program instructions (Paragraph 107 of Sadeghi, “The terms server and client device are to be understood broadly as representing computing devices with one or more processors and memory configured to execute machine readable instructions.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini’s teaching of updating model parameters via a stochastic gradient descent (SGD) update rule during training with Sadeghi’s teaching of applying a predetermined termination criterion to control the training of the model in order to provide an automated mechanism that terminates training based on a threshold (Paragraph 223 of Sadeghi). Carlini and Sadeghi do not teach a fine-tuned model. Nasr, in the same field of endeavor, teaches the fine-tuned model (Page 5, Section B, as quoted for claim 6). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Carlini and Sadeghi’s teaching with Nasr’s teaching of training on specific data to fine-tune a model in order to produce a practical method that yields a fine-tuned model as the standard output of the privacy-preserving training process (Page 5, Section B of Nasr).
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAJD MAHER HADDAD, whose telephone number is (571) 272-2265. The examiner can normally be reached Monday-Friday, 8 am-5 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.M.H./ Examiner, Art Unit 2125
/KAMRAN AFSHAR/ Supervisory Patent Examiner, Art Unit 2125

Prosecution Timeline

Mar 31, 2023
Application Filed
Feb 02, 2026
Non-Final Rejection — §101, §103 (current)


Prosecution Projections

1-2
Expected OA Rounds
Favorable
Grant Probability
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
