Office Action Predictor
Last updated: April 15, 2026
Application No. 18/189,489

LEARNING METHOD FOR ENHANCING ROBUSTNESS OF A NEURAL NETWORK

Non-Final OA: §101, §103, §112
Filed: Mar 24, 2023
Examiner: BHAT, VIBHA NARAYAN
Art Unit: 2142
Tech Center: 2100 — Computer Architecture & Software
Assignee: Postech Academy-Industry Foundation
OA Round: 1 (Non-Final)
Grant Probability: Favorable
OA Rounds: 1-2
To Grant: 3y 6m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal) among resolved cases with interview
Avg Prosecution: 3y 6m (typical timeline)
Total Applications: 4 across all art units (4 currently pending)

Statute-Specific Performance

§101: 30.8% (-9.2% vs TC avg)
§103: 30.8% (-9.2% vs TC avg)
§102: 15.4% (-24.6% vs TC avg)
§112: 23.1% (-16.9% vs TC avg)

Tech Center averages are estimates. Figures based on career data from 0 resolved cases.

Office Action

§101 §103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This office action is in response to the application filed on March 24, 2023. Claims 1-7 are pending and have been examined. Claims 1-4 and 6-7 are rejected under 35 U.S.C. 101 and 103. Claim 5 is rejected under 35 U.S.C. 101 and 112(b).

Priority

Applicants' claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. The present application claims foreign priority based on Korean Patent Application No. 10-2022-0147423, filed November 7, 2022. The examiner notes that a certified copy (in Korean) of the above-noted application was received on March 24, 2023.

Information Disclosure Statement

Acknowledgment is made of the information disclosure statements filed March 24, 2023, which comply with 37 CFR 1.97. As such, the information disclosure statements have been placed in the application file and the information referred to therein has been considered by the examiner.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. § 112(b):

(b) CONCLUSION – The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. § 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 5 is rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding Claim 5, the limitation "wherein Ldist is the second loss function, Y1 is the first output data, Y2 is the second output data, LKL is the Kullback-Leibler divergence function, and T is a temperature coefficient used to adjust the characteristics of the distributions used in the second loss function Ldist" does not clearly set the metes and bounds of the patent protection desired. There is insufficient antecedent basis for this limitation in the claim, rendering the claim indefinite, because "the Kullback-Leibler divergence function" is not defined in Claim 2, on which Claim 5 depends.

Claim Rejections - 35 USC § 101

35 U.S.C. § 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-7 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. According to the USPTO guidelines, a claim is directed to non-statutory subject matter if:

Step 1: The claim does not fall within one of the four statutory categories of invention (process, machine, manufacture, or composition of matter), or,

Step 2: The claim recites a judicial exception, e.g. an abstract idea, without reciting additional elements that amount to significantly more than the judicial exception, as determined using the following analysis:

Step 2A, Prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Step 2A, Prong 2: Does the claim recite additional elements that integrate the judicial exception into a practical application?

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?

MPEP 2106.04(a)(2)(I) states: "The mathematical concepts grouping is defined as mathematical relationships, mathematical formulas or equations, and mathematical calculations."

MPEP 2106.04(a)(2)(III) states: "Accordingly, the 'mental processes' abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions." Further, the MPEP states: "The courts do not distinguish between mental processes that are performed entirely in the human mind and mental processes that require a human to use a physical aid (e.g. pen and paper or a slide rule) to perform the claim limitation."

Using the two-step inquiry, it is clear that Claims 1-7 are each directed to an abstract idea, as shown below.

With respect to Claim 1, which is an independent claim:

Step 1: Yes, Claim 1 recites a learning method, i.e. a process, which is one of the four statutory categories of patentable subject matter.

Step 2A, Prong 1: Yes, a judicial exception is recited in this claim, as it recites a mathematical calculation:

"adding noise to weights of the first neural network"

The claim is explicitly reciting that math is being performed (adding), and this is consistent with the specification at [0035].

"calculating a loss function using the first output data, the second output data, and a true value corresponding to the input data"

The claim is explicitly reciting a math calculation. A loss function is a mathematical equation.

Step 2A, Prong 2: Furthermore, MPEP 2106.05(g), Insignificant Extra-Solution Activity, has found mere data gathering and post-solution activity to be insignificant extra-solution activity:

"preparing a second neural network having the same weights as a first neural network which is pre-trained"

There is no description in the claim itself, or in the specification, of what is meant by "preparing" the second neural network. This limitation therefore covers any method of "preparing" a neural network, with no restriction on how that is accomplished and no description of the mechanism used for "preparing". Thus, it amounts to "apply it" and mere instructions to implement an abstract idea on a computer, since preparing a second neural network having the same weights as a pre-trained first neural network means using a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)(1).

"generating a first output data of the first neural network and generating a second output data of the second neural network by providing input data to the first neural network and the second neural network"

This part of the claim only amounts to using generically recited neural networks in their ordinary capacity (e.g. receiving input data and generating output data accordingly), in accordance with MPEP 2106.05(f)(2).

Step 2B: No, the additional elements of Claim 1 do not provide significantly more than the abstract idea itself. Preparing a second neural network with the same weights as a first pre-trained neural network only amounts to "apply it" and mere instructions to implement an abstract idea on a computer - see MPEP 2106.05(f)(1).
Providing input data and generating output data are well-understood, routine, and conventional activity of transmitting or receiving data over a network - see MPEP 2106.05(d).

Therefore, Claim 1 is directed to non-statutory subject matter and rejected.

With respect to Claim 2, which depends upon Claim 1: A judicial exception is recited in this claim, as it recites a mathematical calculation:

"calculating a first loss function using the first output data and the true value; calculating a second loss function using the first output data and the second output data; calculating a third loss function using the second output data and the true value; combining the first loss function, the second loss function, and the third loss function."

This recites a calculation of loss functions, which are mathematical calculations that quantify the difference between a model's predicted output and the actual true values. Therefore, Claim 2 is directed to non-statutory subject matter and rejected.

With respect to Claim 3, which depends upon Claim 2: A judicial exception is recited in this claim, as it recites a mathematical calculation:

"wherein the first loss function corresponds to a cross-entropy between the first output data and the true value, and third loss function corresponds to a cross-entropy between the second output data and the true value"

This recites a calculation of loss functions, which are mathematical calculations that quantify the difference between a model's predicted output and the actual true values. Therefore, Claim 3 is directed to non-statutory subject matter and rejected.

With respect to Claim 4, which depends upon Claim 2: A judicial exception is recited in this claim, as it recites a mathematical calculation:

"wherein the second loss function corresponds to a Kullback-Leibler divergence function receiving a distribution generated from the first output data and a distribution generated from the second output data"

This recites a calculation of a loss function, specifically a Kullback-Leibler divergence function, which is a mathematical calculation that quantifies the information lost when one probability distribution is used to approximate a true distribution. Therefore, Claim 4 is directed to non-statutory subject matter and rejected.

With respect to Claim 5, which depends upon Claim 2: A judicial exception is recited in this claim, as it recites a mathematical calculation:

"wherein the second loss function is determined according to the equation: [equation image not reproduced] wherein Ldist is the second loss function, Y1 is the first output data, Y2 is the second output data, LKL is the Kullback-Leibler divergence function, and T is a temperature coefficient used to adjust the characteristics of the distributions used in the second loss function Ldist"

This recites a specific mathematical equation that can be mathematically calculated using the stated variables. Therefore, Claim 5 is directed to non-statutory subject matter and rejected.
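For context, the equation image in Claim 5 did not survive extraction, but the variables the claim defines (Ldist, Y1, Y2, LKL, T) match the conventional temperature-scaled Kullback-Leibler distillation loss. Below is a minimal sketch of that conventional form, assuming PyTorch; the T**2 scaling and the default temperature are assumptions drawn from standard distillation practice, not quotes from the claim, whose actual equation may differ.

```python
# Hypothetical reconstruction of the Claim 5 second loss function Ldist.
# The claim's equation image is not reproduced above, so this follows the
# conventional temperature-scaled KL distillation loss; the T**2 factor
# is an assumption from standard practice, not the claim itself.
import torch
import torch.nn.functional as F

def l_dist(y1: torch.Tensor, y2: torch.Tensor, T: float = 4.0) -> torch.Tensor:
    """Second loss over output data Y1 (first network logits) and Y2
    (second network logits), with temperature coefficient T."""
    # Distributions generated from the first and second output data,
    # softened by the temperature coefficient T
    p1 = F.softmax(y1 / T, dim=-1)
    log_p2 = F.log_softmax(y2 / T, dim=-1)
    # F.kl_div takes log-probabilities first and probabilities second,
    # computing KL(p1 || p2); T**2 keeps gradient magnitudes comparable
    # across temperatures
    return F.kl_div(log_p2, p1, reduction="batchmean") * (T ** 2)
```

With T = 1 this reduces to the plain Kullback-Leibler divergence recited in Claim 4; increasing T flattens both distributions, which is the "adjusting the characteristics of the distributions" role the claim assigns to the temperature coefficient.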
With respect to Claim 6, which depends upon Claim 2: A judicial exception is recited in this claim, as it recites a mathematical calculation:

"wherein the first loss function, the second loss function, and the third loss function are linearly combined, and wherein a sum of a coefficient applied to the first loss function and a coefficient applied to the second loss function is equal to 1"

This recites a linear combination of loss functions, with the sum of the coefficients applied to the first and second loss functions equal to 1, which is a series of mathematical calculations that combine mathematical objects (i.e. functions and variables) by multiplying each by a constant scalar and adding the results together. Therefore, Claim 6 is directed to non-statutory subject matter and rejected.

With respect to Claim 7, which depends upon Claim 1: The following is a generic link according to MPEP 2106.05(h):

"wherein the first neural network is identical to the second neural network"

This claim element sets forth that the first neural network is identical to the second neural network. It merely further describes what the second neural network is, and does not add a limitation that would amount to a practical application or significantly more. Therefore, Claim 7 is directed to non-statutory subject matter and rejected.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e. changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

This application currently names joint inventors. In considering the patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention, in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. § 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

Claims 1-4 and 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over "Distilling the Knowledge in a Neural Network" by Hinton et al. (non-patent literature, hereinafter "Hinton"), in view of "Semi-supervised Image Deraining Using Knowledge Distillation" by Cui et al. (non-patent literature, hereinafter "Cui"), in further view of "Layer-Wise Distillation for Protecting Pre-Trained Neural Network Models" by Chakraborty et al. (US20200311540, hereinafter "Chakraborty").

With respect to Claim 1:

Hinton teaches:

a. "A learning method of a neural network, the learning method comprising…a first neural network which is pre-trained;" (Page 1, Section 1 "Introduction", teaches the existence of two neural network models: a cumbersome model, known as the first model, that is trained. The pre-trained first model can then be trained again, known as distillation, to transfer knowledge to a second model.)

b. "generating a first output data of the first neural network and generating a second output data of the second neural network by providing input data to the first neural network and the second neural network;" (Page 2, Section 1 "Introduction", teaches the training of a cumbersome model, also known as the first neural network, where a training set of input data is provided to the first neural network and generates soft targets, also known as a first output data, of the first neural network. The same training set or a separate "transfer" set is also provided as input data to a distilled model, also known as the second neural network, and generates soft targets, also known as a second output data, of the second neural network.)

c. "calculating a loss function using the first output data, the second output data, and a true value corresponding to the input data." (Page 3, Section 2 "Distillation", teaches a calculation of a weighted average, which is used as a loss function in machine learning, of two different objective functions. This loss function is calculated using soft targets (output data) from the cumbersome model (first model), soft targets (output data) from the distilled model (second model), and the correct labels (true values) corresponding with the input data.)

Hinton does not appear to explicitly disclose:

d. "preparing a second neural network having the same weights as a first neural network"

e. "adding noise to weights of the first neural network"

However, Cui teaches:

f. "preparing a second neural network having the same weights as a first neural network" (Page 6, Section B "Semi-supervised Image Deraining using Knowledge Distillation", teaches that a teacher network, or first neural network, is first trained through a supervised learning procedure, which means it is pre-trained. During the second training stage, a student network, or second neural network, "is first initialized with the parameters of the teacher model". This means the second neural network is prepared having the same parameters, which include weights, as the pre-trained first neural network.)

It would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date of the present application to implement this part of Claim 1 by combining the teachings of Hinton and Cui, which are both in the same field of invention. Prior to the current application's effective filing date, knowledge distillation was a well-known technique in machine learning.
A PHOSITA would have been motivated to apply known training techniques for improving neural network robustness to a specific knowledge distillation architecture in order to generate more reliable output data from a teacher neural network and improve the training of a student neural network.

The combination of Hinton and Cui does not appear to explicitly disclose: "adding noise to weights of the first neural network"

However, Chakraborty teaches:

"adding noise to weights of the first neural network" (Paragraphs 0044 and 0069 teach the process of introducing noise into the operational parameters of the individual layers (e.g. "the weights provided in a weight matrix") of a pre-trained neural network.)

It would have been obvious to a PHOSITA before the effective filing date of the present application to implement a method like Claim 1 by combining the teachings of Hinton, Cui, and Chakraborty, which are all in the same field of invention. A PHOSITA would have been motivated to apply the addition of noise to a teacher neural network, as taught by Chakraborty, to the pre-trained cumbersome model (teacher model) taught by Hinton in order to improve the robustness of the overall model and of the output data used for knowledge distillation. This would, in turn, improve the training quality of the distilled model (student model). The addition of self-distillation, where a teacher and student model have identical architectures, as taught by Cui, is shown to be beneficial for making the output data from a teacher model more stable and informative for training a student model.

With respect to Claim 2:

Hinton teaches:

a. "The learning method of claim 1, wherein calculating the loss function comprises: calculating a first loss function using the first output data and the true value;" (Page 1, Section 1 "Introduction", teaches that a cumbersome trained model, also known as the first trained model, is the same as a supervised pre-trained teacher model, which means an inherent loss function is computed using the first model's output value and a true value before the knowledge distillation process begins.)

b. "calculating a second loss function using the first output data and the second output data;" (Page 3, Section 2 "Distillation", teaches an objective function described as a cross entropy, which is a type of loss function, calculated using the soft targets from the cumbersome model (first output data from the first model) and the softmax of the distilled model (second output data from the second model).)

c. "calculating a third loss function using the second output data and the true value;" (Page 3, Section 2 "Distillation", teaches a second objective function described as a cross entropy, which is a type of loss function, calculated using the logits in the softmax of the distilled model (second output data from the second model) and the correct labels (the true value).)

d. "combining the first loss function, the second loss function, and the third loss function." (Page 3, Section 2 "Distillation", teaches the use of "a weighted average of two different objective functions". A weighted average is a combination of functions where each function contributes proportionally based on the assigned weight amount.)

Further, Hinton teaches the calculation of a first loss function during the training of the first model using the first model's output data and a true value. Another loss function is calculated between the first model output data and the second model output data.
Another loss function is calculated between the second model output data and a true value. Hinton teaches combining multiple loss functions. It would have been obvious to a PHOSITA to combine three loss functions (as described in Claim 2) instead of just two, since combining multiple loss functions is well known when training neural networks.

With respect to Claim 3:

Hinton teaches:

a. "The learning method of claim 2, wherein the first loss function corresponds to a cross-entropy between the first output data and the true value," (Page 1, Section 1 "Introduction", teaches a cumbersome model that has been trained. A PHOSITA would understand that training a teacher model for classification using labeled data typically involves calculating a cross-entropy loss function between the teacher model's output and the true values before the knowledge distillation process begins. So, Hinton teaches a first cross-entropy loss between the first output data (from the cumbersome model) and the true value.)

b. "and third loss function corresponds to a cross-entropy between the second output data and the true value" (Page 3, Section 2 "Distillation", teaches an objective function described as a cross entropy, which is a type of loss function, calculated using the logits in the softmax of the distilled model (second output data from the second model) and the correct labels (the true value).)

With respect to Claim 4:

Hinton teaches:

a. "The learning method of claim 2, wherein the second loss function corresponds to a Kullback-Leibler divergence function receiving a distribution generated from the first output data and a distribution generated from the second output data" (Page 3, Section 2 "Distillation", teaches a cross entropy between soft targets from the cumbersome model and soft targets from the distilled model. A PHOSITA would know that using cross-entropy with soft targets from a first model (also known as a teacher model) and softmax predictions from a second model (also known as a student model) is functionally equivalent to minimizing Kullback-Leibler divergence in knowledge distillation.)

With respect to Claim 6:

Hinton teaches:

a. "The learning method of claim 2, wherein the first loss function, the second loss function, and the third loss function are linearly combined, and wherein a sum of a coefficient applied to the first loss function and a coefficient applied to the second loss function is equal to 1." (Pages 2 and 3, Section 2 "Distillation", teach multiple loss functions combined as a weighted average, a specific type of linear combination in which the weights are non-negative numbers that sum to 1.)

It would have been obvious to a PHOSITA to include a third loss function and normalize the coefficients so that their sum equals one. Extending a combined objective to include additional loss terms and normalizing weights within a loss function are well known when training neural networks, since they keep the combined loss function at a consistent scale and ensure no single loss dominates the training process.

With respect to Claim 7:

Cui teaches:

a. "The learning method of claim 1, wherein the first neural network is identical to the second neural network." (Page 4, Section A "Network Architecture", teaches self-distillation, a known method in machine learning in which a teacher neural network and a student neural network share an identical neural network architecture. Cui teaches a first training stage in which the teacher neural network is trained.
Subsequently, the student neural network is trained through a known method in machine learning called knowledge distillation. This second training stage is where a student neural network having an identical architecture ("i.e. a three-layer pyramid structure") to the teacher neural network is trained using the teacher neural network's output data through the application of loss functions.)

A PHOSITA would have known to use the self-distillation techniques taught by Cui in order to improve the transfer of knowledge during training and avoid a situation where a student neural network cannot absorb the knowledge from a teacher neural network due to its large size or different structure.

Allowable Subject Matter

Claim 5 includes allowable subject matter, but is currently rejected under 35 U.S.C. 101 and 35 U.S.C. 112(b). Claim 5 would be allowed if rewritten to overcome the rejections set forth under 35 U.S.C. 101 and 35 U.S.C. 112(b), as presented above.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Vibha Bhat, whose telephone number is (571) 272-7091. The examiner can normally be reached Monday through Thursday from 8:00 AM to 5:00 PM EST and every other Friday from 8:00 AM to 4:00 PM EST.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. See MPEP § 713.01. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at https://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mariela Reyes, can be reached at (571) 270-1006. The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.

Information regarding the status of an application may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or (571) 272-1000.

/Vibha Bhat/
Examiner, Art Unit 2142

/Mariela Reyes/
Supervisory Patent Examiner, Art Unit 2142
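For readers mapping the claim elements to the cited references, the following is a hypothetical end-to-end sketch of the claimed learning method as the rejection characterizes it: a student initialized from the teacher's weights (per Cui), noise added to the teacher's weights (per Chakraborty), and a weighted combination of losses (per Hinton). All names, the noise scale, the temperature, and the coefficient values are illustrative assumptions, not the applicant's implementation or any reference's code; the second loss reuses the Claim 5 sketch above.

```python
# Hypothetical sketch of the claimed learning method as the rejection
# characterizes it. Function names, noise scale, temperature, and
# coefficients are all illustrative assumptions.
import copy
import torch
import torch.nn.functional as F

def robust_distillation_loss(first_net: torch.nn.Module,
                             x: torch.Tensor,
                             true_value: torch.Tensor,
                             noise_std: float = 0.01,
                             alpha: float = 0.7,
                             T: float = 4.0) -> torch.Tensor:
    # "preparing a second neural network having the same weights as a
    # first neural network which is pre-trained"; identical architecture
    # per Claim 7 (self-distillation)
    second_net = copy.deepcopy(first_net)

    # "adding noise to weights of the first neural network"
    with torch.no_grad():
        for p in first_net.parameters():
            p.add_(noise_std * torch.randn_like(p))

    # "generating a first output data ... and generating a second output
    # data ... by providing input data" to both networks
    y1 = first_net(x)
    y2 = second_net(x)

    # First and third losses: cross-entropies against the true value (Claim 3)
    l1 = F.cross_entropy(y1, true_value)
    l3 = F.cross_entropy(y2, true_value)

    # Second loss: temperature-scaled KL divergence between distributions
    # generated from the two outputs (Claims 4-5)
    l2 = F.kl_div(F.log_softmax(y2 / T, dim=-1),
                  F.softmax(y1 / T, dim=-1),
                  reduction="batchmean") * (T ** 2)

    # Linear combination; the coefficients on the first and second losses
    # sum to 1 (Claim 6), and the coefficient on the third is assumed 1
    return alpha * l1 + (1.0 - alpha) * l2 + l3
```

On the Claim 4 equivalence the rejection invokes: for a fixed teacher distribution p1, the cross-entropy satisfies H(p1, p2) = H(p1) + KL(p1 || p2), so minimizing the cross-entropy with respect to the student is the same as minimizing the KL divergence term.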

Prosecution Timeline

Mar 24, 2023
Application Filed
Jan 21, 2026
Non-Final Rejection — §101, §103, §112
Mar 20, 2026
Response Filed

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: Favorable
Median Time to Grant: 3y 6m
PTA Risk: Low

Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
