DETAILED ACTION
Claims 1-2, 4-5, 7-9, 11-12, 14-16, 18 and 20 are pending.
Claims 1, 8 and 15 are independent.
Claims 1-2, 4, 8-9, 11, 14-16, 18 and 20 are amended.
Claims 3, 6, 10, 13, 17 and 19 are cancelled.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 08/20/2025 have been fully considered but they are not persuasive.
In regard to Claim Rejections – 35 USC 101, see Applicant’s remarks pgs. 9-11, Applicant argues “…training the model based on the second learning rate and generating these training results do not correspond to any mental process, mathematical concept, etc., and also do not amount to mere instructions to apply the exception.” Examiner would like to point out that “training the model based on the second learning rate” falls under mere instructions to implement an abstract idea (see MPEP 2106.05(f)). Reciting that the training is based on the second learning rate simply states that the model is to be trained using the second learning rate that was determined. This does not show an improvement to the technology or add significantly more, as the model is simply being trained based on the selected learning rate. Examiner would also like to point out that “generating training results” does not amount to significantly more, as generating training results is mere data gathering and outputting (see MPEP 2106.05(g) – Insignificant Extra-Solution Activity). Further, the selecting of data recited in the claims is interpreted as a mental process, because a person can select data by making a judgment in the human mind. Therefore, the 35 USC 101 rejection is maintained.
In regard to Claim Rejections – 35 USC 103, see Applicant’s remarks pgs. 11-17, Applicant argues that the “point of interest” identified in Kim is not used in Equation 9 of Kim. Examiner would like to point out that, in determining the point of interest, the loss gradient is included in the determination of gamma(n) in Equation 8, and gamma(n) is then used in Equation 9 to select the learning rate. Applicant also argues that Thota does not disclose the midpoint recited for the second learning rate. Examiner would like to point out that the optimal learning rate selected in Thota would be halfway between 0 and 1, where it sits on the chart of Thota, Fig. 7, cited in the rejection of claim 1 below.
Specifically:
based on the derivative of the loss being greater than a predetermined derivative threshold: (Kim, Col. 13, paragraph 2, “Herein, if the learning rate is decreased each time when the accumulated number of iterations reaches one of the specific values, the learning of the neural network is quickly finished, and thus the learning may not be sufficiently performed. As such, the learning rate may be set to be decreased only when the loss gradient becomes equal to or greater than a predetermined value [based on the derivative of the loss being greater than a predetermined derivative threshold:], e.g., the minimum loss gradient.”)
Therefore, the 103 rejection is maintained.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-2, 4-5, 7-9, 11-12, 14-16, 18 and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Claim 1 is directed to a process.
Step 1: yes.
Step 2A, prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
selecting a learning rate for the model; (limitation is directed to a mental process).
determining a derivative of a loss for an objective function for the model with respect to the learning rate based on a result of the training; and (limitation is directed to a mental process; also describes a mathematical calculation and therefore falls under the Mathematical Concepts grouping of abstract ideas).
based on the derivative of the loss being greater than a predetermined derivative threshold: (limitation is directed to a mental process).
determining at least one point of interest based on the result of the training wherein determining the at least one point of interest comprises determining a first learning rate where a minimum loss is achieved and determining a second learning rate that achieves a loss approximately halfway between a loss for a learning rate of zero and a loss at the first learning rate; (limitation is directed to a mental process).
selecting a second learning rate based on the at least one point of interest; (limitation is directed to a mental process).
selecting an optimal learning rate based on the training results. (limitation is directed to a mental process).
Step 2A, prong 1: yes.
Step 2A, prong 2: Does the claim recite additional elements that integrate the judicial exception into a practical application?
training the model based on the learning rate; (e.g., “apply it” or mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)).
training the model based on the second learning rate; and (e.g., “apply it” or mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)).
Step 2A, Prong 2: no.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
training the model based on the learning rate; (e.g., “apply it” or mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)).
training the model based on the second learning rate; and (e.g., “apply it” or mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)).
Step 2B: no.
Regarding claim 2 and analogous claims 9 and 16,
Claim 2 incorporates the analysis of the method of claim 1.
repeating the determining the at least one point of interest and the selecting the second learning rate until a difference between the second learning rate and a learning rate closest to the second learning rate is less than a minimum difference threshold. (Step 2A prong 1: limitation is directed to a mental process).
Regarding claim 4 and analogous claims 11 and 18,
Claim 4 incorporates the analysis of the method of claim 1.
wherein determining at least one point of interest further comprises determining a third learning rate with a minimum curvature less than the second learning rate, and (Step 2A prong 1: limitation is directed to a mental process).
wherein the method further comprises selecting a fourth learning rate that is approximately halfway between the third learning rate and the first learning rate and training the model based on the fourth learning rate. (Step 2A prong 1: limitation is directed to a mental process).
Regarding claim 5 and analogous claim 12,
Claim 5 incorporates the analysis of the method of claim 4.
wherein the fourth learning rate is approximately halfway between the third learning rate and the first learning rate on a logarithmic scale. (Step 2A prong 1: limitation is directed to a mental process).
Regarding claim 7,
Claim 7 incorporates the analysis of the method of claim 1.
selecting an updated learning rate, (Step 2A prong 1: limitation is directed to a mental process).
training the model based on the updated learning rate, and (Step 2A prong 2/Step 2B: e.g., “apply it” or mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)).
determining the derivative of the loss for the objective function for the model based on the updated learning rate based on a result of the training, until the derivative of the loss is greater than the predetermined derivative threshold. (Step 2A prong 1: limitation is directed to a mental process; also describes a mathematical calculation and therefore falls under the Mathematical Concepts grouping of abstract ideas).
Regarding claim 14 and analogous claim 20,
Claim 14 incorporates the analysis of the system of claim 8.
wherein the instructions, when executed, further cause the processor to determine the at least one point of interest by determining a fifth learning rate where the derivative of the loss of the initial learning rate reaches a minimum, and (Step 2A prong 1: limitation is directed to a mental process; also describes a mathematical calculation and therefore falls under the Mathematical Concepts grouping of abstract ideas).
wherein the instructions, when executed, further cause the processor to train the model based on the fifth learning rate. (Step 2A prong 2/Step 2B: e.g., “apply it” or mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)).
Regarding claim 8,
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Claim 8 is directed to a machine.
Step 1: yes.
The rest of the analysis for claim 8 is analogous to that of claim 1.
Regarding claim 15,
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Claim 15 is directed to a manufacture.
Step 1: yes.
The rest of the analysis for claim 15 is analogous to that of claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 4, 7-9, 11, 14-16, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US Patent No. 10528867, published Jan. 7, 2020, “Kim”), in view of Qiao et al. (US Published Patent Application No. 20190332933, published Oct. 31, 2019, “Qiao”) and Thota et al. (OPTIMUM LEARNING RATE FOR CLASSIFICATION PROBLEM WITH MLP IN DATA MINING, “Thota”).
In regard to claim 1, Kim teaches determining a derivative of a loss for an objective function for the model with respect to the learning rate based on a result of the training; and (Kim, Col. 3, paragraph 2, “As one example, at the step of (b), the learning device changes the k-th gamma to the (k+1)-th gamma by referring to a k-th loss gradient [a derivative of a loss] that is generated by referring to (i) the k-th losses and (ii) (k-1)-th losses of the neural network which are obtained by a (k-1)-th learning process of repeating the learning of the neural network by using a part of the training data while the accumulated number of iterations is greater than a (k-2)-th specific value and is equal to or less than the (k-1)-th specific value.” And Col. 14, paragraph 8, “(i) has performed a first learning process [a result of the training] of repeating the learning of the neural network at a predetermined first learning rate by using a part of training data while the accumulated number of iterations is equal to or less than the first specific value and (ii) has changed the first gamma to a second gamma by referring to first losses of the neural network which are obtained by the first learning process [an objective function for the model with respect to the learning rate];”)
based on the derivative of the loss being greater than a predetermined derivative threshold: (Kim, Col. 13, paragraph 2, “Herein, if the learning rate is decreased each time when the accumulated number of iterations reaches one of the specific values, the learning of the neural network is quickly finished, and thus the learning may not be sufficiently performed. As such, the learning rate may be set to be decreased only when the loss gradient becomes equal to or greater than a predetermined value [based on the derivative of the loss being greater than a predetermined derivative threshold:], e.g., the minimum loss gradient.”)
determining at least one point of interest based on the result of the training, (Kim, Col. 13, paragraph 6, “Herein, for example, the base learning rate (lr) which is an initial learning rate may be set to 0.0, the step which is as an iteration unit for changing the learning rate may be set to 10,000 [one point of interest based on the result of the training], and the gamma which is a constant value for adjusting the learning rate change rate may be set to 0.9, as initial constant values. Herein, the step is set to a value smaller than 100,000 used in FIG. 1A, and the gamma is set to a value greater than 0.1 used in FIG. 1A, in order to frequently reduce the learning rate little by little according to the loss(es).”)
wherein determining at least one point of interest comprises determining a first learning rate where a minimum loss is achieved (Kim, Col. 13, paragraph 2, “As such, the learning rate may be set to be decreased only when the loss gradient becomes equal to or greater than a predetermined value, e.g., the minimum loss gradient.” And paragraph 3, “Further, the learning device 100 determines the (k+1)-th learning rate as a result of multiplying the k-th learning rate by the (k+1)-th gamma by using Equation 9 below.”)
selecting the second learning rate based on the at least one point of interest; (Kim, Col. 13, paragraph 6, “Herein, for example, the base learning rate (lr) which is an initial learning rate may be set to 0.0, the step which is as an iteration unit for changing the learning rate may be set to 10,000, and the gamma which is a constant value for adjusting the learning rate change rate may be set to 0.9 [selecting the second learning rate based on the at least one point of interest], as initial constant values. Herein, the step is set to a value smaller than 100,000 used in FIG. 1A, and the gamma is set to a value greater than 0.1 used in FIG. 1A, in order to frequently reduce the learning rate little by little according to the loss(es).”)
training the model based on the second learning rate; and (Kim, Col. 13, paragraph 4, “Thereafter, the learning device 100 performs a (k+1)-th learning process of repeating the learning of the neural network [training the model based on the second learning rate] at the (k+1)-th learning rate by using a part of the training data while the accumulated number of iterations is greater than the k-th specific value and is equal to or less than a (k+1)-th specific value.”)
selecting an optimal learning rate based on the training results. (Kim, Col. 13, paragraph 8, “In addition, as shown in FIG. 4B, since the learning rate is changed according to the loss, the loss is continuously reduced unlike the conventional method of FIG. 1B in which the loss is changed according to the change of the learning rate, and thus the optimal learning rate can be obtained by only one learning procedure.”)
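For reference only, the learning-rate update quoted above from Kim (Col. 13, paragraph 3, Equation 9), in which the (k+1)-th learning rate is the k-th learning rate multiplied by the (k+1)-th gamma, may be written as shown below. The symbols are shorthand adopted here for illustration and are not necessarily Kim's notation; the value lr_k = 0.01 is hypothetical, while the gamma of 0.9 is the initial constant value quoted from Kim, Col. 13, paragraph 6.

```latex
\mathrm{lr}_{k+1} = \mathrm{lr}_{k} \cdot \gamma_{k+1},
\qquad \text{e.g., } \mathrm{lr}_{k} = 0.01,\ \gamma_{k+1} = 0.9
\;\Rightarrow\; \mathrm{lr}_{k+1} = 0.01 \times 0.9 = 0.009
```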
However, Kim does not explicitly teach selecting a learning rate for the model;
training the model based on the learning rate;
Qiao teaches selecting a learning rate for the model; (Qiao, paragraph 0004, “An initial value for a constant learning rate (α) and an initial value for a dynamic learning rate (~) are set.”)
training the model based on the learning rate; (Qiao, paragraph 0014, “The initial value of the constant learning parameter (α) to be used to train the new neural network 105 is then set to the constant learning parameter (α) used to train the existing neural network 104 (304).”)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Kim and Qiao before them, to include Qiao’s model optimization system in Kim’s system of adaptive learning rates. One would have been motivated to make such a combination in order to improve the accuracy of learning. (Qiao, paragraph 0009, “The accuracy may be improved by training the neural network using training datasets.”)
However, Kim and Qiao do not explicitly teach determining a second learning rate that achieves a loss approximately halfway between a loss for a learning rate of zero and a loss at the first learning rate;
Thota teaches determining a second learning rate that achieves a loss approximately halfway between a loss for a learning rate of zero and a loss at the first learning rate; (Thota, Fig. 7 [Image: Thota, Fig. 7]. Examiner would like to point out that the optimal learning rate is the one selected as the halfway-between point.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Kim, Qiao and Thota before them, to include Thota’s optimum learning rate system in Kim and Qiao’s system of adaptive learning rates. One would have been motivated to make such a combination in order to find an optimal learning rate that is stable. (Thota, abstract, “In this paper, we try to find an optimum learning rate which is stable and takes less time for convergence.”)
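For illustration only, and without limiting the claim interpretation set forth above, the two points of interest recited in claim 1 can be sketched as follows. The sketch assumes a hypothetical learning-rate sweep, sorted in ascending order and beginning at a learning rate of zero, with one loss value recorded per swept rate. The function name `points_of_interest` is hypothetical, and “approximately halfway” is assumed to mean the swept rate (at or below the first learning rate) whose loss is nearest the midpoint between the loss at zero and the minimum loss.

```python
from typing import Sequence, Tuple


def points_of_interest(lrs: Sequence[float],
                       losses: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical illustration of the claim 1 points of interest.

    Assumes `lrs` is sorted in ascending order, starts at 0.0, and
    `losses[i]` is the loss observed after training at `lrs[i]`.
    Returns (first_lr, second_lr).
    """
    if not lrs or len(lrs) != len(losses):
        raise ValueError("lrs and losses must be non-empty and of equal length")
    if lrs[0] != 0.0:
        raise ValueError("the sweep is assumed to include a learning rate of zero")

    # First learning rate: the swept rate at which the minimum loss is achieved.
    i_min = min(range(len(losses)), key=lambda i: losses[i])
    first_lr = lrs[i_min]

    # Target: the loss halfway between the loss at zero and the minimum loss.
    target = (losses[0] + losses[i_min]) / 2.0

    # Second learning rate: the swept rate at or below first_lr whose loss
    # is nearest the halfway target.
    i_half = min(range(i_min + 1), key=lambda i: abs(losses[i] - target))
    return first_lr, lrs[i_half]
```

For example, with hypothetical losses of 2.0, 1.8, 1.4, 1.0 and 3.0 at rates 0, 0.001, 0.01, 0.1 and 1.0, the sketch returns a first learning rate of 0.1 and a second learning rate of 0.01.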
In regard to claim 8, the claim recites similar limitations as corresponding claim 1, and is rejected for similar reasons as claim 1 using similar teachings and rationale.
Qiao further teaches A system for training a model, the system comprising: a processor; and a memory storing instructions that, when executed, cause the processor to: (Qiao, paragraph 0023, “These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.”)
In regard to claim 15, the claim recites similar limitations as corresponding claim 1, and is rejected for similar reasons as claim 1 using similar teachings and rationale.
Qiao further teaches A non-transitory computer-readable storage medium comprising instructions that, when executed, cause at least one processor to: (Qiao, paragraph 0023, “These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.”)
In regard to claim 2 and analogous claims 9 and 16, Kim, Qiao and Thota teach the method of claim 1.
Kim further teaches repeating the determining the at least one point of interest and the selecting the second learning rate until a difference between the second learning rate and a learning rate closest to the second learning rate is less than a minimum difference threshold. (Kim, Col. 14, paragraph 8, “(a) a learning device, (i) has performed a first learning process of repeating the learning of the neural network at a predetermined first learning rate by using a part of training data while the accumulated number of iterations is equal to or less than the first specific value and (ii) has changed the first gamma to a second gamma by referring to first losses of the neural network which are obtained by the first learning process; and (b) the learning device, while increasing k from 2 to (n-1), (b1) has performed a k-th learning process of repeating the learning of the neural network at a k-th learning rate [selecting the second learning rate] by using a part of the training data while the accumulated number of iterations is greater than a (k-1)-th specific value and is equal to or less than a k-th specific value [a minimum difference threshold], (b2) (i) has changed a k-th gamma to a (k+1)-th gamma by referring to k-th losses of the neural network which are obtained by the k-th learning process and (ii) has changed a k-th learning rate to a (k+1)-th learning rate by using the (k+1)-th gamma,”)
In regard to claim 4 and analogous claims 11 and 18, Kim, Qiao and Thota teach the method of claim 1.
Thota further teaches wherein determining at least one point of interest further comprises determining a third learning rate with a minimum curvature less than the second learning rate, and (Thota, Fig. 7 [Image: Thota, Fig. 7]. Examiner would like to point out that the optimal learning rate is the one selected at minimum curvature, as the line starts to go up after it.)
wherein the method further comprises selecting a fourth learning rate that is approximately halfway between the third learning rate and the first learning rate. (Thota, Fig. 7 [Image: Thota, Fig. 7]. Examiner would like to point out that the optimal learning rate is the one selected.)
Kim, Qiao and Thota are combinable for the same rationale as set forth above with respect to claim 1.
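For illustration only, the third and fourth learning rates recited in claim 4 can be sketched in the same manner. The sketch assumes evenly spaced sweep samples (for example, a log-spaced sweep), approximates curvature by the discrete second difference of the loss, treats “minimum curvature” as the most negative second difference among samples below the second learning rate, and takes “approximately halfway” as the arithmetic midpoint. The function name `third_and_fourth_lr` is hypothetical.

```python
from typing import Sequence, Tuple


def third_and_fourth_lr(lrs: Sequence[float], losses: Sequence[float],
                        first_lr: float, second_lr: float) -> Tuple[float, float]:
    """Hypothetical illustration of the claim 4 learning rates.

    Assumes evenly spaced sweep samples (e.g., a log-spaced sweep), so the
    curvature at sample i is approximated by the discrete second difference
    losses[i - 1] - 2 * losses[i] + losses[i + 1].
    Returns (third_lr, fourth_lr).
    """
    if len(lrs) != len(losses):
        raise ValueError("lrs and losses must be of equal length")

    # Interior samples (both neighbours exist) below the second learning rate.
    candidates = [i for i in range(1, len(lrs) - 1) if lrs[i] < second_lr]
    if not candidates:
        raise ValueError("no interior sample below the second learning rate")

    # Third learning rate: the candidate with the minimum (most negative) curvature.
    i_third = min(candidates,
                  key=lambda i: losses[i - 1] - 2.0 * losses[i] + losses[i + 1])
    third_lr = lrs[i_third]

    # Fourth learning rate: arithmetic midpoint of the third and first rates.
    fourth_lr = (third_lr + first_lr) / 2.0
    return third_lr, fourth_lr
```

For example, with hypothetical losses of 2.0, 2.0, 1.9, 1.4, 1.0 and 3.0 at rates 0, 0.0001, 0.001, 0.01, 0.1 and 1.0, a first learning rate of 0.1 and a second learning rate of 0.01, the sketch selects 0.001 as the third learning rate and approximately 0.0505 as the fourth.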
In regard to claim 7, Kim, Qiao and Thota teach the method of claim 1.
Kim further teaches selecting an updated learning rate, training the model based on the updated learning rate, and determining the derivative of the loss for the objective function for the model based on the updated learning rate based on a result of the training, until the derivative of the loss is greater than the predetermined derivative threshold. (Kim, Col. 13, “Herein, for example, the base learning rate (lr) which is an initial learning rate may be set to 0.0, the step which is as an iteration unit for changing the learning rate may be set to 10,000, and the gamma which is a constant value for adjusting the learning rate change rate may be set to 0.9 [selecting an updated learning rate], as initial constant values. Herein, the step is set to a value smaller than 100,000 used in FIG. 1A, and the gamma is set to a value greater than 0.1 used in FIG. 1A, in order to frequently reduce the learning rate little by little according to the loss(es) [determining the derivative of the loss for the objective function for the model based on the updated learning rate based on a result of the training,].” And “Thereafter, the learning device 100 performs a (k+1)-th learning process of repeating the learning of the neural network [training the model based on the updated learning rate] at the (k+1)-th learning rate by using a part of the training data while the accumulated number of iterations is greater than the k-th specific value and is equal to or less than a (k+1)-th specific value.”)
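For illustration only, the loop recited in claim 7 can be sketched as shown below. The callable `train_and_eval`, the multiplicative `growth` factor used to select each updated learning rate, and the `max_steps` safeguard are all hypothetical, and the derivative of the loss with respect to the learning rate is assumed to be estimated by a finite difference between consecutive rates.

```python
from typing import Callable, List, Tuple


def sweep_until_derivative_exceeds(train_and_eval: Callable[[float], float],
                                   initial_lr: float,
                                   growth: float,
                                   threshold: float,
                                   max_steps: int = 100) -> List[Tuple[float, float]]:
    """Hypothetical sketch of the claim 7 loop.

    `train_and_eval(lr)` stands in for training the model at `lr` and
    returning the resulting loss. Returns the (lr, loss) history.
    """
    if initial_lr <= 0.0 or growth <= 1.0:
        raise ValueError("initial_lr must be positive and growth must exceed 1")

    history = [(initial_lr, train_and_eval(initial_lr))]
    for _ in range(max_steps):
        prev_lr, prev_loss = history[-1]
        lr = prev_lr * growth           # select an updated learning rate
        loss = train_and_eval(lr)       # train the model at the updated rate
        history.append((lr, loss))
        # Finite-difference estimate of d(loss)/d(lr) between the last two rates.
        derivative = (loss - prev_loss) / (lr - prev_lr)
        if derivative > threshold:      # stop once the threshold is exceeded
            break
    return history
```

With the hypothetical loss curve `(lr - 0.1) ** 2 + 1.0`, an initial rate of 0.001, a growth factor of 10 and a threshold of 0, the loop stops after training at a rate of 1.0, the first rate at which the estimated derivative becomes positive.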
In regard to claim 14 and analogous claim 20, Kim, Qiao and Thota teach the system of claim 8.
Qiao further teaches wherein the instructions, when executed, further cause the processor to determine the at least one point of interest by determining a fifth learning rate where the derivative of the loss of the initial learning rate reaches a minimum, and wherein the instructions, when executed, further cause the processor to train the model based on the fifth learning rate. (Qiao, Fig. 3, paragraph 0010, “The derivative of J(θ) is the gradient. With each iteration in the training of the neural network [train the model based on the fifth learning rate], the value of the weight (θ) is adjusted according to the product of the learning rate (α) and the derivative of the error function (J(θ)). The learning rate (α) [determining a fifth learning rate] is a predetermined constant which sets the step size of each adjustment of the weight (θ) between iterations. The error function (J(θ)) is then calculated for a batch of the training dataset and used in the next iteration. This process is repeated until a global minimum of the error function (J(θ)) is reached [derivative of the loss of the initial learning rate reaches a minimum].”)
Kim, Qiao and Thota are combinable for the same rationale as set forth above with respect to claim 1.
Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Kim, in view of Qiao and Thota, and in further view of Surmenok (Estimating an Optimal Learning Rate For a Deep Neural Network) (2019).
In regard to claim 5 and analogous claim 12, Kim, Qiao and Thota teach the method of claim 4.
However, Kim, Qiao and Thota do not explicitly teach wherein the fourth learning rate is approximately halfway between the third learning rate and the first learning rate on a logarithmic scale.
Surmenok teaches wherein the fourth learning rate is approximately halfway between the third learning rate and the first learning rate on a logarithmic scale. (Surmenok, pg. 3, Fig. 2 [Image: Surmenok, Fig. 2]. Examiner would like to point out that the learning rate is being looked at on a log scale.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Kim, Qiao, Thota and Surmenok before them, to include Surmenok’s learning rate estimation technique in the combined adaptive learning rate system of Kim, Qiao and Thota. One would have been motivated to make such a combination in order to select a learning rate simply and powerfully. (Surmenok, pg. 1, paragraph 2, “In this post, I’m describing a simple and powerful way to find a reasonable learning rate…”)
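For reference only, a fourth learning rate that is halfway between the third and first learning rates on a logarithmic scale, as recited in claim 5, corresponds to the geometric mean of those two rates, as shown below. The symbols and the numeric values (a third learning rate of 10^-3 and a first learning rate of 10^-1) are hypothetical.

```latex
\log_{10}\mathrm{lr}_{4} = \tfrac{1}{2}\left(\log_{10}\mathrm{lr}_{3} + \log_{10}\mathrm{lr}_{1}\right)
\;\Longleftrightarrow\;
\mathrm{lr}_{4} = \sqrt{\mathrm{lr}_{3}\,\mathrm{lr}_{1}},
\qquad \text{e.g., } \sqrt{10^{-3}\cdot 10^{-1}} = 10^{-2}
```

By comparison, the arithmetic midpoint of the same two rates is 0.0505, roughly five times larger.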
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Trinh et al. (Learning Longer-term Dependencies in RNNs with Auxiliary Losses, “Trinh”) teaches about loss with learning rates.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SKYLAR K VANWORMER whose telephone number is (703)756-1571. The examiner can normally be reached M-F 6:00am to 3:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Jung can be reached at (571) 270-3779. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.K.V./ Examiner, Art Unit 2146
/USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146