DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 5, 7, and 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 5 and analogous claim 14 are rejected under 35 U.S.C. 112(b) as indefinite because the metes and bounds of the limitation “wherein the additional term is added to the loss function until the local minimum is filled” are not reasonably certain. In particular, the phrase “local minimum is filled” is a term of degree or condition without an objective boundary in the claim language. The claim does not explain what it means for a local minimum to be “filled,” what measurable condition indicates that the minimum has been filled, or when a person of ordinary skill in the art would know that the recited state has been reached. Absent an objective standard in the claim language, it is unclear when the step of adding the additional term must stop and, correspondingly, what subject matter is encompassed by the claim. Accordingly, claim 5 and claim 14 fail to particularly point out and distinctly claim the subject matter regarded as the invention.
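For illustration only, and without adopting any particular construction, the following Python sketch shows two plausible and materially different stopping conditions that could each be read onto “until the local minimum is filled”; all function and variable names below are hypothetical.

    import numpy as np

    # Reading A: the basin is "filled" when the modified loss at the minimizer
    # has been raised to the height of the surrounding barrier (its rim).
    def filled_to_rim(modified_loss, x_min, rim_height):
        return modified_loss(x_min) >= rim_height

    # Reading B: the basin is "filled" when descent restarted at the minimizer
    # escapes it, i.e., the minimum no longer attracts the search.
    def filled_until_escape(modified_grad, x_min, step=1e-2, iters=200, radius=1.0):
        x = np.array(x_min, dtype=float)
        for _ in range(iters):
            x = x - step * modified_grad(x)
        return np.linalg.norm(x - x_min) > radius

Because the claim language does not select among such conditions, the point at which the adding step must stop cannot be determined with reasonable certainty.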
Claim 7 is rejected under 35 U.S.C. 112(b) as indefinite because the scope of “further comprising reconstructing an original landscape of the loss function by accessing the stored additional term and subtracting the added additional term” is not reasonably certain. Claim 1 recites repeatedly adding an additional term to the loss function in response to finding a local minimum and continuing to find another local minimum until a criterion is met, which reasonably suggests that multiple additional terms may be added over the course of the method. However, claim 7 refers in the singular to “the stored additional term” and “the added additional term,” rendering it unclear whether the claim requires accessing and subtracting only one term, the most recently added term, or all of the terms added during the iterative process. Further, the phrase “reconstructing an original landscape of the loss function” is ambiguous because the claim does not make clear whether the “original landscape” refers to the landscape before any additional term was added, before a particular additional term was added, or some other intermediate version of the loss function. As such, the claim does not set forth the scope of the reconstruction step with reasonable certainty and therefore fails to particularly point out and distinctly claim the invention.
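By way of illustration only (hypothetical names; no construction adopted), the two readings yield different reconstructions:

    # Reading 1: subtract only the single (e.g., most recently) stored term.
    def reconstruct_last(modified_loss, stored_terms):
        return lambda x: modified_loss(x) - stored_terms[-1](x)

    # Reading 2: subtract every term added during the iterative process,
    # recovering the landscape that existed before any term was added.
    def reconstruct_all(modified_loss, stored_terms):
        return lambda x: modified_loss(x) - sum(t(x) for t in stored_terms)

The claim language does not make reasonably certain which reconstruction is required.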
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because they are directed to an abstract idea without significantly more.
Regarding claim 1, and analogous claims 10 and 18:
Step 1: Is the claim directed to one of the four statutory categories?
Yes, the claim is directed to a method.
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes. The limitation “responsive to finding a local minimum, adding an additional term to the loss function and continuing to find another local minimum until a criterion is met” recites a mental process of evaluation under MPEP 2106.04(a)(2)(III), and the limitations “searching for a minimum value of a loss function,” “identifying a global minimum having a lowest minimum value among the found local minima,” and “updating the machine learning model with parameters identified at the global minimum” recite mathematical concepts under MPEP 2106.04(a)(2)(I).
Step 2A, prong 2: Do the additional elements integrate the judicial exception into a practical application?
No. The limitations “receiving a data set for training a machine learning model to perform a recognition task” and “performing an optimization during training of the machine learning model, wherein the optimization comprises at least” amount to mere data gathering, an insignificant extra-solution activity under MPEP 2106.05(g).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitations “receiving a data set for training a machine learning model to perform a recognition task” and “performing an optimization during training of the machine learning model, wherein the optimization comprises at least” are the well-understood, routine, and conventional activity of “receiving or transmitting data over a network” recognized under MPEP 2106.05(d).
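For context, the combination of limitations identified above can be carried out entirely as mathematical operations, as the following sketch shows (illustration only; the concrete term construction, criterion, and all names are hypothetical and are not drawn from the instant claims):

    import numpy as np

    def optimize_by_filling(loss, grad, x0, make_term, max_minima=10,
                            step=1e-2, iters=500):
        # make_term(x_min) returns (term, term_grad) for the additional term.
        terms, minima = [], []
        x = np.array(x0, dtype=float)
        def modified_grad(y):
            return grad(y) + sum(tg(y) for _, tg in terms)
        while len(minima) < max_minima:              # "until a criterion is met"
            for _ in range(iters):                   # "searching for a minimum value"
                x = x - step * modified_grad(x)      # descend on the modified loss
            minima.append((loss(x), x.copy()))       # a local minimum was found
            terms.append(make_term(x.copy()))        # "adding an additional term"
        _, best_params = min(minima, key=lambda m: m[0])  # "identifying a global minimum"
        return best_params                           # parameters for the model update

Each step operates only on numbers and functions; nothing in the loop requires more than the mathematical concepts and mental evaluations identified above.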
Regarding claim 2:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes. The claim depends from claim 1 and incorporates the abstract idea identified above.
Step 2A, prong 2: Do the additional elements integrate the judicial exception into a practical application?
No. The limitation “wherein the machine learning model includes a deep neural network and the optimization includes a descent-based optimization” merely limits the abstract idea to a particular field of use under MPEP 2106.05(h).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. For the same reason, the limitation “wherein the machine learning model includes a deep neural network and the optimization includes a descent-based optimization” is a field-of-use limitation under MPEP 2106.05(h) and does not amount to significantly more.
Regarding claim 3:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes. The claim depends from claim 1 and incorporates the abstract idea identified above.
Step 2A, prong 2: Do the additional elements integrate the judicial exception into a practical application?
No. The limitation “wherein the additional term is a Gaussian bias centered around the local minimum” merely limits the abstract idea to a particular field of use under MPEP 2106.05(h).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. For the same reason, the limitation “wherein the additional term is a Gaussian bias centered around the local minimum” is a field-of-use limitation under MPEP 2106.05(h) and does not amount to significantly more.
Regarding claim 4:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes. The claim depends from claim 1 and incorporates the abstract idea identified above.
Step 2A, prong 2: Do the additional elements integrate the judicial exception into a practical application?
No. The limitation “wherein the criterion includes a threshold number of local minima” merely limits the abstract idea to a particular field of use under MPEP 2106.05(h).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. For the same reason, the limitation “wherein the criterion includes a threshold number of local minima” is a field-of-use limitation under MPEP 2106.05(h) and does not amount to significantly more.
Regarding claim 5:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes. The claim depends from claim 1 and incorporates the abstract idea identified above. Further, the limitation “wherein the additional term is added to the loss function until the local minimum is filled” recites a mathematical concept under MPEP 2106.04(a)(2)(I).
Regarding claim 6:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes. The claim depends from claim 1 and incorporates the abstract idea identified above.
Step 2A, prong 2: Do the additional elements integrate the judicial exception into a practical application?
No. The limitation “wherein the method further includes storing the additional term” amounts to mere instructions to apply the exception under MPEP 2106.05(f).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitation “wherein the method further includes storing the additional term” is the well-understood, routine, and conventional activity of “storing and retrieving information in memory” under MPEP 2106.05(d).
Regarding claim 7:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes. The claim depends from claim 1 and incorporates the abstract idea identified above. The limitation “further comprising reconstructing an original landscape of the loss function by accessing the stored additional term and subtracting the added additional term” recites a mathematical concept under MPEP 2106.04(a)(2)(I).
Regarding claim 8:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes. The claim depends from claim 1 and incorporates the abstract idea identified above.
Step 2A, prong 2: Do the additional elements integrate the judicial exception into a practical application?
No. The limitation “wherein multiple instances of the optimization are performed in parallel at different initialization points of a loss surface of the loss function” merely limits the abstract idea to a particular field of use under MPEP 2106.05(h).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. For the same reason, the limitation “wherein multiple instances of the optimization are performed in parallel at different initialization points of a loss surface of the loss function” is a field-of-use limitation under MPEP 2106.05(h) and does not amount to significantly more.
Regarding claim 9:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes. The claim depends from claim 1 and incorporates the abstract idea identified above.
Step 2A, prong 2: Do the additional elements integrate the judicial exception into a practical application?
No. The limitation “further comprising using the updated machine learning model in performing a recognition task” amounts to mere instructions to apply the exception under MPEP 2106.05(f).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. For the same reason, the limitation “further comprising using the updated machine learning model in performing a recognition task” amounts to mere instructions to apply the exception under MPEP 2106.05(f) and does not amount to significantly more.
Claim 10 is rejected with the same rationale as claim 1.
Claim 11 is rejected with the same rationale as claim 2.
Claim 12 is rejected with the same rationale as claim 3.
Claim 13 is rejected with the same rationale as claim 4.
Claim 14 is rejected with the same rationale as claim 5.
Claim 15 is rejected with the same rationale as claim 6.
Claim 16 is rejected with the same rationale as claim 8.
Claim 17 is rejected with the same rationale as claim 9.
Claim 18 is rejected with the same rationale as claim 1.
Claim 19 is rejected with the same rationale as claim 2.
Claim 20 is rejected with the same rationale as claim 3.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2020/0125930 (Martin et al.; hereinafter “Martin”) in view of U.S. Patent Application Publication No. 2003/0220772 (Chiang et al.; hereinafter “Chiang”).
Regarding claim 1, and analogous claims 10 and 18:
Martin teaches:
1. receiving a data set for training a machine learning model to perform a recognition task;
(Martin, ¶0051)
“A new task was created by randomly generating a permutation mask and applying it to each of the digits in the dataset [i.e. receiving a data set for training a machine learning model].”
(Martin, ¶0051)
“The artificial neural networks of the present disclosure, and the methods of retraining the artificial neural networks according to various embodiments of the present disclosure, were also tested with a variant of the MNIST optical character recognition problem [i.e. to perform a recognition task].”
2. performing an optimization during training of the machine learning model, wherein the optimization comprises at least:
(Martin, ¶0051)
“The parameter λ in Equation 2 above was set to 10.0. It was determined that smaller values of λ resulted in the complexity term dominating the fitness, which resulted in a fairly simple fitness landscape with the global optimum being achieved by adding only 1 to 3 neurons at any layer [i.e. performing an optimization during training of the machine learning model, wherein the optimization comprises at least:]”
3. searching for a minimum value of a loss function;
(Martin, ¶0051)
“Setting λ=10.0 provided a better balance between accuracy and complexity, and consequently, a more challenging optimization problem with many good, but suboptimal, local minima [i.e. searching for a minimum value of a loss function;].”
4. and updating the machine learning model with parameters identified at the global minimum.
(Martin, ¶0051)
“In this setting, the global optimum is achieved by adding 17 new neurons to the first hidden layer and no new neurons to the second and third hidden layers. However, good, but suboptimal, local minima can be achieved by adding new neurons to only the second or third hidden layers [i.e. and updating the machine learning model with parameters identified at the global minimum].”
Martin does not explicitly teach:
1. responsive to finding a local minimum, adding an additional term to the loss function and continuing to find another local minimum until a criterion is met;
2. and identifying a global minimum having a lowest minimum value among the found local minima;
Chiang teaches:
1. responsive to finding a local minimum, adding an additional term to the loss function and continuing to find another local minimum until a criterion is met;
(Chiang, ¶0478)
“Step 2. Improvement: consider the local improvement set {x ∈ FS ∩ N(x^k): C(x) < C(x^k)}, where N(x^k) is a neighborhood around x^k. If the set is empty, stop, and x^k is a local minimum with respect to the neighborhood N(x^k); otherwise, choose a member of the set as x^(k+1), increase k = k+1, and repeat this step [i.e. responsive to finding a local minimum, adding an additional term to the loss function and continuing to find another local minimum until a criterion is met;].”
2. and identifying a global minimum having a lowest minimum value among the found local minima;
(Chiang, ¶0043)
“The function C(x) is a bounded below over the feasible region so that its global minimal (optimal) solution exists and the number of local minimal (optimal) solutions is finite [i.e. and identifying a global minimum having a lowest minimum value among the found local minima;].”
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Martin in view of Chiang. The motivation is to improve the system by algorithmically pursuing the most optimal outcome, or in other words, “In general, the solution space of an optimization problem has a finite (usually very large) or infinite number of feasible solutions. Among them, there is one and, only one, global optimal solution” (Chiang, ¶0003).
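The improvement step quoted from Chiang (¶0478) above may be sketched as follows (illustration only; the names are hypothetical, and the neighborhood and feasibility tests are assumed to be supplied):

    # Sketch of the quoted local improvement iteration (illustration only).
    def local_improvement(C, neighborhood, feasible, x_k):
        while True:
            improvement_set = [x for x in neighborhood(x_k)
                               if feasible(x) and C(x) < C(x_k)]
            if not improvement_set:
                return x_k            # x_k is a local minimum w.r.t. N(x_k)
            x_k = improvement_set[0]  # choose a member of the set as x_{k+1}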
Regarding claim 2, and analogous claims 11 and 19:
The combination of Martin and Chiang teaches the method of claim 1.
Martin teaches:
1. wherein the machine learning model includes a deep neural network and the optimization includes a descent-based optimization.
(Martin, ¶0011)
“In general, the artificial neural network can have any suitable number of hidden layers, and the methods of the present disclosure can add any suitable number of nodes to any of the hidden layers [i.e. wherein the machine learning model includes a deep neural network].”
(Martin, ¶0019)
“Training the artificial neural network on the data from the new task may include minimizing a loss function with stochastic gradient descent [i.e. and the optimization includes a descent-based optimization].”
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Martin in view of Chiang. The motivation is the same as that set forth for claim 1.
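For context on “minimizing a loss function with stochastic gradient descent,” a minimal descent-based update can be sketched as follows (illustration only; this is not code from Martin or from the instant application, and grad_fn is an assumed callable returning the mini-batch gradient):

    import numpy as np

    # Minimal stochastic gradient descent epoch (illustration only).
    def sgd_epoch(params, data, grad_fn, lr=0.01, batch_size=32):
        order = np.random.permutation(len(data))     # shuffle the training data
        for start in range(0, len(data), batch_size):
            minibatch = data[order[start:start + batch_size]]
            params = params - lr * grad_fn(params, minibatch)  # one descent step
        return params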
Regarding claim 3, and analogous claims 12 and 20:
The combination of Martin and Chiang teaches the method of claim 1.
Martin teaches:
1. wherein the additional term is a Gaussian bias centered around the local minimum.
(Martin, ¶0039)
“In one or more embodiments, the task 120 utilizes samples only from the probability distributions P_Z1(Z1|X1) and P_Z2(Z2|X1), and therefore the task 120 does not require closed-form expressions for the probability distributions, which may be, or may approximately be, Gaussian functions [i.e. wherein the additional term is a Gaussian bias centered around the local minimum].”
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Martin in view of Chiang. The motivation is the same as that set forth for claim 1.
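For illustration, one form such a Gaussian bias could take is sketched below (the height and width parameters are assumed; this is not an implementation drawn from Martin, Chiang, or the instant claims):

    import numpy as np

    # Hypothetical Gaussian bias centered at a found local minimum x_min.
    def gaussian_bias(x_min, height=1.0, width=0.5):
        def term(x):
            return height * np.exp(-np.sum((x - x_min) ** 2) / (2.0 * width ** 2))
        def term_grad(x):
            return term(x) * (-(x - x_min) / width ** 2)
        return term, term_grad

A pair of this form could serve as the make_term argument in the sketch accompanying the claim 1 analysis above.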
Regarding claim 4, and analogous claim 13:
The combination of Martin and Chiang teaches the method of claim 1.
Chiang teaches:
1. wherein the criterion includes a threshold number of local minima.
(Chiang, ¶0040)
“The function UC(x) is bounded below so that its global minimal (optimal) solution exists and the number of local minimal (optimal) solutions is finite [i.e. wherein the criterion includes a threshold number of local minima].”
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Martin in view of Chiang. The motivation is the same as that set forth for claim 1.
Regarding claim 5, and analogous claim 14:
The combination of Martin and Chiang teaches the method of claim 1.
Martin teaches:
1. wherein the additional term is added to the loss function until the local minimum is filled.
(Martin, ¶0045)
“In one or more embodiments, the task 160 of training the artificial neural network 200 includes minimizing the following loss function using stochastic gradient descent [i.e. wherein the additional term is added to the loss function until the local minimum is filled].”
For purposes of examination, Examiner interprets “until the local minimum is filled” as meaning until the local minimum has been discovered via the loss function.
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Martin in view of Chiang. The motivation is the same as that set forth for claim 1.
Regarding claim 6, and analogous claim 15:
The combination of Martin and Chiang teaches the method of claim 1.
Martin teaches:
1. wherein the method further includes storing the additional term.
(Martin, ¶0045)
“In one or more embodiments, the task 160 of training the artificial neural network 200 includes minimizing the following loss function using stochastic gradient descent [i.e. wherein the method further includes storing the additional term].”
Examiner notes that in order for the training to function properly, the additional term would have to be stored.
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Martin in view of Chiang. The motivation is the same as that set forth for claim 1.
Regarding claim 7:
The combination of Martin and Chiang teaches the method of claim 1.
Martin teaches:
1. further comprising reconstructing an original landscape of the loss function by accessing the stored additional term and subtracting the added additional term.
(Martin, ¶0045)
“The third term of the loss function (Equation 2) not only helps prevent catastrophic forgetting of old tasks, but also enables some drift in the hidden distributions, which promotes integration of information from old and new tasks, thus reducing the required size of the artificial neural network 200 (i.e., minimizing or at least reducing the number of nodes and connections) for a given performance level [i.e. further comprising reconstructing an original landscape of the loss function by accessing the stored additional term and subtracting the added additional term].”
Examiner interprets the integration of information from old and new tasks as the removal and addition of terms.
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Martin in view of Chiang. The motivation is the same as that set forth for claim 1.
Regarding claim 8, and analogous claim 16:
The combination of Martin and Chiang teaches the method of claim 1.
Chiang teaches:
1. wherein multiple instances of the optimization are performed in parallel at different initialization points of a loss surface of the loss function.
(Chiang, ¶0108)
“(iii). Repeat steps (i) and (ii) until the norm of the vector field (5-7) at said current exit point obtained in step (ii) is smaller than a threshold value, i.e., ‖F^U(x_ex)‖ ≤ ε. Then said point is declared as a minimal distance point (MDP), say x_d^0 [i.e. wherein multiple instances of the optimization are performed in parallel at different initialization points of a loss surface of the loss function].”
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Martin in view of Chiang. The motivation is the same as that set forth for claim 1.
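For context, running multiple optimization instances in parallel from different initialization points can be sketched as follows (illustration only; the optimize and loss callables and all names are assumed):

    from concurrent.futures import ThreadPoolExecutor

    # Run one optimization instance per initialization point, in parallel,
    # and keep the parameters achieving the lowest loss (illustration only).
    def parallel_multistart(optimize, loss, init_points, workers=4):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(optimize, init_points))
        return min(results, key=loss)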
Regarding claim 9, and analogous claim 17:
The combination of Martin and Chiang teaches the method of claim 1.
Martin teaches:
1. further comprising using the updated machine learning model in performing a recognition task.
(Martin, ¶0051)
“A new task was created by randomly generating a permutation mask and applying it to each of the digits in the dataset. The permutation mask was created by randomly selecting two non-intersecting sets of pixel indices, and then swapping the corresponding pixels in each image [i.e. further comprising using the updated machine learning model in performing a recognition task].”
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Martin in view of Chiang. The motivation is the same as that set forth for claim 1.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL JUSTIN BREENE whose telephone number is (571) 272-6320. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/P.J.B./ Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129