Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-6 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a mathematical relationship without significantly more. The claims recite the mathematical relationship of learning weights, determining weight similarity to create overlapping filter weights, initializing non-shared weights (e.g., setting them to 0), and ending training based on an error change rate. This judicial exception is not integrated into a practical application because the additional elements (a device, storage, computer-readable media, and various units) are generic computer components. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements merely link the mathematical relationship to the field of computing.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Mallya (PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning) in view of Wang et al. (Federated Learning with Matched Averaging).
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Mallya (PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning) in view of Wang et al. (Federated Learning with Matched Averaging), and further in view of Prechelt (Early Stopping — But When?).
Mallya teaches claims 1, 5 and 6. A machine learning device comprising:
a weight storage unit that stores weights of a plurality of filters used to detect a feature of a task; (Mallya fig. 1 and sec. 3 p. 3 “The initial weights of a filter are depicted in gray in Figure 1 (a). … we obtain a network with sparse filters and minimal reduction in performance on Task I. The surviving parameters of Task I, those in gray in Figure 1 (b), are hereafter kept fixed.”)
[Mallya Figure 1 (greyscale reproduction)]
a continual learning unit that trains the weights of the plurality of filters in response to an input task in continual learning; and (Mallya fig. 1 and sec. 3 p. 3 “The initial weights of a filter are depicted in gray in Figure 1 (a). … we obtain a network with sparse filters and minimal reduction in performance on Task I.” The training/re-training for task I-III is continual learning.)
a filter control unit that, after a predetermined epoch number has been learned (Learning happens for a number of epochs; the number of epochs itself is not learned.) in continual learning, compares the weight of a filter that has learned the task with the weight of a filter that is learning the task and extracts (Mallya p. 3 sec. 3 “pruning and re-training is about 1.5× longer than simple fine-tuning, as we generally re-train for half the training epochs…. The weights in a layer are sorted by their absolute magnitude, and the lowest 50% or 75% are selected for removal… By following the iterative training procedure, for a particular Task K, we obtain a filter that is the superposition of weights learned for that particular task and weights learned for all previous Tasks 1, · · · , K − 1.” The epoch number is “half the training epochs”.)
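For clarity of the record, the magnitude-pruning step Mallya describes (sort the weights in a layer by absolute magnitude, remove the lowest 50% or 75%, keep the survivors fixed for the current task) can be sketched as follows. This is an illustrative simplification; the function and variable names are the examiner's, not taken from PackNet's code:

```python
def packnet_prune(weights, free, prune_fraction=0.5):
    """Zero out the lowest-magnitude fraction of the still-free weights.

    `free` lists indices not yet claimed by earlier tasks; survivors of the
    current task are kept fixed hereafter, and pruned weights are released
    (re-initialized to 0) for future tasks.
    """
    free_idx = sorted(free, key=lambda i: abs(weights[i]))
    n_prune = int(len(free_idx) * prune_fraction)
    pruned = list(weights)
    released = free_idx[:n_prune]        # lowest-magnitude weights: removed
    for i in released:
        pruned[i] = 0.0                  # re-initialized for the next task
    survivors = free_idx[n_prune:]       # frozen for the current task
    return pruned, survivors, released

w = [0.9, -0.1, 0.4, -0.7, 0.05, 0.6, -0.3, 0.2]
pruned, kept, freed = packnet_prune(w, free=list(range(len(w))), prune_fraction=0.5)
print(sorted(kept))  # [0, 2, 3, 5]: the four largest-magnitude weights survive
```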
Mallya does not teach an overlap filter.
However, Wang teaches how an algorithm extracts overlap filters having a similarity in weight equal to or greater than a predetermined threshold value as shared filters shared by tasks. (Wang p. 3 sec. 2.1 “Due to data heterogeneity, local model j’ may have neurons not present in the global model built from other local models, therefore we want to avoid “poor” matches by saying that if the optimal match has cost larger than some threshold value ε, instead of matching we create a new global neuron from the corresponding local one.” The cost is the similarity in weights and is computed as a Euclidean distance. Id. A low cost means a high similarity. Low-cost neurons are overlapped; no new neuron is created for a low-cost match. That is, when the cost is below the threshold ε, the similarity is above a threshold and no new neuron is created, so the global/overlap neuron is maintained.)
Mallya, Wang and the claims all share filters. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to overlap similar filter neurons “to adapt to heterogeneity in the data…. [and] reduce[] the communications burden…” Wang sec. 1 p. 2.
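The thresholded matching step cited from Wang can be sketched as follows. This is an illustrative simplification using greedy nearest-neighbor matching; Wang's actual method solves a matched-averaging optimization rather than the greedy loop shown here, and all names are the examiner's:

```python
import math

def match_or_create(local_neuron, global_neurons, eps):
    """Match a local neuron to the nearest global neuron by Euclidean
    distance; if the best matching cost exceeds eps (a "poor" match),
    create a new global neuron instead of matching."""
    if global_neurons:
        j, best = min(
            enumerate(math.dist(g, local_neuron) for g in global_neurons),
            key=lambda t: t[1],
        )
        if best <= eps:
            return j  # low cost = high similarity: share the global neuron
    global_neurons.append(list(local_neuron))
    return len(global_neurons) - 1  # poor match: new global neuron created

globals_ = [[0.0, 1.0]]
print(match_or_create([0.1, 1.0], globals_, eps=0.5))   # 0 (matched, shared)
print(match_or_create([3.0, -2.0], globals_, eps=0.5))  # 1 (new neuron created)
```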
Mallya teaches claim 2. The machine learning device according to claim 1, wherein the filter control unit leaves one of the overlap filters as the shared filter and initializes the weights of filters other than the shared filter. (Mallya fig. 1, annotated below with red arrows, shows the initialization of non-shared weights.)
[Mallya Figure 1, annotated with red arrows indicating initialized non-shared weights (greyscale)]
Mallya teaches claim 3. The machine learning device according to claim 2, wherein the continual learning unit trains initialized weights of filters other than the shared filter in response to a further task in continual learning. (Mallya fig. 1 below shows weights that are not shared and then used for a new task in continual learning.)
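The retraining of released (non-shared) weights while weights fixed for earlier tasks remain untouched can be sketched as a masked update. This is an illustrative simplification, not Mallya's implementation; all names are the examiner's:

```python
def sgd_step(weights, grads, trainable, lr=0.1):
    """Apply one gradient step only to weights released for the new task;
    weights frozen for earlier tasks are left unchanged."""
    return [w - lr * g if i in trainable else w
            for i, (w, g) in enumerate(zip(weights, grads))]

w = [0.9, 0.0, 0.4, 0.0]  # zeros were released after pruning Task I
new_w = sgd_step(w, grads=[1.0, 1.0, 1.0, 1.0], trainable={1, 3})
print(new_w)  # [0.9, -0.1, 0.4, -0.1]: only the released weights moved
```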
[Mallya Figure 1, annotated to show non-shared weights reused for a new task (greyscale)]
Mallya teaches claim 4. The machine learning device according to claim 1, wherein the predetermined epoch number (The instant specification, paragraph 33, says that the predetermined epoch number is configured such that “(1) Loss is equal to or lower than a certain level (e.g., 0.75). (2) Accuracy is equal to or greater than a certain level (e.g., 0.75). (3) Both conditions (1) and (2) are met.” This is not a determination about an epoch number; this is a classical “stop when loss meets a criterion” or “stop when the loss change rate meets a criterion” test. The broadest reasonable interpretation is to stop training when the loss change rate hits a condition, or, equivalently, when the accuracy change rate hits a condition.) (Mallya p. 3 sec. 3 “pruning and re-training is about 1.5× longer than simple fine-tuning, as we generally re-train for half the training epochs…. The weights in a layer are sorted by their absolute magnitude, and the lowest 50% or 75% are selected for removal… By following the iterative training procedure, for a particular Task K, we obtain a filter that is the superposition of weights learned for that particular task and weights learned for all previous Tasks 1, · · · , K − 1.” The epoch number is “half the training epochs”.)
Mallya doesn’t teach error change rate.
However, Prechelt teaches epoch number is determined based on a condition related to a change rate in loss defined as an error between an output value from a learning model and a correct answer given by training data or to a change rate in accuracy defined as an accuracy rate of an output value from a learning model. (Prechelt p. 57 sec. 2.2.1 “stop when the generalization error increased in s successive strips.” This is error over s strips, which is an error change rate, and “increased” is the condition.)
Prechelt, Mallya and the claims are machine learning algorithms with training. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to end training on an error change rate increase because “such increases indicate the beginning of final overfitting…” Prechelt p. 57 sec. 2.2.1.
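The cited Prechelt criterion (stop when the generalization error increased in s successive strips) can be sketched as follows. This is an illustrative simplification of one of Prechelt's stopping-criterion classes; names are the examiner's:

```python
def stop_training(val_errors, s=3):
    """Stop when the validation (generalization) error has increased in s
    successive strips. `val_errors` holds one error value per strip."""
    if len(val_errors) <= s:
        return False
    recent = val_errors[-(s + 1):]
    # True only if each of the last s strips is worse than the one before it
    return all(recent[i] < recent[i + 1] for i in range(s))

errors = [0.50, 0.40, 0.35, 0.36, 0.38, 0.41]
print(stop_training(errors, s=3))  # True: error rose in 3 successive strips
```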
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Austin Hicks whose telephone number is (571)270-3377. The examiner can normally be reached Monday - Thursday 8-4 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AUSTIN HICKS/Primary Examiner, Art Unit 2142