Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. 2021/133,392, filed on 08/18/2021.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-3 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more.
CLAIM 2
Step 1: The claim recites a product, one of the four statutory categories of patent-eligible subject matter.
Step 2A Prong 1:
determines necessity of training of the third machine learning model based on comparison between a first inference accuracy representing accuracy of inference concerning the first machine learning model and a second inference accuracy representing accuracy of inference concerning the second machine learning model (Mental process of evaluation and judgment, which can reasonably be performed in the human mind or with the aid of pencil and paper.)
Step 2A Prong 2:
and upon determining that training of the third machine learning model is necessary, trains the third machine learning model. (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
[Claim 1] to acquire a first training condition and a first machine learning model trained in accordance with the first training condition (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
[Claim 1] set a second training condition used to reduce a model size of the first machine learning model, different from the first training condition, (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
[Claim 1] in accordance with the second training condition and based on the first machine learning model, train a second machine learning model whose model size is smaller than that of the first machine learning model, (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
[Claim 1] and in accordance with a third training condition that is not the same as the second training condition and complies with the first training condition, train a third machine learning model based on the second machine learning model. (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
Step 2B:
and upon determining that training of the third machine learning model is necessary, trains the third machine learning model. (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
[Claim 1] to acquire a first training condition and a first machine learning model trained in accordance with the first training condition (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
[Claim 1] set a second training condition used to reduce a model size of the first machine learning model, different from the first training condition, (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
[Claim 1] in accordance with the second training condition and based on the first machine learning model, train a second machine learning model whose model size is smaller than that of the first machine learning model, (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
[Claim 1] and in accordance with a third training condition that is not the same as the second training condition and complies with the first training condition, train a third machine learning model based on the second machine learning model. (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
The claim, when considered as a whole, does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
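For illustration only, the comparison-based necessity determination recited in claim 2 can be sketched in a few lines of Python; the threshold criterion, function names, and numeric values below are hypothetical placeholders and are not drawn from the claims or from the cited art:

```python
def training_necessary(first_accuracy, second_accuracy, threshold=0.05):
    """Compare the first model's inference accuracy against the second
    model's; if the compressed second model lost more accuracy than the
    (hypothetical) threshold allows, the third model needs training."""
    return (first_accuracy - second_accuracy) > threshold

def train_third_model(second_model):
    # Toy stand-in for training: derive the third model from the second.
    return {"based_on": second_model, "trained": True}

second_model = {"size": 32}
# Here the second model's accuracy dropped from 0.95 to 0.80, so the
# comparison indicates that training the third model is necessary.
if training_necessary(first_accuracy=0.95, second_accuracy=0.80):
    third_model = train_third_model(second_model)
```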
CLAIM 3 incorporates the rejections of claim 2.
Step 2A Prong 1:
determines the necessity of the third machine learning model based on comparison between a best value in second inference accuracies corresponding to the second machine learning models each having a model size not less than a reference value in the plurality of second machine learning models and the reference value based on the first inference accuracy. (Mental process of evaluation and judgment, which can reasonably be performed in the human mind or with the aid of pencil and paper.)
Step 2A Prong 2:
sets a plurality of second training conditions different from each other, trains a plurality of second machine learning models in accordance with the plurality of second training conditions (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
Step 2B:
sets a plurality of second training conditions different from each other, trains a plurality of second machine learning models in accordance with the plurality of second training conditions (Field of Use and Technological Environment; it does no more than generally link a judicial exception to a particular technological environment. MPEP 2106.05(h)).
The claim, when considered as a whole, does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
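For illustration only, the best-value comparison recited in claim 3 (restricting attention to second models whose model size is not less than a reference value) can be sketched as follows; the data, reference values, and decision criterion are hypothetical placeholders, not drawn from the claims or the cited art:

```python
def needs_third_model(second_models, reference_size, reference_accuracy):
    """Among the second models whose model size is not less than the
    reference value, take the best second inference accuracy and compare
    it with a reference accuracy based on the first model (toy criterion)."""
    eligible = [m["accuracy"] for m in second_models if m["size"] >= reference_size]
    best = max(eligible)
    return best < reference_accuracy

# A plurality of second models trained under differing second conditions.
second_models = [
    {"size": 10, "accuracy": 0.70},  # below the size reference, excluded
    {"size": 32, "accuracy": 0.88},
    {"size": 48, "accuracy": 0.91},
]
# Best eligible accuracy (0.91) falls short of the 0.93 reference,
# so the third model is determined to be necessary.
decision = needs_third_model(second_models, reference_size=20, reference_accuracy=0.93)
```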
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 5, 8-10, and 16-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Nakata et al. (US 2021/0089271 A1).
Regarding claim 1, Nakata teaches the invention substantially as claimed, including:
A learning apparatus comprising a processing circuit configured to acquire a first training condition and a first machine learning model trained in accordance with the first training condition (in Nakata [0023]: The processor is configured to [a processing circuit configured to] remove a part of parameters of a predetermined number of parameters [acquire a first training condition] from a first model [and a first machine learning model] which includes the predetermined number of parameters and is trained [trained in accordance with the first training condition] so as to output second data corresponding to input first data, and determine the number of bits of weight parameters according).
set a second training condition used to reduce a model size of the first machine learning model, different from the first training condition, in accordance with the second training condition and based on the first machine learning model (in Nakata [0023]: The processor is configured to remove a part of parameters [different from the first training condition] of a predetermined number of parameters [set a second training condition] from a first model which includes the predetermined number of parameters and is trained so as to output second data corresponding to input first data, and determine the number of bits of weight parameters according to the required performance to generate a second model. The processor is configured to input the first data [based on the first machine learning model] into the second model to acquire data output in the second model with a smaller computational complexity [reduce the model size] than the first model [of the first machine learning model])
train a second machine learning model whose model size is smaller than that of the first machine learning model, (in Nakata [0023]: The processor is configured to input the first data into the second model to acquire data output in the second model with a smaller computational complexity than the first model [whose model size is smaller than that of the first machine learning model].)
in accordance with a third training condition that is not the same as the second training condition and complies with the first training condition, train a third machine learning model based on the second machine learning model. (in Nakata [0086]: FIG. 18 is a flowchart illustrating an example of the fourth model generation process according to the present embodiment.; in Nakata [0097]: Here, as described above, the number of parameters of the fourth model may be identical to [in accordance with a third training condition that is not the same as the second training condition and complies with the first training condition] or larger or smaller than the number of parameters of the first model according to the first embodiment.; in Nakata [0028]: The training device 200 is a device that generates a trained machine learning model)
[Image: media_image1.png (806 × 602, greyscale), reproducing Nakata, FIG. 18]
The prior art's "trained machine learning model" is being interpreted as the claimed invention's [first machine learning model], the prior art's "third model" as the claimed invention's [second machine learning model], and the prior art's "fourth model" as the claimed invention's [third machine learning model]. The prior art's "fourth model" [third machine learning model] is, according to FIG. 18, trained [train] based on the previous machine learning model [based on the second machine learning model] and, according to Nakata [0097], can use the same parameters [third training condition] as the first trained model [is not the same as the second training condition and complies with the first training condition].
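For illustration only, the three-model sequence mapped above (a first model trained under a first condition, a smaller second model trained under a second condition, and a third model trained from the second under a condition that complies with the first) can be sketched as follows; the dictionary-based "models" and the condition fields are toy placeholders, not drawn from the claims or from Nakata:

```python
def train(size, condition, base=None):
    # Toy stand-in for training: a "model" records its size, the condition
    # it was trained under, and the model it was trained from (if any).
    return {"size": size, "condition": condition, "base": base}

first_condition = {"epochs": 100, "width": 64}
first_model = train(size=64, condition=first_condition)

# The second condition differs from the first and reduces model size.
second_condition = {"epochs": 50, "width": 32}
second_model = train(size=32, condition=second_condition, base=first_model)

# The third condition is not the same as the second but complies with the
# first (here: restoring the first condition's epoch count at reduced width).
third_condition = {"epochs": first_condition["epochs"], "width": 32}
third_model = train(size=32, condition=third_condition, base=second_model)
```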
Regarding claim 5, Nakata teaches the invention substantially as claimed, including:
wherein the processing circuit initializes a learning parameter of the third machine learning model in accordance with a predetermined random number, or initializes the learning parameter by copying some trained weight coefficients of the second machine learning model. (in Nakata [0085]: Although an example in which a part (parameter) of the weight filters is randomly added to the small model (third model) [initializes a learning parameter of the third machine learning model] will be described in the following description, the present invention is not limited to a case where the number and order of parameters to be removed are random [in accordance with a predetermined random number]. The number and order of parameters to be removed may be set in advance and may be stored in the ROM 230, or may be determined based on a predetermined calculation expression stored in the ROM.)
Regarding claim 8, Nakata teaches the invention substantially as claimed, including:
wherein the processing circuit displays, on a display device, architectures of the first machine learning model, the second machine learning model, and/or the third machine learning model. (in Nakata [0032]: The I/F 140 may include a display circuit [the processing circuit displays, on a display device] that outputs the characteristic data [architectures] related to the first model or the second model [of the first machine learning model or the second machine learning model] such that the characteristic data can be presented (for example, displayed) to a user, and an input circuit that receives an input from the user corresponding to the presented characteristic data related to the first model or second model.; in Nakata [0068]: the correspondence between the inference accuracy and the computational complexity related to each condition [architectures] obtained in S302 is stored as characteristic data).
Regarding claim 9, Nakata teaches the invention substantially as claimed, including:
wherein the processing circuit displays, on a display device, the model sizes of the first machine learning model, the second machine learning model, and/or the third machine learning model. (in Nakata [0032]: The I/F 140 may include a display circuit [the processing circuit displays, on a display device] that outputs the characteristic data related to the first model or the second model [of the first machine learning model or the second machine learning model] such that the characteristic data can be presented (for example, displayed) to a user, and an input circuit that receives an input from the user corresponding to the presented characteristic data related to the first model or second model.; in Nakata [0068]: the correspondence between the inference accuracy and the computational complexity [the model sizes] related to each condition obtained in S302 is stored as characteristic data)
Regarding claim 10, Nakata teaches the invention substantially as claimed, including:
wherein the processing circuit displays, on a display device, performance of the first machine learning model, the second machine learning model, and/or the third machine learning model. (in Nakata [0032]: The I/F 140 may include a display circuit [the processing circuit displays, on a display device] that outputs the characteristic data related to the first model or the second model [of the first machine learning model or the second machine learning model] such that the characteristic data can be presented (for example, displayed) to a user, and an input circuit that receives an input from the user corresponding to the presented characteristic data related to the first model or second model.; in Nakata [0068]: the correspondence [performance] between the inference accuracy and the computational complexity related to each condition obtained in S302 is stored as characteristic data.)
Regarding claim 16, Nakata teaches the invention substantially as claimed, including:
A learning method comprising: acquiring a first training condition and a first machine learning model trained in accordance with the first training condition (in Nakata [0023]: The processor is configured to remove a part of parameters of a predetermined number of parameters [acquire a first training condition] from a first model [and a first machine learning model] which includes the predetermined number of parameters and is trained [trained in accordance with the first training condition] so as to output second data corresponding to input first data, and determine the number of bits of weight parameters according; in Nakata [0024]: Exemplary embodiments of an arithmetic operation device, an arithmetic operation method, and a training method [a learning method] will be explained below in detail with reference to the accompanying drawings ).
setting a second training condition used to reduce a model size of the first machine learning model, different from the first training condition, in accordance with the second training condition and based on the first machine learning model (in Nakata [0023]: The processor is configured to remove a part of parameters [different from the first training condition] of a predetermined number of parameters [set a second training condition] from a first model which includes the predetermined number of parameters and is trained so as to output second data corresponding to input first data, and determine the number of bits of weight parameters according to the required performance to generate a second model. The processor is configured to input the first data [based on the first machine learning model] into the second model to acquire data output in the second model with a smaller computational complexity [reduce the model size] than the first model [of the first machine learning model])
training a second machine learning model whose model size is smaller than that of the first machine learning model, (in Nakata [0023]: The processor is configured to input the first data into the second model to acquire data output in the second model with a smaller computational complexity than the first model [whose model size is smaller than that of the first machine learning model].)
in accordance with a third training condition that is not the same as the second training condition and complies with the first training condition, training a third machine learning model based on the second machine learning model. (in Nakata [0086]: FIG. 18 is a flowchart illustrating an example of the fourth model generation process according to the present embodiment.; in Nakata [0097]: Here, as described above, the number of parameters of the fourth model may be identical to [in accordance with a third training condition that is not the same as the second training condition and complies with the first training condition] or larger or smaller than the number of parameters of the first model according to the first embodiment.; in Nakata [0028]: The training device 200 is a device that generates a trained machine learning model)
[Image: media_image1.png (806 × 602, greyscale), reproducing Nakata, FIG. 18]
The prior art's "trained machine learning model" is being interpreted as the claimed invention's [first machine learning model], the prior art's "third model" as the claimed invention's [second machine learning model], and the prior art's "fourth model" as the claimed invention's [third machine learning model]. The prior art's "fourth model" [third machine learning model] is, according to FIG. 18, trained [training] based on the previous machine learning model [based on the second machine learning model] and, according to Nakata [0097], can use the same parameters [third training condition] as the first trained model [is not the same as the second training condition and complies with the first training condition].
Regarding claim 17, Nakata teaches the invention substantially as claimed, including:
A non-transitory computer readable storage medium including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform operations comprising: acquiring a first training condition and a first machine learning model trained in accordance with the first training condition (in Nakata [0023]: The processor is configured to remove a part of parameters of a predetermined number of parameters [acquire a first training condition] from a first model [and a first machine learning model] which includes the predetermined number of parameters and is trained [trained in accordance with the first training condition] so as to output second data corresponding to input first data, and determine the number of bits of weight parameters according; in Nakata [0027]: A hard disk drive (HDD), a solid state drive (SSD), and an integrated circuit storage device [A non-transitory computer readable storage medium including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform operations comprising:] can be appropriately used as these storage devices).
setting a second training condition used to reduce a model size of the first machine learning model, different from the first training condition, in accordance with the second training condition and based on the first machine learning model (in Nakata [0023]: The processor is configured to remove a part of parameters [different from the first training condition] of a predetermined number of parameters [set a second training condition] from a first model which includes the predetermined number of parameters and is trained so as to output second data corresponding to input first data, and determine the number of bits of weight parameters according to the required performance to generate a second model. The processor is configured to input the first data [based on the first machine learning model] into the second model to acquire data output in the second model with a smaller computational complexity [reduce the model size] than the first model [of the first machine learning model])
training a second machine learning model whose model size is smaller than that of the first machine learning model, (in Nakata [0023]: The processor is configured to input the first data into the second model to acquire data output in the second model with a smaller computational complexity than the first model [whose model size is smaller than that of the first machine learning model].)
in accordance with a third training condition that is not the same as the second training condition and complies with the first training condition, training a third machine learning model based on the second machine learning model. (in Nakata [0086]: FIG. 18 is a flowchart illustrating an example of the fourth model generation process according to the present embodiment.; in Nakata [0097]: Here, as described above, the number of parameters of the fourth model may be identical to [in accordance with a third training condition that is not the same as the second training condition and complies with the first training condition] or larger or smaller than the number of parameters of the first model according to the first embodiment.; in Nakata [0028]: The training device 200 is a device that generates a trained machine learning model)
[Image: media_image1.png (806 × 602, greyscale), reproducing Nakata, FIG. 18]
The prior art's "trained machine learning model" is being interpreted as the claimed invention's [first machine learning model], the prior art's "third model" as the claimed invention's [second machine learning model], and the prior art's "fourth model" as the claimed invention's [third machine learning model]. The prior art's "fourth model" [third machine learning model] is, according to FIG. 18, trained [training] based on the previous machine learning model [based on the second machine learning model] and, according to Nakata [0097], can use the same parameters [third training condition] as the first trained model [is not the same as the second training condition and complies with the first training condition].
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Nakata in view of Gweon et al. (US 10,970,633 B1).
Regarding claim 2, Nakata teaches all the limitations from claim 1 as mentioned above.
Nakata does not appear to explicitly teach wherein the processing circuit determines necessity of training of the third machine learning model based on comparison between a first inference accuracy representing accuracy of inference concerning the first machine learning model and a second inference accuracy representing accuracy of inference concerning the second machine learning model, and upon determining that training of the third machine learning model is necessary, trains the third machine learning model.
However, Gweon teaches wherein the processing circuit determines necessity of training of the third machine learning model based on comparison between a first inference accuracy representing accuracy of inference concerning the first machine learning model and a second inference accuracy representing accuracy of inference concerning the second machine learning model, and upon determining that training of the third machine learning model is necessary, trains the third machine learning model. (in Gweon [Col. 3 - Lines 19-24]: and (c) the learning device performing or supporting another device to perform a process of calculating one or more first losses by referring to the first inference result and the second inference result and a process of training the Sub-kernel Module [and upon determining that training of the third machine learning model is necessary, trains the third machine learning model] by using the first losses.; [Col. 4 - Lines 3-9]: if a difference between the first inference result [based on comparison between a first inference accuracy representing accuracy of inference concerning the first machine learning model] and the second inference result [and a second inference accuracy representing accuracy of inference concerning the second machine learning model] is determined as higher than a predetermined threshold [determines necessity], the learning device performs or supports another device to perform the process of transmitting the training data and the architecture information on the specific Small Neural Network Model to the server).
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Nakata and Gweon before them, to include Gweon’s third machine learning model’s necessity-based usage in Nakata’s system that reduces the computational complexity of a model. One would have been motivated to make such a combination in order to reduce the amount of manpower and cost it takes to improve a performance level of the neural network model as taught by Gweon ([Col. 2]).
Regarding claim 4, Nakata teaches all the limitations of claim 1 as discussed above.
Nakata further teaches (in Nakata [0066]: Here, the part of parameters removed from the first model include at least one neuron [number of nodes], at least one weight parameters, or at least one intermediate layer [the number of layers]. In S302, the inference accuracy and the computational complexity are calculated in a state in which some of all the parameters of the first model [in accordance with] are removed.)
However, Nakata does not appear to explicitly teach wherein the processing circuit sets the third machine learning model
Gweon further teaches wherein the processing circuit sets the third machine learning model (in Gweon [Col. 4 - Lines 38-43]: wherein the specific sub-kernel [sets the third machine learning model] for training is a subset of a super kernel corresponding to the maximal capacity of the Big Neural Network Model and is comprised of a kernel size [kernel size] equal to or less than that of the super kernel and the number of the channels in the kernel [the number of channels] equal to or less than that in the super kernel [of the second machine learning model].)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Nakata and Gweon before them, to include Gweon’s third machine learning model’s similar kernel size and channels in Nakata’s system that reduces the computational complexity of a model. One would have been motivated to make such a combination in order to reduce the amount of manpower and cost it takes to improve a performance level of the neural network as taught by Gweon ([Col. 2]).
Claims 3, 11-12, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Nakata in view of Gweon as applied to claim 2 above, and further in view of Liang et al. (US 2021/0365780 A1) and Zhan et al. (US 2020/0090073 A1).
Regarding claim 3, Nakata in view of Gweon teach all the limitations of claim 2 as discussed above.
However, Nakata in view of Gweon does not appear to explicitly teach wherein the processing circuit sets a plurality of second training conditions different from each other, trains a plurality of second machine learning models in accordance with the plurality of second training conditions, and determines the necessity of the third machine learning model based on comparison between a best value in second inference accuracies corresponding to the second machine learning models each having a model size not less than a reference value in the plurality of second machine learning models and the reference value based on the first inference accuracy.
However, Liang teaches wherein the processing circuit sets a plurality of second training conditions different from each other, trains a plurality of second machine learning models in accordance with the plurality of second training conditions (in Liang [0009]: updating a parameter of a machine learning model generated by a first machine learning using a plurality of pieces of first training data, by an initial execution of a second machine learning using second training data satisfying a specific condition on the machine learning model; and repeating the second machine learning [trains a plurality of second machine learning models] to update the parameter [sets a plurality of second training conditions different from each other] of the machine learning model, while reducing a degree of influence of the second training data on update of the parameter [in accordance with the plurality of second training conditions] as a difference between a first value of the parameter before the initial execution of the second machine learning and a second value of the parameter updated by a previous second machine learning increases.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Nakata, Gweon, and Liang before them, to include Liang’s use of a plurality of second machine learning models in Nakata and Gweon’s system that performs tradeoff techniques. One would have been motivated to make such a combination in order to improve the model by preventing overfitting and a reduction in accuracy as taught by Liang ([0021]).
However, Nakata in view of Gweon and Liang does not appear to explicitly teach and determines the necessity of the third machine learning model based on comparison between a best value in second inference accuracies corresponding to the second machine learning models each having a model size not less than a reference value in the plurality of second machine learning models and the reference value based on the first inference accuracy.
However, Zhan teaches and determines the necessity of the third machine learning model based on comparison between a best value in second inference accuracies corresponding to the second machine learning models each having a model size not less than a reference value in the plurality of second machine learning models and the reference value based on the first inference accuracy. (in Zhan [0051]: generating model parameter combinations, and generating machine learning models respectively corresponding to the model parameter combinations, where the model parameters indicate an associated relationship between input vectors and output vectors of the machine learning models; executing a dividing operation: dividing preset machine learning data into training data and validation data; executing training and validation operations: training the machine learning models in parallel [corresponding to the second machine learning models] respectively based on the training data; validating a learning accuracy of the trained machine learning models respectively based on the validation data to obtain validation scores, where the validation scores indicate a ratio of consistency between data types corresponding to the output vectors output by the machine learning models based on the validation data and types of the validation data; and executing a model generation operation: determining an optimal model parameter combination [determines the necessity of the third machine learning model based on comparison between a best value in second inference accuracies … and the reference value based on the first inference accuracy.] 
corresponding to a machine learning model to be generated based on the validation scores, and generating a machine learning model corresponding to the optimal model parameter combination; in Zhan [0009]: realizing training and validation of the machine learning models respectively corresponding to the model parameter combinations [each having a model size not less than a reference value in the plurality of second machine learning models] in parallel.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Nakata, Gweon, Liang and Zhan before them, to include Zhan’s comparison of second machine learning models in Nakata, Gweon, and Liang’s system that performs tradeoff techniques using multiple second machine learning models. One would have been motivated to make such a combination in order to improve the whole parameter optimization process and rapidly generate a desired machine learning model, as taught by Zhan ([0009]).
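For illustration only, the accuracy-comparison logic recited in the claim (and mapped to Zhan's validation-score selection) can be sketched in Python. The function name, the accuracy margin, and the model records are hypothetical assumptions, not features taken from Zhan or the claimed invention:

```python
# Hypothetical sketch of the claimed comparison: among second models whose
# model size is at least a reference size, take the best inference accuracy
# and compare it with a reference value derived from the first model's
# inference accuracy.

def needs_third_model(second_models, ref_size, first_accuracy, margin=0.02):
    """second_models: list of (model_size, inference_accuracy) tuples."""
    eligible = [acc for size, acc in second_models if size >= ref_size]
    if not eligible:
        return True  # no compressed model is large enough to compare
    best = max(eligible)  # best value in the second inference accuracies
    reference = first_accuracy - margin  # reference value based on first accuracy
    return best < reference  # train a third model only if the best falls short

# Example: the first model reaches 0.95; compressed candidates below.
models = [(10, 0.90), (25, 0.94), (40, 0.96)]
print(needs_third_model(models, ref_size=20, first_accuracy=0.95))  # False
```

This sketch only shows the shape of the comparison the limitation recites; any real system would derive the reference value and margin from its own training conditions.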
Regarding claim 11, Nakata in view of Gweon, Liang, and Zhan teach all the limitations of claim 3 as mentioned above.
Nakata further teaches wherein the processing circuit displays, on a display device, a graph that plots a plurality of points representing the inference accuracies and the model sizes of the plurality of second machine learning models. (in Nakata [0047]: FIG.5 is a diagram for describing an example of a plurality of characteristics realized by the machine learning model according to the embodiment. ... In the graph illustrated in FIG.5, a vertical axis represents recognition accuracy (inference accuracy) [a graph that plots a plurality of points representing the inference accuracies], and a horizontal axis represents the computational complexity [and the model sizes]. It is assumed that the computational complexity is represented by the product of the number of MAC computations)
Regarding claim 12, Nakata in view of Gweon, Liang, and Zhan teach all the limitations of claim 3 as mentioned above.
Nakata further teaches wherein the processing circuit displays, on the graph, a point corresponding to the reference value and the best value and/or a region that satisfies the reference value and the best value.
[Annotated figure: media_image2.png (grayscale, 529 × 812)]
Regarding claim 14, Nakata in view of Gweon, Liang, and Zhan teach all the limitations of claim 3 as mentioned above.
Nakata further teaches wherein the processing circuit plots, on the graph, a point representing the inference accuracy and the model size of the third machine learning model.
[Annotated figure: media_image3.png (grayscale, 529 × 812)]
Regarding claim 15, Nakata in view of Gweon, Liang, and Zhan teach all the limitations of claim 3 as mentioned above.
Nakata further teaches wherein the processing circuit displays a plurality of points corresponding to the plurality of second machine learning models and a point corresponding to the third machine learning model in different shapes, sizes, and/or colors.
[Annotated figure: media_image4.png (grayscale, 529 × 816)]
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Nakata in view of Gweon, Liang and Zhan as applied to claim 3 above, and further in view of Narang.
Regarding claim 13, Nakata in view of Gweon, Liang, and Zhan teach all the limitations of claim 12 as mentioned above.
Nakata further teaches wherein of the plurality of points, the processing circuit displays a point that is included in the region and a point that is not included in the region.
[Annotated figure: media_image5.png (grayscale, 500 × 655)]
However, the combination of references does not appear to explicitly teach displaying these points in different shapes, sizes, and/or colors.
However, Narang further teaches displaying such points in different colors.
[Graph from Narang: media_image6.png (grayscale, 774 × 648)]
The graph above shows that the points are plotted in grayscale (light gray vs. dark gray, which are considered different colors).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Nakata as applied to claim 1 above, and further in view of Tanizawa et al. (US 2021/0073641).
Regarding claim 6, Nakata teaches all the limitations of claim 1 as discussed above.
However, Nakata does not appear to explicitly teach wherein as the second training condition, the processing circuit sets an optimization method to Adam, introduces L2 regularization, and sets an activation function to a saturation nonlinear function, different from the first training condition.
However, Tanizawa teaches wherein as the second training condition, the processing circuit sets an optimization method to Adam, introduces L2 regularization, and sets an activation function to a saturation nonlinear function, different from the first training condition. (in [0095]: Here, it is assumed that a hyperparameter [training condition] having a combination of Adam as the optimization technique [sets an optimization method to Adam], ReLU as the activation function of the base model [sets an activation function to a saturation nonlinear function], and L2 regularization as the regularization function [introduces L2 regularization] is selected as the searched parameter 29'.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Nakata and Tanizawa before them, to include Tanizawa’s hyperparameters into Nakata’s system that reduces the computational complexity of a model. One would have been motivated to make such a combination in order to reduce memory usage and transfer amount by obtaining a structure of a neural network model having a high generalization capability and a simple structure as taught by Tanizawa ([0003]).
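Purely for context, a "second training condition" of the kind Tanizawa's hyperparameter combination describes (optimizer, activation function, regularization) can be sketched in Python. The dictionary keys, the lambda value, and the choice of tanh as an example of a saturating nonlinear function are illustrative assumptions, not taken from Tanizawa (whose cited combination uses ReLU):

```python
# Hypothetical sketch of a training-condition hyperparameter combination:
# Adam as the optimizer, a saturating activation, and L2 regularization.
import math

second_training_condition = {
    "optimizer": "Adam",
    "activation": "tanh",   # tanh saturates toward -1/+1, unlike ReLU
    "regularization": "L2",
    "l2_lambda": 1e-4,      # illustrative regularization strength
}

def activation(x):
    # A saturating nonlinear function: output is bounded for large |x|.
    return math.tanh(x)

def l2_penalty(weights, lam):
    # L2 regularization term added to the training loss.
    return lam * sum(w * w for w in weights)

print(activation(100.0))              # 1.0 (saturated)
print(l2_penalty([1.0, -2.0], 1e-4))  # ≈ 0.0005
```

The sketch only makes the three recited condition elements concrete; an actual training pipeline would pass such settings to its optimizer and loss construction.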
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Nakata as applied to claim 1 above, and further in view of Narang et al. (US 2019/0130271 A1).
Regarding claim 7, Nakata teaches all the limitations of claim 1 as mentioned above.
However, Nakata does not appear to explicitly teach wherein as the second training condition, the processing circuit adds a BN layer and introduces L1 regularization to the BN layer, different from the first training condition.
However, Narang teaches wherein as the second training condition, the processing circuit adds a BN layer and introduces L1 regularization to the BN layer, different from the first training condition. (in Narang [0057]: In order to introduce block sparsity in RNNs, three different types of experiments, Block Pruning (BP), Group Lasso (GL) [and introduces L1 regularization to the BN layer], and Group Lasso with block pruning (GLP), were run. In one or more embodiments, weights were pruned in the recurrent layers (both linear and recurrent weights) and fully connected layers. Biases, batch-normalization parameters [adds a BN layer] and weights in the convolutional and CTC layers are not pruned since they account for a small portion of the total weights in the network.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Nakata and Narang before them, to include Narang’s batch normalization and regularization features in Nakata’s system that reduces the computational complexity of a model. One would have been motivated to make such a combination in order to reduce compute and memory requirements of deep learning models as taught by Narang ([0003]).
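For illustration, the sparsity-inducing regularization the claim recites, an L1 (lasso-style) penalty applied to batch-normalization scale factors, can be sketched in Python. The function name, the gamma values, the lambda, and the pruning threshold are hypothetical, not taken from Narang:

```python
# Hypothetical sketch: an L1 penalty on batch-normalization scale factors
# (gammas) drives some of them toward zero, so the corresponding channels
# can later be pruned to shrink the model.

def l1_penalty_on_bn(gammas, lam):
    """L1 regularization term on BN scale factors, added to the loss."""
    return lam * sum(abs(g) for g in gammas)

gammas = [0.8, -0.01, 0.5, 0.002]            # illustrative BN scales
loss_extra = l1_penalty_on_bn(gammas, lam=0.01)
pruned = [g for g in gammas if abs(g) < 0.05]  # near-zero channels prunable
print(loss_extra)   # ≈ 0.01312
print(len(pruned))  # 2
```

The design point the sketch shows is that, unlike L2, the L1 penalty pushes individual scale factors all the way to zero, which is what makes the associated channels candidates for removal.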
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Lavrik et al. (Pub. No.: US 2023/0213918 A1) – “METHOD AND SYSTEM FOR DETERMINING A COMPRESSION RATE FOR AN AI MODEL OF AN INDUSTRIAL TASK” relates to the use of different AI models with different parameters and compression rates to determine the optimal compression rate for a new AI model to perform a task.
Erlandson (Pub. No.: US 2019/0213475 A1) – “REDUCING MACHINE-LEARNING MODEL COMPLEXITY WHILE MAINTAINING ACCURACY TO IMPROVE PROCESSING SPEED” relates to determining descriptor values associated with different machine learning models and determining which model has the lowest value in order to perform a task.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Gregory Paul Shipmon whose telephone number is (571)272-3131. The examiner can normally be reached Monday - Friday, 7:30 A.M. - 4:30 P.M. ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Jung, can be reached at 571-270-3779. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/G.P.S./Examiner, Art Unit 2146
/ANDREW J JUNG/Supervisory Patent Examiner, Art Unit 2146