Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 12/15/2022, and a Preliminary Amendment was filed on 02/03/2023 with respect to the specification and figures. Claims 1-17 are pending and have been examined.
Priority
2. The examiner acknowledges the claim of priority benefit to U.S. Provisional Application No. 63/265,436, filed on 12/15/2021.
Claim Rejections - 35 USC § 101
3. 35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1:
Step 1: Claim 1 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: The claim recites the limitations:
processing the training data in respective forward and backward passes - In the context of the claim limitation, this encompasses a mathematical concept of computing forward and backward passes.
computing gradients of a pre-determined loss function with respect to the network weights and/or computing gradients of the pre-determined loss function with respect to the computed activations of the network - In the context of the claim limitation, this encompasses a mathematical concept of calculating a gradient of a loss function.
wherein an adjustment parameter is applied to at least a subset of values…the values comprising at least one of: the network weights, the activations computed in the forward pass, the gradients with respect to activations computed in the backward pass, and the gradients with respect to weights computed in the backward pass - In the context of the claim limitation, this encompasses a mental process of applying an adjustment parameter to at least a subset of values.
updating the network weights in dependence on the computed gradients with respect to the weights - In the context of the claim limitation, this encompasses a mental process of evaluating weights based on calculated gradients.
computing a proportion of the subset of values falling above a predefined threshold - In the context of the claim limitation, this encompasses a mathematical concept of calculating values.
updating the adjustment parameter applied to the subset…in dependence on the computed proportion - In the context of the claim limitation, this encompasses a mental process of evaluating the parameter in dependence on the computed proportion.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “based on a set of training data, a multi-layer neural network comprising a set of network weights”; “through a sequence of layers of the network, the forward pass comprising computing a set of activations by applying an activation function in dependence on the network weights and training data, and the backward pass”; “in the neural network”; “machine learning parameters” – these are mere instructions to apply the judicial exception using a generic computer programmed with instructions/program code/logic. See MPEP 2106.05(f). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, some of the additional elements are directed to mere instructions to apply the judicial exception. Mere instruction to apply a judicial exception does not amount to significantly more. See MPEP 2106.05(f). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
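For technical context only: the claim 1 limitations analyzed above describe a technique resembling dynamic loss scaling in mixed-precision training. The sketch below is purely illustrative (it is not drawn from the application or the cited art, and all function names, parameters, and values are hypothetical): a scale factor (the adjustment parameter) is applied to the gradients, the proportion of gradient magnitudes above a threshold is computed, and the scale factor is updated in dependence on that proportion.

```python
import numpy as np

def train_step(weights, x, y, scale, lr=0.01, threshold=1e-3, target=0.5):
    # Forward pass: compute activations in dependence on weights and data.
    activations = x @ weights
    # Backward pass: gradients of a scaled squared-error loss w.r.t. weights.
    scaled_grads = scale * (2.0 / len(x)) * x.T @ (activations - y)
    # Compute the proportion of the gradient values above a threshold.
    proportion = float(np.mean(np.abs(scaled_grads) > threshold))
    # Update the weights using the unscaled gradients.
    weights = weights - lr * scaled_grads / scale
    # Update the adjustment parameter in dependence on the computed proportion:
    # grow the scale while few values exceed the threshold, shrink it otherwise.
    new_scale = scale * 2.0 if proportion < target else scale / 2.0
    return weights, new_scale
```

The unscale-before-update step reflects the usual rationale for such scaling: keeping small gradient values representable in low-precision arithmetic without changing the effective weight update.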
Claim 2:
Step 1: Claim 2 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: The claim recites the limitations:
wherein the adjustment parameter is a scale factor, and wherein the scale factor is applied on the backward pass to at least a subset of the gradients with respect to at least one of the activations and the gradients with respect to the network weights, wherein the scale factor is updated in dependence on the proportion of the gradients of that subset that have a value falling above a pre-defined threshold - In the context of the claim limitation, this encompasses a mental process of evaluating the parameter in dependence on the computed proportion.
Step 2A Prong 2: Please see analysis of independent claim 1.
Step 2B Analysis: Please see analysis of independent claim 1.
Claim 3:
Step 1: Claim 3 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: The claim recites the limitations:
applying the scale factor to at least one of gradients with respect to weights and gradients with respect to activations of all layers of the network by multiplying the loss function by the scale factor - In the context of the claim limitation, this encompasses a mathematical concept of multiplying the loss function by a scale factor.
Step 2A Prong 2: Please see analysis of claim 2.
Step 2B Analysis: Please see analysis of claim 2.
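For technical context only: the claim 3 limitation (applying the scale factor by multiplying the loss function by it) follows from the chain rule, since scaling the loss scales every gradient computed in the backward pass by the same factor. An illustrative sketch, not drawn from the record, with hypothetical names:

```python
import numpy as np

def grad_of_scaled_loss(w, x, y, scale=1.0):
    # Gradient of (scale * mean-squared-error) for a linear model y ≈ x @ w.
    # By the chain rule, multiplying the loss by the scale factor multiplies
    # every gradient with respect to weights (and activations) by that factor.
    return scale * (2.0 / len(x)) * x.T @ (x @ w - y)
```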
Claim 4:
Step 1: Claim 4 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: The claim recites the limitations:
constructing a histogram of gradients, the histogram comprising a plurality of bins, wherein the scale factor is updated based on a proportion of gradients occupying bins above a threshold value - In the context of the claim limitation, this encompasses a mathematical concept of constructing a histogram of a gradient.
Step 2A Prong 2: Please see analysis of claim 2.
Step 2B Analysis: Please see analysis of claim 2.
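For technical context only: the claim 4 limitation can be illustrated as binning gradient magnitudes into a histogram and updating the scale factor from the proportion of gradients in bins above a threshold. The sketch below is illustrative (not from the application or cited art; all names and values hypothetical):

```python
import numpy as np

def update_scale_from_histogram(grads, scale, threshold=1e-2, target=0.1, bins=16):
    # Construct a histogram of gradient magnitudes over a plurality of bins.
    mags = np.abs(grads)
    counts, edges = np.histogram(mags, bins=bins, range=(0.0, float(mags.max()) or 1.0))
    # Proportion of gradients occupying bins whose lower edge is above the
    # threshold value.
    above = counts[edges[:-1] > threshold].sum()
    proportion = above / max(counts.sum(), 1)
    # Update the scale factor based on that proportion.
    return scale * 2.0 if proportion < target else scale / 2.0
```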
Claim 5:
Step 1: Claim 5 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: The claim recites the limitations:
constructing a respective histogram of gradients…wherein the proportion of gradients occupying each of a set of bins for each histogram is input to an accumulator to obtain an aggregated proportion for each bin, the scale factor being derived by computing an aggregated proportion occupying bins above an overall threshold - In the context of the claim limitation, this encompasses a mathematical concept of constructing a histogram of gradients.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “for each layer of the neural network” – this is a mere instruction to apply the judicial exception using a generic computer programmed with generic computer equipment. See MPEP 2106.05(f). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, some of the additional elements are directed to mere instructions to apply the judicial exception. Mere instruction to apply a judicial exception does not amount to significantly more. See MPEP 2106.05(f). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
Claim 6:
Step 1: Claim 6 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: The claim recites the limitations:
constructing a respective histogram of gradients for each layer, wherein for each layer a respective layer-wise scale factor is applied during the backward pass, the layer-wise scale factor being updated based on a proportion of gradients in the histogram for the corresponding layer occupying bins above a corresponding layer-wise threshold value - In the context of the claim limitation, this encompasses a mathematical concept of constructing a respective histogram of gradients.
Step 2A Prong 2: Please see analysis of claim 4.
Step 2B Analysis: Please see analysis of claim 4.
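For technical context only: the claim 6 limitation can be illustrated as maintaining one scale factor per layer, each updated from that layer's own histogram against its own layer-wise threshold. The sketch is illustrative, not from the record; names are hypothetical:

```python
import numpy as np

def update_layerwise_scales(layer_grads, scales, thresholds, target=0.1):
    # A respective histogram of gradients is constructed for each layer, and
    # each layer-wise scale factor is updated from the proportion of that
    # layer's gradients occupying bins above the layer-wise threshold.
    new_scales = {}
    for name, g in layer_grads.items():
        counts, edges = np.histogram(np.abs(g), bins=16)
        above = counts[edges[:-1] > thresholds[name]].sum()
        proportion = above / max(counts.sum(), 1)
        new_scales[name] = scales[name] * (2.0 if proportion < target else 0.5)
    return new_scales
```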
Claim 7:
Step 1: Claim 7 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: The claim recites the limitations:
processes a respective subset of the training data in each of the forward and backward passes, and computes a respective histogram of gradients for the corresponding subset of the training data, each histogram having defined a common set of bins, wherein the proportion of gradients occupying each bin of the set of bins defined for each histogram is aggregated to obtain an aggregated proportion for each bin, with a scale factor being derived by computing an aggregated proportion occupying bins above an overall threshold - In the context of the claim limitation, this encompasses a mathematical concept of computing a histogram of gradients for the training data.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “when implemented on a plurality of processors, wherein each processor” – these are mere instructions to apply the judicial exception using a generic computer programmed with generic computer equipment. See MPEP 2106.05(f). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, some of the additional elements are directed to mere instructions to apply the judicial exception. Mere instruction to apply a judicial exception does not amount to significantly more. See MPEP 2106.05(f). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
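For technical context only: the claim 7 limitation can be illustrated as each processor computing a histogram over a common set of bins, with per-bin counts fed to an accumulator so that a single scale factor update is derived from the aggregated proportion. The sketch below simulates the processors as a list of gradient shards; it is illustrative only, with hypothetical names:

```python
import numpy as np

def aggregate_scale_update(per_processor_grads, scale, bin_edges,
                           overall_threshold, target=0.1):
    # Each (simulated) processor computes a histogram over the common bins.
    accumulator = np.zeros(len(bin_edges) - 1)
    for g in per_processor_grads:
        counts, _ = np.histogram(np.abs(g), bins=bin_edges)
        accumulator += counts  # accumulator aggregating per-bin occupancy
    # Aggregated proportion occupying bins above the overall threshold.
    above = accumulator[bin_edges[:-1] > overall_threshold].sum()
    proportion = above / max(accumulator.sum(), 1)
    return scale * 2.0 if proportion < target else scale / 2.0
```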
Claim 8:
Step 1: Claim 8 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: Please see analysis of independent claim 1.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim also recites “storing at least a subset of the network weights, gradients and activations in computer memory in floating-point format” - which recite the insignificant extra-solution activity of mere data gathering. MPEP 2106.05(g). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The recitation of “storing…” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to storing and retrieving data (See MPEP 2106.05(d)(II), “Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
Claim 9:
Step 1: Claim 9 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: Please see analysis of claim 8.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim also recites “storing at least a subset of the network weights, gradients and activations in computer memory in eight-bit floating-point format” - which recite the insignificant extra-solution activity of mere data gathering. MPEP 2106.05(g). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The recitation of “storing…” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to storing and retrieving data (See MPEP 2106.05(d)(II), “Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
Claim 10:
Step 1: Claim 10 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: Please see analysis of claim 8.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim also recites “storing at least a subset of the network weights, gradients and activations in computer memory in sixteen-bit floating-point format” - which recite the insignificant extra-solution activity of mere data gathering. MPEP 2106.05(g). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The recitation of “storing…” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to storing and retrieving data (See MPEP 2106.05(d)(II), “Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
Claim 11:
Step 1: Claim 11 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: Please see analysis of claim 8.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim also recites “storing the subset of values in a floating-point format, and wherein the adjustment parameter is an exponent bias applied to the floating-point representations of the subset of weights, gradients and activation” - which recite the insignificant extra-solution activity of mere data gathering. MPEP 2106.05(g). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The recitation of “storing…” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to storing and retrieving data (See MPEP 2106.05(d)(II), “Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
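For technical context only: the claim 11 limitation uses an exponent bias as the adjustment parameter. Shifting the exponent of a floating-point representation by a bias b is equivalent to scaling the stored value by 2**b, which can keep small values representable in a low-precision format. The sketch below uses float16 as a stand-in for the low-precision storage format; it is illustrative only, not drawn from the record, and all names are hypothetical:

```python
import numpy as np

def store_with_exponent_bias(values, bias):
    # Apply the exponent bias (scale by 2**bias) before casting to the
    # low-precision storage format, then remove it when reading back at
    # higher precision. Without the bias, values below the format's
    # smallest subnormal would underflow to zero.
    stored = np.float16(values * np.exp2(bias))
    return stored.astype(np.float64) * np.exp2(-bias)
```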
Claim 12:
Step 1: Claim 12 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: Please see analysis of claim 11.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “wherein the subset of values in the neural network is a subset of network weights and activations and the adjustment parameter is an exponent bias applied to the subset of values of the network weights and activations in the forward pass” – this is a mere instruction to apply the judicial exception using a generic computer programmed with generic computer equipment. See MPEP 2106.05(f). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, some of the additional elements are directed to mere instructions to apply the judicial exception. Mere instruction to apply a judicial exception does not amount to significantly more. See MPEP 2106.05(f). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
Claim 13:
Step 1: Claim 13 recites a computer-implemented method of training; thus, it is a process, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: Please see analysis of claim 11.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim also recites “wherein a subset of network weights, activations and gradients which are inputs to compute operations in at least one of the forward and backward passes are stored in eight-bit floating-point format, the compute operations comprising at least one of a matrix operation and a convolution operation” - which recite the insignificant extra-solution activity of mere data gathering. MPEP 2106.05(g). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The recitation of “storing…” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to storing and retrieving data (See MPEP 2106.05(d)(II), “Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
Claim 14:
Step 1: Claim 14 recites a computer system; thus, it is a machine, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: The claim recites the limitations:
processing the training data in respective forward and backward passes through a sequence of layers of the network, the forward pass comprising computing a set of activations by applying an activation function in dependence on the network weights and training data, and the backward pass comprising determining a set of gradients of a pre-determined loss function with respect to the weights and/or activations of the network, wherein an adjustment parameter is applied to at least a subset of values in the neural network, wherein the values on the forward pass comprise at least one of the network weights and computed activations, and the values on the backwards pass comprise the computed gradients with respect to activations and gradients with respect to weights - In the context of the claim limitation, this encompasses a mathematical concept of computing an activation based on weights and training data.
updating the network weights in dependence on the computed gradients with respect to the weights - In the context of the claim limitation, this encompasses a mental process of evaluating weights based on calculated gradients.
on at least one of the forward and backward pass, computing a proportion of the subset of values falling above a predefined threshold - In the context of the claim limitation, this encompasses a mathematical concept of calculating values.
updating the adjustment parameter applied to the subset of machine learning parameters in dependence on the computed proportion - In the context of the claim limitation, this encompasses a mental process of evaluating the parameter in dependence on the computed proportion.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “one or more processors configured to train a multi-layer neural network comprising a set of network weights, and memory holding the network weights, the processor configured to train the neural network” – these are mere instructions to apply the judicial exception using a generic computer programmed with generic computer equipment. See MPEP 2106.05(f). The claim also recites “receiving a set of training data”; “storing the values to memory” - which recite the insignificant extra-solution activity of mere data gathering and output. MPEP 2106.05(g). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, some of the additional elements are directed to mere instructions to apply the judicial exception. Mere instruction to apply a judicial exception does not amount to significantly more. See MPEP 2106.05(f). Furthermore, the recitation of “receiving…” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to receiving or transmitting data over a network, e.g., using the Internet to gather data. See MPEP 2106.05(d)(II), OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network). The recitation of “storing…” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to storing and retrieving data (See MPEP 2106.05(d)(II), “Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
Claim 15:
Step 1: Claim 15 recites a computer system; thus, it is a machine, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: Please see analysis of independent claim 14.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “comprising a plurality of processors, wherein each processor is configured to process a respective subset of the training data” – this is a mere instruction to apply the judicial exception using a generic computer programmed with generic computer equipment. See MPEP 2106.05(f). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, some of the additional elements are directed to mere instructions to apply the judicial exception. Mere instruction to apply a judicial exception does not amount to significantly more. See MPEP 2106.05(f). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
Claim 16:
Step 1: Claim 16 recites a computer system; thus, it is a machine, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: The claim recites the limitations:
wherein the adjustment parameter is updated in dependence on an aggregated proportion of values… falling above a predefined threshold, the aggregated proportion computed by aggregating a computed proportion of the subset of values falling above the predefined threshold - In the context of the claim limitation, this encompasses a mental process of updating the adjustment parameter in dependence on an aggregated proportion of values.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “all processors… for each of the plurality of processors” – this is a mere instruction to apply the judicial exception using a generic computer programmed with generic computer equipment. See MPEP 2106.05(f). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, some of the additional elements are directed to mere instructions to apply the judicial exception. Mere instruction to apply a judicial exception does not amount to significantly more. See MPEP 2106.05(f). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
Claim 17:
Step 1: Claim 17 recites a non-transitory computer-readable storage medium storing computer program instructions which when executed perform a method of training; thus, it is an article of manufacture, one of the four statutory categories of patentable subject matter.
Step 2A Prong 1: The claim recites the limitations:
processing the training data in respective forward and backward passes through a sequence of layers of the network, the forward pass comprising computing a set of activations by applying an activation function in dependence on the network weights and training data, and the backward pass comprising determining a set of gradients of a pre-determined loss function with respect to the weights and/or activations of the network, wherein an adjustment parameter is applied to at least a subset of values in the neural network, and wherein the values on the forward pass comprise at least one of the network weights and computed activations, and the values on the backwards pass comprise the computed gradients with respect to activations and gradients with respect to weights - In the context of the claim limitation, this encompasses a mathematical concept of computing an activation based on weights and training data.
updating the network weights in dependence on the computed gradients with respect to the weights - In the context of the claim limitation, this encompasses a mental process of evaluating weights based on calculated gradients.
on at least one of the forward and backward pass, computing a proportion of the subset of values falling above a predefined threshold - In the context of the claim limitation, this encompasses a mathematical concept of calculating values.
updating the adjustment parameter applied to the subset…parameters in dependence on the computed proportion - In the context of the claim limitation, this encompasses a mental process of evaluating the parameter in dependence on the computed proportion.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “based on a set of training data, a multi-layer neural network comprising a set of network weights”; “machine learning parameters” – these are mere instructions to apply the judicial exception using a generic computer programmed with generic computer equipment. See MPEP 2106.05(f). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, some of the additional elements are directed to mere instructions to apply the judicial exception. Mere instruction to apply a judicial exception does not amount to significantly more. See MPEP 2106.05(f). Therefore, the claim does not include additional elements which provide an inventive concept nor represent significantly more than the abstract idea, and the claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 8, 10-12 and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Ginsburg (US 12299577 B2) in view of Tomioka (US 20180336458 A1).
Claim 1.
Ginsburg teaches a computer-implemented method of training, based on a set of training data, a multi-layer neural network comprising a set of network weights, the method comprising (Column 5, lines 40-49 “FIG. 1 depicts an exemplary computer-implemented method for training an artificial neural network. Steps 101-109 describe exemplary steps of the flowchart 100 in accordance with the various embodiments herein described. As depicted in FIG. 1, training of an artificial neural network typically begins at step 101 with receiving a training set of data as input. At step 103, the data is fed (typically as one or more matrices of values) into a corresponding number of neurons. At step 105, the data at each neuron is manipulated according to pre-determined parameters (weights)”; Column 8, lines 50-55 “forward propagation can be performed for any layer of a neural network (e.g., inner product layers) or convolutional neural network (e.g., convolutional layers + inner product layers)” teaches a method of training based on a set of training data and the weights of a neural network):
processing the training data in respective forward and backward passes through a sequence of layers of the network, the forward pass comprising computing a set of activations by applying an activation function in dependence on the network weights and training data, and the backward pass comprising (Column 8, lines 50-55 “forward propagation can be performed for any layer of a neural network (e.g., inner product layers) or convolutional neural network (e.g., convolutional layers+ inner product layers). To avoid the issue of vanishing or exploding activations (due to underflow and overflow, respectively), the rescaling operations described above can be performed” and Column 11, lines 37-40 “the forward and backward propagation are performed in a neural network, the gradients for the weights that are used to adjust the influence of the data output from each neuron are calculated and subsequently readjusted” teaches processing the training data in forward and backward passes, the forward pass comprising applying an activation function in dependence on the network weights and training data):
computing gradients of a pre-determined loss function with respect to the network weights and/or computing gradients of the pre-determined loss function with respect to the computed activations of the network (Column 10, lines 35-41 “three main operations are performed during backward propagation in a convolutional layer. FIG. 2 depicts the three main computer-implemented operations performed during backward propagation. Steps 201-205 describe exemplary steps of the flowchart 100 in accordance with the various embodiments herein described” and Column 1, lines 47-53 “The output is then compared to the target output using a loss function, and an error value is calculated for each of the elements in the output layer. During back prop phase the gradients of error function are computed and then propagated backwards through the layers to determine gradients corresponding to each neuron” teaches computing gradients of a loss function with respect to the computed activations),
the values comprising at least one of: the network weights, the activations computed in the forward pass, the gradients with respect to activations computed in the backward pass, and the gradients with respect to weights computed in the backward pass (Column 10, lines 35-46 “three main operations are performed during backward propagation in a convolutional layer. FIG. 2 depicts the three main computer-implemented operations performed during backward propagation. Steps 201-205 describe exemplary steps of the flowchart 100 in accordance with the various embodiments herein described. As depicted in FIG. 2, backward propagation begins at step 201, wherein gradients are propagated backward. In one or more embodiments, the gradient for input matrix X can be calculated as the convolution of the gradient of Y and the values for weights W: e.g., dX=conv(dY, W.sup.T)” teaches the network weights, the activations, and the gradients computed with respect to the forward and backward passes);
updating the network weights in dependence on the computed gradients with respect to the weights (Column 10, lines 35-46 “wherein gradients are propagated backward. In one or more embodiments, the gradient for input matrix X can be calculated as the convolution of the gradient of Y and the values for weights W: e.g., dX=conv(dY, W.sup.T)” teaches updating the network weights in dependence on the computed gradients);
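For illustration, the training procedure mapped above (forward pass, backward-pass gradient of a loss function, and weight update) can be sketched minimally as follows. The network shape, data, loss function, and learning rate are illustrative assumptions, not taken from Ginsburg:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative single-layer network trained on random data (not Ginsburg's method).
X = rng.normal(size=(32, 4))                 # training data
y = X @ np.array([1.0, -2.0, 0.5, 3.0])      # targets from a known weight vector
W = np.zeros(4)                              # network weights
lr = 0.1                                     # learning rate

def forward(X, W):
    # Forward pass: compute the activations (here, a linear activation).
    return X @ W

def loss_and_grad(X, y, W):
    # Backward pass: gradient of a mean-squared-error loss w.r.t. the weights.
    err = forward(X, W) - y
    loss = float(np.mean(err ** 2))
    grad_W = 2.0 * X.T @ err / len(y)
    return loss, grad_W

loss0, g = loss_and_grad(X, y, W)
W = W - lr * g                               # update weights using the gradients
loss1, _ = loss_and_grad(X, y, W)
```

One gradient step from zero-initialized weights strictly reduces this loss, which is the behavior the claimed training loop relies on.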
Ginsburg does not explicitly teach wherein an adjustment parameter is applied to at least a subset of values in the neural network… computing a proportion of the subset of values falling above a predefined threshold; and updating the adjustment parameter applied to the subset of machine learning parameters in dependence on the computed proportion.
However, Tomioka teaches wherein an adjustment parameter is applied to at least a subset of values in the neural network (Para [0026] “A neural network is distributed over the worker nodes so that model parallelism is implemented whereby individual ones of the worker nodes hold subsets of the neural network model” and Para [0050] “The worker node computes 802 one or more gradients of a loss function with respect to (1) the message it received during the forward pass and (2) the parameters of the local subgraph… If the number of gradients in the accumulator meets or exceeds a threshold at check 804 the worker node asynchronously updates 806 the parameters of the neural network subgraph at the worker node” and Para [0022] “The training data instance is labeled and so the ground truth output of the neural network is known and the difference or error between the observed output and the ground truth output is found and provides information about a loss function which is passed back through the neural network layers in a backward propagation or backwards pass. A search is made to try find a minimum of the loss function which is a set of weights of the neural network that enable the output of the neural network to match the ground truth data” teaches a subset of values in the neural network);
computing a proportion of the subset of values falling above a predefined threshold (Para [0050] “The worker node checks the number of gradients in the accumulator. If the number of gradients in the accumulator meets or exceeds a threshold at check 804 the worker node asynchronously updates 806 the parameters of the neural network subgraph at the worker node. It then clears 810 the accumulator. As mentioned above the threshold at operation 804 is either set globally for the pipeline as a whole or is set on a per-worker node basis” teaches computing the number of gradients that exceeds a threshold at the worker node, the threshold corresponding to the claimed predefined threshold);
and updating the adjustment parameter applied to the subset of machine learning parameters in dependence on the computed proportion (Para [0050] “The worker node checks the number of gradients in the accumulator. If the number of gradients in the accumulator meets or exceeds a threshold at check 804 the worker node asynchronously updates 806 the parameters of the neural network subgraph at the worker node. It then clears 810 the accumulator. As mentioned above the threshold at operation 804 is either set globally for the pipeline as a whole or is set on a per-worker node basis” teaches that the parameters are updated in dependence on whether the number of gradients exceeds a threshold).
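The claimed update of an adjustment parameter in dependence on the proportion of values above a threshold can be illustrated with a minimal sketch. The function name `update_scale`, the threshold, and the target proportion are hypothetical illustrative choices, not taken from either reference:

```python
# Hypothetical sketch: adjust a scale factor based on the proportion of
# scaled values whose magnitude exceeds a predefined threshold.
def update_scale(values, scale, threshold=1e3, target=0.01):
    above = sum(1 for v in values if abs(v * scale) > threshold)
    proportion = above / len(values)
    # If too large a proportion exceeds the threshold, halve the scale to
    # avoid overflow; otherwise double it to preserve small-value precision.
    return scale / 2 if proportion > target else scale * 2

scale = 1024.0
grads = [0.5, 2.0, 0.001, 4.0]      # illustrative gradient values
scale = update_scale(grads, scale)  # two of four exceed 1e3, so scale halves
```

The halve-or-double policy is one simple instance of updating a parameter "in dependence on the computed proportion."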
Ginsburg and Tomioka are analogous art because they are each directed to systems involving backpropagation for data processing.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Tomioka into the disclosed invention of Ginsburg.
One of ordinary skill in the art would have been motivated to make this modification because of the following, “A neural network is distributed over the worker nodes so that model parallelism is implemented whereby individual ones of the worker nodes hold subsets of the neural network model. Where the neural network is represented as a graph the subsets are referred to as subgraphs… In various examples described herein model parallelism is combined with asynchronous updates of the neural network subgraph parameters at the individual worker nodes. This scheme is found to give extremely good efficiency (as explained with reference to FIG. 9 below) and is found empirically to work well in practice despite the fact that the conventional theoretical convergence guarantee for stochastic gradient descent would not apply to the asynchronous updates of the subgraph parameters” (Tomioka, Para [0026]-[0027]).
Claim 2.
Ginsburg in view of Tomioka teaches the method of claim 1,
Ginsburg further teaches wherein the adjustment parameter is a scale factor, and wherein the scale factor is applied on the backward pass to at least a subset of the gradients with respect to at least one of the activations and the gradients with respect to the network weights, wherein the scale factor is updated in dependence on the proportion of the gradients of that subset that have a value falling above a pre-defined threshold (Column 10, lines 58-64 “the calculated gradients are also propagated backwards. In one or more embodiments, the gradients for matrix X (dX) may be calculated by the convolution of the gradient of Y (dY) using the matrix of weights W as a filter. In order to ensure that the absolute value of dX[i] is less than an upper threshold, (e.g., |dx[i]|<U, the following conditions are imposed”, Column 11, lines 5-21 “However, if any one of the conditions is not met, gradient of Y, dY, and the matrix W are both rescaled such that dY=(α/k1, dy′.sub.16), where dy′[i]=dy[i]*k1; and W=(β/k2, w′.sub.16), where w′[i]=w[i]*k2. To ensure that no overflow can occur during rescaling, scale values k1 and k2 conform to the following conditions: k1*amax(dy)<U; and k2*amax(w)<U. Likewise, to ensure that no underflow can occur during rescaling, scale values k1 and k2 conform to the following conditions: k1*amean(dy)>L; and k2*amean(w)>L” teaches that the parameters k1 and k2 are scale factors, that the scale factor is applied on the backward pass with respect to the network weights, and that the scale factor is updated in dependence on values falling above the threshold).
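The rescaling conditions quoted above (k1*amax(dy) < U to avoid overflow and k1*amean(dy) > L to avoid underflow) can be illustrated with a short sketch. Here U and L stand in for the float16 representable range, and the helper name `pick_scale` and its midpoint heuristic are illustrative assumptions, not Ginsburg's implementation:

```python
# Sketch: choose a scale k satisfying k * amax(v) < U (no overflow)
# and k * amean(v) > L (no underflow); U and L approximate float16 limits.
def pick_scale(values, U=65504.0, L=6.1e-5):
    amax = max(abs(v) for v in values)
    amean = sum(abs(v) for v in values) / len(values)
    lo, hi = L / amean, U / amax        # admissible interval for k
    assert lo < hi, "no scale satisfies both conditions"
    k = (lo * hi) ** 0.5                # geometric middle of the interval
    assert k * amax < U and k * amean > L
    return k

# Tiny gradients that would underflow unscaled; a suitable k exists.
k = pick_scale([1e-6, 3e-6, 2e-6])
```

Any k inside the interval (L/amean, U/amax) satisfies both quoted conditions; the geometric middle is just one convenient pick.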
Claim 3.
Ginsburg in view of Tomioka teaches the method of claim 2, comprising
Ginsburg further teaches applying the scale factor to at least one of gradients with respect to weights and gradients with respect to activations of all layers of the network by multiplying the loss function by the scale factor (Column 14, lines 19-27 “Initially gradients are very small, so λ*ΔW(t) is much smaller than W, and well below normal float16 range, therefore, using traditional float16 format may cause the gradients to vanish. As a solution, the modified float16 data format described herein can be extended to prevent the loss of gradient data (precision). At later stages in training, gradients can be high, but λ becomes small, so λ*ΔW(t) becomes much smaller than W, and the weight update will disappear in traditional float16 formats due to rounding” teaches the gradient update ΔW(t) multiplied by the scale factor λ).
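The effect of multiplying the loss function by a scale factor can be illustrated directly: by linearity of differentiation, scaling the loss by S scales every gradient by S, and the scale is divided out before the weight update. This is a hedged sketch with illustrative values (a toy quadratic loss), not the claimed method:

```python
# Sketch: scaling the loss by S scales each gradient by S (linearity of
# differentiation); dividing by S recovers the true gradient exactly when
# S is a power of two. Values are illustrative.
S = 1024.0                   # power-of-two loss scale
w = 3.0                      # a single weight

# Toy loss L(w) = 1e-6 * w^2 with a tiny analytic gradient.
grad = 2e-6 * w              # d/dw [1e-6 * w^2]
scaled_grad = S * grad       # d/dw [S * 1e-6 * w^2]
unscaled = scaled_grad / S   # divide out S before the weight update
```

Because multiplying and dividing by a power of two only shifts the floating-point exponent, `unscaled` equals `grad` exactly, while `scaled_grad` sits safely above any low-precision underflow limit.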
Claim 8.
Ginsburg in view of Tomioka teaches the method of claim 1, comprising
Ginsburg further teaches storing at least a subset of the network weights, gradients and activations in computer memory in floating-point format (Column 14, lines 32-35 “One possible solution is to use an extra copy of weights in float (32 bits) format. According to such embodiments, one copy of weights is stored in memory as float32, and a second as float16 for forward and backward propagation” and Column 14, lines 19-22 “Initially gradients are very small, so λ*ΔW(t) is much smaller than W, and well below normal float16 range, therefore, using traditional float16 format may cause the gradients to vanish” and Column 2, lines 63-66 “wherein a matrix is represented by the tuple X, where X=(a, v[.]), wherein a is a float scale factor and v[.] are scaled values stored in the float16 format” teaches gradients, network weights and values stored in floating-point format).
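The master-copy scheme quoted above (one copy of the weights in float32, a second in float16 for the forward and backward passes) can be illustrated minimally; the specific weight and update values are illustrative assumptions:

```python
import numpy as np

# Sketch of the quoted scheme: float32 "master" weights plus a float16
# working copy, so small updates are not lost to float16 rounding.
master = np.array([1.0], dtype=np.float32)   # float32 master weights
update = np.float32(1e-4)                    # a small gradient update

half = master.astype(np.float16)             # float16 copy for the passes
# Applied directly in float16, the update is lost to rounding
# (float16 spacing near 1.0 is about 1e-3, larger than the update):
assert (half - np.float16(update))[0] == half[0]

# Applied to the float32 master, the update survives:
master = master - update
half = master.astype(np.float16)             # refresh the working copy
```

The float32 spacing near 1.0 (about 1.2e-7) is far below the update, which is why the master copy accumulates updates that the float16 copy cannot.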
Claim 10.
Ginsburg in view of Tomioka teaches the method of claim 8,
Ginsburg further teaches comprising storing at least a subset of the network weights, gradients and activations in computer memory in sixteen-bit floating-point format (Column 2, lines 63-66 “wherein a matrix is represented by the tuple X, where X=(a, v[.]), wherein a is a float scale factor and v[.] are scaled values stored in the float16 format” teaches storing values in the sixteen-bit floating-point (float16) format).
Claim 11.
Ginsburg in view of Tomioka teaches the method of claim 8, comprising
Ginsburg further teaches storing the subset of values in a floating-point format, and wherein the adjustment parameter is an exponent bias applied to the floating-point representations of the subset of weights, gradients and activations (Column 5, lines 5-9 “the novel data representation described herein can be used to convert single precision float into a (half-precision) float16 format that uses scalars for exponent extension” teaches an exponent bias (scalar exponent extension) applied to the floating-point format).
Claim 12.
Ginsburg in view of Tomioka teaches the method of claim 11,
Ginsburg further teaches wherein the subset of values in the neural network is a subset of network weights and activations and the adjustment parameter is an exponent bias applied to the subset of values of the network weights and activations in the forward pass (Column 5, lines 5-9 “the novel data representation described herein can be used to convert single precision float into a (half-precision) float16 format that uses scalars for exponent extension” and Column 5, lines 51-55 “the next neuron in the next layer in sequence. The neuron in each layer receives the weighted output from the previous neuron as input, and the process is propagated forward at step 109 for each intervening layer between the input and output layers” teaches an exponent bias applied to the values of the network weights and activations in the forward pass).
Claim 14.
Ginsburg teaches a computer system comprising one or more processors configured to train a multi-layer neural network comprising a set of network weights, and memory holding the network weights, the processor configured to train the neural network by (Column 5, lines 40-49 “FIG. 1 depicts an exemplary computer-implemented method for training an artificial neural network. Steps 101-109 describe exemplary steps of the flowchart 100 in accordance with the various embodiments herein described. As depicted in FIG. 1, training of an artificial neural network typically begins at step 101 with receiving a training set of data as input. At step 103, the data is fed (typically as one or more matrices of values) into a corresponding number of neurons. At step 105, the data at each neuron is manipulated according to pre-determined parameters (weights)” teaches a method of training based on a set of training data and the weights of a neural network, and Column 15, lines 39-43 “In one embodiment, the processes 200 and 300 may be performed, in whole or in part, by graphics subsystem 405 in conjunction with the processor 401 and memory 402, with any resulting output displayed in attached display device 410” teaches a processor and memory configured to train the multi-layer network):
receiving a set of training data (Column 5, lines 40-49 “training of an artificial neural network typically begins at step 101 with receiving a training set of data as input” teaches receiving a set of training data);
processing the training data in respective forward and backward passes through a sequence of layers of the network, the forward pass comprising computing a set of activations by applying an activation function in dependence on the network weights and training data (Column 8 , lines 50-55 “forward propagation can be performed for any layer of a neural network (e.g., inner product layers) or convolutional neural network (e.g., convolutional layer