Prosecution Insights
Last updated: April 19, 2026
Application No. 18/182,567

Machine Learning System Enabling Effective Training

Non-Final OA: §101, §102, §103, §112

Filed: Mar 13, 2023
Examiner: GOLAN, MATTHEW BRYCE
Art Unit: 2123
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Graphcore Limited
OA Round: 1 (Non-Final)

Grant Probability: 0% (At Risk)
OA Rounds: 1-2
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 3 resolved; -55.0% vs TC avg). Grants only 0% of cases.
Interview Lift: +0.0% (minimal lift; based on resolved cases with interview)
Avg Prosecution: 3y 3m typical timeline; 36 applications currently pending
Total Applications: 39 career history, across all art units

Statute-Specific Performance

§101: 27.5% (-12.5% vs TC avg)
§103: 37.5% (-2.5% vs TC avg)
§102: 8.3% (-31.7% vs TC avg)
§112: 23.7% (-16.3% vs TC avg)

Tech Center averages are estimates. Based on career data from 3 resolved cases.

Office Action

§101 §102 §103 §112
DETAILED ACTION

This communication is in response to Application No. 18/182,567 filed on March 13, 2023, in which claims 1-22 are presented for examination.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement submitted on 09/11/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement was considered by the examiner.

Drawings

The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because reference character “1202” has been used to designate both the “LOGIC PROCESSOR” and the “VOLATILE MEMORY” in Figure 21. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: “1204” (Spec., Pg. 20-21). Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification

The contents of the specification are sufficient for examination purposes.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claim 12 is rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.

Regarding Claim 12, the claim recites the limitation “the inputs comprise a set of input gradients and a set of weights and/or activations” (ln. 1-2), which is indefinite. Specifically, in the event that the “or” language is imposed, it is not clear whether the alternative language applies to the requirement that “the inputs comprise a set of input gradients”. For example, the limitation could be interpreted as requiring the inputs to comprise either 1) “a set of input gradients” and “a set of weights” or 2) “a set of input gradients” and a set of “activations”. Alternatively, the limitation could be interpreted as requiring the inputs to comprise either 1) “a set of input gradients and a set of weights” or 2) “activations”. As a result, the limitation of what the “inputs” must “comprise” is not clear. Therefore, the claim is rejected. The claim should be amended to clarify which, if either, of the above two interpretations is correct.

Claim Rejections - 35 USC § 101

35 U.S.C.
101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to abstract ideas without significantly more.

Regarding Claim 1:

Step 1: Claim 1 is a machine claim. Therefore, claims 1-14 are directed to a statutory category of eligible subject matter.

Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Here, steps of the claimed subject matter are mental processes. Specifically, the claim recites “perform at least one operation based on one or more inputs . . . wherein the at least one operation is scaled by a first scaling factor which has been calculated to cause a variance of an output of the at least one operation to have a target variance” (mental process – amounts to exercising judgment to evaluate data to generate an output with regard to a known or observed target variance and a known or observed variable that is calculated to achieve the target variance, which may be aided by pen and paper).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “A machine learning system implementing a machine learning model, the system comprising: at least one layer of processing nodes, each processing node comprising a processor configured to execute computer readable instructions to . . . the processing node” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and “received at” (amounts to insignificant extra-solution activity because receiving inputs amounts to the transmission of data, which is incidental to the claimed subject matter).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “A machine learning system implementing a machine learning model, the system comprising: at least one layer of processing nodes, each processing node comprising a processor configured to execute computer readable instructions to . . . the processing node” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept) and “received at” (transmission of data, such as through a network, see buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014), or by accessing information in memory, see Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93, is well-understood, routine, and conventional; which is recited here with a high level of generality, and remains insignificant extra-solution activity even upon reconsideration).

For the reasons above, Claim 1 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 2-14. The additional limitations of the dependent claims are addressed below.

Regarding Claim 2:

Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 2 depends on.

Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements: “wherein the target variance is a unit variance” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the target variance is a unit variance” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 2 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 3:

Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 3 depends on.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the target variance is a variance which matches a variance of the one or more inputs” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the target variance is a variance which matches a variance of the one or more inputs” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 3 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 4:

Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 4 depends on.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the at least one operation is implemented in a forward pass” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea) and “of the machine learning model” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the at least one operation is implemented in a forward pass” (merely reciting a particular technological environment or field of use does not provide an inventive concept) and “of the machine learning model” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept).

Accordingly, Claim 4 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 5:

Step 2A Prong 1: See the rejection of Claim 4 above, which Claim 5 depends on.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the system is configured to perform a training process to train the machine learning model” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and “the forward pass forms part of the training process” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the system is configured to perform a training process to train the machine learning model” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept) and “the forward pass forms part of the training process” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 5 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 6:

Step 2A Prong 1: See the rejection of Claim 4 above, which Claim 6 depends on. Here, the claim recites additional elements that are mental processes. Specifically, the claim recites: “perform an inference process” (mental process – amounts to exercising judgment to form an opinion).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein system is configured to” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and “the forward pass forms part of the inference process” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein system is configured to” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept) and “the forward pass forms part of the inference process” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 6 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 7:

Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 7 depends on. Here, the claim recites additional elements that are mental processes. Specifically, the claim recites: “determine a gradient of a loss function . . . by carrying out a gradient calculation in a gradient operation” (mental process – amounts to exercising judgment to form an opinion on a gradient rate of change of a loss function with respect to known or observed information, by carrying out calculations as part of an operation, which may be aided by pen and paper) and “wherein the gradient operation is scaled by a second scaling factor to generate outputs with a second target variance” (mental process – amounts to exercising judgment to form an opinion on how to scale known or observed data to achieve a desired output, with reference to additional variables and targets, which may be aided by pen and paper).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the processing nodes are configured to . . . of the machine learning model” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and “in a backward pass . . . through the layer” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the processing nodes are configured to . . . of the machine learning model” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept) and “in a backward pass . . . through the layer” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 7 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 8:

Step 2A Prong 1: See the rejection of Claim 7 above, which Claim 8 depends on. Here, the claim recites additional elements that are mental processes. Specifically, the claim recites: “the gradient calculation is performed” (mental process – amounts to exercising judgment to evaluate data to form an opinion on a degree of change with respect to a comparison between known or observed information).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the one or more inputs comprise weights . . . with respect to the weights” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the one or more inputs comprise weights . . . with respect to the weights” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 8 is rejected as being directed to an abstract idea without significantly more.
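For context on the subject matter at issue in claims 1 and 7 (a forward operation scaled by a first factor calculated so that its output variance hits a target variance, and a backward-pass gradient operation scaled by a second factor), the mechanism can be illustrated with a minimal NumPy sketch. The factor choices below (1/sqrt(fan_in) and 1/sqrt(fan_out)) are standard variance-preservation heuristics assumed here for illustration; they are not taken from the application, and the variable names are hypothetical.

```python
import numpy as np

# Minimal sketch, assuming unit-variance inputs and a matmul operation.
rng = np.random.default_rng(0)
fan_in, fan_out = 1024, 512

x = rng.standard_normal((256, fan_in))      # activations, variance ~1
w = rng.standard_normal((fan_in, fan_out))  # weights, variance ~1

# Unscaled matmul: each output element sums fan_in products of
# unit-variance terms, so its variance grows to ~fan_in.
y_raw = x @ w

# "First scaling factor ... calculated to cause a variance of an output
# ... to have a target variance": for this op and a unit-variance
# target (cf. claim 2), the factor is 1/sqrt(fan_in).
forward_scale = 1.0 / np.sqrt(fan_in)
y = forward_scale * y_raw  # variance ~1

# Backward pass: a second, independently calculated factor scales the
# gradient operation so output gradients also hit a target variance.
grad_y = rng.standard_normal(y.shape)
grad_scale = 1.0 / np.sqrt(fan_out)
grad_x = grad_scale * (grad_y @ w.T)  # variance ~1
```

Claim 3's alternative target ("a variance which matches a variance of the one or more inputs") would simply substitute the measured input variance for the unit target when calculating the factor.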
Regarding Claim 9:

Step 2A Prong 1: See the rejection of Claim 7 above, which Claim 9 depends on. Here, the claim recites additional elements that are mental processes. Specifically, the claim recites: “the gradient calculation is performed” (mental process – amounts to exercising judgment to evaluate data to form an opinion on a degree of change with respect to a comparison between known or observed information).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the one or more outputs comprise activations . . . with respect to the activations” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the one or more outputs comprise activations . . . with respect to the activations” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 9 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 10:

Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 10 depends on.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the inputs and outputs are tensors” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the inputs and outputs are tensors” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 10 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 11:

Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 11 depends on.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the inputs comprise a set of input activations and a set of weights, and the outputs comprise a set of output activations” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the inputs comprise a set of input activations and a set of weights, and the outputs comprise a set of output activations” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 11 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 12:

Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 12 depends on.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the inputs comprise a set of input gradients and a set of weights and/or activations, and the outputs comprise a set of output gradients” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the inputs comprise a set of input gradients and a set of weights and/or activations, and the outputs comprise a set of output gradients” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 12 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 13:

Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 13 depends on.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the machine learning system is configured to execute a computational graph” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and “the computational graph comprising: a plurality of graph nodes corresponding to computational operations, and a plurality of graph edges corresponding to inputs and outputs of the graph nodes; wherein the at least one operation corresponds to a graph node of the plurality of graph nodes of the computational graph” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “wherein the machine learning system is configured to execute a computational graph” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept) and “the computational graph comprising: a plurality of graph nodes corresponding to computational operations, and a plurality of graph edges corresponding to inputs and outputs of the graph nodes; wherein the at least one operation corresponds to a graph node of the plurality of graph nodes of the computational graph” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 13 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 14:

Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 14 depends on.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the system is configured to” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea); “store the inputs and/or outputs” (amounts to insignificant extra-solution activity because storage of inputs and outputs is the storage and retrieval of information in memory, which is incidental to the claimed subject matter); and “in a floating-point number representation comprising 16 bits or fewer” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional element: “wherein the system is configured to” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept); “store the inputs and/or outputs” (storage of information in memory is well-understood, routine, and conventional, see Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93; which is recited here with a high level of generality, and remains insignificant extra-solution activity even upon reconsideration); and “in a floating-point number representation comprising 16 bits or fewer” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

Accordingly, Claim 14 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 15:

Step 1: Claim 15 is a process claim. Therefore, claims 15-20 are directed to a statutory category of eligible subject matter.

Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Here, steps of the claimed subject matter are mental processes. Specifically, the claim recites “inserting a first scaling factor into the computational graph . . . the first scaling factor calculated to cause a variance of an output . . . to have a target variance” (mental process – amounts to exercising judgment to evaluate data, organized in a graph structure, to generate an output with regard to a known or observed target variance, which may be aided by pen and paper).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “A computer-implemented method comprising . . . each node of the plurality of nodes corresponding to a computational operation for training a machine learning model . . . of the at least one node” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea); “receiving a computational graph” (amounts to insignificant extra-solution activity because the receiving amounts to the transmission of data, which is incidental to the claimed subject matter); and “the computational graph comprising: a plurality of nodes . . . a plurality of edges, each edge connecting a pair of the nodes and corresponding to an output of a first node of the pair of the nodes and an input to a second node of the pair of the nodes . . . associated with at least one node of the plurality of nodes” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).

Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “A computer-implemented method comprising . . . each node of the plurality of nodes corresponding to a computational operation for training a machine learning model . . . of the at least one node” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept); “receiving a computational graph” (transmission of data, such as through a network, see buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014), or by accessing information in memory, see Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93, is well-understood, routine, and conventional; which is recited here with a high level of generality, and remains insignificant extra-solution activity even upon reconsideration); and “the computational graph comprising: a plurality of nodes . . . a plurality of edges, each edge connecting a pair of the nodes and corresponding to an output of a first node of the pair of the nodes and an input to a second node of the pair of the nodes . . . associated with at least one node of the plurality of nodes” (merely reciting a particular technological environment or field of use does not provide an inventive concept).

For the reasons above, Claim 15 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 16-20. The additional limitations of the dependent claims are addressed below.

Regarding Claim 16:

Step 2A Prong 1: See the rejection of Claim 15 above, which Claim 16 depends on. Here, the claim recites additional elements that are mental processes. Specifically, the claim recites “wherein the computational operation is selected from one of a plurality of computational operations, and the first scaling factor is selected based on the selected computational operation” (mental process – amounts to exercising judgment to form an opinion on a function to be performed, based on a plurality of known or observed functions, and then forming an opinion on a value with reference to the first formed opinion, which may be aided by pen and paper).

Step 2A Prong 2 & Step 2B: There are no elements left for consideration of implementation within a practical application or for consideration of significantly more.

Accordingly, Claim 16 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 17:

Step 2A Prong 1: See the rejection of Claim 16 above, which Claim 17 depends on.
Here, the claim recites additional elements that are mental processes. Specifically, the claim recites “wherein the first scaling factor is selected based on an assumed statistical distribution of inputs to the selected computational operation” (mental process – amounts to exercising judgment to form an opinion on a value, based on assumed characteristics of known or observed data; which may be aided by pen and paper). Step 2A Prong 2 & Step 2B: There are no elements left for consideration of implementation within a practical application or for consideration of significantly more. Accordingly, Claim 17 is rejected as being directed to an abstract idea without significantly more. Regarding Claim 18: Step 2A Prong 1: See the rejection of Claim 15 above, which Claim 18 depends on. As discussed above, if a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Additionally, if a claim limitation, under its broadest reasonable interpretation, recites mathematical relationships, mathematical formulas or equations, or mathematical calculations, then it falls within the “Mathematical concepts” grouping of ab abstract ideas. Here, the claim recites additional elements that are mental processes and mathematical concepts. Specifically, the claim recites: “multiplied with an output of the computational operation . . . to cause the variance to have the target variance . . . 
multiplied with a result of a gradient operation” (mental process – amounts to exercising judgment to calculate scaled values of information, with reference to a known or observed target variance, which may be aided by pen and paper; mathematical concept – the multiplication of two values to generate an output amounts to a mathematical calculation or mathematical formula); “identifying edges other than the cut edges” (mental process – amounts to exercising judgment to form an opinion on a classification for known or observed information, which may be aided by pen and paper); and “setting the second scaling factor of nodes connected by edges other than the cut edges equal to the first scaling factor” (mental process – amounts to exercising judgment to form an opinion on a value that should be used with certain categories of known or observed information, which may be aided by pen and paper). Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “the first scaling factor is a forward scaling parameter . . . of the at least one node . . . each node comprises a second scaling factor, the second scaling factor being a backward scaling parameter . . . applied to the node . . . a subset of the edges are cut edges, the cut edges being edges that if cut disconnect the pair of nodes connected by the cut edge such that there is no other path between the pair of nodes in the computational graph” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea). Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “the first scaling factor is a forward scaling parameter . . . of the at least one node . . . 
each node comprises a second scaling factor, the second scaling factor being a backward scaling parameter . . . applied to the node . . . a subset of the edges are cut edges, the cut edges being edges that if cut disconnect the pair of nodes connected by the cut edge such that there is no other path between the pair of nodes in the computational graph” (merely reciting a particular technological environment or field of use does not provide an inventive concept). Accordingly, Claim 18 is rejected as being directed to an abstract idea without significantly more. Regarding Claim 19: Step 2A Prong 1: See the rejection of Claim 18 above, which Claim 19 depends on. Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “receiving . . . input identifying the cut edges” (amounts to insignificant extra-solution activity because receiving information amounts to transmitting data, which is incidental to the claimed subject matter); “via a user interface” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea); and “user input” (amounts to merely reciting a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea). Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “receiving . . . input identifying the cut edges” (transmission of data, such as through a network, see buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014), or by accessing information in memory, see Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 
2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93, is well‐understood, routine, and conventional; which is recited here with a high level of generality, and remains insignificant extra-solution activity even upon reconsideration); “via a user interface” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept); and “user input” (merely reciting a particular technological environment or field of use does not provide an inventive concept). Accordingly, Claim 19 is rejected as being directed to an abstract idea without significantly more. Regarding Claim 20: Step 2A Prong 1: See the rejection of Claim 15 above, which Claim 20 depends on. Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “receiving . . . the first scaling factor” (amounts to insignificant extra-solution activity because receiving information amounts to transmitting data, which is incidental to the claimed subject matter) and “via a user interface” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea). Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “receiving . . . the first scaling factor” (transmission of data, such as through a network, see buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014), or by accessing information in memory, see Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 
2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93, is well‐understood, routine, and conventional; which is recited here with a high level of generality, and remains insignificant extra-solution activity even upon reconsideration) and “via a user interface” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept). Accordingly, Claim 20 is rejected as being directed to an abstract idea without significantly more. Regarding Claim 21: Step 1: Claim 21 is an article of manufacture claim. Therefore, claims 21-22 are directed to a statutory category of eligible subject matter. Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Here, steps of the claimed subject matter are mental processes. Specifically, the claim recites “at least one scaled operation . . . to generate a tensor of output activations with a target variance” (mental process – amounts to exercising judgment to evaluate data to generate an output with regard to a known or observed target variance, which may be aided by pen and paper). Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “A non-transitory computer-readable medium comprising computer-executable instructions, the instructions when executed implementing a neural network, wherein the instructions comprise first code embodying . . . 
configured to” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and “receive a tensor of weights and a tensor of input activations” (amounts to insignificant extra-solution activity because receiving tensors amounts to the transmission of data, which is incidental to the claimed subject matter). Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “A non-transitory computer-readable medium comprising computer-executable instructions, the instructions when executed implementing a neural network, wherein the instructions comprise first code embodying . . . configured to” (mere instructions to apply the exception using generic computer components cannot provide an inventive concept) and “receive a tensor of weights and a tensor of input activations” (transmission of data, such as through a network, see buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014), or by accessing information in memory, see Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93, is well‐understood, routine, and conventional; which is recited here with a high level of generality, and remains insignificant extra-solution activity even upon reconsideration). For the reasons above, Claim 21 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claim 22. The additional limitations of the dependent claim are addressed below. Regarding Claim 22, the claim recites substantially the same limitations as Claim 2. 
The claim is also directed to performing mental processes without integration into a practical application or significantly more. Accordingly, Claim 22 is rejected under the same rationale. Claim Rejections - 35 USC § 102 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. Claims 15-17 and 21-22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bingham et al. (hereinafter Bingham) (“AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks”). Regarding Claim 15, Bingham teaches a computer-implemented method comprising (Pg. 19, Col. 1, Para. 2, “F Computing Infrastructure[:] Experiments in this paper were run in a distributed framework using StudioML software . . . to place jobs on machines with NVIDIA GeForce GTX 1080 Ti and RTX 2080 Ti GPUs. The AutoInit package is available at https://github.com/cognizant-ai-labs/autoinit”, where the “machines with NVIDIA GeForce GTX 1080 Ti and RTX 2080 Ti GPUs” implemented the methods of the “AutoInit package”): receiving a computational graph, the computational graph comprising (Pg. 6, Col. 2, Para. 
3, “CoDeepNEAT evolves populations of modules and blueprints simultaneously (Figure 6a). Modules are small neural networks, complete with layers, connections, and hyperparameters. Blueprints are computation graphs” and Pg. 6, Col. 2, Fig. 6, “The CoDeepNEAT method. Modules replace nodes in the blueprint to create a candidate neural network”, where the computational graph, “Blueprints are computation graphs”, must be received to be “evolve[d]” using the “CoDeepNEAT method”): a plurality of nodes, each node of the plurality of nodes corresponding to a computational operation for training a machine learning model (Pg. 6-7, Col. 2-1, Para. 3-1, “CoDeepNEAT CoDeepNEAT evolves populations of modules and blueprints simultaneously (Figure 6a). Modules are small neural networks, complete with layers, connections, and hyperparameters. Blueprints are computation graphs containing only nodes and directed edges. To create a candidate neural network, CoDeepNEAT chooses a blueprint and replaces its nodes with selected modules . . . CoDeepNEAT evolves hyperparameters like dropout rate, kernel regularization, and learning rate. The network weights are not evolved, but instead trained with gradient descent . . . [and is] well-suited to analyzing AutoInit’s performance in a variety of open-ended machine learning settings”, where the computational graph comprises a plurality of nodes, “Blueprints are computation graphs containing . . . nodes”, each of which corresponds with a machine learning model, “Modules are small neural networks . . . CoDeepNEAT chooses a blueprint and replaces its nodes with selected modules”, which, in turn, correspond with computational operations for training the model, as demonstrated by recitations of “dropout rate, kernel regularization, and learning rate” and “network weights are . . . 
trained with gradient descent”), and a plurality of edges, each edge connecting a pair of the nodes and corresponding to an output of a first node of the pair of the nodes and an input to a second node of the pair of the nodes (Pg. 6, Col. 2, Para. 3, “Blueprints are computation graphs containing only nodes and directed edges” and Pg. 6, Col. 2, Fig. 6, where the computational graph, “Blueprints”, contains a plurality of “directed edges” connecting pairs of “nodes”, and where a person of ordinary skill in the art would understand the arrows representing the directed edges to correspond to the use of the “output” of the first node as “input” for the second node in the pair, see Pg. 7, Col. 1, Para. 6, “the networks make use of different activation functions and contain several unique information processing paths from the input to the output”); inserting a first scaling factor into the computational graph associated with at least one node of the plurality of nodes (Pg. 18, Col. 2, Para. 5, “Using AutoInit is simple in practice. The AutoInit package provides a wrapper around TensorFlow models. The wrapper automatically traverses the TensorFlow computation graph, calculates mean and variance estimations for each layer, and reinstantiates the model with the correct weight scaling”, where the “reinstantiat[ion]” of “the model with the correct weight scaling” is within the broadest reasonable interpretation of inserting a first scaling factor, which occurs while “traversing the TensorFlow computation graph”, which, as discussed above, is associated with the at least one node of the plurality of nodes when “AutoInit” is used in conjunction with “CoDeepNEAT[‘s]” “Assembled Network”, see Pg. 6, Col. 2, Fig. 6, “Modules replace nodes in the blueprint to create a candidate neural network” and Pg. 7, Col. 1, Para. 
1, “The generality of CoDeepNEAT helps minimize human design biases and makes it well-suited to analyzing AutoInit’s performance in a variety of open-ended machine learning settings”), the first scaling factor calculated to cause a variance of an output of the at least one node to have a target variance (Pg. 3, Col. 1, Para. 2, “AutoInit aims to stabilize signal propagation throughout an entire neural network. More precisely, consider a layer that . . . scales the input by a factor of β. Given an input signal with . . . variance νin, after applying the layer, the output signal will have . . . variance νout = β²νin . . . AutoInit calculates analytic mean- and variance-preserving weight initialization so that . . . β = 1, thus avoiding the issues of mean shift and exploding/vanishing signals”, where the scaling factor, “scales the input by a factor of β”, is calculated such that “β” is scaled to “β = 1”, where either the value used to scale “β” to “β = 1” or “β” itself is within the broadest reasonable interpretation of a scaling factor, to cause a “variance” of the output to have a target variance necessary “to stabilize signal propagation” by “avoiding the issues of mean shift and exploding/vanishing signals”; see also Pg. 7, Col. 1, Para. 6, “the networks make use of different activation functions and contain several unique information processing paths from the input to the output” and Pg. 6, Col. 2, Fig. 6, “Modules replace nodes in the blueprint to create a candidate neural network”, where the output is from the at least one node). Regarding Claim 16, Bingham teaches the method of claim 15, wherein the computational operation is selected from one of a plurality of computational operations (Pg. 13, Col. 1, Para. 
1-2, “In the AutoInit framework of Algorithm 1, the mean and variance mapping function g needs to be defined for each type of layer in a given neural network”, where the selected plurality of computational operations, “layer” and “function g” for “each type of layer”, are selected from the plurality of possible layer types, such as “Convolution and Dense Layers”, “Activation Functions”, and “Dropout Layers”, see Pg. 13-15), and the first scaling factor is selected based on the selected computational operation (Pg. 3, Col. 1, Para. 2-3, “AutoInit calculates analytic mean- and variance-preserving weight initialization so that α = 0 and β = 1, thus avoiding the issues of mean shift and exploding/vanishing signals . . . The function glayer maps input mean and variance to output mean and variance when the layer is applied: glayer : (µin,νin) → (µout,νout). (4) Note that g in Equation 4 depends on the type of layer; e.g. gDropout and gReLU are different functions. For layers with trainable weights, the mean and variance mapping will depend on those weights . . . Thus, if µin and νin are known, it is natural to calculate initial weights θ such that the layer output will have zero mean and unit variance”, where the “variance mapping” “depends on the type of layer; gDropout and gReLU are different functions” and the “layer” “weights”; therefore, the scaling factor needed to scale to “β = 1” is based on the selected computational operation). Regarding Claim 17, Bingham teaches the method of claim 16, wherein the first scaling factor is selected based on an assumed statistical distribution of inputs to the selected computational operation (Pg. 13, Col. 1, Para. 2, “Inputs to each layer are assumed to be independent and normally distributed. 
Although these assumptions may not always hold exactly, experiments show that AutoInit models signal propagation across different types of networks well in practice”, where, for the selected computational operation, the computations at the “layer”, the statistical distribution of the inputs is assumed, “Inputs to each layer are assumed to be independent and normally distributed”, which is an underlying “assumption” on which the “AutoInit” process is based, including the selection of which scaling factor will allow for “β = 1” after scaling, see Pg. 3, Col. 1, Para. 2-3, “AutoInit calculates analytic mean- and variance-preserving weight initialization so that α = 0 and β = 1, thus avoiding the issues of mean shift and exploding/vanishing signals . . . The function glayer maps input mean and variance to output mean and variance when the layer is applied: glayer : (µin,νin) → (µout,νout). (4) Note that g in Equation 4 depends on the type of layer; e.g. gDropout and gReLU are different functions. For layers with trainable weights, the mean and variance mapping will depend on those weights . . . Thus, if µin and νin are known, it is natural to calculate initial weights θ such that the layer output will have zero mean and unit variance”). Regarding Claim 21, Bingham teaches a non-transitory computer-readable medium comprising computer-executable instructions, the instructions when executed implementing a neural network (Pg. 1, Col. 1, Abstract, “AutoInit thus serves as an automatic configuration tool that makes design of new neural network architectures more robust. 
The AutoInit package provides a wrapper around TensorFlow models and is available at https://github.com/cognizant-ai-labs/autoinit”, where “neural network” “TensorFlow models” are implemented using a set of computer-executable instructions contained at “https://github.com/cognizant-ai-labs/autoinit”, which are accessed and executed using a machine learning system, a “Computing Infrastructure” using “StudioML”, “NVIDIA” hardware, and the “AutoInit” package, see Pg. 19, Col. 1, Para. 2, “Computing Infrastructure[:] Experiments in this paper were run in a distributed framework using StudioML software . . . to place jobs on machines with NVIDIA GeForce GTX 1080 Ti and RTX 2080 Ti GPUs . . . The AutoInit package is available at https://github.com/cognizant-ai-labs/autoinit”, which requires a non-transitory computer-readable medium to “run” the “[e]xperiments” using the “StudioML” software, “NVIDIA” hardware, and the “AutoInit” package), wherein the instructions comprise first code embodying at least one scaled operation (Pg. 1, Col. 1, Abstract, “AutoInit appropriately scales the weights at each layer to avoid exploding or vanishing signals . . . The AutoInit package provides a wrapper around TensorFlow models”, where the “wrapper” code instructions contained in the “AutoInit package” comprise a “scale[d]” operation “at each layer”, “AutoInit appropriately scales the weights at each layer”) configured to receive a tensor of weights and a tensor of input activations and to generate a tensor of output activations with a target variance (Pg. 3, Col. 1, Para. 3, “A given layer in a neural network receives as its input a tensor x with mean µin and variance νin. After applying the layer, the output tensor has mean µout = E(layer(x)) and variance νout = Var(layer(x)). The function glayer maps input mean and variance to output mean and variance when the layer is applied: glayer : (µin,νin) → (µout,νout) . . . 
For layers with trainable weights, the mean and variance mapping will depend on those weights. For example, the function gConv2D,θ maps input mean and variance to output mean and variance after the application of a Conv2D layer parameterized by weights θ”, where “applying the layer”, including the “map[ping]” of the “function glayer”, is one of the at least one scaled operations; where the operation, “layer”, is configured to receive a tensor of input activations, “input a tensor x”, and to generate a tensor of output activations with a variance, “output tensor has . . . variance νout”; and where “the function gConv2D,θ”, which is a subcomponent of the operation, “The function glayer maps . . . when the layer is applied”, is configured to receive a tensor of weights, “weights θ”; see also Pg. 3, Col. 1, Para. 2, “AutoInit aims to stabilize signal propagation throughout an entire neural network. More precisely, consider a layer that . . . scales the input by a factor of β. Given an input signal with . . . variance νin, after applying the layer, the output signal will have . . . variance νout = β²νin . . . AutoInit calculates analytic mean- and variance-preserving weight initialization so that . . . β = 1, thus avoiding the issues of mean shift and exploding/vanishing signals”, where the “variance” is the target “to stabilize signal propagation” by “avoiding the issues of mean shift and exploding/vanishing signals”; Pg. 7, Col. 1, Para. 6, “the networks make use of different activation functions and contain several unique information processing paths from the input to the output”, where the “input[s]” are activation inputs and the “output[s]” are activation outputs because they are inputs/outputs of “activation functions”). Regarding Claim 22, Bingham teaches the non-transitory computer-readable medium of claim 21, wherein the target variance is unit variance (Pg. 3, Col. 1, Para. 
3, “Deriving g for all layers makes it possible to model signal propagation across an entire neural network. Thus, if µin and νin are known, it is natural to calculate initial weights θ such that the layer output will have zero mean and unit variance”; see also Pg. 3, Col. 1, Para. 2, “AutoInit aims to stabilize signal propagation throughout an entire neural network. More precisely, consider a layer that . . . scales the input by a factor of β. Given an input signal with . . . variance νin, after applying the layer, the output signal will have . . . variance νout = β²νin . . . AutoInit calculates analytic mean- and variance-preserving weight initialization so that . . . β = 1, thus avoiding the issues of mean shift and exploding/vanishing signals”). Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-5 and 10-13 are rejected under 35 U.S.C. 103 as being unpatentable over Bingham in view of Kornbluth et al. (hereinafter Kornbluth) (Pat. Pub. No. US 2022/0100931 A1). Regarding Claim 1, Bingham teaches a machine learning system implementing a machine learning model, the system comprising (Pg. 1, Col. 1, Abstract, “AutoInit thus serves as an automatic configuration tool that makes design of new neural network architectures more robust. 
The AutoInit package provides a wrapper around TensorFlow models”, where “neural network” “TensorFlow models” are machine learning models, which are implemented on a machine learning system, a “Computing Infrastructure” using “StudioML”, “NVIDIA” hardware, and the “AutoInit” package, see Pg. 19, Col. 1, Para. 2, “Computing Infrastructure[:] Experiments in this paper were run in a distributed framework using StudioML software . . . to place jobs on machines with NVIDIA GeForce GTX 1080 Ti and RTX 2080 Ti GPUs . . . The AutoInit package is available at https://github.com/cognizant-ai-labs/autoinit”): at least one layer of processing nodes, each processing node . . . configured to execute computer readable instructions to perform at least one operation based on one or more inputs received at the processing node (Pg. 6-7, Col. 2-1, Para. 3-6, “CoDeepNEAT CoDeepNEAT evolves populations of modules and blueprints simultaneously (Figure 6a). Modules are small neural networks, complete with layers, connections, and hyperparameters. Blueprints are computation graphs containing only nodes and directed edges. To create a candidate neural network, CoDeepNEAT chooses a blueprint and replaces its nodes with selected modules . . . CoDeepNEAT evolves hyperparameters like dropout rate, kernel regularization, and learning rate. The network weights are not evolved, but instead trained with gradient descent . . . [and is] well-suited to analyzing AutoInit’s performance in a variety of open-ended machine learning settings . . . the networks make use of different activation functions and contain several unique information processing paths from the input to the output”, where the computational graph comprises a plurality of nodes, “Blueprints are computation graphs containing . . . nodes”, each of which corresponds with a machine learning model, “Modules are small neural networks . . . 
CoDeepNEAT chooses a blueprint and replaces its nodes with selected modules”, which in turn perform operations, “the networks make use of different activation functions and contain several unique information processing paths”, based on one or more received inputs, “unique information processing paths from the input to the output”, which must require execution of computer readable instructions, such as the code corresponding to the “AutoInit” algorithm, see Pg. 3, Col. 2, Algo. 1; see also Pg. 18, Col. 2, Para. 5, “Using AutoInit is simple in practice. The AutoInit package provides a wrapper around TensorFlow models. The wrapper automatically traverses the TensorFlow computation graph, calculates mean and variance estimations for each layer, and reinstantiates the model with the correct weight scaling”), wherein the at least one operation is scaled (Pg. 1, Col. 1, Abstract, “AutoInit appropriately scales the weights at each layer to avoid exploding or vanishing signals . . . The AutoInit package provides a wrapper around TensorFlow models”, where the “wrapper” code instructions contained in the “AutoInit package” comprise a “scale[d]” operation “at each layer”, “AutoInit appropriately scales the weights at each layer”) by a first scaling factor which has been calculated to cause a variance of an output of the at least one operation to have a target variance (Pg. 3, Col. 1, Para. 3, “A given layer in a neural network receives as its input a tensor x with mean µin and variance νin. After applying the layer, the output tensor has mean µout = E(layer(x)) and variance νout = Var(layer(x)). The function glayer maps input mean and variance to output mean and variance when the layer is applied: glayer : (µin,νin) → (µout,νout) . . . For layers with trainable weights, the mean and variance mapping will depend on those weights. 
For example, the function gConv2D,θ maps input mean and variance to output mean and variance after the application of a Conv2D layer parameterized by weights θ”, where “applying the layer”, including the “map[ping]” of the “function glayer”, is one of the at least one scaled operations; where the operation, “layer”, is configured to generate an output with a variance, “output tensor has . . . variance νout”; Pg. 3, Col. 1, Para. 2, “AutoInit aims to stabilize signal propagation throughout an entire neural network. More precisely, consider a layer that . . . scales the input by a factor of β. Given an input signal with . . . variance νin, after applying the layer, the output signal will have . . . variance νout = β²νin . . . AutoInit calculates analytic mean- and variance-preserving weight initialization so that . . . β = 1, thus avoiding the issues of mean shift and exploding/vanishing signals”, where the “variance” is the target “to stabilize signal propagation” by “avoiding the issues of mean shift and exploding/vanishing signals”, which is scaled by a scaling factor, “scales the input by a factor of β”, which is calculated such that “β” is scaled to “β = 1”, where either the value used to scale “β” to “β = 1” or “β” itself is within the broadest reasonable interpretation of a scaling factor, to cause a “variance” of the output to have a target variance necessary “to stabilize signal propagation”). Bingham does not explicitly disclose . . . comprising a processor . . . (where the relationship between processing nodes and processors is not specifically discussed; but see Pg. 19, Col. 1, Para. 2, “F Computing Infrastructure[:] Experiments in this paper were run in a distributed framework using StudioML software . . . to place jobs on machines with NVIDIA GeForce GTX 1080 Ti and RTX 2080 Ti GPUs”). However, Kornbluth teaches . . . 
[a plurality of processing nodes, each processing node] comprising a processor [configured to execute computer readable instructions to perform at least one operation] . . . (Claim 10, “A computational system for element simulation using a machine learning system parallelized across a plurality of processors, the system comprising: a plurality of processing nodes, each node including a memory storing instructions of a GNN algorithm . . . and a processor programmed to execute the instructions”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the layers of processing nodes, wherein each processing node is configured to execute computer instructions to perform operations based on received inputs of Bingham with the plurality of processing nodes, wherein each processing node comprises a processor configured to execute computer instructions to perform an operation of Kornbluth in order to parallelize the operations of the machine learning system (Kornbluth, Abstract, “Element simulation is described using a machine learning system parallelized across a plurality of processors”), which increases efficiency of machine learning operations (see Kornbluth, Para. [0032], “Deep neural networks are typically slow on CPUs, because the key operation of matrix multiplication is not as efficiently parallelized . . . even a 10x slowdown by using CPU makes parallelized CPU more efficient than a single GPU”). Regarding Claim 2, the additional elements are substantially the same as the limitations of Claim 22; therefore, it is rejected under the same rationale. Regarding Claim 3, Bingham in view of Kornbluth teach the system of claim 1, wherein the target variance is a variance which matches a variance of the one or more inputs (Bingham, Pg. 3, Col. 1, Para. 2, “νout = β²νin . . . 
AutoInit calculates analytic mean- and variance-preserving weight initialization so that α = 0 and β = 1, thus avoiding the issues of mean shift and exploding/vanishing signals”, where the “vout” is matched with “vin”, and the target variance, “vout” when “β = 1”, will match the variance of the inputs, “vin”). Regarding Claim 4, Bingham in view of Kornbluth teach the system of claim 1, wherein the at least one operation is implemented in a forward pass of the machine learning model (Bingham, Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network”, where the “forward pass” of the machine learning model, “network”, includes the at least one operation, “scale[d]” operation “at each layer”, see Bingham, Pg. 1, Col. 1, Abstract, “AutoInit appropriately scales the weights at each layer to avoid exploding or vanishing signals . . . The AutoInit package provides a wrapper around TensorFlow models”). Regarding Claim 5, Bingham in view of Kornbluth teach the system of claim 4, wherein the system is configured to perform a training process to train the machine learning model, and the forward pass forms part of the training process (Bingham, Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network”, where the “forward pass” of the machine learning model, “network”, can be part of the “training” process, see Bingham, Pg. 6, Col. 1, Para. 2, “ResNet-50 was trained from scratch on ImageNet with the default Initialization and with AutoInit”; see also Bingham, Pg. 19, Col. 1, Para. 2, “F Computing Infrastructure[:] Experiments in this paper were run in a distributed framework using StudioML software . . . to place jobs on machines with NVIDIA GeForce GTX 1080 Ti and RTX 2080 Ti GPUs. The AutoInit package is available at https://github.com/cognizant-ai-labs/autoinit”). 
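For illustration of the examiner’s reading only, the variance relation quoted from Bingham (νout = β2νin, with β = 1 preserving variance) can be checked with a brief numerical sketch; the code below is a hypothetical example and forms no part of the cited disclosure or the claims.

```python
# Numerical sketch of the relation nu_out = beta**2 * nu_in quoted from
# Bingham (illustrative only; not part of the cited disclosure).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=100_000)  # input signal, nu_in ~ 4

for beta in (0.5, 1.0, 2.0):
    y = beta * x            # a layer that "scales the input by a factor of beta"
    assert abs(y.var() - beta**2 * x.var()) < 1e-6  # nu_out = beta^2 * nu_in
# With beta = 1 the output variance equals the input variance,
# i.e., the variance-preserving case the reference targets.
```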
Regarding Claim 10, Bingham in view of Kornbluth teach the system of claim 1, wherein the inputs and outputs are tensors (Bingham, Pg. 3, Col. 1, Para. 3, “A given layer in a neural network receives as its input a tensor x with mean µin and variance νin. After applying the layer, the output tensor has mean µout = E(layer(x)) and variance νout = Var(layer(x))”). Regarding Claim 11, Bingham in view of Kornbluth teach the system of claim 1, wherein the inputs comprise a set of input activations and a set of weights, and the outputs comprise a set of output activations (Bingham, Pg. 3, Col. 1, Para. 3, “A given layer in a neural network receives as its input a tensor x with mean µin and variance νin. After applying the layer, the output tensor has mean µout = E(layer(x)) and variance νout = Var(layer(x)). The function glayer maps input mean and variance to output mean and variance when the layer is applied: glayer : (µin,νin) → (µout,νout) . . . For layers with trainable weights, the mean and variance mapping will depend on those weights. For example, the function gConv2D,θ maps input mean and variance to output mean and variance after the application of a Conv2D layer parameterized by weights θ”, where the “layer” is configured to receive a tensor of input activations, “input a tensor x”, and to generate a tensor of output activations with a variance, “output tensor has . . . variance νout”; and where “the function gConv2D,θ”, which is a subcomponent of the layer, “The function glayer maps . . . when the layer is applied”, is configured to receive a set of weights, “weights θ”; see also Bingham, Pg. 7, Col. 1, Para. 6, “the networks make use of different activation functions and contain several unique information processing paths from the input to the output”, where the “input[s]” are activation inputs and the “output[s]” are activation outputs because they are inputs/outputs of “activation functions”). 
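For illustration only, the mapping glayer : (µin,νin) → (µout,νout) quoted above can be sketched for a simplified dense layer with zero-mean i.i.d. weights independent of the input; the helper names (g_dense, init_sigma2) are hypothetical, and the derivation is a simplification of the analytic treatment in the cited reference.

```python
# Simplified, hypothetical sketch of a g_layer-style mean/variance map
# (illustrative only; helper names are not from the cited references).
# For a dense layer y = W x with zero-mean i.i.d. weights of variance
# sigma2, independent of the input: E[y] = 0 and
# Var(y) = fan_in * sigma2 * (nu_in + mu_in**2).
import numpy as np

def g_dense(mu_in, nu_in, fan_in, sigma2):
    """Map input mean/variance to output mean/variance: g: (mu, nu) -> (mu, nu)."""
    return 0.0, fan_in * sigma2 * (nu_in + mu_in**2)

def init_sigma2(mu_in, nu_in, fan_in):
    """Weight variance giving zero-mean, unit-variance layer outputs."""
    return 1.0 / (fan_in * (nu_in + mu_in**2))

fan_in, mu_in, nu_in = 256, 0.5, 2.0
sigma2 = init_sigma2(mu_in, nu_in, fan_in)
mu_out, nu_out = g_dense(mu_in, nu_in, fan_in, sigma2)
assert mu_out == 0.0 and abs(nu_out - 1.0) < 1e-12  # analytic map hits target

# Empirical check of the analytic map on random draws.
rng = np.random.default_rng(0)
W = rng.normal(0.0, np.sqrt(sigma2), size=(2000, fan_in))
X = rng.normal(mu_in, np.sqrt(nu_in), size=(fan_in, 2000))
Y = W @ X
assert abs(Y.var() - 1.0) < 0.1  # close to the predicted unit variance
```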
Regarding Claim 12, Bingham in view of Kornbluth teach the system of claim 1, wherein the inputs comprise a set of input gradients and a set of weights and/or activations, and the outputs comprise a set of output gradients (Bingham, Pg. 3, Col. 1, Para. 3, “A given layer in a neural network receives as its input a tensor x with mean µin and variance νin. After applying the layer, the output tensor has mean µout = E(layer(x)) and variance νout = Var(layer(x)). The function glayer maps input mean and variance to output mean and variance when the layer is applied: glayer : (µin,νin) → (µout,νout) . . . For layers with trainable weights, the mean and variance mapping will depend on those weights. For example, the function gConv2D,θ maps input mean and variance to output mean and variance after the application of a Conv2D layer parameterized by weights θ”, where “applying the layer”, including the “map[ping]” of the “function glayer”, is one of the at least one scaled operations; where the operation, “layer”, is configured to receive an input, “input a tensor x”, and to generate an output, “output tensor”; and where “the function gConv2D,θ”, which is a subcomponent of the operation, “The function glayer maps . . . when the layer is applied”, is configured to receive an input of weights “weights θ”; Bingham, Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network. It is possible to similarly model the backward pass of gradients from the output to the input”, where the inputs and outputs can be “gradients” when “the backward pass” is “model[ed]”). Regarding Claim 13, Bingham in view of Kornbluth teach the system of claim 1, wherein the machine learning system is configured to (Bingham, Pg. 1, Col. 1, Abstract, “AutoInit thus serves as an automatic configuration tool that makes design of new neural network architectures more robust. 
The AutoInit package provides a wrapper around TensorFlow models”, where “neural network” “TensorFlow models” are machine learning models, which are implemented on a machine learning system, a “Computing Infrastructure” using “StudioML”, “NVIDIA” hardware, and the “AutoInit” package, see Bingham, Pg. 19, Col. 1, Para. 2, “Computing Infrastructure[:] Experiments in this paper were run in a distributed framework using StudioML software . . . to place jobs on machines with NVIDIA GeForce GTX 1080 Ti and RTX 2080 Ti GPUs . . . The AutoInit package is available at https://github.com/cognizant-ai-labs/autoinit”) execute a computational graph, the computational graph comprising (Bingham, Pg. 6, Col. 2, Para. 3, “CoDeepNEAT evolves populations of modules and blueprints simultaneously (Figure 6a). Modules are small neural networks, complete with layers, connections, and hyperparameters. Blueprints are computation graphs” and Bingham, Pg. 6, Col. 2, Fig. 6, “The CoDeepNEAT method. Modules replace nodes in the blueprint to create a candidate neural network”, where the computational graph, “Blueprints are computation graphs”, must be executed to be “evolve[d]” using the “CoDeepNEAT method”): a plurality of graph nodes corresponding to computational operations (Bingham, Pg. 6-7, Col. 2-1, Para. 3-1, “CoDeepNEAT[:] CoDeepNEAT evolves populations of modules and blueprints simultaneously (Figure 6a). Modules are small neural networks, complete with layers, connections, and hyperparameters. Blueprints are computation graphs containing only nodes and directed edges. To create a candidate neural network, CoDeepNEAT chooses a blueprint and replaces its nodes with selected modules . . . CoDeepNEAT evolves hyperparameters like dropout rate, kernel regularization, and learning rate. The network weights are not evolved, but instead trained with gradient descent . . . 
[and is] well-suited to analyzing AutoInit’s performance in a variety of open-ended machine learning settings”, where the computational graph comprises a plurality of nodes, “Blueprints are computation graphs containing . . . nodes”, each of which corresponds with a machine learning model, “Modules are small neural networks . . . CoDeepNEAT chooses a blueprint and replaces its nodes with selected modules”, which, in turn, correspond with computational operations for training the model, as demonstrated by recitations of “dropout rate, kernel regularization, and learning rate” and “network weights are . . . trained with gradient descent”), and a plurality of graph edges corresponding to inputs and outputs of the graph nodes (Bingham, Pg. 6, Col. 2, Para. 3, “Blueprints are computation graphs containing only nodes and directed edges” and Bingham, Pg. 6, Col. 2, Fig. 6, where the computational graph, “Blueprints”, contains a plurality of “directed edges” connecting pairs of “nodes”, and a person of ordinary skill in the art would understand the arrows representing the directed edges to correspond with the use of the “output” of the first node as “input” for the second node in the pair, see Bingham, Pg. 7, Col. 1, Para. 6, “the networks make use of different activation functions and contain several unique information processing paths from the input to the output”); wherein the at least one operation corresponds to a graph node of the plurality of graph nodes of the computational graph (Bingham, Pg. 18, Col. 2, Para. 5, “Using AutoInit is simple in practice. The AutoInit package provides a wrapper around TensorFlow models. 
The wrapper automatically traverses the TensorFlow computation graph, calculates mean and variance estimations for each layer, and reinstantiates the model with the correct weight scaling”, where the “reinstantiat[ion]” of “the model with the correct weight scaling” is the one operation, which occurs while “traversing the TensorFlow computation graph”, which, as discussed above, is associated with the at least one node of the plurality of nodes when “AutoInit” is used in conjunction with “CoDeepNEAT[‘s]” “Assembled Network”, see Bingham, Pg. 6, Col. 2, Fig. 6, “Modules replace nodes in the blueprint to create a candidate neural network” and Bingham, Pg. 7, Col. 1, Para. 1, “The generality of CoDeepNEAT helps minimize human design biases and makes it well-suited to analyzing AutoInit’s performance in a variety of open-ended machine learning settings”). Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Bingham in view of Kornbluth and Yang et al. (hereinafter Yang) (“Dropout Inference with Non-Uniform Weight Scaling”). Regarding Claim 6, Bingham in view of Kornbluth teach the system of claim 4, wherein the system is configured to perform . . . the forward pass forms part of the [weight scaling] . . . (Bingham, Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network”, where the “forward pass” of the machine learning model, “network”, forms part of the weight scaling that “AutoInit” performs, see Bingham, Pg. 2, Para. 3, “How can weights be scaled so that repeated applications of the activation function do not result in vanishing or exploding signals?”; see also Bingham, Pg. 19, Col. 1, Para. 2, “F Computing Infrastructure[:] Experiments in this paper were run in a distributed framework using StudioML software . . . to place jobs on machines with NVIDIA GeForce GTX 1080 Ti and RTX 2080 Ti GPUs. The AutoInit package is available at https://github.com/cognizant-ai-labs/autoinit”). 
Bingham in view of Kornbluth do not explicitly disclose . . . an inference process, and . . . inference process (where the forward pass and weight scaling are not specifically described in regard to an inference process). However, Yang teaches . . . an inference process (Pg. 1, Para. 4, “This work focuses on standard dropout at inference time”), and [the weight scaling forms part of the] . . . inference process (Pg. 4, Para. 1, “For the layer i during inference time, weight scaling scales down weights uniformly by probability p. We propose a different formulation where weights are scaled non-uniformly during inference time”; see also Pg. 1, Para. 4, “Dropout was first proposed . . . to prevent overfitting for training neural networks . . . This work focuses on standard dropout at inference time; thus, in the subsequent sections, the discussion is concentrated on dropout inference”, where a person of ordinary skill in the art would understand the “inference” of a “neural network” to comprise a forward pass; therefore, the “weight scaling” occurs during the forward pass). Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the performance of a forward pass as part of a weight scaling operation of Bingham in view of Kornbluth with the inference process comprising a forward pass, wherein the weight scaling forms part of the inference process of Yang in order to provide better inference results for models trained using dropout (compare Yang, Pg. 6, Para. 2, “models trained with dropout behave more similar to boosting than bagging . . . we propose a non-uniform weight scaling and observe that a non-uniform weight scaling could provide a better result for such situation” with Bingham, Pg. 9, Col. 2, Para. 1, “This paper introduced AutoInit, an algorithm that calculates analytic mean- and variance-preserving weight initialization for neural networks automatically. 
In convolutional networks, the initialization improved performance with different activation functions, dropout rates, learning rates, and weight decay settings”, where the system is compatible with “dropout”). Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Bingham in view of Kornbluth and Chen et al. (hereinafter Chen) (“GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks”). Regarding Claim 7, Bingham in view of Kornbluth teach the system of claim 1, wherein the processing nodes are configured to determine . . . [operation values] in a backward pass of the machine learning model through the layer by carrying out a gradient calculation in a gradient operation (Bingham, Pg. 6-7, Col. 2-1, Para. 3-6, “CoDeepNEAT[:] CoDeepNEAT evolves populations of modules and blueprints simultaneously (Figure 6a). Modules are small neural networks, complete with layers, connections, and hyperparameters. Blueprints are computation graphs containing only nodes and directed edges. To create a candidate neural network, CoDeepNEAT chooses a blueprint and replaces its nodes with selected modules . . . CoDeepNEAT evolves hyperparameters like dropout rate, kernel regularization, and learning rate. The network weights are not evolved, but instead trained with gradient descent . . . [and is] well-suited to analyzing AutoInit’s performance in a variety of open-ended machine learning settings . . . the networks make use of different activation functions and contain several unique information processing paths from the input to the output”, where the computational graph comprises a plurality of nodes, “Blueprints are computation graphs containing . . . nodes”, each of which corresponds with a machine learning model, “Modules are small neural networks . . . 
CoDeepNEAT chooses a blueprint and replaces its nodes with selected modules”, which in turn perform operations as processing nodes, “the networks make use of different activation functions and contain several unique information processing paths”, based on one or more received inputs, “unique information processing paths from the input to the output”; Bingham, Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network. It is possible to similarly model the backward pass of gradients from the output to the input”, where the processing nodes are capable of determining operation values in a backward pass of the machine learning model by carrying out gradient calculations in a gradient operation at each layer, “similarly model the backward pass of gradients from the output to the input”), wherein the gradient operation is scaled by a second scaling factor to generate outputs with a second target variance (Bingham, Pg. 13, Col. 1, Para. 1-2, “In the AutoInit framework of Algorithm 1, the mean and variance mapping function g needs to be defined for each type of layer in a given neural network”, where the selected computational operations are specific to each layer, “layer” and “function g” for “each type of layer”; Bingham, Pg. 3, Col. 1, Para. 2-3, “AutoInit calculates analytic mean- and variance-preserving weight initialization so that α = 0 and β = 1, thus avoiding the issues of mean shift and exploding/vanishing signals . . . The function glayer maps input mean and variance to output mean and variance when the layer is applied: glayer : (µin,νin) → (µout,νout). (4) Note that g in Equation 4 depends on the type of layer; e.g. gDropout and gReLU are different functions. For layers with trainable weights, the mean and variance mapping will depend on those weights . . . 
Thus, if µin and νin are known, it is natural to calculate initial weights θ such that the layer output will have zero mean and unit variance”, where the “variance mapping” “depends on the type of layer; gDropout and gReLU are different functions” and the “layer” “weights”; as a result, the scaling factor needed to scale to “β = 1” and the target variance, “νout”, also depend on the layer; therefore, given that “AutoInit” is compatible with models with multiple layer types, see Bingham, Pg. 13, Col. 1, Para. 1, “In the AutoInit framework of Algorithm 1, the mean and variance mapping function g needs to be defined for each type of layer in a given neural network”, both the forward and backward pass will have at least a second scaling factor and second target variance, see Bingham, Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network. It is possible to similarly model the backward pass of gradients from the output to the input”). Bingham in view of Kornbluth . . . a gradient of a loss function . . . (where a backward pass of gradients is discussed without specifically discussing a loss function). However, Chen teaches . . . [calculating] a gradient of a loss function . . . [of a backward pass of the machine learning model through layers by carrying out a gradient calculation in a gradient operation] (Pg. 1, Col. 2, Para. 3, “our case, we propose an adaptive method, and so wi can vary at each training step t: wi = wi(t). This linear form of the loss function is convenient for implementing gradient balancing . . . To optimize the weights wi(t) for gradient balancing, we propose a simple algorithm that penalizes the network when backpropagated gradients from any task are too large or too small”, where a “loss function” for “gradient balancing”, which requires gradient calculation in a gradient operation, is part of a “backpropagation” in model training, “penalizes the network”; see also Pg. 4, Col. 
1, Para. 3, “To train our toy models, we use a 4-layer fully-connected ReLU-activated network with 100 neurons per layer as a common trunk. A final affine transformation layer gives T final predictions (corresponding to T different tasks)”, where the “train[ing]” is for machine learning model layers) [wherein the gradient operation is scaled by a scaling factor to generate outputs] (Pg. 4, Col. 2, Para. 1, “we measure the task-normalized test time loss to judge test-time performance, which is the sum of the test loss ratios for each task . . . There is therefore a clear measure of overall network performance, which is the sum of losses normalized by each task’s variance σi2, equivalent (up to a scaling factor) to the sum of loss ratios”). Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the processing nodes configured to determine operation values as part of a backward pass of a machine learning model, wherein the layer nodes carry out gradient calculations in gradient operations of Bingham in view of Kornbluth with the calculation of a gradient of a loss function as part of a backward pass through layers of a machine learning model, wherein gradient calculations are carried out as part of gradient operations of Chen in order to implement a simple algorithm to optimize model weights (Chen, Pg. 1, Col. 2, Para. 3, “To optimize the weights wi(t) for gradient balancing, we propose a simple algorithm that penalizes the network when backpropagated gradients from any task are too large or too small”), which contributes to a machine learning model with improved accuracy and reduced risks of overfitting (Chen, Pg. 1, Col. 1, Abstract, “GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques”). 
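For illustration of the claimed arrangement only (a gradient operation scaled by a second scaling factor to generate outputs with a second target variance), the following simplified sketch scales the backward-pass gradients of a toy squared-error loss to a chosen target variance; it is a hypothetical example and does not reproduce the methods of Bingham, Kornbluth, or Chen.

```python
# Hypothetical sketch: scale a backward-pass gradient operation by a
# "second scaling factor" so its outputs have a chosen target variance
# (illustrative only; not the cited references' implementations).
import numpy as np

rng = np.random.default_rng(2)

# toy layer y = W x and squared-error loss L = 0.5 * ||y - t||^2
fan_in, fan_out = 128, 128
W = rng.normal(0.0, 0.2, size=(fan_out, fan_in))
x = rng.normal(0.0, 1.0, size=fan_in)
t = rng.normal(0.0, 1.0, size=fan_out)

grad_y = W @ x - t      # dL/dy, gradient of the loss at the layer output
grad_x = W.T @ grad_y   # gradient operation of the backward pass (dL/dx)

target_variance = 1.0
beta2 = np.sqrt(target_variance / grad_x.var())  # second scaling factor
scaled = beta2 * grad_x                          # scaled gradient outputs

assert abs(scaled.var() - target_variance) < 1e-9  # hits the second target
```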
Regarding Claim 8, Bingham in view of Kornbluth and Chen teach the system of claim 7, wherein the one or more inputs comprise weights (Bingham, Pg. 3, Col. 1, Para. 3, “A given layer in a neural network receives as its input a tensor x with mean µin and variance νin. After applying the layer, the output tensor has mean µout = E(layer(x)) and variance νout = Var(layer(x)). The function glayer maps input mean and variance to output mean and variance when the layer is applied: glayer : (µin,νin) → (µout,νout) . . . For layers with trainable weights, the mean and variance mapping will depend on those weights. For example, the function gConv2D,θ maps input mean and variance to output mean and variance after the application of a Conv2D layer parameterized by weights θ”, where “applying the layer”, including the “map[ping]” of the “function glayer”, is one of the at least one scaled operations; where “the function gConv2D,θ”, which is a subcomponent of the operation, “The function glayer maps . . . when the layer is applied”, is configured to receive inputs of weights “weights θ”), and the gradient calculation is performed with respect to the weights (Chen, Pg. 1, Col. 2, Para. 3, “To optimize the weights wi(t) for gradient balancing, we propose a simple algorithm that penalizes the network when backpropagated gradients from any task are too large or too small”). The reasons for obviousness were discussed in regard to the rejection of claim 7 above and remain applicable here. Specifically, before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the one or more inputs comprising weights of Bingham in view of Kornbluth and Chen and the gradient calculation performed with respect to the weights in further view of Chen in order to implement a simple algorithm to optimize model weights (Chen Pg. 1, Col. 2, Para. 
3, “To optimize the weights wi(t) for gradient balancing, we propose a simple algorithm that penalizes the network when backpropagated gradients from any task are too large or too small”), which contributes to a machine learning model with improved accuracy and reduced risks of overfitting (Chen, Pg. 1, Col. 1, Abstract, “GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques”). Regarding Claim 9, Bingham in view of Kornbluth and Chen teach the system of claim 7, wherein the one or more outputs comprise activations (Bingham, Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network” and Bingham, Pg. 7, Col. 1, Para. 6, “the networks make use of different activation functions and contain several unique information processing paths from the input to the output”, where the “input[s]” are activation inputs and the “output[s]” are activation outputs because they are inputs/outputs of “activation functions”) and the gradient calculation is performed with respect to the activations (Bingham, Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network. It is possible to similarly model the backward pass of gradients from the output to the input”, where the “gradients” can be calculated to perform the “backward pass” with respect to the generated “activations” of the “forward pass”). Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Bingham in view of Kornbluth and Wang et al. (hereinafter Wang) (“BFloat16: The secret to high performance on Cloud TPUs”). Regarding Claim 14, Bingham in view of Kornbluth teach the system of claim 1, wherein the system is configured to store the inputs and/or outputs . . . (Bingham Pg. 7, Col. 1, Para. 
6, “the networks make use of different activation functions and contain several unique information processing paths from the input to the output”, where the inputs and outputs must be stored, at least temporarily, for the “information processing paths from the input to the output” to be operationalized). Bingham in view of Kornbluth . . . in a floating-point number representation comprising 16 bits or fewer (where the manner of input and/or output storage is not specifically discussed). However, Wang teaches . . . [storing inputs and outputs] in a floating-point number representation comprising 16 bits or fewer (Pg. 1, Para. 4, “Bfloat16 is a custom 16-bit floating point format for machine learning . . . This is different from the industry-standard IEEE 16-bit floating point, which was not designed with deep learning applications in mind”, where each of “Bfloat16” and “16-bit floating point” are a floating-point number representation comprising 16 bits or fewer; see also Pg. 3, Para. 3, “Storing values in bfloat16 format saves on-chip memory, making 8 GB of memory per core feel more like 16 GB, and 16 GB feel more like 32 GB. More extensive use of bfloat16 enables Cloud TPUs to train models that are deeper, wider, or have larger inputs . . . Storing operands and outputs of those ops in the bfloat16 format reduces the amount of data that must be transferred, improving speed”). Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the system configured to store inputs and outputs of Bingham in view of Kornbluth with the storing of inputs and outputs in a floating-point number representation comprising 16 bits or fewer of Wang in order to save on-chip memory, allowing for larger inputs in model training, and to improve speed of data transfer (Wang, Pg. 3, Para. 
3, “Storing values in bfloat16 format saves on-chip memory, making 8 GB of memory per core feel more like 16 GB, and 16 GB feel more like 32 GB. More extensive use of bfloat16 enables Cloud TPUs to train models that are deeper, wider, or have larger inputs . . . Storing operands and outputs of those ops in the bfloat16 format reduces the amount of data that must be transferred, improving speed”) or to comply with industry standards (Wang, Pg. 1, Para. 4, “the industry-standard IEEE 16-bit floating point”). Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Bingham in view of Cairo et al. (hereinafter Cairo) (“A simplified algorithm computing all s-t bridges and articulation points”). Regarding Claim 18, Bingham teaches the method of claim 15, wherein: the first scaling factor is a forward scaling parameter multiplied with an output of the computational operation (Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network”, where the “forward pass” of the machine learning model, “network”, includes the at least one operation, “scale[d]” operation “at each layer”, see Pg. 1, Col. 1, Abstract, “ AutoInit appropriately scales the weights at each layer to avoid exploding or vanishing signals . . . The AutoInit package provides a wrapper around TensorFlow models”) of the at least one node (Pg. 18, Col. 2, Para. 5, “Using AutoInit is simple in practice. The AutoInit package provides a wrapper around TensorFlow models. 
The wrapper automatically traverses the TensorFlow computation graph, calculates mean and variance estimations for each layer, and reinstantiates the model with the correct weight scaling”, where the “reinstantiat[ion]” of “the model with the correct weight scaling” is within the broadest reasonable interpretation of inserting a first scaling factor, which occurs while “traversing the TensorFlow computation graph”, which, as discussed above, is associated with the at least one node of the plurality of nodes when “AutoInit” is used in conjunction with “CoDeepNEAT[‘s]” “Assembled Network”, see Pg. 6, Col. 2, Fig. 6, “Modules replace nodes in the blueprint to create a candidate neural network” and Pg. 7, Col. 1, Para. 1, “The generality of CoDeepNEAT helps minimize human design biases and makes it well-suited to analyzing AutoInit’s performance in a variety of open-ended machine learning settings”) to cause the variance to have the target variance (Pg. 3, Col. 1, Para. 2, “AutoInit aims to stabilize signal propagation throughout an entire neural network. More precisely, consider a layer that . . . scales the input by a factor of β. Given an input signal with . . . variance νin, after applying the layer, the output signal will have . . . variance νout = β2νin . . . AutoInit calculates analytic mean- and variance-preserving weight initialization so that . . . β = 1, thus avoiding the issues of mean shift and exploding/vanishing signals”, where the scaling factor, “scales the input by a factor of β”, is calculated such that “β” is scaled to “β = 1”, to cause a “variance” of the output to have a target variance necessary “to stabilize signal propagation” by “avoiding the issues of mean shift and exploding/vanishing signals”); each node (Pg. 6-7, Col. 2-1, Para. 3-6, “Modules are small neural networks, complete with layers, connections, and hyperparameters. Blueprints are computation graphs containing only nodes and directed edges. 
To create a candidate neural network, CoDeepNEAT chooses a blueprint and replaces its nodes with selected modules . . . the networks make use of different activation functions and contain several unique information processing paths from the input to the output”; Pg. 18, Col. 2, Para. 5, “Using AutoInit is simple in practice. The AutoInit package provides a wrapper around TensorFlow models. The wrapper automatically traverses the TensorFlow computation graph, calculates mean and variance estimations for each layer, and reinstantiates the model with the correct weight scaling”) comprises a second scaling factor, the second scaling factor being a backward scaling parameter multiplied with a result of a gradient operation applied to the node (Pg. 3, Col. 1, Para. 3, “A given layer in a neural network receives as its input a tensor x with mean µin and variance νin. After applying the layer, the output tensor has mean µout = E(layer(x)) and variance νout = Var(layer(x)). The function glayer maps input mean and variance to output mean and variance when the layer is applied: glayer : (µin,νin) → (µout,νout) . . . For layers with trainable weights, the mean and variance mapping will depend on those weights. For example, the function gConv2D,θ maps input mean and variance to output mean and variance after the application of a Conv2D layer parameterized by weights θ”, where “applying the layer”, is one of the at least one scaled operation; where the operation, “layer”, is configured to receive an input, “input a tensor x”, and to generate an output, “output tensor”; Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network. 
It is possible to similarly model the backward pass of gradients from the output to the input”, where the inputs and outputs can be “gradients” when “the backward pass” is “model[ed]”, thus the scaling factor during “the backward pass”, which is within the broadest reasonable interpretation of a second scaling factor, would be multiplied with the result of a gradient operation applied to the node, “similarly model the backward pass of gradients from the output to the input”); a subset of the edges are cut edges, the cut edges being edges that if cut disconnect the pair of nodes connected by the cut edge such that there is no other path between the pair of nodes in the computational graph (Pg. 5, Col. 2, Fig. 6, “The CoDeepNEAT method. Modules replace nodes in the blueprint to create a candidate neural network”, where a subset of edges, such as the edge in the circle labeled 2 in “Fig. 6” are cut edges because, if cut, the pair would be disconnected such that there would be no path between the pair of nodes, as opposed to the leftmost edge within the circle labeled 1 in “Fig. 6”, where cutting the edge would still allow an indirect path between the pair); the method further comprising: . . . ; setting the second scaling factor of nodes connected by edges other than the cut edges equal to the first scaling factor (Pg. 18, Col. 2, Para. 2, “AutoInit stabilizes signals by analyzing the forward pass of activations from the input to the output of the network. It is possible to similarly model the backward pass of gradients from the output to the input”, where using “AutoInit” to “similarly model the backward pass” to alleviate the effect of “exploding or vanishing signals”, see Pg. 1, Col. 
1, Para. 2, “Thus, if a given layer amplifies or diminishes the forward or backward propagation of signals, repeated applications of that layer will result in exploding or vanishing signals”, teaches and suggests setting the first and second scaling factors equal, such that “νout = β²νin” with “β = 1” holds for both the “forward pass” and the “backward pass”, so that all nodes are equally scaled, including nodes connected by edges other than the cut edges, see Pg. 3, Col. 1, Para. 2-3, “More precisely, consider a layer that shifts its input by α and scales the input by a factor of β . . . If |β| > 1, the network will suffer from a mean shift and exploding signals as it increases in depth . . . In the case that |β| < 1, the network will suffer from a mean shift and vanishing signals . . . AutoInit calculates analytic mean- and variance-preserving weight initialization so that α = 0 and β = 1, thus avoiding the issues of mean shift and exploding/vanishing signals . . . Thus, if µin and νin are known, it is natural to calculate initial weights θ such that the layer output will have zero mean and unit variance”). Bingham does not explicitly disclose . . . identifying edges other than the cut edges . . . . However, Cairo teaches . . . identifying edges other than the cut edges . . . (Pg. 1, Abstract, “Given a directed graph G and a pair of nodes s and t, an s-t bridge of G is an edge whose removal breaks all s-t paths of G. Similarly, an s-t articulation point of G is a node whose removal breaks all s-t paths of G. Computing the sequence of all s-t bridges of G (as well as the s-t articulation points) is a basic graph problem, solvable in linear time using the classical min-cut algorithm”, where “Computing the sequence of all s-t bridges of G” will identify all edges as either “cut edge[s]” or edges other than “cut edge[s]”, see Pg. 1, Para. 2, “A key notion underlying such algorithms is that of edges (or nodes) critical for connectivity or reachability.
The most basic variant of these are bridges . . . A bridge of an undirected graph, also referred as cut edge”). Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the use of a computational graph comprising nodes and edges, wherein a subset of the edges are cut edges of Bingham with the identifying of edges other than cut edges of Cairo in order to utilize an algorithm that is solvable in linear time to identify all s-t bridges in a graph (Cairo, Pg. 1, Abstract, “Computing the sequence of all s-t bridges of G (as well as the s-t articulation points) is a basic graph problem, solvable in linear time”), which will determine fundamental concepts of connectivity and reachability based on criticality of edges and nodes (Cairo, Pg. 1, Para. 2, “Connectivity and reachability are fundamental graph-theoretical problems studied extensively in the literature . . . A key notion underlying such algorithms is that of edges (or nodes) critical for connectivity or reachability”), which will enhance the graph traversal aspects of the method (Bingham, Pg. 18, Col. 2, Para. 5, “Using AutoInit is simple in practice. The AutoInit package provides a wrapper around TensorFlow models. The wrapper automatically traverses the TensorFlow computation graph, calculates mean and variance estimations for each layer, and reinstantiates the model with the correct weight scaling”). Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Bingham in view of Cairo and Gounares et al. (hereinafter Gounares) (Pat. Pub. No. US 2014/0189652 A1). Regarding Claim 19, Bingham in view of Cairo teach the method of claim 18, comprising: . . . identifying the cut edges (Cairo, Pg. 1, Abstract, “Given a directed graph G and a pair of nodes s and t, an s-t bridge of G is an edge whose removal breaks all s-t paths of G. Similarly, an s-t articulation point of G is a node whose removal breaks all s-t paths of G. 
Computing the sequence of all s-t bridges of G (as well as the s-t articulation points) is a basic graph problem, solvable in linear time using the classical min-cut algorithm”, where “Computing the sequence of all s-t bridges of G” will identify all edges as either “cut edge[s]” or edges other than “cut edge[s]”, see Cairo, Pg. 1, Para. 2, “A key notion underlying such algorithms is that of edges (or nodes) critical for connectivity or reachability. The most basic variant of these are bridges . . . A bridge of an undirected graph, also referred as cut edge”). The reasons for obviousness were discussed in regard to claim 18 and remain applicable here. Specifically, before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the use of a computational graph comprising nodes and edges, wherein a subset of the edges are cut edges of Bingham with the identifying of edges as cut edges of Cairo in order to utilize an algorithm solvable in linear time to identify all s-t bridges in a graph (Cairo, Pg. 1, Abstract, “Computing the sequence of all s-t bridges of G (as well as the s-t articulation points) is a basic graph problem, solvable in linear time”), which will determine fundamental concepts of connectivity and reachability based on criticality of edges and nodes (Cairo, Pg. 1, Para. 2, “Connectivity and reachability are fundamental graph-theoretical problems studied extensively in the literature . . . A key notion underlying such algorithms is that of edges (or nodes) critical for connectivity or reachability”), which will enhance the graph traversal aspects of the method (Bingham, Pg. 18, Col. 2, Para. 5, “Using AutoInit is simple in practice. The AutoInit package provides a wrapper around TensorFlow models. The wrapper automatically traverses the TensorFlow computation graph, calculates mean and variance estimations for each layer, and reinstantiates the model with the correct weight scaling”).
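As an editor's illustration of the bridge (cut-edge) notion the Cairo citations rely on, the classical linear-time DFS low-link method finds every bridge of an undirected graph. This is a generic sketch, not Cairo's s-t-bridge algorithm, and the function name `find_bridges` is hypothetical:

```python
from collections import defaultdict

def find_bridges(edges):
    """Return the bridges (cut edges) of an undirected graph.

    A bridge is an edge whose removal leaves no other path between
    the pair of nodes it connects, matching the claimed definition
    of a cut edge. Classical linear-time DFS low-link method.
    """
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)

    disc, low = {}, {}      # discovery times and low-link values
    bridges, timer = [], [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        skipped_parent = False
        for v in graph[u]:
            if v == parent and not skipped_parent:
                skipped_parent = True   # ignore the edge we arrived on, once
                continue
            if v in disc:               # back edge reaches an ancestor
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:    # no back edge bypasses (u, v)
                    bridges.append((u, v))

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return bridges
```

On a shape like the examiner's reading of Fig. 6, a triangle 0-1-2 with a pendant node 3, only the edge (2, 3) is a bridge; cutting any triangle edge still leaves an indirect path between its endpoints.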
Bingham in view of Cairo do not explicitly disclose . . . receiving, via a user interface, user input . . . (where a user interface is not specifically discussed in regard to the identifying of cut edges). However, Gounares teaches . . . receiving, via a user interface, user input . . . [to identify graph edges] (Abstract, “A graph representing code element and relationships between code elements may have elements combined to consolidate or collapse portions of the graph. A filter may operate between the graph data and a renderer to show the graph in different states. The graph may be implemented with an interactive user interface through which a user may select a node, edge, or groups of nodes and edges, then apply a filter or other transformation”; see also Para. [0026], “A selection of an edge may identify two code elements, as each edge may link the two code elements”). Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the identifying of cut edges in a computational graph of Bingham in view of Cairo with the receiving user input through a user interface to identify graph edges of Gounares in order to allow users to identify cut edges in the computational graph (Gounares, Para. [0007], “The graph may be implemented with an interactive user interface through which a user may select a node, edge, or groups of nodes and edges, then apply a filter or other transformation”), which will allow for relationship-based analysis of the method’s functionality (compare Gounares, Para. [0005], “Relationships between code elements in an application may be selected and used during analysis and debugging of the application” with Bingham, Pg. 18, Col. 2, Para. 5, “Using AutoInit is simple in practice. The AutoInit package provides a wrapper around TensorFlow models. 
The wrapper automatically traverses the TensorFlow computation graph, calculates mean and variance estimations for each layer, and reinstantiates the model with the correct weight scaling”), which will allow the method to be modified to conform with the goals of the user (see Gounares, Para. [0001], “A programmer often examines and tests an application during development in many different manners. The programmer may run the application in various use scenarios, apply loading, execute test suites, or perform other operations on the application in order to understand how the application performs and to verify that the application operates as designed” and Gounares, Para. [0037], “A relationship between code elements may be selected from an interactive graph representing code elements as nodes and relationships between code elements as edges”). Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Bingham in view of Olsen et al. (hereinafter Olsen) (Pat. Pub. No. US 2022/0404515 A1). Regarding Claim 20, Bingham teaches the method of claim 15, comprising: receiving . . . the first scaling factor (Pg. 3, Col. 1, Para. 2, “AutoInit aims to stabilize signal propagation throughout an entire neural network. More precisely, consider a layer that . . . scales the input by a factor of β”, where the scaling factor, “β”, must be received to be used to scale “the input”). Bingham does not explicitly disclose . . . via a user interface . . . . However, Olsen teaches . . . [receiving], via a user interface, [a scaling factor] . . . (Fig. 8; Para. [0039], “FIG. 8 illustrates an example screenshot 800 of a user interface 804 to the reservoir model generation tool through which a user may adjust the input parameters used to train the reservoir models 212 and the determine an optimized model 216. . . . For example, a scaling factor . . . may be input to the reservoir model generation tool 406 via the first panel 808 of the user interface 804”).
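The α/β signal-scaling model quoted from Bingham in these rejections can be illustrated numerically: an affine layer scales variance by β², so |β| > 1 explodes and |β| < 1 vanishes under repeated application, while β = 1 (the AutoInit goal) preserves it. A minimal editor's sketch, with hypothetical helper names, assuming only the quoted α/β model:

```python
import random

def layer_apply(x, beta, alpha=0.0):
    """Toy affine 'layer' that shifts its input by alpha and scales it
    by beta, mirroring the alpha/beta model quoted from Bingham."""
    return [alpha + beta * xi for xi in x]

def mean_var(x):
    """Sample mean and variance of a signal."""
    m = sum(x) / len(x)
    return m, sum((xi - m) ** 2 for xi in x) / len(x)

random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(10_000)]
_, v_in = mean_var(signal)

# With |beta| > 1 the variance grows by beta**2 at each application,
# so deep stacks of such layers suffer exploding signals.
_, v_out = mean_var(layer_apply(signal, beta=1.5))   # v_out ≈ 1.5**2 * v_in

# With beta = 1 the variance is preserved, the stated AutoInit goal.
_, v_pres = mean_var(layer_apply(signal, beta=1.0))  # v_pres ≈ v_in
```

The first scaling factor of the claim would correspond to such a β applied on the forward pass; nothing in this sketch is specific to Bingham's actual initialization computation.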
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the receiving of the first scaling factor of Bingham with the receiving, via a user interface, of a scaling factor of Olsen in order to allow users to provide customized inputs for model training and model optimization (Olsen, Fig. 8; Olsen, Para. [0039], “FIG. 8 illustrates an example screenshot 800 of a user interface 804 to the reservoir model generation tool through which a user may adjust the input parameters used to train the reservoir models 212 and the determine an optimized model 216”), which will allow the method to be more oriented around the aims of the user (Olsen, Para. [0016], “A user-oriented tool is also presented for interacting with the reservoir modeling systems and methods to generate an optimized reservoir model to predict reservoir development”). Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW BRYCE GOLAN whose telephone number is (571)272-5159. The examiner can normally be reached Monday through Friday, 8:00 AM to 5:00 PM ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /MATTHEW BRYCE GOLAN/Examiner, Art Unit 2123 /ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123

Prosecution Timeline

Mar 13, 2023
Application Filed
Feb 20, 2026
Non-Final Rejection — §101, §102, §103 (current)


Prosecution Projections

1-2
Expected OA Rounds
0%
Grant Probability
0%
With Interview (+0.0%)
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
