DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 12/19/2023 and 12/04/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 19 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 19 recites the limitation “the neural network” in line 4. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the neural network” has been interpreted as “a neural network”.
Claim 20 recites the limitation “the neural network” in line 5. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the neural network” has been interpreted as “a neural network”.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“sampling a plurality of network inputs from the set of training data”
“determining a clipped gradient for each network input of the plurality of network inputs”
“generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input”
“determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input”
“determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input”
“generating the clipped gradient for the network input by clipping the combined gradient for the network input”
“updating the neural network parameters using the clipped gradients for the network inputs of the plurality of network inputs”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass sampling network inputs from a set of training data (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can sample inputs from training data set); determining a clipped gradient for each input of the plurality of inputs (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can determine a clipped gradient for each of the plurality of network inputs); generating augmented versions of the network input, the augmented version resulting from applying a respective augmentation transformation to the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can apply augmentation transformations to a network input to generate respective augmented versions of the network input); determining a gradient of the objective function for each of the augmented versions of the network input with respect to the neural network parameters when the objective function is evaluated based on a neural network output generated from the augmented version of the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can determine a gradient of the objective function evaluated based on the augmented version of the network input for each of the augmented versions of the input); determining a combined gradient for the network input by combining the gradients determined for the augmented versions of the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can combine the gradients 
determined for the augmented versions of the network input to determine a combined gradient for the network input); generating the clipped gradient for the network input by clipping the combined gradient (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can clip the combined gradient for the network input to generate the clipped gradient for the network input); and updating the neural network parameters using the clipped gradients for the plurality of network inputs (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the clipped gradients for the plurality of network inputs to update the parameters of the neural network).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitations:
“one or more computers”
“a neural network”
“training a set of neural network parameters of the neural network on a set of training data over a plurality of training iterations to optimize an objective function”
As drafted, are additional elements that amount to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
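For context, the training procedure recited in claim 1 (per-input gradient clipping with multiple augmented versions of each input) can be sketched as follows. This is a hypothetical NumPy illustration only; the toy gradient vectors, function names, and learning rate are assumptions for illustration, not limitations of the claims:

```python
import numpy as np

def clipped_gradient(augmented_grads, clip_threshold):
    """Combine the gradients determined for the augmented versions of one
    network input, then clip the combined gradient by scaling its norm."""
    combined = np.mean(augmented_grads, axis=0)       # combine by averaging
    norm = np.linalg.norm(combined)
    if norm > clip_threshold:                         # clip only when needed
        combined = combined * (clip_threshold / norm)
    return combined

def update_parameters(params, per_input_augmented_grads, clip_threshold, lr=0.1):
    """Update the neural network parameters using the clipped gradients
    computed for each sampled network input."""
    clipped = [clipped_gradient(g, clip_threshold)
               for g in per_input_augmented_grads]
    return params - lr * np.mean(clipped, axis=0)
```

Under this sketch, each sampled network input contributes one clipped gradient, and the parameter update uses the clipped gradients across the sampled batch.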
Regarding Claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“obtaining a plurality of augmentation transformations, comprising, for each augmentation transformation, randomly sampling parameters defining the augmentation transformation”
“generating each augmented version of the network input by applying a respective augmentation transformation to the network input”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass obtaining a plurality of augmentation transformations by randomly sampling parameters defining the augmentation transformation (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can randomly sample parameters defining augmentation transformations to obtain a plurality of augmentation transformations); and generating augmented versions of the network input by applying a respective augmentation transformation to the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can apply augmentation transformations to the network input to generate augmented versions of the network input).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The recitation of additional elements in claim 1 of a generic computer, neural network, and generic training of the neural network, as drafted, recites mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
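For context, the steps recited in claim 2 (randomly sampling the parameters defining each augmentation transformation, then applying each transformation to the network input) can be sketched as follows. This is a hypothetical NumPy illustration; the particular transformation family (shift and scale of a vector input) and function names are assumptions for illustration, not limitations of the claims:

```python
import numpy as np

def sample_augmentation_transformations(count, rng=None):
    """Obtain a plurality of augmentation transformations, randomly sampling
    the parameters (here, hypothetical shift and scale values) that define
    each transformation."""
    rng = np.random.default_rng() if rng is None else rng
    transforms = []
    for _ in range(count):
        shift = rng.uniform(-1.0, 1.0)    # randomly sampled parameter
        scale = rng.uniform(0.9, 1.1)     # randomly sampled parameter
        transforms.append(lambda x, s=shift, c=scale: c * x + s)
    return transforms

def augment(network_input, transformations):
    """Generate each augmented version of the network input by applying a
    respective augmentation transformation to it."""
    return [t(network_input) for t in transformations]
```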
Regarding Claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 3 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“processing the augmented version of the network input …, in accordance with current values of the neural network parameters of the neural network, to generate a corresponding network output”
“determining gradients of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on the network output”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass processing the augmented version of the network input in accordance with the current values of the neural network parameters to generate a corresponding output (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the current values of the parameters of the neural network to process the augmented version of the network input to generate a network output); and determining gradients of the objective function with respect to the neural network parameters when the objective function is evaluated on the network output (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can determine gradients of the objective function when the objective function is evaluated on the network output).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitation:
“using the neural network”
As drafted, is an additional element that amounts to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Additionally, the recitation of additional elements in claim 1 of a generic computer, neural network, and generic training of the neural network, as drafted, recites mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 4 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“averaging the gradients determined for the plurality of augmented versions of the network input”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass averaging the determined gradients for the augmented versions of the network input (corresponds to mathematical calculations).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The recitation of additional elements in claim 1 of a generic computer, neural network, and generic training of the neural network, as drafted, recites mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 5 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“scaling the combined gradient for the network input to cause a norm of the combined gradient for the network input to satisfy a clipping threshold”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass scaling the combined gradient for the network input to cause a norm of the combined gradient to satisfy a clipping threshold (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can cause a norm of the combined gradient for the network input to satisfy a clipping threshold by scaling the combined gradient).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The recitation of additional elements in claim 1 of a generic computer, neural network, and generic training of the neural network, as drafted, recites mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 6 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“scaling the combined gradient for the network input by a scaling factor defined as a ratio of: (i) the clipping threshold, and (ii) the norm of the combined gradient for the network input”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass scaling the combined gradient for the network input by a scaling factor defined as a ratio of the clipping threshold and the norm of the combined gradient (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use a scaling factor defined as a ratio of the clipping threshold and the norm of the combined gradient for the network input to scale the combined gradient).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The recitation of additional elements in claim 5 of a generic computer, neural network, and generic training of the neural network, as drafted, recites mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
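For context, the scaling recited in claim 6 (a scaling factor defined as a ratio of the clipping threshold to the norm of the combined gradient) can be sketched as follows. This is a hypothetical NumPy illustration; the function name and the convention of leaving gradients within the threshold unchanged are assumptions for illustration, not limitations of the claims:

```python
import numpy as np

def clip_by_scaling(combined_grad, clip_threshold):
    """Clip a combined gradient by scaling it by the ratio of (i) the
    clipping threshold to (ii) the norm of the combined gradient, whenever
    that norm exceeds the threshold."""
    norm = np.linalg.norm(combined_grad)
    if norm <= clip_threshold:
        return combined_grad                  # already satisfies the threshold
    return combined_grad * (clip_threshold / norm)
```

After scaling, the clipped gradient's norm equals the clipping threshold whenever the original norm exceeded it.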
Regarding Claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 7 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“generating a set of noise parameters, comprising randomly sampling the noise parameters from a noise distribution”
“applying the noise parameters to the clipped gradients for the network inputs of the plurality of network inputs”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass generating a set of noise parameters by randomly sampling the noise parameters from a noise distribution (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can randomly sample noise parameters from a noise distribution to generate a set of noise parameters); and applying the noise parameters to the clipped gradients for the plurality of network inputs (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can apply the noise parameters to the clipped gradients for the plurality of network inputs).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The recitation of additional elements in claim 1 of a generic computer, neural network, and generic training of the neural network, as drafted, recites mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitation:
“wherein the noise distribution comprises a Gaussian noise distribution”
As drafted, is part of the abstract idea of claim 7 of generating a set of noise parameters by randomly sampling a noise distribution. The limitation of claim 8 further limits the limitation of claim 7 by further defining the noise distribution. The above limitation in the context of this claim encompasses generating a set of noise parameters by randomly sampling the noise parameters from a Gaussian noise distribution (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can randomly sample noise parameters from a Gaussian noise distribution to generate a set of noise parameters).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The recitation of additional elements in claim 7 of a generic computer, neural network, and generic training of the neural network, as drafted, recites mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
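For context, the steps recited in claims 7 and 8 (randomly sampling noise parameters from a noise distribution, here a Gaussian noise distribution, and applying them to the clipped gradients) can be sketched as follows. This is a hypothetical NumPy illustration; the `noise_multiplier` hyperparameter and the choice to scale the noise by the clipping threshold reflect common DP-SGD-style practice and are assumptions for illustration, not limitations of the claims:

```python
import numpy as np

def apply_gaussian_noise(clipped_grads, clip_threshold, noise_multiplier, rng=None):
    """Generate a set of noise parameters by randomly sampling a Gaussian
    noise distribution, and apply them to the clipped gradients for the
    network inputs of the sampled batch."""
    rng = np.random.default_rng() if rng is None else rng
    total = np.sum(clipped_grads, axis=0)             # sum of clipped gradients
    noise = rng.normal(0.0, noise_multiplier * clip_threshold, size=total.shape)
    return (total + noise) / len(clipped_grads)       # noisy average gradient
```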
Regarding Claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Please see the analysis of claim 1. The limitations of claim 9 are only additional elements to the abstract ideas of claim 1.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitation:
“wherein the neural network does not include any batch normalization layers”
As drafted, is part of the additional element of claim 1 of a neural network. The limitation of claim 9 further limits the limitation of claim 1 by further defining what the neural network does not include. Additionally, the recitation of additional elements in claim 1 of a generic computer, neural network, and generic training of the neural network, as drafted, recites mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 10 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Please see the analysis of claim 1. The limitations of claim 10 are only additional elements to the abstract ideas of claim 1.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitation:
“wherein the neural network includes group normalization layers”
As drafted, is part of the additional element of claim 1 of a neural network. The limitation of claim 10 further limits the limitation of claim 1 by further defining what the neural network comprises. Additionally, the recitation of additional elements in claim 1 of a generic computer, neural network, and generic training of the neural network, as drafted, recites mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 11,
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 11 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“process a network input comprising an image”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass processing a network input comprising an image (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can process an image as a network input).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitation:
“the neural network”
As drafted, is an additional element that amounts to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Additionally, the recitation of additional elements in claim 1 of a generic computer, neural network, and generic training of the neural network, as drafted, recites mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 12,
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 12 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“process a network input comprising audio data”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass processing a network input comprising audio data (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can process audio data as a network input).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitations:
“the neural network”
As drafted, these limitations are additional elements that amount to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Additionally, the recitation in claim 1 of the additional elements of a generic computer, a neural network, and generic training of the neural network, as drafted, amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, a neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 13,
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 13 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“process a network input comprising electronic medical record data”
As drafted, these limitations, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass processing a network input comprising electronic medical record data (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can process medical record data as a network input).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitations:
“the neural network”
As drafted, these limitations are additional elements that amount to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Additionally, the recitation in claim 1 of the additional elements of a generic computer, a neural network, and generic training of the neural network, as drafted, amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, a neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 14,
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 14 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“process a network input comprising textual data”
As drafted, these limitations, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass processing a network input comprising textual data (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can process textual data as a network input).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitations:
“the neural network”
As drafted, these limitations are additional elements that amount to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Additionally, the recitation in claim 1 of the additional elements of a generic computer, a neural network, and generic training of the neural network, as drafted, amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, a neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 15,
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 15 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Please see the analysis of claim 1. The limitations of claim 15 are only additional elements to the abstract ideas of claim 1.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitations:
“wherein the neural network comprises one or more convolutional neural network layers”
As drafted, this limitation is part of the additional element of claim 1 of a neural network. The limitation of claim 15 further limits the limitation of claim 1 by further defining what the neural network comprises. Additionally, the recitation in claim 1 of the additional elements of a generic computer, a neural network, and generic training of the neural network, as drafted, amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, a neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 16,
Claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 16 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Please see the analysis of claim 1. The limitations of claim 16 are only additional elements to the abstract ideas of claim 1.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity (See MPEP 2106.05(g)). The limitations:
“wherein the objective function comprises a classification loss”
As drafted, this limitation is part of the additional element of claim 1 of training neural network parameters of the neural network to optimize an objective function. The limitation of claim 16 further limits the limitation of claim 1 by further defining what the objective function comprises. Additionally, the recitation in claim 1 of the additional elements of a generic computer, a neural network, and generic training of the neural network, as drafted, amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, a neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 17,
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 17 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein at each training iteration, the plurality of network inputs comprises at least 4000 network inputs”
As drafted, this limitation is part of the abstract idea of claim 1 of sampling a plurality of network inputs from the set of training data. The limitation of claim 17 further limits the limitation of claim 1 by further defining that the sampled network inputs comprise at least 4000 network inputs. The above limitation in the context of this claim encompasses sampling at least 4000 network inputs from a set of training data (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can sample at least 4000 inputs from a training data set).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The recitation in claim 1 of the additional elements of a generic computer, a neural network, and generic training of the neural network, as drafted, amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, a neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 18,
Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 18 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein generating a plurality of augmented versions of the network input comprises generating at least 8 augmented versions of the network input”
As drafted, these limitations, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass generating at least 8 augmented versions of the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can generate at least 8 augmented versions of the network input).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The recitation in claim 1 of the additional elements of a generic computer, a neural network, and generic training of the neural network, as drafted, amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, a neural network, and generic training of the neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 19,
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 19 is directed to non-transitory computer storage media, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“sampling a plurality of network inputs from the set of training data”
“determining a clipped gradient for each network input of the plurality of network inputs”
“generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input”
“determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input”
“determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input”
“generating the clipped gradient for the network input by clipping the combined gradient for the network input”
“updating the neural network parameters using the clipped gradients for the network inputs of the plurality of network inputs”
As drafted, these limitations, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass sampling network inputs from a set of training data (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can sample inputs from a training data set); determining a clipped gradient for each input of the plurality of inputs (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can determine a clipped gradient for each of the plurality of network inputs); generating augmented versions of the network input, each augmented version resulting from applying a respective augmentation transformation to the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can apply augmentation transformations to a network input to generate respective augmented versions of the network input); determining a gradient of the objective function for each of the augmented versions of the network input with respect to the neural network parameters when the objective function is evaluated based on a neural network output generated from the augmented version of the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can determine a gradient of the objective function evaluated based on the augmented version of the network input for each of the augmented versions of the input); determining a combined gradient for the network input by combining the gradients determined for the augmented versions of the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can combine the gradients determined for the augmented versions of the network input to determine a combined gradient for the network input); generating the clipped gradient for the network input by clipping the combined gradient (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can clip the combined gradient for the network input to generate the clipped gradient for the network input); and updating the neural network parameters using the clipped gradients for the plurality of network inputs (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the clipped gradients for the plurality of network inputs to update the parameters of the neural network).
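The training iteration mapped above (sample inputs, augment each input, compute per-augmentation gradients, combine, clip, update) can be sketched in code. This is an illustrative sketch only, not the applicant's implementation: the quadratic objective, additive-noise augmentation, batch size, augmentation count, and learning rate are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_objective(params, x):
    # Gradient of the toy objective ||params - x||^2 with respect to params.
    return 2.0 * (params - x)

def augment(x):
    # Hypothetical augmentation transformation: small additive noise.
    return x + 0.01 * rng.standard_normal(x.shape)

def clip(g, max_norm=1.0):
    # Clip a gradient so its L2 norm does not exceed max_norm.
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)

def training_step(params, batch, num_augmentations=8, lr=0.1):
    clipped = []
    for x in batch:  # each sampled network input
        # Gradient for each augmented version of the input.
        grads = [grad_objective(params, augment(x))
                 for _ in range(num_augmentations)]
        combined = np.mean(grads, axis=0)  # combined gradient for the input
        clipped.append(clip(combined))     # clipped gradient for the input
    # Update the parameters using the clipped gradients.
    return params - lr * np.mean(clipped, axis=0)

params = training_step(np.zeros(3), [rng.standard_normal(3) for _ in range(4)])
```

Averaging is used here as the combining operation and as the batch-level aggregation; the claim language only requires that the gradients be "combined" and the combined gradient clipped.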
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity (See MPEP 2106.05(g)). The limitations:
“one or more computers”
“training a set of neural network parameters of the neural network on a set of training data over a plurality of training iterations to optimize an objective function”
As drafted, these limitations are additional elements that amount to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer and generic training of a neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 20,
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 20 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“sampling a plurality of network inputs from the set of training data”
“determining a clipped gradient for each network input of the plurality of network inputs”
“generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input”
“determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input”
“determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input”
“generating the clipped gradient for the network input by clipping the combined gradient for the network input”
“updating the neural network parameters using the clipped gradients for the network inputs of the plurality of network inputs”
As drafted, these limitations, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass sampling network inputs from a set of training data (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can sample inputs from a training data set); determining a clipped gradient for each input of the plurality of inputs (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can determine a clipped gradient for each of the plurality of network inputs); generating augmented versions of the network input, each augmented version resulting from applying a respective augmentation transformation to the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can apply augmentation transformations to a network input to generate respective augmented versions of the network input); determining a gradient of the objective function for each of the augmented versions of the network input with respect to the neural network parameters when the objective function is evaluated based on a neural network output generated from the augmented version of the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can determine a gradient of the objective function evaluated based on the augmented version of the network input for each of the augmented versions of the input); determining a combined gradient for the network input by combining the gradients determined for the augmented versions of the network input (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can combine the gradients determined for the augmented versions of the network input to determine a combined gradient for the network input); generating the clipped gradient for the network input by clipping the combined gradient (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can clip the combined gradient for the network input to generate the clipped gradient for the network input); and updating the neural network parameters using the clipped gradients for the plurality of network inputs (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the clipped gradients for the plurality of network inputs to update the parameters of the neural network).
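The "clipping the combined gradient" step recited in these limitations is conventionally a fixed-norm clip: if the gradient vector's norm exceeds a fixed bound, each component is divided by the norm; otherwise the gradient is unchanged. A minimal sketch, assuming an L2 norm and a bound of 1.0 (the norm choice and bound are assumed values, not recited in the claim):

```python
import numpy as np

def clip_to_fixed_norm(grads, bound=1.0):
    # If the L2 norm of the gradient vector exceeds the bound, rescale the
    # vector so its norm equals the bound; otherwise return it unchanged.
    norm = np.linalg.norm(grads)
    return grads * (bound / norm) if norm > bound else grads

clipped = clip_to_fixed_norm(np.array([3.0, 4.0]))  # norm 5 -> rescaled to norm 1
kept = clip_to_fixed_norm(np.array([0.3, 0.4]))     # norm 0.5 -> unchanged
```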
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity (See MPEP 2106.05(g)). The limitations:
“one or more computers”
“one or more storage devices communicatively coupled to the one or more computers”
“training a set of neural network parameters of the neural network on a set of training data over a plurality of training iterations to optimize an objective function”
As drafted, these limitations are additional elements that amount to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e., the additional elements describe a generic computer, a storage device, and generic training of a neural network for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-6 and 11-20 are rejected under 35 U.S.C. 103 as being unpatentable over Suresh et al. (US 2021/0049298 A1) in view of Hoffer et al. ("Augment your batch: better training with larger batches").
Regarding Claim 1,
Suresh et al. teaches a method performed by one or more computers for privacy-sensitive training of a neural network (Fig. 1; [0004]: "a system implemented as computer programs on one or more computers in one or more locations that trains a machine learning model, e.g., a neural network, to perform a particular task. In particular, the system trains the machine learning model on training data that includes user data from multiple users in a manner that preserves the privacy of the users" teaches a system comprising one or more computers to execute one or more programs (instructions) to perform neural network training that preserves privacy of the users (e.g. privacy-sensitive training of a neural network). Fig. 2; [0049]: "FIG. 2 is a flow diagram of an example process 200 for training the machine learning model. For convenience, the process 200 will be described as being performed by a system of one or more computers located in one or more locations. For example, a model training system, e.g., the model training system 100 of FIG. 1, appropriately programmed, can perform the process 200" teaches a process (method) for machine learning model training as being performed by the embodied system), the method comprising:
training a set of neural network parameters of the neural network on a set of training data over a plurality of training iterations to optimize an objective function (Fig. 2; [0050]: "The system can repeatedly perform the process 200 on different batches of training data to determine trained values of the model parameters, i.e., by repeatedly updating the current values of the model parameters. For example, the system can continue performing the process 200 until a threshold number of iterations of the process have been performed, until a threshold amount of time has elapsed, or until the values of the model parameters have converged" teaches an iterative training process to train model parameters (neural network parameters) until the model parameters converge (e.g. are optimized). Fig. 2; [0057]-[0062]: "The system computes, for each of the training examples and for each of the model parameters, a respective gradient of an objective function with respect to the model parameter (step 204). The objective function can be any appropriate objective function for the particular task that the model is being trained to perform and generally measures errors between an output generated by the machine learning model for a training input and a target output that should have been generated by the machine learning model for the training input. Examples of objective functions that may be appropriate for various tasks include cross-entropy losses, mean squared error losses, L2 distance losses, log likelihood objectives, and so on. Thus, for any given model parameter, the system computes a respective gradient for each training example in the batch. … The system determines, for each of the training examples and for each of the model parameters, a respective privacy preserving noisy gradient (step 206). 
… The system determines, for each of the model parameters, a respective privacy preserving update for the model parameter from the privacy preserving noisy gradients for the model parameter for the plurality of training examples (step 208)" teaches training model parameters (neural network parameters) on a set of training examples (set of training data) to optimize an objective function for the model parameter update), comprising, at each training iteration:
sampling a plurality of network inputs from the set of training data (Fig. 1; [0031]-[0033]: "The model training system 100 is a system that obtains training data 102 for training a machine learning model 110 to perform a particular task. The training data 102 includes user data, i.e., at least some of the training examples in the training data 102 are derived from users, i.e., are generated from data that is provided by a user, data that is specific to a particular user, or data that is generated as a result of user interaction with a system. … Generally, the training data 102 includes a set of training examples, with each training example including a training input and, for each training input, a respective target output that should be generated by the machine learning model to perform the particular task. … The system 100 can receive the training data 102 in any of a variety of ways … the system 100 can receive an input from a user specifying which data that is already maintained by the system 100 should be used for training the machine learning model" teaches sampling model inputs (network inputs) for training the machine learning model (neural network) from the set of training examples (set of training data));
determining a clipped gradient for each network input of the plurality of network inputs, comprising (Fig. 2; [0057]-[0060]: "The system computes, for each of the training examples and for each of the model parameters, a respective gradient of an objective function with respect to the model parameter (step 204). The objective function can be any appropriate objective function for the particular task that the model is being trained to perform and generally measures errors between an output generated by the machine learning model for a training input and a target output that should have been generated by the machine learning model for the training input. Examples of objective functions that may be appropriate for various tasks include cross-entropy losses, mean squared error losses, L2 distance losses, log likelihood objectives, and so on. Thus, for any given model parameter, the system computes a respective gradient for each training example in the batch. … The system determines, for each of the training examples and for each of the model parameters, a respective privacy preserving noisy gradient (step 206). … Generally, to perform the modification, the system adaptively clips the respective gradient for the model parameter based on (i) the mean noisy gradient estimate for the model parameter and (ii) the standard deviation noisy gradient estimate for the model parameter and then adds noise to the adaptively clipped gradient" teaches generating a gradient for each training example and model parameter (network inputs) and then adaptively clipping the gradient to generate a privacy preserving noisy gradient (clipped gradient)), for each network input of the plurality of network inputs:
generating the clipped gradient for the network input by clipping the combined gradient for the network input (Fig. 3; [0084]: "The system clips the transformed gradient for the given model parameter such that a vector of the transformed gradients, i.e., a vector that includes the transformed gradients for all of the model parameters, has no greater than a fixed norm, e.g., 1, to generate clipped transformed gradients (step 306). In particular, if the norm of the gradient vector exceeds 1, the system divides each transformed gradient by the norm to generate a corresponding clipped transformed gradient. This results in a vector of the clipped transformed gradients having norm 1. If the norm of the gradient vector does not exceed 1, the system does not modify the transformed gradients and sets the clipped transformed gradients equal to the transformed gradients" teaches clipping the transformed gradient (combined gradient) to generate a clipped transformed gradient (clipped gradient)); and
updating the neural network parameters using the clipped gradients for the network inputs of the plurality of network inputs (Fig. 2; [0062]: "The system determines, for each of the model parameters, a respective privacy preserving update for the model parameter from the privacy preserving noisy gradients for the model parameter for the plurality of training examples (step 208)" teaches updating the model parameters (neural network parameters) using the privacy preserving noisy gradients (clipped gradients) for the plurality of training examples (network inputs)).
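For illustration, the clip-then-update flow recited in steps 204-208 above can be sketched as follows. This is a minimal, simplified sketch in the style of standard differentially private SGD, not the adaptive clipping procedure of Suresh et al.; the noise scale `sigma`, learning rate `lr`, and clipping threshold are hypothetical parameters chosen for illustration only.

```python
import numpy as np

def private_sgd_step(params, per_example_grads, clip_norm=1.0, sigma=0.5,
                     lr=0.1, rng=None):
    """One privacy-preserving update: clip each per-example gradient so its
    norm does not exceed clip_norm, add Gaussian noise, average over the
    batch, and apply the resulting update to the parameters."""
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale only when the norm exceeds the threshold; otherwise keep as-is.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append(g * scale)
    # Sum the clipped gradients and add Gaussian noise calibrated to clip_norm.
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        0.0, sigma * clip_norm, size=params.shape)
    update = noisy_sum / len(per_example_grads)  # average over the batch
    return params - lr * update
```

With `sigma=0` the step reduces to averaged clipped-gradient descent; the Gaussian noise term is what provides the privacy-preserving character described in the reference.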
Suresh et al. does not appear to explicitly teach generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input; determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input; and determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input.
However, Hoffer et al. teaches generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input (Section 2, third paragraph: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another" teaches generating M augmented instances (augmented versions) of the input sample (network input) by applying a transformation Ti (augmentation transformation) to the input sample);
determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input (Section 2, third and fourth paragraphs: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another. We now use the slightly modified learning rule:
[Equation image: media_image1.png (the modified learning rule of Hoffer et al.)]
effectively using a larger M · B batch at each step, that is composed of B samples augmented with M different transforms each" teaches determining a gradient of the loss function (objective function) for each of the M augmented instances (plurality of augmented versions) of the input sample (network input) with respect to the neural network parameters (wt) when the loss (objective) function is evaluated on a neural network output (yn) generated based on the augmented version of the network input (Ti(xn))); and
determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input (Section 2, third and fourth paragraphs: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another. We now use the slightly modified learning rule:
[Equation image: media_image1.png (the modified learning rule of Hoffer et al.)]
effectively using a larger M · B batch at each step, that is composed of B samples augmented with M different transforms each" and Section 2.2, first paragraph: "BA additionally averages the gradient over several transformed instances T (xn) of the same samples" teaches determining a combined gradient for the network input by averaging (combining) the gradients determined for the M instances (augmented versions) of the network input).
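The combined-gradient computation mapped above (one gradient per augmented instance, then an average over the M instances) can be sketched with a toy squared-error loss. The loss function, transforms, and parameter values below are hypothetical illustrations, not drawn from either reference.

```python
import numpy as np

def combined_gradient(x, y, w, transforms, grad_fn):
    """Average the per-augmentation gradients for one input, in the manner of
    batch augmentation: (1/M) * sum_i grad_fn(w, T_i(x), y)."""
    grads = [grad_fn(w, t(x), y) for t in transforms]
    return np.mean(grads, axis=0)

# Toy squared-error loss l(w, x, y) = 0.5 * (w @ x - y)**2,
# whose gradient with respect to w is (w @ x - y) * x.
grad_fn = lambda w, x, y: (w @ x - y) * x
# Three hypothetical augmentation transforms of the same input.
transforms = [lambda x: x, lambda x: x + 0.1, lambda x: x - 0.1]
g = combined_gradient(np.array([1.0, 2.0]), 1.0, np.array([0.5, 0.5]),
                      transforms, grad_fn)
```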
Suresh et al. is analogous to the claimed invention because it is directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input; determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input; and determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input as taught by Hoffer et al. to the disclosed invention of Suresh et al.
One of ordinary skill in the art would have been motivated to make this modification to "enable faster training and better generalization by allowing more computational resources to be used concurrently" (Hoffer et al. Abstract).
Regarding Claim 2,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Hoffer et al. further teaches wherein generating the plurality of augmented versions of the network input comprises: obtaining a plurality of augmentation transformations, comprising, for each augmentation transformation, randomly sampling parameters defining the augmentation transformation (Section 4.1, first and second paragraphs: "The Cifar10 dataset introduced by Krizhevsky (2009) is a popular image classification dataset containing 50, 000 training images, together with a 10, 000 test set. Each image is of size 32×32 and belongs to one of 10 classes of vehicles and animals. The Cifar100 dataset consists of the same number of training and validation images and the same spatial size, but with an increase to 100 in the number of possible classes for each image. For both datasets, we used the common data augmentation technique as described by He et al. (2016). In this method, the input image is padded with 4 zero-valued pixels at each side, top, and bottom. A random 32 × 32 part of the padded image is then cropped and with a 0.5 probability flipped horizontally. This augmentation method has a rather small space of possible transforms (9 · 9 · 2 = 162), and so it is quickly exhausted by even a M ≈ 10s of simultaneous instances" teaches randomly sampling a plurality of possible transforms (parameters defining the augmentation transformations) as transformations (augmentation transformations) to obtain M simultaneous instances (augmented versions) of the input (network input)); and
generating each augmented version of the network input by applying a respective augmentation transformation to the network input (Section 2, third paragraph: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another" teaches generating M augmented instances (augmented versions) of the input sample (network input) by applying a transformation Ti (augmentation transformation) to the input sample).
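The pad-crop-flip augmentation described in the quoted passage, with randomly sampled crop offsets and flip decision (9 · 9 · 2 = 162 possible transforms for a padding of 4), can be sketched as follows; the function names and single-channel image representation are illustrative assumptions.

```python
import numpy as np

def sample_transform(rng, pad=4, size=32):
    """Randomly sample the parameters defining one pad-crop-flip augmentation
    (crop offsets and flip decision), then return the resulting transform."""
    top = rng.integers(0, 2 * pad + 1)   # 9 possible vertical offsets for pad=4
    left = rng.integers(0, 2 * pad + 1)  # 9 possible horizontal offsets
    flip = rng.random() < 0.5            # horizontal flip with probability 0.5

    def transform(img):
        padded = np.pad(img, ((pad, pad), (pad, pad)), mode="constant")
        crop = padded[top:top + size, left:left + size]
        return crop[:, ::-1] if flip else crop

    return transform

rng = np.random.default_rng(0)
transforms = [sample_transform(rng) for _ in range(8)]  # M = 8 augmented versions
img = np.arange(32 * 32, dtype=float).reshape(32, 32)
augmented = [t(img) for t in transforms]
```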
Suresh et al. is analogous to the claimed invention because it is directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein generating the plurality of augmented versions of the network input comprises: obtaining a plurality of augmentation transformations, comprising, for each augmentation transformation, randomly sampling parameters defining the augmentation transformation; and generating each augmented version of the network input by applying a respective augmentation transformation to the network input as taught by Hoffer et al. to the disclosed invention of Suresh et al.
One of ordinary skill in the art would have been motivated to make this modification to "enable faster training and better generalization by allowing more computational resources to be used concurrently" (Hoffer et al. Abstract).
Regarding Claim 3,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Hoffer et al. further teaches wherein determining a gradient of the objective function for an augmented version of the network input comprises: processing the augmented version of the network input using the neural network, in accordance with current values of the neural network parameters of the neural network, to generate a corresponding network output (Section 2, third and fourth paragraphs: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another. We now use the slightly modified learning rule:
[Equation image: media_image1.png (the modified learning rule of Hoffer et al.)]
effectively using a larger M · B batch at each step, that is composed of B samples augmented with M different transforms each" teaches determining a gradient of the loss function (objective function) for each of the M augmented instances (plurality of augmented versions) of the input sample (network input) with respect to the neural network parameters (wt) when the loss (objective) function is evaluated on a neural network output (yn) generated based on the augmented version of the network input (Ti(xn)) (i.e. the augmented input Ti(xn))); and
determining gradients of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on the network output (Section 2, third and fourth paragraphs: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another. We now use the slightly modified learning rule:
[Equation image: media_image1.png (the modified learning rule of Hoffer et al.)]
effectively using a larger M · B batch at each step, that is composed of B samples augmented with M different transforms each" teaches determining a gradient of the loss function (objective function) for each of the M augmented instances (plurality of augmented versions) of the input sample (network input) with respect to the neural network parameters (wt) when the loss (objective) function is evaluated on a neural network output (yn) generated based on the augmented version of the network input (Ti(xn))).
Suresh et al. is analogous to the claimed invention because it is directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein determining a gradient of the objective function for an augmented version of the network input comprises: processing the augmented version of the network input using the neural network, in accordance with current values of the neural network parameters of the neural network, to generate a corresponding network output; and determining gradients of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on the network output as taught by Hoffer et al. to the disclosed invention of Suresh et al.
One of ordinary skill in the art would have been motivated to make this modification to "enable faster training and better generalization by allowing more computational resources to be used concurrently" (Hoffer et al. Abstract).
Regarding Claim 4,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Hoffer et al. further teaches wherein determining the combined gradient for the network input comprises: averaging the gradients determined for the plurality of augmented versions of the network input (Section 2, third and fourth paragraphs: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another. We now use the slightly modified learning rule:
[Equation image: media_image1.png (the modified learning rule of Hoffer et al.)]
effectively using a larger M · B batch at each step, that is composed of B samples augmented with M different transforms each" and Section 2.2, first paragraph: "BA additionally averages the gradient over several transformed instances T (xn) of the same samples" teaches determining a combined gradient for the network input by averaging the gradients determined for the M instances (augmented versions) of the network input).
Suresh et al. is analogous to the claimed invention because it is directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein determining the combined gradient for the network input comprises: averaging the gradients determined for the plurality of augmented versions of the network input as taught by Hoffer et al. to the disclosed invention of Suresh et al.
One of ordinary skill in the art would have been motivated to make this modification to "enable faster training and better generalization by allowing more computational resources to be used concurrently" (Hoffer et al. Abstract).
Regarding Claim 5,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Suresh et al. further teaches wherein for one or more of the network inputs, generating the clipped gradient for the network input comprises: scaling the combined gradient for the network input to cause a norm of the combined gradient for the network input to satisfy a clipping threshold (Fig. 3; [0084]: "The system clips the transformed gradient for the given model parameter such that a vector of the transformed gradients, i.e., a vector that includes the transformed gradients for all of the model parameters, has no greater than a fixed norm, e.g., 1, to generate clipped transformed gradients (step 306). In particular, if the norm of the gradient vector exceeds 1, the system divides each transformed gradient by the norm to generate a corresponding clipped transformed gradient. This results in a vector of the clipped transformed gradients having norm 1. If the norm of the gradient vector does not exceed 1, the system does not modify the transformed gradients and sets the clipped transformed gradients equal to the transformed gradients" teaches clipping the transformed gradient (combined gradient) to generate a clipped transformed gradient (clipped gradient), wherein the transformed gradient (combined gradient) can be divided (scaled) such that a norm of the transformed gradient (combined gradient) does not exceed 1 (clipping threshold)).
Regarding Claim 6,
Suresh et al. in view of Hoffer et al. teaches the method of claim 5.
In addition, Suresh et al. further teaches wherein scaling the combined gradient for the network input to cause the norm of the combined gradient for the network input to satisfy the clipping threshold comprises: scaling the combined gradient for the network input by a scaling factor defined as a ratio of: (i) the clipping threshold, and (ii) the norm of the combined gradient for the network input (Fig. 3; [0084]: "The system clips the transformed gradient for the given model parameter such that a vector of the transformed gradients, i.e., a vector that includes the transformed gradients for all of the model parameters, has no greater than a fixed norm, e.g., 1, to generate clipped transformed gradients (step 306). In particular, if the norm of the gradient vector exceeds 1, the system divides each transformed gradient by the norm to generate a corresponding clipped transformed gradient. This results in a vector of the clipped transformed gradients having norm 1. If the norm of the gradient vector does not exceed 1, the system does not modify the transformed gradients and sets the clipped transformed gradients equal to the transformed gradients" teaches clipping the transformed gradient (combined gradient) to generate a clipped transformed gradient (clipped gradient), wherein the transformed gradient (combined gradient) can be divided based on the ratio (scaling factor) of threshold of 1 (clipping threshold) and the norm of the transformed gradient (combined gradient) such that a norm of the transformed gradient (combined gradient) does not exceed 1 (clipping threshold)).
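The norm-based scaling mapped above can be sketched directly: when the gradient norm exceeds the clipping threshold, the gradient is multiplied by the ratio of the threshold to the norm. This is an illustrative sketch of the clipping step only; the function name and threshold value are assumptions.

```python
import numpy as np

def clip_by_ratio(grad, clip_threshold=1.0):
    """Scale the combined gradient by the ratio of (i) the clipping threshold
    to (ii) the gradient norm, so the norm never exceeds the threshold."""
    norm = np.linalg.norm(grad)
    if norm <= clip_threshold:
        return grad  # unmodified when the norm already satisfies the threshold
    return grad * (clip_threshold / norm)  # scaling factor = threshold / norm
```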
Regarding Claim 11,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Suresh et al. further teaches wherein the neural network is configured to process a network input comprising an image ([0016]: "In some cases, the machine learning model is a neural network that is configured to perform an image processing task, i.e., receive an input image and to process the input image to generate a model output for the input image" teaches that the neural network is configured to process an input image).
Regarding Claim 12,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Suresh et al. further teaches wherein the neural network is configured to process a network input comprising audio data ([0021]: "As another example, the task may be an audio processing task. For example, if the input to the machine learning model is a sequence representing a spoken utterance, the output generated by the machine learning model may be a score for each of a set of pieces of text, each score representing an estimated likelihood that the piece of text is the correct transcript for the utterance. As another example, the task may be a keyword spotting task where, if the input to the machine learning model is a sequence representing a spoken utterance, the output generated by the machine learning model can indicate whether a particular word or phrase (“hot word”) was spoken in the utterance. As another example, if the input to the machine learning model is a sequence representing a spoken utterance, the output generated by the machine learning model can identify the natural language in which the utterance was spoken" teaches that the neural network is configured to process input audio data).
Regarding Claim 13,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Suresh et al. further teaches wherein the neural network is configured to process a network input comprising electronic medical record data ([0026]-[0027]: "Many of these tasks require the machine learning model to be trained on user data, i.e., data that is provided by a user, e.g., a label for an image that is provided by the user, data that is specific to a particular user, e.g., text written by a particular user or a medical record of a user, or that is generated as a result of user interaction with a system, e.g., search queries submitted by the user to a search engine or an interaction or selection history for a user that is based on interactions with content items by the user. … This user data can include sensitive information that users may not wish to make public, e.g., data from typing histories, social networks, financial records, or medical records, and that users may not wish to be made transparent through predictions made by the model" teaches that the neural network is configured to process input data comprising user medical records (electronic medical record data)).
Regarding Claim 14,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Suresh et al. further teaches wherein the neural network is configured to process a network input comprising textual data ([0020]-[0022]: "As another example, if the input to the machine learning model is a sequence of text in one language, the output generated by the machine learning model may be a score for each of a set of pieces of text in another language, with each score representing an estimated likelihood that the piece of text in the other language is a proper translation of the input text into the other language. … As another example, the task can be a natural language processing or understanding task, e.g., an entailment task, a paraphrase task, a textual similarity task, a sentiment task, a sentence completion task, a grammaticality task, and so on, that operates on a sequence of text in some natural language" teaches that the neural network is configured to process an input sequence of text (textual data)).
Regarding Claim 15,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Suresh et al. further teaches wherein the neural network comprises one or more convolutional neural network layers ([0035]: "The machine learning model 110 can have any appropriate architecture that allows the model 110 to receive model inputs of the type required by the particular task and to generate model outputs of the form required for the particular task. Examples of machine learning models 110 that can be trained by the system 100 include fully-connected neural networks, convolutional neural networks, recurrent neural networks, attention-based neural networks, e.g., Transformers, and so on" teaches that the machine learning model (neural network) can be a convolutional neural network (i.e. comprises one or more convolutional neural network layers)).
Regarding Claim 16,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Suresh et al. further teaches wherein the objective function comprises a classification loss ([0057]: "The system computes, for each of the training examples and for each of the model parameters, a respective gradient of an objective function with respect to the model parameter (step 204). The objective function can be any appropriate objective function for the particular task that the model is being trained to perform and generally measures errors between an output generated by the machine learning model for a training input and a target output that should have been generated by the machine learning model for the training input. Examples of objective functions that may be appropriate for various tasks include cross-entropy losses, mean squared error losses, L2 distance losses, log likelihood objectives, and so on" teaches that the objective function comprises a loss function for the given task. [0015]-[0017]: "The machine learning model can be trained to perform any kind of machine learning task, i.e., can be configured to receive any kind of digital data input and to generate any kind of score, classification, or regression output based on the input. … In some cases, the machine learning model is a neural network that is configured to perform an image processing task, i.e., receive an input image and to process the input image to generate a model output for the input image. For example, the task may be image classification and the output generated by the machine learning model for a given image may be scores for each of a set of object categories, with each score representing an estimated likelihood that the image contains an image of an object belonging to the category. 
… As another example, if the inputs to the machine learning model are Internet resources (e.g., web pages), documents, or portions of documents or features extracted from Internet resources, documents, or portions of documents, the task can be to classify the resource or document, i.e., the output generated by the machine learning model for a given Internet resource, document, or portion of a document may be a score for each of a set of topics, with each score representing an estimated likelihood that the Internet resource, document, or document portion is about the topic" teaches that the task can be a classification task (e.g. for a classification task, the objective function comprises a classification loss)).
Regarding Claim 17,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Hoffer et al. further teaches wherein at each training iteration, the plurality of network inputs comprises at least 4000 network inputs (Section 4.1, first and second paragraphs: "The Cifar10 dataset introduced by Krizhevsky (2009) is a popular image classification dataset containing 50, 000 training images, together with a 10, 000 test set. Each image is of size 32×32 and belongs to one of 10 classes of vehicles and animals. The Cifar100 dataset consists of the same number of training and validation images and the same spatial size, but with an increase to 100 in the number of possible classes for each image. For both datasets, we used the common data augmentation technique as described by He et al. (2016). In this method, the input image is padded with 4 zero-valued pixels at each side, top, and bottom. A random 32 × 32 part of the padded image is then cropped and with a 0.5 probability flipped horizontally. This augmentation method has a rather small space of possible transforms (9 · 9 · 2 = 162), and so it is quickly exhausted by even a M ≈ 10s of simultaneous instances" teaches that training can use the Cifar10/100 datasets, each of which contains 50,000 training images, such that a training iteration drawing on these training inputs (network inputs) comprises well over 4000 inputs).
Suresh et al. is analogous to the claimed invention because it is directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein at each training iteration, the plurality of network inputs comprises at least 4000 network inputs as taught by Hoffer et al. to the disclosed invention of Suresh et al.
One of ordinary skill in the art would have been motivated to make this modification to "enable faster training and better generalization by allowing more computational resources to be used concurrently" (Hoffer et al. Abstract).
Regarding Claim 18,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
In addition, Hoffer et al. further teaches wherein generating a plurality of augmented versions of the network input comprises generating at least 8 augmented versions of the network input (Section 4.1, first and second paragraphs: "The Cifar10 dataset introduced by Krizhevsky (2009) is a popular image classification dataset containing 50, 000 training images, together with a 10, 000 test set. Each image is of size 32×32 and belongs to one of 10 classes of vehicles and animals. The Cifar100 dataset consists of the same number of training and validation images and the same spatial size, but with an increase to 100 in the number of possible classes for each image. For both datasets, we used the common data augmentation technique as described by He et al. (2016). In this method, the input image is padded with 4 zero-valued pixels at each side, top, and bottom. A random 32 × 32 part of the padded image is then cropped and with a 0.5 probability flipped horizontally. This augmentation method has a rather small space of possible transforms (9 · 9 · 2 = 162), and so it is quickly exhausted by even a M ≈ 10s of simultaneous instances" teaches generating M simultaneous instances (augmented versions) of the input (network input), with M being more than 8).
Suresh et al. is analogous to the claimed invention because it is directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein generating a plurality of augmented versions of the network input comprises generating at least 8 augmented versions of the network input as taught by Hoffer et al. to the disclosed invention of Suresh et al.
One of ordinary skill in the art would have been motivated to make this modification to "enable faster training and better generalization by allowing more computational resources to be used concurrently" (Hoffer et al. Abstract).
Regarding Claim 19,
Suresh et al. teaches one or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations (Fig. 1; [0004]: "a system implemented as computer programs on one or more computers in one or more locations that trains a machine learning model, e.g., a neural network, to perform a particular task. In particular, the system trains the machine learning model on training data that includes user data from multiple users in a manner that preserves the privacy of the users" teaches a system comprising one or more computers to execute one or more programs (instructions) to perform neural network training that preserves privacy of the users (e.g. privacy-sensitive training of a neural network). [0090]: "Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them" teaches a non-transitory storage medium storing computer program instructions for execution by a data processing apparatus (computer) for implementing the described embodiments (e.g. implementing the system)) comprising:
training a set of neural network parameters of the neural network on a set of training data over a plurality of training iterations to optimize an objective function (Fig. 2; [0050]: "The system can repeatedly perform the process 200 on different batches of training data to determine trained values of the model parameters, i.e., by repeatedly updating the current values of the model parameters. For example, the system can continue performing the process 200 until a threshold number of iterations of the process have been performed, until a threshold amount of time has elapsed, or until the values of the model parameters have converged" teaches an iterative training process to train model parameters (neural network parameters) until the model parameters converge (e.g. are optimized). Fig. 2; [0057]-[0062]: "The system computes, for each of the training examples and for each of the model parameters, a respective gradient of an objective function with respect to the model parameter (step 204). The objective function can be any appropriate objective function for the particular task that the model is being trained to perform and generally measures errors between an output generated by the machine learning model for a training input and a target output that should have been generated by the machine learning model for the training input. Examples of objective functions that may be appropriate for various tasks include cross-entropy losses, mean squared error losses, L2 distance losses, log likelihood objectives, and so on. Thus, for any given model parameter, the system computes a respective gradient for each training example in the batch. … The system determines, for each of the training examples and for each of the model parameters, a respective privacy preserving noisy gradient (step 206). 
… The system determines, for each of the model parameters, a respective privacy preserving update for the model parameter from the privacy preserving noisy gradients for the model parameter for the plurality of training examples (step 208)" teaches training model parameters (neural network parameters) on a set of training examples (set of training data) to optimize an objective function for the model parameter update), comprising, at each training iteration:
sampling a plurality of network inputs from the set of training data (Fig. 1; [0031]-[0033]: "The model training system 100 is a system that obtains training data 102 for training a machine learning model 110 to perform a particular task. The training data 102 includes user data, i.e., at least some of the training examples in the training data 102 are derived from users, i.e., are generated from data that is provided by a user, data that is specific to a particular user, or data that is generated as a result of user interaction with a system. … Generally, the training data 102 includes a set of training examples, with each training example including a training input and, for each training input, a respective target output that should be generated by the machine learning model to perform the particular task. … The system 100 can receive the training data 102 in any of a variety of ways … the system 100 can receive an input from a user specifying which data that is already maintained by the system 100 should be used for training the machine learning model" teaches sampling model inputs (network inputs) for training the machine learning model (neural network) from the set of training examples (set of training data));
determining a clipped gradient for each network input of the plurality of network inputs (Fig. 2; [0057]-[0060]: "The system computes, for each of the training examples and for each of the model parameters, a respective gradient of an objective function with respect to the model parameter (step 204). The objective function can be any appropriate objective function for the particular task that the model is being trained to perform and generally measures errors between an output generated by the machine learning model for a training input and a target output that should have been generated by the machine learning model for the training input. Examples of objective functions that may be appropriate for various tasks include cross-entropy losses, mean squared error losses, L2 distance losses, log likelihood objectives, and so on. Thus, for any given model parameter, the system computes a respective gradient for each training example in the batch. … The system determines, for each of the training examples and for each of the model parameters, a respective privacy preserving noisy gradient (step 206). … Generally, to perform the modification, the system adaptively clips the respective gradient for the model parameter based on (i) the mean noisy gradient estimate for the model parameter and (ii) the standard deviation noisy gradient estimate for the model parameter and then adds noise to the adaptively clipped gradient" teaches generating a gradient for each training example and model parameter (network inputs) and then adaptively clipping the gradient to generate a privacy preserving noisy gradient (clipped gradient)), comprising, for each network input of the plurality of network inputs:
generating the clipped gradient for the network input by clipping the combined gradient for the network input (Fig. 3; [0084]: "The system clips the transformed gradient for the given model parameter such that a vector of the transformed gradients, i.e., a vector that includes the transformed gradients for all of the model parameters, has no greater than a fixed norm, e.g., 1, to generate clipped transformed gradients (step 306). In particular, if the norm of the gradient vector exceeds 1, the system divides each transformed gradient by the norm to generate a corresponding clipped transformed gradient. This results in a vector of the clipped transformed gradients having norm 1. If the norm of the gradient vector does not exceed 1, the system does not modify the transformed gradients and sets the clipped transformed gradients equal to the transformed gradients" teaches clipping the transformed gradient (combined gradient) to generate a clipped transformed gradient (clipped gradient)); and
updating the neural network parameters using the clipped gradients for the network inputs of the plurality of network inputs (Fig. 2; [0062]: "The system determines, for each of the model parameters, a respective privacy preserving update for the model parameter from the privacy preserving noisy gradients for the model parameter for the plurality of training examples (step 208)" teaches updating the model parameters (neural network parameters) using the privacy preserving noisy gradients (clipped gradients) for the plurality of training examples (network inputs)).
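For illustration only (not code from the cited reference), the clipping rule quoted from Suresh et al. paragraph [0084] — divide by the norm only when the gradient vector's norm exceeds a fixed bound such as 1, otherwise leave the gradient unchanged — can be sketched as:

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    """Clip a per-example gradient vector to a fixed norm bound.

    If ||grad|| exceeds max_norm, rescale so the clipped vector has
    norm exactly max_norm; otherwise return the gradient unchanged,
    mirroring step 306 as quoted above.
    """
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

big = np.array([3.0, 4.0])     # norm 5 -> rescaled to norm 1
small = np.array([0.3, 0.4])   # norm 0.5 -> left unchanged
clipped_big = clip_gradient(big)
clipped_small = clip_gradient(small)
```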
Suresh et al. does not appear to explicitly teach generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input; determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input; and determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input.
However, Hoffer et al. teaches generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input (Section 2, third paragraph: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another" teaches generating M augmented instances (augmented versions) of the input sample (network input) by applying a transformation Ti (augmentation transformation) to the input sample);
determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input (Section 2, third and fourth paragraphs: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another. We now use the slightly modified learning rule:
[Equation image: w_{t+1} = w_t − η · (1/(M·B)) Σ_{n∈B} Σ_{i=1}^{M} ∇_{w_t} ℓ(f(T_i(x_n); w_t), y_n)]
effectively using a larger M · B batch at each step, that is composed of B samples augmented with M different transforms each" teaches determining a gradient of the loss function (objective function) for each of the M augmented instances (plurality of augmented versions) of the input sample (network input) with respect to the neural network parameters (wt) when the loss (objective) function is evaluated on a neural network output (yn) generated based on the augmented version of the network input (Ti(xn))); and
determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input (Section 2, third and fourth paragraphs: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another. We now use the slightly modified learning rule:
[Equation image: w_{t+1} = w_t − η · (1/(M·B)) Σ_{n∈B} Σ_{i=1}^{M} ∇_{w_t} ℓ(f(T_i(x_n); w_t), y_n)]
effectively using a larger M · B batch at each step, that is composed of B samples augmented with M different transforms each" and Section 2.2, first paragraph: "BA additionally averages the gradient over several transformed instances T (xn) of the same samples" teaches determining a combined gradient for the network input by averaging (combining) the gradients determined for the M instances (augmented versions) of the network input).
Suresh et al. is analogous to the claimed invention because it is directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input; determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input; and determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input as taught by Hoffer et al. to the disclosed invention of Suresh et al.
One of ordinary skill in the art would have been motivated to make this modification to "enable faster training and better generalization by allowing more computational resources to be used concurrently" (Hoffer et al. Abstract).
Regarding Claim 20,
Suresh et al. teaches a system comprising one or more computers and one or more storage devices communicatively coupled to the one or more computers, wherein the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers to perform operations (Fig. 1; [0004]: "a system implemented as computer programs on one or more computers in one or more locations that trains a machine learning model, e.g., a neural network, to perform a particular task. In particular, the system trains the machine learning model on training data that includes user data from multiple users in a manner that preserves the privacy of the users" teaches a system comprising one or more computers to execute one or more programs (instructions) to perform neural network training that preserves privacy of the users (e.g. privacy-sensitive training of a neural network). [0090]: "Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them" teaches that the computer programs (instructions) are stored in a storage medium (storage device) communicatively coupled to a computer for execution (e.g. to perform the operations)) comprising:
training a set of neural network parameters of the neural network on a set of training data over a plurality of training iterations to optimize an objective function (Fig. 2; [0050]: "The system can repeatedly perform the process 200 on different batches of training data to determine trained values of the model parameters, i.e., by repeatedly updating the current values of the model parameters. For example, the system can continue performing the process 200 until a threshold number of iterations of the process have been performed, until a threshold amount of time has elapsed, or until the values of the model parameters have converged" teaches an iterative training process to train model parameters (neural network parameters) until the model parameters converge (e.g. are optimized). Fig. 2; [0057]-[0062]: "The system computes, for each of the training examples and for each of the model parameters, a respective gradient of an objective function with respect to the model parameter (step 204). The objective function can be any appropriate objective function for the particular task that the model is being trained to perform and generally measures errors between an output generated by the machine learning model for a training input and a target output that should have been generated by the machine learning model for the training input. Examples of objective functions that may be appropriate for various tasks include cross-entropy losses, mean squared error losses, L2 distance losses, log likelihood objectives, and so on. Thus, for any given model parameter, the system computes a respective gradient for each training example in the batch. … The system determines, for each of the training examples and for each of the model parameters, a respective privacy preserving noisy gradient (step 206). 
… The system determines, for each of the model parameters, a respective privacy preserving update for the model parameter from the privacy preserving noisy gradients for the model parameter for the plurality of training examples (step 208)" teaches training model parameters (neural network parameters) on a set of training examples (set of training data) to optimize an objective function for the model parameter update), comprising, at each training iteration:
sampling a plurality of network inputs from the set of training data (Fig. 1; [0031]-[0033]: "The model training system 100 is a system that obtains training data 102 for training a machine learning model 110 to perform a particular task. The training data 102 includes user data, i.e., at least some of the training examples in the training data 102 are derived from users, i.e., are generated from data that is provided by a user, data that is specific to a particular user, or data that is generated as a result of user interaction with a system. … Generally, the training data 102 includes a set of training examples, with each training example including a training input and, for each training input, a respective target output that should be generated by the machine learning model to perform the particular task. … The system 100 can receive the training data 102 in any of a variety of ways … the system 100 can receive an input from a user specifying which data that is already maintained by the system 100 should be used for training the machine learning model" teaches sampling model inputs (network inputs) for training the machine learning model (neural network) from the set of training examples (set of training data));
determining a clipped gradient for each network input of the plurality of network inputs (Fig. 2; [0057]-[0060]: "The system computes, for each of the training examples and for each of the model parameters, a respective gradient of an objective function with respect to the model parameter (step 204). The objective function can be any appropriate objective function for the particular task that the model is being trained to perform and generally measures errors between an output generated by the machine learning model for a training input and a target output that should have been generated by the machine learning model for the training input. Examples of objective functions that may be appropriate for various tasks include cross-entropy losses, mean squared error losses, L2 distance losses, log likelihood objectives, and so on. Thus, for any given model parameter, the system computes a respective gradient for each training example in the batch. … The system determines, for each of the training examples and for each of the model parameters, a respective privacy preserving noisy gradient (step 206). … Generally, to perform the modification, the system adaptively clips the respective gradient for the model parameter based on (i) the mean noisy gradient estimate for the model parameter and (ii) the standard deviation noisy gradient estimate for the model parameter and then adds noise to the adaptively clipped gradient" teaches generating a gradient for each training example and model parameter (network inputs) and then adaptively clipping the gradient to generate a privacy preserving noisy gradient (clipped gradient)), comprising, for each network input of the plurality of network inputs:
generating the clipped gradient for the network input by clipping the combined gradient for the network input (Fig. 3; [0084]: "The system clips the transformed gradient for the given model parameter such that a vector of the transformed gradients, i.e., a vector that includes the transformed gradients for all of the model parameters, has no greater than a fixed norm, e.g., 1, to generate clipped transformed gradients (step 306). In particular, if the norm of the gradient vector exceeds 1, the system divides each transformed gradient by the norm to generate a corresponding clipped transformed gradient. This results in a vector of the clipped transformed gradients having norm 1. If the norm of the gradient vector does not exceed 1, the system does not modify the transformed gradients and sets the clipped transformed gradients equal to the transformed gradients" teaches clipping the transformed gradient (combined gradient) to generate a clipped transformed gradient (clipped gradient)); and
updating the neural network parameters using the clipped gradients for the network inputs of the plurality of network inputs (Fig. 2; [0062]: "The system determines, for each of the model parameters, a respective privacy preserving update for the model parameter from the privacy preserving noisy gradients for the model parameter for the plurality of training examples (step 208)" teaches updating the model parameters (neural network parameters) using the privacy preserving noisy gradients (clipped gradients) for the plurality of training examples (network inputs)).
Suresh et al. does not appear to explicitly teach generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input; determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input; and determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input.
However, Hoffer et al. teaches generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input (Section 2, third paragraph: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another" teaches generating M augmented instances (augmented versions) of the input sample (network input) by applying a transformation Ti (augmentation transformation) to the input sample);
determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input (Section 2, third and fourth paragraphs: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another. We now use the slightly modified learning rule:
[Equation image: w_{t+1} = w_t − η · (1/(M·B)) Σ_{n∈B} Σ_{i=1}^{M} ∇_{w_t} ℓ(f(T_i(x_n); w_t), y_n)]
effectively using a larger M · B batch at each step, that is composed of B samples augmented with M different transforms each" teaches determining a gradient of the loss function (objective function) for each of the M augmented instances (plurality of augmented versions) of the input sample (network input) with respect to the neural network parameters (wt) when the loss (objective) function is evaluated on a neural network output (yn) generated based on the augmented version of the network input (Ti(xn))); and
determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input (Section 2, third and fourth paragraphs: "We suggest to introduce M multiple instances of the same input sample by applying the transformation Ti, here denoted by subscript i ∈ [M] to highlight the fact that they are different from one another. We now use the slightly modified learning rule:
[Equation image: w_{t+1} = w_t − η · (1/(M·B)) Σ_{n∈B} Σ_{i=1}^{M} ∇_{w_t} ℓ(f(T_i(x_n); w_t), y_n)]
effectively using a larger M · B batch at each step, that is composed of B samples augmented with M different transforms each" and Section 2.2, first paragraph: "BA additionally averages the gradient over several transformed instances T (xn) of the same samples" teaches determining a combined gradient for the network input by averaging (combining) the gradients determined for the M instances (augmented versions) of the network input).
Suresh et al. is analogous to the claimed invention because it is directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate generating a plurality of augmented versions of the network input, wherein each augmented version of the network input results from applying a respective augmentation transformation to the network input; determining, for each of the plurality of augmented versions of the network input, a gradient of the objective function with respect to the neural network parameters of the neural network when the objective function is evaluated on a network output generated by the neural network by processing the augmented version of the network input; and determining a combined gradient for the network input by combining the gradients determined for the plurality of augmented versions of the network input as taught by Hoffer et al. to the disclosed invention of Suresh et al.
One of ordinary skill in the art would have been motivated to make this modification to "enable faster training and better generalization by allowing more computational resources to be used concurrently" (Hoffer et al. Abstract).
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Suresh et al. (US 2021/0049298 A1) in view of Hoffer et al. ("Augment your batch: better training with larger batches") and further in view of Abay et al. ("Privacy Preserving Synthetic Data Release Using Deep Learning").
Regarding Claim 7,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
Suresh et al. in view of Hoffer et al. does not appear to explicitly teach further comprising, before updating the neural network parameters using the clipped gradients for the network inputs of the plurality of network inputs: generating a set of noise parameters, comprising randomly sampling the noise parameters from a noise distribution; and applying the noise parameters to the clipped gradients for the network inputs of the plurality of network inputs.
However, Abay et al. teaches further comprising, before updating the neural network parameters using the clipped gradients for the network inputs of the plurality of network inputs: generating a set of noise parameters, comprising randomly sampling the noise parameters from a noise distribution (Algorithm 2, lines 9-10; Section 4.2, last paragraph: "
[Algorithm image (Abay et al., Algorithm 2, lines 9–10): Gaussian noise z ~ N(0, (σC)²) is sampled and added to the clipped gradients]
After clipping the gradients, noise is sampled from the Gaussian distribution with zero mean and standard deviation of σC and added to the previously clipped gradients (Line 9–10 in Algorithm 2)" teaches that before the model parameters (neural network parameters) are updated using the clipped gradients, noise parameters z are randomly sampled from a noise distribution N); and
applying the noise parameters to the clipped gradients for the network inputs of the plurality of network inputs (Algorithm 2, lines 9-10; Section 4.2, last paragraph: "
[Algorithm image (Abay et al., Algorithm 2, lines 9–10): Gaussian noise z ~ N(0, (σC)²) is sampled and added to the clipped gradients]
After clipping the gradients, noise is sampled from the Gaussian distribution with zero mean and standard deviation of σC and added to the previously clipped gradients (Line 9–10 in Algorithm 2)" teaches that the noise parameters z are applied to the clipped gradients for the network inputs).
Suresh et al. and Abay et al. are analogous to the claimed invention because they are directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate further comprising, before updating the neural network parameters using the clipped gradients for the network inputs of the plurality of network inputs: generating a set of noise parameters, comprising randomly sampling the noise parameters from a noise distribution; and applying the noise parameters to the clipped gradients for the network inputs of the plurality of network inputs as taught by Abay et al. to the disclosed invention of Suresh et al. in view of Hoffer et al.
One of ordinary skill in the art would have been motivated to make this modification to "improve the optimization process since it reduces the sensitivity of the gradient present at each instance" (Abay et al. Section 4.2, second paragraph).
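For illustration only (not code from the cited reference), the noise step quoted from Abay et al. Section 4.2 — sample Gaussian noise with zero mean and standard deviation σC and add it to the previously clipped gradients — can be sketched as:

```python
import numpy as np

def add_privacy_noise(clipped_grads, sigma, clip_norm, rng):
    """Add Gaussian noise with standard deviation sigma * C to clipped gradients.

    clipped_grads: (batch, dim) array of per-example clipped gradients.
    The noise scale sigma * clip_norm follows the sigma*C standard
    deviation described in Section 4.2 of Abay et al.
    """
    noise = rng.normal(0.0, sigma * clip_norm, size=clipped_grads.shape)
    return clipped_grads + noise

rng = np.random.default_rng(0)
grads = np.zeros((4, 3))  # four clipped per-example gradients, for illustration
noisy = add_privacy_noise(grads, sigma=1.0, clip_norm=1.0, rng=rng)
```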
Regarding Claim 8,
Suresh et al. in view of Hoffer et al. and further in view of Abay et al. teaches the method of claim 7.
In addition, Abay et al. further teaches wherein the noise distribution comprises a Gaussian noise distribution (Section 4.2, last paragraph: "After clipping the gradients, noise is sampled from the Gaussian distribution with zero mean and standard deviation of σC and added to the previously clipped gradients (Line 9–10 in Algorithm 2)" teaches that the noise distribution is a Gaussian noise distribution).
Suresh et al. and Abay et al. are analogous to the claimed invention because they are directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the noise distribution comprises a Gaussian noise distribution as taught by Abay et al. to the disclosed invention of Suresh et al. in view of Hoffer et al.
One of ordinary skill in the art would have been motivated to make this modification to "improve the optimization process since it reduces the sensitivity of the gradient present at each instance" (Abay et al. Section 4.2, second paragraph).
Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Suresh et al. (US 2021/0049298 A1) in view of Hoffer et al. ("Augment your batch: better training with larger batches") and further in view of van der Maaten et al. ("The Trade-Offs of Private Prediction").
Regarding Claim 9,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
Suresh et al. in view of Hoffer et al. does not appear to explicitly teach wherein the neural network does not include any batch normalization layers.
However, van der Maaten et al. teaches wherein the neural network does not include any batch normalization layers (Section 4.3, first paragraph: "4.3 Results: Convolutional Networks … We also evaluated the DP-SGD and subsample-and-aggregate methods2 on the CIFAR-10 dataset [24] using a ResNet-20 model (with “type A” blocks [17]) as φ’(·). To facilitate the computation of per-example gradients in DP-SGD, we replaced batch normalization by group normalization" teaches that batch normalization layers in the convolutional networks have been replaced with group normalization layers (i.e. the neural network does not include any batch normalization layers)).
Suresh et al. and van der Maaten et al. are analogous to the claimed invention because they are directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the neural network does not include any batch normalization layers, as taught by van der Maaten et al., into the disclosed invention of Suresh et al. in view of Hoffer et al.
One of ordinary skill in the art would have been motivated to make this modification "to facilitate the computation of per-example gradients in Differentially private SGD (DP-SGD)" (van der Maaten et al. Section 4.3, first paragraph).
Regarding Claim 10,
Suresh et al. in view of Hoffer et al. teaches the method of claim 1.
Suresh et al. in view of Hoffer et al. does not appear to explicitly teach wherein the neural network includes group normalization layers.
However, van der Maaten et al. teaches wherein the neural network includes group normalization layers (Section 4.3, first paragraph: "4.3 Results: Convolutional Networks … We also evaluated the DP-SGD and subsample-and-aggregate methods on the CIFAR-10 dataset [24] using a ResNet-20 model (with “type A” blocks [17]) as φ’(·). To facilitate the computation of per-example gradients in DP-SGD, we replaced batch normalization by group normalization" teaches that batch normalization layers in the convolutional networks have been replaced with group normalization layers (i.e. the neural network includes group normalization layers)).
Suresh et al. and van der Maaten et al. are analogous to the claimed invention because they are directed towards machine learning model privacy training.
Hoffer et al. is analogous to the claimed invention because it is directed towards machine learning model training using data augmentation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the neural network includes group normalization layers, as taught by van der Maaten et al., into the disclosed invention of Suresh et al. in view of Hoffer et al.
One of ordinary skill in the art would have been motivated to make this modification "to facilitate the computation of per-example gradients in Differentially private SGD (DP-SGD)" (van der Maaten et al. Section 4.3, first paragraph).
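For context on the cited motivation: unlike batch normalization, group normalization computes statistics per example within channel groups, so no cross-example (batch) statistics are involved, which is why it is compatible with per-example gradient computation in DP-SGD. The following is an illustrative sketch of a group normalization forward pass only; it is not drawn from any cited reference's code, and the function name and parameters are illustrative.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Group normalization over an (N, C, H, W) activation tensor.
    Mean and variance are computed per example within each channel group,
    so each example is normalized independently of the rest of the batch."""
    n, c, h, w = x.shape
    x = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    return x.reshape(n, c, h, w)
```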
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN J HALES whose telephone number is (571) 272-0878. The examiner can normally be reached M-F 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRIAN J HALES/Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125