Prosecution Insights
Last updated: April 19, 2026
Application No. 18/193,225

APPARATUS AND METHOD FOR JOINT TRAINING OF MULTIPLE NEURAL NETWORKS

Non-Final OA: §101, §102, §103, §112
Filed: Mar 30, 2023
Examiner: CADY, MATTHEW ALAN
Art Unit: 2145
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nokia Technologies Oy
OA Round: 1 (Non-Final)
Grant Probability: Favorable
Projected OA Rounds: 1-2
Projected Time to Grant: 3y 3m
Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved); -55.0% vs TC average
Interview Lift: +0.0% (minimal lift in resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Career History: 11 total applications across all art units; 11 currently pending

Statute-Specific Performance

§101: 24.3% (-15.7% vs TC avg)
§102: 13.5% (-26.5% vs TC avg)
§103: 43.2% (+3.2% vs TC avg)
§112: 18.9% (-21.1% vs TC avg)

Deltas are vs. Tech Center average estimates • Based on career data from 0 resolved cases

Office Action (§101, §102, §103, §112)
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 6 and 7 are objected to because of the following informalities: the excerpt "the apparatus is caused to iteratively perform following until a stopping criterion is met:" in both of these claims is grammatically incorrect. It should be changed to "the apparatus is caused to iteratively perform the following until a stopping criterion is met:". Appropriate correction is required.

Claims 16 and 17 are objected to because of the following informalities: the excerpt "the method comprises iteratively performing following until a stopping criterion is met:" in both of these claims is grammatically incorrect. It should be changed to "the method comprises iteratively performing the following until a stopping criterion is met:". Appropriate correction is required.

Claim 20 is objected to because of the following informalities: the excerpt "wherein to random initialize" is grammatically incorrect. It should read "wherein to randomly initialize". Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 3 recites the term "the first subset" three times. There is insufficient antecedent basis for this term in the claim.
Claim 3 recites the terms "each input sample" and "the input sample" two times each. There is insufficient antecedent basis for these terms in the claim.

Claim 4 recites the term "the first subset" three times. There is insufficient antecedent basis for this term in the claim.

Claim 4 recites the term "each input sample" once and "the input sample" two times. There is insufficient antecedent basis for these terms in the claim.

Claim 9 recites the terms "the first set" and "the updated first set". There is insufficient antecedent basis for these terms in the claim.

Claim 13 recites the term "the first subset" two times. There is insufficient antecedent basis for this term in the claim.

Claim 13 recites the term "each input sample" once and "the input sample" two times. There is insufficient antecedent basis for these terms in the claim.

Claim 14 recites the term "the first subset" two times. There is insufficient antecedent basis for this term in the claim.

Claim 14 recites the term "each input sample" once and "the input sample" two times. There is insufficient antecedent basis for these terms in the claim.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1

According to the first part of the analysis, in the instant case, claims 1-10 are directed to an apparatus and claims 11-20 are directed to a method. Each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Regarding Claim 1

Step 2A Prong One

determine a plurality of weights used to calculate a weighted loss based at least on a performance of a plurality of neural networks on one or more training samples; (This step for determining weights based on performance of neural networks to calculate a weighted loss is understood to be a mental process)

Step 2A Prong Two

An apparatus comprising at least one processor; and at least one non-transitory memory comprising computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: (This step performing the abstract idea on a generic computing system is mere instructions to apply a judicial exception. See MPEP § 2106.05(f))

and jointly train the plurality of neural networks, wherein at each training iteration the plurality of neural networks are trained based at least on the weighted loss. (This step for training a plurality of neural networks based on a weighted loss is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as determining weights to be used in a calculation, while the additional elements of performing the abstract idea using a generic computing system and training a plurality of machine learning algorithms using a loss function are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).
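For orientation on what claim 1 actually covers, the performance-weighted joint-training step can be sketched as a minimal toy in Python. Everything here is an illustrative assumption, not the applicant's implementation: each "network" is a single scalar parameter, and the softmax-over-negative-losses weighting is one arbitrary way to make the weights depend on performance.

```python
import math

def performance_weights(losses):
    """Map per-network losses to weights: lower loss (better performance)
    gets a larger weight. Softmax over negative losses is an assumed
    choice; claim 1 requires only that the weights reflect performance."""
    exps = [math.exp(-l) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]

def joint_training_step(params, target, lr=0.1):
    """One joint-training iteration on a toy model: each 'network' is a
    scalar parameter p fitting `target`, with loss (p - target)**2.
    Every network is updated against the shared weighted loss."""
    losses = [(p - target) ** 2 for p in params]
    weights = performance_weights(losses)
    # Gradient of w_i * (p_i - target)**2 w.r.t. p_i is
    # 2 * w_i * (p_i - target), treating weights as constants per iteration.
    return [p - lr * 2 * w * (p - target) for p, w in zip(params, weights)]
```

Iterating `joint_training_step` drives the total loss down while the weights shift toward the better-performing networks each round, which is the coupling between "performance" and "weighted loss" that the claim recites.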
Regarding Claim 2

Step 2A Prong One

The apparatus of claim 1, wherein the apparatus is caused to determine the weight further based at least on a value, wherein the value changes based on a predefined schedule, a training iteration number, or a value derived from a training iteration number. (This step for determining the weight based on a value is understood to be a mental process)

Step 2A Prong Two

The claim does not include additional elements, when considered separately and in combination, that integrate the judicial exception into a practical application.

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as determining weights based on a value without any technological improvement or inventive step.

Regarding Claim 3

Step 2A Prong One

compute a score for each neural network of at least one of the second subset of neural networks or the third subset of neural networks based at least on a metric, (This step for computing a score based on a metric is understood to be a mental process) wherein the metric is computed based at least on an output of each of the neural networks of the second subset of neural network or the third subset of neural networks and on a reference sample; (This step for computing a metric based on machine learning output is understood to be a mental process)

and select a neural network from at least one of the second subset of neural networks or the third subset neural network that yields a predetermined score as an optimal neural network for the input sample. (This step for selecting a neural network based on a score is understood as a mental process)

Step 2A Prong Two

The apparatus of claim 1, wherein the apparatus is further caused to use the plurality of neural networks or the first subset of the plurality of neural networks at an inference time, for each input sample, and wherein to use the plurality of neural networks or the first subset of the plurality of neural networks at the inference time, for each input sample, the apparatus is caused to: (This step for applying the judicial exception using a computing system is mere instructions to apply an exception. See MPEP § 2106.05(f))

apply at least one of a second subset of the plurality of neural networks or a third subset of the first subset of neural networks to the input sample to obtain at least one of an output sample for each neural network in the second subset of neural networks or the third subset of neural networks, wherein the second subset and third subset comprise at least two neural networks; (This step for providing input to and receiving output from neural networks is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as computing a score, computing a metric, and selecting a neural network based on the score, while the additional elements of applying the process to a generic computing system and obtaining output from generic neural networks based on input data are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).
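The inference-time score-and-select procedure recited in claim 3 (apply a subset of at least two networks to the input, score each output against a reference sample, pick the best) can be sketched as follows. All names are hypothetical, and reading "yields a predetermined score" as "yields the best score under the metric" is an assumption:

```python
def select_optimal_network(networks, input_sample, reference, metric):
    """Apply each network in a subset (>= 2) to the input sample, score
    each output against the reference sample, and select the network
    with the best (here: lowest) score as the 'optimal' network.

    networks - iterable of callables (the second/third subset)
    metric   - metric(output, reference) -> score, lower is better
    Returns (best_network, best_output, best_score)."""
    scored = []
    for net in networks:
        output = net(input_sample)          # apply the network
        score = metric(output, reference)   # score its output vs. reference
        scored.append((score, net, output))
    best_score, best_net, best_out = min(scored, key=lambda t: t[0])
    return best_net, best_out, best_score
```

In the claim-5 context the identity of the selected network would then be signaled from the encoder side to the decoder side; this sketch covers only the selection itself.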
Regarding Claim 4

Step 2A Prong One

compute a score for each neural network of at least one of the second subset of neural networks or the third subset of neural network based at least on an output of the each of the neural networks of the second subset of neural networks or the third subset of neural network and on an auxiliary neural network; (This step for computing a score based on output and an auxiliary neural network is determined to be a mental process)

select a neural network from at least one of the second subset of neural networks or the third subset of neural networks that yields a predetermined score as an optimal neural network for the input sample; (This step for selecting a neural network based on a predetermined score is determined to be a mental process)

Step 2A Prong Two

The apparatus of claim 1, wherein the apparatus is further caused to use the plurality of neural networks or the first subset of the plurality of neural networks at an inference time, and wherein to use the plurality of neural networks or the first subset of the plurality of neural networks at the inference time, for the each input sample, the apparatus is caused to: (This step for applying the mental process using a generic computing system is mere instructions to apply an exception. See MPEP § 2106.05(f))

apply at least one of a second subset of the plurality of neural networks or a third subset of the first subset of neural networks to the input sample to obtain at least one of an output sample for each neural network in the second subset or the third subset, wherein the second subset of neural network and the third subset of neural networks comprise at least two neural networks; (This step for obtaining output from neural networks based on input is extra-solution activity. See MPEP § 2106.05(g))

and train the auxiliary neural network during a training phase. (This step for training a generic neural network is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as computing a score and selecting a neural network while the additional elements of applying the process on a generic computing system, obtaining outputs from generic neural networks based on input data, and training a neural network are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding Claim 5

Step 2A Prong One

(Claim 5 depends on claim 3, which has been determined to recite abstract ideas including mental processes. Therefore, claim 5 also recites an abstract idea.)

Step 2A Prong Two

The apparatus of claim 3, wherein the apparatus is further caused to signal information of the optimal neural network from an encoder-side device to a decoder-side device. (This step for signaling information between devices is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes while the additional element of signaling information between devices is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).
Regarding Claim 6

Step 2A Prong One

compute a loss; (This step for computing a loss is understood to be a mental process)

Step 2A Prong Two

The apparatus of claim 1, wherein the apparatus is further caused to jointly overfit a first subset of neural networks of the plurality of neural networks, and wherein to jointly overfit the first subset of neural networks, the apparatus is caused to iteratively perform following until a stopping criterion is met: (This step for overfitting the neural networks using the computed loss does integrate the computed loss into a practical application; the applicant's specification states that the purpose of overfitting is to "improve the rate-distortion performance" ([0203]). However, there is no integration of the abstract ideas inherited from claim 1, so claim 6 still fails Step 2A Prong Two.)

use a decoded video as input to the plurality of neural networks; compute an output for each of the first subset of neural networks; (This step for computing output for each neural network based on input data is extra-solution activity. See MPEP § 2106.05(g))

backpropagate the loss with respect to at least one parameter of the one or more parameters of the first subset of neural networks; (This step for backpropagation is extra-solution activity. See MPEP § 2106.05(g))

and update the at least one parameter based at least on the computed loss. (This step for updating parameters is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes while the additional elements of overfitting generic neural networks, computing output from generic neural networks based on input, backpropagating loss, and updating parameters are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding Claim 7

Step 2A Prong One

compute a loss; (This step for computing a loss is understood to be a mental process)

Step 2A Prong Two

The apparatus of claim 1, wherein the apparatus is further caused to jointly overfit a first subset of neural networks of the plurality of neural networks, and wherein to jointly train the plurality of neural networks, the apparatus is caused to iteratively perform following until a stopping criterion is met: (This step for overfitting the neural networks using the computed loss does integrate the computed loss into a practical application; the applicant's specification states that the purpose of overfitting is to "improve the rate-distortion performance" ([0203]). However, there is no integration of the abstract ideas inherited from claim 1, so claim 7 still fails Step 2A Prong Two.)

use a decoded video as input to the plurality of neural networks; compute an output for each of the plurality of neural networks; (This step for receiving output from neural networks based on input is extra-solution activity. See MPEP § 2106.05(g))

backpropagate the loss with respect to at least one parameter of the one or more parameters of the plurality of neural networks; (This step for backpropagation is extra-solution activity. See MPEP § 2106.05(g))

and update the at least one parameter based at least on the computed loss. (This step for updating parameters is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes while the additional elements of overfitting neural networks by receiving output from generic neural networks based on input, backpropagating loss, and updating parameters are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding Claim 8

Step 2A Prong One

The apparatus of claim 6, wherein the loss is the weighted loss, and wherein the weighted loss is computed based at least on the plurality of weights, and wherein each of the plurality of weights is computed based at least on a performance of the plurality of neural networks on the one or more training samples. (This step for computing the weighted loss using data including computed weights is understood as a mental process)

Step 2A Prong Two

The claim does not include additional elements, when considered separately and in combination, that integrate the judicial exception into a practical application.

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as computing a weighted loss based on computed weights without any technological improvement or inventive step.
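The iterative overfitting loop recited in claims 6 and 7 (and mirrored in method claims 16 and 17): use a decoded video as input, compute each network's output, backpropagate a loss, and update parameters until a stopping criterion is met. A toy version, with all specifics assumed (each "network" is a scalar gain on the decoded samples, and the stopping criterion is an arbitrary loss threshold plus iteration cap):

```python
def overfit_subset(params, decoded_video, target, lr=0.05, tol=1e-4,
                   max_iters=1000):
    """Toy version of the claimed overfitting loop. Each 'network' is a
    scalar gain p applied sample-wise to the decoded video; the loop
    iterates until every network's MSE drops below `tol` (assumed
    stopping criterion) or `max_iters` is reached."""
    for _ in range(max_iters):
        # Use the decoded video as input; compute an output per network.
        outputs = [[p * x for x in decoded_video] for p in params]
        # Compute the loss for each network (MSE against the target).
        losses = [sum((o - t) ** 2 for o, t in zip(out, target)) / len(target)
                  for out in outputs]
        if max(losses) < tol:  # stopping criterion met
            break
        # 'Backpropagate': d(MSE)/dp = mean(2 * (p*x - t) * x).
        grads = [sum(2 * (o - t) * x
                     for o, t, x in zip(out, target, decoded_video)) / len(target)
                 for out in outputs]
        # Update the parameters based on the computed loss.
        params = [p - lr * g for p, g in zip(params, grads)]
    return params
```

The point of the sketch is the structure the claims recite (input, output, loss, backpropagation, update, stopping criterion), not any particular video codec or network architecture.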
Regarding Claim 9

Step 2A Prong One

The apparatus of claim 6, wherein the apparatus is further caused to: compute a weight-update for each neural network of the first subset of neural networks; (This step for computing a weight update for neural networks is understood to be a mental process)

Step 2A Prong Two

compress the weight-update for the each neural network of the first subset of neural networks; (This step for compressing the weight-updates is an extra-solution activity. See MPEP § 2106.05(g))

and signal the compressed weight-update for the each neural network of the first subset of neural networks to the decoder-side device in or along the bitstream, (This step for sending information between devices is extra-solution activity. See MPEP § 2106.05(g))

wherein the decoder-side device decompresses the compressed weight-update, (This step for decompressing the weight-updates is extra-solution activity. See MPEP § 2106.05(g))

uses the decompressed weight-update for updating the first set of neural networks, and uses the updated first set of neural networks for post-processing a decoded video. (This step for updating the neural networks using the weight-updates is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as computing weights while the additional elements of compressing and decompressing weights, passing information between devices, and updating a generic neural network are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding Claim 10

Step 2A Prong One

(Claim 10 depends on claim 4, which has been determined to recite abstract ideas including mental processes. Therefore, claim 10 also recites an abstract idea.)

Step 2A Prong Two

The apparatus of claim 4, wherein the apparatus is further caused to randomly initialize the plurality of neural networks by assigning a value to one or more of the parameters of the plurality of neural networks based on a random or pseudo-random process. (This step for randomly initializing the neural networks is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes while the additional elements of randomly initializing the generic neural networks are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding Claim 11

Step 2A Prong One

A method comprising: determining a plurality of weights used to calculate a weighted loss based at least on a performance of a plurality of neural networks on one or more training samples; (This step for determining weights based on performance of neural networks to calculate a weighted loss is understood to be a mental process)

Step 2A Prong Two

and jointly training the plurality of neural networks, wherein at each training iteration the plurality of neural networks are trained based at least on the weighted loss. (This step for training a plurality of neural networks based on a weighted loss is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception.
The claim recites mental processes such as determining weights to be used in a calculation while the additional elements of training a plurality of machine learning algorithms using a loss function are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding Claim 12

Step 2A Prong One

The method of claim 11 further comprising determining the weight further based at least on a value, wherein the value changes based on a predefined schedule, a training iteration number, or a value derived from a training iteration number. (This step for determining the weight based on a value is understood to be a mental process)

Step 2A Prong Two

The claim does not include additional elements, when considered separately and in combination, that integrate the judicial exception into a practical application.

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as determining weights based on a value without any technological improvement or inventive step.
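The iteration-dependent weight adjustment recited in claims 2 and 12 can be sketched in a few lines. The two schedules below are assumed examples only; the claims require merely that the weight depend on a predefined schedule, the training iteration number, or a value derived from it:

```python
import math

def scheduled_weight(base_weight, iteration, schedule="exp_decay", k=0.01):
    """Adjust a performance-based weight by a value that changes with the
    training iteration number. Both schedules here are illustrative
    assumptions, not the applicant's choices."""
    if schedule == "exp_decay":
        value = math.exp(-k * iteration)        # derived from iteration number
    elif schedule == "linear":
        value = max(0.0, 1.0 - k * iteration)   # predefined linear schedule
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return base_weight * value
```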
Regarding Claim 13

Step 2A Prong One

computing a score for each neural network of at least one of the second subset of neural networks or the third subset of neural networks based at least on a metric, (This step for computing a score for each neural network is understood to be a mental process) wherein the metric is computed based at least on an output of each of the neural networks of the second subset of neural network or the third subset of neural networks and on a reference sample; (This step for computing a metric based on the outputs of the neural networks is understood to be a mental process)

and selecting a neural network from at least one of the second subset of neural networks or the third subset neural network that yields a predetermined score as an optimal neural network for the input sample. (This step for selecting a neural network based on a predetermined score is understood to be a mental process)

Step 2A Prong Two

The method of claim 11, wherein to use the plurality of neural networks or the first subset of the plurality of neural networks at an inference time, for each input sample, the method further comprises: applying at least one of a second subset of the plurality of neural networks or a third subset of the first subset of neural networks to the input sample to obtain at least one of an output sample for each neural network in the second subset of neural networks or the third subset of neural networks, wherein the second subset and third subset comprise at least two neural networks; (This step for receiving output from the neural networks based on input data is understood to be extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as computing a score, computing a metric, and selecting a neural network based on the score, while the additional elements of obtaining output from generic neural networks based on input data are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding Claim 14

Step 2A Prong One

computing a score for each neural network of at least one of the second subset of neural networks or the third subset of neural networks based at least on a metric, wherein the metric is computed based at least on an output of each of the neural networks of the second subset of neural network or the third subset of neural networks and on a reference sample; (This step for computing a score based on neural network outputs is understood to be a mental process)

selecting a neural network from at least one of the second subset of neural networks or the third subset of neural networks that yields a predetermined score as an optimal neural network for the input sample; (This step for selecting a neural network based on a predetermined score is understood to be a mental process)

Step 2A Prong Two

The method of claim 11, wherein to use the plurality of neural networks or the first subset of the plurality of neural networks at an inference time, for the each input sample, the method further comprises: applying at least one of a second subset of the plurality of neural networks or a third subset of the first subset of neural networks to the input sample to obtain at least one of an output sample for each neural network in the second subset or the third subset, wherein the second subset of neural network and the third subset of neural networks comprise at least two neural networks; (This step for receiving output from neural networks based on input data is extra-solution activity. See MPEP § 2106.05(g))

and training the auxiliary neural network during a training phase. (This step for training a neural network is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as computing a score and selecting a neural network based on the score, while the additional elements of obtaining output from generic neural networks based on input data and training a generic neural network are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding Claim 15

Step 2A Prong One

(Claim 15 depends on claim 13, which has been determined to recite abstract ideas including mental processes. Therefore, claim 15 also recites an abstract idea.)

Step 2A Prong Two

The method of claim 13 further comprising signaling information of the optimal neural network from an encoder-side device to a decoder-side device. (This step for signaling information between devices is extra-solution activity. See MPEP § 2106.05(g))

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes while the additional element of signaling information between devices is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).
Regarding Claim 16 Step 2A Prong One compute a loss; (This step for computing a loss is understood to be a mental process) Step 2A Prong Two The method of claim 11 further comprising jointly overfitting a first subset of neural networks of the plurality of neural networks, and wherein to jointly overfit the first subset of neural networks, the method comprises iteratively performing following until a stopping criterion is met: This step for overfitting the neural networks using the computed loss does integrate the computed loss into a practical exception. In the applicant’s spec, they state that the purpose of overfitting is to, [0203] “improve the rate-distortion performance.”. However, there is no integration of the abstract ideas inherited from claim 11, so claim 17 still fails step 2A prong two. using a decoded video as input to the plurality of neural networks; computing an output for each of the first subset of neural networks; (This step for computing output for each neural network based on input data is extra-solution activity. See MPEP § 2106.05(g)) backpropagating the loss with respect to at least one parameter of the one or more parameters of the first subset of neural networks; (This step for backpropagation is extra-solution activity. See MPEP § 2106.05(g)) and updating the at least one parameter based at least on the computed loss. (This step for updating parameters is extra-solution activity. See MPEP § 2106.05(g)) Step 2B The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. 
The claim recites mental processes while the additional elements of overfitting generic neural networks, computing output from generic neural networks based on input, backpropagating loss, and updating parameters are a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d). Regarding Claim 17 Step 2A Prong One computing a loss; (This step for computing a loss is understood to be a mental process) (Claim 17 depends on claim 11, which has been determined to recite abstract ideas including mental processes. Therefore, claim 17 also recites an abstract idea.) Step 2A Prong Two The method of claim 11, further comprising jointly overfitting a first subset of neural networks of the plurality of neural networks, and wherein to jointly train the plurality of neural networks, the method comprises iteratively performing following until a stopping criterion is met: This step for overfitting the neural networks using the computed loss does integrate the computed loss into a practical exception. In the applicant’s spec, they state that the purpose of overfitting is to, [0203] “improve the rate-distortion performance.”. However, there is no integration of the abstract ideas inherited from claim 11, so claim 17 still fails step 2A prong two. using a decoded video as input to the plurality of neural networks; computing an output for each of the plurality of neural networks; (This step for computing output for each neural network based on input data is extra-solution activity. See MPEP § 2106.05(g)) backpropagating the loss with respect to at least one parameter of the one or more parameters of the plurality of neural networks; (This step for backpropagation is extra-solution activity. See MPEP § 2106.05(g)) and updating the at least one parameter based at least on the computed loss. (This step for updating parameters is extra-solution activity. 
See MPEP § 2106.05(g)) Step 2B The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes, while the additional elements of overfitting generic neural networks, computing output from generic neural networks based on input, backpropagating loss, and updating parameters are well-understood, routine, and conventional activities, as recognized by the court decisions listed in MPEP § 2106.05(d). Regarding Claim 18 Step 2A Prong One The method of claim 16, wherein the loss is the weighted loss, and wherein the weighted loss is computed based at least on the plurality of weights, and wherein each of the plurality of weights is computed based at least on a performance of the plurality of neural networks on the one or more training samples. (This step for computing the weighted loss using data, including computed weights, is understood to be a mental process) Step 2A Prong Two The claim does not include additional elements that, when considered separately and in combination, integrate the judicial exception into a practical application. Step 2B The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes, such as computing a weighted loss based on computed weights, without any technological improvement or inventive step.
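For context, the iterative overfitting loop recited in claims 16 and 17 (compute an output for each network on a decoded-video input, compute a loss, backpropagate, and update parameters until a stopping criterion is met) can be sketched in plain Python. The sketch is purely illustrative: each "network" is reduced to a single scale parameter, and all names are hypothetical rather than taken from the application.

```python
# Illustrative sketch (not from the application) of the iterative
# overfitting loop in claims 16/17: each "network" is a single scale
# parameter applied to a decoded-video sample, and the loss is the
# squared error against a reference sample.

def overfit_subset(params, decoded_sample, reference, lr=0.1, tol=1e-6, max_iters=1000):
    """Iteratively update each network's parameter until the loss
    improvement falls below `tol` (the stopping criterion)."""
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_iters):
        # compute an output for each network in the subset
        outputs = [p * decoded_sample for p in params]
        # compute a (summed squared-error) loss
        loss = sum((o - reference) ** 2 for o in outputs)
        # stopping criterion: negligible improvement over the last iteration
        if prev_loss - loss < tol:
            break
        # "backpropagate": analytic gradient of the loss w.r.t. each parameter
        grads = [2 * (p * decoded_sample - reference) * decoded_sample for p in params]
        # update each parameter based at least on the computed loss
        params = [p - lr * g for p, g in zip(params, grads)]
        prev_loss = loss
    return params, loss
```

For example, `overfit_subset([0.5, 1.5], decoded_sample=2.0, reference=4.0)` drives both parameters toward 2.0, at which point the improvement-based stopping criterion ends the loop.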
Regarding Claim 19 Step 2A Prong One computing a weight-update for each neural network of the first subset of neural networks; (This step for computing a weight-update for neural networks is understood to be a mental process) Step 2A Prong Two The method of claim 16 further comprising: compressing the weight-update for the each neural network of the first subset of neural networks; (This step for compressing the weight-updates is extra-solution activity. See MPEP § 2106.05(g)) and signaling the compressed weight-update for the each neural network of the first subset of neural networks to the decoder-side device in or along the bitstream, (This step for sending information between devices is extra-solution activity. See MPEP § 2106.05(g)) wherein the decoder-side device decompresses the compressed weight-update, (This step for decompressing the weight-updates is extra-solution activity. See MPEP § 2106.05(g)) uses the decompressed weight-update for updating the first set of neural networks, and uses the updated first set of neural networks for post-processing a decoded video. (This step for updating the neural networks using the weight-updates is extra-solution activity. See MPEP § 2106.05(g)) Step 2B The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes, such as computing weight-updates, while the additional elements of compressing and decompressing weight-updates, passing information between devices, and updating a generic neural network are well-understood, routine, and conventional activities, as recognized by the court decisions listed in MPEP § 2106.05(d). Regarding Claim 20 Step 2A Prong One (Claim 20 depends on claim 14, which has been determined to recite abstract ideas including mental processes.
Therefore, claim 20 also recites an abstract idea.) Step 2A Prong Two The method of claim 14 further comprising randomly initializing the plurality of neural networks, wherein to random initialize the plurality of neural networks, the method comprises assigning a value to one or more of the parameters of the plurality of neural networks based on a random or pseudo-random process. (This step for randomly initializing the neural networks is extra-solution activity. See MPEP § 2106.05(g)) Step 2B The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes, while the additional element of randomly initializing the generic neural networks is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d). Claim Rejections - 35 USC § 102 The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. Claims 1-4 and 11-14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Maxwell Elliot Jaderberg et al. (WO 2019101836 A1) (hereinafter Jaderberg).
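The weighted-loss joint training recited in independent claims 1 and 11, which the mapping below addresses, can be made concrete with a minimal sketch. Everything here is hypothetical: the one-parameter "networks", the performance-based weighting rule (worse-performing networks receive larger weights), and all function names are illustrative only, not drawn from the application or from Jaderberg.

```python
# Minimal illustrative sketch (hypothetical names, not from the
# application) of claims 1/11: determine a per-network weight from each
# network's performance on the training samples, form a weighted loss,
# and jointly update every network at each iteration.

def per_network_loss(param, samples):
    # squared error of a one-parameter "network" y = param * x against target t
    return sum((param * x - t) ** 2 for x, t in samples)

def joint_train(params, samples, lr=0.05, iters=50):
    weighted_loss = 0.0
    for _ in range(iters):
        losses = [per_network_loss(p, samples) for p in params]
        # weights based on performance: worse-performing networks get more weight
        total = sum(losses) or 1.0
        weights = [l / total for l in losses]
        # the weighted loss combines all networks' losses
        weighted_loss = sum(w * l for w, l in zip(weights, losses))
        # jointly update every network, scaling its gradient by its weight
        new_params = []
        for p, w in zip(params, weights):
            grad = sum(2 * (p * x - t) * x for x, t in samples)
            new_params.append(p - lr * w * grad)
        params = new_params
    return params, weighted_loss
```

With samples generated by a target parameter of 2.0, both networks converge jointly, the worse-initialized one moving faster because it carries the larger weight.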
Regarding Claim 1, Jaderberg teaches; An apparatus comprising at least one processor; and at least one non-transitory memory comprising computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: “[pg.27, ln.13-20] Implementations of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory storage medium for execution by, or to control the operation of, data processing apparatus.” This excerpt specifies an apparatus comprising at least one data processing apparatus, and at least one non-transitory memory comprising computer program code, wherein the at least one memory and computer program code are configured to, with the data processing apparatus, cause the apparatus to perform the subject matter described in the disclosure. “[pg.27, ln.27-30] The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.” This excerpt explains that the aforementioned “data processing apparatus” can be a processor. determine a plurality of weights “[pg.2, ln.22-25] Generally, the network parameters are values that impact the operations performed by the neural network and that are adjusted as part of the iterative training process. For example, the network parameters can include values of weight matrices and, in some cases, bias vectors, of the layers of the neural network.” This excerpt describes determining a plurality of weights (weight matrices from the network parameters) for a neural network, which are adjusted during the training process.
used to calculate a weighted loss based at least on a performance of a plurality of neural networks “[pg.12, ln.28 - pg.13, ln.4] The iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N. In some implementations, the iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N in an iterative manner by using a step function (e.g., stochastic gradient descent on some loss function). The step function receives as input the maintained hyperparameters and network parameters for the candidate neural network, and returns the candidate neural network with updated network parameters, in accordance with the maintained hyperparameters. ” A loss (step function which uses a loss function) is calculated based at least on a performance (optimizing means increasing performance) of a plurality of neural networks. The loss (step function including a loss function) uses the weights (step function receives as input network parameters including weight matrices) in its calculation. The weights are therefore used to calculate a loss based on at least performance of a plurality (population) of neural networks. “[pg.15, ln.16-21] In some implementations, the system 100 includes auxiliary losses in the loss function to regularize or otherwise bias the solutions found, or to shape learning dynamics to speed up training. These auxiliary losses can be included in the system 100 without spending a long time tuning weight schedules by hand. That is, weights between different terms in the loss function can be automatically adjusted during the meta-optimization process.” The aforementioned loss function is a weighted loss function (weights between different terms in the loss function). The plurality of weights is therefore used to calculate a weighted loss based on at least performance of a plurality of neural networks. 
on one or more training samples “[pg.13, ln.1-4] The step function receives as input the maintained hyperparameters and network parameters for the candidate neural network, and returns the candidate neural network with updated network parameters, in accordance with the maintained hyperparameters.” The step function utilizes training samples (hyperparameters and network parameters). The weights are therefore used to calculate a weighted loss based on at least performance of a plurality of neural networks on one or more training samples. and jointly train the plurality of neural networks, wherein at each training iteration the plurality of neural networks are trained based at least on the weighted loss. “[pg.12, ln.28-pg.13, ln.1] The iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N. In some implementations, the iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N in an iterative manner by using a step function (e.g., stochastic gradient descent on some loss function).” Discloses an iterative training process for the population of candidate neural networks, or jointly training the plurality of neural networks. As previously established, this training process uses a step function which uses a weighted loss. This therefore teaches jointly training a plurality of neural networks iteratively, wherein at each iteration, the plurality of neural networks is trained based at least on the weighted loss. Regarding claim 2, Jaderberg teaches; The apparatus of claim 1, wherein the apparatus is caused to determine the weight further based at least on a value, “[pg.12, ln.28 - pg.13, ln.4] The iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N.
In some implementations, the iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N in an iterative manner by using a step function (e.g., stochastic gradient descent on some loss function). The step function receives as input the maintained hyperparameters and network parameters for the candidate neural network, and returns the candidate neural network with updated network parameters, in accordance with the maintained hyperparameters.” The weights (network parameters) are determined (updated) each training iteration based on a value (input to the step function including the existing network parameters). wherein the value changes based on a predefined schedule, a training iteration number, or a value derived from a training iteration number. The value (network parameters) changes each training iteration (the network parameters are updated in the iterative training process), and therefore changes based on a predefined schedule. Regarding claim 3, Jaderberg teaches; wherein the apparatus is further caused to use the plurality of neural networks or the first subset of the plurality of neural networks at an inference time, for each input sample, and wherein to use the plurality of neural networks or the first subset of the plurality of neural networks at the inference time, for each input sample, the apparatus is caused to: apply at least one of a second subset of the plurality of neural networks OR a third subset of the first subset of neural networks to the input sample to obtain at least one of an output sample for each neural network in the second subset of neural networks OR the third subset of neural networks, wherein the second subset and third subset comprise at least two neural networks; “[pg.23, ln.8-10] In another example, a supervised learning task is specified as the machine learning task for the system. 
Each candidate neural network receives inputs and generates outputs that conform to a supervised machine learning task.” Each candidate neural network receives input samples (inputs) and produces output samples (outputs). ‘Each candidate neural network’ indicates a plurality of neural networks, which indicates at least two. A subset of the plurality of neural networks could be as many as all of the neural networks in the plurality. Therefore, a first subset can be used interchangeably with a plurality of neural networks. Also, any of the first, second, third subsets could include all of the same neural networks, and can therefore be used interchangeably. Therefore, this excerpt teaches applying at least one of a second subset of the plurality of neural networks OR a third subset of the first subset of neural networks to the input sample (inputs) to obtain at least one of an output sample (outputs) for each neural network in the second subset of neural networks OR the third subset of neural networks, wherein the second and third subset include at least two neural networks. compute a score for each neural network of at least one of the second subset of neural networks or the third subset of neural networks based at least on a metric, “[pg.23, ln.18-20] eval(.Math.). The system updates the quality measure of the candidate neural network based on evaluation of the candidate neural network’s bilingual evaluation understudy score (BLEU score).” The score (quality measure) for each neural network in the second or third subset of neural networks is computed based at least on a metric (BLEU score). 
wherein the metric is computed based at least on an output of each of the neural networks of the second subset of neural network or the third subset of neural networks and on a reference sample; “[pg.23, ln.21-22] The BLEU score is an evaluation of a generated sequence compared with a reference sequence.” The metric (BLEU score) is computed based on the output (generated sequence) of each of the neural networks of the second or third subset of neural networks and on a reference sample (reference sequence). and select a neural network from at least one of the second subset of neural networks or the third subset neural network that yields a predetermined score as an optimal neural network for the input sample. “[pg.23, ln.24-26] The candidate neural network with the highest mean BLEU score has the highest quality measure and is considered the “best” in terms of measured fitness.” “[pg.13, ln.15-18] The optimal candidate neural network selected is sometimes referred to in this specification as the “best” candidate neural network 120A-N in the population. A candidate neural network that has a higher quality measure than another candidate neural network is considered “better” than the other candidate neural network.” An optimal neural network is selected from at least one of the second subset of neural networks or the third subset neural networks (in the population) that yields a predetermined score (the highest quality measure) as an optimal neural network for the input sample (reference sequence). 
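The inference-time selection mapped above for claim 3 (apply each network in a subset to an input sample, score each output against a reference sample with a metric, and select the network yielding the predetermined score) can be sketched as follows. The token-overlap scorer is a deliberately toy stand-in for a BLEU-like metric, and all names are hypothetical.

```python
# Illustrative sketch (hypothetical names) of the inference-time
# selection in claims 3/13.

def unigram_overlap(output_tokens, reference_tokens):
    # Toy stand-in for a BLEU-like metric: the fraction of output tokens
    # that also appear in the reference sample.
    ref = set(reference_tokens)
    return sum(tok in ref for tok in output_tokens) / max(len(output_tokens), 1)

def select_optimal(networks, input_sample, reference_tokens):
    # Apply each "network" (here, any callable) to the input sample...
    outputs = {name: net(input_sample) for name, net in networks.items()}
    # ...score each output against the reference sample with the metric...
    scores = {name: unigram_overlap(out, reference_tokens) for name, out in outputs.items()}
    # ...and select the network that yields the predetermined (highest) score.
    return max(scores, key=scores.get)
```

Any callables can stand in for the subset of networks; the selection logic depends only on the metric, mirroring how the quality measure drives the choice of the "best" candidate in Jaderberg.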
Regarding Claim 4, Jaderberg teaches; wherein the apparatus is further caused to use the plurality of neural networks or the first subset of the plurality of neural networks at an inference time, and wherein to use the plurality of neural networks or the first subset of the plurality of neural networks at the inference time, for the each input sample, the apparatus is caused to: apply at least one of a second subset of the plurality of neural networks or a third subset of the first subset of neural networks to the input sample to obtain at least one of an output sample for each neural network in the second subset or the third subset, wherein the second subset of neural network and the third subset of neural networks comprise at least two neural networks; “[pg.23, ln.8-10] In another example, a supervised learning task is specified as the machine learning task for the system. Each candidate neural network receives inputs and generates outputs that conform to a supervised machine learning task.” Each candidate neural network receives input samples (inputs) and produces output samples (outputs). ‘Each candidate neural network’ indicates a plurality of neural networks, which indicates at least two. A subset of the plurality of neural networks could be as many as all of the neural networks in the plurality. Therefore, a first subset can be used interchangeably with a plurality of neural networks. Also, any of the first, second, third subsets could include all of the same neural networks, and can therefore be used interchangeably. Therefore, this excerpt teaches applying at least one of a second subset of the plurality of neural networks OR a third subset of the first subset of neural networks to the input sample (inputs) to obtain at least one of an output sample (outputs) for each neural network in the second subset of neural networks OR the third subset of neural networks, wherein the second and third subset include at least two neural networks. 
compute a score for each neural network of at least one of the second subset of neural networks or the third subset of neural network based at least on an output of the each of the neural networks of the second subset of neural networks or the third subset of neural network “[pg.23, ln.18-20] eval(.Math.). The system updates the quality measure of the candidate neural network based on evaluation of the candidate neural network’s bilingual evaluation understudy score (BLEU score).” The score (quality measure) for each neural network in the second or third subset of neural networks is computed based on the BLEU score. “[pg.23, ln.21-22] The BLEU score is an evaluation of a generated sequence compared with a reference sequence.” The BLEU score is computed based on the output (generated sequence) of each of the neural networks of the second or third subset of neural networks and on a reference sample (reference sequence). and on an auxiliary neural network; “[Abstract] A method includes: training a neural network having a plurality of network parameters to perform a particular neural network task and to determine trained values of the network parameters using an iterative training process having a plurality of hyperparameters, the method comprising: maintaining a plurality of candidate neural networks and, for each of the candidate neural networks, data specifying: (i) respective values of the network parameters for the candidate neural network, (ii) respective values of the hyperparameters for the candidate neural network, and (iii) a quality measure that measures a performance of the candidate neural network on the particular neural network task; and for each of the plurality of candidate neural networks, repeatedly performing additional training operations.” The method to train a single neural network to perform a particular task includes calculating the quality measure for each of the plurality of candidate neural networks based on that specific task. 
Here, the single neural network to be trained is considered an auxiliary neural network as it provides the particular task which guides the calculation of the score (quality measure) for each of the candidate neural networks. The score (quality measure) for each neural network in the second or third subset of neural networks (plurality of candidate neural networks) is therefore computed based on the auxiliary neural network (the single neural network to be trained) since the score (quality measure) is tailored to the specific task of the auxiliary network (the single neural network to be trained). select a neural network from at least one of the second subset of neural networks or the third subset of neural networks that yields a predetermined score as an optimal neural network for the input sample; “[pg.23, ln.24-26] The candidate neural network with the highest mean BLEU score has the highest quality measure and is considered the “best” in terms of measured fitness.” “[pg.13, ln.15-18] The optimal candidate neural network selected is sometimes referred to in this specification as the “best” candidate neural network 120A-N in the population. A candidate neural network that has a higher quality measure than another candidate neural network is considered “better” than the other candidate neural network.” An optimal (candidate) neural network is selected from at least one of the second subset of neural networks or the third subset neural network (population) that yields a predetermined score (the highest quality measure) as an optimal neural network for the input sample (input used to generate output for the candidate neural network). and train the auxiliary neural network during a training phase. 
“[Abstract] A method includes: training a neural network having a plurality of network parameters to perform a particular neural network task and to determine trained values of the network parameters using an iterative training process having a plurality of hyperparameters, the method comprising: maintaining a plurality of candidate neural networks and, for each of the candidate neural networks, data specifying: (i) respective values of the network parameters for the candidate neural network, (ii) respective values of the hyperparameters for the candidate neural network, and (iii) a quality measure that measures a performance of the candidate neural network on the particular neural network task; and for each of the plurality of candidate neural networks, repeatedly performing additional training operations.” Refers to training the aforementioned auxiliary neural network (the neural network to be trained) which would occur during a training phase. Regarding claim 11, Jaderberg teaches determining a plurality of weights “[pg.2, ln.22-25] Generally, the network parameters are values that impact the operations performed by the neural network and that are adjusted as part of the iterative training process. For example, the network parameters can include values of weight matrices and, in some cases, bias vectors, of the layers of the neural network.” This excerpt describes determining a plurality of weights (weight matrices from the network parameters) for a neural network which are adjusted during the training process. used to calculate a weighted loss based at least on a performance of a plurality of neural networks “[pg.12, ln.28 - pg.13, ln.4] The iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N. 
In some implementations, the iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N in an iterative manner by using a step function (e.g., stochastic gradient descent on some loss function). The step function receives as input the maintained hyperparameters and network parameters for the candidate neural network, and returns the candidate neural network with updated network parameters, in accordance with the maintained hyperparameters. ” A loss (step function which uses a loss function) is calculated based at least on a performance (optimizing means increasing performance) of a plurality of neural networks. The loss (step function including a loss function) uses the weights (step function receives as input network parameters including weight matrices) in its calculation. The weights are therefore used to calculate a loss based on at least performance of a plurality (population) of neural networks. “[pg.15, ln.16-21] In some implementations, the system 100 includes auxiliary losses in the loss function to regularize or otherwise bias the solutions found, or to shape learning dynamics to speed up training. These auxiliary losses can be included in the system 100 without spending a long time tuning weight schedules by hand. That is, weights between different terms in the loss function can be automatically adjusted during the meta-optimization process.” The aforementioned loss function is a weighted loss function (weights between different terms in the loss function). The plurality of weights is therefore used to calculate a weighted loss based on at least performance of a plurality of neural networks. 
on one or more training samples; “[pg.13, ln.1-4] The step function receives as input the maintained hyperparameters and network parameters for the candidate neural network, and returns the candidate neural network with updated network parameters, in accordance with the maintained hyperparameters.” The step function utilizes training samples (hyperparameters and network parameters). The weights are therefore used to calculate a weighted loss based on at least performance of a plurality of neural networks on one or more training samples. and jointly training the plurality of neural networks, wherein at each training iteration the plurality of neural networks are trained based at least on the weighted loss. “[pg.12, ln.28-pg.13, ln.1] The iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N. In some implementations, the iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N in an iterative manner by using a step function (e.g., stochastic gradient descent on some loss function).” Discloses an iterative training process for the population of candidate neural networks, or jointly training the plurality of neural networks. As previously established, this training process uses a step function which uses a weighted loss. This therefore teaches jointly training a plurality of neural networks iteratively, wherein at each iteration, the plurality of neural networks is trained based at least on the weighted loss. Regarding claim 12, Jaderberg teaches; The method of claim 11 further comprising determining the weight further based at least on a value, “[pg.12, ln.28 - pg.13, ln.4] The iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N. 
In some implementations, the iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N in an iterative manner by using a step function (e.g., stochastic gradient descent on some loss function). The step function receives as input the maintained hyperparameters and network parameters for the candidate neural network, and returns the candidate neural network with updated network parameters, in accordance with the maintained hyperparameters.” The weights (network parameters) are determined (updated) each training iteration based on a value (input to the step function including the existing network parameters). wherein the value changes based on a predefined schedule, a training iteration number, or a value derived from a training iteration number. The value (network parameters) changes each training iteration (the network parameters are updated in the iterative training process), and therefore changes based on a predefined schedule. Regarding claim 13, Jaderberg teaches; wherein to use the plurality of neural networks or the first subset of the plurality of neural networks at an inference time, for each input sample, the method further comprises: applying at least one of a second subset of the plurality of neural networks or a third subset of the first subset of neural networks to the input sample to obtain at least one of an output sample for each neural network in the second subset of neural networks or the third subset of neural networks, wherein the second subset and third subset comprise at least two neural networks; “[pg.23, ln.8-10] In another example, a supervised learning task is specified as the machine learning task for the system. Each candidate neural network receives inputs and generates outputs that conform to a supervised machine learning task.” Each candidate neural network receives input samples (inputs) and produces output samples (outputs). 
‘Each candidate neural network’ indicates a plurality of neural networks, which indicates at least two. A subset of the plurality of neural networks could be as many as all of the neural networks in the plurality. Therefore, a first subset can be used interchangeably with a plurality of neural networks. Also, any of the first, second, third subsets could include all of the same neural networks, and can therefore be used interchangeably. Therefore, this excerpt teaches applying at least one of a second subset of the plurality of neural networks OR a third subset of the first subset of neural networks to the input sample (inputs) to obtain at least one of an output sample (outputs) for each neural network in the second subset of neural networks OR the third subset of neural networks, wherein the second and third subset include at least two neural networks. computing a score for each neural network of at least one of the second subset of neural networks or the third subset of neural networks based at least on a metric, “[pg.23, ln.18-20] eval(.Math.). The system updates the quality measure of the candidate neural network based on evaluation of the candidate neural network’s bilingual evaluation understudy score (BLEU score).” The score (quality measure) for each neural network in the second or third subset of neural networks is computed based at least on a metric (BLEU score). wherein the metric is computed based at least on an output of each of the neural networks of the second subset of neural network or the third subset of neural networks and on a reference sample; “[pg.23, ln.21-22] The BLEU score is an evaluation of a generated sequence compared with a reference sequence.” The metric (BLEU score) is computed based on the output (generated sequence) of each of the neural networks of the second or third subset of neural networks and on a reference sample (reference sequence). 
and selecting a neural network from at least one of the second subset of neural networks or the third subset neural network that yields a predetermined score as an optimal neural network for the input sample. “[pg.23, ln.24-26] The candidate neural network with the highest mean BLEU score has the highest quality measure and is considered the “best” in terms of measured fitness.” “[pg.13, ln.15-18] The optimal candidate neural network selected is sometimes referred to in this specification as the “best” candidate neural network 120A-N in the population. A candidate neural network that has a higher quality measure than another candidate neural network is considered “better” than the other candidate neural network.” An optimal neural network is selected from at least one of the second subset of neural networks or the third subset neural networks (in the population) that yields a predetermined score (the highest quality measure) as an optimal neural network for the input sample (reference sequence). Regarding claim 14, Jaderberg teaches; wherein to use the plurality of neural networks or the first subset of the plurality of neural networks at an inference time, for the each input sample, the method further comprises: applying at least one of a second subset of the plurality of neural networks or a third subset of the first subset of neural networks to the input sample to obtain at least one of an output sample for each neural network in the second subset or the third subset, wherein the second subset of neural network and the third subset of neural networks comprise at least two neural networks; “[pg.23, ln.8-10] In another example, a supervised learning task is specified as the machine learning task for the system. Each candidate neural network receives inputs and generates outputs that conform to a supervised machine learning task.” Each candidate neural network receives input samples (inputs) and produces output samples (outputs). 
‘Each candidate neural network’ indicates a plurality of neural networks, which indicates at least two. A subset of the plurality of neural networks could be as many as all of the neural networks in the plurality. Therefore, a first subset can be used interchangeably with a plurality of neural networks. Also, any of the first, second, third subsets could include all of the same neural networks, and can therefore be used interchangeably. Therefore, this excerpt teaches applying at least one of a second subset of the plurality of neural networks OR a third subset of the first subset of neural networks to the input sample (inputs) to obtain at least one of an output sample (outputs) for each neural network in the second subset of neural networks OR the third subset of neural networks, wherein the second and third subsets include at least two neural networks. computing a score for each neural network of at least one of the second subset of neural networks or the third subset of neural networks based at least on an output of each of the neural networks of the second subset of neural networks or the third subset of neural networks “[pg.23, ln.18-20] eval(·). The system updates the quality measure of the candidate neural network based on evaluation of the candidate neural network’s bilingual evaluation understudy score (BLEU score).” The score (quality measure) for each neural network in the second or third subset of neural networks is computed based on the BLEU score. “[pg.23, ln.21-22] The BLEU score is an evaluation of a generated sequence compared with a reference sequence.” The BLEU score is computed based on the output (generated sequence) of each of the neural networks of the second or third subset of neural networks and on a reference sample (reference sequence).
and on an auxiliary neural network; “[Abstract] A method includes: training a neural network having a plurality of network parameters to perform a particular neural network task and to determine trained values of the network parameters using an iterative training process having a plurality of hyperparameters, the method comprising: maintaining a plurality of candidate neural networks and, for each of the candidate neural networks, data specifying: (i) respective values of the network parameters for the candidate neural network, (ii) respective values of the hyperparameters for the candidate neural network, and (iii) a quality measure that measures a performance of the candidate neural network on the particular neural network task; and for each of the plurality of candidate neural networks, repeatedly performing additional training operations.” The method to train a single neural network to perform a particular task includes calculating the quality measure for each of the plurality of candidate neural networks based on that specific task. Here, the single neural network to be trained is considered an auxiliary neural network as it provides the particular task which guides the calculation of the score (quality measure) for each of the candidate neural networks. The score (quality measure) for each neural network in the second or third subset of neural networks (plurality of candidate neural networks) is therefore computed based on the auxiliary neural network (the single neural network to be trained) since the score (quality measure) is tailored to the specific task of the auxiliary network (the single neural network to be trained). 
selecting a neural network from at least one of the second subset of neural networks or the third subset of neural networks that yields a predetermined score as an optimal neural network for the input sample; “[pg.23, ln.24-26] The candidate neural network with the highest mean BLEU score has the highest quality measure and is considered the “best” in terms of measured fitness.” “[pg.13, ln.15-18] The optimal candidate neural network selected is sometimes referred to in this specification as the “best” candidate neural network 120A-N in the population. A candidate neural network that has a higher quality measure than another candidate neural network is considered “better” than the other candidate neural network.” An optimal (candidate) neural network is selected from at least one of the second subset of neural networks or the third subset of neural networks (population) that yields a predetermined score (the highest quality measure) as an optimal neural network for the input sample (input used to generate output for the candidate neural network). and training the auxiliary neural network during a training phase.
“[Abstract] A method includes: training a neural network having a plurality of network parameters to perform a particular neural network task and to determine trained values of the network parameters using an iterative training process having a plurality of hyperparameters, the method comprising: maintaining a plurality of candidate neural networks and, for each of the candidate neural networks, data specifying: (i) respective values of the network parameters for the candidate neural network, (ii) respective values of the hyperparameters for the candidate neural network, and (iii) a quality measure that measures a performance of the candidate neural network on the particular neural network task; and for each of the plurality of candidate neural networks, repeatedly performing additional training operations.” Refers to training the aforementioned auxiliary neural network (the neural network to be trained) which would occur during a training phase. Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claim(s) 5, 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Maxwell Elliot Jaderberg et al., (WO 2019101836 A1) (hereinafter Jaderberg) in view of Caglar Aytekin et al., (US 20200311551 A1) (hereinafter Caglar). 
Regarding claim 5, Jaderberg teaches; the apparatus of claim 3 (Using the same reasoning for the 102 rejection for claim 3 in view of Jaderberg) Jaderberg fails to teach but Caglar teaches; signal information of a neural network from an encoder-side device to a decoder-side device. “[0147] It may happen that the improvement of the overfitted network with respect to the pretrained network, when measured on the image to be encoded, for example based on Peak-Signal-to-Noise-Ratio (PSNR) or MSE, is not sufficiently high (for example based on a predefined threshold), and the encoder system may decide not to encode any weight-update, and it may optionally signal to the decoder that no weight-update needs to be applied to the pretrained network.” Here, information of a neural network (whether a weight-update needs to be applied based on the improvement of the overfitted neural network) is being signaled from the encoder-side device to the decoder-side device. In this example, information is being signaled to make sure weight-update processing only occurs when necessary, decreasing computational overhead. Jaderberg and Caglar are analogous art to the present invention because they both address jointly training a plurality of neural networks using a plurality of weights and a weighted loss. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the encoder- and decoder-side devices of Caglar’s disclosure with the apparatus used to select the optimal neural network in Jaderberg’s disclosure to signal optimal neural network weight-updates from the encoder-side device to the decoder-side device, filtering out unnecessary weight updates to decrease computational overhead.
Regarding claim 15, Jaderberg teaches; The method of claim 13 (Using the same reasoning for the 102 rejection for claim 13 in view of Jaderberg) Jaderberg fails to teach but Caglar teaches; further comprising signaling information of the optimal neural network from an encoder-side device to a decoder-side device. “[0147] It may happen that the improvement of the overfitted network with respect to the pretrained network, when measured on the image to be encoded, for example based on Peak-Signal-to-Noise-Ratio (PSNR) or MSE, is not sufficiently high (for example based on a predefined threshold), and the encoder system may decide not to encode any weight-update, and it may optionally signal to the decoder that no weight-update needs to be applied to the pretrained network.” Here, information of a neural network (whether a weight-update needs to be applied based on the improvement of the overfitted neural network) is being signaled from the encoder-side device to the decoder-side device. In this example, information is being signaled to make sure weight-update processing only occurs when necessary, decreasing computational overhead. Jaderberg and Caglar are analogous art to the present invention because they both address jointly training a plurality of neural networks using a plurality of weights and a weighted loss. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the encoder- and decoder-side devices of Caglar’s disclosure with the selected optimal neural network in Jaderberg’s disclosure to signal optimal neural network weight-updates from the encoder-side device to the decoder-side device, filtering out unnecessary weight updates to decrease computational overhead. Claim(s) 10, 20 is/are rejected under 35 U.S.C.
103 as being unpatentable over Maxwell Elliot Jaderberg et al., (WO 2019101836 A1) (hereinafter Jaderberg) in view of Sheldon Brown et al., (US 20210326700 A1) (hereinafter Brown). Regarding claim 10, Jaderberg teaches; The apparatus of claim 4, (Using the same reasoning for the 102 rejection for claim 4 in view of Jaderberg) Jaderberg fails to teach but Brown teaches; wherein the apparatus is further caused to randomly initialize the plurality of neural networks by assigning a value to one or more of the parameters of the plurality of neural networks based on a random or pseudo-random process. “[Abstract] Optimization of existing neural networks and optimization of newly defined neural networks is provided. The system starts from an existing neural network with a known state or from a set of desired characteristics for a newly defined neural network and creates a first generation of candidate neural networks with random variations of architectural structures and hyperparameters. Fitness functions are established to evaluate the candidate neural networks. Each candidate neural network is trained and operated and then evaluated using the fitness functions. Top performing architectural structures and hyperparameters are identified and used to create a second generation of candidate neural networks that trained, operated and evaluated. The process iteratively continues until an optimized candidate neural network is determined.” This excerpt discloses randomly initializing a plurality of candidate neural networks which are used to optimize future generations of neural networks based on a fitness function. The random initialization includes selecting random parameters for the plurality of candidate neural networks. “[0044] As explained above with respect to FIGS. 3A-6B, the operational accuracy of existing neural networks can be improved by iterative mutation and evaluation of the hyperparameter-value pairs of an initial neural network that is already highly functional. 
Similarly, a new, highly accurate neural network can be created for a particular task by initial selection of random characteristics for an initial candidate neural network followed by iterative mutation and evaluation of the hyperparameter-value pairs of the initial candidate neural network. In one embodiment, the architecture and/or accuracy of neural networks that are used to demonstrate the effectiveness of neural networks can be improved. One particular advantage of the presently disclosed systems and methods is the creation of very high performing neural networks using very minimal manpower where the skilled professional is only needed to specify very high level characteristics of the desired outcomes of application of the neural network. For example, such high level characteristics may include performance criteria such as accuracy of task, computational resources used by the neural network, time to produce a solution, and the computational resource used in the tuning process.” This excerpt describes that a benefit of randomly initializing the population of candidate neural networks in this process is increasing autonomy of the process (less manpower is required to supervise the process). Jaderberg and Brown are analogous art to the present invention because they both address optimal neural network selection through the training of a plurality of neural networks. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the random initialization of the plurality of candidate neural networks from Brown’s disclosure with the apparatus performing the process of selecting an optimal neural network from a plurality of candidate neural networks from Jaderberg’s disclosure by randomly initializing the plurality of candidate neural networks used to select the optimal neural network to increase the autonomy of the process. 
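The "random or pseudo-random process" for initializing network parameters, as mapped to Brown above, can be sketched as follows. This is an editor's illustration under assumed shapes and names; Brown's disclosure randomizes architectural structures and hyperparameters, not necessarily via this exact scheme.

```python
import random

# Hedged sketch: assign values to the parameters of a plurality of
# networks using a pseudo-random process (random.Random is a PRNG).

def random_initialize(num_networks, num_params, seed=None):
    """Return one parameter list per candidate network, drawn from a
    Gaussian; the mean/stddev choices are illustrative assumptions."""
    rng = random.Random(seed)  # pseudo-random process, reproducible via seed
    return [[rng.gauss(0.0, 0.1) for _ in range(num_params)]
            for _ in range(num_networks)]

# A first generation of four candidate networks with eight parameters each.
population = random_initialize(num_networks=4, num_params=8, seed=42)
```

Seeding the generator makes the pseudo-random initialization reproducible, which is often convenient when comparing candidate networks across runs.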
Regarding claim 20, Jaderberg teaches; The method of claim 14 further comprising (Using the same reasoning for the 102 rejection for claim 14 in view of Jaderberg) Jaderberg fails to teach; randomly initializing the plurality of neural networks, wherein to random initialize the plurality of neural networks, the method comprises assigning a value to one or more of the parameters of the plurality of neural networks based on a random or pseudo-random process. Brown teaches; randomly initializing the plurality of neural networks, wherein to random initialize the plurality of neural networks, the method comprises assigning a value to one or more of the parameters of the plurality of neural networks based on a random or pseudo-random process. “[Abstract] Optimization of existing neural networks and optimization of newly defined neural networks is provided. The system starts from an existing neural network with a known state or from a set of desired characteristics for a newly defined neural network and creates a first generation of candidate neural networks with random variations of architectural structures and hyperparameters. Fitness functions are established to evaluate the candidate neural networks. Each candidate neural network is trained and operated and then evaluated using the fitness functions. Top performing architectural structures and hyperparameters are identified and used to create a second generation of candidate neural networks that trained, operated and evaluated. The process iteratively continues until an optimized candidate neural network is determined.” This excerpt discloses randomly initializing a plurality of candidate neural networks which are used to optimize future generations of neural networks based on a fitness function. The random initialization includes selecting random parameters for the plurality of candidate neural networks. “[0044] As explained above with respect to FIGS.
3A-6B, the operational accuracy of existing neural networks can be improved by iterative mutation and evaluation of the hyperparameter-value pairs of an initial neural network that is already highly functional. Similarly, a new, highly accurate neural network can be created for a particular task by initial selection of random characteristics for an initial candidate neural network followed by iterative mutation and evaluation of the hyperparameter-value pairs of the initial candidate neural network. In one embodiment, the architecture and/or accuracy of neural networks that are used to demonstrate the effectiveness of neural networks can be improved. One particular advantage of the presently disclosed systems and methods is the creation of very high performing neural networks using very minimal manpower where the skilled professional is only needed to specify very high level characteristics of the desired outcomes of application of the neural network. For example, such high level characteristics may include performance criteria such as accuracy of task, computational resources used by the neural network, time to produce a solution, and the computational resource used in the tuning process.” This excerpt describes that a benefit of randomly initializing the population of candidate neural networks in this process is increasing autonomy of the process (less manpower is required to supervise the process). Jaderberg and Brown are analogous art to the present invention because they both address optimal neural network selection through the training of a plurality of neural networks. 
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the random initialization of the plurality of candidate neural networks from Brown’s disclosure with the apparatus performing the process of selecting an optimal neural network from a plurality of candidate neural networks from Jaderberg’s disclosure by randomly initializing the plurality of candidate neural networks used to select the optimal neural network to increase the autonomy of the process. Claim(s) 6-9, 16-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Maxwell Elliot Jaderberg et al., (WO 2019101836 A1) (hereinafter Jaderberg) in view of Mun Churl Kim et al., (US 20220366538 A1) (hereinafter Kim) further in view of Caglar Aytekin et al., (US 20200311551 A1) (hereinafter Caglar). Regarding claim 6, Jaderberg teaches; The apparatus of claim 1, (Using the same reasoning as the 102 rejection for claim 1) Jaderberg fails to teach but Kim teaches; wherein the apparatus is further caused to jointly overfit a first subset of neural networks of the plurality of neural networks, [0107] “Through this, the processor 130 may train the first neural network 210 to be overfitted for the entire video. The processor 130 may train the plurality of second neural networks 230-1, 230-2, . . . , 230-n to be overfitted to the plurality of temporal portions 110-1, 110-2, . . . , 110-n included in the video.” This excerpt discloses overfitting a first subset of neural networks of the plurality of neural networks on different portions of video data (plurality of second neural networks). “A first subset of neural networks of the plurality of neural networks” as recited in claim 6 could be all of the neural networks in the plurality of neural networks; therefore, a first subset of neural networks and the plurality of neural networks can be used interchangeably.
Jaderberg and Kim are analogous to the present disclosure as they both address training a plurality of neural networks. Jaderberg details a process of jointly training a plurality of neural networks using weights and a weighted loss function (as disclosed in claim 1) while Kim details overfitting a plurality of neural networks on different portions of the input data to specialize each of the neural networks. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to jointly overfit a first subset of the plurality of neural networks from the apparatus of claim 1 on different portions of the input data to specialize each of the neural networks. Jaderberg and Kim fail to teach but Caglar teaches; and wherein to jointly overfit [0015] “In another embodiment, a method is provided that includes temporarily overfitting a neural network on a first image of a plurality of images for a first predetermined number of times to generate a first temporarily overfitted neural network.” The overfitting process for a neural network disclosed by Caglar is performed iteratively until a stopping criterion is met (the stopping criterion here is when the predetermined number of times is reached). Caglar does not explicitly disclose performing this overfitting process on the first subset of neural networks. However, Caglar details a method for overfitting neural networks on a subset of data. Caglar further explains that this method of overfitting allows for neural networks to perform better on specific content, which is beneficial in large datasets where memorization is helpful (see paragraph [0044]). 
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to jointly overfit a first subset of neural networks of the plurality of neural networks (as taught by Jaderberg in view of Kim) using the method of overfitting a neural network taught by Caglar, to allow the first subset of neural networks of the plurality of neural networks to perform better on large datasets. Caglar is analogous art to the present disclosure as it uses encoded/decoded video data to overfit and update neural networks, and is analogous art to Jaderberg and Kim as it pertains to training methods for a plurality of neural networks. use a decoded video as input to the plurality of neural networks; “[0113] The overfitting is a training stage which happens by using data belonging to the current image to be encoded, and it is done over one or more training iterations and epochs. As a general comment, the data used for this overfitting may also be a subset of the data. For example, for images, the data used for overfitting may be part of the image, whereas for videos, the data used for overfitting may be some of the frames or some parts of some frames.” Discloses using frames of videos as input to the neural networks during overfitting. To use video frames as neural network inputs, the video must have been decoded from an encoded bitstream at some point. Encoded video cannot be directly consumed as frames without decoding. Therefore, Caglar teaches using decoded video as input to the [plurality of, as taught by Jaderberg in view of Kim] neural networks during overfitting. compute an output for each of the first subset of neural networks; compute a loss; backpropagate the loss with respect to at least one parameter of the one or more parameters of the first subset of neural networks; and update the at least one parameter based at least on the computed loss.
“[0116] The overfitting is performed by inputting data to the neural network, getting its output, computing a loss on this output, differentiating the loss with respect to the neural network's weights, and updating the weights according to the computed gradients.” Caglar teaches computing output for the [first subset of, as taught by Jaderberg in view of Kim] neural networks. Caglar teaches computing a loss on this output. Caglar teaches differentiating the loss with respect to the neural network's weights, which requires backpropagating the loss through the network. The loss is therefore backpropagated with respect to at least one parameter (weights) of the [first subset of, as taught by Jaderberg in view of Kim] neural networks. Caglar teaches updating the weights based on the computed gradients, which are based on the loss. Therefore, the parameter (weights) is updated based on the computed loss. Regarding claim 7, Jaderberg teaches; The apparatus of claim 1, (Using the same reasoning as the 102 rejection for claim 1) Jaderberg fails to teach but Kim teaches; wherein the apparatus is further caused to jointly overfit a first subset of neural networks of the plurality of neural networks, [0107] “Through this, the processor 130 may train the first neural network 210 to be overfitted for the entire video. The processor 130 may train the plurality of second neural networks 230-1, 230-2, . . . , 230-n to be overfitted to the plurality of temporal portions 110-1, 110-2, . . . , 110-n included in the video.” This excerpt discloses overfitting a plurality of neural networks on different portions of video data. “A first subset of neural networks of the plurality of neural networks” as recited in claim 7 could be all of the neural networks in the plurality of neural networks; therefore, a first subset of neural networks and the plurality of neural networks can be used interchangeably.
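The overfitting loop quoted from Caglar's paragraph [0116] above (input, output, loss, gradient, weight update, repeated until a stopping criterion) can be sketched as follows. This is an editor's minimal illustration, not Caglar's implementation: a one-parameter model and squared-error loss are assumed so the gradient can be written by hand.

```python
# Hedged sketch of the [0116] loop: forward pass, compute a loss,
# backpropagate (here the gradient of a one-parameter model is written
# analytically), and update the weight, until a stopping criterion.

def overfit(weight, samples, lr=0.1, steps=50):
    """Overfit the model y = weight * x with a squared-error loss."""
    for _ in range(steps):                      # stopping criterion: step count
        for x, target in samples:
            output = weight * x                 # input -> output (forward)
            loss = (output - target) ** 2       # compute a loss on the output
            grad = 2 * (output - target) * x    # d(loss)/d(weight), i.e. backprop
            weight -= lr * grad                 # update weight from the gradient
    return weight

# Data drawn from y = 3x; repeated passes drive the weight toward 3,
# i.e. the model overfits the given samples.
w = overfit(0.0, [(1.0, 3.0), (2.0, 6.0)])
```

The same four steps (output, loss, gradient, update) generalize to a subset of networks by running the loop once per network in the subset.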
Jaderberg and Kim are analogous to the present disclosure as they both address training a plurality of neural networks. Jaderberg details a process of jointly training a plurality of neural networks using weights and a weighted loss function (as disclosed in claim 1) while Kim details overfitting a plurality of neural networks on different portions of the input data to specialize each of the neural networks. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to jointly overfit a first subset of the plurality of neural networks from the apparatus of claim 1 on different portions of the input data to specialize each of the neural networks. Jaderberg and Kim fail to teach but Caglar teaches; and wherein to [0015] “In another embodiment, a method is provided that includes temporarily overfitting a neural network on a first image of a plurality of images for a first predetermined number of times to generate a first temporarily overfitted neural network.” The training (overfitting is training) process for a neural network disclosed by Caglar is performed iteratively until a stopping criterion is met (the stopping criterion here is when the predetermined number of times is reached). Caglar does not explicitly disclose performing this training (overfitting) process on a plurality of neural networks. However, Caglar details a method for overfitting neural networks on a subset of data. Caglar further explains that this method of overfitting allows for neural networks to perform better on specific content, which is beneficial in large datasets where memorization is helpful (see paragraph [0044]).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to jointly train (overfit) the plurality of neural networks (as taught by Jaderberg in view of Kim) using the method of overfitting a neural network taught by Caglar, to allow the plurality of neural networks to perform better on large datasets. Caglar is analogous art to the present disclosure as it uses encoded/decoded video data to overfit and update neural networks, and is analogous art to Jaderberg and Kim as it pertains to training methods for neural networks. use a decoded video as input to the plurality of neural networks; “[0113] The overfitting is a training stage which happens by using data belonging to the current image to be encoded, and it is done over one or more training iterations and epochs. As a general comment, the data used for this overfitting may also be a subset of the data. For example, for images, the data used for overfitting may be part of the image, whereas for videos, the data used for overfitting may be some of the frames or some parts of some frames.” Discloses using frames of videos as input to the neural networks during training (overfitting). To use video frames as neural network inputs, the video must have been decoded from an encoded bitstream at some point. Encoded video cannot be directly consumed as frames without decoding. Therefore, Caglar teaches using decoded video as input to the [plurality of, as taught by Jaderberg in view of Kim] neural networks during training (overfitting). compute an output for each of the plurality of neural networks; compute a loss; backpropagate the loss with respect to at least one parameter of the one or more parameters of the plurality of neural networks; and update the at least one parameter based at least on the computed loss.
“[0116] The overfitting is performed by inputting data to the neural network, getting its output, computing a loss on this output, differentiating the loss with respect to the neural network's weights, and updating the weights according to the computed gradients.” Caglar teaches computing output for the [plurality of, as taught by Jaderberg in view of Kim] neural networks. Caglar teaches computing a loss on this output. Caglar teaches differentiating the loss with respect to the neural network's weights, which requires backpropagating the loss through the network. The loss is therefore backpropagated with respect to at least one parameter (weights) of the [plurality of, as taught by Jaderberg in view of Kim] neural networks. Caglar teaches updating the weights based on the computed gradients, which are based on the loss. Therefore, the parameter (weights) is updated based on the computed loss. Regarding Claim 8, Jaderberg in light of Kim and Caglar teaches; The apparatus of claim 6, Jaderberg further teaches; wherein the loss is the weighted loss, “[pg.12, ln.28 - pg.13, ln.4] The iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N. In some implementations, the iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N in an iterative manner by using a step function (e.g., stochastic gradient descent on some loss function). The step function receives as input the maintained hyperparameters and network parameters for the candidate neural network, and returns the candidate neural network with updated network parameters, in accordance with the maintained hyperparameters.” A loss (step function which uses a loss function) is calculated based on a plurality of neural networks during the iterative training process.
“[pg.15, ln.16-21] In some implementations, the system 100 includes auxiliary losses in the loss function to regularize or otherwise bias the solutions found, or to shape learning dynamics to speed up training. These auxiliary losses can be included in the system 100 without spending a long time tuning weight schedules by hand. That is, weights between different terms in the loss function can be automatically adjusted during the meta-optimization process.” The aforementioned loss function is a weighted loss function (weights between different terms in the loss function). and wherein the weighted loss is computed based at least on the plurality of weights, “[pg.2, ln.22-25] Generally, the network parameters are values that impact the operations performed by the neural network and that are adjusted as part of the iterative training process. For example, the network parameters can include values of weight matrices and, in some cases, bias vectors, of the layers of the neural network.” This excerpt describes a plurality of weights (weight matrices from the network parameters) for a neural network which are adjusted during the training process. “[pg.12, ln.28 - pg.13, ln.4] The iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N. In some implementations, the iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N in an iterative manner by using a step function (e.g., stochastic gradient descent on some loss function). The step function receives as input the maintained hyperparameters and network parameters for the candidate neural network, and returns the candidate neural network with updated network parameters, in accordance with the maintained hyperparameters. 
” The aforementioned weighted loss function (in the step function) uses the plurality of weights (step function receives as input network parameters which includes weight matrices) in its calculation. The plurality of weights is therefore used to calculate the weighted loss. and wherein each of the plurality of weights is computed based at least on a performance of the plurality of neural networks on the one or more training samples “[pg.12, ln.28 - pg.13, ln.4] The iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N. In some implementations, the iterative training process optimizes the network parameters 125A-N for the population of candidate neural networks 120A-N in an iterative manner by using a step function (e.g., stochastic gradient descent on some loss function). The step function receives as input the maintained hyperparameters and network parameters for the candidate neural network, and returns the candidate neural network with updated network parameters, in accordance with the maintained hyperparameters.” The iterative training process computes (optimizing the network parameters, including the weight matrices, indicates that they are recomputed iteratively) the plurality of weights based on a performance of the plurality of neural networks (optimizing for the plurality of neural networks indicates that the network parameters are recomputed based on the performance of the plurality of neural networks). Additionally, the step function is responsible for computing (updating) the plurality of weights (network parameters), and takes one or more training samples (maintained hyperparameters and network parameters) as input. Therefore, the plurality of weights is computed based at least on a performance of the plurality of neural networks on the one or more training samples. 
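The weighted loss Jaderberg describes (a task loss plus auxiliary terms, with per-term weights adjusted automatically during meta-optimization over a population of candidates) can be sketched as below. The function names, the dictionary layout, and the exploit rule are assumptions of the sketch, not Jaderberg's algorithm.

```python
def weighted_loss(term_losses, term_weights):
    # Weighted combination of the task loss and auxiliary losses; the
    # per-term weights stand in for the automatically tuned "weights
    # between different terms in the loss function" (Jaderberg, pg.15).
    return sum(w * l for w, l in zip(term_weights, term_losses))

def exploit_weights(population):
    # Toy stand-in for the population-based meta-optimization step:
    # adopt the term weights of the best-scoring candidate, so the
    # weight schedule is adjusted without hand tuning.
    return min(population, key=lambda c: c["loss"])["term_weights"]
```

For example, `weighted_loss([0.8, 0.2], [1.0, 0.5])` combines a task loss of 0.8 with an auxiliary loss of 0.2 down-weighted by 0.5, giving 0.9.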
Jaderberg discloses an iterative training process for the plurality of neural networks using a weighted loss function (which uses the plurality of weights computed based on a performance of the plurality of neural networks on one or more training samples) to regularize or bias the solutions, or to speed up training [see pg.15, ln.16-21]. Additionally, the overfitting of claim 6 is performed using a similar iterative training process (which also uses a loss function) for the plurality of neural networks. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the weighted loss function described in Jaderberg as the loss function in the overfitting process described by claim 6 (as taught by Caglar) to regularize solutions or improve efficiency.

Regarding Claim 9, Jaderberg in light of Kim and Caglar teaches; The apparatus of claim 6, Caglar further teaches; wherein the apparatus is further caused to: compute a weight-update for compress the weight-update for the “[Abstract] A method, apparatus, and computer program product are provided for training a neural network or providing a pre-trained neural network with the weight-updates being compressible using at least a weight-update compression loss function and/or task loss function. The weight-update compression loss function can comprise a weight-update vector defined as a latest weight vector minus an initial weight vector before training.” Discloses a weight-update (vector) being computed (latest weight vector minus initial weight vector) for a neural network. “[0158] In some embodiments, the method can further include compressing the weight-updates by pruning small-valued weight-updates.” Discloses compressing the weight-updates for the neural network.
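The weight-update operations Caglar recites (a weight-update vector defined as the latest weight vector minus the initial weight vector, compression by pruning small-valued entries per [0158], and, at the decoder side, adding the reconstructed update back to the pretrained weights per [0150]) can be sketched as a round trip. The function names and the pruning threshold are assumptions of the sketch; entropy coding and zero-position signaling of the pruned update are omitted.

```python
import numpy as np

def compute_weight_update(latest_w, initial_w):
    # Weight-update vector: latest weight vector minus the initial
    # weight vector before training (Caglar, Abstract).
    return latest_w - initial_w

def prune_weight_update(update, threshold=1e-2):
    # Compress by pruning small-valued weight-updates (Caglar [0158]);
    # the resulting zeros can be omitted from the bitstream and
    # re-inserted at the decoder via signaling.
    pruned = update.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

def apply_weight_update(pretrained_w, update):
    # Decoder side: add the reconstructed weight-update vector to the
    # weight vector of the pretrained network (Caglar [0150]).
    return pretrained_w + update
```

Signaling only the pruned update, rather than the full weights, is what keeps bitstream overhead small when few weights change appreciably during overfitting.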
Caglar does not explicitly teach performing these operations for each neural network of the first subset of neural networks, but it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to perform the operations described by Caglar for each neural network of the first subset of neural networks (as taught by Jaderberg in view of Kim). Further information as to why this would be obvious is explained at the end of the response to this claim. and signal the compressed weight-update for the “[0148] In some embodiments, the encoder system may skip the overfitting process if the quality (for example as measured by PSNR or MSE) achieved by the pretrained network is sufficiently high (for example with respect to a predefined threshold). Then, the encoder may not include any weight update into the encoded bitstream, and it may optionally signal to the decoder that there is no need to update the pretrained neural network, for example by including a flag into the bitstream.” [0147] [0149] also describe signaling weight-updates to the decoder-side device. Indicates that weight updates for the neural network are signaled to the decoder via a bitstream. Caglar does not explicitly recite performing this operation for each neural network of the first subset of neural networks. However, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to perform this operation for each neural network of the first subset of neural networks (as taught by Jaderberg in view of Kim). Further information as to why this would be obvious is explained at the end of the response to this claim. wherein the decoder-side device decompresses the compressed weight-update, uses the decompressed weight-update for updating the “[0149] At the decoder-side, the weight-update may be decoded (e.g., entropy-decoded if it was entropy-encoded at encoder side). 
Other reconstruction steps may be needed, for example if the zeros were omitted and some signaling has been used for indicating where the zeros should be re-inserted into the reconstructed weight-update. Also, cluster labels may be assigned the corresponding centroid values in order to reconstruct the quantized non-zero values.” The decoder side device decompresses (decodes) the compressed weight-updates of the neural network. “[0150] Once the weight-update has been decoded and reconstructed, it is then applied to the corresponding pretrained network. The application may comprise adding the weight-update vector to the weight vector of the pretrained network.” The decoded weight-update is then used to update (adding the weight-update vector to the weight vector) its respective neural network. Caglar does not explicitly recite performing this operation for each neural network of the first subset of neural networks. However, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to perform this operation for each neural network of the first subset of neural networks (as taught by Jaderberg in view of Kim). Further information as to why this would be obvious is explained at the end of the response to this claim. and uses the updated “[0097] Many embodiments described herein relate generally to data compression using neural networks. 
As such, the disclosure will mainly focus on embodiments and approaches for image and video compression, but various embodiments of the disclosed approach can be used with any other data type or combination thereof.”, “[0099] In some embodiments, it can be assumed and/or determined that one or more neural networks can be/are used at least at the decoder side, for example within the decoding loop or during post-processing or as a neural network performing most or all the decoding process.”, “[0100] One example of in-loop neural network is a neural network performing intra-prediction. Another example is a neural network performing filtering of the output of intra or inter prediction. One example of post-processing is to enhance an image or frames which has been reconstructed by a decoding process. Another example is a neural network performing most or the whole decoding process such as in neural auto-encoders.” The present embodiment relates to video compression. It can be determined that all of the plurality of neural networks can be used at the decoder side for post-processing, which includes the updated neural networks. An example for a post processing task is enhancing frames of a decoded video. Therefore, the updated neural networks are used for post-processing a decoded video. The above operations are performed to minimize processing overhead by only transmitting weight updates when they meaningfully improve quality, and by compressing, signaling, reconstructing, and applying those updates at the decoder in a controlled way. Caglar does not explicitly recite performing these operations for each neural network of the first subset of neural networks. 
However, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to perform each of these operations for each neural network of the first subset of neural networks (as taught by Jaderberg in view of Kim) to minimize processing overhead of applying weight updates to the neural networks.

Claim 16 is a method claim which corresponds directly to the apparatus of claim 6. It is therefore rejected using the same reasoning as for claim 6. Claim 17 is a method claim which corresponds directly to the apparatus of claim 7. It is therefore rejected using the same reasoning as for claim 7. Claim 18 is a method claim which corresponds directly to the apparatus of claim 8. It is therefore rejected using the same reasoning as for claim 8. Claim 19 is a method claim which corresponds directly to the apparatus of claim 9. It is therefore rejected using the same reasoning as for claim 9.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Matthew Alan Cady whose telephone number is (571) 272-7229. The examiner can normally be reached Monday - Friday, 7:30 am - 5:00 pm ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Cesar Paula, can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MATTHEW ALAN CADY/
Examiner, Art Unit 2145

/CESAR B PAULA/
Supervisory Patent Examiner, Art Unit 2145

Prosecution Timeline

Mar 30, 2023: Application Filed
Jan 22, 2026: Non-Final Rejection, §101, §102, §103, §112 (current)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: Favorable
Median Time to Grant: 3y 3m
PTA Risk: Low

Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
