DETAILED ACTION
This action is in response to the application filed 09/20/2022. Claims 26-47 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The disclosure is objected to because of the following informalities:
Page 3, paragraph 2: The formula [image: media_image1.png] is fuzzy and very difficult to read.
Pages 4, 9, 16-17, 20-22, 25, and 29-30 each have fuzzy formulas that are very difficult to read.
Appropriate correction is required.
Claim Objections
Claims 28, 30-33, and 40 are objected to because of the following informalities:
The formula [image: media_image2.png] and the terms [images: media_image3.png, media_image4.png, media_image5.png, media_image6.png] are fuzzy and very difficult to read. Claims 30-33 and 40 similarly utilize fuzzy formulas and/or terms. Appropriate correction is required.
In claim 32: “representing A total number of filters” should be “representing a total number of filters”.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 33 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 33 recites a process “c) random-based initializing by”. Process c is indented as if it were part of initialization method b (the random-based initialization characterized by an orthonormal span of the dictionary), but it is unclear whether this is intended. Thus, the scope of the claim is rendered indefinite. For purposes of examination, initialization method c is interpreted as an alternative to method b, not as part of it.
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 30 and 40 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 30 and 40 recite “a standard basis B, according to [image: media_image7.png] wherein e^(n) characterizes an n-th unit vector associated with the standard basis B, and K is a positive integer representing a spatial size of the at least one filter”. While the instant specification discloses a standard basis comprised of K^2 filters (published application, paragraph [0009]) and characterizes the dimensions of each dictionary filter with K1 and K2 (Id. at [0004]), it does not disclose a standard basis with a number of unit vectors equal to a squared spatial value of a network filter, as is recited by claims 30 and 40. Thus, this is considered new matter, and the claims are rejected on this basis.
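For clarity, one plausible reconstruction of the claimed standard basis, inferred solely from the quoted claim language (the formula image media_image7.png itself is not reproduced here), is:

```latex
% Hypothetical reconstruction from the claim language only: a basis of K^2
% unit vectors e^(n), one per spatial position of a K x K filter.
B = \bigl\{\, e^{(n)} \in \mathbb{R}^{K \times K} \;\bigm|\; n = 1, \dots, K^2 \,\bigr\},
\qquad
e^{(n)}_{i,j} =
\begin{cases}
1, & (i-1)\,K + j = n,\\
0, & \text{otherwise.}
\end{cases}
```

Under this reading, the basis contains K^2 unit vectors, which corresponds to the "squared spatial value" discussed in the rejection.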
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 26-47 are rejected under 35 U.S.C. 101 because the claimed inventions are directed to non-statutory subject matter without significantly more.
Claim 26
Step 1: The claim recites “A computer-implemented method”, and is therefore directed to the statutory category of process
Step 2A Prong 1: The claim recites the following judicial exception(s)
representing at least one filter of the neural network based on at least one filter dictionary: This can be performed as a mental process. One can merely imagine a set of filters of the network as a filter dictionary.
processing input data and/or data derived from input data, using the at least one filter: This can be performed as a mental process. One can merely observe input data.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s)
A computer-implemented method: This is mere instruction to apply judicial exceptions with a generic computing system (MPEP 2106.05(f)).
for processing data associated with an artificial deep neural network: This merely links the judicial exceptions to a particular field of use (artificial neural networks) (MPEP 2106.05(h)).
processing input data and/or data derived from input data, using the at least one filter: This is mere instruction to apply a judicial exception with a generic computing component (MPEP 2106.05(f)).
Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
A computer-implemented method: This is mere instruction to apply judicial exceptions with a generic computing system (MPEP 2106.05(f)).
for processing data associated with an artificial deep neural network: This merely links the judicial exceptions to a particular field of use (artificial neural networks) (MPEP 2106.05(h)).
processing input data and/or data derived from input data, using the at least one filter: This is mere instruction to apply a judicial exception with a generic computing component (MPEP 2106.05(f)).
Claim 27
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites no further judicial exception(s)
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
wherein the artificial deep neural network is a convolutional neural network: This merely links the judicial exceptions to a particular field of use (convolutional neural networks) (MPEP 2106.05(h)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
wherein the artificial deep neural network is a convolutional neural network: This merely links the judicial exceptions to a particular field of use (convolutional neural networks) (MPEP 2106.05(h)).
Claim 28
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites the following further judicial exception(s)
wherein the at least one filter dictionary at least partially characterizes a linear space, … , wherein span{F} characterizes the linear space that the at least one filter dictionary at least partially characterizes: Representing network filters based on a filter dictionary can still be performed as a mental process. One can merely imagine a set of vectors corresponding to each of the network filters in a dictionary set. The span of this set, as any set of vectors, characterizes the linear space spanned by the vectors.
wherein the at least one filter dictionary is characterized by [image: media_image8.png], wherein g^(i) characterizes an i-th filter of the at least one filter dictionary, where i = 1, ..., N, wherein K1 characterizes a size of the filters of the at least one filter dictionary in a first dimension, wherein K2 characterizes a size of the filters of the at least one filter dictionary in a second dimension: Representing network filters based on a filter dictionary can still be performed as a mental process. One can merely imagine a dictionary as a set of network filters, each represented with a K1 × K2 length vector.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s)
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
Claim 29
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites the following further judicial exception(s)
a) the at least one filter dictionary does not completely span a space: Representing network filters based on a filter dictionary can still be performed as a mental process, one need only imagine a dictionary with fewer filters than the dimensional length of each vector.
b) at least some elements of the at least one filter dictionary are linearly dependent on one another and the at least one filter dictionary is overcomplete: Representing network filters based on a filter dictionary can still be performed as a mental process, one need only imagine a dictionary with more filters than the dimensional length of each vector.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s)
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
Claim 30
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites the following further judicial exception(s)
wherein the at least one filter dictionary is different from a standard basis B, according to [image: media_image7.png] wherein e^(n) characterizes an n-th unit vector associated with the standard basis B, and K is a positive integer representing a spatial size of the at least one filter: Representing network filters based on a filter dictionary can still be performed as a mental process, one need only imagine a dictionary with a fewer or greater number of filters than the dimensional length of each vector.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s)
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
Claim 31
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites the following further judicial exception(s)
wherein the representing of the at least one filter of the neural network based on the at least one filter dictionary is characterized by the following equation and/or is performed based on the following equation: [image: media_image9.png], wherein h characterizes the at least one filter, wherein g^(n) characterizes an n-th filter of the at least one filter dictionary, wherein λ_n characterizes a coefficient associated with the n-th filter of the at least one filter dictionary, and wherein n is an index variable that characterizes one of N filters of the at least one filter dictionary: This recites a mathematical concept in the form of an equation.
wherein representing of a plurality of filters h^(α,β), associated with a layer of the neural network, based on the at least one filter dictionary, is characterized by the following equation and/or is performed based on the following equation: [image: media_image10.png], wherein α characterizes an index variable associated with a number of output channels of the layer, wherein β characterizes an index variable associated with a number of input channels of the layer, wherein λ_n^(α,β) characterizes a coefficient, associated with the n-th filter of the at least one filter dictionary, for the output channel α and the input channel β of the layer: This recites a mathematical concept in the form of an equation.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s)
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
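As context for the mathematical-concept characterization of claim 31, the quoted single-filter equation (a filter h formed as a linear combination of dictionary filters) can be sketched as follows. This is a minimal illustration with hypothetical toy values; none of the numbers come from the claims or the specification.

```python
# Illustrative sketch (not the applicant's implementation): a single K1 x K2
# filter h is formed as a linear combination of N dictionary filters g^(n)
# with coefficients lambda_n, as in the equation quoted in claim 31.
K1, K2, N = 2, 2, 3  # hypothetical sizes

# Dictionary filters g^(1..N), each a K1 x K2 matrix.
g = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
lam = [0.5, -1.0, 2.0]  # coefficients lambda_n

# h[i][j] = sum over n of lambda_n * g^(n)[i][j]
h = [[sum(lam[n] * g[n][i][j] for n in range(N)) for j in range(K2)]
     for i in range(K1)]

print(h)  # [[0.5, -1.0], [2.0, 2.0]]
```

The same arithmetic underlies the per-layer variant h^(α,β), with one coefficient vector per (output channel, input channel) pair.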
Claim 32
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites the following further judicial exception(s)
the processing of the input data and/or the data derived from the input data by using the at least one filter is characterized by the following equation and/or is performed based on the following equation: [image: media_image11.png], wherein X characterizes the input data or the data derived from the input data, including an input feature map for a layer of the neural network, wherein α characterizes an index variable associated with a number of output channels of the layer, wherein β characterizes an index variable associated with a number of input channels of the layer, wherein λ_n^(α,β) characterizes a coefficient, associated with the n-th filter of the at least one filter dictionary, for the output channel α and the input channel β of the layer, wherein c_in characterizes a number of the input channels of the layer, and wherein * characterizes a convolution operation, wherein N is a positive integer representing A total number of filters of the at least one filter dictionary, and g^(n) represents the n-th filter of the at least one filter dictionary: This recites a mathematical concept in the form of an equation.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s)
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
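The processing equation of claim 32 rests on the linearity of convolution: filtering an input with the combined filter h equals summing the weighted per-dictionary-filter responses. A toy single-channel sketch (hypothetical values, not from the record) illustrates this equivalence:

```python
# Illustrative sketch (assumed, not from the record): by linearity of
# convolution, convolving input X with h = sum_n lambda_n g^(n) equals
# summing lambda_n * (g^(n) * X), as in claim 32 (single channel shown).
def conv_valid(X, F):
    """2-D 'valid' convolution of X with a small filter F (cross-correlation form)."""
    k1, k2 = len(F), len(F[0])
    H, W = len(X) - k1 + 1, len(X[0]) - k2 + 1
    return [[sum(F[a][b] * X[i + a][j + b] for a in range(k1) for b in range(k2))
             for j in range(W)] for i in range(H)]

X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]       # toy input feature map
g = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]    # two 2x2 dictionary filters g^(n)
lam = [2.0, -1.0]                           # coefficients lambda_n

# Combine the dictionary filters first, then convolve once ...
h = [[sum(lam[n] * g[n][i][j] for n in range(2)) for j in range(2)] for i in range(2)]
y_combined = conv_valid(X, h)

# ... or convolve with each dictionary filter and sum the weighted responses.
ys = [conv_valid(X, g[n]) for n in range(2)]
y_sum = [[sum(lam[n] * ys[n][i][j] for n in range(2)) for j in range(2)] for i in range(2)]

assert y_combined == y_sum
```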
Claim 33
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites the following further judicial exception(s)
initializing the at least one filter dictionary prior to the representing and/or the processing; wherein the initializing includes at least one of the following alternative methods:
a) random-based initializing by assigning random numbers or pseudorandom numbers to at least some filter coefficients g_(i,j)^(n) of at least some filters of the at least one filter dictionary: This can be performed as a mental process. One can initially imagine filters of random values in the dictionary.
b) random-based initializing such that a linear space span{F} that is characterized by the at least one filter dictionary is spanned by an orthonormal basis, including:
b1) initializing at least some filter coefficients g_(i,j)^(n) of at least some filters of the at least one filter dictionary with independently equally distributed filter coefficient values: This can be performed as a mental process. One can initially imagine filters as linearly independent unit vectors, such as ([0, 1], [1, 0]).
b2) applying a Gram-Schmidt orthogonalization method to the elements or filters of the at least one filter dictionary: This recites a mathematical calculation, namely applying Gram-Schmidt orthogonalization to a set of vectors.
c) random-based initializing, including:
c1) initializing at least some filter coefficients g_(i,j)^(n) of at least some filters of the at least one filter dictionary with independently equally distributed filter coefficient values: This can be performed as a mental process. One can initially imagine filters as linearly independent unit vectors, such as ([0, 1], [1, 0]).
c2) rescaling the at least one filter dictionary based on at least one statistical quantity, for example a mean and/or a standard deviation: This recites a mathematical calculation, namely scaling a dictionary by some statistical value.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s)
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
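Initialization method b of claim 33 (random draw followed by Gram-Schmidt orthogonalization) can be sketched as follows. This is an illustrative reconstruction of the recited mathematical calculation, not the applicant's disclosed implementation; the sizes are hypothetical.

```python
import math
import random

# Illustrative sketch (assumed, not from the record) of initialization
# method b: draw random filter coefficients, then apply Gram-Schmidt so the
# flattened dictionary filters form an orthonormal set spanning span{F}.
def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:                      # drop (near-)dependent vectors
            basis.append([wi / norm for wi in w])
    return basis

random.seed(0)
K = 2                                          # hypothetical 2x2 filters
N = K * K
raw = [[random.gauss(0.0, 1.0) for _ in range(K * K)] for _ in range(N)]
dictionary = gram_schmidt(raw)                 # orthonormal flattened filters

# Pairwise dot products are ~0 and each filter has unit norm.
for i, u in enumerate(dictionary):
    for j, v in enumerate(dictionary):
        d = sum(ui * vi for ui, vi in zip(u, v))
        assert abs(d - (1.0 if i == j else 0.0)) < 1e-9
```

Method c differs in that step c2 rescales the randomly drawn dictionary by a statistical quantity (e.g., mean and/or standard deviation) rather than orthogonalizing it.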
Claim 34
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites the following further judicial exception(s)
initializing coefficients of at least some filters of the at least one filter dictionary, including at least one of the following:
a) random-based or pseudorandom-based initializing of the coefficients: This can be performed as a mental process. One can merely imagine a set of coefficients for the filters, each assigned random value(s).
b) initializing the coefficients based on the at least one filter dictionary: This can be performed as a mental process. One can merely imagine a set of coefficients, each corresponding to a filter of the filter dictionary.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s)
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
Claim 35
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites the following further judicial exception(s)
reducing at least one component of the at least one filter dictionary, wherein the reducing includes at least one of the following:
a) reducing at least one filter of the at least one filter dictionary by zeroing at least one filter coefficient of the at least one filter of the at least one filter dictionary: This can be performed as a mental process. One can merely imagine a set of coefficients for each filter of the dictionary, and set some to zero.
b) removing or deleting at least one filter of the at least one filter dictionary: This can be performed as a mental process. One can merely remove a filter from their mental filter dictionary.
c) removing or deleting at least one coefficient associated with the at least one filter dictionary: This can be performed as a mental process. One can merely imagine a set of coefficients for each filter of the dictionary, then remove some of them.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s)
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
Claim 36
Step 1: The claim recites a process, as in claim 35
Step 2A Prong 1: The claim recites the following further judicial exception(s)
further comprising at least one of the following: a) performing the reducing after an initializing of the at least one filter dictionary, b) performing the reducing after an initializing of coefficients of at least some filters of the at least one filter dictionary, c) performing the reducing during a training of the neural network, d) performing the reducing after the training of the neural network: Performing the reducing can still be performed as a mental process, regardless of when it’s performed.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s)
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
Claim 37
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites no further judicial exception(s)
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
further comprising at least one of the following: a) using the at least one filter dictionary for a plurality of layers of the neural network, b) using the at least one filter dictionary for a plurality of layers of the neural network that are associated with a same spatial size of data to be processed, c) using the at least one filter dictionary for a respective residual block, the neural network being a residual neural network, d) using the at least one filter dictionary for a layer of the neural network: This is mere instruction to apply a judicial exception to a neural network in a generic manner (MPEP 2106.05(f)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
further comprising at least one of the following: a) using the at least one filter dictionary for a plurality of layers of the neural network, b) using the at least one filter dictionary for a plurality of layers of the neural network that are associated with a same spatial size of data to be processed, c) using the at least one filter dictionary for a respective residual block, the neural network being a residual neural network, d) using the at least one filter dictionary for a layer of the neural network: This is mere instruction to apply a judicial exception to a neural network in a generic manner (MPEP 2106.05(f)).
Claim 38
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites no further judicial exception(s)
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
training the neural network based on training data, wherein a trained neural network is obtained: This is mere instruction to train a network in a generic manner (MPEP 2106.05(f)).
using the trained neural network for the processing of the input data: This is mere instruction to use a trained network to process input data in a generic manner (MPEP 2106.05(f)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
training the neural network based on training data, wherein a trained neural network is obtained: This is mere instruction to train a network in a generic manner (MPEP 2106.05(f)).
using the trained neural network for the processing of the input data: This is mere instruction to use a trained network to process input data in a generic manner (MPEP 2106.05(f)).
Claim 39
Step 1: The claim recites “A computer-implemented method”, and is therefore directed to the statutory category of process
Step 2A Prong 1: The claim recites the following judicial exception(s)
wherein at least one filter of the neural network is represented based on at least one filter dictionary: This can be performed mentally. One can merely imagine a filter dictionary that contains an entry for each filter of the network.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s)
A computer-implemented method: This is mere instruction to apply judicial exceptions with a generic computing system (MPEP 2106.05(f)).
for training an artificial deep neural network: This merely links the judicial exceptions to a particular field of use (artificial neural networks) (MPEP 2106.05(h)).
training at least one component of the at least one filter dictionary, wherein the training of the at least one component of the at least one filter dictionary is performed at least temporarily simultaneously and/or together with a training of at least one other component of the neural network: This is mere instruction to train a filter dictionary in a generic manner (MPEP 2106.05(f)).
Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
A computer-implemented method: This is mere instruction to apply judicial exceptions with a generic computing system (MPEP 2106.05(f)).
for training an artificial deep neural network: This merely links the judicial exceptions to a particular field of use (artificial neural networks) (MPEP 2106.05(h)).
training at least one component of the at least one filter dictionary, wherein the training of the at least one component of the at least one filter dictionary is performed at least temporarily simultaneously and/or together with a training of at least one other component of the neural network: This is mere instruction to train a filter dictionary in a generic manner (MPEP 2106.05(f)).
Claim 40
Step 1: The claim recites a process, as in claim 39
Step 2A Prong 1: The claim recites the following further judicial exception(s)
providing a filter dictionary characterizing a standard basis, wherein the standard basis is characterized according to [image: media_image12.png] wherein e^(n) characterizes an n-th unit vector associated with the standard basis B, and K is a positive integer representing a spatial size of the at least one filter: The filter dictionary can still be imagined mentally. One can merely replace each filter of the dictionary with its standard basis equivalent for its dimensions.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
changing the filter dictionary, characterizing the standard basis, based on the training: This is mere instruction to alter the filter dictionary based on the training in a generic manner (MPEP 2106.05(f)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
changing the filter dictionary, characterizing the standard basis, based on the training: This is mere instruction to alter the filter dictionary based on the training in a generic manner (MPEP 2106.05(f)).
Claim 41
Step 1: The claim recites a process, as in claim 39
Step 2A Prong 1: The claim recites the following further judicial exception(s)
providing a filter dictionary not characterizing a standard basis: This can be performed as a mental process. One need only imagine a dictionary with a fewer or greater number of filters than the dimensional length of each vector.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
changing the filter dictionary not characterizing a standard basis, based on the training: This is mere instruction to alter the filter dictionary based on the training in a generic manner (MPEP 2106.05(f)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
changing the filter dictionary not characterizing a standard basis, based on the training: This is mere instruction to alter the filter dictionary based on the training in a generic manner (MPEP 2106.05(f)).
Claim 42
Step 1: The claim recites a process, as in claim 39
Step 2A Prong 1: The claim recites the following further judicial exception(s)
performing a reducing on the pre-trained neural network: This can still be performed as a mental process, as discussed with respect to claim 35.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
providing a pre-trained neural network: This is mere data gathering and is insignificant extra-solution activity (MPEP 2106.05(g)).
performing a first training for the neural network: This is mere instruction to generically train a network (MPEP 2106.05(f)).
performing a further training: This is mere instruction to generically train a network (MPEP 2106.05(f)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
providing a pre-trained neural network: This is an instance of retrieving data from memory, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. iv.)
performing a first training for the neural network: This is mere instruction to generically train a network (MPEP 2106.05(f)).
performing a further training: This is mere instruction to generically train a network (MPEP 2106.05(f)).
Claim 43
Step 1: The claim recites a process, as in claim 39
Step 2A Prong 1: The claim recites no further judicial exception(s)
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
wherein the training includes: training the at least one filter dictionary together with at least one coefficient associated with the at least one filter dictionary: This is mere instruction to train a filter dictionary and its coefficient(s) in a generic manner (MPEP 2106.05(f)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
wherein the training includes: training the at least one filter dictionary together with at least one coefficient associated with the at least one filter dictionary: This is mere instruction to train a filter dictionary and its coefficient(s) in a generic manner (MPEP 2106.05(f)).
Claim 44
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites no further judicial exception(s)
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
wherein the processing of the input data includes at least one of the following: a) processing one- and/or multi-dimensional data, b) processing image data, c) processing audio data, the audio data including voice data and/or operating noises from technical equipment or systems, d) processing video data or parts of video data, 8 e) processing sensor data; and wherein the processing of the input data includes a classification of the input data: This merely links the judicial exceptions to a particular field of use (image, audio, video, or sensor data classification) (MPEP 2106.05(h)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
wherein the processing of the input data includes at least one of the following: a) processing one- and/or multi-dimensional data, b) processing image data, c) processing audio data, the audio data including voice data and/or operating noises from technical equipment or systems, d) processing video data or parts of video data, 8 e) processing sensor data; and wherein the processing of the input data includes a classification of the input data: This merely links the judicial exceptions to a particular field of use (image, audio, video, or sensor data classification) (MPEP 2106.05(h)).
Claim 45
Step 1: The claim recites a process, as in claim 44
Step 2A Prong 1: The claim recites no further judicial exception(s)
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
using output data obtained based on the processing of the input data to control and/or regulate at least one component of a technical system: This is mere instruction to control / regulate a technical system in a generic manner (MPEP 2106.05(f)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
using output data obtained based on the processing of the input data to control and/or regulate at least one component of a technical system: This is mere instruction to control / regulate a technical system in a generic manner (MPEP 2106.05(f)).
Claim 46
Step 1: The claim recites a process, as in claim 26
Step 2A Prong 1: The claim recites the following further judicial exception(s)
further comprising at least one of the following elements:
a) initializing the at least one filter dictionary: This can be performed as a mental process. One can merely begin to imagine the filter dictionary.
b) initializing coefficients associated with the at least one filter dictionary: This can be performed as a mental process. One can merely imagine a set of coefficients associated with the filter dictionary.
c) reducing at least one component of the at least one filter dictionary: This can be performed as a mental process. One can merely cease imagining an element and / or coefficient of the filter dictionary.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
d) training the at least one filter dictionary together with at least one further component of the neural network based on a stochastic, gradient-based optimization method: This is mere instruction to train the filter dictionary and a component of the neural network in a generic manner (MPEP 2106.05(f)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
d) training the at least one filter dictionary together with at least one further component of the neural network based on a stochastic, gradient-based optimization method: This is mere instruction to train the filter dictionary and a component of the neural network in a generic manner (MPEP 2106.05(f)).
Claim 47
Step 1: The claim recites “An apparatus”, and is therefore directed to the statutory category of machine
Step 2A Prong 1: The claim recites the following judicial exception(s)
represent at least one filter of the neural network based on at least one filter dictionary: This can be performed as a mental process. One can merely imagine a set of filters of the network as a filter dictionary.
process data associated with an artificial deep neural network: This can be performed as a mental process. One can merely observe data associated with a DNN.
process input data and/or data derived from input data, using the at least one filter: This can be performed as a mental process. One can merely observe input data.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s)
An apparatus configured to: This is mere instruction to execute judicial exceptions with generic computing hardware (MPEP 2106.05(f)).
process input data and/or data derived from input data, using the at least one filter: This is mere instruction to apply a judicial exception with a generic computing component (MPEP 2106.05(f)).
Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
An apparatus configured to: This is mere instruction to execute judicial exceptions with generic computing hardware (MPEP 2106.05(f)).
process input data and/or data derived from input data, using the at least one filter: This is mere instruction to apply a judicial exception with a generic computing component (MPEP 2106.05(f)).
Claim 48
Step 1: The claim recites “A non-transitory computer-readable storage medium”, and is therefore directed to the statutory category of article of manufacture
Step 2A Prong 1: The claim recites the following judicial exception(s)
representing at least one filter of the neural network based on at least one filter dictionary: This can be performed as a mental process. One can merely imagine a set of filters of the network as a filter dictionary.
processing input data and/or data derived from input data, using the at least one filter: This is mere instruction to apply a judicial exception with a generic computing component (MPEP 2106.05(f)).
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s)
non-transitory computer-readable storage medium on which are stored instructions for processing data associated with an artificial deep neural network: This merely links the judicial exceptions to a particular field of use (artificial neural networks) (MPEP 2106.05(h)).
the instructions, when executed by a computer, causing the computer to perform the following steps: This is mere instruction to execute the judicial exceptions with generic computing hardware (MPEP 2106.05(f)).
processing input data and/or data derived from input data, using the at least one filter: This is mere instruction to apply a judicial exception with a generic computing component (MPEP 2106.05(f)).
Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
non-transitory computer-readable storage medium on which are stored instructions for processing data associated with an artificial deep neural network: This merely links the judicial exceptions to a particular field of use (artificial neural networks) (MPEP 2106.05(h)).
the instructions, when executed by a computer, causing the computer to perform the following steps: This is mere instruction to execute the judicial exceptions with generic computing hardware (MPEP 2106.05(f)).
processing input data and/or data derived from input data, using the at least one filter: This is mere instruction to apply a judicial exception with a generic computing component (MPEP 2106.05(f)).
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 26-28, 31-32, 37-39, and 42-44 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Li et al. (“Learning Filter Basis for Convolutional Neural Network Compression”, published 12/23/2019, arXiv:1908.08932v2), hereafter referred to as ‘Li’.
Regarding claim 26, Li teaches [a] computer-implemented method, for processing data associated with an artificial deep neural network, comprising:
representing at least one filter of the neural network based on at least one filter dictionary: “in this paper, we try to reduce the number of parameters of CNNs (artificial deep neural network[s]) by learning a basis of the filters in convolutional layers” (Li, page 1, Abstract); “Each 3D filter (filter of the neural network) W_i ∈ R^(cwh×1) (or W_i ∈ R^(wh×1) for the channel-wise decomposition case) is represented by the linear combination of a set of m filter basis {B_j | j = 1, …, m} (filter dictionary) with the coding coefficient vector A_i ∈ R^(m×1)" (Li, page 3, left column, paragraph 5).
processing input data and/or data derived from input data, using the at least one filter: “we utilize linear combination of filter basis to reconstruct the 3D filter W_i = Σ_(j=1)^m a_(j,i) B_j … Thus, the convolution between the input feature map x and the 3D kernel becomes [formula image media_image13.png]” (Li, page 3, left column, paragraph 4)
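Examiner’s note: the representation mapped above can be illustrated with the following NumPy sketch (toy dimensions and random values are hypothetical, not drawn from Li): each filter is a linear combination of shared basis filters, and the reconstructed filter is then applied to input data.

```python
import numpy as np

# Li's filter-basis representation, as mapped to claim 26:
# each 3D filter W_i is a linear combination of m shared basis filters B_j.
rng = np.random.default_rng(0)
m, k = 4, 3                              # m basis filters, k x k spatial size
basis = rng.standard_normal((m, k, k))   # filter dictionary {B_j}
coeff = rng.standard_normal(m)           # coding coefficient vector A_i

# W_i = sum_j a_(j,i) * B_j  -- contract the basis index
W = np.tensordot(coeff, basis, axes=1)   # reconstructed k x k filter

# "processing input data ... using the at least one filter": a valid
# 2D cross-correlation of the reconstructed filter with an input map.
x = rng.standard_normal((8, 8))
out = np.array([[np.sum(x[r:r + k, c:c + k] * W)
                 for c in range(8 - k + 1)]
                for r in range(8 - k + 1)])   # out has shape (6, 6)
```

This only mirrors the cited equation with toy data; it is not a reproduction of Li’s implementation.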
Regarding claim 27, the rejection of claim 26 in view of Li is incorporated. Li further discloses a method, wherein the artificial deep neural network is a convolutional neural network: “in this paper, we try to reduce the number of parameters of CNNs by learning a basis of the filters in convolutional layers … We validate our proposed solution for multiple CNN architectures” (Li, page 1, Abstract).
Regarding claim 28, the rejection of claim 26 in view of Li is incorporated. Li further discloses a method, wherein
the at least one filter dictionary at least partially characterizes a linear space, … span{F} characterizes the linear space that the at least one filter dictionary at least partially characterizes: “Each 3D filter W_i ∈ R^(cwh×1) (or W_i ∈ R^(wh×1) for the channel-wise decomposition case) is represented by the linear combination of a set of m filter basis {B_j | j = 1, …, m} (filter dictionary / F) with the coding coefficient vector A_i ∈ R^(m×1)" (Li, page 3, left column, paragraph 5). Li’s filter dictionary F is a set of filter bases. As one of ordinary skill in the art would know, each basis comprises a set of vectors, and the span of all these vectors characterizes the linear space formed by all possible linear combinations of the vectors. Thus, the span of Li’s dictionary (F) characterizes a linear space.
the at least one filter dictionary is characterized by [formula image media_image8.png], wherein g(i) characterizes an i-th filter of the at least one filter dictionary, where i = 1, …, N:
“Each 3D filter W_i ∈ R^(cwh×1) (or W_i ∈ R^(wh×1) for the channel-wise decomposition case) is represented by the linear combination of a set (filter dictionary) of m filter basis {B_j | j = 1, …, m} (g(i) characterizes an i-th filter of the at least one filter dictionary) with the coding coefficient vector A_i ∈ R^(m×1)" (Li, page 3, left column, paragraph 5). The set of all filter bases is a filter dictionary, where B_j characterizes the j-th filter basis.
“we utilize linear combination of filter basis to reconstruct the 3D filter W_i = Σ_(j=1)^m a_(j,i) B_j” (Li, page 3, left column, paragraph 4). The filter bases of the dictionary are used to construct 3D filters for the CNN.
K1 characterizes a size of the filters of the at least one filter dictionary in a first dimension, … K2 characterizes a size of the filters of the at least one filter dictionary in a second dimension: “Each 3D filter W_i ∈ R^(cwh×1) (or W_i ∈ R^(wh×1) for the channel-wise decomposition case) is represented by the linear combination of a set of m filter basis {B_j | j = 1, …, m} with the coding coefficient vector A_i ∈ R^(m×1): [formula image media_image14.png] where A_i is the i-th column of A, B_j is the j-th filter basis (filter of the at least one filter dictionary) with dimension cwh (K1) × 1 (K2) or wh (K1) × 1 (K2) for the 3D filter-wise decomposition and 2D channel-wise decomposition cases, respectively." (Li, page 3, left column, paragraph 5).
Regarding claim 31, the rejection of claim 26 in view of Li is incorporated. Li further discloses a method,
wherein the representing of the at least one filter of the neural network based on the at least one filter dictionary is characterized by the following equation and/or is performed based on the following equation: [formula image media_image9.png], wherein h characterizes the at least one filter, wherein g(n) characterizes an n-th filter of the at least one filter dictionary, wherein λ_n characterizes a coefficient associated with the n-th filter of the at least one filter dictionary, and wherein n is an index variable that characterizes one of N filters of the at least one filter dictionary: “Each 3D filter W_i ∈ R^(cwh×1) (h) (or W_i ∈ R^(wh×1) for the channel-wise decomposition case) is represented by the linear combination of a set of m (n) filter basis {B_j | j = 1, …, m} (filter dictionary) with the coding coefficient vector A_i ∈ R^(m×1): [formula image media_image14.png] where A_i is the i-th column of A, B_j is the j-th filter basis (g(n)) with dimension cwh × 1 or wh × 1 for the 3D filter-wise decomposition and 2D channel-wise decomposition cases, respectively." (Li, page 3, left column, paragraph 5). a_(j,i), the j-th entry of A_i, is a coefficient corresponding to a j-th basis (n-th filter), and is equivalent to λ_n.
wherein representing of a plurality of filters h(α, β), associated with a layer of the neural network, based on the at least one filter dictionary, is characterized by the following equation and/or is performed based on the following equation: [formula image media_image10.png], wherein α characterizes an index variable associated with a number of output channels of the layer, wherein β characterizes an index variable associated with a number of input channels of the layer, wherein λ_n(α, β) characterizes a coefficient, associated with the n-th filter of the at least one filter dictionary, for the output channel α and the input channel β of the layer:
“We assume that a convolution layer has c input channels (β) and n output channels (α), and the kernel size is w x h” (Li, page 3, left column, paragraph 4)
[figure media_image15.png] ”Comparison of different filter decomposition methods. Right: each channel of the 3D filter is considered as a basic element. A unique set of basis is learned for the n 2D filters in each channel” (Li, page 3, Figure 1). In channel-wise decomposition, a unique basis is learned for each 2D filter, across input (c) and output (n) channels, for a total of n x c bases.
“[formula image media_image14.png] where A_i is the i-th column of A, B_j is the j-th filter basis with dimension cwh × 1 or wh × 1 for the 3D filter-wise decomposition and 2D channel-wise decomposition cases, respectively" (Li, page 3, left column, paragraph 5). With channel-wise decomposition, each basis corresponds to a particular input and output channel. a_(j,i) corresponds to some particular basis B_j, and thus corresponds to a particular input (β) and a particular output (α) channel. a_(j,i) is thus equivalent to λ_n(α, β).
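Examiner’s note: the per-channel coefficient structure mapped above can be sketched as follows (toy dimensions are hypothetical; the names `lam` and `g` merely mirror λ_n(α, β) and g(n)): every layer filter h(α, β) is a coefficient-weighted combination of the shared dictionary filters.

```python
import numpy as np

# h(α, β) = sum_n λ_n(α, β) * g(n), for every output/input channel pair.
rng = np.random.default_rng(5)
N, k = 4, 3                  # N dictionary filters of spatial size k x k
c_out, c_in = 2, 3           # output (α) and input (β) channel counts
g = rng.standard_normal((N, k, k))           # filter dictionary g(n)
lam = rng.standard_normal((c_out, c_in, N))  # coefficients λ_n(α, β)

# Contract the dictionary index n for every (α, β) pair at once.
h = np.einsum('abn,nij->abij', lam, g)       # shape (c_out, c_in, k, k)
```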
Regarding claim 32, the rejection of claim 26 in view of Li is incorporated. Li further discloses a method, wherein the processing of the input data and/or the data derived from the input data by using the at least one filter is characterized by the following equation and/or is performed based on the following equation: [formula image media_image11.png], wherein X characterizes the input data or the data derived from the input data, including an input feature map for a layer of the neural network, wherein α characterizes an index variable associated with a number of output channels of the layer, wherein β characterizes an index variable associated with a number of input channels of the layer, wherein λ_n(α, β) characterizes a coefficient, associated with the n-th filter of the at least one filter dictionary, for the output channel α and the input channel β of the layer, wherein c_in characterizes a number of the input channels of the layer, wherein * characterizes a convolution operation, wherein N is a positive integer representing [a] total number of filters of the at least one filter dictionary, and g(n) represents the n-th filter of the at least one filter dictionary:
“Each 3D filter W_i ∈ R^(cwh×1) (h) (or W_i ∈ R^(wh×1) for the channel-wise decomposition case) is represented by the linear combination of a set of m (n) filter basis {B_j | j = 1, …, m} (filter dictionary) with the coding coefficient vector A_i ∈ R^(m×1): [formula image media_image14.png] where A_i is the i-th column of A, B_j is the j-th filter basis (g(n)) with dimension cwh × 1 or wh × 1 for the 3D filter-wise decomposition and 2D channel-wise decomposition cases, respectively." (Li, page 3, left column, paragraph 5). a_(j,i), the j-th entry of A_i, is a coefficient corresponding to a j-th basis (n-th filter), and is equivalent to a coefficient λ_n.
“We assume that a convolution layer has c input channels (β) and n output channels (α), and the kernel size is w x h” (Li, page 3, left column, paragraph 4)
[figure media_image15.png] ”Comparison of different filter decomposition methods. Right: each channel of the 3D filter is considered as a basic element. A unique set of basis is learned for the n 2D filters in each channel” (Li, page 3, Figure 1). In channel-wise decomposition, a unique basis is learned for each 2D filter, across input (c) and output (n) channels, for a total of n x c bases. With channel-wise decomposition, each basis corresponds to a particular input and output channel. a_(j,i) corresponds to some particular basis B_j, and thus corresponds to a particular input (β) and a particular output (α) channel. a_(j,i) is thus equivalent to λ_n(α, β).
“the convolution (*) between the input feature map x (X) and the 3D kernel becomes [formula image media_image16.png]” (Li, page 4, right column, paragraph 1). For channel-wise decomposition, each basis is associated with a particular input and output channel. Thus, this equation is evaluated over a number of input and output channels, as in the claimed equation.
“For the forward pass, the learned basis is used to approximate the original filters and then used as parameters for the convolutional layers (layer[s] of the neural network)“ (Li, page 1, Abstract); “each split of feature map is firstly convolved with the filter basis, and then the final output is achieved by a weighted summation of the convolution results” (Li, page 4, right column, paragraph 2).
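Examiner’s note: the equivalence relied upon in Li’s forward pass, convolving with the basis filters first and then taking a weighted summation, follows from the linearity of convolution. The following sketch (hypothetical toy sizes, not Li’s code) verifies this numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 3, 3, 6                        # m basis filters, k x k, n x n input
basis = rng.standard_normal((m, k, k))   # filter dictionary
coeff = rng.standard_normal(m)           # coefficients
x = rng.standard_normal((n, n))          # input feature map

def corr2d(img, f):
    """Valid 2D cross-correlation of img with filter f."""
    kk = f.shape[0]
    s = img.shape[0] - kk + 1
    return np.array([[np.sum(img[r:r + kk, c:c + kk] * f)
                      for c in range(s)] for r in range(s)])

# Path 1: reconstruct the filter, then convolve.
direct = corr2d(x, np.tensordot(coeff, basis, axes=1))

# Path 2: convolve with each basis filter, then take the weighted
# summation ("firstly convolved with the filter basis, and then the
# final output is achieved by a weighted summation").
per_basis = np.stack([corr2d(x, basis[j]) for j in range(m)])
summed = np.tensordot(coeff, per_basis, axes=1)

assert np.allclose(direct, summed)       # the two paths agree
```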
Regarding claim 37, the rejection of claim 26 in view of Li is incorporated. Li further discloses a method, comprising at least one of the following: a) using the at least one filter dictionary for a plurality of layers of the neural network, b) using the at least one filter dictionary for a plurality of layers of the neural network that are associated with a same spatial size of data to be processed, c) using the at least one filter dictionary for a respective residual block, the neural network being a residual neural network, d) using the at least one filter dictionary for a layer of the neural network: “To compress the networks further, we can force several or all convolutional layers (layer[s] of the neural network) to share the same basis set (filter dictionary) depending on the compression degree we want to achieve. The weight sharing strategy can be customized to the networks. For example, in ResNet [16] (residual neural network) and the following works SRResNet [28], EDSR [34], there are two convolutions in the residual block. We can let the two convolutions share the basis … The channels in the lower residual block groups are relatively small (16 and 32 for the first and second group) (layers … associated with a same spatial size of data to be processed)” (Li, page 5, right column, paragraph 3)
Regarding claim 38, the rejection of claim 26 in view of Li is incorporated. Li further teaches a method, comprising training the neural network based on training data, wherein a trained neural network is obtained; and using the trained neural network for the processing of the input data: “The networks are trained on DIV2K [2] dataset that contains 1,000 2K images (training data). We test the networks (trained neural network[s]) on five datasets: Set5 [4], Set14 [49], B100 [37], Urban100 [20], and DIV2K validation set (input data)” (Li, page 6, right column, paragraph 1).
Regarding claim 39, Li discloses [a] computer-implemented method for training an artificial deep neural network,
wherein at least one filter of the neural network is represented based on at least one filter dictionary, the method comprising: “in this paper, we try to reduce the number of parameters of CNNs (artificial deep neural network[s]) by learning a basis of the filters in convolutional layers” (Li, page 1, Abstract); “Each 3D filter (filter of the neural network) W_i ∈ R^(cwh×1) (or W_i ∈ R^(wh×1) for the channel-wise decomposition case) is represented by the linear combination of a set of m filter basis {B_j | j = 1, …, m} (filter dictionary) with the coding coefficient vector A_i ∈ R^(m×1)" (Li, page 3, left column, paragraph 5).
training at least one component of the at least one filter dictionary, wherein the training of the at least one component of the at least one filter dictionary is performed at least temporarily simultaneously and/or together with a training of at least one other component of the neural network:
“4.1. General filter basis learning approach
We jointly minimize the approximation error [formula image media_image17.png] and the network target loss [formula image media_image18.png]. For example, to compress image restoration network with mean square error (MSE) loss, our training objective function is [formula image media_image19.png], where f_(B,A|θ)(∙) denotes the CNN with parameter {B, A}, conditioned that the other parameters θ are known” (Li, page 5, right column, paragraph 1).
Examiner’s note: This loss jointly trains both the filter dictionary through its second term and other parameters of the neural network through its first.
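Examiner’s note: the joint training mapped above can be sketched as gradient descent on a combined objective (a toy least-squares stand-in for Li’s loss; the dimensions, learning rate, and linear “network” are hypothetical, used only to show that one objective updates the dictionary B and the coefficients A together):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, f = 6, 3, 4                    # filter dim d, m bases, f filters
W = rng.standard_normal((d, f))      # original filters to approximate
B = rng.standard_normal((d, m))      # filter dictionary (trainable)
A = rng.standard_normal((m, f))      # coefficients (trainable)
X = rng.standard_normal((20, d))     # toy input batch
Y = X @ W                            # toy task targets

lam, lr = 0.5, 0.002                 # trade-off weight, step size

def total_loss(B, A):
    # task loss + lam * approximation error ||B A - W||^2
    W_hat = B @ A
    return (np.sum((X @ W_hat - Y) ** 2) / len(X)
            + lam * np.sum((W_hat - W) ** 2))

loss_start = total_loss(B, A)
for _ in range(2000):
    W_hat = B @ A
    # gradient of the combined loss w.r.t. the reconstructed filters
    G = 2 * X.T @ (X @ W_hat - Y) / len(X) + 2 * lam * (W_hat - W)
    # chain rule pushes the same gradient into B and A simultaneously
    B, A = B - lr * (G @ A.T), A - lr * (B.T @ G)
loss_end = total_loss(B, A)          # decreases relative to loss_start
```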
Regarding claim 43, the rejection of claim 39 in view of Li is incorporated. Li further discloses a method, wherein the training includes: training the at least one filter dictionary together with at least one coefficient associated with the at least one filter dictionary:
“4.1. General filter basis learning approach
We jointly minimize the approximation error [formula image media_image17.png] and the network target loss [formula image media_image18.png]. For example, to compress image restoration network with mean square error (MSE) loss, our training objective function is [formula image media_image19.png], where f_(B,A|θ)(∙) denotes the CNN with parameter {B, A}, conditioned that the other parameters θ are known” (Li, page 5, right column, paragraph 1). This loss jointly trains the filter dictionary (B) along with its associated coefficients (A).
Regarding claim 44, the rejection of claim 26 in view of Li is incorporated. Li further discloses a method, wherein the processing of the input data includes at least one of the following:
a) processing one- and/or multi-dimensional data, b) processing image data:
“We evaluate the performance of compressed models on CIFAR10 [25] dataset. The training and testing subset contains 50,000 and 10,000 images, respectively” (Li, page 6, left column, paragraph 3).
[figure media_image20.png] “SR results of bid image for upscaling factor x4. Network compression methods are applied on EDSR” (Li, page 8, Figure 4). As one of ordinary skill in the art could see, there is at least one pixel in this image data (dimensionality >= 1).
c) processing audio data, the audio data including voice data and/or operating noises from technical equipment or systems
d) processing video data or parts of video data
e) processing sensor data
wherein the processing of the input data includes a classification of the input data: “We show the experimental results in this section and compare with the state-of-the-art methods on both image classification and image SR. For classification, we applied our basis learning method to various networks including VGG [45], ResNet [16], and DenseNet [19]. We evaluate the performance of compressed models on CIFAR10 [25] dataset. The training and testing subset contains 50,000 and 10,000 images, respectively” (Li, page 6, left column, paragraph 3).
Regarding claim 45, the rejection of claim 44 in view of Li is incorporated. Li further discloses a method, comprising: using output data obtained based on the processing of the input data to control and/or regulate at least one component of a technical system:
“the final output is achieved by a weighted summation of the convolution results” (Li, page 5, left column, paragraph 1)
“4.1. General filter basis learning approach
We jointly minimize the approximation error [formula image media_image17.png] and the network target loss [formula image media_image18.png]. For example, to compress image restoration network with mean square error (MSE) loss, our training objective function is [formula image media_image19.png], where f_(B,A|θ)(∙) (output data obtained based on the processing of the input data) denotes the CNN with parameter {B, A}” (Li, page 5, right column, paragraph 1). The output of the network, derived by processing input, is used to regulate Li’s system through training.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 29-30, 35-36, 40-41, and 45 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (“Learning Filter Basis for Convolutional Neural Network Compression”, published 12/23/2019, arXiv:1908.08932v2), hereafter referred to as ‘Li’, in view of Engan et al. ("Method of optimal directions for frame design," 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), Phoenix, AZ, USA, 1999, pp. 2443-2446 vol.5, doi: 10.1109/ICASSP.1999.760624), hereafter referred to as ‘Engan’.
Regarding claim 29, the rejection of claim 26 in view of Li is incorporated. While Li fails to disclose the further limitations of the claim, Engan discloses a method, wherein a) the at least one filter dictionary does not completely span a space or b) at least some elements of the at least one filter dictionary are linearly dependent on one another and the at least one filter dictionary is overcomplete: “A vector can also be written as a linear combination of an overcomplete set of vectors. If the N-dimensional vector space V contains a set F = {fj} of K vectors where K > N, and F spans the space V, F is an overcomplete set (element of the at least one filter dictionary). The vectors fj are not independent (linearly dependent), and F is not a basis but a frame [7]. Any vector, v, in the set V can be expanded as a linear combination of the frame vectors: v = Σ_(j=1)^K a_j f_j, but because of the linear dependence of the frame vectors, the expansion is not unique any more” (Engan, page 2443, right column, paragraph 3); “The term frame covers both a basis and an overcomplete set of vectors. We use the term frame for a general linearly dependent set of vectors, mostly overcomplete, which spans the space” (Engan, page 2443, right column, paragraph 4).
Li and Engan relate to approximating tensors with linear combinations of bases and are analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to use overcomplete sets of vectors for its bases, as disclosed by Engan. Reconstructing a signal that’s low resolution, non-stationary, or sampled from a non-gaussian process is a difficult, non-trivial task, one that can be overcome by reconstructing the signal with a linear combination of an overcomplete set of vectors. See Engan, page 2443, left column, paragraph 1.
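Examiner’s note: Engan’s point that an overcomplete frame (K > N vectors spanning R^N) yields non-unique expansions can be verified with a short sketch (hypothetical dimensions; `pinv` stands in for any method of computing one expansion):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 3, 5
F = rng.standard_normal((N, K))          # K > N frame vectors as columns
assert np.linalg.matrix_rank(F) == N     # the frame spans R^N

v = rng.standard_normal(N)
a1 = np.linalg.pinv(F) @ v               # one valid expansion (min-norm)

# Because K > N, F has a nontrivial null space: adding any null-space
# vector gives a second, different expansion of the same v.
_, _, Vt = np.linalg.svd(F)
z = Vt[-1]                               # F @ z ≈ 0
a2 = a1 + z

# Both coefficient vectors reconstruct v; the expansion is not unique.
```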
Regarding claim 30, the rejection of claim 26 in view of Li is incorporated. While Li fails to disclose the further limitations of the claim, Engan discloses a method, wherein the at least one filter dictionary is different from a standard basis B, according to [formula image media_image7.png], wherein e(n) characterizes an n-th unit vector associated with the standard basis B, and K is a positive integer representing a special size of the at least one filter:
“A vector can also be written as a linear combination of an overcomplete set of vectors. If the N-dimensional vector space V contains a set F = {fj} of K vectors where K > N, and F spans the space V, F is an overcomplete set. The vectors fj are not independent, and F is not a basis but a frame [7]. Any vector, v, in the set V can be expanded as a linear combination of the frame vectors: v = Σ_(j=1)^K a_j f_j, but because of the linear dependence of the frame vectors, the expansion is not unique any more” (Engan, page 2443, right column, paragraph 3). The filter dictionary is the set of all frames, containing at least one element F.
“The term frame covers both a basis and an overcomplete set of vectors. We use the term frame for a general linearly dependent set of vectors, mostly overcomplete, which spans the space” (Engan, page 2443, right column, paragraph 4).
Examiner’s note: A standard basis contains N vectors for an N-dimensional vector space. Engan’s overcomplete sets contain K > N vectors, so they differ from standard bases.
Li and Engan relate to approximating tensors with linear combinations of bases and are analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to use overcomplete sets of vectors for its bases, as disclosed by Engan. Reconstructing a signal that’s low resolution, non-stationary, or sampled from a non-gaussian process is a difficult, non-trivial task, one that can be overcome by reconstructing the signal with a linear combination of an overcomplete set of vectors. See Engan, page 2443, left column, paragraph 1.
Regarding claim 35, the rejection of claim 26 in view of Li is incorporated. While Li fails to disclose the further limitations of the claim, Engan teaches a method of reducing at least one component of the at least one filter dictionary, wherein the reducing includes at least one of the following:
a) reducing at least one filter of the at least one filter dictionary by zeroing at least one filter coefficient of the at least one filter of the at least one filter dictionary:
“The term frame (filter of the at least one filter dictionary) covers both a basis and an overcomplete set of vectors. We use the term frame for a general linearly dependent set of vectors, mostly overcomplete, which spans the space” (Engan, page 2443, right column, paragraph 4). The set of all frames comprises the filter dictionary.
“Let F denote an N x K matrix where K >= N. The columns, {f_j}, j = 1, . . . , K, constitute a frame (filter of the at least one filter dictionary). Let x_i be a real signal vector, x_i ∈ R^N; x_i can then be represented or approximated as
[Equation image: media_image21.png]
” (Engan, page 2443, right column, paragraph 5).
“In a good compression scheme, many of the wi(j)’s (coefficient[s]) will be zero” (Engan, page 2443, right column, paragraph 5)
“Only m of the wi(j)’s are different to zero” (Engan, page 2445, left column, paragraph 1)
b) removing or deleting at least one filter of the at least one filter dictionary
c) removing or deleting at least one coefficient associated with the at least one filter dictionary
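For illustration only, the coefficient zeroing Engan describes can be sketched as follows; the sizes, the random data, and the choice to keep the m largest-magnitude coefficients are assumptions, not taken from Engan:

```python
import numpy as np

# Sketch of reducing by zeroing coefficients: keep only m of the K
# coefficients w_i(j) nonzero, so the signal is approximated by a sparse
# linear combination of frame vectors.
rng = np.random.default_rng(0)
N, K, m = 8, 16, 4
F = rng.standard_normal((N, K))      # frame: K column vectors in R^N
w = rng.standard_normal(K)           # dense coefficient vector

keep = np.argsort(np.abs(w))[-m:]    # indices of the m largest |w(j)|
w_sparse = np.zeros(K)
w_sparse[keep] = w[keep]             # the other K - m coefficients are zeroed

assert np.count_nonzero(w_sparse) == m
x_full = F @ w                       # exact reconstruction
x_approx = F @ w_sparse              # compressed approximation
```

Zeroing K − m coefficients reduces storage and computation while retaining an approximation of the original signal, which is the compression rationale quoted above.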
Li and Engan relate to signal approximation with a linear combination of matrices and are analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to zero some of the filter coefficients, as disclosed by Engan. Doing so would reduce the complexity of the system without necessarily losing much signal information needed to reconstruct the network filter. See Engan, page 2443, left column, paragraph 1.
Regarding claim 36, the rejection of claim 35 in view of Li and Engan is incorporated. Engan further discloses a method, comprising at least one of the following:
a) performing the reducing after an initializing of the at least one filter dictionary:
“In the frame design algorithm presented here the number of frame vectors to be used, m, is constant for all training vectors and iterations. The main steps in of algorithm are as follows:
1. Begin with an initial frame F_0 of size N x K (initializing of the at least one filter dictionary), and decide the number of frame vectors to be used in each approximation, m. Assign counter variable i = 1.
2. Approximate each training vector, x_i, using a vector selection algorithm:
[Equation image: media_image22.png]
where w_i(j) is the coefficient corresponding to vector f_j, and only m of the w_i(j)'s are different from zero.
3. Given the approximations and residuals, adjust the frame vectors => F_i.
4. Find the new approximations, and calculate the new residuals. If (stop-criterion = FALSE), i = i + 1, go to step 3. Otherwise stop” (Engan, page 2444, right column, paragraph 3). When a frame of the set of frames (the dictionary) is initialized, the dictionary is fully or partially initialized.
b) performing the reducing after an initializing of coefficients of at least some filters of the at least one filter dictionary:
c) performing the reducing during a training of the neural network
“part a) in the Lloyd Iteration involves finding approximations for all the vectors in the training set. We will not call this classification since it includes both finding the frame vectors to be used when approximating a signal vector and their associated coefficients. Thus, part a) in the Lloyd Iteration for frame design is to find an approximation for each training vector” (Engan, page 2444, left column, paragraph 4). Approximating a training vector comprises finding frame vectors and their associated coefficients.
“In the frame design algorithm presented here the number of frame vectors to be used, m, is constant for all training vectors and iterations. The main steps in of algorithm are as follows:
1. Begin with an initial frame F_0 of size N x K, and decide the number of frame vectors to be used in each approximation, m. Assign counter variable i = 1.
2. Approximate each training vector, x_i, using a vector selection algorithm:
[Equation image: media_image22.png]
where w_i(j) is the coefficient corresponding to vector f_j, and only m of the w_i(j)'s are different from zero.
3. Given the approximations and residuals, adjust the frame vectors => F_i.
4. Find the new approximations, and calculate the new residuals. If (stop-criterion = FALSE), i = i + 1, go to step 3. Otherwise stop” (Engan, page 2444, right column, paragraph 3). This is a training process used to iteratively find better frames. The coefficients, (K – m) of them equal to zero, are found during step 2 of this process. Thus, the reducing happens during the training process.
d) performing the reducing after the training of the neural network
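For illustration only, the quoted iterative frame-design loop can be sketched as follows. The greedy selection by correlation and the least-squares frame update below are assumptions standing in for Engan's vector selection algorithm and frame adjustment step, not their exact implementations:

```python
import numpy as np

# Sketch of the quoted loop: initialize F_0; sparsely approximate each
# training vector with only m nonzero coefficients; adjust the frame given
# the approximations; repeat until a stop criterion.
rng = np.random.default_rng(1)
N, K, m, n_train, n_iter = 8, 12, 3, 50, 5
X = rng.standard_normal((N, n_train))      # training vectors as columns
F = rng.standard_normal((N, K))            # step 1: initial frame F_0

for _ in range(n_iter):
    W = np.zeros((K, n_train))
    for i in range(n_train):
        # step 2: pick the m frame vectors most correlated with x_i, then
        # solve least squares on that support; K - m coefficients stay zero
        corr = np.abs(F.T @ X[:, i])
        S = np.argsort(corr)[-m:]
        W[S, i] = np.linalg.lstsq(F[:, S], X[:, i], rcond=None)[0]
    # step 3: adjust the frame vectors given approximations and residuals
    F = X @ W.T @ np.linalg.pinv(W @ W.T)
    # step 4: compute new residuals; a real stop criterion would go here
    residual = np.linalg.norm(X - F @ W)
```

The coefficients with exactly m nonzero entries per training vector are produced inside the training loop, which is the basis for the examiner's mapping of the reducing onto step 2.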
Li and Engan relate to signal approximation with a linear combination of matrices and are analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Li and Engan to select the zero coefficients during training, as disclosed by Engan. Doing so would compress the filters by zeroing (K – m) coefficients and reducing their complexity for filter learning. See Engan, page 2444, right column, paragraph 3.
Regarding claim 41, the rejection of claim 39 in view of Li is incorporated. While Li fails to disclose the further limitations of the claim, Engan teaches a method, comprising:
providing a filter dictionary not characterizing a standard basis:
“A vector can also be written as a linear combination of an overcomplete set of vectors. If the N-dimensional vector space V contains a set F = {fj} of K vectors where K > N, and F spans the space V, F is an overcomplete set. The vectors fj are not independent, and F is not a basis but a frame [7]. Any vector, v, in the set V can be expanded as a linear combination of the frame vectors:
v = Σ_{j=1}^{K} a_j f_j, but because of the linear dependence of the frame vectors, the expansion is not unique any more” (Engan, page 2443, right column, paragraph 3). The filter dictionary is the set of all frames, containing at least one element F.
“The term frame covers both a basis and an overcomplete set of vectors. We use the term frame for a general linearly dependent set of vectors, mostly overcomplete, which spans the space” (Engan, page 2443, right column, paragraph 4).
Examiner’s note: A standard basis contains N vectors for an N-dimensional vector space. Engan’s overcomplete sets contain K > N vectors, so they differ from standard bases.
changing the filter dictionary not characterizing a standard basis, based on the training:
“In the frame design algorithm presented here the number of frame vectors to be used, m, is constant for all training vectors and iterations. The main steps in of algorithm are as follows:
1. Begin with an initial frame F_0 of size N x K (non-standard basis), and decide the number of frame vectors to be used in each approximation, m. Assign counter variable i = 1.
…
3. Given the approximations and residuals, adjust the frame vectors => F_i.
4. Find the new approximations, and calculate the new residuals. If (stop-criterion = FALSE), i = i + 1, go to step 3. Otherwise stop” (Engan, page 2444, right column, paragraph 3). This is a training process used to iteratively find better frames.
Li and Engan relate to approximating tensors with linear combinations of bases and are analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to use overcomplete sets of vectors for its bases, as disclosed by Engan. Reconstructing a signal that’s low resolution, non-stationary, or sampled from a non-gaussian process is a difficult, non-trivial task, one that can be overcome by reconstructing the signal with a linear combination of an overcomplete set of vectors. See Engan, page 2443, left column, paragraph 1.
Regarding claim 42, the rejection of claim 39 in view of Li is incorporated. Li further discloses a method, comprising:
providing a pre-trained neural network or performing a first training for the neural network; … and performing a further training: “We train the compressed networks for 300 epochs with SGD optimizer and an initial learning rate of 0.1. The learning rate is decayed by 10 after 50% and 75% of the epochs” (Li, page 6, left column, paragraph 3)
While Li fails to disclose the further limitations of the claim, Engan discloses a method of performing a reducing on the pre-trained neural network:
“The term frame (filter of the at least one filter dictionary) covers both a basis and an overcomplete set of vectors. We use the term frame for a general linearly dependent set of vectors, mostly overcomplete, which spans the space” (Engan, page 2443, right column, paragraph 4). The set of all frames comprises the filter dictionary.
“Let F denote an N x K matrix where K >= N. The columns, {f_j}, j = 1, . . . , K, constitute a frame (filter of the at least one filter dictionary). Let x_i be a real signal vector, x_i ∈ R^N; x_i can then be represented or approximated as
[Equation image: media_image21.png]
” (Engan, page 2443, right column, paragraph 5).
“In a good compression scheme, many of the wi(j)’s (coefficient[s]) will be zero” (Engan, page 2443, right column, paragraph 5)
“Only m of the wi(j)’s are different to zero” (Engan, page 2445, left column, paragraph 1)
Li and Engan relate to signal approximation with a linear combination of matrices and are analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to zero some of the filter coefficients, as disclosed by Engan. Doing so would reduce the complexity of the system without necessarily losing much signal information needed to reconstruct the network filter. See Engan, page 2443, left column, paragraph 1.
Regarding claim 46, the rejection of claim 26 in view of Li is incorporated. Li further discloses a method, comprising a) initializing the at least one filter dictionary, b) initializing coefficients associated with the at least one filter dictionary, c) reducing at least one component of the at least one filter dictionary, or d) training the at least one filter dictionary together with at least one further component of the neural network based on a stochastic, gradient-based optimization method:
“4.1. General filter basis learning approach
We jointly minimize the approximation error [Equation image: media_image17.png] and the network target loss [Equation image: media_image18.png]. For example, to compress image restoration network with mean square error (MSE) loss, our training objective function is [Equation image: media_image19.png], where f_{B,A|θ}(∙) denotes the CNN with parameter {B, A}, conditioned that the other parameters θ are known” (Li, page 5, right column, paragraph 1).
Examiner’s note: To calculate this loss function and use it to train the network, its variables (including the set of bases (filter dictionary) and their coefficients) must be initialized.
Examiner’s note: This loss jointly trains both the filter dictionary through its second term and other parameters of the neural network through its first.
“Adam optimizer [24] (stochastic, gradient-based optimization method) is used for training SR networks” (Li, page 6, right column, paragraph 1).
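For illustration only, the joint objective the examiner relies on can be sketched as follows. The shapes, the lambda weighting, and the stand-in for the network forward pass are assumptions, not taken from Li's paper:

```python
import numpy as np

# Sketch of a joint objective: approximation error ||W - sum_j a_j B_j||^2
# plus a target (MSE) loss, so one gradient-based optimizer can train the
# basis B (filter dictionary) and coefficients a together.
rng = np.random.default_rng(2)
d, m = 9, 4                        # flattened filter size, number of basis
W = rng.standard_normal(d)         # original (pre-trained) filter
B = rng.standard_normal((d, m))    # filter basis (dictionary), initialized
a = rng.standard_normal(m)         # coding coefficients, initialized

def joint_loss(B, a, x, y, lam=0.1):
    W_hat = B @ a                            # reconstructed filter
    approx_err = np.sum((W - W_hat) ** 2)    # approximation error term
    y_hat = x @ W_hat                        # stand-in for the network output
    target_loss = np.mean((y - y_hat) ** 2)  # MSE target loss term
    return target_loss + lam * approx_err

x = rng.standard_normal((32, d))
y = x @ W
loss = joint_loss(B, a, x, y)
```

Because both B and a appear in the loss, computing it requires both to be initialized, and any gradient step on it updates the dictionary and the other parameters jointly, which is the point of both examiner's notes above.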
While Li fails to disclose the further limitations of the claim, Engan teaches a method, comprising c) reducing at least one component of the at least one filter dictionary:
“The term frame (filter of the at least one filter dictionary) covers both a basis and an overcomplete set of vectors. We use the term frame for a general linearly dependent set of vectors, mostly overcomplete, which spans the space” (Engan, page 2443, right column, paragraph 4). The set of all frames comprises the filter dictionary.
“Let F denote an N x K matrix where K >= N. The columns, {f_j}, j = 1, . . . , K, constitute a frame (filter of the at least one filter dictionary). Let x_i be a real signal vector, x_i ∈ R^N; x_i can then be represented or approximated as
[Equation image: media_image21.png]
” (Engan, page 2443, right column, paragraph 5).
“In a good compression scheme, many of the wi(j)’s will be zero” (Engan, page 2443, right column, paragraph 5)
“Only m of the wi(j)’s are different to zero” (Engan, page 2445, left column, paragraph 1)
Li and Engan relate to signal approximation with a linear combination of matrices and are analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to zero some of the filter coefficients, as disclosed by Engan. Doing so would reduce the complexity of the system without necessarily losing much signal information needed to reconstruct the network filter. See Engan, page 2443, left column, paragraph 1.
Claims 33-34 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (“Learning Filter Basis for Convolutional Neural Network Compression”, published 12/23/2019, arXiv:1908.08932v2), hereafter referred to as ‘Li’, in view of Ambai (“NEURAL NETWORK APPARATUS, VEHICLE CONTROL SYSTEM, DECOMPOSITION DEVICE, AND PROGRAM”, published 9/19/2019, US 20190286982 A1).
Regarding claim 33, the rejection of claim 26 in view of Li is incorporated. While Li fails to disclose the further limitations of the claim, Ambai discloses a method, comprising initializing the at least one filter dictionary prior to the representing and/or the processing, wherein the initializing includes at least one of the following alternative methods:
a) random-based initializing by assigning random numbers or pseudorandom numbers to at least some filter coefficients g_{i,j}(n) of at least some filters of the at least one filter dictionary:
“In the above neural network apparatus, the neural network model may be a convolutional neural network model, in the convolutional neural network model, a plurality of filters (network filters) of a convolutional layer may be collected and be regarded as the weight matrix (W), the convolutional layer may be regarded as a fully connected layer, and the weight matrix (W) may be constituted by a product of a weight basis matrix (M_w) (filter of the at least one filter dictionary) of integers and a weight coefficient matrix (C_w) of real numbers” (Ambai, [0065]). The filter dictionary is the set of all basis matrices.
“Randomly initialize the basis matrix M_w and the coefficient matrix C_w” (Ambai, [0096])
b) random-based initializing such that a linear space span{F} that is characterized by the at least one filter dictionary is spanned by an orthonormal basis, including:
b1) initializing at least some filter coefficients g_{i,j}(n) of at least some filters of the at least one filter dictionary with independently equally distributed filter coefficient values,
b2) applying a Gram-Schmidt orthogonalization method to the elements or filters of the at least one filter dictionary,
c) random-based initializing, including:
c1) initializing at least some filter coefficients g_{i,j}(n) of at least some filters of the at least one filter dictionary with independently equally distributed filter coefficient values,
c2) rescaling the at least one filter dictionary based on at least one statistical quantity, for example a mean and/or a standard deviation
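For illustration only, the claimed initialization alternatives can be sketched as follows; the dictionary sizes, the uniform distribution, and the rescaling target are illustrative assumptions, not taken from Ambai:

```python
import numpy as np

# a) random-based initializing of filter coefficients g_{i,j}(n)
rng = np.random.default_rng(3)
K, n_filters = 9, 6                  # flattened spatial size, dictionary size
G = rng.uniform(-1.0, 1.0, size=(K, n_filters))

# b2) Gram-Schmidt orthogonalization of the dictionary's filters (columns),
# so span{F} is spanned by an orthonormal basis
def gram_schmidt(M):
    Q = np.zeros_like(M)
    for j in range(M.shape[1]):
        # subtract projections onto the already-orthonormalized columns
        v = M[:, j] - Q[:, :j] @ (Q[:, :j].T @ M[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q

G_ortho = gram_schmidt(G)
assert np.allclose(G_ortho.T @ G_ortho, np.eye(n_filters), atol=1e-8)

# c2) rescaling the dictionary based on statistical quantities
# (here: subtract the mean and divide by the standard deviation)
G_rescaled = (G - G.mean()) / G.std()
```

Randomly drawn columns are linearly independent with probability one, so the Gram-Schmidt step succeeds and yields an orthonormal basis for the same span.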
Li and Ambai relate to representing CNN filters with linear bases and are analogous to the claimed invention. Li teaches a method of approximating a CNN filter with a linear combination of trained bases. Ambai teaches a method of randomly initializing a filter basis and its coefficient(s). It would have been obvious to one of ordinary skill in the art to randomly initialize Li’s bases and / or basis coefficients. This would achieve the predictable result of unbiased initial values for bases and / or basis coefficients in training, with the linear combination of trained bases and the random initialization of matrices and / or vectors performing the same together as they did separately. (MPEP 2143 I. (A) Combining prior art elements according to known methods to yield predictable results).
Regarding claim 34, the rejection of claim 26 in view of Li is incorporated. While Li fails to disclose the further limitations of the claim, Ambai discloses a method of initializing coefficients of at least some filters of the at least one filter dictionary, including at least one of the following:
a) random-based or pseudorandom-based initializing of the coefficients:
“In the above neural network apparatus, the neural network model may be a convolutional neural network model, in the convolutional neural network model, a plurality of filters (network filters) of a convolutional layer may be collected and be regarded as the weight matrix (W), the convolutional layer may be regarded as a fully connected layer, and the weight matrix (W) may be constituted by a product of a weight basis matrix (M_w) (filter of the at least one filter dictionary) of integers and a weight coefficient matrix (C_w) of real numbers” (Ambai, [0065]). The filter dictionary is the set of all basis matrices.
“Randomly initialize the basis matrix M_w and the coefficient matrix C_w” (Ambai, [0096])
b) initializing the coefficients based on the at least one filter dictionary
Li and Ambai relate to representing CNN filters with linear bases and are analogous to the claimed invention. Li teaches a method of approximating a CNN filter with a linear combination of trained bases. Ambai teaches a method of randomly initializing a filter basis and its coefficient(s). It would have been obvious to one of ordinary skill in the art to randomly initialize Li’s bases and / or basis coefficients. This would achieve the predictable result of unbiased initial values for bases and / or basis coefficients in training, with the linear combination of trained bases and the random initialization of matrices and / or vectors performing the same together as they did separately. (MPEP 2143 I. (A) Combining prior art elements according to known methods to yield predictable results).
Claim 40 is rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (“Learning Filter Basis for Convolutional Neural Network Compression”, published 12/23/2019, arXiv:1908.08932v2), hereafter referred to as ‘Li’, in view of Taboga (“Standard basis”, published 5/7/2021, StatLect, retrieved from https://web.archive.org/web/20210507153125/https://www.statlect.com/matrix-algebra/standard-basis).
Regarding claim 40, the rejection of claim 39 in view of Li is incorporated. Li further discloses a method, comprising:
providing a filter dictionary characterizing a standard basis, wherein the standard basis is characterized according to
[Equation image: media_image12.png]
wherein e(n) characterizes an n-th unit vector associated with the standard basis B, and K is a positive integer representing a spatial size of the at least one filter:
“Each 3D filter W_i ∈ R^{cwh×1} (or W_i ∈ R^{wh×1} for the channel-wise decomposition case) is represented by the linear combination of a set of m filter basis {B_j (e(n)) | j = 1, …, m} (filter dictionary) with the coding coefficient vector A_i ∈ R^{m×1}" (Li, page 3, left column, paragraph 5).
“In order to achieve a better trade-off between compressing the basis and coefficients, we split the 3D filters along the channel dimension as illustrated in the middle part of Fig. 1, namely, thinking of the c × w × h filter as being composed of s smaller p × w × h filters and c = s × p. As a result, the n 3D c × w × h filters can be regarded as n × s filters with size p × w × h.” (Li, page 4, left column, paragraph 2)
[Table image: media_image23.png]
”m is the number of basis. The number of splits p (spatial size of the at least one filter) for one convolution is 4.” (Li, page 6, left column, Table 1). In one of the test cases, m = 16 = p^2, making m analogous to K^2.
changing the filter dictionary, characterizing the standard basis, based on the training: “In this section, we present our learning method for learning filter basis” (Li, page 5, left column, paragraph 5); “We jointly minimize the approximation error
[Equation image: media_image17.png] and the network target loss [Equation image: media_image18.png]. For example, to compress image restoration network with mean square error (MSE) loss, our training objective function is [Equation image: media_image19.png]” (Li, page 5, right column, paragraph 1).
While Li fails to disclose the further limitations of the claim, Taboga discloses a method, wherein the standard basis is characterized according to
[Equation image: media_image12.png], wherein e(n) characterizes an n-th unit vector associated with the standard basis B: “Let S be the space of all K-dimensional vectors. Denote by e_k a vector whose k-th entry is equal to 1 and whose remaining K-1 entries are equal to 0. Then, the set of K vectors
[Equation images: media_image24.png, media_image25.png]
is called the standard basis of S” (Taboga, page 2, paragraph 1). One of ordinary skill in the art could construct a standard basis of arbitrary dimension using this information. For instance, to construct a standard basis spanning the space of a set of K-dimensional vectors, one merely needs to construct K basis vectors, each with one nonzero element in a unique position.
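For illustration only, the construction Taboga describes can be sketched directly; the dimension K = 4 below is an arbitrary example:

```python
import numpy as np

# Build the standard basis of the space of K-dimensional vectors: K vectors
# e_k, each with a 1 in the k-th entry and 0 in the remaining K-1 entries.
K = 4
standard_basis = [np.eye(K)[:, k] for k in range(K)]

# Each e_k has exactly one nonzero element, in a unique position.
for k, e in enumerate(standard_basis):
    assert e[k] == 1.0 and np.count_nonzero(e) == 1

# Any K-dimensional vector is trivially a combination of the e_k,
# with the vector's own entries as the (unique) coefficients:
v = np.array([3.0, -1.0, 0.5, 2.0])
assert np.allclose(sum(v[k] * standard_basis[k] for k in range(K)), v)
```

This makes concrete why the standard basis needs only K vectors to span the space, whereas an overcomplete set of the kind discussed for Engan contains more than K.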
Li relates to approximating network filters with a linear combination of bases and is analogous to the claimed invention. Taboga relates to standard bases and is analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to transform learned bases into standard form, as disclosed by Taboga. For a vector space of any dimensionality, the standard basis is the simplest basis that can span the entire space, and can span the same space with fewer vectors than an overcomplete set. See Taboga, page 1, paragraph 1 and page 6, paragraph 1.
Claims 46-47 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (“Learning Filter Basis for Convolutional Neural Network Compression”, published 12/23/2019, arXiv:1908.08932v2), hereafter referred to as ‘Li’, in view of Guo et al. (“METHODS AND APPARATUS FOR ENHANCING A NEURAL NETWORK USING BINARY TENSOR AND SCALE FACTOR PAIRS”, filed 5/22/2018), hereafter referred to as ‘Guo’.
Regarding claim 47, Li discloses a method, comprising:
represent[ing] at least one filter of the neural network based on at least one filter dictionary: “Each 3D filter (filter of the neural network) W_i ∈ R^{cwh×1} (or W_i ∈ R^{wh×1} for the channel-wise decomposition case) is represented by the linear combination of a set of m filter basis {B_j | j = 1, …, m} (filter dictionary) with the coding coefficient vector A_i ∈ R^{m×1}" (Li, page 3, left column, paragraph 5).
process[ing] input data and / or data derived from input data, using the at least one filter: “we utilize linear combination of filter basis to reconstruct the 3D filter W_i = Σ_{j=1}^{m} a_{j,i} B_j … Thus, the convolution between the input feature map x and the 3D kernel becomes
[Equation image: media_image13.png]
” (Li, page 3, left column, paragraph 4)
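For illustration only, the cited reconstruction-and-convolution identity can be sketched in one dimension; the sizes and the use of 1-D convolution are illustrative assumptions, not Li's actual 3D filters:

```python
import numpy as np

# A filter W_i is a linear combination of m basis filters B_j, so convolving
# the input with W_i equals the same linear combination of the per-basis
# convolution outputs (convolution is linear in the filter).
rng = np.random.default_rng(4)
m, k = 3, 5                                  # number of basis, filter length
B = rng.standard_normal((m, k))              # filter basis B_j
a = rng.standard_normal(m)                   # coding coefficients a_{j,i}
W = sum(a[j] * B[j] for j in range(m))       # reconstructed filter W_i

x = rng.standard_normal(32)                  # input feature map (1-D)
y_direct = np.convolve(x, W, mode="valid")
y_basis = sum(a[j] * np.convolve(x, B[j], mode="valid") for j in range(m))
assert np.allclose(y_direct, y_basis)
```

This linearity is what lets the shared basis convolutions be computed once and recombined per filter, which underlies the compression scheme quoted from Li.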
While Li fails to disclose the further limitations of the claim, Guo discloses [a]n apparatus configured to process data associated with an artificial deep neural network, the apparatus configured to: “Examples and embodiments of the present invention include apparatuses, systems and methods for enhancing a neural network using binary tensor and scale factor pairs” (Guo, page 78, right column, lines 30-33).
Li and Guo relate to approximating CNN filters with linear bases and are analogous to the claimed invention. Li teaches a method of approximating filters with a linear combination of bases. The claimed invention improves upon this method by processing it with an apparatus. Guo teaches apparatuses for executing operations related to linear basis approximations of CNN filters, applicable to Li. A person of ordinary skill in the art would have recognized that storing Li’s method as computer instructions on Guo’s hardware would lead to the predictable result of the method being executable by a computing system, and would improve the known device by allowing it to be performed with real data (MPEP 2143 I. (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results).
Regarding claim 48, Li discloses a method, comprising:
representing at least one filter of the neural network based on at least one filter dictionary: “Each 3D filter (filter of the neural network) W_i ∈ R^{cwh×1} (or W_i ∈ R^{wh×1} for the channel-wise decomposition case) is represented by the linear combination of a set of m filter basis {B_j | j = 1, …, m} (filter dictionary) with the coding coefficient vector A_i ∈ R^{m×1}" (Li, page 3, left column, paragraph 5).
processing input data and/or data derived from input data, using the at least one filter: “we utilize linear combination of filter basis to reconstruct the 3D filter W_i = Σ_{j=1}^{m} a_{j,i} B_j … Thus, the convolution between the input feature map x and the 3D kernel becomes
[Equation image: media_image13.png]
” (Li, page 3, left column, paragraph 4)
While Li fails to disclose the further limitations of the claim, Guo discloses [a] non-transitory computer-readable storage medium on which are stored instructions for processing data associated with an artificial deep neural network, the instructions, when executed by a computer, causing the computer to perform the following steps: “A non-transitory machine-readable medium comprising instructions which when operated on by the machine cause the machine to perform a method comprising: using a binary structure directly in a pre-trained filter of a trained CNN to produce binary weight models via tensor expansion” (Guo, Claim 12)
Li and Guo relate to approximating CNN filters with linear bases and are analogous to the claimed invention. Li teaches a method of approximating filters with a linear combination of bases. The claimed invention improves upon this method by storing it in the form of instructions on computer hardware. Guo teaches computer hardware for storing instructions related to linear basis approximations of CNN filters, applicable to Li. A person of ordinary skill in the art would have recognized that storing Li’s method as computer instructions on Guo’s hardware would lead to the predictable result of the method being executable by a computing system, and would improve the known device by allowing it to be performed with real data (MPEP 2143 I. (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results).
Response to Arguments
The following responses address arguments and remarks made in the instant remarks dated 12/29/2025.
Objections
In light of the instant amendments, some previous objections to the specification have been withdrawn. However, some previous objections to the specification remain.
In light of the instant amendments, some previous objections to the claims have been withdrawn. However, some previous objections to the claims remain, and new objections have been found.
112 Rejections
In light of the instant amendments, some previous claim rejections under 35 U.S.C. 112 have been withdrawn. However, some previous rejections remain. The Examiner notes that the rejection of claim 33 is caused by confusion over the indenting of method c relative to methods a and b, where the indentation suggests that c is part of b’s method.
Additionally, new rejections under 35 U.S.C. 112(a) have been made in light of the amended claims.
101 Rejections
On pages 10-11 of the instant remarks, the Applicant argues that the claimed invention cannot be performed as a mental process, and is practically integrated through improvement to existing technology:
“V. Rejection of Claims Under 35 U.S.C. § 101
Claims 26-47 were rejected under 35 U.S.C. § 101 because the claimed invention
is directed to a judicial exception (i.e., an abstract idea) without significantly more.
The claimed invention is not directed to an abstract idea because it recites a
specific, computer-implemented method for representing and processing data in artificial deep
neural networks using filter dictionaries, which is applied to improve computational efficiency
and accuracy in processing complex input data such as images, audio, video, or sensor data. The
present claims recite initializing, reducing, and training filter dictionaries, and processing input
data using convolutional operations, which are concrete computational steps that transform input
data into improved output representations, providing a technical solution to a technical problem
in neural network data processing. For example, applying Gram-Schmidt orthogonalization to
the filter dictionary, or rescaling filters based on statistical quantities, materially improves the
quality and stability of neural network computations, which cannot be performed solely in the
human mind for large-scale data. Moreover, the claims recite practical integration into a
technical system, such as controlling or regulating components of a device based on processed
output data, which further demonstrates that the claimed methods produce a specific
technological improvement rather than abstract mathematical ideas or mental steps.
In view of all of the foregoing, withdrawal of the rejection is respectfully
requested.”
Regarding the argument that limitations of the claimed invention cannot be performed mentally, the Examiner respectfully disagrees. As stated in MPEP 2106.04(a)(2)(III), The courts do not distinguish between mental processes that are performed entirely in the human mind and mental processes that require a human to use a physical aid (e.g., pen and paper or a slide rule) to perform the claim limitation. See, e.g., Benson, 409 U.S. at 67, 65, 175 USPQ at 674-75, 674 … Nor do the courts distinguish between claims that recite mental processes performed by humans and claims that recite mental processes performed on a computer. As the Federal Circuit has explained, "[c]ourts have examined claims that required the use of a computer and still found that the underlying, patent-ineligible invention could be performed via pen and paper or in a person’s mind." Versata Dev. Group v. SAP Am., Inc., 793 F.3d 1306, 1335, 115 USPQ2d 1681, 1702 (Fed. Cir. 2015). See also Intellectual Ventures I LLC v. Symantec Corp., 838 F.3d 1307, 1318, 120 USPQ2d 1353, 1360 (Fed. Cir. 2016) (‘‘[W]ith the exception of generic computer-implemented steps, there is nothing in the claims themselves that foreclose them from being performed by a human, mentally or with pen and paper.’’); Mortgage Grader, Inc. v. First Choice Loan Servs. Inc., 811 F.3d 1314, 1324, 117 USPQ2d 1693, 1699 (Fed. Cir. 2016) (holding that computer- implemented method for "anonymous loan shopping" was an abstract idea because it could be "performed by humans without a computer").
Claim 26 recites limitations amounting to mental processes performed on generic computing machines, which are insufficient to render a mentally performable task non-abstract. For example, claim 26 recites the mentally performable “representing at least one filter of the neural network based on at least one filter dictionary” as being performed by a generic computer (“A computer-implemented method”), which is insufficient to render the limitation non-abstract.
The Examiner asserts that the claimed invention, as amended, recites mental processes, and maintains rejections under 35 U.S.C. 101 on this basis.
In response to the Applicant’s argument that the claimed invention is practically integrated through an improvement to existing technology or technical field, the Examiner notes that improvements cannot be made through a recited judicial exception. As noted in MPEP 2106.05(a):
“It is important to note, the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements. See the discussion of Diamond v. Diehr, 450 U.S. 175, 187 and 191-92, 209 USPQ 1, 10 (1981) in subsection II, below. In addition, the improvement can be provided by the additional element(s) in combination with the recited judicial exception. See MPEP § 2106.04(d) (discussing Finjan, Inc. v. Blue Coat Sys., Inc., 879 F.3d 1299, 1303-04, 125 USPQ2d 1282, 1285-87 (Fed. Cir. 2018)).”
The Applicant is arguing improvement through representing network filters with a filter dictionary, recited in claim 26 as “representing at least one filter of the neural network based on at least one filter dictionary”, which can be performed as a mental process. While the claimed invention contains additional elements, they are insufficient to provide the argued improvements to existing technology or technical fields, as noted in greater detail under the 101 rejections section.
Thus, no rejections under 35 U.S.C. 101 are withdrawn on these grounds.
102 / 103 Rejections
On page 12 of the instant remarks, the Applicant argues that Khosla fails to disclose the full limitations of the claims:
“The claims recite the feature of acquiring a traffic flow that crosses a traffic flow regulated by
the traffic light signal, and ascertaining characterizing data of the crossing traffic flow. In
contrast, Khosla discloses evaluating vehicle positions, densities, or priorities but does not
disclose or suggest identifying a "crossing traffic flow," e.g., a perpendicular or conflicting flow
at the intersection.”
In response to the Applicant’s arguments above, the Examiner notes that no reference named “Khosla” has been referenced during prosecution of the instant application, nor do any of the amended claims mention traffic light signals or traffic flow. The Examiner respectfully notes that it appears this argument may have been made in reference to the prosecution of a different application.
Thus, no rejections are withdrawn on these grounds.
On page 12 of the instant remarks, the Applicant argues that Li fails to disclose representing at least one filter based on a filter dictionary:
“The present claims recite the feature of representing at least one filter of the
neural network based on at least one filter dictionary. In contrast, nowhere does Li disclose or
suggest this feature. Li discloses a low-rank factorization in which filters may be approximated
as a product of basis matrices learned during training. However, Li does not disclose or suggest a
filter dictionary. A filter dictionary, as recited, is a stored collection of dictionary atoms used to
encode filters through an explicit representation step. Li's basis filters are not a dictionary and
are not used in a dictionary-based representation; instead, they are simply trained parameters in a
matrix decomposition model.”
In response to the Applicant’s arguments above, it is noted that the storage and encoding of the filter dictionary upon which the Applicant relies are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Regarding “representing at least one filter of the neural network based on at least one filter dictionary”, as recited in claim 26, Li discloses representing network filters based on linear combinations of filter bases from an indexed set of filter bases B (Li, page 3, left column, paragraph 5). An indexed set of related bases falls within the broadest reasonable interpretation of a “dictionary” in the context of computer science, and is commensurate in scope with the claim language.
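For illustration of the claim interpretation applied above (a sketch prepared for clarity and not part of the record, with all dimensions and values hypothetical), a filter represented "based on a filter dictionary" in the sense of Li's indexed set of filter bases B amounts to a linear combination of indexed basis kernels:

```python
import numpy as np

# Hypothetical sizes: 3x3 filter kernels, an indexed set ("dictionary")
# B of 5 basis kernels, and one coefficient vector alpha per filter.
rng = np.random.default_rng(0)
k, num_atoms = 3, 5

# B: indexed set of filter bases, each entry a k x k kernel
B = rng.standard_normal((num_atoms, k, k))

# alpha: coefficients selecting/weighting the indexed bases for one filter
alpha = rng.standard_normal(num_atoms)

# The filter is the linear combination sum_i alpha[i] * B[i]
filter_kernel = np.tensordot(alpha, B, axes=1)  # shape (k, k)
assert filter_kernel.shape == (k, k)
```

Under this reading, the indexed set B plays the role of the recited "filter dictionary" and the coefficients encode the representation of the filter.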
No rejections are withdrawn on these grounds. See the 102 rejections section for more detail.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Jacobsen et al. (“Dynamic Steerable Frame Networks”, published 2017, ICLR 2017) teaches that filters, even in a standard CNN not utilizing Li’s method, can be considered linear combinations of standard bases: “In a standard convolutional network, a filter kernel is a linear combination over the standard basis for ℓ²(N)” (Jacobsen, page 3, paragraph 1)
Goldston et al. (“AM compatible digital audio broadcasting signal transmision using digitally modulated orthogonal noise-like sequences”, published 9/21/1999, US5956373A) teaches the use of Gram-Schmidt orthogonalization and random initialization of linear bases
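For illustration of the general technique referenced in Goldston (a sketch for clarity only, not part of the record; dimensions are hypothetical), Gram-Schmidt orthogonalization converts randomly initialized basis vectors into an orthonormal set:

```python
import numpy as np

# Randomly initialize 4 basis vectors in R^8 (hypothetical dimensions)
rng = np.random.default_rng(0)
vectors = rng.standard_normal((4, 8))

def gram_schmidt(v):
    """Return an orthonormal basis spanning the rows of v."""
    basis = []
    for x in v:
        # Subtract the projection of x onto each accepted basis vector
        for b in basis:
            x = x - np.dot(x, b) * b
        norm = np.linalg.norm(x)
        if norm > 1e-10:  # discard (near-)linearly dependent vectors
            basis.append(x / norm)
    return np.array(basis)

Q = gram_schmidt(vectors)
# Rows of Q are mutually orthogonal unit vectors
assert np.allclose(Q @ Q.T, np.eye(len(Q)), atol=1e-8)
```

The orthogonality check (Q @ Q.T equals the identity) confirms the resulting bases are orthonormal, the property Gram-Schmidt is used to guarantee.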
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Aaron P Gormley whose telephone number is (571)272-1372. The examiner can normally be reached Monday - Friday 12:00 PM - 8:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle T Bechtold can be reached at (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AG/Examiner, Art Unit 2148 /MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148