DETAILED ACTION
This action is in response to the amendments filed 04/03/2026. Claims 1-8, 10-17, and 19 are pending and have been examined.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 4/3/2026 has been entered.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-8, 10-17, and 19 are rejected under 35 U.S.C. 101 because the claimed inventions are directed to a judicial exception (an abstract idea, namely a mental process) without significantly more.
Claim 1
Step 1: The claim recites “A data processing method”, and is therefore directed to the statutory category of process
Step 2A Prong 1: The claim recites the following judicial exception(s)
determining, by the data processing apparatus, a second neural network model based on the available resource state of the terminal device and the first neural network model: This can be performed as a mental process. One can merely identify a subset of nodes in the first neural network model and assign them to a second, choosing fewer nodes when resources and performance are more constrained.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s)
obtaining, by a data processing apparatus, an available resource state of a terminal device: This is directed to mere data gathering and is thus insignificant extra-solution activity (MPEP 2106.05(g)).
obtaining, by the data processing apparatus, a first neural network model: This is directed to mere data gathering and is thus insignificant extra-solution activity (MPEP 2106.05(g)).
… wherein the first neural network model comprises a first transformer layer, the first transformer layer comprises M attention heads and a first feedforward layer, the first feed-forward layer comprises a first intermediate layer, the first intermediate layer comprises N neurons, and M and N are positive integers: This merely links the judicial exceptions to a particular field of use (transformer networks) (MPEP 2106.05(h)).
determining, by the data processing apparatus, a second neural network model based on the available resource state of the terminal device and the first neural network model: This is mere instruction to apply a judicial exception with a generic computer component (MPEP 2106.05(f)).
… such that the second neural network model meets at least one of the following conditions:
the second neural network model comprises a second transformer layer corresponding to the first transformer layer, and a quantity of attention heads of the second transformer layer is less than M: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
… wherein the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of attention heads based on a resource state in a generic manner (MPEP 2106.05(f)).
the second neural network model comprises a second intermediate layer corresponding to the first intermediate layer, and a quantity of neurons of the second intermediate layer is less than N: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
… wherein the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of neurons based on a resource state in a generic manner (MPEP 2106.05(f)).
sending, by the data processing apparatus, the second neural network model to the terminal device: This amounts to mere data transfer and is insignificant extra-solution activity (MPEP 2106.05(g)).
Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
obtaining, by a data processing apparatus, an available resource state of a terminal device: This is an instance of retrieving information from memory, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. iv.)
obtaining, by the data processing apparatus, a first neural network model: This is an instance of retrieving information from memory, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. iv.)
… wherein the first neural network model comprises a first transformer layer, the first transformer layer comprises M attention heads and a first feedforward layer, the first feed-forward layer comprises a first intermediate layer, the first intermediate layer comprises N neurons, and M and N are positive integers: This merely links the judicial exceptions to a particular field of use (transformer networks) (MPEP 2106.05(h)).
determining, by the data processing apparatus, a second neural network model based on the available resource state of the terminal device and the first neural network model: This is mere instruction to apply a judicial exception with a generic computer component (MPEP 2106.05(f)).
… such that the second neural network model meets at least one of the following conditions:
the second neural network model comprises a second transformer layer corresponding to the first transformer layer, and a quantity of attention heads of the second transformer layer is less than M: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
… wherein the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of attention heads based on a resource state in a generic manner (MPEP 2106.05(f)).
the second neural network model comprises a second intermediate layer corresponding to the first intermediate layer, and a quantity of neurons of the second intermediate layer is less than N: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
… wherein the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of neurons based on a resource state in a generic manner (MPEP 2106.05(f)).
sending, by the data processing apparatus, the second neural network model to the terminal device: This is an instance of transmitting data over a network, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. i.)
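For illustration only, the determining step discussed above for claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `TransformerLayerSpec` and `shrink_layer`, and the idea of summarizing the available resource state as a single fraction, are hypothetical and do not appear in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerLayerSpec:
    num_heads: int    # M for the first transformer layer
    num_neurons: int  # N for the first intermediate layer


def shrink_layer(first: TransformerLayerSpec, available_fraction: float) -> TransformerLayerSpec:
    """Return a narrower layer spec for the second neural network model.

    `available_fraction` is a hypothetical scalar in (0, 1) standing in for
    the terminal device's available resource state. The sketch assumes
    M >= 2 and N >= 2 so that the result is strictly smaller than the input.
    """
    if not 0.0 < available_fraction < 1.0:
        raise ValueError("available_fraction must be strictly between 0 and 1")
    # Fewer heads and neurons are kept when fewer resources are available.
    heads = max(1, int(first.num_heads * available_fraction))
    neurons = max(1, int(first.num_neurons * available_fraction))
    return TransformerLayerSpec(heads, neurons)


first_layer = TransformerLayerSpec(num_heads=12, num_neurons=3072)
second_layer = shrink_layer(first_layer, available_fraction=0.5)
```

With the assumed values, the second layer keeps 6 of 12 attention heads and 1536 of 3072 neurons, so both quantities are less than M and N respectively.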
Claim 2
Step 1: The claim recites a process, as in claim 1
Step 2A Prong 1: The claim recites no further judicial exception(s)
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
the first neural network model is obtained by performing knowledge distillation training on an initial neural network model based on one or more first width sizes, wherein each of the one or more first width sizes indicates a quantity of attention heads, wherein the quantity of attention heads is one of the one or more first width sizes: This is mere instruction to apply the judicial exceptions with a generic training method (MPEP 2106.05(f)).
the first neural network model is obtained by performing knowledge distillation training on an initial neural network model based on one or more second width sizes, wherein each of the one or more second width sizes indicates a quantity of neurons, wherein the quantity of neurons of the second intermediate layer is one of the one or more second width sizes: This is mere instruction to apply the judicial exceptions with a generic training method (MPEP 2106.05(f)).
the first neural network model is obtained by performing knowledge distillation training on an initial neural network model based on one or more depth sizes, wherein each of the one or more depth sizes indicates a quantity of transformer layers, wherein a quantity of second transformer layers is one of the one or more depth sizes: This is mere instruction to apply the judicial exceptions with a generic training method (MPEP 2106.05(f)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
the first neural network model is obtained by performing knowledge distillation training on an initial neural network model based on one or more first width sizes, wherein each of the one or more first width sizes indicates a quantity of attention heads, wherein the quantity of attention heads is one of the one or more first width sizes: This is mere instruction to apply the judicial exceptions with a generic training method (MPEP 2106.05(f)).
the first neural network model is obtained by performing knowledge distillation training on an initial neural network model based on one or more second width sizes, wherein each of the one or more second width sizes indicates a quantity of neurons, wherein the quantity of neurons of the second intermediate layer is one of the one or more second width sizes: This is mere instruction to apply the judicial exceptions with a generic training method (MPEP 2106.05(f)).
the first neural network model is obtained by performing knowledge distillation training on an initial neural network model based on one or more depth sizes, wherein each of the one or more depth sizes indicates a quantity of transformer layers, wherein a quantity of second transformer layers is one of the one or more depth sizes: This is mere instruction to apply the judicial exceptions with a generic training method (MPEP 2106.05(f)).
Claim 3
Step 1: The claim recites a process, as in claim 1
Step 2A Prong 1: The claim recites no further judicial exception(s)
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
each attention head of the second transformer layer is one of the M attention heads of the first transformer layer: This merely links the judicial exception to a particular field of use (knowledge distillation) (MPEP 2106.05(h)).
each neuron of the second intermediate layer is one of the N neurons of the first intermediate layer: This merely links the judicial exception to a particular field of use (knowledge distillation) (MPEP 2106.05(h)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
each attention head of the second transformer layer is one of the M attention heads of the first transformer layer: This merely links the judicial exception to a particular field of use (knowledge distillation) (MPEP 2106.05(h)).
each neuron of the second intermediate layer is one of the N neurons of the first intermediate layer: This merely links the judicial exception to a particular field of use (knowledge distillation) (MPEP 2106.05(h)).
Claim 4
Step 1: The claim recites a process, as in claim 1
Step 2A Prong 1: The claim recites the following further judicial exception(s)
wherein a ratio of the quantity of neurons of the second intermediate layer to a quantity of neurons of the first intermediate layer is a first ratio, a ratio of the quantity of attention heads of the second transformer layer to a quantity of attention heads of the first transformer layer is a second ratio, and the first ratio is equal to the second ratio: The comparison of ratios can be performed as a mental process. One can merely compare the number of attention heads and intermediate layer neurons by observing the networks.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s). No further additional elements are recited.
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s). No further additional elements are recited.
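For illustration only, the ratio comparison discussed above for claim 4 amounts to simple arithmetic. The following is a minimal sketch; the function name `width_ratios_match` and the example quantities are hypothetical and are not taken from the claims.

```python
from fractions import Fraction


def width_ratios_match(m_first: int, m_second: int, n_first: int, n_second: int) -> bool:
    """Check whether the neuron ratio (first ratio) equals the head ratio (second ratio).

    Exact fractions are used so that the comparison is not affected by
    floating-point rounding.
    """
    first_ratio = Fraction(n_second, n_first)   # neurons: second layer / first layer
    second_ratio = Fraction(m_second, m_first)  # heads: second layer / first layer
    return first_ratio == second_ratio


# Assumed example: 6 of 12 heads and 1536 of 3072 neurons are both a ratio of 1/2.
ratios_equal = width_ratios_match(m_first=12, m_second=6, n_first=3072, n_second=1536)
```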
Claim 5
Step 1: The claim recites a process, as in claim 1
Step 2A Prong 1: The claim recites no further judicial exception(s)
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
the second transformer layer comprises a first attention head, the M attention heads comprise the first attention head and a second attention head, the second transformer layer does not comprise the second attention head, and a capability of affecting an output result of the first neural network model by the first attention head is greater than a capability of affecting the output result of the first neural network model by the second attention head: This merely links the judicial exception to a particular field of use (neural network pruning) (MPEP 2106.05(h)).
the second intermediate layer comprises a first neuron, the N neurons comprise the first neuron and a second neuron, the second intermediate layer does not comprise the second neuron, and a capability of affecting an output result of the first neural network model by the first neuron is greater than a capability of affecting the output result of the first neural network model by the second neuron: This merely links the judicial exception to a particular field of use (neural network pruning) (MPEP 2106.05(h)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
the second transformer layer comprises a first attention head, the M attention heads comprise the first attention head and a second attention head, the second transformer layer does not comprise the second attention head, and a capability of affecting an output result of the first neural network model by the first attention head is greater than a capability of affecting the output result of the first neural network model by the second attention head: This merely links the judicial exception to a particular field of use (neural network pruning) (MPEP 2106.05(h)).
the second intermediate layer comprises a first neuron, the N neurons comprise the first neuron and a second neuron, the second intermediate layer does not comprise the second neuron, and a capability of affecting an output result of the first neural network model by the first neuron is greater than a capability of affecting the output result of the first neural network model by the second neuron: This merely links the judicial exception to a particular field of use (neural network pruning) (MPEP 2106.05(h)).
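For illustration only, the selection discussed above for claim 5, in which the retained attention head or neuron has a greater capability of affecting the output than the omitted one, can be sketched as a ranking by score. This is a minimal sketch under stated assumptions: the function name `select_most_important` and the numeric importance scores are hypothetical, and the claims do not specify how such a capability is measured.

```python
from typing import List, Sequence


def select_most_important(scores: Sequence[float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-scoring units, in original order.

    `scores[i]` is a hypothetical measure of how strongly unit i (an attention
    head or an intermediate-layer neuron) affects the first model's output.
    """
    if not 0 < keep <= len(scores):
        raise ValueError("keep must be between 1 and the number of units")
    # Rank unit indices from most to least important, then keep the top ones.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


head_scores = [0.9, 0.1, 0.5, 0.7]  # assumed scores for M = 4 attention heads
kept_heads = select_most_important(head_scores, keep=2)
```

With the assumed scores, heads 0 and 3 are retained, and every omitted head has a lower score than every retained head.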
Claim 6
Step 1: The claim recites a process, as in claim 1
Step 2A Prong 1: The claim recites the following further judicial exception(s)
determining first width size information, second width size information, or depth size information of the second neural network model based on the available resource state, wherein the first width size information comprises the quantity of attention heads of the second transformer layer, the second width size information comprises the quantity of neurons of the second intermediate layer, and the depth size information comprises the quantity of transformer layers of the second neural network model: This can be performed as a mental process. One can merely decide on smaller quantities of attention heads, intermediate neurons, and transformer layers in proportion to how tight the resource / performance requirements are.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s). No further additional elements are recited.
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s). No further additional elements are recited.
Claim 7
Step 1: The claim recites a process, as in claim 6
Step 2A Prong 1: The claim recites the following further judicial exception(s)
determining the first width size information, the second width size information, or the depth size information of the second neural network model based on a preset association relationship and the available resource state, wherein the preset association relationship indicates a correspondence between the available resource state and the first width size information of the second neural network model, a correspondence between the available resource state and the second width size information of the second neural network model, or a correspondence between the available resource state and the depth size information of the second neural network model: This can be performed as a mental process. One can merely decide on smaller quantities of attention heads, intermediate neurons, and transformer layers in proportion to how tight the resource / performance requirements are.
the preset association relationship is a preset function; and an input of the preset function is the available resource state, and an output of the preset function is the first width size information of the second neural network model; or an input of the preset function is the available resource state, and an output of the preset function is the second width size information of the second neural network model; or an input of the preset function is the available resource state, and an output of the preset function is the depth size information of the second neural network model: The mental process by which one decides on smaller quantities of attention heads, intermediate neurons, and transformer layers in proportion to how tight the resource / performance requirements are can be considered a preset function held in one's mind. Thus, the determination of the second network information is still a mental process.
the preset association relationship is a preset table; and the preset table comprises a plurality of available resource states, and first width size information that is of the second neural network model and that corresponds to each available resource state; or the preset table comprises a plurality of available resource states, and second width size information that is of the second neural network model and that corresponds to each available resource state; or the preset table comprises a plurality of available resource states, and depth size information that is of the second neural network model and that corresponds to each available resource state: One can mentally pair quantities of attention heads, intermediate neurons, and transformer layers with how tight the resource / performance requirements are and hold these pairings as a table in one's mind. Thus, the determination of the second network information is still a mental process.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s). No further additional elements are recited.
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s). No further additional elements are recited.
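For illustration only, the preset table and preset function discussed above for claim 7 can be sketched as a lookup and a simple formula. This is a minimal sketch under stated assumptions: the names `PRESET_TABLE`, `lookup_sizes`, and `preset_function`, the use of available storage in MB as the resource state, and all thresholds and sizes are hypothetical and are not taken from the claims or the specification.

```python
from bisect import bisect_right
from typing import Tuple

# Hypothetical preset table. Each row is:
# (minimum available storage in MB, attention heads, intermediate neurons, transformer layers)
PRESET_TABLE = [
    (0, 3, 768, 3),
    (256, 6, 1536, 6),
    (1024, 9, 2304, 9),
]


def lookup_sizes(available_mb: int) -> Tuple[int, int, int]:
    """Preset table: return (heads, neurons, layers) for the row matching the resource state."""
    thresholds = [row[0] for row in PRESET_TABLE]
    idx = bisect_right(thresholds, available_mb) - 1
    if idx < 0:
        raise ValueError("available_mb must be non-negative")
    _, heads, neurons, layers = PRESET_TABLE[idx]
    return heads, neurons, layers


def preset_function(available_mb: int, first_model_heads: int = 12) -> int:
    """Preset function: map the resource state directly to a quantity of attention heads.

    Assumes one head per 128 MB, at least one head, and fewer heads than the first model.
    """
    return max(1, min(first_model_heads - 1, available_mb // 128))


table_result = lookup_sizes(512)
function_result = preset_function(512)
```

Under these assumed values, a device reporting 512 MB maps to 6 heads, 1536 neurons, and 6 layers through the table, and to 4 heads through the function.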
Claim 8
Step 1: The claim recites a process, as in claim 1
Step 2A Prong 1: The claim recites no further judicial exception(s)
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s)
wherein the available resource state comprises power consumption of the terminal device, a computing capability of the terminal device, or an available storage size of the terminal device: Obtaining the resource state or performance requirement information of a terminal device is still insignificant extra-solution activity directed to mere data gathering (MPEP 2106.05(g)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
wherein the available resource state comprises power consumption of the terminal device, a computing capability of the terminal device, or an available storage size of the terminal device: Obtaining the resource state or performance requirement is still an instance of retrieving information from memory, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. iv.)
Claim 10
Step 1: The claim recites “A data processing apparatus”, and is therefore directed to the statutory category of article of manufacture
Step 2A Prong 1: The claim recites the following judicial exception(s)
determining a second neural network model based on the available resource state of the terminal device and the first neural network model: This can be performed as a mental process. One can merely identify a subset of nodes in the first neural network model and assign them to a second, choosing fewer nodes when resources and performance are more constrained.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s)
A data processing apparatus, comprising: a memory storing executable instructions; and a processor configured to execute the executable instructions to perform operations: This is mere instruction to apply the judicial exception with generic computing hardware (MPEP 2106.05(f)).
obtaining an available resource state of a terminal device: This is directed to mere data gathering and is thus insignificant extra-solution activity (MPEP 2106.05(g)).
obtaining a first neural network model: This is directed to mere data gathering and is thus insignificant extra-solution activity (MPEP 2106.05(g)).
wherein the first neural network model comprises a first transformer layer, the first transformer layer comprises M attention heads and a first feedforward layer, the first feed-forward layer comprises a first intermediate layer, the first intermediate layer comprises N neurons, and M and N are positive integers: This merely links the judicial exceptions to a particular field of use (transformer networks) (MPEP 2106.05(h)).
such that the second neural network model meets at least one of the following conditions:
the second neural network model comprises a second transformer layer corresponding to the first transformer layer, and a quantity of attention heads of the second transformer layer is less than M: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of attention heads based on a resource state in a generic manner (MPEP 2106.05(f)).
the second neural network model comprises a second intermediate layer corresponding to the first intermediate layer, and a quantity of neurons of the second intermediate layer is less than N: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of neurons based on a resource state in a generic manner (MPEP 2106.05(f)).
sending, by the data processing apparatus, the second neural network model to the terminal device: This amounts to mere data transfer and is insignificant extra-solution activity (MPEP 2106.05(g)).
Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
A data processing apparatus, comprising: a memory storing executable instructions; and a processor configured to execute the executable instructions to perform operations: This is mere instruction to apply the judicial exception with generic computing hardware (MPEP 2106.05(f)).
obtaining an available resource state of a terminal device: This is an instance of retrieving information from memory, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. iv.)
obtaining a first neural network model: This is an instance of retrieving information from memory, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. iv.)
wherein the first neural network model comprises a first transformer layer, the first transformer layer comprises M attention heads and a first feedforward layer, the first feed-forward layer comprises a first intermediate layer, the first intermediate layer comprises N neurons, and M and N are positive integers: This merely links the judicial exceptions to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the second neural network model meets at least one of the following conditions:
the second neural network model comprises a second transformer layer corresponding to the first transformer layer, and a quantity of attention heads of the second transformer layer is less than M: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of attention heads based on a resource state in a generic manner (MPEP 2106.05(f)).
the second neural network model comprises a second intermediate layer corresponding to the first intermediate layer, and a quantity of neurons of the second intermediate layer is less than N: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of neurons based on a resource state in a generic manner (MPEP 2106.05(f)).
sending, by the data processing apparatus, the second neural network model to the terminal device: This is an instance of transmitting data over a network, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. i.)
Claims 11-17
Step 1: Claims 11-17 recite an article of manufacture, as in claim 10.
Step 2A Prong 1: Claims 11-17 recite the same judicial exception(s) as claims 2-8, respectively.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through any additional elements. The analysis of claims 11-17 at this step mirrors that of claims 2-8, respectively, with the exception that claims 11-17 are directed to “A data processing apparatus, comprising: a memory storing executable instructions; a processor configured to execute the executable instructions to perform operations”, said operations mirroring those of claims 2-8. This is a mere instruction to apply the exceptions using generic computer equipment (MPEP 2106.05(f)).
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s). The analysis of claims 11-17 at this step mirrors that of claims 2-8, with the exception that claims 11-17 are directed to “A data processing apparatus, comprising: a memory storing executable instructions; a processor configured to execute the executable instructions to perform operations”, said operations mirroring those of claims 2-8. This is mere instruction to apply the exceptions using generic computer equipment (MPEP 2106.05(f)).
Claim 19
Step 1: The claim recites “A non-transitory computer-readable storage medium”, and is therefore directed to the statutory category of article of manufacture
Step 2A Prong 1: The claim recites the following judicial exception(s)
determining a second neural network model based on the available resource state of the terminal device and the first neural network model: This can be performed as a mental process. One can merely identify a subset of nodes in the first neural network model and assign them to a second, choosing fewer nodes when resources and performance are more constrained.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s)
A non-transitory computer-readable storage medium having stored on computer-executable instructions that when executed by a computer causes the computer to perform operations: This is mere instruction to apply the judicial exception with generic computing hardware (MPEP 2106.05(f)).
obtaining an available resource state of a terminal device: This is directed to mere data gathering and is thus insignificant extra-solution activity (MPEP 2106.05(g)).
obtaining a first neural network model: This is directed to mere data gathering and is thus insignificant extra-solution activity (MPEP 2106.05(g)).
wherein the first neural network model comprises a first transformer layer, the first transformer layer comprises M attention heads and a first feedforward layer, the first feed-forward layer comprises a first intermediate layer, the first intermediate layer comprises N neurons, and M and N are positive integers: This merely links the judicial exceptions to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the second neural network model meets at least one of the following conditions:
the second neural network model comprises a second transformer layer corresponding to the first transformer layer, and a quantity of attention heads of the second transformer layer is less than M: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of attention heads based on a resource state in a generic manner (MPEP 2106.05(f)).
the second neural network model comprises a second intermediate layer corresponding to the first intermediate layer, and a quantity of neurons of the second intermediate layer is less than N: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of neurons based on a resource state in a generic manner (MPEP 2106.05(f)).
sending, by the data processing apparatus, the second neural network model to the terminal device: This amounts to mere data transfer and is insignificant extra-solution activity (MPEP 2106.05(g)).
Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s)
A non-transitory computer-readable storage medium having stored on computer-executable instructions that when executed by a computer causes the computer to perform operations: This is mere instruction to apply the judicial exception with generic computing hardware (MPEP 2106.05(f)).
obtaining an available resource state of a terminal device: This is an instance of retrieving information from memory, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. iv.)
obtaining a first neural network model: This is an instance of retrieving information from memory, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. iv.)
wherein the first neural network model comprises a first transformer layer, the first transformer layer comprises M attention heads and a first feedforward layer, the first feed-forward layer comprises a first intermediate layer, the first intermediate layer comprises N neurons, and M and N are positive integers: This merely links the judicial exceptions to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the second neural network model meets at least one of the following conditions:
the second neural network model comprises a second transformer layer corresponding to the first transformer layer, and a quantity of attention heads of the second transformer layer is less than M: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of attention heads based on a resource state in a generic manner (MPEP 2106.05(f)).
the second neural network model comprises a second intermediate layer corresponding to the first intermediate layer, and a quantity of neurons of the second intermediate layer is less than N: This merely links the judicial exception to a particular field of use (transformer networks) (MPEP 2106.05(h)).
wherein the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device: This is mere instruction to determine a quantity of neurons based on a resource state in a generic manner (MPEP 2106.05(f)).
sending, by the data processing apparatus, the second neural network model to the terminal device: This is an instance of transmitting data over a network, a limitation known to be well-understood, routine, and conventional (MPEP 2106.05(d) II. i.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-8 are rejected under 35 U.S.C. 103 as being unpatentable over McCarley (Pruning a BERT-based Question Answering Model, published October 14th, 2019, arXiv:1910.06360v1) in view of Veniat et al. (Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks, published 5/22/2018, arXiv:1706.00046v4), hereafter referred to as Veniat.
Regarding claim 1, McCarley discloses [a] data processing method, comprising:
obtaining, by the data processing apparatus, a first neural network model: “We investigate compressing a BERT-based question answering system by pruning parameters from the underlying BERT model (first neural network model). We start from models (first neural network model[s]) trained for SQuAD 2.0 and introduce gates that allow selected parts of transformers to be individually eliminated.” (McCarley, page 1, left column, Abstract)
… wherein the first neural network model comprises a first transformer layer, the first transformer layer comprises M attention heads:
“In each self-attention sublayer (transformer layer), we place a mask, $\Gamma^{\mathrm{attn}}$ of size $n_H$ which selects attention heads to remain active. (section 3.2.2)” (McCarley, page 2, right column, paragraph 1)
… and a first feed-forward layer, the first feed-forward layer comprises a first intermediate layer, the first intermediate layer comprises N neurons, and M and N are positive integers:
“Specifically, we investigate … (2) reducing the intermediate width of the feed-forward sublayer (feed-forward layer / intermediate layer) of each transformer” (McCarley, page 1, left column, Abstract)
“In each feed-forward sublayer (feed-forward layer / intermediate layer), we place a mask, $\Gamma^{\mathrm{ff}}$ of size $d_I$ which selects ReLU/GeLU activations to remain active. (section 3.3),” (McCarley, page 2, right column, paragraph 2).
[Figure: media_image1.png] “Notation: important dimensions of a BERT model” (McCarley, page 2, left column, Figure 1). The base BERT model has 12 > 1 attention heads and 3072 > 1 intermediate neurons.
determining, by the data processing apparatus, a second neural network model based on the available resource state of the terminal device and the first neural network model: “We start from models (first neural network model[s]) trained for SQuAD 2.0 and introduce gates that allow selected parts of transformers to be individually eliminated.” (McCarley, page 1, left column, Abstract). Each pruned model with components eliminated is a second neural network model.
… such that the second neural network model meets at least one of the following conditions:
the second neural network model comprises a second transformer layer corresponding to the first transformer layer, and a quantity of attention heads of the second transformer layer is less than M, wherein the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device:
“transformer is similar. We insert three masks into each transformer. Each mask is a vector of gate variables $\gamma_i \in [0,1]$, where $\gamma_i = 0$ indicates a slice of transformer parameters to be pruned, and $\gamma_i = 1$ indicates a slice to remain active.” (McCarley, page 2, left column, paragraph 4).
“After the values of the $\gamma_i$ have been determined by one of the above methods, the model is pruned. Attention heads corresponding to $\gamma_i^{\mathrm{attn}} = 0$ are removed.” (McCarley, page 3, left column, paragraph 1). Pruned attention heads are removed, resulting in layers forming subsets of the first network’s transformer layers.
[Table: media_image2.png] “Decoding times, accuracies, and space savings achieved by two sample operating points on large-qa” (McCarley, page 4, Table 1). Some attention heads are pruned and removed.
the second neural network model comprises a second intermediate layer corresponding to the first intermediate layer, and a quantity of neurons of the second intermediate layer is less than N, wherein the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device:
“Slices of the feed forward linear transformations corresponding to $\gamma_i^{\mathrm{ff}} = 0$ are removed.” (McCarley, page 3, left column, paragraph 1). Pruned intermediate neurons are removed, resulting in layers forming subsets of the first network’s feedforward layers.
Examiner’s note: As seen in Table 1 above, some feedforward neurons are pruned and removed.
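For illustration only, the gate-based pruning quoted from McCarley above can be sketched as follows. This is a minimal sketch, not McCarley's implementation: the function names, the weight layouts, and the use of NumPy are assumptions made for the example. A gate value of 0 removes the corresponding attention head or intermediate neuron, so every retained element is an element of the original layer.

```python
import numpy as np

def prune_heads(head_weights, head_gates):
    """Keep only the attention heads whose gate equals 1.

    head_weights: array of shape (n_heads, d_head, d_model) (hypothetical layout)
    head_gates:   sequence of 0/1 values, one per head
    """
    keep = np.flatnonzero(np.asarray(head_gates) == 1)
    return head_weights[keep]

def prune_ffn(w_in, w_out, ff_gates):
    """Keep only the intermediate neurons whose gate equals 1.

    w_in:  array of shape (d_model, d_intermediate)
    w_out: array of shape (d_intermediate, d_model)
    """
    keep = np.flatnonzero(np.asarray(ff_gates) == 1)
    # Remove the matching column of the input projection and row of the output projection.
    return w_in[:, keep], w_out[keep, :]
```

In this sketch the pruned layer has fewer heads than M, or fewer neurons than N, whenever at least one gate is 0.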
McCarley relates to transformer attention head & feedforward neuron pruning and is analogous to the claimed invention.
While McCarley fails to disclose the further limitations of the claim, Veniat discloses [a] data processing method, comprising:
obtaining, by a data processing apparatus, an available resource state of a terminal device:
“Let us also define C the maximum cost (available resource state) the user would allow. For instance, when solving the problem of learning a model with a computation time lower than 200 ms then C is equal to 200ms … the evaluated cost is specific to the particular infrastructure on which the model is ran. For instance, if C is the cost in milliseconds, the value of $C(H \odot E)$ will not be the same depending on the device (terminal device) on which the model is used. Note that the only required property of $C(H \odot E)$ is that this cost can be measured during training” (Veniat, page 3, right column, paragraph 1)
“Each model is trained with various values for the objective cost C” (Veniat, page 5, left column, paragraph 1)
determining, by the data processing apparatus, a second neural network model based on the available resource state of the terminal device and the first neural network model:
“Our model called Budgeted Super Network (BSN) is based on the following principles: (i) the user provides a (big) Super Network (first neural network model) (see Section 2) defining a large set of possible final network architectures (second neural network model[s]) as well as a maximum authorized cost.” (Veniat, page 1, right column, paragraph 3)
“We formulate this issue as a problem of automatically learning a neural network architecture (second neural network model) under budget constraints. To tackle this problem, we propose a budgeted learning approach that integrates a maximum cost (available resource state) directly in the learning objective function.” (Veniat, page 1, right column, paragraph 2)
sending, by the data processing apparatus, the second neural network model to the terminal device: “Figure 2 and Table 1 show the performance of different models over CIFAR-10. Each point corresponds to a model evaluated both in term of accuracy and computation cost. When considering the B-ResNet model, and by fixing the value of C to the computation cost of the different ResNet architectures, we obtain budgeted models (second neural network model[s]) that have approximatively the same costs than the ResNets, but with a higher accuracy” (Veniat, page 6, left column, paragraph 1). The pruned models are inherently run on some device to execute this experiment and measure results.
Veniat relates to pruning neural networks based on end-hardware efficiency and is analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified McCarley to penalize the network pruning process with a loss function incorporating a hardware cost term, as disclosed by Veniat. Doing so would enable the system to automatically identify optimal pruned networks given arbitrary hardware cost constraints, a feature critical to executing neural networks on a variety of different end devices with different requirements. See Veniat, page 1, right column, paragraph 1 & page 8, right column, paragraph 4.
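For illustration only, the proposed modification (a pruning loss that incorporates a hardware cost term) can be sketched as follows. The hinge-style form of the penalty, the function name, and the parameter names are assumptions made for this example; they are not quoted from Veniat or McCarley.

```python
def budgeted_loss(task_loss, measured_cost, max_cost, penalty_weight=1.0):
    """Task loss plus a penalty that is nonzero only when the cost exceeds the budget.

    measured_cost: cost of the candidate pruned network on the target device
                   (for example, inference time in milliseconds)
    max_cost:      the maximum cost the user allows (the budget C)
    """
    overshoot = max(0.0, measured_cost - max_cost)
    return task_loss + penalty_weight * overshoot
```

Under this sketch, a candidate network within budget is scored by its task loss alone, while one that exceeds the budget is penalized in proportion to the excess.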
Regarding claim 2, the rejection of claim 1 in view of McCarley and Veniat is incorporated. McCarley further discloses a method, wherein
the first neural network model is obtained by performing knowledge distillation training on an initial neural network model based on one or more first width sizes, wherein each of the one or more first width sizes indicates a quantity of attention heads wherein the quantity of attention heads is one of the one or more first width sizes: “In each self-attention sublayer, we place a mask, $\Gamma^{\mathrm{attn}}$ of size $n_H$ (first width size) which selects attention heads to remain active. (section 3.2.2)” (McCarley, page 2, right column, paragraph 1)
the first neural network model is obtained by performing knowledge distillation training on an initial neural network model based on one or more second width sizes, wherein each of the one or more second width sizes indicates a quantity of neurons: “In each feed-forward sublayer, we place a mask, $\Gamma^{\mathrm{ff}}$ of size $d_I$ (second width size) which selects ReLU/GeLU activations to remain active. (section 3.3),” (McCarley, page 2, right column, paragraph 2).
Regarding claim 3, the rejection of claim 1 in view of McCarley and Veniat is incorporated. McCarley further discloses a method, wherein
each attention head of the second transformer layer is one of the M attention heads of the first transformer layer; or each neuron of the second intermediate layer is one of the N neurons of the first intermediate layer: “After the values of the $\gamma_i$ have been determined by one of the above methods, the model is pruned. Attention heads corresponding to $\gamma_i^{\mathrm{attn}} = 0$ are removed. Slices of the feed forward linear transformations corresponding to $\gamma_i^{\mathrm{ff}} = 0$ are removed.” (McCarley, page 3, left column, paragraph 1). All remaining attention heads and feedforward neurons are present in the original unpruned network.
Regarding claim 4, the rejection of claim 1 in view of McCarley and Veniat is incorporated. McCarley further discloses a method, wherein a ratio of the quantity of neurons of the second intermediate layer to a quantity of neurons of the first intermediate layer is a first ratio, a ratio of the quantity of attention heads of the second transformer layer to a quantity of attention heads of the first transformer layer is a second ratio, and the first ratio is equal to the second ratio: “We investigate four approches [sic] to determining the gate values. (1) ‘random:’ each $\gamma_i$ is sampled from a Bernoulli distribution of parameter p (first / second ratio), where p is manually adjusted to control the sparsity” (McCarley, page 2, right column, paragraph 5). Sampling each mask, for attention heads and feedforward neurons, with the same pruning probability, will prune the same proportion of both attention heads and feedforward neurons.
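For illustration only, sampling both masks from a Bernoulli distribution with a shared parameter p can be sketched as follows. The mask sizes, the seed, and the reading of p as the probability that a gate stays active are assumptions made for this example and are not taken from McCarley.

```python
import numpy as np

def sample_gates(n, p, rng):
    """Sample n independent Bernoulli(p) gates; 1 = keep, 0 = prune."""
    return (rng.random(n) < p).astype(int)

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
p = 0.5                          # shared parameter for both masks (hypothetical value)

head_gates = sample_gates(10_000, p, rng)   # hypothetical number of head gates
ffn_gates = sample_gates(10_000, p, rng)    # hypothetical number of neuron gates

head_ratio = head_gates.mean()   # retained fraction of attention heads
ffn_ratio = ffn_gates.mean()     # retained fraction of intermediate neurons
```

With a shared p, both retained fractions have the same expected value p; individual samples fluctuate around that value.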
Regarding claim 5, the rejection of claim 1 in view of McCarley and Veniat is incorporated. McCarley further discloses a method, wherein
the second transformer layer comprises a first attention head, the M attention heads comprise the first attention head and a second attention head, the second transformer layer does not comprise the second attention head, and a capability of affecting an output result of the first neural network model by the first attention head is greater than a capability of affecting the output result of the first neural network model by the second attention head; or the second intermediate layer comprises a first neuron, the N neurons comprise the first neuron and a second neuron, the second intermediate layer does not comprise the second neuron, and a capability of affecting an output result of the first neural network model by the first neuron is greater than a capability of affecting the output result of the first neural network model by the second neuron: “We investigate four approches [sic] to determining the gate values … (2) ‘gain:’ We follow the method of (Michel et al., 2019) and estimate the influence of each gate i on the training set likelihood L by computing the mean value of [Equation: media_image3.png] (‘head importance score’) (capability of affecting an output result) during one pass over the training data. We threshold $g_i$ to determine which transformer slices to retain.” (McCarley, page 2, right column, paragraph 6). Only the less influential attention heads and feedforward neurons are pruned via this method of computing masks. Thus, the remaining attention heads and feedforward neurons have greater capability of affecting the model output.
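For illustration only, the thresholding step of the ‘gain’ approach can be sketched as follows. The function name, the example scores, and the threshold value are hypothetical; how the importance scores themselves are computed is not reproduced here.

```python
def gates_from_importance(scores, threshold):
    """Return gate 1 (retain) for slices whose importance score meets the
    threshold and gate 0 (prune) for the rest."""
    return [1 if s >= threshold else 0 for s in scores]

# Hypothetical importance scores for four attention heads.
example_gates = gates_from_importance([0.9, 0.1, 0.5, 0.05], threshold=0.3)
```

In this sketch every retained slice has a score at least as high as every pruned slice, which is the ordering relied on above.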
Regarding claim 6, the rejection of claim 1 in view of McCarley and Veniat is incorporated. McCarley, in combination with Veniat, discloses a method, further comprising: determining first width size information, second width size information, or depth size information of the second neural network model based on the available resource state, wherein the first width size information comprises the quantity of attention heads of the second transformer layer, the second width size information comprises the quantity of neurons of the second intermediate layer, and the depth size information comprises the quantity of transformer layers of the second neural network model:
(McCarley) “In each self-attention sublayer, we place a mask, $\Gamma^{\mathrm{attn}}$ of size $n_H$ (first width size) which selects attention heads to remain active. (section 3.2.2)” (McCarley, page 2, right column, paragraph 1)
(McCarley) “In each feed-forward sublayer, we place a mask, $\Gamma^{\mathrm{ff}}$ of size $d_I$ (second width size) which selects ReLU/GeLU activations to remain active. (section 3.3),” (McCarley, page 2, right column, paragraph 2).
Examiner’s note: By combining Veniat’s pruning loss term, which incorporates resource state information, into McCarley’s method (as in the combination of claim 1), the available resource state information is used to determine these pruning parameters.
Veniat relates to pruning neural networks based on end-hardware efficiency and is analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified McCarley to penalize the network pruning process with a loss function incorporating a hardware cost term, as disclosed by Veniat. Doing so would enable the system to automatically identify optimal pruned networks given arbitrary hardware cost constraints, a feature critical to executing neural networks on a variety of different end devices with different requirements. See Veniat, page 1, right column, paragraph 1 & page 8, right column, paragraph 4.
Regarding claim 7, the rejection of claim 6 in view of McCarley and Veniat is incorporated. McCarley, in combination with Veniat, further discloses a method, wherein the step of determining the first width size information, second width size information, or depth size information of the second neural network model comprises:
determining the first width size information, the second width size information, or the depth size information of the second neural network model based on a preset association relationship and the available resource state, wherein the preset association relationship indicates a correspondence between the available resource state and the first width size information of the second neural network model, a correspondence between the available resource state and the second width size information of the second neural network model, or a correspondence between the available resource state and the depth size information of the second neural network model; and the preset association relationship is a preset function; and an input of the preset function is the available resource state and an output of the preset function is the first width size information of the second neural network model; or an input of the preset function is the available resource state, and an output of the preset function is the second width size information of the second neural network model; or an input of the preset function is the available resource state, and an output of the preset function is the depth size information of the second neural network model:
(Veniat) “Let us also define C the maximum cost (available resource state) the user would allow. For instance, when solving the problem of learning a model with a computation time lower than 200 ms then C is equal to 200ms. We aim at solving the following soft constrained budgeted learning problem (preset function): [Equation: media_image4.png]” (Veniat, page 3, right column, paragraph 1)
(Veniat) “Indeed, each sub-graph of E (subset of edges) corresponds itself to a S-network and will be denoted $H \odot E$, where H (output of the preset function) corresponds to a binary matrix used as a mask to select the edges in E and $\odot$ is the Hadamard product. Our objective will thus be to identify the best matrix H such that the corresponding S-network ($H \odot E$, $\theta$) will be a network efficient in terms of both predictive quality and computation/ memory/... cost” (Veniat, page 3, left column, paragraph 4). The output of the solved budgeted learning problem is a matrix defining the pruned network. In combination with McCarley, this output would define first width size information or second width size information.
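For illustration only, selecting a sub-network with a binary mask and the Hadamard (element-wise) product can be sketched as follows. Representing the edge set E as a 0/1 adjacency matrix is a simplifying assumption for this example, and the matrix values are hypothetical.

```python
import numpy as np

# Hypothetical super-network with three nodes; E[i, j] = 1 means an edge i -> j exists.
E = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])

# Binary mask H of the same shape; H[i, j] = 1 keeps the edge, 0 removes it.
H = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])

sub_network = H * E                    # Hadamard product: element-wise multiplication
kept_edges = int(sub_network.sum())    # number of edges remaining after masking
```

In this sketch the mask removes the direct edge from node 0 to node 2, leaving two of the three original edges.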
Veniat relates to pruning neural networks based on end-hardware efficiency and is analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified McCarley to penalize the network pruning process with a loss function incorporating a hardware cost term, as disclosed by Veniat. Doing so would enable the system to automatically identify optimal pruned networks given arbitrary hardware cost constraints, a feature critical to executing neural networks on a variety of different end devices with different requirements. See Veniat, page 1, right column, paragraph 1 & page 8, right column, paragraph 4.
Regarding claim 8, the rejection of claim 1 in view of McCarley and Veniat is incorporated. Veniat further discloses a method, wherein the available resource state comprises power consumption of the terminal device, a computing capability of the terminal device, or an available storage size of the terminal device: “we investigate the ability of our method to deal with three different costs: (i) the computation cost (computing capability) reflecting the inference speed of the resulting model, (ii) the memory consumption cost (available storage size) that measures the final size of the model, and the (iii) distributed computation cost that measures the inference speed when computations are distributed over multiple machines or processors” (Veniat, page 1, right column, paragraph 2).
Veniat relates to pruning neural networks based on end-hardware efficiency and is analogous to the claimed invention. The existing combination teaches a method of pruning a neural network based on a terminal device cost metric. The claimed invention improves upon this method by measuring computing capability or storage size as a pruning cost metric. Veniat teaches a method of pruning a neural network based on terminal device computing capability or storage size as a cost metric, applicable to the existing combination. A person of ordinary skill in the art would have recognized that optimizing pruning for processing power and / or memory size would lead to the predictable result of optimizing a network for a device with arbitrary processor and memory capabilities, and would improve the known device by optimizing the pruning process on two of the most universal and important resource limitations neural networks face when being deployed on end devices (MPEP 2143 I. (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results).
Claims 10-17 & 19 are rejected under 35 U.S.C. 103 as being unpatentable over McCarley (Pruning a BERT-based Question Answering Model, published October 14th, 2019, arXiv:1910.06360v1) in view of Veniat et al. (Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks, published 5/22/2018, arXiv:1706.00046v4), hereafter referred to as Veniat, and further in view of Liu et al. (MULTI-TASK KNOWLEDGE DISTILLATION FOR LANGUAGE MODEL, filed 12/16/2019, US 2021/0142164 A1), hereafter referred to as Liu.
Regarding claim 10, McCarley discloses operations of:
obtaining a first neural network model: “We investigate compressing a BERT-based question answering system by pruning parameters from the underlying BERT model (first neural network model). We start from models (first neural network model[s]) trained for SQuAD 2.0 and introduce gates that allow selected parts of transformers to be individually eliminated.” (McCarley, page 1, left column, Abstract)
… wherein the first neural network model comprises a first transformer layer, the first transformer layer comprises M attention heads:
“In each self-attention sublayer (transformer layer), we place a mask, $\Gamma^{\mathrm{attn}}$ of size $n_H$ which selects attention heads to remain active. (section 3.2.2)” (McCarley, page 2, right column, paragraph 1)
… and a first feed-forward layer, the first feed-forward layer comprises a first intermediate layer, the first intermediate layer comprises N neurons, and M and N are positive integers:
“Specifically, we investigate … (2) reducing the intermediate width of the feed-forward sublayer (feed-forward layer / intermediate layer) of each transformer” (McCarley, page 1, left column, Abstract)
“In each feed-forward sublayer (feed-forward layer / intermediate layer), we place a mask, $\Gamma^{\mathrm{ff}}$ of size $d_I$ which selects ReLU/GeLU activations to remain active. (section 3.3),” (McCarley, page 2, right column, paragraph 2).
[Figure: media_image1.png] “Notation: important dimensions of a BERT model” (McCarley, page 2, left column, Figure 1). The base BERT model has 12 > 1 attention heads and 3072 > 1 intermediate neurons.
determining, by the data processing apparatus, a second neural network model based on the available resource state of the terminal device and the first neural network model: “We start from models (first neural network model[s]) trained for SQuAD 2.0 and introduce gates that allow selected parts of transformers to be individually eliminated.” (McCarley, page 1, left column, Abstract). Each pruned model with components eliminated is a second neural network model.
… such that the second neural network model meets at least one of the following conditions:
the second neural network model comprises a second transformer layer corresponding to the first transformer layer, and a quantity of attention heads of the second transformer layer is less than M, wherein the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device:
“transformer is similar. We insert three masks into each transformer. Each mask is a vector of gate variables $\gamma_i \in [0,1]$, where $\gamma_i = 0$ indicates a slice of transformer parameters to be pruned, and $\gamma_i = 1$ indicates a slice to remain active.” (McCarley, page 2, left column, paragraph 4).
“After the values of the $\gamma_i$ have been determined by one of the above methods, the model is pruned. Attention heads corresponding to $\gamma_i^{\mathrm{attn}} = 0$ are removed.” (McCarley, page 3, left column, paragraph 1). Pruned attention heads are removed, resulting in layers forming subsets of the first network’s transformer layers.
[Table: media_image2.png] “Decoding times, accuracies, and space savings achieved by two sample operating points on large-qa” (McCarley, page 4, Table 1). Some attention heads are pruned and removed.
the second neural network model comprises a second intermediate layer corresponding to the first intermediate layer, and a quantity of neurons of the second intermediate layer is less than N, wherein the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device:
“Slices of the feed forward linear transformations corresponding to $\gamma_i^{\mathrm{ff}} = 0$ are removed.” (McCarley, page 3, left column, paragraph 1). Pruned intermediate neurons are removed, resulting in layers forming subsets of the first network’s feedforward layers.
Examiner’s note: As seen in Table 1 above, some feedforward neurons are pruned and removed.
McCarley relates to transformer attention head & feedforward neuron pruning and is analogous to the claimed invention.
While McCarley fails to disclose the further limitations of the claim, Veniat discloses operations of:
obtaining an available resource state of a terminal device:
“Let us also define C the maximum cost (available resource state) the user would allow. For instance, when solving the problem of learning a model with a computation time lower than 200 ms then C is equal to 200ms … the evaluated cost is specific to the particular infrastructure on which the model is ran. For instance, if C is the cost in milliseconds, the value of $C(H \odot E)$ will not be the same depending on the device (terminal device) on which the model is used. Note that the only required property of $C(H \odot E)$ is that this cost can be measured during training” (Veniat, page 3, right column, paragraph 1)
“Each model is trained with various values for the objective cost C” (Veniat, page 5, left column, paragraph 1)
determining a second neural network model based on the available resource state of the terminal device and the first neural network model:
“Our model called Budgeted Super Network (BSN) is based on the following principles: (i) the user provides a (big) Super Network (first neural network model) (see Section 2) defining a large set of possible final network architectures (second neural network model[s]) as well as a maximum authorized cost.” (Veniat, page 1, right column, paragraph 3)
“We formulate this issue as a problem of automatically learning a neural network architecture (second neural network model) under budget constraints. To tackle this problem, we propose a budgeted learning approach that integrates a maximum cost (available resource state) directly in the learning objective function.” (Veniat, page 1, right column, paragraph 2)
sending the second neural network model to the terminal device: “Figure 2 and Table 1 show the performance of different models over CIFAR-10. Each point corresponds to a model evaluated both in term of accuracy and computation cost. When considering the B-ResNet model, and by fixing the value of C to the computation cost of the different ResNet architectures, we obtain budgeted models (second neural network model[s]) that have approximatively the same costs than the ResNets, but with a higher accuracy” (Veniat, page 6, left column, paragraph 1). The pruned models are inherently run on some device to execute this experiment and measure results.
Veniat relates to pruning neural networks based on end-hardware efficiency and is analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified McCarley to penalize the network pruning process with a loss function incorporating a hardware cost term, as disclosed by Veniat. Doing so would enable the system to automatically identify optimal pruned networks given arbitrary hardware cost constraints, a feature critical to executing neural networks on a variety of different end devices with different requirements. See Veniat, page 1, right column, paragraph 1 & page 8, right column, paragraph 4.
While Veniat fails to disclose the further limitations of the claim, Liu teaches [a] data processing apparatus, comprising a memory storing executable instructions; a processor configured to execute the executable instructions to perform operations: “memory 120 may include non-transitory, tangible, machine readable media (memory) that includes executable code (instructions) that when run by one or more processors (e.g., processor 110) may cause the one or more processors to perform the methods described in further detail herein” (Liu, [0027]).
Liu relates to finding subnets for transformer networks and is analogous to the claimed invention. The combination of the previously cited prior art teaches a method for finding subnets for transformer networks. The claimed invention improves upon this method by storing it in the form of instructions on computer hardware. Liu teaches computer hardware that can store and execute instructions for deriving transformer subnets, applicable to the combination of previously cited prior art. A person of ordinary skill in the art would have recognized that storing the combination’s method as computer instructions on Liu’s hardware would lead to the predictable result of the method being executable by a computing system, and would improve the known device by allowing it to be performed with real data (MPEP 2143 I. (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results).
The analysis of claims 11-17 mirrors that of claims 2-8, with the exception that claims 11-17 are directed to generic computer hardware which executes the methods of claims 2-8. This generic hardware is taught by Liu, as discussed regarding claim 10. Thus, claims 11-17 are rejected under the same rationales used for claims 2-8, respectively.
Regarding claim 19, McCarley discloses operations of:
obtaining a first neural network model: “We investigate compressing a BERT-based question answering system by pruning parameters from the underlying BERT model (first neural network model). We start from models (first neural network model[s]) trained for SQuAD 2.0 and introduce gates that allow selected parts of transformers to be individually eliminated.” (McCarley, page 1, left column, Abstract)
… wherein the first neural network model comprises a first transformer layer, the first transformer layer comprises M attention heads:
“In each self-attention sublayer (transformer layer), we place a mask, $\Gamma^{attn}$ of size $n_H$ which selects attention heads to remain active. (section 3.2.2)” (McCarley, page 2, right column, paragraph 1)
… and a first feed-forward layer, the first feed-forward layer comprises a first intermediate layer, the first intermediate layer comprises N neurons, and M and N are positive integers:
“Specifically, we investigate … (2) reducing the intermediate width of the feed-forward sublayer (feed-forward layer / intermediate layer) of each transformer” (McCarley, page 1, left column, Abstract)
“In each feed-forward sublayer (feed-forward layer / intermediate layer), we place a mask, $\Gamma^{ff}$ of size $d_I$ which selects ReLU/GeLU activations to remain active. (section 3.3),” (McCarley, page 2, right column, paragraph 2).
[Image: media_image1.png]
“Notation: important dimensions of a BERT model” (McCarley, page 2, left column, Figure 1). The base BERT model has 12 > 1 attention heads and 3072 > 1 intermediate neurons.
determining, by the data processing apparatus, a second neural network model based on the available resource state of the terminal device and the first neural network model: “We start from models (first neural network model[s]) trained for SQuAD 2.0 and introduce gates that allow selected parts of transformers to be individually eliminated.” (McCarley, page 1, left column, Abstract). Each pruned model with components eliminated is a second neural network model.
… such that the second neural network model meets at least one of the following conditions:
the second neural network model comprises a second transformer layer corresponding to the first transformer layer, and a quantity of attention heads of the second transformer layer is less than M, wherein the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device:
“transformer is similar. We insert three masks into each transformer. Each mask is a vector of gate variables $\gamma_i \in [0,1]$, where $\gamma_i = 0$ indicates a slice of transformer parameters to be pruned, and $\gamma_i = 1$ indicates a slice to remain active.” (McCarley, page 2, left column, paragraph 4).
“After the values of the $\gamma_i$ have been determined by one of the above methods, the model is pruned. Attention heads corresponding to $\gamma_i^{attn} = 0$ are removed.” (McCarley, page 3, left column, paragraph 1). Pruned attention heads are removed, resulting in layers forming subsets of the first network’s transformer layers.
[Image: media_image2.png]
“Decoding times, accuracies, and space savings achieved by two sample operating points on large-qa” (McCarley, page 4, Table 1). Some attention heads are pruned and removed.
the second neural network model comprises a second intermediate layer corresponding to the first intermediate layer, and a quantity of neurons of the second intermediate layer is less than N, wherein the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device:
“Slices of the feed forward linear transformations corresponding to $\gamma_i^{ff} = 0$ are removed.” (McCarley, page 3, left column, paragraph 1). Pruned intermediate neurons are removed, resulting in layers forming subsets of the first network’s feedforward layers.
Examiner’s note: As seen in Table 1 above, some feedforward neurons are pruned and removed.
McCarley relates to transformer attention head & feedforward neuron pruning and is analogous to the claimed invention.
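For illustration only, the gate-based pruning described in the McCarley passages quoted above can be sketched as follows. This is a minimal sketch under stated assumptions, not McCarley's implementation: the function name `prune_by_gates`, the layer sizes, and the gate values are all hypothetical, and each "slice" is represented by a simple label rather than by actual weight tensors.

```python
from typing import List, Sequence


def prune_by_gates(slices: Sequence[str], gates: Sequence[int]) -> List[str]:
    """Keep the slices whose gate equals 1 and drop those whose gate equals 0."""
    if len(slices) != len(gates):
        raise ValueError("exactly one gate per slice is required")
    return [s for s, g in zip(slices, gates) if g == 1]


# Hypothetical first transformer layer: M = 4 attention heads, N = 6 intermediate neurons.
heads = ["head0", "head1", "head2", "head3"]
neurons = ["n0", "n1", "n2", "n3", "n4", "n5"]

# Illustrative gate values (assumed, not taken from any reference).
attn_gates = [1, 0, 1, 0]
ff_gates = [1, 1, 0, 0, 1, 0]

kept_heads = prune_by_gates(heads, attn_gates)
kept_neurons = prune_by_gates(neurons, ff_gates)

# The pruned layer has fewer than M heads and fewer than N neurons.
print(len(kept_heads), len(kept_neurons))
```

In this sketch the pruned layer retains a strict subset of the original heads and neurons, which is the sense in which the pruned model is mapped to the claimed second neural network model.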
While McCarley fails to disclose the further limitations of the claim, Veniat discloses operations of:
obtaining an available resource state of a terminal device:
“Let us also define C the maximum cost (available resource state) the user would allow. For instance, when solving the problem of learning a model with a computation time lower than 200 ms then C is equal to 200ms … the evaluated cost is specific to the particular infrastructure on which the model is ran. For instance, if C is the cost in milliseconds, the value of $C(H \odot E)$ will not be the same depending on the device (terminal device) on which the model is used. Note that the only required property of $C(H \odot E)$ is that this cost can be measured during training” (Veniat, page 3, right column, paragraph 1)
“Each model is trained with various values for the objective cost C” (Veniat, page 5, left column, paragraph 1)
determining a second neural network model based on the available resource state of the terminal device and the first neural network model:
“Our model called Budgeted Super Network (BSN) is based on the following principles: (i) the user provides a (big) Super Network (first neural network model) (see Section 2) defining a large set of possible final network architectures (second neural network model[s]) as well as a maximum authorized cost.” (Veniat, page 1, right column, paragraph 3)
“We formulate this issue as a problem of automatically learning a neural network architecture (second neural network model) under budget constraints. To tackle this problem, we propose a budgeted learning approach that integrates a maximum cost (available resource state) directly in the learning objective function.” (Veniat, page 1, right column, paragraph 2)
sending the second neural network model to the terminal device: “Figure 2 and Table 1 show the performance of different models over CIFAR-10. Each point corresponds to a model evaluated both in term of accuracy and computation cost. When considering the B-ResNet model, and by fixing the value of C to the computation cost of the different ResNet architectures, we obtain budgeted models (second neural network model[s]) that have approximatively the same costs than the ResNets, but with a higher accuracy” (Veniat, page 6, left column, paragraph 1). The pruned models are inherently run on some device to execute this experiment and measure results.
Veniat relates to pruning neural networks based on end-hardware efficiency and is analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified McCarley to penalize the network pruning process with a loss function incorporating a hardware cost term, as disclosed by Veniat. Doing so would enable the system to automatically identify optimal pruned networks given arbitrary hardware cost constraints, a feature critical to executing neural networks on a variety of different end devices with different requirements. See Veniat, page 1, right column, paragraph 1 & page 8, right column, paragraph 4.
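For illustration only, one simple way to fold a maximum cost into a learning objective is a hinge-style penalty that is zero while the measured cost stays within budget and grows linearly once the budget is exceeded. The sketch below assumes this form for clarity; it is not presented as a reproduction of Veniat's objective, and the names `budgeted_loss`, `measured_cost`, `max_cost`, and `penalty_weight` are hypothetical.

```python
def budgeted_loss(task_loss: float, measured_cost: float,
                  max_cost: float, penalty_weight: float = 1.0) -> float:
    """Return the task loss plus a penalty for any cost above the budget.

    measured_cost and max_cost share a unit (for example milliseconds).
    """
    over_budget = max(0.0, measured_cost - max_cost)
    return task_loss + penalty_weight * over_budget


# A candidate architecture within a 200 ms budget incurs no penalty.
within = budgeted_loss(task_loss=0.5, measured_cost=150.0, max_cost=200.0)

# A candidate 50 ms over budget is penalized in proportion to the excess.
over = budgeted_loss(task_loss=0.5, measured_cost=250.0, max_cost=200.0,
                     penalty_weight=0.01)

print(within, over)
```

Under such an objective, an architecture search is steered toward candidates whose cost on the target device does not exceed the user-supplied maximum.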
While Veniat fails to disclose the further limitations of the claim, Liu teaches [a] non-transitory computer-readable storage medium having stored on computer-executable instructions that when executed by a computer causes the computer to perform operations of: “memory 120 may include non-transitory, tangible, machine readable media (computer-readable storage medium) that includes executable code (computer-executable instructions) that when run by one or more processors (e.g., processor 110) may cause the one or more processors to perform the methods described in further detail herein” (Liu, [0027]).
Liu relates to finding subnets for transformer networks and is analogous to the claimed invention. The existing combination teaches a method for finding subnets for transformer networks. The claimed invention improves upon this method by storing it in the form of instructions on computer hardware. Liu teaches computer hardware that can store and execute instructions for deriving transformer subnets, applicable to the existing combination. A person of ordinary skill in the art would have recognized that storing the combination’s method as computer instructions on Liu’s hardware would lead to the predictable result of the method being executable by a computing system, and would improve the known device by allowing it to be performed with real data (MPEP 2143 I. (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results).
Response to Arguments
The following responses address arguments and remarks made in the instant remarks dated 09/28/2025 and 11/07/2025.
Objections
In light of the instant amendments, previous objections to paragraph [00283] have been withdrawn. The Examiner notes that the “paragraph [0283]” amended in the instant amendments seems intended to be equivalent to paragraph [00283] of the instant specification.
The Examiner notes that the previous objection to the title has not been addressed by the instant amendments.
112 Rejections
On pages 11-12 of the instant remarks, the Applicant argues that limitations argued to be new matter are represented in the specification and originally filed claims:
“However, the specification (e.g., [00318]-[00325]) and Claim 7 explicitly disclose a "preset
association relationship" (a function or table) that maps an available resource state (input) to a
specific model parameter (output). By amending Claim 1 to clarify that the data processing
apparatus performs this determination, the claim now more clearly aligns with the disclosed
technical architecture. A person of ordinary skill in the art understands that providing a functional
mapping between a hardware variable (e.g., RAM/Battery) and a structural parameter (e.g., neurons/heads) constitutes a full and clear description. The "quantity" is not arbitrary; it is a direct
mechanical output of the disclosed mapping logic”
Regarding the Applicant’s arguments above, the Examiner partly agrees.
Paragraphs [00318]-[00325] of the instant specification do not describe any relationship between a resource state of the terminal device and either the number of attention heads or the number of intermediate neurons. Rather, they generically describe processing attention layer inputs with multiple attention heads and including an intermediate layer, with no suggestion that the layer inputs are equivalent to a resource state.
Claim 7 of the originally filed claims received 08/08/2022 clearly describes a preset association relationship that maps the available resource state to the number of attention heads (via the first width size) and / or the available resource state to the number of neurons in the intermediate layer (via the second width size). Thus, previous rejections under 35 U.S.C. 112(a) concerning “the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device” and “the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device” being new matter have been withdrawn.
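For illustration only, a preset association relationship of the kind described in originally filed claim 7 could take the form of a lookup table keyed on the available resource state. The sketch below is hypothetical throughout: the table name `PRESET_TABLE`, the memory thresholds, and the width values are assumptions chosen for the example and do not appear in the instant specification.

```python
from bisect import bisect_right
from typing import Tuple

# Hypothetical table rows: (minimum available memory in MB,
#                           attention heads, intermediate neurons).
# Rows are sorted by the memory threshold.
PRESET_TABLE = [
    (0, 3, 768),
    (512, 6, 1536),
    (1024, 9, 2304),
    (2048, 12, 3072),
]


def widths_for(available_mb: float) -> Tuple[int, int]:
    """Map an available-memory reading to (attention heads, intermediate neurons)."""
    thresholds = [row[0] for row in PRESET_TABLE]
    idx = bisect_right(thresholds, available_mb) - 1
    if idx < 0:
        raise ValueError("available memory must be non-negative")
    _, heads, neurons = PRESET_TABLE[idx]
    return heads, neurons


print(widths_for(800))   # falls in the 512 MB row
print(widths_for(4096))  # falls in the 2048 MB row
```

A table of this kind makes the quantity of attention heads and the quantity of intermediate neurons a direct function of the resource state, which is the relationship the withdrawn new-matter rejection concerned.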
Previous rejections under 35 U.S.C. 112(b) have been withdrawn in light of the instant amendments.
101 Rejections
On pages 13-14 of the instant remarks, the Applicant argues that the claimed invention represents an improvement to technology:
“Applicant respectfully traverses the rejection under 35 U.S.C. § 101. The claimed invention
is not directed to an abstract idea, but rather to a specific technical improvement in the field of on-device
artificial intelligence.
…
Even if the underlying determination were found to involve mathematical concepts, the claim is
patent-eligible because the judicial exception is integrated into a practical application that improves computer functionality. See Enfish, LLC v. Microsoft Corp., 822 F.3d 1327 (Fed. Cir.
2016).
Improvement to the Functioning of a Computer: The claimed invention solves a specific
technical problem in mobile computing: the execution of memory-intensive Transformer
models on resource-constrained hardware. By dynamically compressing the model width
based on a terminal's current RAM or power state, the data processing apparatus can
effectively improve data processing precision of a scaled model (the second neural
network model) and meet performance requirement of the terminal device.
Specific Rules vs. General Concepts: Like the eligible claims in McRO, Inc. v. Bandai
Namco Games Am. Inc., 837 F.3d 1299 (Fed. Cir. 2016), Claim 1 employs a specific set of
rules (the "preset association relationship") to automate a hardware-management task that
was previously impossible to perform dynamically. This is not the mere "automation of a
manual process," but a technical solution rooted in the structural parameters of Transformer
networks.”
In response to the Applicant’s argument that the claimed invention improves upon existing technology, the Examiner respectfully disagrees. The improvement of a claimed invention must be represented by the claim language, as noted in MPEP 2106.05(a): “If it is asserted that the invention improves upon conventional functioning of a computer, or upon conventional technology or technological processes, a technical explanation as to how to implement the invention should be present in the specification. That is, the disclosure must provide sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art … After the examiner has consulted the specification and determined that the disclosed invention improves technology, the claim must be evaluated to ensure the claim itself reflects the disclosed improvement in technology. Intellectual Ventures I LLC v. Symantec Corp., 838 F.3d 1307, 1316, 120 USPQ2d 1353, 1359 (Fed. Cir. 2016) (patent owner argued that the claimed email filtering system improved technology by shrinking the protection gap and mooting the volume problem, but the court disagreed because the claims themselves did not have any limitations that addressed these issues). That is, the claim must include the components or steps of the invention that provide the improvement described in the specification.”
While the Applicant has described a specific technical problem (executing Transformer models on resource-constrained hardware) and proposed improvements to said problem (improving processing of a scaled model and adhering to performance requirements of the terminal device), the Examiner does not find that these argued improvements are represented by the claim language.
The claims describe determining the scaled second network model based on the resource state of the terminal device at a high level of generality. For example, claim 1 recites “wherein the quantity of attention heads of the second transformer layer is determined based on the available resource state of the terminal device” and “wherein the quantity of neurons of the second intermediate layer is determined based on the available resource state of the terminal device” in its fourth and fifth limitations. The claim does not specify how the resource state relates to the quantity of attention heads or intermediate neurons in the second network; it is therefore unclear whether such a system would actually adhere to the resource limitations or requirements of the terminal device.
It is additionally unclear how the second network would improve data processing precision or maintain the performance of the first model.
Thus, the claimed invention is not found to be representative of an improvement to technology, and no rejections are withdrawn on these grounds. The Examiner suggests pointing to specific paragraphs of the instant specification that make it clear how the described invention is achieving these argued improvements, and ensuring that this functionality is fully represented by the claim language.
On page 13 of the instant remarks, the Applicant argues that the claimed invention is not directed to any judicial exceptions:
“The Examiner alleges that "identifying a subset of nodes" is a mental process. However,
Claim 1 defines a specific Data Processing Apparatus performing a sequence of high-speed
technical operations that cannot be performed in the human mind.
The claim requires the apparatus to (i) monitor an available resource state (e.g., hardware
memory capacity or thermal state) of a terminal device (remote terminal); (ii) determine a second
neural network model based on the first neural network model and the available resource state of the
terminal device, where the available resource state of the terminal device is used to determine an
intermediate layer or a transformer layer having reduced size, thereby compressing the neural
network model; and (iii) send the second model (compressed model) to the terminal device. A
human cannot mentally monitor a terminal's fluctuating power/memory states and recompute the
architectural weights of a high-dimensional neural network in a runtime environment. These steps
are "inextricably tied to computer technology." See DDR Holdings, LLC v. Hotels.com, L.P., 773
F.3d 1245 (Fed. Cir. 2014).”
With regard to the Applicant’s arguments above, the Examiner respectfully disagrees that the claims, as amended, recite no mental processes. As stated in MPEP 2106.04(a)(2)(III): The courts do not distinguish between mental processes that are performed entirely in the human mind and mental processes that require a human to use a physical aid (e.g., pen and paper or a slide rule) to perform the claim limitation. See, e.g., Benson, 409 U.S. at 67, 65, 175 USPQ at 674-75, 674 … Nor do the courts distinguish between claims that recite mental processes performed by humans and claims that recite mental processes performed on a computer. As the Federal Circuit has explained, "[c]ourts have examined claims that required the use of a computer and still found that the underlying, patent-ineligible invention could be performed via pen and paper or in a person’s mind." Versata Dev. Group v. SAP Am., Inc., 793 F.3d 1306, 1335, 115 USPQ2d 1681, 1702 (Fed. Cir. 2015). See also Intellectual Ventures I LLC v. Symantec Corp., 838 F.3d 1307, 1318, 120 USPQ2d 1353, 1360 (Fed. Cir. 2016) (“[W]ith the exception of generic computer-implemented steps, there is nothing in the claims themselves that foreclose them from being performed by a human, mentally or with pen and paper.”); Mortgage Grader, Inc. v. First Choice Loan Servs. Inc., 811 F.3d 1314, 1324, 117 USPQ2d 1693, 1699 (Fed. Cir. 2016) (holding that computer-implemented method for "anonymous loan shopping" was an abstract idea because it could be "performed by humans without a computer").
The claimed invention recites limitations amounting to mental processes performed on generic computer components insufficient to render a mentally performable task non-abstract. For example, claim 1 recites the limitation “determining, by the data processing apparatus, a second neural network model based on the available resource state of the terminal device and the first neural network model”, reciting a mental process of determining a second neural network model based on the available resource state of the terminal device and the first neural network model, performed by a “data processing apparatus”, a generic computer component insufficient to render the limitation non-abstract.
The Examiner asserts that the claimed invention, as amended, recites mental processes, and maintains its rejections on the basis of the Alice/Mayo tests performed (See 101 rejections).
On page 14 of the instant remarks, the Applicant argues that the claimed invention amounts to significantly more due to an unconventional ordered combination of steps:
“The requirement to determine a second neural network model, based on the available resource
state of the terminal device and the first neural network model, where the available resource state of
the terminal device is used to determine the quantity of attention heads of a second transformer layer
of a second neural network model or the quantity of neurons of a second intermediate layer of the
second neural network model, and send the second neural network model to the terminal device in
response to its specific resource state provides an unconventional "ordered combination" of steps.
Conventional AI deployment can not provide models that meet the performance requirement of a
terminal device. This "non-conventional and non-routine" coordination between a data processing
apparatus and a terminal device constitutes "significantly more" than the abstract idea of model
selection. See Berkheimer v. HP Inc., 881 F.3d 1360 (Fed. Cir. 2018).”
Regarding the Applicant’s arguments above, the Examiner respectfully disagrees. MPEP 2106.05(d)(I)(3) notes “Even if one or more additional elements are well-understood, routine, conventional activity when considered individually, the combination of additional elements may amount to an inventive concept. Diamond v. Diehr, 450 U.S. at 188, 209 USPQ at 9 (1981) ("[A] new combination of steps in a process may be patentable even though all the constituents of the combination were well known and in common use before the combination was made."). For example, a microprocessor that performs mathematical calculations and a clock that produces time data may individually be generic computer components that perform merely generic computer functions, but when combined may perform functions that are not generic computer functions and thus be an inventive concept. See, e.g. Rapid Litig. Mgmt. v. CellzDirect, Inc., 827 F.3d 1042, 1051, 119 USPQ2d 1370, 1375 (Fed. Cir. 2016) (holding that while the additional steps of freezing and thawing hepatocytes were well known, repeating those steps, contrary to what was taught in the art, was not routine or conventional). For example, in BASCOM, even though the court found that all of the additional elements in the claim recited generic computer network or Internet components, the elements in combination amounted to significantly more because of the non-conventional and non-generic arrangement that provided a technical improvement in the art. BASCOM Global Internet Servs. v. AT&T Mobility LLC, 827 F.3d 1341, 1350-51, 119 USPQ2d 1236, 1243-44 (2016)” (Bolding added for emphasis).
The Applicant asserts that determining a network model to meet performance requirements of a terminal device is an inventive concept representing a non-conventional and non-routine combination of elements. However, this concept is represented in the claim language as a mentally-performable process (“determining, by the data processing apparatus, a second neural network model based on the available resource state of the terminal device and the first neural network model”), and not as an additional element or combination of additional elements. A single limitation reciting an abstract idea does not constitute a combination of additional elements.
Additionally, the way in which a network is determined based on resource state information of a terminal device is recited at a very high level of generality; this concept therefore remains represented in a highly generic manner in the claim language.
No rejections are withdrawn on this basis.
103 Rejections
On pages 15-18 of the instant remarks, the Applicant argues that the amended claims are not obvious over the references previously relied upon. These arguments have been considered but are moot because the new grounds of rejection for the amended claims do not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Li et al. (Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search, published 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)) discloses a method of automatically pruning neural networks based on inferred latency and accuracy on end hardware.
Cai et al. (AutoML for Architecting Efficient and Specialized Neural Networks, published 11/12/2019, IEEE Micro Volume 40, Issue 1) discloses a method of automatically pruning neural networks based on target hardware latency.
Fan et al. (REDUCING TRANSFORMER DEPTH ON DEMAND WITH STRUCTURED DROPOUT, published 9/25/2019, arXiv:1909.11556v1) discloses a method of automatically pruning layers from a transformer network.
Michel et al. (Are Sixteen Heads Really Better than One?, 2019, arXiv:1905.10650v3) teaches a method of pruning heads in a transformer without reducing performance.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Aaron P Gormley whose telephone number is (571)272-1372. The examiner can normally be reached Monday - Friday 12:00 PM - 8:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle T Bechtold can be reached at (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AG/Examiner, Art Unit 2148
/Ryan Barrett/Primary Examiner, Art Unit 2148