Last updated: April 19, 2026
Application No. 18/446,170
METHODS AND PROCESSORS FOR TRAINING A NEURAL NETWORK

Non-Final OA §101 §102 §103
Filed
Aug 08, 2023
Examiner
WU, NICHOLAS S
Art Unit
2148
Tech Center
2100 — Computer Architecture & Software
Assignee
Huawei Technologies Co., Ltd.
OA Round
1 (Non-Final)
This examiner grants 47% of cases after interview, a +43.1% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 38 resolved cases, 2023–2026
Examiner Intelligence

WU, NICHOLAS S
Career Allow Rate: grants 47% of resolved cases (18 granted / 38 resolved), -7.6% vs TC avg
Interview Lift: +43.1% (resolved cases with interview)
Avg Prosecution: 3y 9m (typical timeline)
44 currently pending
Statute-Specific Performance

§101: 26.7% (-13.3% vs TC avg)
§103: 52.6% (+12.6% vs TC avg)
§102: 3.1% (-36.9% vs TC avg)
§112: 17.4% (-22.6% vs TC avg)
Based on career data from 38 resolved cases
Office Action

§101 §102 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1, in Step 1 of the 101 analysis set forth in MPEP 2106, the claim recites “A method of using a Neural Network (NN).” The claim recites a method, and a method is one of the four statutory categories of invention.
In Step 2A, Prong 1 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following limitations recite a process that, under broadest reasonable interpretation, covers a mental process or mathematical concept but for the recitation of generic computer components:
(I) during a first training iteration of the NN: determining a first continuous sequence of intermediate layers from the plurality of intermediate layers of the NN, the input layer, the first continuous sequence of intermediate layers, and the output layer forming a first sub-network of the NN; (i.e., the broadest reasonable interpretation includes a step of observation, evaluation, and judgement that could be performed mentally or with pen and paper, such as selecting layers to form a first subnetwork, which is a mental process (MPEP 2106)).
(II) during a second training iteration of the NN: determining a second continuous sequence of intermediate layers from the plurality of intermediate layers of the NN, the input layer, the second continuous sequence of intermediate layers, and the output layer forming a second sub-network of the NN, the second continuous sequence of intermediate layers being different from the first sequence of continuous layers, the second continuous sequence of intermediate layers at least partially overlapping the first sequence of continuous layers; (i.e., the broadest reasonable interpretation includes a step of observation, evaluation, and judgement that could be performed mentally or with pen and paper, such as selecting additional layers to make up a second subnetwork that overlaps with the first subnetwork, which is a mental process (MPEP 2106)).
(III) during an inference iteration of the NN: selecting a target sub-network amongst the first sub-network and the second sub-network; (i.e., the broadest reasonable interpretation includes a step of observation, evaluation, and judgement that could be performed mentally or with pen and paper, such as selecting the subnetwork that is the fastest, which is a mental process (MPEP 2106)).
Claim limitations that, under their broadest reasonable interpretation, cover activities classified under Mental processes: concepts performed in the human mind (including observation, evaluation, judgement, or opinion) (see MPEP 2106.04(a)(2), subsection (III)) or Mathematical concepts: mathematical relationships, mathematical formulas or equations, or mathematical calculations (see MPEP 2106.04(a)(2), subsection (I)) recite an abstract idea. Accordingly, the claim recites an abstract idea.
In Step 2A, Prong 2 of the 101 analysis, set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
(IV) the NN comprising an input layer, an output layer, and a plurality of intermediate layers, the method executable by at least one processor and comprising: (i.e., the generic computer components recited in this limitation merely add the words “apply it”, or an equivalent, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (MPEP 2106.05(f))).
(V) training the first sub-network based on training data; (i.e., the generic computer components recited in this limitation merely add the words “apply it”, or an equivalent, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (MPEP 2106.05(f))).
(VI) training the second sub-network based on the training data; (i.e., the generic computer components recited in this limitation merely add the words “apply it”, or an equivalent, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (MPEP 2106.05(f))).
(VII) and generating an inference output by employing only the target sub-network of the NN on inference data for reducing computational resources of the at least one processor for generating the inference output. (i.e., the broadest reasonable interpretation of outputting a data instance is mere data outputting, which is an insignificant extra-solution activity (MPEP 2106.05(g))).
Since the claim does not contain any other additional elements that amount to integration into a practical application, the claim is directed to an abstract idea.
In Step 2B of the 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception:
Limitation (VII), under the broadest reasonable interpretation, recites steps of mere data gathering/outputting, which the courts have recognized as well-understood, routine, and conventional functions. Specifically, the courts have recognized computer functions directed to mere data gathering/outputting as well-understood, routine, and conventional functions when they are claimed in a merely generic manner or as insignificant extra-solution activity, when considering evidence in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018) (see USPTO Berkheimer Memorandum (April 2018)).
Examiner uses Berkheimer: Option 2, a citation to one or more of the court decisions discussed in MPEP 2106.05(d)(II) as noting well-understood, routine, and conventional nature of the additional elements:
Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network). See MPEP 2106.05(d)(II).
Further, limitation (IV), under the broadest reasonable interpretation, merely recites steps that apply a generic computer component and a generic neural network as tools to perform judicial exceptions, which represents merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Similarly, limitations (V) and (VI), under the broadest reasonable interpretation, merely recite steps that apply generic training of a neural network, which represents merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Considering additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claim 2, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 2 recites “during the first training iteration: determining a first depth index indicative of a first depth of the first continuous sequence of intermediate layers in the NN, and wherein the determining the first continuous sequence of intermediate layers includes: determining a continuous sequence of intermediate layers that is most adjacent to the input layer of the NN and which includes a total number of layers equal to the first depth index.” Under the broadest reasonable interpretation, the limitations recite determining a depth of the first subnetwork, which is a step of observation, evaluation, and judgement that can be performed mentally or with pen and paper. Claim 2 also recites “and wherein: during the second training iteration: determining a second depth index indicative of a second depth of the second continuous sequence of intermediate layers in the NN, the second depth index being different from the first depth index, and wherein the determining the second continuous sequence of intermediate layers includes: determining an other continuous sequence of intermediate layers that is most adjacent to the input layer of the NN and which includes a total number of layers equal to the second depth index.” Under the broadest reasonable interpretation, the limitations recite determining a depth of the second subnetwork, which is a step of observation, evaluation, and judgement that can be performed mentally or with pen and paper. The steps of observation, evaluation, and judgement are mental processes. Therefore, claim 2 does not solve the deficiencies of claim 1.
Regarding claim 3, it is dependent upon claim 2 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 3 recites wherein the determining the first depth index includes randomly determining the first depth index from an interval of depth indexes, and the determining the second depth index includes randomly determining the second depth index from the interval of depth indexes, the interval of depth indexes having been pre-determined based on a depth of the NN. Under the broadest reasonable interpretation, the limitations recite randomly selecting depths from a range of depths which is a step of observation, evaluation, and judgement which can be performed mentally or with pen and paper. The steps of observation, evaluation, and judgement are mental processes. Therefore, claim 3 does not solve the deficiencies of claim 2.
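The depth-index selection recited in claims 2 and 3 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions only, not the applicant's implementation: the layer names, the helper names `sample_depth_index` and `prefix_subnetwork`, the interval [1, number of intermediate layers], and the fixed seed are all hypothetical.

```python
import random

def sample_depth_index(max_depth, rng, min_depth=1):
    """Randomly draw a depth index from the interval [min_depth, max_depth]."""
    return rng.randint(min_depth, max_depth)  # randint is inclusive at both ends

def prefix_subnetwork(intermediate_layers, depth_index):
    """Return the contiguous run of `depth_index` layers closest to the input."""
    return intermediate_layers[:depth_index]

# Hypothetical four-layer network; the names are placeholders.
layers = ["layer1", "layer2", "layer3", "layer4"]
rng = random.Random(0)  # fixed seed only so the sketch is repeatable

first_depth = sample_depth_index(len(layers), rng)
second_depth = sample_depth_index(len(layers), rng)
while second_depth == first_depth:  # claim 2 recites different depth indexes
    second_depth = sample_depth_index(len(layers), rng)

first_sequence = prefix_subnetwork(layers, first_depth)
second_sequence = prefix_subnetwork(layers, second_depth)
```

In this sketch both sequences start at the layer nearest the input, so two prefixes of different lengths always share at least their first layer.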
Regarding claim 4, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 4 recites “wherein the method further comprises: during the first training iteration: determining a first width index for the first continuous sequence of intermediate layers indicative a first partial width of the first continuous sequence of intermediate layers to be trained during the first training iteration.” Under the broadest reasonable interpretation, the limitations recite determining a width of the first subnetwork, which is a step of observation, evaluation, and judgement that can be performed mentally or with pen and paper. Claim 4 also recites “and wherein the training the first sub-network includes: training only the first partial width of the first continuous sequence of intermediate layers based on the training data.” Under the broadest reasonable interpretation, the limitations merely recite steps that apply generic training to a subset of parameters, which represents merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Claim 4 also recites “and wherein: during the second training iteration: determining a second width index for the second continuous sequence of intermediate layers indicative a second partial width of the second continuous sequence of intermediate layers to be trained during the second training iteration, the second width index being different from the first width index.” Under the broadest reasonable interpretation, the limitations recite determining a width of the second subnetwork, which is a step of observation, evaluation, and judgement that can be performed mentally or with pen and paper. The steps of observation, evaluation, and judgement are mental processes. Claim 4 also recites “the training the second sub-network includes: training only the second partial width of the second continuous sequence of intermediate layers based on the training data.” Under the broadest reasonable interpretation, the limitations merely recite steps that apply generic training to a subset of parameters, which represents merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Therefore, claim 4 does not solve the deficiencies of claim 1.
Regarding claim 5, it is dependent upon claim 4 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 5 recites wherein the determining the first width index includes randomly determining the first width index from an interval of width indexes, and the determining the second width index includes randomly determining the second width index from the interval of width indexes, the interval of width indexes having been pre-determined based on a width of the plurality of intermediate layers of the NN. Under the broadest reasonable interpretation, the limitations recite randomly selecting widths from a range of widths which is a step of observation, evaluation, and judgement which can be performed mentally or with pen and paper. The steps of observation, evaluation, and judgement are mental processes. Therefore, claim 5 does not solve the deficiencies of claim 4.
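The width-index limitations of claims 4 and 5 can be pictured with a similar hedged sketch. The claims as quoted do not say which units make up a "partial width"; this sketch assumes, purely for illustration, that it is the first `width_index` units of a layer. The fixed `update` value stands in for a real gradient step, and all function names are hypothetical.

```python
import random

def sample_width_index(full_width, rng, min_width=1):
    """Randomly draw a width index from the interval [min_width, full_width]."""
    return rng.randint(min_width, full_width)

def train_partial_width(unit_weights, width_index, update=0.5):
    """Apply a placeholder update to the first `width_index` units only.

    Units at positions >= width_index are returned unchanged, mimicking
    "training only the partial width" of a layer.
    """
    return [
        weight - update if position < width_index else weight
        for position, weight in enumerate(unit_weights)
    ]

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
full_layer = [1.0, 1.0, 1.0, 1.0]  # hypothetical per-unit weights
width_index = sample_width_index(len(full_layer), rng)
updated_layer = train_partial_width(full_layer, width_index)
```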
Regarding claim 6, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 6 recites wherein the selecting the target sub-network comprises comparing at least one of: 
(i) a first accuracy parameter of the first sub-network and a second accuracy parameter of the second sub-network, 
(ii) a first latency parameter of the first sub-network and a second latency parameter of the second sub-network, and
(iii) a first importance parameter of the first sub-network and a second importance parameter of the second sub-network.
Under the broadest reasonable interpretation, the limitations recite selecting a subnetwork by comparing subnetworks with a criterion which is a step of observation, evaluation, and judgement which can be performed mentally or with pen and paper. The steps of observation, evaluation, and judgement are mental processes. Therefore, claim 6 does not solve the deficiencies of claim 1.
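One way to picture the comparison recited in claim 6 is the sketch below. The selection rule (lowest latency among sub-networks meeting an accuracy floor, otherwise the most accurate one) is an assumption chosen for illustration; the claim only requires comparing at least one of the listed parameters. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SubNetworkStats:
    name: str
    accuracy: float    # e.g. validation accuracy in [0, 1]
    latency_ms: float  # e.g. measured inference latency

def select_target(candidates, min_accuracy):
    """Pick the fastest candidate that meets the accuracy floor.

    If no candidate meets the floor, fall back to the most accurate one.
    """
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    if eligible:
        return min(eligible, key=lambda c: c.latency_ms)
    return max(candidates, key=lambda c: c.accuracy)

# Hypothetical measurements for two trained sub-networks.
first = SubNetworkStats("first", accuracy=0.90, latency_ms=4.0)
second = SubNetworkStats("second", accuracy=0.95, latency_ms=9.0)
target = select_target([first, second], min_accuracy=0.85)
```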
Regarding claim 7, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 7 recites wherein the plurality of intermediate layers is a plurality of architectural blocks of the NN, a given one of the plurality of architectural blocks including a sub-set of intermediate layers for generating an output of the given one of the plurality of architectural blocks. Under the broadest reasonable interpretation, the limitations recite grouping layers into blocks which is a step of observation, evaluation, and judgement which can be performed mentally or with pen and paper. The steps of observation, evaluation, and judgement are mental processes. Therefore, claim 7 does not solve the deficiencies of claim 1.
Regarding claim 8, it is dependent upon claim 7 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 8 recites wherein the plurality of architectural blocks include at least one of: 
(i) a convolutional block with at least one convolutional layer;
(ii) a pooling block with at least one pooling layer;
(iii) a fully connected block with at least one fully-connected layer;
(iv) a residual block with at least one skip connection;
(v) a batch normalization block with at least one batch normalization layer;
(vi) a recurrent block with at least one recurrence mechanism;
(vii) an attention block with at least one self-attention mechanism; and
(viii) an activation block with at least one activation layer.
Under the broadest reasonable interpretation, the limitations merely recite steps that apply generic components of machine learning models, which represents merely adding the words “apply it”, or an equivalent, which are not indicative of an inventive concept (MPEP 2106.05(f)). Therefore, claim 8 does not solve the deficiencies of claim 7.
Regarding claim 9, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 9 recites wherein the output layer is at least two output layers, and wherein the output layer of the first sub-network is a first one from the at least two output layers, and the output layer of the second sub-network is a second one from the at least two output layers, the first and second one of the at least two output layers being different output layers. Under the broadest reasonable interpretation, the limitations recite determining an output layer for each subnetwork which is a step of observation, evaluation, and judgement which can be performed mentally or with pen and paper. The steps of observation, evaluation, and judgement are mental processes. Therefore, claim 9 does not solve the deficiencies of claim 1.
Regarding claim 10, in Step 1 of the 101 analysis set forth in MPEP 2106, the claim recites “A system for using a Neural Network (NN), the NN comprising an input layer, an output layer, and a plurality of intermediate layers, the system comprising a controller and a memory storing a plurality of executable instructions which, when executed by the controller, cause the system to:”. The claim recites a system with hardware, which is interpreted as a machine. A machine is one of the four statutory categories of invention. For the Step 2A/2B analyses, since claim 10 is similar to claim 1, it is rejected under the same rationales as claim 1.
The additional limitation below fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception.  
A system for using a Neural Network (NN), the NN comprising an input layer, an output layer, and a plurality of intermediate layers, the system comprising a controller and a memory storing a plurality of executable instructions which, when executed by the controller, cause the system to:… (i.e., the generic computer components recited in this limitation merely add the words “apply it”, or an equivalent, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f))).
Considering additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claims 11-18, the claims are similar to claims 2-9 and rejected under the same rationales. 

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 6, 9-10, 15, and 18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Teerapittayanon, et al., “BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks” (“Teerapittayanon”).
Regarding claim 1, Teerapittayanon discloses:
A method of using a Neural Network (NN), the NN comprising an input layer, an output layer, and a plurality of intermediate layers, (Teerapittayanon, pg. 2464 col. 2 and Figure 1, “Figure 1 shows how BranchyNet modifies a standard AlexNet by adding two branches with their respective exit points; Figure 1 shows an example layout of BranchyNet and how it has input, output, and hidden layers (i.e. A method of using a Neural Network (NN), the NN comprising an input layer, an output layer, and a plurality of intermediate layers,).”).
the method executable by at least one processor (Teerapittayanon, pg. 2467 col. 1, “We use a 3.0GHz CPU [the method executable by at least one processor] with 20 MB L3 Cache and NVIDIA GeForce GTX TITANX (Maxwell) 12GB GPU.”).
and comprising: during a first training iteration of the NN: determining a first continuous sequence of intermediate layers from the plurality of intermediate layers of the NN, the input layer, the first continuous sequence of intermediate layers, and the output layer forming a first sub-network of the NN; (Teerapittayanon, pg. 2465-2466 and Figure 1, “BranchyNet modifies the standard deep network structure by adding exit branches (also called side branches or simply branches for brevity), at certain locations throughout the network. These early exit branches allow samples which can be accurately classified in early stages of the network to exit at that stage; in Figure 1, the layers that are part of the path to the first exit are interpreted as the first sub-network (i.e. and comprising: during a first training iteration of the NN: determining a first continuous sequence of intermediate layers from the plurality of intermediate layers of the NN, the input layer, the first continuous sequence of intermediate layers, and the output layer forming a first sub-network of the NN;).”).
training the first sub-network based on training data; (Teerapittayanon, pg. 2466 col. 2, “The design goal of each exit branch is to minimize this loss function. To train the entire BranchyNet, we form a joint optimization problem as a weighted sum of the loss functions of each exit branch [training the first sub-network based on training data;]”).
during a second training iteration of the NN: determining a second continuous sequence of intermediate layers from the plurality of intermediate layers of the NN, the input layer, the second continuous sequence of intermediate layers, and the output layer forming a second sub-network of the NN, the second continuous sequence of intermediate layers being different from the first sequence of continuous layers, (Teerapittayanon, pg. 2465-2466 and Figure 1, “BranchyNet modifies the standard deep network structure by adding exit branches (also called side branches or simply branches for brevity), at certain locations throughout the network. These early exit branches allow samples which can be accurately classified in early stages of the network to exit at that stage; in Figure 1, the path to the second exit is interpreted as the second sub-network (i.e. during a second training iteration of the NN: determining a second continuous sequence of intermediate layers from the plurality of intermediate layers of the NN, the input layer, the second continuous sequence of intermediate layers, and the output layer forming a second sub-network of the NN, the second continuous sequence of intermediate layers being different from the first sequence of continuous layers,).”).
the second continuous sequence of intermediate layers at least partially overlapping the first sequence of continuous layers; (Teerapittayanon, see Figure 1, Figure 1 shows that the second exit path utilizes the first 5x5 Conv layer of the base model which is also used by the first exit path (i.e. the second continuous sequence of intermediate layers at least partially overlapping the first sequence of continuous layers;)).
training the second sub-network based on the training data; (Teerapittayanon, pg. 2466 col. 2, “The design goal of each exit branch is to minimize this loss function. To train the entire BranchyNet, we form a joint optimization problem as a weighted sum of the loss functions of each exit branch [training the second sub-network based on the training data;]”).
during an inference iteration of the NN: selecting a target sub-network amongst the first sub-network and the second sub-network; (Teerapittayanon, pg. 2465 col. 1, “Once the network is trained [during an inference iteration of the NN:], BranchyNet utilizes the exit points to allow the samples to exit early, thus reducing the cost of inference; using an exit point is interpreted as selecting a subnetwork (i.e. selecting a target sub-network amongst the first sub-network and the second sub-network;).”).
and generating an inference output by employing only the target sub-network of the NN on inference data for reducing computational resources of the at least one processor for generating the inference output. (Teerapittayanon, pg. 2465 col. 1, “Once the network is trained, BranchyNet utilizes the exit points to allow the samples to exit early, thus reducing the cost of inference [for reducing computational resources of the at least one processor for generating the inference output.]. At each exit point, BranchyNet uses the entropy of a classification result (e.g., by softmax) as a measure of confidence in the prediction. If the entropy of a test sample is below a learned threshold value, meaning that the classifier is confident in the prediction, the sample exits the network with the prediction result at this exit point, and is not processed by the higher network layers [and generating an inference output by employing only the target sub-network of the NN on inference data].”). 
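The joint optimization quoted above for the training limitations, a weighted sum of the loss of each exit branch, can be written out as a short sketch. The function name and the example loss and weight values are hypothetical; the quoted passage does not give specific weights.

```python
def joint_loss(exit_losses, exit_weights):
    """Combine per-exit losses into one objective as a weighted sum."""
    if len(exit_losses) != len(exit_weights):
        raise ValueError("need exactly one weight per exit branch")
    return sum(w * loss for w, loss in zip(exit_weights, exit_losses))

# Hypothetical losses at three exits, ordered from earliest to final.
example_total = joint_loss([2.0, 1.0, 0.5], [1.0, 0.5, 0.5])
```

Under this formulation a single scalar objective covers every exit, so shared early layers receive gradient contributions from each exit that depends on them.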
Regarding claim 6, Teerapittayanon discloses the method of claim 1. Teerapittayanon further discloses wherein the selecting the target sub-network comprises comparing at least one of: 
(i) a first accuracy parameter of the first sub-network and a second accuracy parameter of the second sub-network, 
(ii) a first latency parameter of the first sub-network and a second latency parameter of the second sub-network, and
(iii) a first importance parameter of the first sub-network and a second importance parameter of the second sub-network.
(Teerapittayanon, pg. 2465 col. 1, “At each exit point, BranchyNet uses the entropy of a classification result (e.g., by softmax) as a measure of confidence in the prediction. If the entropy of a test sample is below a learned threshold value, meaning that the classifier is confident in the prediction, the sample exits the network with the prediction result at this exit point, and is not processed by the higher network layers. If the entropy value is above the threshold, then the classifier at this exit point is deemed not confident, and the sample continues to the next exit point in the network [wherein the selecting the target sub-network comprises comparing at least one of: (i) a first accuracy parameter of the first sub-network and a second accuracy parameter of the second sub-network,].”).
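The entropy-threshold exit rule described in the quoted passage can be sketched as follows. This is a simplified illustration, not the reference's code: each exit is modeled as a hypothetical callable returning class logits, the thresholds are supplied by the caller rather than learned, and the function names are assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def early_exit_predict(sample, exit_heads, thresholds):
    """Return (exit_index, predicted_class) for one sample.

    `exit_heads` are ordered from earliest to final. A sample leaves at the
    first exit whose entropy is below that exit's threshold; later heads are
    never called for it. The final exit always returns a prediction.
    """
    if not exit_heads:
        raise ValueError("at least one exit is required")
    last = len(exit_heads) - 1
    for index, head in enumerate(exit_heads):
        probs = softmax(head(sample))
        if index == last or entropy(probs) < thresholds[index]:
            return index, probs.index(max(probs))
```

For example, with two hypothetical exits and a threshold of 0.5 on the first, a sample whose first-exit logits are sharply peaked leaves at exit 0, while a sample with nearly uniform first-exit logits continues to the final exit.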
Regarding claim 9, Teerapittayanon discloses the method of claim 1. Teerapittayanon further discloses wherein the output layer is at least two output layers, and wherein the output layer of the first sub-network is a first one from the at least two output layers, and the output layer of the second sub-network is a second one from the at least two output layers, the first and second one of the at least two output layers being different output layers. (Teerapittayanon, pg. 2466 col. 1 and see Figure 1, “A branch is a subset of the network containing contiguous layers, which do not overlap other branches; since the branches do not overlap, each exit is interpreted as having their own different output layers (i.e. the first and second one of the at least two output layers being different output layers.), followed by an exit point. The main branch can be considered the baseline (original) network before side branches are added; Figure 1 shows that there are 3 exits of the model which is interpreted as having 3 output layers as they perform classification at each exit (i.e. wherein the output layer is at least two output layers, and wherein the output layer of the first sub-network is a first one from the at least two output layers, and the output layer of the second sub-network is a second one from the at least two output layers,).”).
Regarding claim 10, the claim is similar to claim 1 and rejected under the same rationales. Teerapittayanon further discloses the additional limitations …the system comprising a controller and a memory storing a plurality of executable instructions which, when executed by the controller, cause the system to:… (Teerapittayanon, pg. 2467 col. 1, “We evaluate Branchy-LeNet (B-LeNet) on the MNIST dataset and both Branchy-AlexNet (B-AlexNet) and Branchy-ResNet (B-ResNet) on the CIFAR10 dataset. We present evaluation results for both CPU and GPU. We use a 3.0 GHz CPU with 20MB L3 Cache and NVIDIA GeForce GTX TITANX (Maxwell) 12GB GPU […the system comprising a controller and a memory storing a plurality of executable instructions which, when executed by the controller, cause the system to:…].”).
Regarding claims 15 and 18, these claims are similar to claims 6 and 9 and are rejected under the same rationales. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 7-8, 11, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Teerapittayanon, et al., “BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks” (“Teerapittayanon”) in view of Chiang, et al., Non-Patent Literature “Optimal Branch Location for Cost-effective Inference on Branchynet” (“Chiang”).
Regarding claim 2, Teerapittayanon teaches the method of claim 1. While Teerapittayanon teaches a system that determines two subnetworks within a base neural network, Teerapittayanon does not explicitly teach wherein the method further comprises: during the first training iteration: determining a first depth index indicative of a first depth of the first continuous sequence of intermediate layers in the NN, and wherein the determining the first continuous sequence of intermediate layers includes: determining a continuous sequence of intermediate layers that is most adjacent to the input layer of the NN and which includes a total number of layers equal to the first depth index; and wherein: during the second training iteration: determining a second depth index indicative of a second depth of the second continuous sequence of intermediate layers in the NN, the second depth index being different from the first depth index, and wherein the determining the second continuous sequence of intermediate layers includes: determining an other continuous sequence of intermediate layers that is most adjacent to the input layer of the NN and which includes a total number of layers equal to the second depth index.
	Chiang teaches:
wherein the method further comprises: during the first training iteration: determining a first depth index indicative of a first depth of the first continuous sequence of intermediate layers in the NN, and wherein the determining the first continuous sequence of intermediate layers includes: determining a continuous sequence of intermediate layers that is most adjacent to the input layer of the NN and which includes a total number of layers equal to the first depth index; (Chiang, pg. 4 col. 2, “We define notations for our branch placement model. Note that it is not possible to place a branch at every layer of a deep learning network, and we assume that there are c candidate layers to place branches [wherein the method further comprises: during the first training iteration: determining a first depth index indicative of a first depth of the first continuous sequence of intermediate layers in the NN,]. For ease of discussion, we partition the network into c + 1 blocks, where each block consists of the layers not having a branch candidate, followed by a layer that is a branch candidate [depth index]. There are c + 1 blocks because we can consider the output layer a branch that always exists. Please refer to Figure 2 for an illustration that three branch candidates divide a network into four blocks; Figure 2 shows that branches are placed starting at the beginning, or input layer, of the base model and determining where to place each candidate branch is interpreted as setting the depth of a subnetwork and the candidate branch number notes how many blocks/layers are executed at that branch, or sub-network (i.e. and wherein the determining the first continuous sequence of intermediate layers includes: determining a continuous sequence of intermediate layers that is most adjacent to the input layer of the NN and which includes a total number of layers equal to the first depth index;).”).
and wherein: during the second training iteration: determining a second depth index indicative of a second depth of the second continuous sequence of intermediate layers in the NN, the second depth index being different from the first depth index, and wherein the determining the second continuous sequence of intermediate layers includes: determining an other continuous sequence of intermediate layers that is most adjacent to the input layer of the NN and which includes a total number of layers equal to the second depth index. (Chiang, pg. 4 col. 2, “We define notations for our branch placement model. Note that it is not possible to place a branch at every layer of a deep learning network, and we assume that there are c candidate layers to place branches [and wherein: during the second training iteration: determining a second depth index indicative of a second depth of the second continuous sequence of intermediate layers in the NN, the second depth index being different from the first depth index,]. For ease of discussion, we partition the network into c + 1 blocks, where each block consists of the layers not having a branch candidate, followed by a layer that is a branch candidate [depth index]. There are c + 1 blocks because we can consider the output layer a branch that always exists. Please refer to Figure 2 for an illustration that three branch candidates divide a network into four blocks; Figure 2 shows that branches are placed starting at the beginning, or input layer, of the base model and determining where to place each candidate branch is interpreted as setting the depth of a subnetwork and the candidate branch number notes how many blocks/layers are executed at that branch, or sub-network (i.e. and wherein the determining the second continuous sequence of intermediate layers includes: determining an other continuous sequence of intermediate layers that is most adjacent to the input layer of the NN and which includes a total number of layers equal to the second depth index.).”).
Teerapittayanon and Chiang are both in the same field of endeavor (i.e. machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Teerapittayanon and Chiang to teach the above limitation(s). The motivation for doing so is that finding the optimal depth to place exits improves the processing efficiency of the model (cf. Chiang, pg. 2 col. 1, “It is crucial to find the branch locations to maximize the benefits of Branchynet. The locations of branches in the recent researches are usually hyperparameters, which are defined manually. However, it is tough to find the number of branches to place and the locations to place them to maximize the benefits of a Branchynet. It is impossible to manually tune the network because the number of possible branch location combinations is tremendous. As a result, it is essential to find the optimal branch locations automatically with an efficient algorithm.”).
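For illustration only, the depth-index limitation discussed for claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Teerapittayanon, Chiang, or the application: the names `prefix_subnetwork` and `forward`, and the toy scalar "layers", are hypothetical. It shows only how a depth index could select the continuous run of intermediate layers most adjacent to the input, with two different depth indexes yielding two different sub-networks.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a layer: a function from a scalar to a scalar.
Layer = Callable[[float], float]


def prefix_subnetwork(intermediate_layers: Sequence[Layer], depth_index: int) -> List[Layer]:
    """Return the first `depth_index` layers, i.e. the contiguous run closest to the input."""
    if not 1 <= depth_index <= len(intermediate_layers):
        raise ValueError("depth_index must be between 1 and the number of intermediate layers")
    return list(intermediate_layers[:depth_index])


def forward(layers: Sequence[Layer], x: float) -> float:
    """Apply the selected layers in order, starting from the input side."""
    for layer in layers:
        x = layer(x)
    return x


# Toy four-layer network (assumption for the example only).
layers: List[Layer] = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]

first_subnetwork = prefix_subnetwork(layers, depth_index=2)   # first training iteration
second_subnetwork = prefix_subnetwork(layers, depth_index=3)  # second training iteration
```

Both sub-networks start at the same first layer; only the number of layers taken from the input side differs.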
Regarding claim 7, Teerapittayanon teaches the method of claim 1. While Teerapittayanon teaches a system that determines two subnetworks within a base neural network, Teerapittayanon does not explicitly teach wherein the plurality of intermediate layers is a plurality of architectural blocks of the NN, a given one of the plurality of architectural blocks including a sub-set of intermediate layers for generating an output of the given one of the plurality of architectural blocks.
Chiang teaches wherein the plurality of intermediate layers is a plurality of architectural blocks of the NN, a given one of the plurality of architectural blocks including a sub-set of intermediate layers for generating an output of the given one of the plurality of architectural blocks. (Chiang, pg. 4 col. 2, “We define notations for our branch placement model. Note that it is not possible to place a branch at every layer of a deep learning network, and we assume that there are c candidate layers to place branches. For ease of discussion, we partition the network into c + 1 blocks, where each block consists of the layers not having a branch candidate, followed by a layer that is a branch candidate. There are c + 1 blocks because we can consider the output layer a branch that always exists. Please refer to Figure 2 for an illustration that three branch candidates divide a network into four blocks [wherein the plurality of intermediate layers is a plurality of architectural blocks of the NN, a given one of the plurality of architectural blocks including a sub-set of intermediate layers for generating an output of the given one of the plurality of architectural blocks.].”).
Teerapittayanon and Chiang are both in the same field of endeavor (i.e. machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Teerapittayanon and Chiang to teach the above limitation(s). The motivation for doing so is that using blocks of layers to represent subsets of layers simplifies the design of a network (cf. Chiang, pg. 4 col. 2, “We define notations for our branch placement model. Note that it is not possible to place a branch at every layer of a deep learning network, and we assume that there are c candidate layers to place branches. For ease of discussion, we partition the network into c + 1 blocks, where each block consists of the layers not having a branch candidate, followed by a layer that is a branch candidate.”).
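For illustration only, the block partition quoted from Chiang (each block being the layers without a branch candidate, followed by a layer that is a branch candidate, with the output treated as a branch that always exists) can be sketched as follows. The function name `partition_into_blocks` and the layer names other than conv2, conv3, and conv4 are assumptions made for the example and do not come from the reference.

```python
from typing import Iterable, List, Sequence


def partition_into_blocks(layer_names: Sequence[str], candidates: Iterable[str]) -> List[List[str]]:
    """Split an ordered list of layers into blocks that each end at a branch candidate.

    Layers after the last candidate form the final block, which ends at the output.
    """
    candidate_set = set(candidates)
    blocks: List[List[str]] = []
    current: List[str] = []
    for name in layer_names:
        current.append(name)
        if name in candidate_set:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hypothetical layer ordering; only the three candidate names appear in the quoted passage.
layer_names = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc"]
blocks = partition_into_blocks(layer_names, candidates=["conv2", "conv3", "conv4"])
```

With three candidates the sketch yields four blocks, matching the c + 1 count stated in the quoted passage.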
Regarding claim 8, Teerapittayanon in view of Chiang teaches the method of claim 7. Chiang further teaches wherein the plurality of architectural blocks include at least one of: 
(i) a convolutional block with at least one convolutional layer;
(ii) a pooling block with at least one pooling layer;
(iii) a fully connected block with at least one fully-connected layer;
(iv) a residual block with at least one skip connection;
(v) a batch normalization block with at least one batch normalization layer;
(vi) a recurrent block with at least one recurrence mechanism;
(vii) an attention block with at least one self-attention mechanism; and
(viii) an activation block with at least one activation layer.
(Chiang, pg. 4 col. 2, “Please refer to Figure 2 for an illustration that three branch candidates divide a network into four blocks. Let Ci be the i th branch candidate. The model in Figure 2a has three branch candidates, at conv2, conv3, and conv4, which partition the network into four blocks [wherein the plurality of architectural blocks include at least one of: (i) a convolutional block with at least one convolutional layer;].”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Chiang with the teachings of Teerapittayanon for the same reasons given for claim 7.
Regarding claims 11, 16, and 17, these claims are similar to claims 2, 7, and 8 and are rejected under the same rationales.

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Teerapittayanon, et al., “BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks” (“Teerapittayanon”) in view of Chiang, et al., Non-Patent Literature “Optimal Branch Location for Cost-effective Inference on Branchynet” (“Chiang”) and further in view of Huang, et al., Non-Patent Literature “Deep Networks with Stochastic Depth” (“Huang”).
Regarding claim 3, Teerapittayanon in view of Chiang teaches the method of claim 2. While the combination teaches a system that determines two subnetworks within a base neural network with a depth index, the combination does not explicitly teach wherein the determining the first depth index includes randomly determining the first depth index from an interval of depth indexes, and the determining the second depth index includes randomly determining the second depth index from the interval of depth indexes, the interval of depth indexes having been pre-determined based on a depth of the NN.
	Huang teaches:
wherein the determining the first depth index includes randomly determining the first depth index from an interval of depth indexes, and the determining the second depth index includes randomly determining the second depth index from the interval of depth indexes, (Huang, pg. 4, “To reduce the effective length of a neural network during training, we randomly skip layers entirely. We achieve this by introducing skip connections in the same fashion as ResNets, however the connection pattern is randomly altered for each mini batch. For each mini-batch we randomly select sets of layers [wherein the determining the first depth index includes randomly determining the first depth index…and the determining the second depth index includes randomly determining the second depth index] and remove their corresponding transformation functions, only keeping the identity skip connection.”, and Huang, pg. 5, “We can achieve this goal by randomly dropping entire ResBlocks during training and bypassing their transformations through skip connections. Let b ∈ {0,1} denote a Bernoulli random variable, which indicates whether the lth ResBlock is active (b = 1) or inactive (b = 0). Further, let us denote the “survival” probability of ResBlock as p = Pr(b = 1) [from an interval of depth indexes,].”).
the interval of depth indexes having been pre-determined based on a depth of the NN. (Huang, pg. 4, “Similar to Dropout, stochastic depth can be interpreted as training an ensemble of networks, but with different depths, possibly achieving higher diversity among ensemble members than ensembling those with the same depth. Different from Dropout, we make the network shorter [the interval of depth indexes having been pre-determined based on a depth of the NN.]”). 
Teerapittayanon, in view of Chiang, and Huang are all in the same field of endeavor (i.e. machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Teerapittayanon, in view of Chiang, and Huang to teach the above limitation(s). The motivation for doing so is that stochastic depth training improves a deep neural network’s performance (cf. Huang, pg. 14, “Training with stochastic depth allows one to increase the depth of a network well beyond 1000 layers, and still obtain a reduction in test error. Because of its simplicity and practicality we hope that training with stochastic depth may become a new tool in the deep learning “toolbox”, and will help researchers scale their models to previously unattainable depths and capabilities.”).
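For illustration only, the two sampling mechanisms referred to for claim 3 can be sketched side by side. This is a minimal sketch under stated assumptions, not code from Huang or the application: `sample_depth_index` draws a depth index from an interval bounded by the depth of the NN, as recited in the claim language, while `sample_block_mask` draws one Bernoulli variable per block with a survival probability, as in the quoted Huang passage. Both function names, the `min_depth` parameter, and the use of Python's `random` module are hypothetical choices for the example.

```python
import random
from typing import List


def sample_depth_index(total_depth: int, rng: random.Random, min_depth: int = 1) -> int:
    """Draw a depth index uniformly from [min_depth, total_depth] (both ends inclusive)."""
    if not 1 <= min_depth <= total_depth:
        raise ValueError("require 1 <= min_depth <= total_depth")
    return rng.randint(min_depth, total_depth)


def sample_block_mask(num_blocks: int, survival_prob: float, rng: random.Random) -> List[int]:
    """Draw b in {0, 1} per block: 1 keeps the block active, 0 bypasses it via the skip path."""
    return [1 if rng.random() < survival_prob else 0 for _ in range(num_blocks)]


rng = random.Random(0)  # fixed seed so the example is repeatable
first_depth_index = sample_depth_index(total_depth=12, rng=rng)
second_depth_index = sample_depth_index(total_depth=12, rng=rng)
block_mask = sample_block_mask(num_blocks=12, survival_prob=0.8, rng=rng)
```

The first function fixes a single contiguous depth per training iteration; the second decides independently for each block whether it is active in a given mini-batch.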
Regarding claim 12, the claim is similar to claim 3 and is rejected under the same rationales.

Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Teerapittayanon, et al., “BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks” (“Teerapittayanon”) in view of Yu, et al., Non-Patent Literature “Slimmable Neural Networks” (“Yu”).
Regarding claim 4, Teerapittayanon teaches the method of claim 1. While Teerapittayanon teaches a system that determines two subnetworks within a base neural network, Teerapittayanon does not explicitly teach wherein the method further comprises: during the first training iteration: determining a first width index for the first continuous sequence of intermediate layers indicative a first partial width of the first continuous sequence of intermediate layers to be trained during the first training iteration, and wherein the training the first sub-network includes: training only the first partial width of the first continuous sequence of intermediate layers based on the training data, and wherein: during the second training iteration: determining a second width index for the second continuous sequence of intermediate layers indicative a second partial width of the second continuous sequence of intermediate layers to be trained during the second training iteration, the second width index being different from the first width index, the training the second sub-network includes: training only the second partial width of the second continuous sequence of intermediate layers based on the training data.
	Yu teaches:
wherein the method further comprises: during the first training iteration: determining a first width index for the first continuous sequence of intermediate layers indicative a first partial width of the first continuous sequence of intermediate layers to be trained during the first training iteration, (Yu, pg. 2, “The parameters of all model variants are shared and the active channels in different layers can be adjusted. For brevity, we denote a model variant in a slimmable network as a switch [wherein the method further comprises: during the first training iteration: determining a first width index for the first continuous sequence of intermediate layers indicative a first partial width of the first continuous sequence of intermediate layers to be trained], the number of active channels in a switch as its width. 0.25× represents that the width in all layers are scaled by 0.25 of the full model.”, and Yu, pg. 4, “Motivated by the investigations above, we present a simple and highly effective approach, named Switchable Batch Normalization (S-BN), that employs independent batch normalization (Ioffe & Szegedy, 2015) for different switches in a slimmable network [to be trained during the first training iteration,].”, and Yu, pg. 4, “The switchable width list [width index] is predefined, indicating the available switches in a slimmable network.”).
and wherein the training the first sub-network includes: training only the first partial width of the first continuous sequence of intermediate layers based on the training data, (Yu, pg. 4, “Motivated by the investigations above, we present a simple and highly effective approach, named Switchable Batch Normalization (S-BN), that employs independent batch normalization (Ioffe & Szegedy, 2015) for different switches in a slimmable network [and wherein the training the first sub-network includes: training only the first partial width of the first continuous sequence of intermediate layers based on the training data,].”).
and wherein: during the second training iteration: determining a second width index for the second continuous sequence of intermediate layers indicative a second partial width of the second continuous sequence of intermediate layers to be trained during the second training iteration, the second width index being different from the first width index, (Yu, pg. 2, “The parameters of all model variants are shared and the active channels in different layers can be adjusted. For brevity, we denote a model variant in a slimmable network as a switch [and wherein: during the second training iteration: determining a second width index for the second continuous sequence of intermediate layers indicative a second partial width of the second continuous sequence of intermediate layers], the number of active channels in a switch as its width. 0.25× represents that the width in all layers are scaled by 0.25 of the full model.”, and Yu, pg. 4, “Motivated by the investigations above, we present a simple and highly effective approach, named Switchable Batch Normalization (S-BN), that employs independent batch normalization (Ioffe & Szegedy, 2015) for different switches [the second width index being different from the first width index,] in a slimmable network [to be trained during the second training iteration,].”, and Yu, pg. 4, “The switchable width list [width index] is predefined, indicating the available switches in a slimmable network.”).
the training the second sub-network includes: training only the second partial width of the second continuous sequence of intermediate layers based on the training data. (Yu, pg. 4, “Motivated by the investigations above, we present a simple and highly effective approach, named Switchable Batch Normalization (S-BN), that employs independent batch normalization (Ioffe & Szegedy, 2015) for different switches in a slimmable network [the training the second sub-network includes: training only the second partial width of the second continuous sequence of intermediate layers based on the training data.].”).
Teerapittayanon and Yu are both in the same field of endeavor (i.e. machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Teerapittayanon and Yu to teach the above limitation(s). The motivation for doing so is that having adaptable widths allows models to be adapted for different resource budgets (cf. Yu, pg. 2, “The question remains: Given budgets of resources, how to instantly, adaptively and efficiently trade off between accuracy and latency for neural networks at runtime? In this work we introduce slimmable neural networks, a new class of networks executable at different widths, as a general solution to trade off between accuracy and latency on the fly.”).
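For illustration only, training a partial width of a layer, as discussed for claim 4, can be sketched as follows. This is a minimal sketch under stated assumptions, not Yu's implementation: the width list values, the choice of keeping the first k output channels, the plain gradient step, and the names `active_channels` and `update_partial_width` are all hypothetical.

```python
from typing import List

# Hypothetical predefined switchable width list (fractions of the full width).
WIDTH_LIST = [0.25, 0.5, 0.75, 1.0]


def active_channels(total_channels: int, width: float) -> int:
    """Number of channels kept at a given width fraction (at least one)."""
    return max(1, int(total_channels * width))


def update_partial_width(
    weights: List[List[float]],
    grads: List[List[float]],
    width: float,
    lr: float = 0.1,
) -> int:
    """Apply a gradient step to the first k output channels only; other rows are untouched."""
    k = active_channels(len(weights), width)
    for row, grad_row in zip(weights[:k], grads[:k]):
        for j, g in enumerate(grad_row):
            row[j] -= lr * g
    return k


# Toy layer with four output channels and two inputs per channel.
weights = [[1.0, 1.0] for _ in range(4)]
grads = [[1.0, 1.0] for _ in range(4)]
trained = update_partial_width(weights, grads, width=WIDTH_LIST[1])  # 0.5x width
```

In the sketch, a different width value in a later iteration would update a different number of channels of the same shared weights.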
Regarding claim 13, the claim is similar to claim 4 and is rejected under the same rationales.

Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Teerapittayanon, et al., “BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks” (“Teerapittayanon”) in view of Yu, et al., Non-Patent Literature “Slimmable Neural Networks” (“Yu”) and further in view of Han, et al., Foreign Patent Publication CN112418392A (“Han”). Please refer to the provided translated copy of Han for the claim mapping.
Regarding claim 5, Teerapittayanon in view of Yu teaches the method of claim 4. Yu further teaches the interval of width indexes having been pre-determined based on a width of the plurality of intermediate layers of the NN. (Yu, pg. 4, “The switchable width list is predefined, indicating the available switches in a slimmable network [the interval of width indexes having been pre-determined based on a width of the plurality of intermediate layers of the NN.].”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yu with the teachings of Teerapittayanon for the same reasons given for claim 4.
While the combination teaches a system that determines two subnetworks within a base neural network with a width index, the combination does not explicitly teach wherein the determining the first width index includes randomly determining the first width index from an interval of width indexes, and the determining the second width index includes randomly determining the second width index from the interval of width indexes.
	Han teaches wherein the determining the first width index includes randomly determining the first width index from an interval of width indexes, and the determining the second width index includes randomly determining the second width index from the interval of width indexes, (Han, pg. 15, “The sampling mode in the search space may be random sampling, or sampling according to distribution, which may be specifically adjusted according to the actual application scenario, and this application does not limit this…For example, in the case of sampling the width, the sampling probability may be determined from the distribution of the width, and the sampling probability may be larger for a wide range in which a large number of distributions are distributed, and may be smaller for a wide range in which a small number of distributions are distributed [wherein the determining the first width index includes randomly determining the first width index from an interval of width indexes, and the determining the second width index includes randomly determining the second width index from the interval of width indexes,].”).
Teerapittayanon, in view of Yu, and Han are all in the same field of endeavor (i.e. machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Teerapittayanon, in view of Yu, and Han to teach the above limitation(s). The motivation for doing so is that using random sampling removes bias in searching for model parameters (cf. Han, see pg. 15).
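For illustration only, the two sampling modes quoted from Han (random sampling, or sampling according to a distribution) can be sketched for a width index drawn from a predefined interval. This is a minimal sketch under stated assumptions, not code from Han, Yu, or the application: the function name `sample_width_index`, the example interval, and the example probabilities are hypothetical.

```python
import random
from typing import Optional, Sequence


def sample_width_index(
    width_interval: Sequence[float],
    rng: random.Random,
    probabilities: Optional[Sequence[float]] = None,
) -> float:
    """Draw one width index: uniformly if no probabilities are given, else weighted."""
    if not width_interval:
        raise ValueError("width_interval must not be empty")
    if probabilities is None:
        return rng.choice(list(width_interval))
    if len(probabilities) != len(width_interval):
        raise ValueError("probabilities must match width_interval in length")
    return rng.choices(list(width_interval), weights=list(probabilities), k=1)[0]


width_interval = [0.25, 0.5, 0.75, 1.0]  # hypothetical predefined interval of width indexes
rng = random.Random(0)                   # fixed seed so the example is repeatable

first_width_index = sample_width_index(width_interval, rng)
second_width_index = sample_width_index(width_interval, rng, probabilities=[0.1, 0.2, 0.3, 0.4])
```

The weighted call gives widths with larger probabilities a proportionally higher chance of being drawn, which corresponds to the "sampling according to distribution" mode in the quoted passage.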
Regarding claim 14, the claim is similar to claim 5 and is rejected under the same rationales. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS S WU whose telephone number is (571)270-0939. The examiner can normally be reached Monday - Friday 8:00 am - 4:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold, can be reached at 571-431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/N.S.W./
Examiner, Art Unit 2148

/MICHELLE T BECHTOLD/
Supervisory Patent Examiner, Art Unit 2148
Prosecution Timeline

Aug 08, 2023
Application Filed
Mar 28, 2026
Non-Final Rejection — §101, §102, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/882,311
Patent 12488244
APPARATUS AND METHOD FOR DATA GENERATION FOR USER ENGAGEMENT
2y 5m to grant Granted Dec 02, 2025
17/444,687
Patent 12423576
METHOD AND APPARATUS FOR UPDATING PARAMETER OF MULTI-TASK MODEL, AND STORAGE MEDIUM
2y 5m to grant Granted Sep 23, 2025
17/265,476
Patent 12361280
METHOD AND DEVICE FOR TRAINING A MACHINE LEARNING ROUTINE FOR CONTROLLING A TECHNICAL SYSTEM
2y 5m to grant Granted Jul 15, 2025
17/191,518
Patent 12354017
ALIGNING KNOWLEDGE GRAPHS USING SUBGRAPH TYPING
2y 5m to grant Granted Jul 08, 2025
17/161,152
Patent 12333425
HYBRID GRAPH NEURAL NETWORK
2y 5m to grant Granted Jun 17, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 47%
With Interview (+43.1%): 90%
Median Time to Grant: 3y 9m
PTA Risk: Low
Based on 38 resolved cases by this examiner. Grant probability derived from career allow rate.