Prosecution Insights
Last updated: May 29, 2026
Application No. 18/343,173

REDUCING DATA COMMUNICATIONS IN DISTRIBUTED INFERENCE SCHEMES

Non-Final OA: §101, §102, §103, §112
Filed: Jun 28, 2023
Priority: Sep 14, 2022 (provisional 63/406,412)
Examiner: PHAM, JESSICA THUY
Art Unit: 2121
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Western Digital Technologies Inc.
OA Round: 1 (Non-Final)
Grant Probability: 17% (At Risk)
OA Rounds: 1-2
Est. Remaining: 1y 2m
Grant Probability With Interview: 17%

Examiner Intelligence

Grants only 17% of cases
Career Allowance Rate: 17% (1 granted / 6 resolved; -38.3% vs TC avg)
Interview Lift: +0.0% (minimal lift; based on resolved cases with interview)
Avg Prosecution: 4y 1m (typical timeline; 20 currently pending)
Total Applications: 43 (career history, across all art units)

Statute-Specific Performance

§103: 87.3% (+47.3% vs TC avg)
§102: 10.1% (-29.9% vs TC avg)
Based on career data from 6 resolved cases

Office Action

DETAILED ACTION Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . Status of Claims Claims 1-24 are pending and examined herein. Claims 2-10 are rejected under 35 U.S.C. 112(b). Claims 1-24 are rejected under 35 U.S.C. 101. Claims 21, 23, and 24 are rejected under 35 U.S.C. 102 . Claims 1-20 and 22 are rejected under 35 U.S.C. 103. Information Disclosure Statement The attached information disclosure statement(s) (IDS) filed on 9/11/2023 is/are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement(s) is/are being considered by the examiner. Claim Rejections - 35 USC § 112 The following is a quotation of 35 U.S.C. 112(b): (b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention. The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph: The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention. Claims 2-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. The preamble of claims 2-10 refer to the “node of claim 1.” However, claim 1 is directed to a system, specifically "A system for a distributed inferencing scheme". Therefore, the scope of claims 2-10 is unclear, as it is unclear if the components of the system are included. For purposes of examination, the claims will be treated as referring to the “system of claim 1”. Claim Rejections - 35 USC § 101 35 U.S.C. 
101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title. Claims 1-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1-24, in accordance with these steps, follows. Step 1 Analysis: Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter). Claims 1-10 are directed to a machine and claims 11-24 are directed to a process. All claims are directed to statutory categories. Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis: Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101. None of the claims represent an improvement to technology. Regarding claim 1, the following claim elements are abstract ideas: generate, for a second node of the plurality of nodes, a first sparsified input based on a set of features associated with the second node, wherein: (Generating a sparsified input based on features can be practically performed in the human mind, i.e. deciding which input values should be zero based on the features. 
This is a mental process.) the set of features associated with the second node are identified based on a weight mask having non-zero values associated with weights for features upon which processing by the second node depends and zeroed values associated with weights for features other than the features upon which processing by the second node depends, the weight mask comprises a mask having been generated based on calculated vector norms for non-diagonal rows in the weight mask, including a number of non-zero weights defined based on a number of output features, a kernel size, and a number of nodes defined for the neural network, and the set of features comprises a subset of features derived from the received input; (Identifying features based on a weight mask can be practically performed in the human mind. This is a mental process. Generating a weight mask by calculating vector norms for rows is performing mathematical calculations, which is a mathematical concept. Defining a number of weights based on a number of output features, a kernel size, and a number of nodes defined for the neural network can be practically performed in the human mind; this is a mental process.) combine the received input and the second sparsified input into a combined input; and (Combining data can be practically performed in the human mind. This is a mental process.) The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: A system for a distributed inferencing scheme, comprising: (This recites a generic system for a generic machine learning concept. This amounts to mere instructions to apply an exception.) a memory having executable instructions stored thereon; and (This recites generic computer components and processes. This amounts to mere instructions to apply an exception.) 
a processor configured to execute the executable instructions in order to cause to a first node in the distributed inferencing scheme to: (This recites generic computer components and processes for a generic machine learning concept. This amounts to mere instructions to apply an exception.) receive an input for processing by a neural network executing on a plurality of nodes participating in the distributed inference scheme; (Receiving data is a known process in computing, and this recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.) transmit the first sparsified input to the second node for generating an output of the second node; (Transmitting data is a known process in computing, and this recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.) receive a second sparsified input from the second node; (Receiving data is a known process in computing, and this recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.) process the combined input into an output of the first node; (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.) wherein the neural network is configured to generate an inference based on processing at least the output of the first node and the output of the second node and output the generated inference. (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception. Outputting data is the insignificant extra-solution activity of necessary data outputting. See MPEP § 2106.05(g), ‘Mere Data Gathering’, ex. iii.) Regarding claim 2, the rejection of claim 1 is incorporated herein. 
Further, the following is an abstract idea: wherein the set of features associated with the second node of the plurality of nodes is selected further based on a level of communication sparsity defined for the neural network. (Selecting features based on a level of communication sparsity can be practically performed in the human mind. This is a mental process.) Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, the following is an abstract idea: wherein kernels associated with the features upon which processing by the second node depends are associated with non-zero values in the two-dimensional matrix and kernels associated with features other than the features upon which processing by the second node depends are associated with zeroed values in the two-dimensional matrix. (Associating kernels with non-zero values or zeroed values based on the features can be practically performed in the human mind. This is a mental process.) The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein the weight mask comprises a two-dimensional matrix including information identifying weights for each kernel of a plurality of kernels in the neural network, (This is the insignificant extra-solution activity of selecting a particular data source or type of data to be manipulated. See MPEP § 2106.05(g).) Regarding claim 4, the rejection of claim 3 is incorporated herein. 
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein diagonal blocks in the two-dimensional matrix comprise non-zero values associated with features used by the neural network for inputs associated with the node. (This is the insignificant extra-solution activity of selecting a particular data source or type of data to be manipulated. See MPEP § 2106.05(g).) Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, the following is an abstract idea: wherein a number of features associated with the second node is based, at least in part, on a number of nodes participating in the distributed inference scheme. (Determining a number of features based on a number of nodes can be practically performed in the human mind. This is a mental process.) Regarding claim 6, the rejection of claim 1 is incorporated herein. Further, the following is an abstract idea: wherein the set of features associated with the second node of the plurality of nodes comprises features having a statistical norm that is less than a threshold value. (Determining the set of features using a threshold of a statistical norm is a mathematical calculation, which is a mathematical concept.) Regarding claim 7, the rejection of claim 1 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein the processor is further configured to cause the node to take one or more actions based on the generated inference. (This recites generic computer/machine learning components and processes. This amounts to mere instructions to apply an exception.) 
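For readers unfamiliar with the claim language analyzed above for claims 1, 3, and 4 (a two-dimensional weight mask whose non-zero entries mark the features a node depends on, with non-zero diagonal blocks for a node's own features), the following is a minimal Python sketch. It is an illustration only, not the applicant's implementation or the method of any cited reference. The block layout (rows grouped by consuming node, columns grouped by producing node), the helper names `features_to_send` and `sparsify`, and the example values are all assumptions.

```python
# Hypothetical layout (an assumption for illustration): mask[r][c] is non-zero when
# output feature r depends on input feature c. Rows are grouped by the node that
# computes them and columns by the node that produces them, so diagonal blocks hold
# each node's dependence on its own local features.
mask = [
    [1, 1, 0, 0],  # node 0, output 0: uses only node 0's features
    [1, 1, 0, 1],  # node 0, output 1: also needs node 1's feature 1
    [1, 0, 1, 1],  # node 1, output 0: also needs node 0's feature 0
    [0, 0, 1, 1],  # node 1, output 1: uses only node 1's features
]


def features_to_send(mask, sender, receiver, feats_per_node):
    """Return the sender's local feature indices that the receiver depends on."""
    rows = range(receiver * feats_per_node, (receiver + 1) * feats_per_node)
    cols = range(sender * feats_per_node, (sender + 1) * feats_per_node)
    return [
        c - sender * feats_per_node
        for c in cols
        if any(mask[r][c] != 0 for r in rows)
    ]


def sparsify(local_features, keep):
    """Keep only the selected features; everything else is simply not transmitted."""
    return {i: local_features[i] for i in keep}


# Node 0 prepares a sparsified input for node 1.
keep = features_to_send(mask, sender=0, receiver=1, feats_per_node=2)
sparsified_for_node_1 = sparsify([10.0, 20.0], keep)
```

In this toy layout, node 0 transmits only one of its two features to node 1, which is the communication reduction the claims describe.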
Regarding claim 8, the rejection of claim 1 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein the neural network comprises a convolutional neural network, and the input comprises a feature map representing data to be processed using the convolutional neural network. (This merely indicates the field of use of convolutional neural networks. This is a field of use limitation.) Regarding claim 9, the rejection of claim 1 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein the neural network comprises a transformer neural network, and the input comprises one or more neuron vectors representing data to be processed using the transformer neural network. (This merely indicates the field of use of convolutional neural networks. This is a field of use limitation.) Regarding claim 10, the rejection of claim 1 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein a first portion of the neural network is executed on the node, and wherein other portions of the neural network are executed on nodes of the plurality of nodes other than the node. (This recites the generic machine learning process of distributed machine learning, which amounts to mere instructions to apply an exception.) 
Regarding claim 11, the following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: A processor-implemented method by a node participating in a distributed inferencing scheme, comprising: (This recites generic computer/machine learning components and processes. This amounts to mere instructions to apply an exception.) The remainder of claim 11 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Claims 12-20 recite substantially similar subject matter to claims 2-10 respectively and are rejected with the same rationale, mutatis mutandis. Regarding claim 21, the following is an abstract idea: generating a respective weight mask matrix for each respective layer of the plurality of layers in the neural network based on a number of input features and a number of output features in the respective layer and calculated vector norms for non-diagonal rows in the respective weight mask matrix; and (Generating a weight mask matrix based on a number of features and vector norms can be practically performed in the human mind, i.e. deciding on which values should be zeros based on the number of features and calculated vector norms. This is a mental process.) The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: A processor-implemented method for training a neural network for a distributed inference scheme, comprising: (This recites generic computer/machine learning components and processes. This amounts to mere instructions to apply an exception.) 
training a neural network including a plurality of layers; (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.) deploying the neural network and the respective weight mask matrix for each respective layer of the plurality of layers. (This recites generic computer/machine learning components and processes. This amounts to mere instructions to apply an exception.) Regarding claim 22, the rejection of claim 21 is incorporated herein. Further, the following are abstract ideas: identifying the non-diagonal rows in the respective mask matrix as rows having a size based on a number of output features and a number of nodes in the neural network and corresponding to a set of weights in the neural network defined based on the number of output features, a kernel size, and the number of nodes; (Identifying rows having a size based on data can be practically performed in the human mind. This is a mental process.) calculating a sum for each non-diagonal row of the identified non-diagonal rows; and (Calculating a sum is a mathematical calculation, which is a mathematical concept.) setting values in the mask matrix to 0 for non-diagonal rows in the identified non-diagonal rows having calculated sums less than a threshold value. (Setting values in the mask matrix to zero can be practically performed in the human mind. This is a mental process.) Regarding claim 23, the rejection of claim 21 is incorporated herein. Further, the following is an abstract idea: wherein a number of features pruned by setting a corresponding element in the respective mask matrix to 0 is associated with a defined communication sparsity for the respective layer of the plurality of layers in the neural network. (Associating a number of features pruned based on a communication sparsity can be practically performed in the human mind. This is a mental process.) Regarding claim 24, the rejection of claim 21 is incorporated herein. 
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein the respective mask matrix identifies features transferred from a first layer in the neural network to a second layer in the neural network, and wherein the features identified in the respective mask matrix comprise a subset of candidate features transferrable between the first layer in the neural network and the second layer in the neural network. (This is the insignificant extra-solution activity of selecting a particular data source or type of data to be manipulated. See MPEP § 2106.05(g).) Claim Rejections - 35 USC § 102 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. Claim(s) 21, 23, and 24 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zou (“CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures”, July 2022). 
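The mask-generation steps recited in claims 21 and 22 (identify non-diagonal rows, calculate a sum for each, and zero the mask where the sum falls below a threshold) can be sketched as follows. This is a hedged illustration, not the claimed method or Zou's algorithm. It assumes that a "non-diagonal row" means the segment of a weight row that falls in an off-diagonal block, that the sum is taken over absolute weight values, and that features divide evenly among nodes. The function name `prune_off_diagonal` and the example weights are hypothetical.

```python
def prune_off_diagonal(weights, n_nodes, threshold):
    """Build a 0/1 mask, zeroing off-diagonal row segments with small weight sums.

    Assumption: rows (output features) and columns (input features) are split
    evenly across n_nodes, and diagonal blocks are always kept.
    """
    n_out, n_in = len(weights), len(weights[0])
    out_per, in_per = n_out // n_nodes, n_in // n_nodes
    mask = [[1] * n_in for _ in range(n_out)]
    for r in range(n_out):
        owner = r // out_per  # node that computes output feature r
        for src in range(n_nodes):
            if src == owner:
                continue  # diagonal block: local features, no communication
            cols = range(src * in_per, (src + 1) * in_per)
            segment_sum = sum(abs(weights[r][c]) for c in cols)
            if segment_sum < threshold:
                for c in cols:
                    mask[r][c] = 0  # node `owner` no longer needs these features
    return mask


# Hypothetical 4x4 weight matrix split across 2 nodes.
weights = [
    [0.90, 0.80, 0.01, 0.02],
    [0.70, 0.60, 0.50, 0.40],
    [0.03, 0.04, 0.90, 0.80],
    [0.30, 0.20, 0.70, 0.60],
]
mask = prune_off_diagonal(weights, n_nodes=2, threshold=0.1)
```

A different norm (for example, an L2 norm per segment) could be substituted for the absolute sum without changing the structure of the sketch.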
Regarding claim 21, Zou teaches A processor-implemented method for training a neural network for a distributed inference scheme, comprising: (Page 1633 lists the processor that implements the method, and states "Our experimental platform is a simulated embedded 16-core CMP with a mesh NoC. The cores can be general-purpose cores or specialized NNA cores [2], and the detailed configurations are listed in Table 2." Page 1630 states "Algorithm 1 presents the proposed SSW method. Namely, we first partition the weight matrix into $n^2$ groups, which is of the same number as the square of the number of cores $n$. Then, we use the distances between cores as a factor to influence the learning process by assigning different sparsity strength to the weights according to their involved communication cost (line 6).") training a neural network including a plurality of layers; (Page 1633 lists neural networks that are used to implement the method, including training, as explained in Algorithm 1 on page 1630.) generating a respective weight mask matrix for each respective layer of the plurality of layers in the neural network based on a number of input features and a number of output features in the respective layer, and calculated vector norms for non-diagonal rows in the respective weight mask matrix; and (Page 1630 states "the group Lasso regularization makes it possible that any specified group of weights in the whole network are more likely to be zero than weights in other locations. Then, the weight distribution is changed and re-structured at a group-level, and the regularization of group Lasso on a group of weights can be represented as $R_g(W) = \sum_{g=1}^{G} \lVert w^{(g)} \rVert_g$ (2), where $\lVert \cdot \rVert_g$ is the group Lasso, $w^{(g)}$ is a group of weights in $w$ and $G$ is the total number of groups. We formulate $\lVert \cdot \rVert_g$ as $\lVert w^{(g)} \rVert_g = \sqrt{\sum_{i=1}^{|w^{(g)}|} \left(w_i^{(g)}\right)^2}$, (3) where $|w^{(g)}|$ is the number of weights in $w^{(g)}$". One of ordinary skill in the art would recognize $\lVert w^{(g)} \rVert_g$ as an L-2 norm. 
Page 1631 states "We use the 16 x 16 distance matrix as the factor matrix to sparsify the weight groups during training, so that the groups of parameters can be sparsified according to the specified priority in the factor matrix. Consequently, the parameters that lead to high communication overhead will be pruned at the first. In comparison, the weights on the diagonal groups will not cause any communication. Therefore, we assign low sparsity strength to these groups to keep their value." Therefore, the groups are interpreted as the non-diagonal rows. Page 1631 states "Fig. 7b shows an example of the final grouped weights matrix obtained in our experiments (only shows the first four groups). Each line is mapped to a core which has two convolutional kernels sized as 2 x 2 x 32. The two kernels in one core is further divided into 16 groups. The number 1 represents that the value is not zero. For the first core, only the first group of weights is non-zero after the training, so that other cores do not need to send the results of the previous layer to this core." Therefore, the feature maps (features) sent are identified based on the weight matrix, shown in Fig. 7(b), interpreted as the weight mask. One of ordinary skill in the art would realize that training a neural network means training each layer of the neural network. One of ordinary skill in the art would reason that the number of weights in a convolutional layer depends on both the number of input features and number of output features, and thus, the size of the weight mask matrix would depend on the number of input features and number of output features.) deploying the neural network and the respective weight mask matrix for each respective layer of the plurality of layers. (Table 5 shows that the results of the neural network having been deployed using the sparsification method, CAP. Fig. 7b shows the final weight mask matrix used in the experiments, in which the neural network was deployed. 
Thus, the weight mask matrix has also been deployed. As performing the experiments requires the plurality of layers, each layer and mask has been deployed.) Regarding claim 23, the rejection of claim 21 is incorporated herein. Zou teaches wherein a number of features pruned by setting a corresponding element in the respective mask matrix to 0 is associated with a defined communication sparsity for the respective layer of the plurality of layers in the neural network. (Page 1632 states "Since communication overhead is related to inter-core distances in our case, we encourage our RL agent to meet the distance budget by limiting the action space. In detail, after the agent gives actions a m to all the subgroups of a layer, we will measure the sums of the transmission distances of all the cores. If the current policy exceeds our distance budget, the agent will first prune the subgroup with the longest total transmission distance which involves large communication overhead." The distance budget is interpreted as the level of communication sparsity defined for the respective layer, as this method is performed per layer.) Regarding claim 24, the rejection of claim 21 is incorporated herein. Zou teaches wherein the respective mask matrix identifies features transferred from a first layer in the neural network to a second layer in the neural network, and wherein the features identified in the respective mask matrix comprise a subset of candidate features transferrable between the first layer in the neural network and the second layer in the neural network. (Page 1638 states "Therefore, we evaluate CAP on top of a hybrid partitioning method, which will find the best partitioning type for each layer, by considering both the intra-layer and inter-layer communication overhead. We use VGG7 to evaluate the method, and the final partitions assigned for each layer are (II-II-II-II-I-I-I). 
It means that the first four layers use Partition II and the last three layers use Partition I." When the mask matrix for the weights contains zeroes, the corresponding feature map will not be transmitted, including through layers. As there are two different partitions, the feature maps must be transferred from a first layer to a second layer. As the mask matrix prunes the feature maps, the mask matrix contains a subset of candidate features transferrable.) Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. 
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. Claim(s) 1-5, 7-8, 10-15, 17-18, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zou (“CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures”, July 2022) and Guan (“DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search”, April 2022). Regarding claim 1, Zou teaches A system for a distributed inferencing scheme, comprising: (The abstract states "To remedy this problem and further improve the performance of network inference, in this work, we introduce a communication-aware DNN parallelization technique called CAP, by exploiting the elasticity and noise-tolerance of deep learning algorithms on CMP. Moreover, in the hope that the conducted studies can provide new design values for real-time neural network inference on embedded chips, we also have evaluated the proposed approach on both multi-core Neural Network Accelerators (NNA) chips and general-purpose chip-multiprocessors. 
Our experimental results show that the proposed CAP can achieve 1.12x-1.65x system speedups and 1.14x-2.70x energy efficiency for different neural networks while maintaining the inference accuracy, compared to baseline approaches.") a memory having executable instructions stored thereon; and (Page 1633 states "Our experimental platform is a simulated embedded 16-core CMP with a mesh NoC. The cores can be general-purpose cores or specialized NNA cores [2], and the detailed configurations are listed in Table 2." Table 2 states "Main Memory 1 channel, 1 rank, LPDDR3, 1GB, 4-bank." In order to perform the experiments, the memory must have executable instructions stored thereon.) a processor configured to execute the executable instructions in order to cause to a first node in the distributed inferencing scheme to: (Page 1633 states "Our experimental platform is a simulated embedded 16-core CMP with a mesh NoC." The CMP is interpreted as the processor. The cores are interpreted as the nodes, and the first core is interpreted as the first node.) receive an input for processing by a neural network executing on a plurality of nodes participating in the distributed inference scheme; (Fig. 7 shows the input feature maps to each core, which must have been received. Page 1630 states "From this basis, we are aiming to utilizes the neural network sparsification technique to obtain the desired models without influencing their functionality. As depicted in Fig. 6, since the convolutional kernel size (Ni) is consistent with the number of the ifmaps, and each feature map is convolved by the corresponding part of the kernel, when a specific part of parameters in a kernel are sparsified to be all zero (white blanks) in training, the results of the according ofmap will be zero as well no matter what the pixel values of the ifmap are (shown in gray)." Therefore, the inputs are processed by a neural network.) 
generate, for a second node of the plurality of nodes, a first sparsified input based on a set of features associated with the second node, wherein: (Page 1631 states "Specifically, the SSA is a fine-grained structured sparsification method focusing on feature maps inside a group of SSW weight matrix. For example, if a core generates four output feature maps, unlike SSW sending all the feature maps to the cores which need them, SSA further explores the optimization opportunity on deciding how many and which feature maps to be transmitted. In this case, the cost of inter-core communication can be further reduced, compared to SSW. To this end, the main challenge is to determine the significance of feature maps and then sparse the insignificant ones according to the structured sparsified weight groups after SSW." The sparsified feature maps are interpreted as the sparsified input, as the feature maps that are communicated will be input to a neural network as shown in Fig. 6. As the feature maps represent the presence or absence of a feature, the sparsification is based on a set of features. Page 1630 states "As depicted in Fig. 6, since the convolutional kernel size (Ni) is consistent with the number of the ifmaps, and each feature map is convolved by the corresponding part of the kernel, when a specific part of parameters in a kernel are sparsified to be all zero (white blanks) in training, the results of the according ofmap will be zero as well no matter what the pixel values of the ifmap are (shown in gray). For example, to obtain the input data for Conv2, the cores need to communicate with each other to get the output results from Conv1. However, if the weight kernels of Conv2 are sparsified, then the feature maps to be transmitted will generate zero-values as outputs after convolutional operations. 
In this case, there is no need for other cores to send the responsible feature maps.” Therefore, when a feature (identified based on the corresponding part of the kernel) is irrelevant to the second core, it will not be sent (will be removed from the input data and therefore, the input data is sparsified). Conversely, the feature maps associated with the second node will be sent.) the set of features associated with the second node are identified based on a weight mask having non-zero values associated with weights for features upon which processing by the second node depends and zeroed values associated with weights for features other than the features upon which processing by the second node depends, (Page 1631 states "Fig. 7b shows an example of the final grouped weights matrix obtained in our experiments (only shows the first four groups). Each line is mapped to a core which has two convolutional kernels sized as 2 x 2 x 32. The two kernels in one core is further divided into 16 groups. The number 1 represents that the value is not zero. For the first core, only the first group of weights is non-zero after the training, so that other cores do not need to send the results of the previous layer to this core." Therefore, the feature maps (features) sent are identified based on the weight matrix, shown in Fig. 7(b), interpreted as the weight mask. As explained above, the feature maps correspond to specific kernels, meaning that the non-zero weights of the kernel identify features on which the second node depends.) the weight mask comprises a mask having been generated based on calculated vector norms for non-diagonal rows in the weight mask, (Page 1630 states "the group Lasso regularization makes it possible that any specified group of weights in the whole network are more likely to be zero than weights in other locations. 
Then, the weight distribution is changed and re-structured at a group-level, and the regularization of group Lasso on a group of weights can be represented as $R_g(W) = \sum_{g=1}^{G} \|w^{(g)}\|_g$ (2), where $\|\cdot\|_g$ is the group Lasso, $w^{(g)}$ is a group of weights in $w$ and $G$ is the total number of groups. We formulate $\|\cdot\|_g$ as $\|w^{(g)}\|_g = \sqrt{\sum_{i=1}^{|w^{(g)}|} (w_i^{(g)})^2}$, (3) where $|w^{(g)}|$ is the number of weights in $w^{(g)}$". One of ordinary skill in the art would recognize $\|w^{(g)}\|_g$ as an L-2 norm. Page 1631 states "We use the 16 x 16 distance matrix as the factor matrix to sparsify the weight groups during training, so that the groups of parameters can be sparsified according to the specified priority in the factor matrix. Consequently, the parameters that lead to high communication overhead will be pruned at the first. In comparison, the weights on the diagonal groups will not cause any communication. Therefore, we assign low sparsity strength to these groups to keep their value." Therefore, the groups are interpreted as the non-diagonal rows.) including a number of non-zero weights defined based … , and a number of nodes defined for the neural network, and the set of features comprises a subset of features derived from the received input; (Page 1632 states "For the action, the DDPG agent selects the sparsity status of each individual subgroup at each step, in which 0 and 1 are used to indicate the subgroup is removed or retained respectively." As the sparsity status of the subgroup is equivalent to setting the subgroup of weights to 0 or non-zero, the number of non-zero weights is defined based on the sparsity status selected by the agent. Page 1632 further states "In detail, after the agent gives actions $\{a_m\}$ to all the subgroups of a layer, we will measure the sums of the transmission distances of all the cores. If the current policy exceeds our distance budget, the agent will first prune the subgroup with the longest total transmission distance which involves large communication overhead." 
Therefore, as the number of cores (nodes) is related to the sum of the transmission distances (more cores leads to higher transmission distances), the selection of the agent and thus the number of non-zero weights is based on a number of nodes for the neural network.) transmit the first sparsified input to the second node for generating an output of the second node; (Page 1630 states "For example, to obtain the input data for Conv2, the cores need to communicate with each other to get the output results from Conv1. However, if the weight kernels of Conv2 are sparsified, then the feature maps to be transmitted will generate zero-values as outputs after convolutional operations. In this case, there is no need for other cores to send the responsible feature maps. It should be noted that Fig. 6 just presents the core concept of sparsification in parallel inference scenario, and how many kernels can be sparsified actually depends on the final model convergence in training." Therefore, the sparsified feature maps from the first node will be transferred to the second node when the corresponding kernel is not pruned. Fig. 6 shows that each core generates an output from the input feature maps. Therefore, when the input feature maps include the feature maps transmitted from the first node, the second node will generate an output using the sparsified input.) receive a second sparsified input from the second node; (Page 1630 states "For example, to obtain the input data for Conv2, the cores need to communicate with each other to get the output results from Conv1. However, if the weight kernels of Conv2 are sparsified, then the feature maps to be transmitted will generate zero-values as outputs after convolutional operations. In this case, there is no need for other cores to send the responsible feature maps. It should be noted that Fig. 
6 just presents the core concept of sparsification in parallel inference scenario, and how many kernels can be sparsified actually depends on the final model convergence in training." Therefore, the sparsified feature maps from the second node will be transferred to the first node when the corresponding kernel is not pruned.) combine the received input and the second sparsified input into a combined input; and (Fig. 4 shows an example where the feature maps are communicated, which would occur when the corresponding kernel is not pruned. The received input and the second input (which would be sparsified when using the method), would be combined into the input feature maps.) process the combined input into an output of the first node; (Fig. 4 shows the combined feature maps processed into output feature maps.) wherein the neural network is configured to generate an inference based on processing at least the output of the first node and the output of the second node and output the generated inference. (Page 1633 states "As shown in Table 3, to evaluate the effectiveness of SLP, we have used several variants of ConvNet by varying the kernel numbers of the convolutional layers on ImageNet10 (images containing ten object classes of ILSVRC 2012). Moreover, as listed in Table 4, we have chosen several representative neural networks, including MLP and LeNet [13] on MNIST, ConvNet [21] and VGG7 [22] on Cifar10, and AlexNet [11], VGG16 [23], MobileNetv2 [24] and ResNet50 [25] on ImageNet, as the benchmarking nets for CAP. The inference performances of networks that are parallelized with SFP manner are used as the baselines for comparison." One of ordinary skill in the art would realize that the output of parallelized nodes would be combined to create the output inference. Fig. 2 supports this, as the output feature maps are pooled to generate the prediction.) 
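For illustration only, the mapping above (a binary group mask deciding which feature maps one core sends to another) can be sketched in a few lines of Python. This sketch is not taken from Zou or from the application: the `mask` layout (one row per receiving core, one column per producing core), the function names `groups_to_send` and `sparsified_input`, and the sample values are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical binary group mask: mask[r][s] == 1 means core r still has
# non-zero weights for the feature-map group produced by core s.
Mask = List[List[int]]


def groups_to_send(mask: Mask, sender: int) -> Dict[int, bool]:
    """For each other core, report whether `sender` must transmit its feature maps."""
    return {
        receiver: bool(row[sender])
        for receiver, row in enumerate(mask)
        if receiver != sender  # diagonal entries are local and need no transfer
    }


def sparsified_input(mask: Mask, receiver: int,
                     outputs: List[List[float]]) -> Dict[int, List[float]]:
    """Collect only the feature-map groups that `receiver` depends on."""
    return {
        sender: outputs[sender]
        for sender, keep in enumerate(mask[receiver])
        if keep
    }


# Made-up 4-core example; row 0 mirrors the case where a core depends
# only on its own (diagonal) group.
mask = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
outputs = [[0.5, 1.0], [2.0, 0.0], [3.5, 1.5], [4.0, 2.5]]

sends_from_core0 = groups_to_send(mask, sender=0)
input_for_core1 = sparsified_input(mask, receiver=1, outputs=outputs)

# Count off-diagonal non-zero entries, i.e. inter-core transfers that remain.
transfers = sum(
    row[s] for r, row in enumerate(mask) for s in range(len(row)) if r != s
)
```

With this made-up mask, core 0 sends its feature maps only to core 1, and 3 inter-core transfers remain instead of the 12 that a fully dense 4-core layer would need.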
Zou does not appear to explicitly teach [a number of non-zero weights defined based on] a number of output features, a kernel size, However, Guan, directed to analogous art, teaches [a number of non-zero weights defined based on] a number of output features, a kernel size, (Page 9851 states "The value of the annealing-relaxed channel indicator $H_T(a_l^i)$ can be viewed as the probability of preserving the corresponding channel in the final pruned model." Equation 9 presents the equation for calculating FLOPs, which includes $p_l$, where $p_l = h_l \times w_l \times k_l^2$, $k_l$ denotes the kernel size, and $h_l$ and $w_l$ denote the spatial size of the output feature maps. Equation 10 shows the regularizer for finding the optimal pruned model, which includes the equation for calculating the FLOPs of the model. Therefore, as the regularizer determines the channels that are pruned, equivalent to setting their weights to zero, the number of non-zero weights is defined based on the number of output features (size of output feature maps) and kernel size.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Zou and Guan because, as stated by Guan on page 9850, "The vanilla cross-entropy loss itself is infeasible to induce a priori structural restrictions, e.g., the number of FLOPs, which play a critical role in pruning. Therefore, we introduce three regularizers into the search procedure when updating the auxiliary parameters." Regarding claim 2, the rejection of claim 1 is incorporated herein. Zou teaches wherein the set of features associated with the second node of the plurality of nodes is selected further based on a level of communication sparsity defined for the neural network. (Page 1632 states "Since communication overhead is related to inter-core distances in our case, we encourage our RL agent to meet the distance budget by limiting the action space. 
In detail, after the agent gives actions $\{a_m\}$ to all the subgroups of a layer, we will measure the sums of the transmission distances of all the cores. If the current policy exceeds our distance budget, the agent will first prune the subgroup with the longest total transmission distance which involves large communication overhead." The distance budget is interpreted as the level of communication sparsity defined for the neural network.) Regarding claim 3, the rejection of claim 1 is incorporated herein. Zou teaches wherein the weight mask comprises a two-dimensional matrix including information identifying weights for each kernel of a plurality of kernels in the neural network, (Page 1631 states "Fig. 7b shows an example of the final grouped weights matrix obtained in our experiments (only shows the first four groups). Each line is mapped to a core which has two convolutional kernels sized as 2 x 2 x 32. The two kernels in one core is further divided into 16 groups." The weights matrix in Fig. 7b is two-dimensional. The matrix identifies weights for each convolutional kernel in a core.) wherein kernels associated with the features upon which processing by the second node depends are associated with non-zero values in the two-dimensional matrix and kernels associated with features other than the features upon which processing by the second node depends are associated with zeroed values in the two-dimensional matrix. (Page 1631 states "Fig. 7b shows an example of the final grouped weights matrix obtained in our experiments (only shows the first four groups). Each line is mapped to a core which has two convolutional kernels sized as 2 x 2 x 32. The two kernels in one core is further divided into 16 groups. The number 1 represents that the value is not zero. For the first core, only the first group of weights is non-zero after the training, so that other cores do not need to send the results of the previous layer to this core." 
Therefore, kernels with zero values mean that the feature map does not need to be sent to the second node; in other words, when the kernel value is zero, the second node does not depend on the associated features. Conversely, when the kernel value is non-zero, the feature map does need to be sent, and the node depends on the associated features.) Regarding claim 4, the rejection of claim 3 is incorporated herein. Zou teaches wherein diagonal blocks in the two-dimensional matrix comprise non-zero values associated with features used by the neural network for inputs associated with the node. (Page 1631 states "In comparison, the weights on the diagonal groups will not cause any communication. Therefore, we assign low sparsity strength to these groups to keep their value." Therefore, as the weights are not sparsified on the diagonal, the matrix would contain non-zero values and depend on the associated features.) Regarding claim 5, the rejection of claim 1 is incorporated herein. Zou teaches wherein a number of features associated with the second node is based, at least in part, on a number of nodes participating in the distributed inference scheme. (Page 1632 states "For the action, the DDPG agent selects the sparsity status of each individual subgroup at each step, in which 0 and 1 are used to indicate the subgroup is removed or retained respectively." As the sparsity status of the subgroup is equivalent to setting the subgroup of weights to 0 or non-zero, the number of non-zero weights is defined based on the sparsity status selected by the agent. Page 1632 further states "In detail, after the agent gives actions $\{a_m\}$ to all the subgroups of a layer, we will measure the sums of the transmission distances of all the cores. If the current policy exceeds our distance budget, the agent will first prune the subgroup with the longest total transmission distance which involves large communication overhead." 
Therefore, as the number of cores (nodes) is related to the sum of the transmission distances (more cores lead to higher transmission distances), the selection of the agent and thus the number of non-zero weights is based on a number of nodes for the neural network.) Regarding claim 7, the rejection of claim 1 is incorporated herein. Zou teaches wherein the processor is further configured to cause the node to take one or more actions based on the generated inference. (Page 1628 states "When the computation of the next layer is invoked, each core will have to get the feature maps from other cores in the previous layer to start its computation. In this case, every core has to broadcast its ofmaps to other cores through the NoC and also receive feature maps from others to synchronize the data." The ofmaps (output feature maps) are interpreted as the generated inference. As they are broadcast by each node, an action occurs based on the generated inference.) Regarding claim 8, the rejection of claim 1 is incorporated herein. Zou teaches wherein the neural network comprises a convolutional neural network, and the input comprises a feature map representing data to be processed using the convolutional neural network. (Fig. 6 on page 1630 shows the CNN (convolutional neural network) with input feature maps, which is data processed using the CNN.) Regarding claim 10, the rejection of claim 1 is incorporated herein. Zou teaches wherein a first portion of the neural network is executed on the node, and wherein other portions of the neural network are executed on nodes of the plurality of nodes other than the node. (Page 1636 states "As illustrated in Fig. 12, basically, a CNN layer can be partitioned in three different ways as in most prior works [28], [29]. Our method is based on the Partition I there, which replicates the ifmaps on different cores and divides the weights." 
Therefore, as the input feature maps are executed on different nodes, a first portion is executed on one node and other portions are executed on other nodes.) Regarding claim 11, Zou teaches A processor-implemented method by a node participating in a distributed inferencing scheme, comprising: (Page 1633 lists the processor that implements the method, and states "Our experimental platform is a simulated embedded 16-core CMP with a mesh NoC. The cores can be general-purpose cores or specialized NNA cores [2], and the detailed configurations are listed in Table 2." The first core is interpreted as the node, which participates in a distributed inferencing scheme as shown in Fig. 1, page 1627.) The remainder of claim 11 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Claims 12-15, 17-18, and 20 recite substantially similar subject matter to claims 2-5, 7-8, and 10 respectively, and are rejected with the same rationale, mutatis mutandis. Claim(s) 6 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zou (“CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures”, July 2022) and Guan (“DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search”, April 2022) as applied to claim 1 above, and further in view of Li (“Pruning Filters for Efficient ConvNets”, March 2017). Regarding claim 6, the rejection of claim 1 is incorporated herein. Zou teaches wherein the set of features associated with the second node of the plurality of nodes comprises features (See rejection of claim 1.) The combination of Zou and Guan does not appear to explicitly teach [wherein the pruned features comprise features] having a statistical norm that is less than a threshold value. However, Li, directed to analogous art, teaches [wherein the pruned features comprise features] having a statistical norm that is less than a threshold value. 
(Page 3 states "As shown in Figure 1, when a filter $F_{i,j}$ is pruned, its corresponding feature map $x_{i+1,j}$ is removed." Therefore, pruning filters is pruning the associated features. Page 3 states "We measure the relative importance of a filter in each layer by calculating the sum of its absolute weights $\sum |F_{i,j}|$, i.e., its $\ell_1$-norm $\|F_{i,j}\|_1$." Page 3 states "3. Prune m filters with the smallest sum values and their corresponding feature maps. The kernels in the next convolutional layer corresponding to the pruned feature maps are also removed." Therefore, as the m smallest sum values are removed, the threshold value is the (m+1)-th smallest value.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Zou and Guan with the teachings of Li because, as stated by Li on page 3, "Similar to the above work, we use $\ell_1$-norm to select unimportant filters and physically prune them. Our fine-tuning process is the same as the conventional training procedure, without introducing additional regularization. Our approach does not introduce extra layer-wise meta-parameters for the regularizer except for the percentage of filters to be pruned, which is directly related to the desired speedup." Claim 16 recites substantially similar subject matter to claim 6 and is rejected with the same rationale, mutatis mutandis. Claim(s) 9 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zou (“CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures”, July 2022) and Guan (“DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search”, April 2022) as applied to claim 1 above, and further in view of Guo (“CMT: Convolutional Neural Networks Meet Vision Transformers”, June 2022). Regarding claim 9, the rejection of claim 1 is incorporated herein. 
The combination of Zou and Guan does not appear to explicitly teach wherein the neural network comprises a transformer neural network, and the input comprises one or more neuron vectors representing data to be processed using the transformer neural network. However, Guo, directed to analogous art, teaches wherein the neural network comprises a transformer neural network, and the input comprises one or more neuron vectors representing data to be processed using the transformer neural network. (Page 2 states "In this paper, we demonstrate the potential of combining the transformer based network together with convolutional layer, the overall architecture follows the elaborated prior convolutional neural networks such as ResNet [16] and EfficientNet [53]." Therefore, the neural network is both a convolutional neural network and a transformer. Page 4 states "In original self-attention module, the input $X \in \mathbb{R}^{n \times d}$ is linearly transformed into query $Q \in \mathbb{R}^{n \times d}$, key $K \in \mathbb{R}^{n \times d}$, and value $V \in \mathbb{R}^{n \times d}$, where $n = H \times W$ is the number of patches." Therefore, the input is one or more neuron vectors. Note that the input of the MHSA, after being processed by the Lightweight MHSA, is input to convolutional layers.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Zou and Guan with the teachings of Guo because, as Guo states on page 1, "Although there are many works successfully applying transformers for vision tasks, they have not shown satisfactory results compared to conventional CNNs, which are still the primary architectures for vision applications. Transformers are especially good at modeling long-range dependencies necessary for downstream vision tasks. However, locality should also be maintained for visual perception. 
In this paper, we demonstrate the potential of combining the transformer based network together with convolutional layer, the overall architecture follows the elaborated prior convolutional neural networks such as ResNet [16] and EfficientNet [53]." Note that the method of Zou is applied per layer, meaning one of ordinary skill in the art could parallelize the convolutional layers in the transformer neural network of Guo using the method of Zou. Claim 19 recites substantially similar subject matter to claim 9 and is rejected with the same rationale, mutatis mutandis. Claim(s) 22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zou (“CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures”, July 2022) as applied to claim 1 above, and further in view of Li (“Pruning Filters for Efficient ConvNets”, March 2017). Regarding claim 22, the rejection of claim 21 is incorporated herein. Zou teaches identifying the non-diagonal rows in the respective mask matrix as rows having a size based on a number of output features and a number of nodes in the neural network (As the size of the mask matrix is based on the number of output features, the size of the non-diagonal rows is constrained by that number, and the rows therefore have a size based on a number of output features. Page 1631 states "Each line is mapped to a core which has two convolutional kernels sized as 2 x 2 x 32. The two kernels in one core is further divided into 16 groups." As the lines are mapped to cores, the size of the weight matrix for each core and therefore the size of the non-diagonal rows in the mask matrix depends on the number of cores.) and corresponding to a set of weights in the neural network defined based on the number of output features, a kernel size, and the number of nodes; (One of ordinary skill in the art would recognize that the size of a weight matrix depends on the number of output features and the kernel size. 
As the lines in the weight matrix are assigned to different cores, the size of the weight matrix will depend on the number of nodes.) calculating a sum for each non-diagonal row of the identified non-diagonal rows; and (Equation 3 on page 1630 shows that the sum of each group of the group lasso (non-diagonal rows) is calculated.) setting values in the mask matrix to 0 for non-diagonal rows in the identified non-diagonal rows (When the weights are sparsified, their values in the mask matrix are set to zero as shown in Fig. 7b. As the sparsification is based on the group lasso, the mask matrix has values set to 0 for the identified (by group lasso) non-diagonal rows.) Zou does not appear to explicitly teach [pruning values] having calculated sums less than a threshold value. However, Li, directed to analogous art, teaches [pruning values] having calculated sums less than a threshold value. (Page 3 states "As shown in Figure 1, when a filter $F_{i,j}$ is pruned, its corresponding feature map $x_{i+1,j}$ is removed." Therefore, pruning filters is pruning the associated features. Page 3 states "We measure the relative importance of a filter in each layer by calculating the sum of its absolute weights $\sum |F_{i,j}|$, i.e., its $\ell_1$-norm $\|F_{i,j}\|_1$." Page 3 states "3. Prune m filters with the smallest sum values and their corresponding feature maps. The kernels in the next convolutional layer corresponding to the pruned feature maps are also removed." Therefore, as the m smallest sum values are removed, the threshold value is the (m+1)-th smallest value.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Zou with the teachings of Li because, as stated by Li on page 3, "Similar to the above work, we use $\ell_1$-norm to select unimportant filters and physically prune them. Our fine-tuning process is the same as the conventional training procedure, without introducing additional regularization. 
Our approach does not introduce extra layer-wise meta-parameters for the regularizer except for the percentage of filters to be pruned, which is directly related to the desired speedup." Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM whose telephone number is (571)272-2605. The examiner can normally be reached Monday - Friday, 9 A.M. - 5:00 P.M. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /J.T.P./Examiner, Art Unit 2121 /Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121

Prosecution Timeline

Jun 28, 2023: Application Filed
Apr 22, 2026: Non-Final Rejection mailed; §101, §102, §103, §112 (current)

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 17%
With Interview: 17% (+0.0%)
Median Time to Grant: 4y 1m (~1y 2m remaining)
PTA Risk: Low
Based on 6 resolved cases by this examiner. Grant probability derived from career allowance rate.
