DETAILED ACTION
This nonfinal action is in response to the amendment and remarks filed on 01/20/2026 for application 17/469,853.
Claims 1-20 remain pending in the application. Claims 1, 9, and 17 are independent claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/20/2026 has been entered.
Response to Amendment
The amendment filed 01/20/2026 has been entered.
Applicant’s amendment to the claims with respect to resolving claim objections has been considered, and the objections set forth in the office action mailed 10/20/2025 are consequently withdrawn.
Applicant’s amendment to the claims with respect to resolving rejections under 35 U.S.C. 112(a) has been considered, and the written description rejections set forth in the office action mailed 10/20/2025 are consequently withdrawn.
Claim Interpretation
As set forth in MPEP § 2111, during patent examination, “the pending claims must be given their broadest reasonable interpretation consistent with the specification”. Under the broadest reasonable interpretation (BRI), claim terms must be given their plain and ordinary meaning (i.e., the meaning that the term would have to a person of ordinary skill in the art), unless applicant sets forth a special definition of a claim term within the specification. The plain and ordinary meaning of a term “may be evidenced by a variety of sources, including the words of the claims themselves, the specification, drawings, and prior art”.
The claims recite “averaging channel outputs of intermediate nodes of a normal cell in a neural network architecture”, “selecting a maximum output from the intermediate nodes of the normal cell”, and “performing a weighted average of the channel outputs of the intermediate nodes of the normal cell”. The specification does not set forth explicit definitions of the terms “channel outputs”, “intermediate nodes”, or “normal cell”, or set forth an explicitly defined “averaging” procedure, “maximum” selection procedure, or “weighted averag[ing]” procedure.
As best understood under broadest reasonable interpretation in light of the specification [¶ 0003-0004], as well as typical usage of terminology in the prior art (see Kaushik, “Intuitive Explanation of Differentiable Architecture Search”), the terms “channel outputs”, “intermediate nodes” and “normal cell” are interpreted within the context of a typical differentiable architecture search (DARTS) architecture (i.e., type of neural network architecture), wherein feature maps of observed data are passed through cells of the model (as shown below – see Input Feature map and Output Feature map [Kaushik page 5]).
[Image: media_image1.png (DARTS cell showing Input Feature Map and Output Feature Map; Kaushik page 5)]
The term “normal cell” is thereby interpreted as encompassing a type of building block of the DARTS architecture that applies convolution and/or pooling operations to input feature maps, and the term “intermediate nodes” is interpreted as encompassing the one or more units within each normal cell that sequentially apply operations to the feature maps. The term “channel outputs” is interpreted as encompassing the feature maps (of dimension H x W x C in which H is the height dimension, W is the width dimension, and C is the channel dimension [¶ 0004]) that are passed through (i.e., output by) normal cells (including their corresponding intermediate nodes therein) of the DARTS architecture.
As best understood under broadest reasonable interpretation in light of the specification [¶ 0024, 0030], as well as typical usage of terminology in the prior art, the limitations “averaging channel outputs”, “selecting a maximum output”, and “performing a weighted average” are respectively interpreted as encompassing performance of average pooling, max pooling, and weighted average pooling as pooling operations on feature maps (i.e., channel outputs) within the DARTS architecture.
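For illustration of the interpretation above only (not part of the record), the three interpreted pooling operations can be sketched on hypothetical feature maps of dimension H x W x C; all array shapes and weight values below are assumptions for demonstration:

```python
import numpy as np

# Hypothetical channel outputs of three intermediate nodes, each a
# feature map of dimension H x W x C (height, width, channel).
H, W, C = 4, 4, 8
rng = np.random.default_rng(0)
node_outputs = [rng.standard_normal((H, W, C)) for _ in range(3)]
stacked = np.stack(node_outputs)                  # shape (3, H, W, C)

# "averaging channel outputs": elementwise average across node outputs
avg_out = stacked.mean(axis=0)

# "selecting a maximum output": elementwise maximum across node outputs
max_out = stacked.max(axis=0)

# "performing a weighted average": hypothetical per-node weights
weights = np.array([0.5, 0.3, 0.2])
wavg_out = np.tensordot(weights, stacked, axes=1)

# Each pooled result retains the H x W x C feature-map dimension.
assert avg_out.shape == max_out.shape == wavg_out.shape == (H, W, C)
```

Under this sketch, each operation collapses the set of node outputs into a single feature map of unchanged spatial and channel dimension, consistent with the pooling interpretation applied above.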
The claims further recite “forming an output node for a first layer of cells in a neural network architecture based on averaging the channel outputs of the intermediate nodes of the normal cell, the output node including a channel dimension that is one-fourth of a channel dimension of the normal cell”. Based on a review of typical terminology usage in the prior art (see Kaushik, “Intuitive Explanation of Differentiable Architecture Search”), the examiner notes that within a typical DARTS architecture, normal cells compute feature maps of equivalent dimension (H x W x C) to input feature maps, and it is “reduction cells”, not normal cells, that apply operations which would then reduce dimensions of input feature maps, as shown below [Kaushik pages 7-8]. The specification also does not explicitly set forth a special definition of a normal cell that would thereby perform said channel dimension reduction.
[Image: media_image2.png (DARTS normal cell; Kaushik pages 7-8)]
[Image: media_image3.png (DARTS reduction cell; Kaushik pages 7-8)]
The specification also does not describe “reduction cells”, or any other cells/modules besides the normal cell, in the DARTS architecture. The specification also does not set forth an explicit definition for what comprises an “output node” or “[first] layer of cells” in the DARTS architecture.
It is also typical within a DARTS architecture for output to be passed from cell to cell in a stacked architecture, e.g., wherein a cell receives, as input, the output from the last two cells (including their corresponding intermediate nodes therein). However, structural details regarding how data is passed between cells of the DARTS architecture are not explicitly discussed within the specification.
As best understood under broadest reasonable interpretation in light of the specification, as well as typical usage of terminology in the prior art, the term “layer of cells” is thereby interpreted as encompassing a combination of any one or more cells/modules (e.g., including normal cells and/or reduction cells) within the DARTS architecture, and the limitation “forming an output node” is interpreted as encompassing production, via the nodes of cells within a layer of cells, of an output feature map which is then passed as input to following cells in the DARTS architecture. As per the specification [¶ 0019], a “first” layer of cells is interpreted as merely referring to a particular combination of cells via a label, and does not imply any type of ordering (i.e., “first layer of cells” can be any combination of cells within the DARTS architecture, not necessarily the initial/input layer).
As best understood under broadest reasonable interpretation in light of the specification, as well as typical usage of terminology in the prior art, a “channel dimension of a normal cell” is interpreted as encompassing the channel dimension, at some point in time, of a feature map being passed through nodes of the normal cell, and the output node “including a channel dimension that is one-fourth of a channel dimension of the normal cell” is interpreted as a channel dimension reduction procedure (of one-fourth ratio) being performed at some point within the operations of the respective layer of cells that eventually forms an output feature map.
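For illustration of this interpretation only (not part of the record), one way such a one-fourth channel-dimension reduction could be realized, consistent with the grouping-and-averaging steps recited in claim 1, is to form groups of four channels and average within each group; the array shapes below are assumptions for demonstration:

```python
import numpy as np

# Hypothetical feature map passed through a normal cell, H x W x C.
H, W, C = 4, 4, 16
rng = np.random.default_rng(1)
feature_map = rng.standard_normal((H, W, C))

# Form groups of four consecutive channels, then average each group,
# yielding an output node whose channel dimension is one-fourth of C.
grouped = feature_map.reshape(H, W, C // 4, 4)
output_node = grouped.mean(axis=-1)

assert output_node.shape == (H, W, C // 4)   # channel dimension C/4
```

This is a sketch of one possible mechanism only; the claims and specification do not specify where in the layer's operations the reduction occurs, which is why the interpretation above reads the reduction as occurring "at some point" within the layer.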
The claims further recite “forming channels of input nodes for a second layer of cells”. The specification does not set forth explicit definitions of “channels” of “nodes”, particularly in relation to, e.g., “channel outputs” of “nodes”, or set forth an explicit definition of “input nodes”.
As best understood under broadest reasonable interpretation in light of the specification, as well as typical usage of terminology in the prior art, the limitation “forming channels of input nodes” is interpreted as encompassing further steps of production (e.g., preprocessing using a 1 x 1 convolution), via the nodes of cells within a layer of cells, of an output feature map (including its corresponding channels therein), before being passed as input to following cells (i.e., second layer, including its corresponding cells and nodes therein) in the DARTS architecture.
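For illustration of the interpreted preprocessing step only (not part of the record), a 1 x 1 (pointwise) convolution is equivalent to applying the same linear channel-mixing map at every spatial position; the shapes and weight values below are assumptions for demonstration:

```python
import numpy as np

# Hypothetical output node of a first layer, a feature map H x W x C_in.
H, W, C_in, C_out = 4, 4, 4, 8
rng = np.random.default_rng(2)
output_node = rng.standard_normal((H, W, C_in))

# A 1 x 1 convolution mixes channels per pixel: a C_in x C_out weight
# matrix applied independently at each of the H x W spatial positions.
kernel = rng.standard_normal((C_in, C_out))
input_channels = output_node @ kernel            # shape (H, W, C_out)

assert input_channels.shape == (H, W, C_out)
```

Under this sketch, the pointwise convolution changes only the channel dimension, producing the channels passed as input to the cells of the following (second) layer.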
The claims further recite “a number of output channels of the first layer” and “a number of input channels of the first layer”. The specification does not set forth an explicit definition of “input channels” or “output channels” of a “layer”.
As best understood under broadest reasonable interpretation in light of the specification, as well as typical usage of terminology in the prior art, the terms “input channels” and “output channels” are interpreted as encompassing feature maps (including the corresponding number of channels therein, at any given point in time) that are passed between (i.e., input to, or output by) the cells within layers of the DARTS architecture.
The claims further recite “a first predetermined number of intrinsic feature maps for the first layer”. The specification does not set forth an explicit definition of the term “intrinsic feature maps”, particularly in relation to, e.g., “channels” of “nodes”.
As best understood under broadest reasonable interpretation in light of the specification, as well as typical usage of terminology in the prior art, the term “intrinsic feature maps” is interpreted as encompassing feature map representations of the input data (i.e., intrinsic to the observed data itself) that are passed through the layers (and corresponding cells therein) of the DARTS architecture.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”).
Independent Claims (Claim 1, Claim 9, Claim 17):
Step 1: Claim 1 is drawn to a method, claim 9 is drawn to a method, and claim 17 is drawn to a method. Therefore, each of these claims falls under one of the four categories of statutory subject matter (process/method, machine/apparatus, manufacture/product, or composition of matter).
Step 2A Prong 1: Claims 1, 9, and 17 each recite a judicially recognized exception of an abstract idea.
Claim 1 recites, inter alia:
averaging channel outputs of nodes of a cell – Wherein channel outputs of nodes encompass feature maps, i.e., 3D tensors (height x width x channel) representing detected features of observed data (see Claim Interpretation above), this limitation further amounts to processing data via performing mathematical tensor operations (e.g., average pooling (averaging channel outputs)) to determine output values, and therefore recites mathematical calculations within an abstract mathematical procedure.
forming groups of the channel outputs; – This limitation amounts to merely organizing observed feature maps into groups to prepare the data for further processing, and therefore recites a process of evaluation that a human could reasonably perform in the mind or using pen and paper.
forming an average channel output for each of the groups of the channel outputs; – This limitation amounts to processing data via performing mathematical tensor operations (e.g., average pooling (forming an average channel output)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
generating an output node for a first layer of cells based on averaging the channel outputs of the nodes of the cell, the output node including a first channel dimension that is one-fourth of a second channel dimension of the cell; – This limitation further amounts to processing data via performing mathematical tensor operations, including average pooling (averaging the channel outputs) and dimensionality reduction (channel dimension that is one-fourth of a channel dimension), to determine output values (forming an output node), and therefore recites mathematical calculations within an abstract mathematical procedure.
forming channels of input nodes for a second layer of the cells by preprocessing the output node using a 1 x 1 convolution, – This limitation further amounts to processing data via performing mathematical tensor operations, including pointwise convolution (using a 1 x 1 convolution), to determine output values (forming channels of input nodes for a second layer), and therefore recites mathematical calculations within an abstract mathematical procedure.
Claim 9 recites substantially similar abstract idea limitations to those recited in claim 1 and further recites, inter alia:
selecting a maximum channel output from nodes of a cell; – Wherein channel outputs of nodes encompass feature maps, i.e., 3D tensors (height x width x channel) representing detected features of observed data (see Claim Interpretation above), this limitation further amounts to processing data via performing mathematical tensor operations (e.g., max pooling (selecting a maximum channel output)) to determine output values, and therefore recites mathematical calculations within an abstract mathematical procedure.
Claim 17 recites substantially similar abstract idea limitations to those recited in claim 1 and further recites, inter alia:
performing a weighted average of channel outputs of nodes of a cell – Wherein channel outputs of nodes encompass feature maps, i.e., 3D tensors (height x width x channel) representing detected features of observed data (see Claim Interpretation above), this limitation further amounts to processing data via performing mathematical tensor operations (e.g., weighted average pooling (performing a weighted average)) to determine output values, and therefore recites mathematical calculations within an abstract mathematical procedure.
Step 2A Prong 2: The following additional elements recited in claims 1, 9, and 17 do not integrate the recited judicial exceptions into a practical application.
Claim 1 additionally recites:
generating intermediate nodes of a normal cell in a neural network architecture, the intermediate nodes of the normal cell including channel inputs and channel outputs – This limitation invokes generic components (intermediate nodes, normal cell) of a differentiable architecture search (DARTS) space, wherein DARTS is a type of neural architecture search (NAS) (i.e., neural network architecture), as elements to be manipulated through the recited mathematical correlations (operations of channel inputs/outputs). It thereby does no more than generally link the recited abstract mathematical procedure to the technological environment of NAS architectures.
a first layer of cells in the neural network architecture; a second layer of cells in the neural network architecture, the second layer being immediately subsequent to the first layer – Wherein passing information across layers is an intrinsic property of, e.g., feedforward neural networks, this limitation thereby does no more than generally link the recited abstract mathematical procedure to the technological environment of feedforward neural architectures.
Claims 9 and 17 recite substantially similar additional elements to those found in claim 1, and therefore also do not integrate the recited judicial exceptions into a practical application.
Step 2B: The additional elements recited in claims 1, 9, and 17, viewed individually or as an ordered combination, do not provide an inventive concept or otherwise amount to significantly more than the recited abstract ideas themselves.
Claim 1 additionally recites:
generating intermediate nodes of a normal cell in a neural network architecture, the intermediate nodes of the normal cell including channel inputs and channel outputs – Generically invoking components (intermediate nodes, normal cell) of a differentiable architecture search (DARTS) space, and generally linking the recited abstract mathematical procedure to the technological environment of NAS architectures, does not provide an inventive concept or significantly more to the recited abstract idea. Further, applying DARTS techniques to varied NAS tasks is well-understood, routine, and conventional activity (see Ren et al., “A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions”, [pages 10-14]).
a first layer of cells in the neural network architecture; a second layer of cells in the neural network architecture, the second layer being immediately subsequent to the first layer – Generally linking the recited abstract mathematical procedure to the technological environment of feedforward neural architectures does not provide an inventive concept or significantly more to the recited abstract idea.
Claims 9 and 17 recite substantially similar additional elements to those found in claim 1, and therefore also do not provide an inventive concept or significantly more to the recited abstract idea.
Even when considered as an ordered combination, the additional elements recited in the claims ultimately do no more than place the claims in the context of generically applying an abstract mathematical procedure of processing data to a neural architecture search space (e.g., DARTS). As such, claims 1, 9, and 17 are not patent eligible.
Dependent Claims (Claims 2-8, Claims 10-16, Claims 18-20):
Dependent claims 2-8, 10-16, and 18-20 narrow the scope of independent claims 1, 9, and 17, and thus merely narrow the recited judicial exceptions. With respect to the independent claims, the recited judicial exceptions are not meaningfully integrated into a practical application, and also do not amount to significantly more than the recited abstract ideas themselves. The dependent claims recite abstract idea limitations similar to those recited within the independent claims, as they also do not provide anything more than mathematical concepts or mental processes that are capable of being performed in the human mind and/or using pen and paper. The dependent claims also do not recite any further additional elements that successfully integrate the recited judicial exceptions into a practical application or amount to significantly more than the recited abstract ideas themselves. Consequently, claims 2-8, 10-16, and 18-20 are also rejected under 35 U.S.C. 101.
Step 1: Claims 2-8 are drawn to a method, claims 10-16 are drawn to a method, and claims 18-20 are drawn to a method. Therefore, each of these claims falls under one of the four categories of statutory subject matter (process/method, machine/apparatus, manufacture/product, or composition of matter).
Step 2A Prong 1: Claims 2-8, 10-16, and 18-20 each recite a judicially recognized exception of an abstract idea.
Claim 2 recites, inter alia:
concatenating the average channel output for each of the groups of the channel outputs; – This limitation amounts to processing data via performing mathematical tensor operations (e.g., concatenation (concatenating the average channel output)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 3 recites, inter alia:
changing a number of output channels of the first layer with respect to a number of input channels of the first layer – Wherein parent claim 1 already recites processing data via an abstract mathematical procedure (see Step 2A Prong 1 analysis of claim 1 above), this limitation further amounts to organizing and manipulating data through established mathematical correlations, i.e., further manipulating an established mathematical relationship between input (number of input channels) and output (number of output channels) variables.
Claim 4 recites, inter alia:
increasing the number of output channels of the first layer with respect to the number of input channels of the first layer – Wherein parent claim 1 already recites processing data via an abstract mathematical procedure (see Step 2A Prong 1 analysis of claim 1 above), this limitation further amounts to organizing and manipulating data through established mathematical correlations, i.e., further manipulating an established relationship between input (number of input channels) and output (number of output channels) variables.
Claim 5 recites, inter alia:
selecting a maximum output from the channel outputs; – This limitation amounts to processing data via performing mathematical tensor operations (e.g., max pooling (selecting a maximum channel output)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 6 recites, inter alia:
performing a weighted average of the channel outputs; – This limitation amounts to processing data via performing mathematical tensor operations (e.g., weighted average pooling (performing a weighted average)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 7 recites, inter alia:
batch normalizing the output node – This limitation amounts to processing data via performing mathematical operations (e.g., normalization (batch normalizing)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 8 recites, inter alia:
using one or more linear transformation operators to generate a second predetermined number of correlated or redundant output nodes for the first layer – This limitation amounts to processing data via performing linear operations (using one or more linear transformation operators) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 10 recites, inter alia:
forming groups of the channel outputs; – This limitation amounts to merely organizing observed feature maps into groups to prepare the data for further processing, and therefore recites a process of evaluation that a human could reasonably perform in the mind or using pen and paper.
selecting a maximum output for each of the groups of the channel outputs; – This limitation amounts to processing data via performing mathematical tensor operations (e.g., max pooling (selecting a maximum channel output)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 11 recites, inter alia:
changing a number of output channels of the first layer with respect to a number of input channels of the first layer – Wherein parent claim 9 already recites processing data via an abstract mathematical procedure (see Step 2A Prong 1 analysis of claim 9 above), this limitation further amounts to organizing and manipulating data through established mathematical correlations, i.e., further manipulating an established mathematical relationship between input (number of input channels) and output (number of output channels) variables.
Claim 12 recites, inter alia:
increasing the number of output channels of the first layer with respect to the number of input channels of the first layer – Wherein parent claim 9 already recites processing data via an abstract mathematical procedure (see Step 2A Prong 1 analysis of claim 9 above), this limitation further amounts to organizing and manipulating data through established mathematical correlations, i.e., further manipulating an established mathematical relationship between input (number of input channels) and output (number of output channels) variables.
Claim 13 recites, inter alia:
averaging the channel outputs; – This limitation amounts to processing data via performing mathematical tensor operations (e.g., average pooling (forming an average channel output)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 14 recites, inter alia:
performing a weighted average of the channel outputs; – This limitation amounts to processing data via performing mathematical tensor operations (e.g., weighted average pooling (performing a weighted average)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 15 recites, inter alia:
batch normalizing the output node – This limitation amounts to processing data via performing mathematical operations (e.g., normalization (batch normalizing)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 16 recites, inter alia:
using one or more linear transformation operators to generate a second predetermined number of correlated or redundant output nodes for the first layer – This limitation further amounts to processing data via performing linear operations (using one or more linear transformation operators) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 18 recites, inter alia:
concatenating the weighted-average channel output for each of the groups of the channel outputs; – This limitation amounts to processing data via performing mathematical tensor operations (e.g., concatenation (concatenating the average channel output)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 19 recites, inter alia:
averaging the channel outputs; – This limitation amounts to processing data via performing mathematical tensor operations (e.g., average pooling (forming an average channel output)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Claim 20 recites, inter alia:
selecting a maximum output from the channel outputs – This limitation amounts to processing data via performing mathematical tensor operations (e.g., max pooling (selecting a maximum channel output)) to determine output values, and therefore further recites mathematical calculations within an abstract mathematical procedure.
Step 2A Prong 2: Claims 2-7, 10-15, and 18-20 do not recite any further additional elements besides those already recited in the independent claims, and the following additional elements further recited in claims 8 and 16 also do not integrate the recited judicial exceptions into a practical application.
Claim 8 additionally recites:
generating a first predetermined number of intrinsic feature maps for the first layer; – In parent claim 1, the received channel outputs of nodes are already interpreted as encompassing feature maps (see Claim Interpretation above); this limitation thereby amounts to no more than merely specifying steps with respect to the gathering of input data (i.e., feature maps being intrinsic to observed data) for further processing via the recited abstract mathematical procedure. It therefore recites insignificant extra-solution activity.
Claim 16 additionally recites:
generating a first predetermined number of intrinsic feature maps for the first layer; – In parent claim 9, the received channel outputs of nodes are already interpreted as encompassing feature maps (see Claim Interpretation above); this limitation thereby amounts to no more than merely specifying steps with respect to the gathering of input data (i.e., feature maps being intrinsic to observed data) for further processing via the recited abstract mathematical procedure. It therefore recites insignificant extra-solution activity.
Step 2B: The additional elements recited in claims 8 and 16, viewed individually or as an ordered combination, do not provide an inventive concept or otherwise amount to significantly more than the recited abstract ideas themselves.
Claim 8 additionally recites:
generating a first predetermined number of intrinsic feature maps for the first layer; – Receiving and transmitting data for further processing is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 16 additionally recites:
generating a first predetermined number of intrinsic feature maps for the first layer; – Receiving and transmitting data for further processing is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Even when considered as an ordered combination, the additional elements recited in the claims ultimately do no more than place the claims in the context of generically applying an abstract mathematical procedure for processing feature maps of observed data. As such, claims 2-8, 10-16, and 18-20 also are not patent eligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6, 9-14, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Nakai, as applied to claims 1, 9, and 17 above, in view of Wang et al. (“G-DARTS-A: Groups of Channel Parallel Sampling with Attention”, made available on arXiv 10/16/2020), hereinafter Wang.
Regarding claim 1, Nakai teaches A method (“In this work, we proposed Att-DARTS, a differentiable architecture search that finds cells with attention modules” [Nakai page 8 Conclusion]) comprising:
generating intermediate nodes of a normal cell in a neural network architecture, the intermediate nodes of the normal cell including channel inputs and channel outputs; (“For building a neural architecture, one has to design it manually by following a trial-and-error approach. Manually designing architectures requires a considerable amount of expertise and time to ensure that they achieve state-of-the-art performances. The goal of neural architecture search (NAS) is to automate this time-consuming and error-prone process [2]” [Nakai page 1 Introduction]; “At the search stage, we searched for cells. We followed the same experimental setting as that of the original study on DARTS unless otherwise stated [10]… We set the number of cells L = 8; two reduction cells and six normal cells” [Nakai page 4 Experimental Details]; “As with many NAS works, the entire network consists of repeated cells, and Att-DARTS searches the good cells. Each cell ck is expressed as a directed acyclic graph with N nodes, two of which are inputs and one is the output. The remaining N - 3 nodes are intermediate nodes. Each node xi is a feature map…At the search stage, Att-DARTS identifies an operation in the operation space O, and an attention module in the attention module space A. When the focus is on the edge from node xi to node xj, each candidate operation o(.) ∈ O has a relative weight s_o^(i,j)” [Nakai pages 3-4 Architecture Search with Attention Modules]; By definition, neural architecture search (NAS) methods (e.g., Att-DARTS) automate the creation of new network architectures, and thereby generate the architectures themselves and their elements therein. As is typical in a DARTS architecture, nodes (including intermediate nodes) xi of cells (including normal cells) ck in Att-DARTS receive and pass feature maps to one another (i.e., channel inputs and outputs) via edges of a directed graph (e.g., see Fig. 2 Illustration of Att-DARTS [Nakai page 3]))
averaging the channel outputs of the intermediate nodes of the normal cell in a neural network architecture; (“At the search stage, we searched for cells. We followed the same experimental setting as that of the original study on DARTS unless otherwise stated [10]… We set the number of cells L = 8; two reduction cells and six normal cells” [Nakai page 4 Experimental Details]; “As with many NAS works, the entire network consists of repeated cells, and Att-DARTS searches the good cells. Each cell ck is expressed as a directed acyclic graph with N nodes, two of which are inputs and one is the output. The remaining N - 3 nodes are intermediate nodes. Each node xi is a feature map…At the search stage, Att-DARTS identifies an operation in the operation space O, and an attention module in the attention module space A. When the focus is on the edge from node xi to node xj, each candidate operation o(.) ∈ O has a relative weight so(i,j)” [Nakai pages 3-4 Architecture Search with Attention Modules]; “The operation space O was the same as that of DARTS: Identity, 3x3 and 5x5 separable convolutions, 3x3 and 5x5 dilated separable convolutions, 3x3 max pooling, 3x3 average pooling, and zero” [Nakai page 4 Architecture Search Space]; As is typical in a DARTS architecture, nodes (including intermediate nodes) xi of cells (including normal cells) ck in Att-DARTS pass feature maps to one another (i.e., channel outputs) via edges of a directed graph (e.g., see Fig. 2 Illustration of Att-DARTS [Nakai page 3]), wherein each edge represents a candidate operation from the operation space O (e.g., average pooling) that is performed on received feature maps (i.e., averaging channel outputs (see Claim Interpretation above))
generating an output node for a first layer of cells in the neural network architecture based on averaging the channel outputs of the intermediate nodes of the normal cell, (“Att-DARTS assumes a CNN composed of repeatedly stacked cells similar to most existing NAS works; however, it inserts an attention module after each operation” [Nakai page 2 Introduction]; “As with many NAS works, the entire network consists of repeated cells, and Att-DARTS searches the good cells. Each cell ck is expressed as a directed acyclic graph with N nodes, two of which are inputs and one is the output. The remaining N - 3 nodes are intermediate nodes. Each node xi is a feature map. The input nodes are obtained from the output nodes of the previous two cells ck-1 and ck-2. The output node is defined as the depth-wise concatenation of all intermediate nodes in the cell.” [Nakai page 3 Architecture Search with Attention Modules]; “We set the number of cells L = 8; two reduction cells and six normal cells. The reduction cells were inserted into the 1/3 and 2/3 locations of the entire network” [Nakai page 4 Experimental Details]; Similarly to a typical DARTS architecture, the output node of a cell is formed based on receiving outputted feature maps from previous cells, and determining (e.g., via depth-wise concatenation) output of the current cell based on the result of operations on feature maps (i.e., channel outputs) performed by its intermediate nodes (including, e.g., average pooling). 
A combination of cells within the CNN (e.g., two normal cells and a reduction cell that comprise 1/3 of the network) is thereby interpretable as a layer of cells that eventually produces an output feature map as a result of the operations of their respective nodes) the output node including a first channel dimension that is one-fourth of a second channel dimension of the normal cell; (“Att-DARTS assumes a CNN composed of repeatedly stacked cells similar to most existing NAS works; however, it inserts an attention module after each operation” [Nakai page 2 Introduction]; see Fig. 1 including operation –> attention module –> output and candidate attention modules including Squeeze-and-Excitation, Gather-Excite, Bottleneck Attention Module – “Fig. 1: An overview of Att-DARTS. We propose applying not only an operation (convolution or pooling) but also an attention module after the operation is applied” [Nakai page 1]; “Fig. 3 (e) shows a double-attention block (A2-block)…Each attention is obtained by the softmax function following a pointwise convolution that reduces the number of channels from C to N…Further, double-attention block applies a pointwise convolution to the input to reduce the number of channels from C to M, and it applies bilinear pooling using the attention maps…Reduced channel numbers M and N in the double-attention block were set to C/4” [Nakai page 6 Architecture Search Space]; After performing operations on feature maps of intermediate nodes (i.e., channel outputs), Att-DARTS includes an attention module (e.g., double-attention block) that reduces the received number of channels C (i.e., channel dimension of a normal cell) to C/4 as part of the procedure of eventually producing an output feature map) and
forming channels of input nodes for a second layer of the cells in the neural network architecture by preprocessing the output node using a 1 x 1 convolution (“Further, double-attention block applies a pointwise convolution to the input to reduce the number of channels from C to M… Finally, pointwise convolution restores the number of channels from M to C” [Nakai page 6 Architecture Search Space]; After channel reduction, the double-attention block re-applies pointwise (i.e., 1x1) convolution to restore number of channels as part of the procedure of eventually producing an output feature map that is then passed as input to subsequent cells), the second layer being immediately subsequent to the first layer ([Nakai page 2 Introduction] and [Nakai page 3 Architecture Search with Attention Modules]; Similarly to a typical DARTS architecture, the outputted feature maps of a cell, or layer of cells, is passed as input to subsequent cells (i.e., second layer of cells)).
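By way of illustration only, the pointwise (1x1) convolution behavior relied upon above — reducing a feature map's channel count from C to C/4 and later restoring it — can be sketched in Python as follows. The sketch is hypothetical (the function name, shapes, and random weights are the examiner's assumptions) and is not drawn from the code of Nakai or Wang:

```python
import numpy as np

def pointwise_conv(x, w):
    """Apply a 1x1 (pointwise) convolution: x has shape (C_in, H, W),
    w has shape (C_out, C_in); each output channel is a linear
    combination of the input channels at the same spatial location."""
    return np.tensordot(w, x, axes=([1], [0]))  # -> (C_out, H, W)

rng = np.random.default_rng(0)
C, H, W = 16, 8, 8
x = rng.standard_normal((C, H, W))

# Reduce the channel count from C to C/4, as in the double-attention
# block cited above, then restore C with a second pointwise convolution.
w_reduce = rng.standard_normal((C // 4, C))
w_restore = rng.standard_normal((C, C // 4))

reduced = pointwise_conv(x, w_reduce)          # shape (4, 8, 8)
restored = pointwise_conv(reduced, w_restore)  # shape (16, 8, 8)
```

The sketch treats a 1x1 convolution as a per-pixel linear map across channels, which is the sense in which the pointwise convolutions of the double-attention block are cited in the rejection above.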
However, Nakai does not expressly teach wherein averaging the channel outputs of the intermediate nodes of the normal cell comprises forming groups of the channel outputs of the intermediate nodes of the normal cell and forming an average channel output for each of the groups of the channel outputs.
In the same field of endeavor, Wang discloses a method of learning a high-performance network architecture derived from convolutional neural networks (“…we propose an approach named Group-DARTS with Attention (G-DARTS-A), using multiple groups of channels for searching. Inspired by the partially sampling strategy of PC-DARTS, we use groups channels to sample the super-network to perform a more efficient search while maintaining the relative integrity of the network information” [Wang Abstract]; “DARTS is mentioned for the first time as the baseline of this work. Its search network is composed of a neural network formed by stacking L cells. Each cell can be regarded as a directed acyclic graph (DAG) connected by N nodes…Each pair of nodes (i; j) is connected by edge E(i;j) which is associated with different candidate structural operations, such as convolutional layer and pooling layer” [Wang page 3 Preliminaries of DARTS]) that form[s] groups of the channel outputs of the intermediate nodes of the normal cell; (“In this work, we recommend using multiple channel group to design a super network, which named Channel Group Parallel Sampling method depicted in figure 1. In each cell, we set M edges on each pair of nodes, and each edge on the same connection shares weight. the method divides the feature maps into M groups that are seeded to the corresponding edges. Finally, the number of channels on each edge and M can be adjusted flexibly to adapt to various application scenarios” [Wang page 4 Channel Group Parallel Sampling for Searching]; see M groups in Figure 1 – “Figure 1: Illustration of Channel Group Parallel Sampling (best viewed in color), In the sprite of PC-DARTS, we show the propagation process of information between nodes” [Wang page 5]; As illustrated in Fig. 1, wherein nodes of cells pass information (i.e., channel outputs) between one another, the information is split up into groups for parallel processing); and
form[s] an average channel output for each of the groups of the channel outputs; (“We implement the core methods of G-DARTS-A on DARTS and PC-DARTS…DARTS-with-Attention for balance between each group of channels, we followed its defined search space which contains 8 operations, i.e., 3 * 3 average-pooling,…” [Wang page 7 Implementation Details]; see avg_pool_3x3 operations between nodes in Figure 1 [Wang page 5] which process groups of channels in parallel fashion).
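By way of illustration only, the claimed grouping-and-averaging of channel outputs, as mapped to Wang above, can be sketched as follows. The function name, group count, and values are hypothetical examiner assumptions, not code from Wang:

```python
import numpy as np

def group_average(channels, num_groups):
    """Split a (C, H, W) stack of channel outputs into num_groups
    contiguous channel groups and average within each group,
    yielding a (num_groups, H, W) result."""
    c, h, w = channels.shape
    assert c % num_groups == 0, "channel count must divide evenly"
    grouped = channels.reshape(num_groups, c // num_groups, h, w)
    return grouped.mean(axis=1)

# Eight 1x1 channel outputs holding the values 0..7:
x = np.arange(8.0).reshape(8, 1, 1)
out = group_average(x, num_groups=2)
# Group 0 averages channels 0-3 (-> 1.5); group 1 averages 4-7 (-> 5.5).
```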
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated forming groups of the channel outputs of the intermediate nodes of the normal cell; and forming an average channel output for each of the groups of the channel outputs; as taught by Wang into Nakai because they are both directed towards learning high-performance network architectures derived from convolutional neural networks. Incorporating the teachings of Wang would improve the method of Nakai by helping to mitigate overfitting (“In this work, we found that the backbone provided by DARTS is prone to overfitting. To mitigate this problem, we propose an approach named Group-DARTS with Attention (G-DARTS-A)” [Wang Abstract]).
Regarding claim 2, the combination of Nakai and Wang teaches the limitations of parent claim 1, and Wang further teaches concatenat[es] the average channel output for each of the groups of the channel outputs (see Figure 1: Illustration of Channel Group Parallel Sampling [Wang page 5] and Figure 2: Representation of our weights [Wang page 6]; Results from each group are concatenated (see Concat operation in Figure 1 and Figure 2) to form output).
Regarding claim 3, the combination of Nakai and Wang teaches the limitations of parent claim 1, and Nakai further teaches changing a number of output channels of the first layer with respect to a number of input channels of the first layer (“The reduction cells were inserted into the 1/3 and 2/3 locations of the entire network. Operations were with stride 1 in the normal cells and with stride 2 in the reduction cells; hence, the image size was halved at the reduction cells. Each cell consisted of N = 7 nodes. The initial number of channels was set to 16, and it was doubled at the reduction cells” [Nakai page 4 Experimental Details]).
Regarding claim 4, the combination of Nakai and Wang teaches the limitations of parent claim 3, and Nakai further teaches wherein changing the number of output channels of the first layer comprises increasing the number of output channels of the first layer with respect to the number of input channels of the first layer ([Nakai page 4 Experimental Details] as detailed in claim 3 above).
Regarding claim 5, the combination of Nakai and Wang teaches the limitations of parent claim 1, and Nakai further teaches wherein forming the output node comprises selecting a maximum output from the intermediate nodes of the normal cell (“The operation space O was the same as that of DARTS: Identity, 3x3 and 5x5 separable convolutions, 3x3 and 5x5 dilated separable convolutions, 3x3 max pooling, 3x3 average pooling, and zero” [Nakai page 4 Architecture Search Space]; see Claim Interpretation above).
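Purely to illustrate the broadest reasonable interpretation applied above — that 3x3 max pooling selects a maximum output within each window — the following hypothetical sketch (not taken from Nakai) shows the operation with stride 1 and no padding:

```python
import numpy as np

def max_pool_3x3(x):
    """3x3 max pooling with stride 1 over a 2D feature map: for each
    window, select (output) the maximum value it contains."""
    h, w = x.shape
    out = np.empty((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = x[i:i + 3, j:j + 3].max()
    return out

x = np.arange(16.0).reshape(4, 4)  # values 0..15
y = max_pool_3x3(x)
# Each output element is the maximum over its 3x3 window,
# e.g. y[1, 1] is the maximum of the bottom-right window (15).
```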
Regarding claim 6, the combination of Nakai and Wang teaches the limitations of parent claim 1, and Nakai further teaches wherein forming the output node comprises performing a weighted average of the channel outputs of the intermediate nodes of the normal cell. (“The operation space O was the same as that of DARTS: Identity, 3x3 and 5x5 separable convolutions, 3x3 and 5x5 dilated separable convolutions, 3x3 max pooling, 3x3 average pooling, and zero” [Nakai page 4 Architecture Search Space]; “During the search process, Att-DARTS optimizes the relative weights of both operations and attention modules” [Nakai page 2 Introduction]; “At the search stage, Att-DARTS identifies an operation in the operation space O, and an attention module in the attention module space A. When the focus is on the edge from node xi to node xj, each candidate operation o(.) ∈ O has a relative weight so(i,j)” [Nakai page 4 Architecture Search with Attention Modules]; see Claim Interpretation above).
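By way of illustration only, the DARTS-style weighting of candidate operations cited above (a softmax over relative weights applied to each operation's output) can be sketched as follows. All names and values are hypothetical examiner assumptions, not code from Nakai:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def mixed_edge_output(op_outputs, alphas):
    """Continuous relaxation in the style of DARTS: the output of an
    edge is the softmax-weighted average of every candidate
    operation's output, with alphas as the relative weights."""
    w = softmax(np.asarray(alphas, dtype=float))
    return np.tensordot(w, np.stack(op_outputs), axes=1)

# Two hypothetical candidate operations on a 2x2 feature map:
identity_out = np.full((2, 2), 2.0)
zero_out = np.zeros((2, 2))
y = mixed_edge_output([identity_out, zero_out], alphas=[0.0, 0.0])
# Equal relative weights (0.5 each) -> every element equals 1.0.
```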
Regarding claims 9-14, they are method claims that correspond to the methods of claims 1-6, which are already taught by the combination of Nakai and Wang as detailed above. Nakai further teaches selecting a maximum channel output from intermediate nodes of a normal cell in a neural network architecture ([Nakai page 4 Architecture Search Space] as detailed in claim 5 above). Consequently, claims 9-14 are rejected for the same reasons as claims 1-6 above.
Regarding claims 17-18 and 19-20, they are method claims that correspond to the methods of claims 1-2 and 5-6, which are already taught by the combination of Nakai and Wang as detailed above. Nakai further teaches performing a weighted average of channel outputs of intermediate nodes of a normal cell in a neural network architecture; ([Nakai page 4 Architecture Search Space] and [Nakai page 2 Introduction] and [Nakai page 4 Architecture Search with Attention Modules] as detailed in claim 6 above). Consequently, claims 17-18 and 19-20 are rejected for the same reasons as claims 1-2 and 5-6 above.
Claims 7-8 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Nakai and Wang, as applied to claims 1 and 9 above, in view of Han et al., (“GhostNet: More Features from Cheap Operations”, available arXiv 03/13/2020), hereinafter Han.
Regarding claim 7, the combination of Nakai and Wang teaches the limitations of parent claim 1.
However, the combination does not expressly teach batch normalizing the output node.
In the same field of endeavor, Han discloses a method of learning a high-performance network architecture derived from convolutional neural networks (“The proposed Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks…Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative of convolution layers in baseline models, and our GhostNet can achieve higher recognition performance” [Han Abstract]) that batch normaliz[es] the output node (“The second Ghost module reduces the number of channels to match the shortcut path. Then the shortcut is connected between the inputs and the outputs of these two Ghost modules. The batch normalization (BN) [25] and ReLU nonlinearity are applied after each layer” [Han page 4 Ghost Bottlenecks]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated batch normalizing the output node as taught by Han into the combination because both Nakai and Han are directed towards learning high-performance network architectures derived from convolutional neural networks. It is known in the art that batch normalization procedure can provide benefits such as higher learning rates and regularization (see Ioffe et al., (“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, 2015) [Abstract]), and can also be applied to convolutional neural networks ([Ioffe pages 4-5 Batch-Normalized Convolutional Networks]). Additionally, Han explicitly states that its disclosed architecture can be taken as a “plug-and-play component to upgrade existing convolutional neural networks” [Han Abstract]; therefore, incorporating the teachings of Han into Nakai would improve the method of Nakai by directly “upgrading” its underlying convolutional neural network architecture.
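Purely as an illustration of the batch normalization procedure relied upon above, the following hypothetical sketch (learnable scale and shift omitted for brevity; not taken from Han or Ioffe) normalizes each channel of a batch of feature maps:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Per-channel batch normalization of feature maps shaped
    (N, C, H, W): subtract the batch mean and divide by the batch
    standard deviation for each channel."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 5, 5)) * 2.0 + 5.0
y = batch_norm(x)
# Each channel of y now has approximately zero mean and unit variance.
```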
Regarding claim 8, the combination of Nakai and Wang teaches the limitations of parent claim 1, and Nakai further teaches generating a first predetermined number of intrinsic feature maps for the first layer; (“In this section, we propose a method—Att-DARTS—that searches for a neural architecture using attention modules, as illustrated in Fig. 2…Each cell ck is expressed as a directed acyclic graph with N nodes, two of which are inputs and one is the output…Each node xi is a feature map” [Nakai page 3 Architecture Search with Attention Modules]; “We set the number of cells L = 8; two reduction cells and six normal cells. The reduction cells were inserted into the 1/3 and 2/3 locations of the entire network…Each cell consisted of N = 7 nodes. The initial number of channels was set to 16, and it was doubled at the reduction cells” [Nakai page 4 Experimental Details]).
However, the combination does not expressly teach using one or more linear transformation operators to generate a second predetermined number of correlated or redundant output nodes for the first layer.
In the same field of endeavor, Han discloses a method of learning a high-performance network architecture derived from convolutional neural networks ([Han Abstract]) that us[es] one or more linear transformation operators to generate a second predetermined number of correlated or redundant output nodes for the first layer (“In this paper, we introduce a novel Ghost module to generate more features by using fewer parameters. Specifically an ordinary convolutional layer in deep neural networks will be split into two parts. The first part involves ordinary convolutions but their total number will be rigorously controlled. Given the intrinsic feature maps from the first part, a series of simple linear operations are then applied for generating more feature maps” [Han page 2 Introduction]; “Given the widely existing redundancy in intermediate feature maps calculated by mainstream CNNs as shown in Figure 1, we propose to reduce the required resources, i.e. convolution filters used for generating them. In practice, given the input data X ∈ R^(c×h×w), where c is the number of input channels and h and w are the height and width of the input data, respectively, the operation of an arbitrary convolutional layer for producing n feature maps can be formulated as [equation 1]…Y ∈ R^(h′×w′×n) is the output feature map with n channels” [Han pages 2-3 Ghost Module for More Features]; see Figure 2 – “Figure 2. An illustration of the convolutional layer and the proposed Ghost module for outputting the same number of feature maps” [Han page 3]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated using one or more linear transformation operators to generate a second predetermined number of correlated or redundant output nodes for the first layer as taught by Han into the combination because both Nakai and Han are directed towards learning high-performance network architectures derived from convolutional neural networks. Incorporating the teachings of Han into Nakai would improve the method of Nakai by leveraging the value of redundant information in feature maps for learning the neural network architecture (“Abundant and even redundant information in the feature maps of well-trained deep neural networks often guarantees a comprehensive understanding of the input data…Redundancy in feature maps could be an important characteristic for a successful deep neural network. Instead of avoiding the redundant feature maps, we tend to embrace them, but in a cost-efficient way” [Han page 2 Introduction]).
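By way of illustration only, the Ghost-module principle cited above — generating additional correlated feature maps from intrinsic maps via cheap linear transformations — can be sketched as follows. The per-map scalar scaling used here is a deliberate simplification standing in for the “series of simple linear operations” quoted above; it is a hypothetical examiner sketch, not code from Han:

```python
import numpy as np

def ghost_features(intrinsic, num_cheap):
    """Given intrinsic feature maps shaped (m, H, W), apply a cheap
    linear transformation (here, a per-map scalar scaling) to generate
    additional correlated 'ghost' maps, and concatenate them with the
    intrinsic maps along the channel axis."""
    scales = np.linspace(0.5, 1.5, num_cheap)
    ghosts = [intrinsic * s for s in scales]
    return np.concatenate([intrinsic] + ghosts, axis=0)

intrinsic = np.ones((2, 4, 4))  # m = 2 intrinsic maps
features = ghost_features(intrinsic, num_cheap=3)
# 2 intrinsic maps + 2*3 ghost maps = 8 output maps in total.
```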
Regarding claims 15-16, they are method claims that correspond to the method of claims 7-8, which are already taught by the combination of Nakai, Wang, and Han as detailed above. Consequently, claims 15-16 are rejected for the same reasons as claims 7-8.
Response to Arguments
The remarks filed 01/20/2026 have been fully considered.
Applicant’s remarks traversing the non-eligible subject matter rejections under 35 U.S.C. 101 set forth in the office action mailed 10/20/2025, in view of claims 1-20 as amended, have been considered but are not persuasive. Applicant’s arguments are further summarized and addressed below.
Applicant argues [Remarks pages 8-9] that recited limitations of “averaging channel outputs”, “selecting a maximum channel output”, and “performing a weighted average of the channel outputs” do not correspond to mathematical operations/calculations, and that “forming groups of the channel outputs” does not correspond to a mathematical method.
The examiner respectfully disagrees, and notes that per a broadest reasonable interpretation, the recited averaging, maximum selection, and weighted averaging steps do indeed recite mathematical calculations performed on numerical values or variables (i.e., channel outputs, which are at least representable as 3D tensors of dimension H x W x C (see Claim Interpretation above)). Applicant has not provided an explanation beyond conclusory assertion to support their stance that the limitations at issue do not correspond to mathematical operations/calculations. The examiner also notes that the particular step of “forming groups of the channel outputs” was analyzed based on the context of mental processes, and that it is proper, per MPEP guidelines (see MPEP § 2106.04(II)(B)), to consider mathematical concept and mental process steps in tandem as a single abstract procedure.
Applicant argues [Remarks page 9] that the claims are integrated into a practical application via improvement to conventional technology, and cites to the specification [¶ 0023] to explain how the described system generates a reduction in computational requirements with only a 0.15% reduction in accuracy.
The examiner respectfully disagrees, and notes that per MPEP guidelines (see MPEP § 2106.04(d)(1)), “The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art. Conversely, if the specification explicitly sets forth an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), the examiner should not determine the claim improves technology”. The cited portion of the specification does not appear to provide a technical explanation of how the claimed reduction in computational cost is achieved by claimed features, beyond mere assertion of improvement and generic reference to “shrinkage techniques” and a “reduction in parameters”. It thereby appears to ascribe the achieved computational efficiency to the mere implementation of recited abstract calculation steps. However, merely claiming the speed, or improved “computational efficiency”, inherent with implementing an abstract idea on a computer or computational model, without providing actual details of technical implementation, does not adequately support an improvement to conventional operation of a machine learning model, and thereby does not provide integration into a practical application.
Applicant has not presented further arguments with respect to the dependent claims. As such, amended claims 1-20 stand rejected under 35 U.S.C. 101.
Applicant’s remarks traversing the prior art rejections under 35 U.S.C. 102 and 35 U.S.C. 103 set forth in the office action mailed 10/20/2025, in view of claims 1-20 as amended, have been considered but are not persuasive. Applicant’s arguments are further summarized and addressed below.
Applicant argues [Remarks pages 10-11] that the cited portion of Wang (particularly, a “multiple channel group”) does not correspond to claim limitations at issue ("averaging the channel outputs of the intermediate nodes of the normal cell by: forming groups of the channel outputs of the intermediate nodes of the normal cell; and forming an average channel output for each of the groups of the channel outputs") because use of a multiple channel group is not a formed group of channel outputs.
The examiner respectfully disagrees, and notes that references must be considered for all that they contain and would suggest to one of ordinary skill in the art. As expressly depicted in Fig. 1 of Wang [page 5] and further explained in the rejection above, the propagated information (i.e., channel outputs) between nodes is split into groups of channels that are passed through node operations (including avg_pool_3x3) in parallel fashion. When considered as a whole, Wang can thereby be broadly interpreted to teach the limitations at issue.
Applicant argues [Remarks page 11] that a cited portion of Nakai does not correspond to claim limitations at issue (“generating intermediate nodes of
a normal cell in a neural network architecture, the intermediate nodes of the normal cell
including channel inputs and channel outputs”) because an existing/remaining node is not a generated intermediate node.
The examiner respectfully disagrees, and notes that as explained in the rejection above, by definition, neural architecture search (NAS) methods, such as Att-DARTS, automate the creation of new network architectures, and thereby generate the architectures themselves and their elements (e.g., intermediate nodes of normal cells) therein.
The remaining remarks [Remarks page 11] discussing references Wang and Han are moot because the rejection of record does not rely on these references to teach the limitation at issue.
Applicant argues [Remarks page 12] that a cited passage of Nakai does not correspond to “selecting a maximum channel output from the channel outputs of the intermediate nodes” because 3x3 max pooling is not a selected maximum channel output, and because Nakai describes that “Att-DARTS does not need max pooling”.
The examiner respectfully disagrees, and maintains the stance that 3x3 max pooling can indeed be broadly interpreted as selecting a maximum channel output when considered in light of the specification (see Claim Interpretation above). The examiner also reiterates the stance taken in the previous office action (see also Response to Arguments in office action mailed 10/20/2025) that the portion of Nakai referred to by applicant does not reach the threshold of teaching away from max pooling (and thereby selecting a maximum channel output), as it does no more than merely suggest a general preference for average pooling over max pooling [see Nakai page 7 Chosen Architecture] with respect to a particular architecture search task [see Nakai page 4 Experimental Details]. Nakai still expressly recites max pooling as being included within the architecture search space [see Nakai page 4 Architecture Search Space], and thereby still reads on the claim.
Applicant argues [Remarks pages 12-13] that the cited portion of Wang (particularly, a “multiple channel group”) does not correspond to claim limitations at issue ("performing a weighted average of the channel outputs of the intermediate nodes of the normal cell by: forming groups of the channel outputs of the intermediate nodes of the normal cell; and forming an average channel output for each of the groups of the channel outputs") because use of a multiple channel group is not a formed group of channel outputs.
The examiner respectfully disagrees, and notes that references must be considered for all that they contain and would suggest to one of ordinary skill in the art. As expressly depicted in Fig. 1 of Wang [page 5] and further explained in the rejection above, the propagated information (i.e., channel outputs) between nodes is split into groups of channels that are passed through node operations (including avg_pool_3x3) in parallel fashion. When considered as a whole, Wang can thereby be broadly interpreted to teach the limitations at issue.
Applicant has not presented further arguments with respect to the dependent claims. As such, amended claims 1-20 stand rejected under 35 U.S.C. 103.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY M BALAKRISHNAN whose telephone number is (571) 272-0455. The examiner can normally be reached 10am-5pm EST Mon-Thurs.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER WELCH can be reached on (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/V.M.B./
Examiner, Art Unit 2143
/JENNIFER N WELCH/Supervisory Patent Examiner, Art Unit 2143