Prosecution Insights
Last updated: April 19, 2026
Application No. 17/370,854

ADAPTIVE MAC ARRAY SCHEDULING IN A CONVOLUTIONAL NEURAL NETWORK

Non-Final OA (§103, §112)
Filed: Jul 08, 2021
Examiner: LEY, SALLY THI
Art Unit: 2147
Tech Center: 2100 — Computer Architecture & Software
Assignee: Black Sesame International Holding Limited
OA Round: 4 (Non-Final)
Grant Probability: 15% (At Risk)
Expected OA Rounds: 4-5
Time to Grant: 3y 10m
With Interview: 44%

Examiner Intelligence

Career Allow Rate: 15% (5 granted / 33 resolved; -39.8% vs TC avg). This examiner grants only 15% of cases.
Interview Lift: +28.8% (allow rate among resolved cases with an interview vs. without; a strong lift)
Typical Timeline: 3y 10m average prosecution; 35 applications currently pending
Career History: 68 total applications across all art units
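
These headline figures are simple derived ratios. Below is a minimal Python sketch of how the dashboard presumably computes them; the 55% Tech Center baseline is inferred from the -39.8% delta shown above, not stated on the page.

```python
# Hypothetical reconstruction of the derived examiner metrics above.
# Inputs (5/33, +28.8 points) come from the page; the formulas are assumptions.

granted, resolved = 5, 33
career_allow_rate = granted / resolved               # 0.1515... -> shown as 15%

tc_avg_allow_rate = 0.55                             # inferred from the -39.8% delta
delta_vs_tc = career_allow_rate - tc_avg_allow_rate  # about -0.398

interview_lift = 0.288                               # +28.8 percentage points
with_interview = career_allow_rate + interview_lift  # about 0.44 -> shown as 44%

print(f"allow rate {career_allow_rate:.1%}, delta {delta_vs_tc:+.1%}, "
      f"with interview {with_interview:.1%}")
```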

Statute-Specific Performance

§101: 29.2% (-10.8% vs TC avg)
§103: 50.2% (+10.2% vs TC avg)
§102: 10.8% (-29.2% vs TC avg)
§112: 9.8% (-30.2% vs TC avg)
Tech Center averages are estimates • Based on career data from 33 resolved cases
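
All four deltas above are consistent with a single Tech Center baseline of roughly 40%. A small reconstruction of the table's arithmetic; the 0.40 figure is inferred from the deltas, not published here.

```python
# Per-statute delta = examiner's allowance rate for that rejection type
# minus the estimated TC average (0.40 is inferred from the deltas shown).

statute_rate = {"101": 0.292, "103": 0.502, "102": 0.108, "112": 0.098}
TC_AVG = 0.40

for statute, rate in statute_rate.items():
    print(f"§{statute}: {rate:.1%} ({rate - TC_AVG:+.1%} vs TC avg)")
```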

Office Action

Rejections: §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 17 Oct 2025 has been entered.

Status of Claims

This Office Action is in response to the communication filed on 17 Oct 2025. Claims 1-12, 14, and 16-20 are being considered on the merits.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 31 December 2025 has been considered. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, initialed and dated copies of Applicant's IDS form 1449 are attached to the instant Office action.

Claim Rejections - 35 USC § 112(b)

Claims 17-19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 17 recites "the plurality of operation modes" in the first line of the claim. There is insufficient antecedent basis for this limitation in the claim.

Claim 18 recites "the same line" and "the same input channels" in the second line of the claim. There is insufficient antecedent basis for these limitations in the claim.

Claim 19 recites "the other two MAC arrays" in the second line of the claim and "the odd lines" in the final line of the claim. There is insufficient antecedent basis for these limitations in the claim.

Claim Rejections - 35 USC § 103

Claims 1-12, 14, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Mills et al. (US 2021/0125041 A1) in view of Sunkavalli (US 11194490 B1).

Claim 1, Mills teaches:

A method of processing an activation data into a plurality of tiles by using a 3D convolution computation core, wherein the method comprising: (Mills, paras. 0004 and 0065: "Embodiments relate to a neural engine circuit of a neural processor circuit that perform operations of a three dimensional (3D) convolution on input data." "Input data is typically split into smaller pieces of data for parallel processing at multiple neural engines 314 or neural engines 314 and planar engine 340. A set of data used for a convolution operation may be referred to as a convolution group, which can be split into multiple smaller units. The hierarchy of smaller units (segments) may be convolution groups, slices, tiles, work units, output channel groups, input channels (Cin), sub-Cins for input stride, etc. For example, a convolution group may be split into several slices; a slice may be split into several tiles; a tile may be split into several work units; and so forth. In the context of neural engine 314, a work unit may be a segment of the input data, such as data processed by planar engine 340 or data processed a prior cycle of neural engines 314 having a size that produces output values that fit into accumulator 414 of neural engine 314 during a single cycle of the computation core 416.")

cutting an output activation data in a horizontal direction to obtain an output activation data width; (Mills, para. 0004: "Each batch of accumulators receives and stores, after the processing cycle, the portion of output data for each output depth plane of multiple output depth planes and for a corresponding output channel of multiple output channels. Each output depth plane of the output data includes the portion of output data for an output channel having an output width and an output height." Examiner notes that the broadest reasonable interpretation of "cutting…in a horizontal direction to obtain a…width" includes any width by definition of how a width is defined.)

cutting the output activation data in a vertical direction to obtain an output activation data height; (Mills, para. 0004: "Each batch of accumulators receives and stores, after the processing cycle, the portion of output data for each output depth plane of multiple output depth planes and for a corresponding output channel of multiple output channels. Each output depth plane of the output data includes the portion of output data for an output channel having an output width and an output height." Examiner notes that the broadest reasonable interpretation of "cutting…in a vertical direction to obtain a…height" includes any height by definition of how a height is defined.)

processing the output activation data width and the output activation data height to calculate an input activation data width and an input activation data height to form the input activation data; (Mills, para. 0016: "The 3D convolution of input data involves 3D spatial support (e.g., width, height and depth dimensions) for each input channel and kernel data having 3D spatial support. The 3D convolution can be used in processing of volumetric data (e.g., input data having a width dimension, height dimension and depth dimension) or temporal video data (e.g., input data having a width dimension, height dimension and time dimension).")

cutting the input activation data along a depth to create an input tile; (Mills, para. 0065: "The hierarchy of smaller units (segments) may be convolution groups, slices, tiles, work units, output channel groups, input channels (Cin), sub-Cins for input stride, etc. For example, a convolution group may be split into several slices; a slice may be split into several tiles; a tile may be split into several work units; and so forth. In the context of neural engine 314, a work unit may be a segment of the input data, such as data processed by planar engine 340 or data processed a prior cycle of neural engines 314 having a size that produces output values that fit into accumulator 414 of neural engine 314 during a single cycle of the computation core 416." Examiner notes that the broadest reasonable interpretation of "cutting" means to divide into segments.)

cutting the output activation data along tile depth to create an output tile; and (Mills, para. 0065: "The hierarchy of smaller units (segments) may be convolution groups, slices, tiles, work units, output channel groups, input channels (Cin), sub-Cins for input stride, etc. For example, a convolution group may be split into several slices; a slice may be split into several tiles; a tile may be split into several work units; and so forth. In the context of neural engine 314, a work unit may be a segment of the input data, such as data processed by planar engine 340 or data processed a prior cycle of neural engines 314 having a size that produces output values that fit into accumulator 414 of neural engine 314 during a single cycle of the computation core 416." Examiner notes that the broadest reasonable interpretation of "cutting" means to divide into segments.)

Mills does not explicitly disclose: sending the input tile and output tile to a field programmable gate array (FPGA) to perform adaptive MAC array scheduling.

However, Sunkavalli teaches: sending the input tile and output tile to a field programmable gate array (FPGA) to perform adaptive MAC array scheduling. (Sunkavalli, col. 3:33-37: "To conserve memory usage, such as block memories on a field programmable gate array (FPGA) yet achieve optimum parallelization, the filter window data for multiple (e.g., 32) output filters are loaded at a time into one of the FIFO buffers 218 or 220. The SA 214 begins to consume filter values once one of the FIFO buffers (e.g., 218) is fully loaded. In parallel, the next set of filters are loaded into the second FIFO buffer (e.g., 220), and the next set of filters (e.g., in FIFO buffer 220) will be used when the filters in the current FIFO buffer (e.g., 218) are exhausted. State machine 222 controls the population and depopulation and the back-and-forth scheduling of the two filter buffers 218 and 220.")

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Sunkavalli into Mills. Mills teaches a neural engine circuit of a neural processor circuit that performs operations of a three dimensional (3D) convolution on input data; Sunkavalli teaches a circuit arrangement coupled to the memory circuit and configured to input a multi-dimensional data set. One of ordinary skill would have been motivated to combine the teachings of Sunkavalli into Mills in order to allow an SA to operate at a faster clock speed than the other circuitry of the circuit arrangement and not have to wait for data elements to process (Sunkavalli, col. 3:23-30).
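
An editorial aside on the geometry at issue in claim 1: deriving an input-tile footprint from a chosen output tile is standard convolution arithmetic. A minimal sketch follows, assuming unit stride and no padding; the parameter names are illustrative and not drawn from the application or the Office action.

```python
# Sketch of claim 1's "processing the output activation data width and
# height to calculate an input activation data width and height" step,
# using textbook convolution geometry (assumed, not the applicant's method).

def input_tile_extent(out_w: int, out_h: int, k_w: int, k_h: int,
                      stride: int = 1) -> tuple[int, int]:
    """Input width/height needed to produce an out_w x out_h output tile
    with a k_w x k_h kernel, no padding, no dilation."""
    in_w = (out_w - 1) * stride + k_w
    in_h = (out_h - 1) * stride + k_h
    return in_w, in_h

# A 16x16 output tile under a 3x3 kernel at stride 1 needs an 18x18 input.
assert input_tile_extent(16, 16, 3, 3) == (18, 18)
```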
Claim 2, Mills as modified, teaches:

The method according to claim 1, wherein cutting the output activation data and the input activation data is determined by size of a MAC array. (Mills, paras. 0066 and 0069: "Rasterizer 430 may perform the operations associated with dividing the input data into smaller units (segments) and regulate the processing of the smaller units through the MACs 404 and accumulator 414. Rasterizer 430 keeps track of sizes and ranks of segments of the input/output data (e.g., groups, work units, input channels, output channels) and instructs the components of a neural processor circuit 218 for proper handling of the segments of the input data. For example, rasterizer 430 operates shifters 410 in input buffer circuits 402 to forward correct segments 408 of input data to MAC 404 and send the finished output data 328 to data buffer 334." "FIG. 4B is a block diagram of neural engine 314 with accumulator circuit 414 divided into multiple batches of accumulators, according to one embodiment. MAD circuits MAD0 through MADN of MAC 404 perform multiply-add operations of a three dimensional (3D) convolution on a work unit of input data 408 using kernel coefficients 422 to generate processed values 412 for at least a subset of output channels in each processing cycle of computation core 416." Examiner notes that for examination purposes only, "determined by size of the MAC array" is interpreted as "determined by a size of the MAC array," which includes the number of multiply-add (MAD) circuits in a MAC array; Examiner further notes that the broadest reasonable interpretation of "activation data" means data associated with the activation functions of a neural network, i.e., all data used in a neural network.)

Claim 3, Mills as modified, teaches:

The method according to claim 1, wherein cutting the output activation data and the input activation data is determined by size of a kernel of a MAC array. (Mills, para. 0070: "Input data 408 stored in input buffer circuit 402 includes multiple input depth planes, Din, each depth plane having an input width, Win, and an input height, Hin, for each input channel of multiple input channels, Cin. Hence, the spatial support of input data 408 is Win×Hin×Din×Cin. Kernel coefficients 422 from kernel extract circuit 432 can be represented as multiple kernel depth planes, Kd, each kernel depth plane having a kernel width, Kw, and a kernel height, Kh. Hence, the spatial support of kernel coefficients 422 (or kernel) is Kw×Kh×Kd. The spatial support of processed values 412 (or output values) and output data 328 is Wout×Hout×Dout×Cout, where Dout is a number of output depth planes, each output depth plane having a width of Wout and a height of Hout, and Cout is a number of output channels." Examiner notes that for examination purposes only, "determined by size of a kernel of a MAC array" is interpreted as "determined by a size of a kernel of the MAC array," which includes the number of dimensions of a kernel in a MAC array; Examiner further notes that the broadest reasonable interpretation of "activation data" means data associated with the activation functions of a neural network, i.e., all data used in a neural network.)

Claim 4, Mills as modified, teaches:

The method according to claim 1, wherein cutting the output activation data and the input activation data is determined by size of a local memory of a MAC array. (Mills, para. 0062: "Accumulator circuit 414 is a memory circuit that includes multiple accumulators that receive and store processed values 412 from MAD circuits. In one or more embodiments, the accumulator circuit 414 includes multiple sets of accumulators, and each set of accumulators is coupled to a different MAD circuit MAD0 though MADN. In an embodiment, each set of accumulators in the accumulator circuit 414 includes the same number of accumulators, e.g., a number of accumulators in the set is equal to a number of output channels of processed values 412." Examiner notes that for examination purposes only, "determined by size of a local memory of a MAC array" is interpreted as "determined by a size of a local memory of the MAC array"; Examiner further notes that the broadest reasonable interpretation of "activation data" means data associated with the activation functions of a neural network, i.e., all data used in a neural network.)
Claim 5, Mills as modified, teaches:

The method according to claim 1, wherein cutting the output activation data and the input activation data is determined by size of a local memory bandwidth of a MAC array. (Mills, para. 0048: "The use of neural engine 314 to compute I/O bound computations may not be efficient in terms of both speed and power consumption. In one embodiment, input data may be a tensor whose rank is larger than three (e.g., having three or more dimensions)." Examiner notes that the broadest reasonable interpretation of "bandwidth" means the download/upload speed, which includes input/output speed.)

Claim 6, Mills as modified, teaches:

The method according to claim 1, wherein cutting the output activation data and the input activation data is determined by size of a kernel stride of a MAC array. (Mills, para. 0068: "The components in neural engine 314 may be configured during a configuration period by NE control 418 and neural task manager 310. For this purpose, neural task manager 310 sends configuration information to neural engine 314 during the configuration period. The configurable parameters and modes may include, but are not limited to, mapping between input data elements and kernel elements, the number of input channels, the number of output channels, performing of output strides, and enabling/selection of post-processing operations at post-processor 428.")

Claim 7, Mills as modified, teaches:

The method according to claim 1, wherein height of a first tile is based on a local buffer size of a MAC array. (Mills, para. 0070: "Input data 408 stored in input buffer circuit 402 includes multiple input depth planes, Din, each depth plane having an input width, Win, and an input height, Hin, for each input channel of multiple input channels, Cin. Hence, the spatial support of input data 408 is Win×Hin×Din×Cin. Kernel coefficients 422 from kernel extract circuit 432 can be represented as multiple kernel depth planes, Kd, each kernel depth plane having a kernel width, Kw, and a kernel height, Kh. Hence, the spatial support of kernel coefficients 422 (or kernel) is Kw×Kh×Kd. The spatial support of processed values 412 (or output values) and output data 328 is Wout×Hout×Dout×Cout, where Dout is a number of output depth planes, each output depth plane having a width of Wout and a height of Hout, and Cout is a number of output channels.")

Claim 8, Mills as modified, teaches:

The method according to claim 1, wherein width of a first tile is based on data size of a MAC array. (Mills, para. 0090: "A work unit is a portion of the input data having a size that produces output values that fit into accumulator circuit 414 of neural engine 314 during a single cycle of the computation core 416. The shape of each work unit can be a horizontal strip. However, the shape of the work unit can be different depending on the shape and size of the tile. The work units also have overlapping parts that represent overfetched to provide support for a corresponding kernel. Especially, work units for the last tile of a slice may have a shape of a vertical strip if the tile is tall. In one or more embodiments, as discussed, the size of each work unit is 256 bytes. In such embodiments, for example, work units can be shaped to one of 16×16, 32×8, 64×4, 128×2 or 256×1 dimension.")

Claim 9, Mills as modified, teaches:

The method according to claim 1, wherein depth of a first tile is based on a local buffer size. (Mills, para. 0090: "In one embodiment, input data for each tile is loaded onto data buffer 318 in a read cycle and reused for operations in processing loops for the tile. In the processing loop for the tile is a processing loop for a work unit. Each tile is segmented into multiple work units. A work unit is a portion of the input data having a size that produces output values that fit into accumulator circuit 414 of neural engine 314 during a single cycle of the computation core 416. The shape of each work unit can be a horizontal strip. However, the shape of the work unit can be different depending on the shape and size of the tile. The work units also have overlapping parts that represent overfetched to provide support for a corresponding kernel. Especially, work units for the last tile of a slice may have a shape of a vertical strip if the tile is tall. In one or more embodiments, as discussed, the size of each work unit is 256 bytes. In such embodiments, for example, work units can be shaped to one of 16×16, 32×8, 64×4, 128×2 or 256×1 dimension.")

Claim 10, Mills as modified, teaches:

The method according to claim 1, wherein depth of a second tile is based on a local buffer size. (Mills, para. 0090: "The rightmost tile will typically have a width smaller than other tiles of the slice. In one embodiment, input data for each tile is loaded onto data buffer 318 in a read cycle and reused for operations in processing loops for the tile. In the processing loop for the tile is a processing loop for a work unit. Each tile is segmented into multiple work units. A work unit is a portion of the input data having a size that produces output values that fit into accumulator circuit 414 of neural engine 314 during a single cycle of the computation core 416. The shape of each work unit can be a horizontal strip. However, the shape of the work unit can be different depending on the shape and size of the tile. The work units also have overlapping parts that represent overfetched to provide support for a corresponding kernel. Especially, work units for the last tile of a slice may have a shape of a vertical strip if the tile is tall. In one or more embodiments, as discussed, the size of each work unit is 256 bytes. In such embodiments, for example, work units can be shaped to one of 16×16, 32×8, 64×4, 128×2 or 256×1 dimension." Examiner notes that Mills teaches multiple tiles such that there is at least a first and a second tile.)
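
Claims 2-10 recite variations of a single idea: tile dimensions are bounded by hardware parameters (MAC array size, kernel size, local memory, bandwidth, stride, buffer sizes). A minimal sketch of the buffer-capacity version of that constraint; all figures are assumed for illustration.

```python
# Sketch of the tile-sizing constraint recited across claims 2-10: choose
# the largest tile depth whose footprint fits a local buffer (assumed sizes).

def max_tile_depth(buffer_bytes: int, tile_w: int, tile_h: int,
                   bytes_per_elem: int = 1) -> int:
    """Largest depth D such that a tile_w x tile_h x D tile fits the buffer."""
    plane_bytes = tile_w * tile_h * bytes_per_elem
    return buffer_bytes // plane_bytes

# A 32 KiB local buffer holds a 16x16 (int8) tile up to 128 planes deep.
assert max_tile_depth(32 * 1024, 16, 16) == 128
```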
Claim 11, Mills as modified, teaches:

A method of determining an output activation value in a convolutional neural network of a chip, the method comprising: (Mills, para. 0021: "FIG. 2 is a block diagram illustrating components in device 100, according to one embodiment. Device 100 may perform various operations including implementing one or more machine learning models. For this and other purposes, device 100 may include, among other components, image sensors 202, a system-on-a chip (SOC) component 204, a system memory 230, a persistent storage (e.g., flash memory) 228, a motion sensor 234, and a display 216. The components as illustrated in FIG. 2 are merely illustrative. For example, device 100 may include other components (such as speaker or microphone) that are not illustrated in FIG. 2. Further, some components (such as motion sensor 234) may be omitted from device 100." Examiner notes that the term "BST Chip" is not defined in the specification or otherwise well-known in the art. For examination purposes only, such term is interpreted to mean a processing chip as taught by Mills.)

summing a kernel height within a three-dimensional multiply accumulate (MAC) layer; (Mills, paras. 0040 and 0070: "A neural network may include an input layer, an output layer, and one or more intermediate layers that may be referred to as hidden layers. Each layer may include one or more nodes, which may be fully or partially connected to other nodes in adjacent layers. In forward propagation, the neural network performs computation in the forward direction based on outputs of a preceding layer. The operation of a node may be defined by one or more functions. The functions that define the operation of a node may include various computation operation such as convolution of data with one or more kernels, pooling of layers, tensor multiplication, etc. The functions may also include an activation function that adjusts the weight of the output of the node. Nodes in different layers may be associated with different functions. For example, a CNN may include one or more convolutional layers that are mixed with pooling layers and are followed by one or more fully connected layers." "Input data 408 stored in input buffer circuit 402 includes multiple input depth planes, Din, each depth plane having an input width, Win, and an input height, Hin, for each input channel of multiple input channels, Cin. Hence, the spatial support of input data 408 is Win×Hin×Din×Cin. Kernel coefficients 422 from kernel extract circuit 432 can be represented as multiple kernel depth planes, Kd, each kernel depth plane having a kernel width, Kw, and a kernel height, Kh. Hence, the spatial support of kernel coefficients 422 (or kernel) is Kw×Kh×Kd. The spatial support of processed values 412 (or output values) and output data 328 is Wout×Hout×Dout×Cout, where Dout is a number of output depth planes, each output depth plane having a width of Wout and a height of Hout, and Cout is a number of output channels.")

summing a kernel width within the summation of the kernel height; (Mills, paras. 0060 and 0071: "FIG. 4B is a block diagram of neural engine 314 with accumulator circuit 414 divided into multiple batches of accumulators, according to one embodiment. MAD circuits MAD0 through MADN of MAC 404 perform multiply-add operations of a three dimensional (3D) convolution on a work unit of input data 408 using kernel coefficients 422 to generate processed values 412 for at least a subset of output channels in each processing cycle of computation core 416. The 3D convolution can be used in processing of volumetric data (e.g., input data 408 having a width dimension, height dimension and depth dimension) or temporal video data (e.g., input data 408 having a width dimension, height dimension and time dimension). Compared to the two dimensional (2D) convolution, operations of the 3D convolution involve an additional dimension referred to herein as a depth dimension." "Accumulated processed values 412 are fed back into MAD circuits MAD0 through MADN as feedback information 419 for multiply-add operations during a next processing cycle of computation core 416 as part of the 3D convolution." Examiner notes that Mills teaches multiply-add operations which are processed in the next cycle such that one summation occurs and is integrated (i.e., within) the summation of the next, where the width calculation is integrated into the height summations.)

summing an activation data map depth within the summation of the kernel width; and (Mills, paras. 0060 and 0071: "FIG. 4B is a block diagram of neural engine 314 with accumulator circuit 414 divided into multiple batches of accumulators, according to one embodiment. MAD circuits MAD0 through MADN of MAC 404 perform multiply-add operations of a three dimensional (3D) convolution on a work unit of input data 408 using kernel coefficients 422 to generate processed values 412 for at least a subset of output channels in each processing cycle of computation core 416. The 3D convolution can be used in processing of volumetric data (e.g., input data 408 having a width dimension, height dimension and depth dimension) or temporal video data (e.g., input data 408 having a width dimension, height dimension and time dimension). Compared to the two dimensional (2D) convolution, operations of the 3D convolution involve an additional dimension referred to herein as a depth dimension." "Accumulated processed values 412 are fed back into MAD circuits MAD0 through MADN as feedback information 419 for multiply-add operations during a next processing cycle of computation core 416 as part of the 3D convolution." Examiner notes that Mills teaches multiply-add operations on depth which are processed in the next cycle such that one summation occurs and is integrated (i.e., within) the summation of the next, where the depth calculation is integrated into the height summations.)

outputting a batch within the summation of the activation data map depth, wherein the output activation value is based on processing of a plurality of loops within the batch. (Mills, paras. 0040 and 0089: "The operation of a node may be defined by one or more functions. The functions that define the operation of a node may include various computation operation such as convolution of data with one or more kernels, pooling of layers, tensor multiplication, etc. The functions may also include an activation function that adjusts the weight of the output of the node. Nodes in different layers may be associated with different functions. For example, a CNN may include one or more convolutional layers that are mixed with pooling layers and are followed by one or more fully connected layers." "FIG. 6 is a conceptual diagram illustrating loops for processing the input data at neural processor circuit 218, according to one embodiment. The outermost loop represents processing for a convolution group, if group convolution involving multiple convolution group is used. Group convolutions are convolutions where input data of the input channels in each group are used only for generating output data of output channels of each group but are not used for generating output data for output channels of other groups. Hence, each group of the group convolution can be treated as a separate convolution operation.")
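
Claim 11's recited nesting order (kernel height, kernel width within it, activation-map depth within that, then a batch output) corresponds to a conventional multiply-accumulate loop nest. The following is a plain-Python model of a single output value, offered as an illustration only, not the claimed hardware or Mills's circuit.

```python
import numpy as np

# Illustrative MAC loop nest in the order claim 11 recites; names assumed.

def mac_output_value(act: np.ndarray, kernel: np.ndarray,
                     y: int, x: int) -> float:
    """One output activation at (y, x); act is H x W x D, kernel Kh x Kw x D."""
    k_h, k_w, depth = kernel.shape
    acc = 0.0
    for kh in range(k_h):              # summing over the kernel height
        for kw in range(k_w):          #   kernel width within that summation
            for d in range(depth):     #     activation-map depth within that
                acc += act[y + kh, x + kw, d] * kernel[kh, kw, d]
    return acc                         # one value of the output batch

act, kernel = np.ones((5, 5, 2)), np.ones((3, 3, 2))
assert mac_output_value(act, kernel, 0, 0) == 18.0   # 3 x 3 x 2 ones
```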
Claim 12, Mills as modified, teaches:

A method of providing the convoluted output in accordance with claim 11, wherein the plurality of loops are eight in count. (Mills, paras. 0074 and 0090: "In one or more embodiments, accumulator circuit 414 includes N+1 sets of accumulators, each set of accumulators coupled to a corresponding MAD circuit MAD0 through MADN and includes Cout=8 accumulators for receiving and storing (e.g., over multiple processing cycles) eight output channels of processed values 412." "In the loop for each convolution group is a processing loop for a slice of the input data. The entire input data for a convolution operation (e.g., 3D convolution operation) is segmented into multiple strips of slices in an overlapping manner. Overlapping portions are parts of the input data that are overfetched in two adjacent slices to provide spatial support for a corresponding kernel. The second outermost loop performs convolution operation (e.g., 3D convolution operation) for each slice in the input data. Within the loop for a slice is a processing loop for a tile of the slice.")

Claim 14, Mills as modified, teaches:

A method in accordance with claim 11, wherein the multiply accumulate (MAC) layer is an adaptive multiply accumulate (MAC) layer. (Sunkavalli, col. 7:40-49: "FIG. 8 shows a programmable integrated circuit (IC) 600 on which the disclosed circuits and processes may be implemented. The programmable IC may also be referred to as a System On Chip (SOC) that includes field programmable gate array logic (FPGA) along with other programmable resources. FPGA logic may include several different types of programmable logic blocks in the array. For example, FIG. 8 illustrates programmable IC 600 that includes a large number of different programmable tiles including multi-gigabit transceivers (MGTs) 601") It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Sunkavalli into Mills, as set forth above with respect to claim 1.

Claim 16, Mills as modified, teaches:

The method in accordance with claim 1, further comprising a plurality of operation modes. (Mills, para. 0049: "The circuitry of planar engine 340 may be programmed for operation in one of multiple modes, including a pooling mode, an elementwise mode, and a reduction mode. In the pooling mode, planar engine 340 reduce a spatial size of input data")

Claim 18, Mills as modified, teaches:

The method in accordance with claim 1, wherein a normal mode comprises a plurality of MAC arrays processing the same line and the same input channels (Mills, para. 0054: "Flow control circuit 332 may perform one or more of the following operations: (i) monitor the size and rank of data (e.g. data may be one or more tensors) that are being processed by neural engines 314 and planar engine 340, (ii) determine which subsets of data are transmitted to neural engines 314 or to planar engine 340 based on the task commands associated with different subsets of data, (iii) determine the manner in which data is transmitted to neural engines 314 and planar engine 340 (e.g., the data processor circuit 318 may operate in a broadcast mode where the same data is fed to multiple input channels of neural engines 314 so that multiple or all neural engines 314 receive the same data or in a unicast mode where different neural engines 314 receives different data), and (iv) transmit a configuration command to the planar engine 340 to direct planar engine 340 to program itself for operating in one of multiple operation modes." Examiner notes that applicant does not define "line" or "mode" in any particular way such that the claimed mode is taught by Mills's use of the same subset (i.e., line) of data fed into input channels of multiple neural engines.)

Mills does not explicitly disclose, but Sunkavalli teaches: and different MAC arrays processing different output channels (Sunkavalli, col. 4:38-45: "Filter 0 is shifted into the first row of MACs, filter 1 is shifted into the second row of MACs, . . . , and filter R−1 is shifted into the Rth row of MACs. The data elements are shifted through each column from row-to-row and are reused for a different output-channel filter in each row. In this manner, each iteration through the systolic array produces C pixel results for R output channels."). It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Sunkavalli into Mills, as modified, as set forth above with respect to claim 1.

Claims 17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mills et al. (US 2021/0125041 A1), in view of Sunkavalli (US 11194490 B1), and further in view of Song (US 2022/0197596 A1).

Claim 17, Mills as modified, teaches:

The method in accordance with claim 1, wherein the plurality of operation modes includes a normal mode (Sunkavalli, col. 3:4-7: "FIG. 3 shows a circuit arrangement 200 in which a multi-dimensional data set is formatted for parallel input to a systolic array (SA) for processing. In an exemplary application, the circuit arrangement performs convolution of an input data set with filter data." Examiner notes that "normal mode" is not defined or claimed in any specific manner such that any mode could be considered the "normal mode".)

Mills does not explicitly disclose, but Song teaches: and at least one of: a 2-line mode, a 4-line mode, a 2x2 spatial mode, and a 4x1 spatial mode. (Song, para. 0095: "As illustrated in FIG. 5, the MAC arithmetic operation performed by the PIM system 1-1 may be executed though a matrix calculation. Specifically, the PIM device 100 may execute a matrix multiplying calculation of an 'M×N' weight matrix (e.g., '8×8' weight matrix) and a 'N×1' vector matrix (e.g., '8×1' vector matrix) according to control of the PIM controller 200 (where, 'M' and 'N' are natural numbers).")

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Song into Mills, as modified. Mills teaches a neural engine circuit of a neural processor circuit that performs operations of a three dimensional (3D) convolution on input data; Song teaches processing-in-memory (PIM) systems including a PIM device and a controller and methods of operating the PIM systems. One of ordinary skill would have been motivated to combine the teachings of Song into Mills, as modified, in order to facilitate the communication between memory and processor.

Claim 19, Mills as modified, teaches claim 17 above. Sunkavalli further teaches: comprises two MAC arrays processing even lines and different output channels (Sunkavalli, col. 3:46-51: "The SA 214 is composed of an array of multiply-and-accumulate (MAC) circuits. In an exemplary implementation, the height dimension is fixed (e.g., 32 rows) and corresponds to the channel dimension of an output image. In other words, each row of MAC circuits is computing W pixels of an output channel." Sunkavalli teaches a systolic array, i.e., multiple MAC arrays processing.) and the other two MAC arrays processing odd lines. (Sunkavalli, col. 4:22-30: "For example, for a first cycle of processing by the SA, the selector circuit 310 selects the serialization buffers from the setup circuit 306 (even channel data elements), and in the next cycle of the SA, the selector circuit selects the serialization buffers from the set up circuit 308 (odd channel data elements). The setup circuits 306 and 308 can run at one-half the clock frequency of the SA, which allows selection of the odd-channel and even-channel serialization buffers on alternating clock cycles of the SA." Sunkavalli teaches processing of even and odd lines separately.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Sunkavalli into Mills, as set forth above with respect to claim 1. Sunkavalli does not explicitly disclose, but Song teaches: The method in accordance with claim 17, wherein the 2x2 spatial mode (Song, para. 0095: "As illustrated in FIG. 5, the MAC arithmetic operation performed by the PIM system 1-1 may be executed though a matrix calculation. Specifically, the PIM device 100 may execute a matrix multiplying calculation of an 'M×N' weight matrix (e.g., '8×8' weight matrix) and a 'N×1' vector matrix (e.g., '8×1' vector matrix) according to control of the PIM controller 200 (where, 'M' and 'N' are natural numbers).") It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Song into Mills, as modified, as set forth above with respect to claim 17.

Claim 20, Mills as modified, teaches:

The method in accordance with claim 17, wherein the 4x1 spatial mode comprises each MAC array processing different lines and same output channels. (Song, para. 0095: "As illustrated in FIG. 5, the MAC arithmetic operation performed by the PIM system 1-1 may be executed though a matrix calculation. Specifically, the PIM device 100 may execute a matrix multiplying calculation of an 'M×N' weight matrix (e.g., '8×8' weight matrix) and a 'N×1' vector matrix (e.g., '8×1' vector matrix) according to control of the PIM controller 200 (where, 'M' and 'N' are natural numbers).") It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Song into Mills, as modified. Mills teaches a neural engine circuit of a neural processor circuit that performs operations of a three dimensional (3D) convolution on input data; Song teaches processing-in-memory (PIM) systems including a PIM device and a controller and methods of operating the PIM systems. One of ordinary skill would have been motivated to combine the teachings of Song into Mills, as modified, in order to facilitate the communication between memory and processor.
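
The operation modes contested in claims 17-20 amount to different line/output-channel partitions across four MAC arrays. A hedged sketch of the partitioning the claim language describes; the assignment logic is inferred from the claims, not from the cited references.

```python
# Assumed illustration of the mode semantics in claims 17-20: which MAC
# arrays (by index, out of four) process a given activation line.

def arrays_for_line(mode: str, line: int, n_arrays: int = 4) -> list[int]:
    if mode == "normal":   # all arrays take the same line, each producing
        return list(range(n_arrays))          # different output channels
    if mode == "2x2":      # two arrays take even lines, the other two odd
        return [0, 1] if line % 2 == 0 else [2, 3]
    if mode == "4x1":      # each array owns every fourth line, same channels
        return [line % n_arrays]
    raise ValueError(f"unknown mode: {mode}")

assert arrays_for_line("normal", 7) == [0, 1, 2, 3]
assert arrays_for_line("2x2", 6) == [0, 1] and arrays_for_line("2x2", 7) == [2, 3]
assert arrays_for_line("4x1", 9) == [1]
```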
Response to Applicant's Remarks/Arguments

35 USC § 112

Previous rejections made under 35 USC § 112(a) have been reviewed and withdrawn in light of applicant's amendments and remarks. New rejections under 35 USC § 112(b) have additionally been made as a result of applicant's amendments and newly added claims.

35 USC § 101

The previously asserted § 101 rejections have been withdrawn in light of applicant's amendments.

35 USC § 103 – Independent Claim 1

On page 7 of applicant's remarks, applicant argues that Mills does not teach "processing the output activation data width and the output activation data height." However, the examiner referenced Mills at paragraph 0065, which includes, "Input data is typically split into smaller pieces of data… The hierarchy of smaller units (segments) may be convolution groups, slices, tiles, work units…In the context of neural engine 314, a work unit may be a segment of the input data, such as data processed by planar engine 340 or data processed a prior cycle of neural engines 314 having a size that produces output values that fit into accumulator 414". Applicant does not define "activation data" in particular; therefore, activation data is interpreted as data that is formatted such that it can be used by a neural network.

Applicant further argues that Mills does not teach "cutting along a depth". However, the Office action cites Mills at paragraphs 0004 and 0065, which teach both a 3D convolution (therefore teaching depth) and cutting into segments, including specifically tiles: "Embodiments relate to a neural engine circuit of a neural processor circuit that perform operations of a three dimensional (3D) convolution on input data." "Input data is typically split into smaller pieces of data for parallel processing at multiple neural engines 314 or neural engines 314 and planar engine 340. A set of data used for a convolution operation may be referred to as a convolution group, which can be split into multiple smaller units. The hierarchy of smaller units (segments) may be convolution groups, slices, tiles, work units, output channel groups, input channels (Cin), sub-Cins for input stride, etc. For example, a convolution group may be split into several slices; a slice may be split into several tiles; a tile may be split into several work units; and so forth. In the context of neural engine 314, a work unit may be a segment of the input data, such as data processed by planar engine 340 or data processed a prior cycle of neural engines 314 having a size that produces output values that fit into accumulator 414 of neural engine 314 during a single cycle of the computation core 416." Where the object necessarily is a 3-D object with a depth, splitting such an object into slices, then tiles, then work units necessarily teaches cutting along a depth, whether as a result of slicing the object, the subsequent splitting of the object into several tiles, or the subsequent splitting of the tile into work units.

35 USC § 103 – Dependent Claim 18

At the bottom of page 7 of applicant's remarks, applicant argues that Mills does not teach the claim, as newly amended. However, claim 18 is rejected under 35 USC § 103 over Mills in view of Sunkavalli, as set forth above.

35 USC § 103 – Dependent Claims 19-20

Applicant's newly added claims 19-20 are rejected under 35 USC § 103 as set forth above.

35 USC § 103 – Remaining Dependent Claims

Applicant makes no independent argument for the allowability of the remaining dependent claims; therefore, such claims remain rejected for the reasons set forth in the rejection above.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sally T. Ley, whose telephone number is (571) 272-3406. The examiner can normally be reached Monday - Thursday, 10:00am - 6:00pm ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Viker Lamardo, can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/STL/
Examiner, Art Unit 2147

/ERIC NILSSON/
Primary Examiner, Art Unit 2151

Prosecution Timeline

Jul 08, 2021
Application Filed
Sep 21, 2024
Non-Final Rejection — §103, §112
Dec 18, 2024
Applicant Interview (Telephonic)
Dec 26, 2024
Examiner Interview Summary
Dec 26, 2024
Response Filed
Jan 27, 2025
Final Rejection — §103, §112
May 01, 2025
Applicant Interview (Telephonic)
May 05, 2025
Examiner Interview Summary
May 07, 2025
Final Rejection — §103, §112
Sep 15, 2025
Interview Requested
Oct 17, 2025
Request for Continued Examination
Oct 24, 2025
Response after Non-Final Action
Jan 02, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12443830
COMPRESSED WEIGHT DISTRIBUTION IN NETWORKS OF NEURAL PROCESSORS
2y 5m to grant • Granted Oct 14, 2025
Patent 12135927
EXPERT-IN-THE-LOOP AI FOR MATERIALS DISCOVERY
2y 5m to grant • Granted Nov 05, 2024
Patent 11880776
GRAPH NEURAL NETWORK (GNN)-BASED PREDICTION SYSTEM FOR TOTAL ORGANIC CARBON (TOC) IN SHALE
2y 5m to grant • Granted Jan 23, 2024
Study what changed to get past this examiner. Based on 3 most recent grants.


Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 15%
With Interview: 44% (+28.8%)
Median Time to Grant: 3y 10m
PTA Risk: High
Based on 33 resolved cases by this examiner. Grant probability derived from career allow rate.
