DETAILED ACTION
Response to Arguments
Applicant’s arguments that the previously applied prior art does not teach the claim limitation of a pre-multiplication addition operation have been considered but are moot, because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged by Applicant’s arguments with respect to the above claim limitation in the Remarks submitted on 02/23/2026.
In regards to the newly added claim limitation of a memory being accessed by the corresponding processing unit through one or more direct memory access (DMA) channels, the prior art of Falcon teaches this claim limitation, and the Examiner directs Applicant to the current Office Action for the detailed teaching.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/04/2026 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-14 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Yu-Hsin, et al., "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE Journal of Solid-State Circuits 52.1 (2016) (“Chen”), in view of Falcon et al., US 2016/0026912 A1 (“Falcon”), and further in view of DiCecco et al., "Caffeinated FPGAs: FPGA framework for convolutional neural networks," 2016 International Conference on Field-Programmable Technology (FPT), Dec. 2016 (“DiCecco”).
Chen teaches an integrated circuit (IC) chip comprising:
a plurality of processing units to collectively perform a matrix multiplication operation with matrix data by performing matrix processing at least partially in parallel, each processing unit of the plurality of processing units to execute an instruction to process a portion of the matrix data to perform a corresponding partial matrix operation(Chen, pgs. 2-4, see also figs. 2 and 4, “Given the shape parameters in Table I, the computation of a layer is defined as O[z][u][x][y] = ReLU(B[u] + Σ_{k=0}^{C−1} Σ_{i=0}^{R−1} Σ_{j=0}^{S−1} I[z][k][Ux+i][Uy+j] × W[u][k][i][j])…where O, I, W, and B are the matrices of the ofmaps, ifmaps, filters, and biases, respectively[to collectively perform a matrix multiplication operation with matrix data]… [f]ig. 2 shows the top-level architecture and memory hierarchy of the Eyeriss system… [t]he core clock domain consists of a spatial array of 168 PEs organized as a 12 × 14 rectangle[a plurality of processing units]… each PE can either communicate with its neighbor PEs… A 2-D convolution is composed of many 1-D convolution primitives, and its computation:1) shares the same row of filter or ifmap across primitives and 2) accumulates the psums from multiple primitives together. Therefore, a PE Set, as shown in Fig. 4, is grouped to run a 2-D convolution… [i]n a set, each row of filter is reused horizontally, each row of ifmap is reused diagonally, and rows of psum are accumulated vertically[by performing matrix processing at least partially in parallel, each processing unit of the plurality of processing units to execute an instruction to process a portion of the matrix data to perform a corresponding partial matrix operation].”);
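Examiner’s illustrative note: for clarity only, the layer computation quoted from Chen may be rendered as the following Python sketch. This is not code from Chen; the function name, array shapes, and NumPy usage are the Examiner’s illustrative assumptions, with the loop body following the quoted equation term by term.

```python
import numpy as np

def conv_layer(I, W, B, U=1):
    """Sketch of Chen's quoted layer computation:
    O[z][u][x][y] = ReLU(B[u] + sum_{k,i,j} I[z][k][U*x+i][U*y+j] * W[u][k][i][j]).
    Illustrative shapes: I: (N, C, H, H) ifmaps, W: (M, C, R, S) filters, B: (M,) biases."""
    N, C, H, _ = I.shape
    M, _, R, S = W.shape
    E = (H - R) // U + 1  # ofmap height
    F = (H - S) // U + 1  # ofmap width
    O = np.zeros((N, M, E, F))
    for z in range(N):
        for u in range(M):
            for x in range(E):
                for y in range(F):
                    acc = B[u]
                    for k in range(C):
                        for i in range(R):
                            for j in range(S):
                                acc += I[z, k, U * x + i, U * y + j] * W[u, k, i, j]
                    O[z, u, x, y] = max(acc, 0.0)  # ReLU
    return O
```

Each PE in Chen's spatial array evaluates only a slice of these loop nests (a 1-D convolution primitive), with partial sums accumulated across the PE set.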
a plurality of memories, each memory to store the portion of the matrix data to be processed by a corresponding processing unit of the plurality of processing units(Chen, pgs. 6-7, see also figs. 2 and 13, “The Eyeriss accelerator has a GLB of 108 kB that can communicate with DRAM[a plurality of memories] through the asynchronous interface and with the PE array through the NoC. The GLB stores all the three types of data: ifmaps, filters, and psums/ofmaps[each memory to store the portion of the matrix data to be processed by a corresponding processing unit of the plurality of processing units].”);
a plurality of interconnects, a subset of the plurality of interconnects to couple each processing unit of the plurality of processing units to a plurality of neighboring processing units, at least one of the processing units to send partial matrix data to a first neighboring processing unit and to receive partial matrix data from a second neighboring processing unit over corresponding interconnects of the plurality of interconnects(Chen, pgs. 7-8, see also figs. 4, 5, 10, and 11, “The NoC manages data delivery between the GLB and the PE array as well as between different PEs[a plurality of interconnects, a subset of the plurality of interconnects to couple each processing unit of the plurality of processing units to a plurality of neighboring processing units]. The NoC architecture needs to meet the following goals. First, the NoC has to support the data delivery patterns used in the RS dataflow. While the data movement within a PE set is uniform (Fig. 4)[ at least one of the processing units to send partial matrix data to a first neighboring processing unit and to receive partial matrix data from a second neighboring processing unit over corresponding interconnects of the plurality of interconnects], there are three scenarios in the mapping of real CNNs that can break the uniformity and should be taken care of: 1) different convolution strides (U) result in the ifmap delivery, skipping certain rows in the array (AlexNet CONV1 in Fig. 5); 2) a set is divided into segments that are mapped onto different parts of the PE array (AlexNet CONV2 in Fig. 5); and 3) multiple sets are mapped onto the array simultaneously and different data is required for each set (AlexNet CONV4 and CONV5 in Fig. 5).”);
a first controller, wherein responsive to the first controller, the plurality of processing units are to collectively execute the matrix multiplication operation in accordance with at least one matrix multiplication command or instruction specifying a first input matrix, A, and a second input matrix, B, the plurality of processing units to produce an output matrix, C, by multiplying the first input matrix, A, and the second input matrix, B(Chen, pgs. 2-4, see also figs. 2 and 4, “Given the shape parameters in Table I, the computation of a layer is defined as O[z][u][x][y] = ReLU(B[u] + Σ_{k=0}^{C−1} Σ_{i=0}^{R−1} Σ_{j=0}^{S−1} I[z][k][Ux+i][Uy+j] × W[u][k][i][j])…where O[the plurality of processing units to produce an output matrix, C, by multiplying the first input matrix, A, and the second input matrix, B], I[specifying a first input matrix, A,], W[and a second input matrix, B,], and B are the matrices of the ofmaps, ifmaps, filters, and biases, respectively…[t]he accelerator has two levels of control hierarchy. The top-level control coordinates: 1) traffic between the off-chip DRAM and the GLB through the asynchronous interface; 2) traffic between the GLB and the PE array through the NoC; and 3) operation of the RLC CODEC and ReLU module[a first controller]…[t]he accelerator runs the processing of a CNN layer-by-layer. For each layer, it first loads the configuration bits into a 1794 b scan chain serially to reconfigure the entire accelerator, which takes less than 100 μs. These bits configure the accelerator for the processing of filters and fmaps in a certain shape, which includes setting up the PE array computation mappings[wherein responsive to the first controller, the plurality of processing units are to collectively execute the matrix multiplication operation in accordance with at least one matrix multiplication command or instruction]…and NoC data delivery patterns…[t]hey are generated offline and are statically accessed at runtime.”);
and a plurality of second controllers, each second controller associated with a processing unit of the plurality of processing units, the second controller to retrieve the portion of the matrix data to be processed by a corresponding processing unit of the plurality of processing units from a system memory and to store the portion of the matrix data to a corresponding memory of the plurality of memories(Chen, pgs. 7-8, see also figs. 10,11 and 12, “[W]e implemented the GIN, as shown in Fig. 10, with two levels of hierarchy: Y-bus and X-bus. A vertical Y-bus consists of 12 horizontal X-buses, one at each row of the PE array, and each X-bus connects to 14 PEs in the row. Each X-bus has a row ID, and each PE
has a col ID…[e]ach data read from the GLB is augmented with a (row, col) tag by the top-level controller…[t]he tag-ID matching is done using the Multicast Controller (MC). There are 12 MCs on the Y-bus to compare the row tag with the row ID of each X-bus, and 14 MCs on each of the X-buses[and a plurality of second controllers, each second controller associated with a processing unit of the plurality of processing units,] to compare the col tag with the col ID of each PE… Eyeriss has separate GINs for each of the three data types (filter, ifmap, and psum) to provide sufficient bandwidth from the GLB to the PE array[the second controller to retrieve the portion of the matrix data to be processed by a corresponding processing unit of the plurality of processing units from a system memory and to store the portion of the matrix data to a corresponding memory of the plurality of memories]. All GINs have 4-b row IDs to address the 12 rows. The filter and psum GINs use 4-b col IDs to address the 14 columns, while ifmap GIN uses 5 b to support maximum 32 ifmap rows passing in diagonal. The filter and psum GINs have data bus width of 64 b (4b×16 b), while the ifmap GIN has the data bus width of 16 b.”).
While Chen teaches the output matrix C, Chen does not teach: a plurality of pre-multiplication arithmetic engines; matrix-wide operation circuitry to perform a matrix-wide operation, the matrix-wide operation comprising a max value operation, a min value operation, a sum operation, or a max absolute value operation; or a memory being accessed by the corresponding processing unit through one or more direct memory access (DMA) channels.
However, Falcon teaches:
a plurality of pre-multiplication arithmetic engines, [each pre-multiplication arithmetic engine to perform a pre-multiplication addition operation with a corresponding portion of the matrix data to generate a new corresponding portion of the matrix data to be used for the matrix multiplication operation](Falcon, paras. 0087-0094, see also figs. 11 and 12, “When calculation circuits 1118 work collaboratively, they may achieve a convolution layer, or a pooling layer, or a fully-connected layer of a CNN system… FIG. 12 illustrates an example embodiment of a calculation circuit 1200 that may be used to implement fully or in part calculation circuit 1118[a plurality of pre-multiplication arithmetic engines]....”);1
and matrix-wide operation circuitry to perform a matrix-wide operation [on the output matrix, C,] the matrix-wide operation comprising a max value operation, a min value operation, a sum operation, or a max absolute value operation(Falcon, para. 0080, see also fig. 9, “Pooling layer 904 may perform subsampling to reduce images 910 to a stack of
reduced images 914. Subsampling operations may be achieved through…maximum value computation[and matrix-wide operation circuitry to perform a matrix-wide operation, the matrix-wide operation comprising a max value operation].”).2, 3
a memory being accessed by the corresponding processing unit through one or more direct memory access (DMA) channels(Falcon, para. 0047, see also fig. 1, “Processing core 159 may be coupled with bus 141 for communicating with various other system devices, which may include... Synchronous Dynamic Random Access Memory (SDRAM) control 146, Static Random Access Memory (SRAM) control 147[a memory being accessed by the corresponding processing unit]... Direct Memory Access (DMA) controller[through one or more direct memory access (DMA) channels]....”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chen with the teachings of Falcon. The motivation to do so would be to incorporate reconfigurable logic for shifting and/or scaling CNN kernels to make certain computations, such as pooling, work(Falcon, para. 0082, “Moreover, embodiments of the present disclosure may include weight-shifting mechanisms for such circuits…such weight-shifting mechanisms may be used to shift low-precision weights up and, after results are determined, scale the results back to original precision. The reconfigurable aspects of the calculation circuits may include the precision of the computation and/or the manner of the computation…[this] include[s] modular, reconfigurable, and variable-precision calculation circuits to perform different layers of CNN.”).
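Examiner’s illustrative note: the weight-shifting mechanism quoted from Falcon (shift low-precision weights up, compute, then scale results back) may be sketched as follows. The function name and integer shift amount are the Examiner’s illustrative assumptions, not Falcon’s disclosure.

```python
def shifted_multiply(x, w, shift):
    """Sketch of Falcon's quoted weight-shifting idea for integer arithmetic:
    scale the weight up by a left shift, multiply in the shifted domain,
    then right-shift the product back to the original precision."""
    w_shifted = w << shift       # shift low-precision weight up
    product = x * w_shifted      # compute at the higher magnitude
    return product >> shift      # scale the result back down
```

For exact integer inputs the round trip is lossless, e.g. shifted_multiply(3, 5, 4) recovers 3 × 5 = 15.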
While Chen in view of Falcon teaches a plurality of pre-multiplication arithmetic engines, Chen in view of Falcon does not teach: each pre-multiplication arithmetic engine to perform a pre-multiplication addition operation with a corresponding portion of the matrix data to generate a new corresponding portion of the matrix data to be used for the matrix multiplication operation.
However, DiCecco teaches:
each pre-multiplication arithmetic engine to perform a pre-multiplication addition operation with a corresponding portion of the matrix data to generate a new corresponding portion of the matrix data to be used for the matrix multiplication operation(DiCecco, pg. 2, see also fig. 2, “Winograd convolution exploits the Winograd minimal filtering algorithm to implement convolution using less floating point operations...[t]he Winograd input transformation requires a total of eight instances of the partial transform (PT) in Equation 1 per 4 × 4 tile, with four applied to the columns and four applied to the rows... as shown in Fig. 2. This results in 32 floating point additions...[see equation 1]... [f]ollowing the input stage, each 4 × 2 tile and its neighbor are fed into a pipelined processing element. The processing element completes the remaining set of PTs, performs the dot product between the input and the weights, computes the output transform, and accumulates the result within a buffer. This process is repeated with different weights per output feature map until all of the output feature maps have been computed.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chen in view of Falcon with the teachings of DiCecco. The motivation to do so would be to reduce the number of matrix multiplications during a compute cycle when implementing the convolution operation(DiCecco, pg. 1, “CNNs are very computationally intensive with most of the computation in the convolution layers. This large computational complexity motivates efforts to reduce the number of required operations. To reduce the number of operations, the Winograd minimal filtering algorithm can be used to take advantage of the overlapping computations between adjacent convolution windows.”).
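Examiner’s illustrative note: the pre-multiplication additions of the Winograd minimal filtering algorithm relied on from DiCecco may be sketched with the textbook 1-D case F(2,3), which produces two convolution outputs with four multiplications instead of six. This is an illustrative sketch of the general technique, not DiCecco’s 4 × 4 tile implementation.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 1-D convolution of a 4-element
    input tile d with a 3-tap filter g, using 4 multiplications.
    The additions/subtractions on d before each multiply are the
    'pre-multiplication addition operations' at issue."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # pre-multiplication additions on the input tile (input transform)
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    # output transform combines the four products into two outputs
    y0 = m1 + m2 + m3   # = d0*g0 + d1*g1 + d2*g2
    y1 = m2 - m3 - m4   # = d1*g0 + d2*g1 + d3*g2
    return y0, y1
```

The direct computation of the same two outputs requires six multiplications; the Winograd form trades two of them for the extra additions performed before the multiplies.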
Regarding claim 2, Chen in view of Falcon and DiCecco teaches the IC chip of claim 1, wherein the max value operation comprises a max pooling operation(Falcon, para. 0080, see also fig. 9, “Pooling layer 904 may perform subsampling to reduce images 910 to a stack of
reduced images 914. Subsampling operations may be achieved through…maximum value computation.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chen with the above teachings of Falcon for the same rationale stated at Claim 1.
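Examiner’s illustrative note: the max pooling operation mapped above (Falcon’s subsampling via maximum value computation) may be sketched as follows; the function name, 2 × 2 window size, and NumPy usage are the Examiner’s illustrative assumptions.

```python
import numpy as np

def max_pool_2x2(x):
    """Sketch of max pooling (the claimed max value operation): reduce each
    non-overlapping 2x2 window of a 2-D feature map to its maximum value."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]  # drop any odd trailing row/column
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

Applied to a 4 × 4 feature map this yields a 2 × 2 map, one maximum per window, consistent with Falcon’s reduction of images 910 to reduced images 914.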
Regarding claim 3, Chen in view of Falcon and DiCecco teaches the IC chip of claim 2, wherein the matrix-wide operation is to process data among elements of the output matrix, C(Chen, pgs. 2-4, see also figs. 1, 2 and 4, “Given the shape parameters in Table I, the computation of a layer is defined as O[z][u][x][y] = ReLU(B[u] + Σ_{k=0}^{C−1} Σ_{i=0}^{R−1} Σ_{j=0}^{S−1} I[z][k][Ux+i][Uy+j] × W[u][k][i][j])…where O, I, W, and B are the matrices of the ofmaps, ifmaps, filters, and biases, respectively.”).
Regarding claim 4, Chen in view of Falcon and DiCecco teaches the IC chip of claim 1, wherein the matrix-wide operation is to process data among elements of the output matrix, C(Chen, pgs. 2-4, see also figs. 1, 2 and 4, “Given the shape parameters in Table I, the computation of a layer is defined as O[z][u][x][y] = ReLU(B[u] + Σ_{k=0}^{C−1} Σ_{i=0}^{R−1} Σ_{j=0}^{S−1} I[z][k][Ux+i][Uy+j] × W[u][k][i][j])…where O, I, W, and B are the matrices of the ofmaps, ifmaps, filters, and biases, respectively.”).
Regarding claim 5, Chen in view of Falcon and DiCecco teaches the IC chip of claim 1, wherein each pre-multiplication arithmetic engine of the plurality of pre-multiplication arithmetic engines is associated with one of the processing units of the plurality of processing units(Falcon, paras. 0087-0094, see also figs. 11 and 12, “When calculation circuits 1118 work collaboratively, they may achieve a convolution layer, or a pooling layer, or a fully-connected layer of a CNN system… FIG. 12 illustrates an example embodiment of a calculation circuit 1200 that may be used to implement fully or in part calculation circuit 1118… calculation circuit 1200 may include a 16-bit arithmetic left shifter 1240 to scale up inputs for computations of calculation circuit 1200…calculation circuit 1200 may include a right shifter and truncate logic 1232 to scale down resulting calculations of calculation circuit 1200.”).4
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chen with the above teachings of Falcon for the same rationale stated at Claim 1.
Regarding claim 6, Chen in view of Falcon and DiCecco teaches the IC chip of claim 1, wherein the first controller comprises a microprocessor(Chen, pg. 9, as fig. 14 details: [image: media_image1.png, greyscale] The Xilinx VC707 [a microprocessor] interfaces with the Eyeriss accelerator chip and acts as the first controller).
Regarding claim 7, Chen in view of Falcon and DiCecco teaches the IC chip of claim 1, wherein the processing units comprise processing clusters(Chen, pgs. 2-4, see also fig. 2 and 4, “Fig. 2 shows the top-level architecture and memory hierarchy of the Eyeriss system… [t]he core clock domain consists of a spatial array of 168 PEs organized as a 12 × 14 rectangle… each PE can either communicate with its neighbor PEs or the GLB through an NoC, or access a memory space that is local to the PE called spads.”).
Regarding claim 8, Chen in view of Falcon and DiCecco teaches the IC chip of claim 7, further comprising: a plurality of local controllers, each local controller to control matrix multiplication operations within a corresponding processing cluster(Chen, pgs. 7-8, see also figs. 10,11 and 12, “[W]e implemented the GIN, as shown in Fig. 10, with two levels of hierarchy: Y-bus and X-bus. A vertical Y-bus consists of 12 horizontal X-buses, one at each row of the PE array, and each X-bus connects to 14 PEs in the row. Each X-bus has a row ID, and each PE has a col ID…[e]ach data read from the GLB is augmented with a (row, col) tag by the top-level controller…[t]he tag-ID matching is done using the Multicast Controller (MC). There are 12 MCs on the Y-bus to compare the row tag with the row ID of each X-bus, and 14 MCs on each of the X-buses to compare the col tag with the col ID of each PE… Eyeriss has separate GINs for each of the three data types (filter, ifmap, and psum) to provide sufficient bandwidth from the GLB to the PE array. All GINs have 4-b row IDs to address the 12 rows. The filter and psum GINs use 4-b col IDs to address the 14 columns, while ifmap GIN uses 5 b to support maximum 32 ifmap rows passing in diagonal. The filter and psum GINs have data bus width of 64 b (4b×16 b), while the ifmap GIN has the data bus width of 16 b.”).
Regarding claim 9, Chen in view of Falcon and DiCecco teaches the IC chip of claim 1, further comprising: a memory interface to couple the plurality of memories to a high bandwidth memory (HBM)(Falcon, para. 0041, “System logic chip 116 may include a memory controller hub (MCH). Processor 102 may communicate with MCH 116 via a processor bus 110. MCH 116 may provide a high bandwidth memory path 118 to memory 120 for instruction and data storage and for storage of graphics commands, data and textures.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chen with the above teachings of Falcon for the same rationale stated at Claim 1.
Regarding claim 10, Chen in view of Falcon and DiCecco teaches the IC chip of claim 9, wherein a matrix routine performed by one or more of the plurality of the processing units comprises a distributed matrix multiplication routine, the distributed matrix multiplication routine to be executed by multiple of the plurality of processing units(Falcon, paras. 0085-0087, see also fig. 11, “Execution cluster 1114 may include a number of calculation circuits 1118, distribution logics 1116, 1122, and delay elements 1120. Distribution logic 1116 may receive input signal x_i, i = 1, ..., N, where the input signal may be image pixel values…[b]esides input signal x_i, distribution logic 1116 may also assign weight coefficients w_i, i = 1, ..., N to different calculation circuits… [w]hen calculation circuits 1118 work collaboratively, they may achieve a convolution layer…or a fully-connected layer of a CNN system.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chen with the above teachings of Falcon for the same rationale stated at Claim 1.
Regarding claim 11, Chen in view of Falcon and DiCecco teaches the IC chip of claim 10, wherein a plurality of instructions of the matrix routine are to be executed to perform one or more convolution operations(Falcon, paras. 0085-0087, see also fig. 11, “Execution cluster 1114 may include a number of calculation circuits 1118, distribution logics 1116, 1122, and delay elements 1120. Distribution logic 1116 may receive input signal x_i, i = 1, ..., N, where the input signal may be image pixel values…[b]esides input signal x_i, distribution logic 1116 may also assign weight coefficients w_i, i = 1, ..., N to different calculation circuits… [w]hen calculation circuits 1118 work collaboratively, they may achieve a convolution layer…or a fully-connected layer of a CNN system.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chen with the above teachings of Falcon for the same rationale stated at Claim 1.
Regarding claim 12, Chen in view of Falcon and DiCecco teaches the IC chip of claim 11, wherein the matrix routine is associated with an operation in a neural network(Falcon, paras. 0085-0087, see also fig. 11, “Execution cluster 1114 may include a number of calculation circuits 1118, distribution logics 1116, 1122, and delay elements 1120. Distribution logic 1116 may receive input signal x_i, i = 1, ..., N, where the input signal may be image pixel values…[b]esides input signal x_i, distribution logic 1116 may also assign weight coefficients w_i, i = 1, ..., N to different calculation circuits… [w]hen calculation circuits 1118 work collaboratively, they may achieve a convolution layer…or a fully-connected layer of a CNN system.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chen with the above teachings of Falcon for the same rationale stated at Claim 1.
Regarding claim 13, Chen in view of Falcon and DiCecco teaches the IC chip of claim 1, wherein the matrix multiplication command or instruction is one of a plurality of instructions of a matrix routine, the matrix routine to be performed by one or more of the plurality of the processing units(Chen, pg. 9, as fig. 14 details: [image: media_image1.png, greyscale] The customized Caffe runs on the NVIDIA Jetson TK1 development board, and offloads the processing of a CNN layer to Eyeriss[wherein the matrix multiplication command or instruction is one of a plurality of instructions of a matrix routine, the matrix routine to be performed by one or more of the plurality of the processing units] through the PCIe interface.).
Regarding claim 14, Chen in view of Falcon and DiCecco teaches the IC chip of claim 1, further comprising: a host interface to couple the plurality of processing units to the first controller(Chen, pg. 9, as fig. 14 details: [image: media_image1.png, greyscale] The customized Caffe runs on the NVIDIA Jetson TK1 development board[a host interface], and connects to the Xilinx VC707 which communicates with the Eyeriss accelerator[to couple the plurality of processing units to the first controller].).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ADAM C STANDKE whose telephone number is (571)270-1806. The examiner can normally be reached generally M-F, 9AM-9PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Adam C Standke/
Primary Examiner
Art Unit 2129
1 Examiner Remarks: The claim limitations that are not in bold and are contained within square brackets (i.e., [ ]) are claim limitations that are not taught by the prior art of Falcon.
2 Examiner Remarks: The claim limitations that are not in bold and contained within square brackets are taught by the prior art of Chen.
3 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
4 Examiner Remarks: Para 0065 of Applicant’s Specification details that the arithmetic engine maps to reference numerals 910a-c of drawing 9.