Prosecution Insights
Last updated: April 19, 2026
Application No. 16/662,532

3D NEURAL INFERENCE PROCESSING UNIT ARCHITECTURES

Final Rejection — §102, §103
Filed: Oct 24, 2019
Examiner: GODO, MORIAM MOSUNMOLA
Art Unit: 2148
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 7 (Final)

Grant Probability: 44% (Moderate)
OA Rounds: 8-9
To Grant: 4y 8m
With Interview: 78%

Examiner Intelligence

Career Allow Rate: 44% (30 granted of 68 resolved cases; -10.9% vs TC avg)
Interview Lift: strong, +33.4% higher allowance rate for resolved cases with an interview
Typical Timeline: 4y 8m average prosecution (47 applications currently pending)
Career History: 115 total applications across all art units
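
These card figures are simple ratios over the examiner's resolved cases. For readers who want to reproduce them from raw dispositions, here is a minimal Python sketch; the Case record and the with/without-interview split are illustrative assumptions made for this example, not the dashboard's actual data model or dataset.

```python
from dataclasses import dataclass

@dataclass
class Case:
    granted: bool          # True if the application was allowed
    had_interview: bool    # True if at least one examiner interview was held

def allow_rate(cases: list[Case]) -> float:
    """Fraction of resolved cases that were granted."""
    return sum(c.granted for c in cases) / len(cases) if cases else 0.0

# Hypothetical docket sized to match the card (30 granted of 68 resolved);
# the split between interviewed and non-interviewed cases is invented.
resolved = (
    [Case(granted=True,  had_interview=True)]  * 12 +
    [Case(granted=False, had_interview=True)]  * 6  +   # 18 resolved with an interview
    [Case(granted=True,  had_interview=False)] * 18 +
    [Case(granted=False, had_interview=False)] * 32     # 50 resolved without one
)

career = allow_rate(resolved)                                   # ~0.441 -> shown as "44%"
with_iv = allow_rate([c for c in resolved if c.had_interview])
without_iv = allow_rate([c for c in resolved if not c.had_interview])
lift = with_iv - without_iv                                     # percentage-point gap, the "+33.4%"-style figure

print(f"career allow rate: {career:.1%}")
print(f"interview lift: {lift:+.1%} (with {with_iv:.1%} vs without {without_iv:.1%})")
```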

Statute-Specific Performance

§101: 16.1% (-23.9% vs TC avg)
§103: 56.7% (+16.7% vs TC avg)
§102: 12.7% (-27.3% vs TC avg)
§112: 12.9% (-27.1% vs TC avg)
Deltas are measured against the Tech Center average estimate • Based on career data from 68 resolved cases

Office Action

§102 §103
DETAILED ACTION

This office action is in response to the amendment filed on 11/26/2025 in Application No. 16/662,532. Claim 16 has been cancelled; claims 1-15 and 17-20 are presented for examination and are currently pending. Applicant's arguments have been carefully and respectfully considered.

Response to Arguments

2. The claim amendments filed 11/26/2025 have overcome the 112(b) rejection of 09/04/2025. As a result, the 112(b) rejection has been withdrawn. Applicant's arguments regarding the independent claims have been considered but are moot in view of the new grounds of rejection. It is noted that the Applicant's arguments regarding the dependent claims, which depend directly or indirectly from the independent claims, are moot as well in view of the new grounds of rejection.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

3. Claims 1, 14, 18 and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Baum et al. (US20200005127 filed 09/12/2019).

Regarding claim 1, Baum teaches a neural inference chip (The granular nature of the NN processing engine or processor, also referred to as a neurocomputer or neurochip [0019]) comprising: a first tier (4×16 processing element array is shown in FIG. 46. In this embodiment, the circuit, generally referenced 1100 [0308]) comprising a plurality of neural cores arranged in a two dimensional matrix along a plane of the first tier (comprises a PE fabric 1110 having 64 PEs (PE0 . . . PE63) arranged in a 4×16 array with four rows and sixteen columns [0308]), the neural cores individually comprising a neural computation unit (the processing elements (PEs) 76 which are composed of a multiply and accumulate (MAC) circuit and local memory [0137]), the neural computation unit being adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations (All four data inputs in the first column are multiplied by W[0] (i.e. W0), all four inputs in the second column are multiplied by W[1] (i.e. W4), all four inputs in the third column are multiplied by W[2] (i.e. W8), and so on through the last column which is multiplied by W[15] (i.e. W60). This process continues until all products in each of the PEs are computed and summed to yield output values y [0314]); a second tier (Weight memory 1112 [0308], Fig. 46) physically positioned above or below the first tier (4×16 processing element array is shown in FIG. 46. In this embodiment, the circuit, generally referenced 1100 [0308]) in a stacked configuration therewith (see Fig. 46), the second tier (Weight memory 1112 [0308], Fig. 46) comprising a first neural network model memory arranged in a two dimensional matrix (Weight memory 1112 supplies weights W[0 . . . 15] across the 16 columns of the array [0308]. The Examiner notes [0 . . . 15] is a 1x16 row matrix with 1 row and 16 columns. In a matrix, two dimensions are represented by rows and columns) along a plane of the second tier (Weight memory 1112 [0308], Fig. 46), the plane of the second tier being parallel to the plane of the first tier (4×16 processing element array 1100 is parallel to Weight memory 1112, Fig. 46), the first neural network model memory being adapted to store the plurality of synaptic weights (Weight memory 1112 supplies weights W[0 . . . 15] across the 16 columns of the array [0308]); and a communication network operatively coupled to the first neural network model memory and to the plurality of neural cores, the communication network being configured to provide dedicated buses for neural cores, the dedicated buses being separate from the plurality of neural cores, the dedicated buses being configured to provide the synaptic weights from the first neural network model memory to neural cores (see Fig. 46 below).

[Baum, Fig. 46 reproduced in the Office Action]

Regarding claim 14, Baum teaches the neural inference chip of claim 1. Baum further teaches wherein the communication network is adapted to provide the same synaptic weights to all of the neural cores (Sixteen weights are output from the weight memory 1112 and applied to the PE array [0309], Fig. 46).

Regarding claim 18, Baum teaches a method comprising: providing synaptic weights from a first neural network model memory to a plurality of neural cores via a communication network, wherein the communication network is configured to provide dedicated buses for the plurality of neural cores, the communication network being operatively coupled to the first neural network model memory and to the plurality of neural cores (Sixteen weights are output from the weight memory 1112 and applied to the PE array [0309], see Fig. 46 above), the plurality of neural cores being arrayed on and along a plane of a first tier (comprises a PE fabric 1110 having 64 PEs (PE0 . . . PE63) arranged in a 4×16 array with four rows and sixteen columns [0308]) of a neural inference chip (A high-level block diagram illustrating an example system on chip (SoC) NN processing system comprising one or more NN processing cores is shown in FIG. 4. The SoC NN processing system, generally referenced 100, comprises at least one NN processor integrated circuit (or core) 102 [0120]; The NN processing engine or core 60 comprises several hierarchical computation units. The lowest hierarchical level is the processing element (PE) [0122]. The SoC comprises circuit 1100, Fig. 46), each core comprising a neural computation unit (the processing elements (PEs) 76 which are composed of a multiply and accumulate (MAC) circuit and local memory [0137]), the neural computation unit adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations (All four data inputs in the first column are multiplied by W[0] (i.e. W0), all four inputs in the second column are multiplied by W[1] (i.e. W4), all four inputs in the third column are multiplied by W[2] (i.e. W8), and so on through the last column which is multiplied by W[15] (i.e. W60). This process continues until all products in each of the PEs are computed and summed to yield output values y [0314]), the first neural network model memory being arrayed on and along a plane of a second tier (Weight memory 1112 [0308], Fig.
46) of the neural inference chip (A high-level block diagram illustrating an example system on chip (SoC) NN processing system comprising one or more NN processing cores is shown in FIG. 4. The SoC NN processing system, generally referenced 100, comprises at least one NN processor integrated circuit (or core) 102 [0120]; The NN processing engine or core 60 comprises several hierarchical computation units. The lowest hierarchical level is the processing element (PE) [0122]. The SoC comprises circuit 1100, Fig. 46), the second tier (Weight memory 1112 [0308], Fig. 46) being physically positioned above or below the first tier (4×16 processing element array is shown in FIG. 46. In this embodiment, the circuit, generally referenced 1100 [0308]) in the neural inference chip (A high-level block diagram illustrating an example system on chip (SoC) NN processing system comprising one or more NN processing cores is shown in FIG. 4. The SoC NN processing system, generally referenced 100, comprises at least one NN processor integrated circuit (or core) 102 [0120]; The NN processing engine or core 60 comprises several hierarchical computation units. The lowest hierarchical level is the processing element (PE) [0122]. The SoC comprises circuit 1100, Fig. 46) in a stacked configuration (see Fig. 46) whereby the plane of the first tier is parallel to the plane of the second tier (4×16 processing element array 1100 is parallel to Weight memory 1112, Fig. 46), the dedicated buses being separate from the plurality of neural cores (the dedicated buses is separate from the neural cores (cores comprising PEs) in Fig. 46 above), the dedicated buses being configured to provide the synaptic weights from the first neural network model memory to the plurality of neural cores such that the synaptic weights are provided directly from the second tier to the first tier (Weight memory 1112 supplies weights W[0 . . . 15] across the 16 columns of the array [0308]; The Examiner notes Fig. 46 above shows the dedicated buses providing weights from the Weight memory 1112 to the PEs). Regarding claim 19, Baum teaches a computer program product for neural inference processing, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a neural inference chip to cause the neural inference chip to perform a method comprising (the present invention may be embodied as a system, method, computer program product or any combination thereof. Accordingly, the present invention may take the form of an entirely hardware embodiment, … or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium [0088]; The SoC NN processing system, generally referenced 100, comprises at least one NN processor integrated circuit (or core) 102 [0120]): providing synaptic weights from a first neural network model memory to each of a plurality of neural cores via a communication network (Sixteen weights are output from the weight memory 1112 and applied to the PE array [0309], see Fig. 46 below), wherein the first neural network model memory is arrayed on and along a plane of a second tier (Weight memory 1112 [0308], Fig. 
46) of the neural inference chip (A high-level block diagram illustrating an example system on chip (SoC) NN processing system comprising one or more NN processing cores is shown in FIG. 4. The SoC NN processing system, generally referenced 100, comprises at least one NN processor integrated circuit (or core) 102 [0120]; The NN processing engine or core 60 comprises several hierarchical computation units. The lowest hierarchical level is the processing element (PE) [0122]. The SoC comprises circuit 1100, Fig. 46), wherein the communication network is configured to provide dedicated buses for each neural core of the plurality of neural cores (see Fig. 46 below), the communication network being operatively coupled to the first neural network model memory and to each of the plurality of neural cores (Fig. 46 below shows the communication network coupled to the Weight memory 1112 and PEs 1110), the plurality of neural cores being arrayed on and along a plane of a first tier (comprises a PE fabric 1110 having 64 PEs (PE0 . . . PE63) arranged in a 4×16 array with four rows and sixteen columns [0308]) of the neural inference chip (A high-level block diagram illustrating an example system on chip (SoC) NN processing system comprising one or more NN processing cores is shown in FIG. 4. The SoC NN processing system, generally referenced 100, comprises at least one NN processor integrated circuit (or core) 102 [0120]; The NN processing engine or core 60 comprises several hierarchical computation units. The lowest hierarchical level is the processing element (PE) [0122]. The SoC comprises circuit 1100, Fig. 46), the second tier (Weight memory 1112 [0308], Fig. 46) being physically positioned above or below the first tier (4×16 processing element array is shown in FIG. 46. In this embodiment, the circuit, generally referenced 1100 [0308]) in the neural inference chip (A high-level block diagram illustrating an example system on chip (SoC) NN processing system comprising one or more NN processing cores is shown in FIG. 4. The SoC NN processing system, generally referenced 100, comprises at least one NN processor integrated circuit (or core) 102 [0120]; The NN processing engine or core 60 comprises several hierarchical computation units. The lowest hierarchical level is the processing element (PE) [0122]. The SoC comprises circuit 1100, Fig. 46) in a stacked configuration (see Fig. 46) whereby the plane of the first tier is parallel to the plane of the second tier (4×16 processing element array 1100 is parallel to Weight memory 1112, Fig. 46), the neural cores individually comprising a neural computation unit (the processing elements (PEs) 76 which are composed of a multiply and accumulate (MAC) circuit and local memory [0137]), the neural computation unit adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations (All four data inputs in the first column are multiplied by W[0] (i.e. W0), all four inputs in the second column are multiplied by W[1] (i.e. W4), all four inputs in the third column are multiplied by W[2] (i.e. W8), and so on through the last column which is multiplied by W[15] (i.e. W60). This process continues until all products in each of the PEs are computed and summed to yield output values y [0314]), the dedicated buses being separate from the plurality of neural cores (the dedicated buses is separate from the neural cores (cores comprising PEs) in Fig. 
46 below), the dedicated buses being configured to provide the synaptic weights from the first neural network model memory to each neural core of the plurality of neural cores such that the synaptic weights are provided directly from the second tier to the first tier (Weight memory 1112 supplies weights W[0 . . . 15] across the 16 columns of the array [0308]; The Examiner notes Fig. 46 below shows the dedicated buses providing weights from the Weight memory 1112 to the PEs).

[Baum, Fig. 46 reproduced in the Office Action]

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4. Claims 2-4 are rejected under 35 U.S.C. 103 as being unpatentable over Baum et al. (US20200005127 filed 09/12/2019) in view of Kim et al. ("Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory." ACM SIGARCH Computer Architecture News 44.3 (2016): 380-392.).

Regarding claim 2, Baum teaches the neural inference chip of claim 1, but Baum does not explicitly teach the limitations of claim 2. Kim teaches wherein the communication network comprises a plurality of through-silicon vias (Multiple processing elements (PE) concurrently communicate with multiple DRAM vaults through high-speed TSVs, pg. 383, left col, last para. The Examiner notes that the communication network is the TSV, which is a through-silicon via or through-chip via that passes through the die) extending between the first and second tiers in a direction generally perpendicular to the plane of the first tier (The TSVs extend between the first tier and second tier, and the TSVs are in a direction perpendicular to the first tier in Fig. 4). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the teachings of Kim for the benefit of improving computing power-efficiency (GOPs/s/W) over reported GPU based implementation while providing the programmability and scalability advantages over ASIC/FPGA platforms (Kim, pg. 381, left col., third para.).

Regarding claim 3, Baum teaches the neural inference chip of claim 1, but Baum does not explicitly teach the limitations of claim 3. Kim teaches further comprising: at least one additional tier comprising at least one additional neural network model memory, (second DRAM (as additional tier) out of four tiers of DRAM, Fig. 4, pg. 380; … 3D with multiple tiers of DRAM, abstract) wherein the communication network is additionally operatively coupled to the at least one additional neural network model memory (TSV is additionally operatively coupled to the at least second DRAM, Fig. 2, pg. 380; Multiple processing elements (PE) concurrently communicate with multiple DRAM vaults through high-speed TSVs, pg. 383, left col, last para.)
and adapted to provide synaptic weights from the at least one additional neural network model memory to the plurality of neural cores (The NN shown in Fig. 4 is stored in the DRAM stack, pg. 382, right col, last para. comprising Wij as weights is provided to each of the PEs (processing elements) via TSV (Fig. 4). The Examiner notes that DRAM stack is neural network model memory and PEs are the neural cores) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the teachings of Kim for the benefit of improving computing power-efficiency (GOPs/s/W) over reported GPU based implementation while providing the programmability and scalability advantages over ASIC/FPGA platforms (Kim, pg. 381, left col., third para.). Regarding claim 4, Baum and Kim teaches the neural inference chip of claim 3, Kim teaches wherein a neural network model (In this paper, we will use the term neural network (NN) to represent an artificial neural network, pg. 381, left col, last para.) is stored across the first neural network model memory (The NN shown in Fig. 4 is stored in the DRAM stack, pg. 382, right col, last para.; Our approach is based on following key innovations: 1. In-memory neuromorphic processing. The Neurocube integrates a fine grained, highly parallel, compute layer within a 3D high-density memory package, the hybrid memory cube (HMC), pg. 381, left col, second para.) and the at least one additional neural network model memory (The example of ConvNN shows how Neurocube can be used to program other NN, … Therefore, different types of network can be programmed in Neurocube without architectural changes, pg. 387, right col, last para.) The same motivation to combine dependent claim 3 applies here. 5. Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Baum et al. (US20200005127 filed 09/12/2019) in view of Kim et al. ("Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory." ACM SIGARCH Computer Architecture News 44.3 (2016): 380-392.) and further in view of Gao et al. ("Tetris: Scalable and efficient neural network acceleration with 3d memory." Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems. 2017.) Regarding claim 5, Baum and Kim teaches the neural inference chip of claim 3, but they do not explicitly teach the limitations of claim 5. Gao teaches wherein a plurality of neural network models are stored across the first neural network model memory and the at least one additional neural network model memory (an eight-die HMC memory stack organized into 16 vaults. Each vault is associated with an array of 14 × 14 NN processing elements, pg. 752, left col, second para., Fig. 2, pg. 754; In addition to processing different NNs or layers in each vault, we can divide large NN layers across the vaults to process them in parallel, pg. 758, right col, section 4.2. The Examiner notes that Fig. 2 has eight DRAM tiers or layers and each tier comprises an array of different NNs or layers in each vault.) 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum and Kim to incorporate the teachings of Gao for the benefit of improving computational density by optimally using area for processing elements and on-chip buffers, and that moving partial computations to DRAM dies (Gao, pg. 752, left col., second para.) 6. Claims 6-8 are rejected under 35 U.S.C. 103 as being unpatentable over Baum et al. (US20200005127 filed 09/12/2019) in view of Burger et al. (US20160379686) Regarding claim 6, Baum teaches the neural inference chip of claim 1, Baum does not explicitly teach wherein each core further comprises: an activation memory adapted to store the input activations and the output activations; and a local controller, the local controller being adapted to load the input activations from the activation memory to the neural computation unit and to store the plurality of output activations from the neural computation unit to the activation memory. Burger teaches wherein each core (neural engine 5802, Fig. 58) further comprises: an activation memory adapted to store the input activations and the output activations; (input memory (activations) 5804, Fig. 58; First memory 5804 is used to buffer input activations data [0330]) and a local controller, the local controller being adapted to load the input activations from the activation memory to the neural computation unit and to store the plurality of output activations from the neural computation unit to the activation memory (controller component 5704 issues commands to neural engines … to stream a subset of the input activations from DRAM channels into storage elements of the parallel neural engines [0331]) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the method of Burger for the benefit of a die stacking technology for providing high bandwidth, low power memory in a 3D integrated circuit technology (Burger [0274]) Regarding claim 7, Baum teaches the neural inference chip of claim 1, Baum does not explicitly teach further comprising: a third tier comprising an activation memory, wherein the communication network is additionally operatively coupled to the activation memory and adapted to provide activations from the activation memory to each of the plurality of neural cores. Burger teaches further comprising: a third tier (DRAM, such as hybrid memory cube (HMC). HMC combines through-silicon vias and microbumps to connect multiple (e.g., 4 to 8) die of memory cell arrays on top of each other [0265]) comprising an activation memory, (input memory (activations) 5804, Fig. 58; First memory 5804 is used to buffer input activations data [0330]) wherein the communication network (Although not shown in FIGS. 45A-45C, one or more of the dies in 3D acceleration and memory component 4502 may include through-silicon vias (TSVs) to allow upper die to communicate with lower die [0275]) is additionally operatively coupled to the activation memory and adapted to provide activations from the activation memory to the plurality of neural cores (low power memory stack 5706 includes a parallel array of DRAM channels (5726, 5728, 5730, . . . 5732, 5734) to provide access to high-bandwidth memory that can be used to store the activations. 
The activations are streamed from these parallel channels to the parallel neural engines (5712, 5714, 5716, . . . , 5718, 5720) [0329]) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the method of Burger for the benefit of a die stacking technology for providing high bandwidth, low power memory in a 3D integrated circuit technology (Burger [0274]) Regarding claim 8, Baum teaches the neural inference chip of claim 1, Baum does not explicitly teach further comprising: a third tier comprising an activation memory, wherein an additional communication network is operatively coupled to the activation memory and adapted to provide activations from the activation memory to each of the plurality of neural cores. Burger teaches further comprising: a third tier (DRAM, such as hybrid memory cube (HMC). HMC combines through-silicon vias and microbumps to connect multiple (e.g., 4 to 8) die of memory cell arrays on top of each other [0265]) comprising an activation memory, (input memory (activations) 5804, Fig. 58; First memory 5804 is used to buffer input activations data [0330]) wherein an additional communication network (Although not shown in FIGS. 45A-45C, one or more of the dies in 3D acceleration and memory component 4502 may include through-silicon vias (TSVs) to allow upper die to communicate with lower die [0275]) is operatively coupled to the activation memory and adapted to provide activations from the activation memory to the plurality of neural cores. (low power memory stack 5706 includes a parallel array of DRAM channels (5726, 5728, 5730, . . . 5732, 5734) to provide access to high-bandwidth memory that can be used to store the activations. The activations are streamed from these parallel channels to the parallel neural engines (5712, 5714, 5716, . . . 5718, 5720) [0329]) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the method of Burger for the benefit of a die stacking technology for providing high bandwidth, low power memory in a 3D integrated circuit technology (Burger [0274]) 7. Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Baum et al. (US20200005127 filed 09/12/2019) in view of Kim et al. ("Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory." ACM SIGARCH Computer Architecture News 44.3 (2016): 380-392.) and further in view of Burger et al. (US20160379686) Regarding claim 9, Baum teaches the neural inference chip of claim 1, Baum does not explicitly teach wherein the communication network is operatively coupled to the third tier and adapted to provide the synaptic weights from the first neural network model memory to each of the plurality of neural cores of the third tier. Kim teaches wherein the communication network is operatively coupled to the third tier (Multiple processing elements (PE) concurrently communicate with multiple DRAM vaults through high-speed TSVs, pg. 383, left col, last para. The Examiner notes that the communication network is the TSV which is a through-silicon via or through-chip via that passes through the die) and adapted to provide the synaptic weights from the first neural network model memory to the plurality of neural cores of the third tier. (The NN shown in Fig. 4 is stored in the DRAM stack, pg. 382, right col, last para. 
comprising Wij as weights is provided to each of the PEs (processing elements) via TSV (Fig. 4). The Examiner notes that DRAM stack is neural network model memory and PEs are the neural cores) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the teachings of Kim for the benefit of improving computing power-efficiency (GOPs/s/W) over reported GPU based implementation while providing the programmability and scalability advantages over ASIC/FPGA platforms (Kim, pg. 381, left col., third para.). Baum and Kim does not explicitly teach further comprising: a third tier comprising a plurality of neural cores, Burger teaches further comprising: a third tier comprising a plurality of neural cores, (DRAM, such as hybrid memory cube (HMC). HMC combines through-silicon vias and microbumps to connect multiple (e.g., 4 to 8) die of memory cell arrays on top of each other [0265]) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum and Kim to incorporate the method of Burger for the benefit of a die stacking technology for providing high bandwidth, low power memory in a 3D integrated circuit technology (Burger [0274]) 8. Claims 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Baum et al. (US20200005127 filed 09/12/2019) in view of Kim et al. ("Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory." ACM SIGARCH Computer Architecture News 44.3 (2016): 380-392.) in view of Burger et al. (US20160379686) and further in view of Gao et al. ("Tetris: Scalable and efficient neural network acceleration with 3d memory." Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems. 2017.) Regarding claim 10, Baum, Kim and Burger teaches the neural inference chip of claim 9, Baum, Kim and Burger does not explicitly teach configured to provide a first neural network model to both the first and third tiers Gao teaches configured to provide a first neural network model to both the first and third tiers (The HMC stack (Figure 2 left) is vertically divided into sixteen 32-bit-wide vaults, pg. 754, left col, last para.; An NN engine is placed in each vault, pg. 754, right col, last para. The Examiner notes that the HMC in Fig. 2 has many tiers or layers, each tier has vaults and a neural network is placed in each vault) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum, Kim and Burger to incorporate the method of Gao for the benefit of improving computational density by optimally using area for processing elements and on-chip buffers, and that moving partial computations to DRAM dies (Gao, pg. 752, left col, second para.) Regarding claim 11, Baum, Kim and Burger teaches the neural inference chip of claim 9, Baum, Kim and Burger does not explicitly teach configured to provide different neural network models to each of the first and third tiers. Gao teaches configured to provide different neural network models (In addition to processing different NNs or layers in each vault, pg. 
758, right col, section 4.2) to the first and third tiers (The HMC stack (Figure 2 left) is vertically divided into sixteen 32-bit-wide vaults) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum, Kim and Burger to incorporate the method of Gao for the benefit of improving computational density by optimally using area for processing elements and on-chip buffers, and that moving partial computations to DRAM dies (Gao, pg. 752, left col, second para.) 9. Claims 12, 13, 15 are rejected under 35 U.S.C. 103 as being unpatentable over Baum et al. (US20200005127 filed 09/12/2019) in view of Zhang (US20190164038) Regarding claim 12, Baum teaches the neural inference chip of claim 1, Baum does not explicitly teach wherein the communication network has at least two dimensions, a first of the at least two dimensions extending between tiers of the neural inference chip. Zhang teaches wherein the communication network (They are communicatively coupled by a plurality of inter-level connections 160, i.e. through-silicon vias (TSV's) 160 a-160 c (FIG. 6C) [0053]) has at least two dimensions, (Fig. 6C shows that each TSV’s 160a-160c has a length and width) a first of the at least two dimensions extending between tiers of the neural inference chip (length extends between tiers or layers of vertically integrated neuro-processor 100, Fig. 5A and 6C) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the method of Zhang for the benefit of an integrated circuit, and more particularly to a neuro-processor for artificial intelligence (AI) applications [0003] that improves computational density and storage of a neuro-processor. (Zhang [0010-0011]) Regarding claim 13, Baum teaches the neural inference chip of claim 1, Baum does not explicitly teach wherein the communication network has at least three dimensions, a first of the at least three dimensions extending between tiers of the neural inference chip and a second of the at least three dimensions extending within a tier of the neural inference chip. Zhang teaches wherein the communication network (They are communicatively coupled by a plurality of inter-level connections 160, i.e. through-silicon vias (TSV's) 160 a-160 c (FIG. 6C) [0053]) has at least three dimensions, (Fig. 2C shows that TSV has a length, width and depth) a first of the at least three dimensions extending between tiers of the neural inference chip and a second of the at least three dimensions extending within a tier of the neural inference chip. (length extends between tiers or layers of vertically integrated neuro-processor 100 and width extending within a tier of the integrated neuro-processor 100, Fig. 2C) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the method of Zhang for the benefit of an integrated circuit, and more particularly to a neuro-processor for artificial intelligence (AI) applications [0003] that improves computational density and storage of a neuro-processor. (Zhang [0010-0011]) Regarding claim 15, Baum teaches the neural inference chip of claim 1, Baum does not teach wherein the communication network is adapted to provide the same synaptic weights to a subset of the neural cores. 
Zhang teaches wherein the communication network is adapted to provide the same synaptic weights (In FIG. 9B, the neuro-processing circuit 180 ij serves four memory arrays 170 ijA-170 ijD, i.e. it uses the synaptic weights stored in the memory arrays 170 ijA-170 ijD [0065]) to a subset of the neural cores (The first and second components 180 ijA, 180 ijB collectively form the neuro-processing circuit 180 ij. [0069]). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the method of Zhang for the benefit of an integrated circuit, and more particularly a neuro-processor for artificial intelligence (AI) applications [0003], that improves computational density and storage of a neuro-processor (Zhang [0010-0011]).

10. Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Baum et al. (US20200005127 filed 09/12/2019) in view of Kim et al. ("Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory." ACM SIGARCH Computer Architecture News 44.3 (2016): 380-392.) in view of Burger et al. (US20160379686) and further in view of Zhang (US20190164038).

Regarding claim 17, Baum teaches the neural inference chip of claim 1, but Baum does not explicitly teach wherein the communication network comprises a plurality of rows with one of the tiers, each connected to a subset of the plurality of neural cores across tiers, and wherein the communication network is adapted to provide the same synaptic weights to those cores connected to each of the plurality of rows. Kim teaches wherein the communication network comprises a plurality of rows with one of the tiers, each connected to a subset of the plurality of neural cores across tiers (Fig. 4 shows TSV comprises a plurality of rows, each connected to a subset of PEs across tiers, pg. 382). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the teachings of Kim for the benefit of improving computing power-efficiency (GOPs/s/W) over reported GPU based implementation while providing the programmability and scalability advantages over ASIC/FPGA platforms (Kim, pg. 381, left col., third para.). Baum and Kim do not explicitly teach wherein the communication network is adapted to provide the same synaptic weights to those cores connected to each of the plurality of rows. Zhang teaches wherein the network is adapted to provide the same synaptic weights to those cores connected to each of the plurality of rows (In FIG. 9B, the neuro-processing circuit 180 ij serves four memory arrays 170 ijA-170 ijD, i.e. it uses the synaptic weights stored in the memory arrays 170 ijA-170 ijD [0065]; The preferred vertically integrated neuro-processor 100 comprises an array with m rows [0031]). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum and Kim to incorporate the method of Zhang for the benefit of an integrated circuit, and more particularly a neuro-processor for artificial intelligence (AI) applications [0003], that improves computational density and storage of a neuro-processor (Zhang [0010-0011]).

11. Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Baum et al. (US20200005127 filed 09/12/2019) in view of Datta et al.
(US20130339281) Regarding claim 20, Baum teaches the neural inference chip of claim 1, Baum does not explicitly teach the limitations of claim 20. Datta teaches further comprising: at least a third tier (the neuron group NG2, Fig. 4) comprising a plurality of neural cores arranged in a two dimensional matrix along a plane of the third tier (Each processor 10 comprises at least one neuron group 12 [0069]; embodiments of the invention may include various processing elements (including computer simulations) that are modeled on biological neurons [0026]), the second tier (multiple synapses 31, such as synapses S1-2 and S2-3 [0045], Fig. 4) being positioned between the first (neuron group NG1, Fig. 4) and third tiers (neuron group NG2, Fig. 4), wherein the communication network is additionally operatively coupled to the neural cores of the third tier for providing synaptic weights from the second tier thereto (As such, the synaptic weight of a synapse 31 interconnecting two neuron groups 12 is shared between the two neuron groups 12 in both the forward direction and the backward direction [0048]). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Baum to incorporate the method of Datta for the benefit of interconnecting neuron groups on different processors via a plurality of reciprocal communication pathways, and facilitating the exchange of reciprocal spiking communication between two different processors (Datta, abstract) Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 8am-5pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle T Bechtold can be reached on (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. 
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /M.G./Examiner, Art Unit 2148 /MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148
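
The §102 mapping above leans heavily on Baum's Fig. 46, in which a weight-memory tier broadcasts one weight to each of the 16 columns of a 4×16 processing-element array, every PE multiplies its local input by its column's weight, and the accumulated products are summed into output values y. A minimal NumPy sketch of that column-broadcast multiply-accumulate pattern follows; the array shape tracks the Office Action's description of Fig. 46, but the data values, the number of accumulation steps, and the final column-wise reduction are assumptions for illustration, not Baum's actual implementation.

```python
import numpy as np

ROWS, COLS = 4, 16                 # 4x16 PE array described for Baum Fig. 46
STEPS = 4                          # number of weight sets accumulated over time (assumed)

rng = np.random.default_rng(0)
acc = np.zeros((ROWS, COLS))       # one accumulator per processing element

for step in range(STEPS):
    x = rng.standard_normal((ROWS, COLS))  # input activations held in the PEs (assumed data)
    w = rng.standard_normal(COLS)          # W[0..15]: one weight per column from the weight tier
    acc += x * w                           # every PE in column c multiplies its input by w[c]

y = acc.sum(axis=0)                # reduce each column's accumulated products to an output
print(y.shape)                     # (16,) -- one output value per column (illustrative reading)
```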

Prosecution Timeline

Oct 24, 2019
Application Filed
Jun 11, 2020
Response after Non-Final Action
Jul 02, 2020
Response after Non-Final Action
Oct 26, 2022
Non-Final Rejection — §102, §103
Feb 02, 2023
Response Filed
May 08, 2023
Final Rejection — §102, §103
Aug 14, 2023
Response after Non-Final Action
Aug 18, 2023
Response after Non-Final Action
Oct 16, 2023
Request for Continued Examination
Oct 26, 2023
Response after Non-Final Action
Nov 16, 2023
Non-Final Rejection — §102, §103
Apr 30, 2024
Response Filed
Aug 19, 2024
Final Rejection — §102, §103
Nov 04, 2024
Response after Non-Final Action
Nov 18, 2024
Response after Non-Final Action
Nov 27, 2024
Request for Continued Examination
Dec 05, 2024
Response after Non-Final Action
Dec 26, 2024
Non-Final Rejection — §102, §103
Apr 03, 2025
Response Filed
Apr 30, 2025
Examiner Interview Summary
Apr 30, 2025
Applicant Interview (Telephonic)
Aug 27, 2025
Non-Final Rejection — §102, §103
Nov 04, 2025
Interview Requested
Nov 26, 2025
Examiner Interview Summary
Nov 26, 2025
Applicant Interview (Telephonic)
Nov 26, 2025
Response Filed
Feb 12, 2026
Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602586
SUPERVISORY NEURON FOR CONTINUOUSLY ADAPTIVE NEURAL NETWORK
2y 5m to grant • Granted Apr 14, 2026
Patent 12530583
VOLUME PRESERVING ARTIFICIAL NEURAL NETWORK AND SYSTEM AND METHOD FOR BUILDING A VOLUME PRESERVING TRAINABLE ARTIFICIAL NEURAL NETWORK
2y 5m to grant • Granted Jan 20, 2026
Patent 12511528
NEURAL NETWORK METHOD AND APPARATUS
2y 5m to grant • Granted Dec 30, 2025
Patent 12367381
CHAINED NEURAL ENGINE WRITE-BACK ARCHITECTURE
2y 5m to grant • Granted Jul 22, 2025
Patent 12314847
TRAINING OF MACHINE READING AND COMPREHENSION SYSTEMS
2y 5m to grant • Granted May 27, 2025
Study what changed to get past this examiner. Based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 8-9
Grant Probability: 44%
With Interview: 78% (+33.4%)
Median Time to Grant: 4y 8m
PTA Risk: High
Based on 68 resolved cases by this examiner. Grant probability derived from career allow rate.
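
The headline projection appears to combine the two examiner-card numbers by simple addition of percentage points: the 44% career allow rate plus the +33.4% interview lift is roughly the 78% shown for the with-interview path. The sketch below follows that reading; the page does not disclose its actual projection model, so treat this as an interpretation of the displayed figures, not the tool's method.

```python
def projected_grant_probability(career_allow_rate: float,
                                interview_lift_pp: float,
                                plan_interview: bool) -> float:
    """Assumed model: add the interview lift (percentage points) to the base allow rate."""
    p = career_allow_rate + (interview_lift_pp if plan_interview else 0.0)
    return min(max(p, 0.0), 1.0)   # clamp to a valid probability

BASE_RATE = 0.44        # career allow rate card
INTERVIEW_LIFT = 0.334  # interview lift card, taken as percentage points

print(projected_grant_probability(BASE_RATE, INTERVIEW_LIFT, plan_interview=False))  # 0.44
print(projected_grant_probability(BASE_RATE, INTERVIEW_LIFT, plan_interview=True))   # 0.774, shown as 78%
```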

Free tier: 3 strategy analyses per month