Last updated: May 29, 2026

Application No. 18/625,236

NEURAL PROCESSING UNIT

Non-Final OA §103

Filed: Apr 03, 2024
Priority: Aug 21, 2020 (RE 10-2020-0105509, plus 3 more)
Examiner: METZGER, MICHAEL J
Art Unit: 2183
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Deepx Co. Ltd.
OA Round: 3 (Non-Final)

Interview optional: the interview lift (+8.0%) is below the 15.0% threshold, so a written response is recommended. Based on 486 resolved cases, 2023–2026.

Examiner Intelligence: METZGER, MICHAEL J

Career allowance rate: 90% (439 granted / 486 resolved), above average; +35.3% vs. TC average
Interview lift: +8.0% (moderate), comparing resolved cases with and without an interview
Typical timeline: 2y 7m average prosecution; 26 currently pending
Career history: 511 total applications across all art units

Statute-specific performance (vs. Tech Center average estimate):
§101: 4.2% (-35.8% vs. TC avg)
§103: 74.7% (+34.7% vs. TC avg)
§102: 10.5% (-29.5% vs. TC avg)
§112: 3.5% (-36.5% vs. TC avg)
Based on career data from 486 resolved cases.

Office Action

§103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
1.  A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on March 26th, 2026 has been entered.

Response to Arguments
2.  Applicant's arguments filed March 26th, 2026, with respect to the independent claim rejections have been fully considered but they are not persuasive.
While Applicant’s arguments are primarily directed toward limitations of the independent claims that have been added via amendment, and thus will be addressed by the updated rejections below, aspects of the arguments which have been presented both in the After-Final Response dated February 23rd, 2026, and again in the Remarks filed March 26th, 2026, will be partially addressed here to clarify the examination record.
Applicant’s arguments primarily rely upon the idea that the Bokhari reference does not, and cannot, teach the claim limitations as amended, as the amended limitations require that the scheduler reuse a memory address value for a subsequent scheduling of operations by forgoing a “write operation to the SRAM memory to store the input data” of the operations.  Applicant has argued that Bokhari fails to teach this limitation, as Bokhari is directed toward forgoing “an off-chip DRAM operation rather than an internal SRAM write cycle”, for example, and that Bokhari is concerned with “reducing external memory accesses” rather than avoiding a write to SRAM memory.  This argument is once again not considered persuasive.  One of ordinary skill in the art would understand that, when loading data to be operated upon by a processing element, this data is loaded from whatever layer of memory is “closest” to the processing element, whether that is a local buffer or register file, a cache layer such as L0, L1, L2, or higher, or, if the data is not present in any closer layer, a “main”, “system”, or other “external” memory.  This is because the latency for accessing each higher layer of memory increases compared to the closer, and thus typically much smaller, layers such as internal buffers, registers, or fast-access memory.  If the data must be loaded from a higher cache level or external memory, it is then typically written to the closest available memory layer that can store it, such as a local L0 cache or other fast, closer, memory level.  This is explicitly described in, and therefore clearly understood by, the instant application and all of the previously cited references, as it is a routine and conventional aspect of instruction processing.
The instant application states, very explicitly, in paragraphs [0359] and [0360], “[i]f the main memory system 1070 of the edge device 1000 includes DRAM, the neural network processing unit 100 may operate to minimize memory access with the main memory system 1070” ([0359], emphasis added by Examiner), and “the neural network processing unit 100 of the edge device 1000 may be configured to control the reuse of data stored in the NPU memory system 120…and not to make a memory access request to the main memory system 120 [sic] when data is reused” ([0360], emphasis added by Examiner).  In the context of the instant application, the NPU memory system 120 is a fast, local SRAM, and the main memory system is the external DRAM or other slower memory array.  Later paragraphs of the instant application such as [0361-0362] describe the routine and conventional usage of these memory systems in line with the description given above, where data is loaded via access request from the (external) main memory system 1070, then stored in the (local, SRAM) NPU memory system 120 before it is operated upon by the neural processing unit.  This is entirely in line with the disclosure of Bokhari, which also uses a local SRAM memory module (Bokhari [0011], [0071]) that stores data in such a way that the reuse of said data avoids “additional memory accesses” to the slower, external memory system (Bokhari [0133], [0136], etc.).
While Bokhari describes this process as intended to reduce external memory accesses rather than internal SRAM write operations, one of ordinary skill in the art would understand that it is the accessing of the external memory which incurs the large latency penalty in execution of the program, and avoiding an external memory access by reusing data which is already present in the local SRAM memory therefore avoids not only the long penalty of reading data from the external memory system (or “main memory system” as referred to in the instant application), but also the comparatively smaller latency penalty of writing that data into the local memory before it is operated upon by the processing unit.  This is a routine and conventional aspect of the instruction processing art, and the previously cited references as well as the disclosure of the instant application cited above explicitly state that the reusage of data within the local (SRAM, in both the instant application and the Bokhari disclosure) memory is done in order to avoid the penalties associated with accessing the external (or “main”, in the instant application’s nomenclature) memory.  While Applicant’s arguments in the after-final and most recent responses have attempted to obscure this basic fact of instruction processing, one of ordinary skill in the art would understand that the “reuse” of data within a local memory which avoids an external memory access also avoids, albeit less importantly, the subsequent internal write operation wherein the data from the external memory would be stored to the local or internal memory (if, of course, the external memory access had not been avoided in the first place).  This is why the instant application, Bokhari, and the other previously cited references all use the word “reuse”: reuse is defined by using data already present in the internal (SRAM/local) memory.
Applicant’s arguments both in the most recent response and the after-final response seem to imply that, while Bokhari does avoid an external memory access by reusing data in the internal memory, Bokhari somehow does not avoid a subsequent internal write operation, despite not having read or loaded this fantastical new data which has no apparent source and yet is still somehow available to be written to the internal memory.  There is no possible interpretation or definition of “reuse” in the context of instruction processing which satisfies the implication described above, and therefore Applicant’s arguments are not considered persuasive.  As stated previously, these arguments are addressed toward limitations which have been added via amendment, so they will primarily be addressed in the rejections below.
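The memory-hierarchy reasoning above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not code drawn from Bokhari, Chinya, Du, or the instant application: `TwoLevelMemory` and its counters are hypothetical names, `external` stands in for DRAM and `local` for the NPU's SRAM, and a miss is assumed to be served by an external read followed by a write into SRAM. Under that model, a reused value costs neither an external read nor a local write.

```python
from dataclasses import dataclass, field


@dataclass
class TwoLevelMemory:
    """Hypothetical toy model: slow external memory (e.g. DRAM) backing a local SRAM.

    The counters record how many external reads and local SRAM writes occur.
    """
    external: dict[str, list[int]]
    local: dict[str, list[int]] = field(default_factory=dict)
    external_reads: int = 0
    local_writes: int = 0

    def load(self, key: str) -> list[int]:
        # Reuse: data already resident in local SRAM is returned directly,
        # so neither an external read nor a local write is issued.
        if key in self.local:
            return self.local[key]
        # Miss: read from external memory, then write the data into local SRAM
        # before the processing element consumes it.
        data = self.external[key]
        self.external_reads += 1
        self.local[key] = list(data)
        self.local_writes += 1
        return self.local[key]


mem = TwoLevelMemory(external={"x": [1, 2, 3]})
mem.load("x")  # first use: one external read and one local write
mem.load("x")  # reuse: no additional read or write
print(mem.external_reads, mem.local_writes)  # prints: 1 1
```

In this model the two counters always move together, which mirrors the point that avoiding the external access also avoids the write that would have followed it.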

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.  Claims 1-9 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Du (US 2020/0387800, cited in the IDS dated May 14th, 2025, herein Du) in view of Chinya et al. (US 2020/0410327, herein Chinya) and Bokhari et al. (US 2019/0187963, herein Bokhari, cited in the previous Office Action).

Regarding claim 1, Du teaches a neural network processing unit (NPU) for processing an artificial neural network model compiled by a compiler, the NPU comprising:
a processing element array (Fig 1a, array of processing circuits);
a memory configured to store at least one data of the artificial neural network model processed in the processing element array ([0159], memory, [0064-0065], [0072-0075], neural network models to be operated on by processing circuits); and
an NPU scheduler configured to control the processing element array and the SRAM memory based on predefined operation order information of the artificial neural network model processed by the processing element array ([0097], instruction stream indicating operating order of target neural network model).
Du fails to teach wherein the NPU includes SRAM memory or the NPU scheduler is configured to reuse a memory address value in which an operation value of a first layer of a first scheduling is stored as a memory address value corresponding to an input data of a second layer of a second scheduling, which is a next scheduling of the first scheduling.
Chinya teaches a neural network processing unit (NPU) comprising SRAM memory ([0034-0036], SRAM memory banks) and an NPU scheduler configured to reuse a memory address value in which an operation value of a first layer of a first scheduling is stored as a memory address value corresponding to an input data of a second layer of a second scheduling, which is a next scheduling of the first scheduling, without performing a separate memory write ([0022-0023], [0032], [0036], Table 1, [0046-0047], reuse of input data in neural network operations to improve memory and energy efficiency & claims 5, 17, [0034], [0044], reduce writes to memory via data reshaping).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Du and Chinya to utilize memory address reusage to improve the efficiency of the processor.  While Du does not explicitly contemplate the reusage of memory addresses between operations of neural network layer processing, Chinya explains how accounting for memory elements that may be utilized in multiple processing operations may improve the energy efficiency and memory optimization of a neural network processor.  As one of ordinary skill in the art would understand that the purpose of multi-tiered memory systems such as cache levels and local memory banks is to improve a memory hit rate and thus the efficiency of the processor, the combination would merely entail the simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
Du and Chinya fail to teach wherein the scheduler reuses a memory address value without performing a separate memory write operation to the SRAM memory to store the input data.
Bokhari teaches a neural network processing unit configured to reuse a memory address value of a first layer without performing a separate memory write operation to the SRAM memory ([0025], identify data of a CNN layer that can be reused without additional memory access, [0085], [0136], [0146], scheduling scheme analysis indicates reusable data that forgoes further memory accesses; reuse of data forgoes external memory access to write new data to SRAM local memory, instead reuse data already present in the local memory).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Du and Chinya with those of Bokhari to reuse data without additional memory access operations.  While Chinya does not explicitly state that the data reusage occurs “without” a further memory access operation, one of ordinary skill in the art would understand that the data reusage scheme of Chinya which utilizes “a large global buffer as shared storage to reduce DRAM access energy consumption” implies that the DRAM access reduction occurs as a result of not needing to issue additional access operations to the DRAM or another external memory, as taught explicitly by Bokhari.  Therefore, the combination would merely entail the simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
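The claimed reuse of a memory address between consecutive schedulings can be sketched in Python as follows. This is an illustrative toy model under stated assumptions, not the implementation of Chinya, Bokhari, or the instant application: the SRAM is modeled as a dict from address to value, each layer is a hypothetical scalar function, and `run_layers` with its `reuse_address` flag is a hypothetical name. With reuse, the next layer reads its input from the address where the previous layer stored its output. The baseline instead copies that output into a separate input buffer, which costs one extra SRAM write per layer boundary.

```python
from typing import Callable


def run_layers(layers: list[Callable[[int], int]], x: int, reuse_address: bool) -> tuple[int, int]:
    """Run layers in order over a toy SRAM; return (final output, SRAM write count)."""
    sram: dict[int, int] = {}
    writes = 0
    next_free = 0

    def write(value: int) -> int:
        # Store a value at the next free address and count the write.
        nonlocal writes, next_free
        addr = next_free
        sram[addr] = value
        writes += 1
        next_free += 1
        return addr

    in_addr = write(x)  # the network input is written into SRAM once
    for i, layer in enumerate(layers):
        # The current scheduling stores its operation value at out_addr.
        out_addr = write(layer(sram[in_addr]))
        is_last = i == len(layers) - 1
        if reuse_address or is_last:
            # Reuse: the next scheduling's input address is this output address.
            in_addr = out_addr
        else:
            # Baseline: a separate write copies the output into a new input buffer.
            in_addr = write(sram[out_addr])
    return sram[in_addr], writes


layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
print(run_layers(layers, 5, reuse_address=True))   # prints: (9, 4)
print(run_layers(layers, 5, reuse_address=False))  # prints: (9, 6)
```

Both variants compute the same result. Only the number of SRAM writes differs, and that difference is the "separate memory write operation" the amended claims recite forgoing.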

Regarding claim 2, the combination of Du, Chinya, and Bokhari teaches the NPU of claim 1, wherein the processing element array includes a plurality of processing elements configured to perform MAC operations (Chinya [0028], MAC operations).

Regarding claim 3, the combination of Du, Chinya, and Bokhari teaches the NPU of claim 1, wherein the NPU scheduler is configured to control a read and write order of the processing element array and the SRAM memory (Chinya [0029], schedule to minimize read/write energy consumption).

Regarding claim 4, the combination of Du, Chinya, and Bokhari teaches the NPU of claim 1, wherein the NPU scheduler is configured to control the processing element array and the SRAM memory by analyzing predefined operation order information of the artificial neural network model (Chinya [0023], analysis of NN model used to improve energy efficiency).

Regarding claim 5, the combination of Du, Chinya, and Bokhari teaches the NPU of claim 1, wherein the NPU scheduler is configured to schedule an operation order of the artificial neural network model based on a structural data of the artificial neural network model or an artificial neural network data locality information (Du [0133-0139], scheduling operations according to an algorithm including locality-based scheduling).

Regarding claim 6, the combination of Du, Chinya, and Bokhari teaches the NPU of claim 1, wherein the NPU scheduler is configured to access a memory address value where a node data and a weight data of layers of the artificial neural network model are stored based on a predefined operation order information of the artificial neural network model (Chinya [0029], [0036], reading input data and weights from memory addresses, Du [0097], predefined operation order).

Regarding claim 7, the combination of Du, Chinya, and Bokhari teaches the NPU of claim 1, wherein the NPU scheduler is configured to schedule a processing order based on a structural data from an input layer to an output layer of the artificial neural network or an artificial neural network data locality information (Du [0133-0139], scheduling operations according to an algorithm including locality-based scheduling).

Regarding claim 8, the combination of Du, Chinya, and Bokhari teaches the NPU of claim 1, wherein the SRAM memory includes static memory (Chinya [0024], static SRAM).

Regarding claim 9, the combination of Du, Chinya, and Bokhari teaches the NPU of claim 1, wherein the SRAM memory includes at least one of SRAM, MRAM, STT-MRAM, eMRAM, HBM, and OST-MRAM (Chinya [0024], SRAM).

Claims 14-20 recite an alternate embodiment of the NPU of claims 1-7.  Therefore, the above rejections of claims 1-7 apply to claims 14-20, respectively.

4.  Claims 10-12 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Du in view of Roberts et al. (US 2020/0174748, herein Roberts) and Bokhari.

Regarding claim 10, Du teaches a neural network processing unit (NPU) for processing an artificial neural network model compiled by a compiler comprising:
a processing element array (Fig 1a, array of processing circuits);
a memory configured to store the artificial neural network model processed in the processing element array ([0159], memory, [0064-0065], [0072-0075], neural network models to be operated on by processing circuits); and
an NPU scheduler configured to control the processing element array and the memory based on predefined operation order information of the artificial neural network model processed by the processing element array ([0097], instruction stream indicating operating order of target neural network model),
wherein the processing element array is configured to perform MAC operations (Fig 1A, [0042], multiplication and accumulation operations).
Du fails to teach the memory being an SRAM or wherein the processing element array is configured to quantize and output the MAC operation result.
Roberts teaches a neural network processing unit comprising an SRAM memory ([0034], SRAM memory) and a processing element configured to quantize and output a MAC operation result (Fig 3, [0019], neural network operations including multiplication and addition of results, [0022], quantization of result values of floating point operations).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Du and Roberts to utilize quantization.  While Du discloses the performance of multiplication and accumulation operations, Du does not explicitly disclose the details of such operations.  However, one of ordinary skill in the art would understand that quantization may improve the performance of the neural network, as disclosed by Roberts ([0022]).  As the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, the combination would have been obvious to one of ordinary skill in the art.
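As a rough illustration of a MAC operation whose result is then quantized, the sketch below accumulates integer products and maps the accumulator onto a signed 8-bit range using symmetric linear quantization. The scale factor, the bit width, and the function names `mac` and `quantize` are assumptions for illustration only. Roberts' actual quantization scheme may differ.

```python
def mac(inputs: list[int], weights: list[int]) -> int:
    """Multiply-accumulate: the sum of elementwise products."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = 0
    for a, w in zip(inputs, weights):
        acc += a * w  # multiply, then accumulate
    return acc


def quantize(acc: int, scale: float, bits: int = 8) -> int:
    """Symmetric linear quantization of an accumulator value to a signed integer.

    Python's round() rounds halves to even; the result is clamped to
    [-2**(bits - 1), 2**(bits - 1) - 1].
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    q = round(acc / scale)
    return max(qmin, min(qmax, q))


acc = mac([1, 2, 3], [4, 5, 6])  # 4 + 10 + 18 = 32
print(acc, quantize(acc, scale=0.5))  # prints: 32 64
```

Clamping to the representable range is what allows the quantized output to be stored in a narrower word than the accumulator.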
Du and Roberts fail to teach wherein the NPU scheduler is configured to reuse a memory address value in which an operation value of a first layer of a first scheduling is stored as a memory address value corresponding to an input data of a second layer of a second scheduling, which is a next scheduling of the first scheduling, without a separate memory write operation for the input data of the second layer.
Bokhari teaches a neural network processing unit configured to reuse a memory address value in which an operation value of a first layer of a first scheduling is stored as a memory address value corresponding to an input data of a second layer of a second scheduling, which is a next scheduling of the first scheduling, without performing a separate memory write operation to the SRAM memory ([0025], identify data of a CNN layer that can be reused without additional memory access, [0085], [0136], [0146], scheduling scheme analysis indicates reusable data that forgoes further memory accesses; reuse of data forgoes external memory access to write new data to SRAM local memory, instead reuse data already present in the local memory).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Du and Roberts with those of Bokhari to reuse data without additional memory access operations.  While Du and Roberts do not disclose the reusage of data within the neural network processor, one of ordinary skill in the art would understand that large parallel operations such as those required by convolutional neural network layers are often executed on repeated input data, and that improving the energy efficiency of the memory system of the processor is a routine and conventional result of not needing to issue additional access operations to the DRAM or another external memory, as taught explicitly by Bokhari.  Therefore, the combination would merely entail the simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.

Regarding claim 11, the combination of Du, Roberts, and Bokhari teaches the NPU of claim 10, wherein the processing element array includes a multiplier, an adder, an accumulator, and a bit quantization unit (Du [0042], multiply, add, and accumulate & Roberts [0022], quantization).

Regarding claim 12, the combination of Du, Roberts, and Bokhari teaches the NPU of claim 10, wherein the NPU scheduler is configured to recognize reusable variable values and reusable constant values based on predefined operation order information of the artificial neural network model and configured to control to reuse the SRAM memory using the reusable variable value and the reusable constant value (Du [0133-0139], locality based scheduling & Roberts [0024], [0031], [0048], [0062-0063], reuse of result values).
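One way to picture recognizing reusable variable values and reusable constant values from a predefined operation order is a simple use count over that order. The sketch below is a hypothetical model, not the method of Du or Roberts. `Op` and `find_reusable` are illustrative names, and constants (e.g. weights) are assumed to be supplied as a known set. Any value read by more than one operation is treated as reusable, and therefore as a candidate to keep resident in SRAM.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One step of a hypothetical predefined operation order."""
    name: str
    inputs: tuple[str, ...]
    output: str


def find_reusable(order: list[Op], constants: set[str]) -> tuple[set[str], set[str]]:
    """Split values read by more than one op into (variables, constants)."""
    uses = Counter(value for op in order for value in op.inputs)
    reusable = {value for value, count in uses.items() if count > 1}
    return reusable - constants, reusable & constants


order = [
    Op("conv1", ("x", "w1"), "a1"),
    Op("conv2", ("a1", "w2"), "a2"),
    Op("conv3", ("a1", "w2"), "a3"),
    Op("add", ("a2", "a3"), "y"),
]
variables, consts = find_reusable(order, constants={"w1", "w2"})
print(sorted(variables), sorted(consts))  # prints: ['a1'] ['w2']
```

Because the operation order is fixed before execution, this analysis can run once at compile or scheduling time rather than on every inference.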

Regarding claim 21, the combination of Du, Roberts, and Bokhari teaches the NPU of claim 10, wherein the NPU scheduler is configured to schedule an operation order of the artificial neural network (ANN) model based on ANN data locality information (Du [0133-0139], scheduling operations according to an algorithm including locality-based scheduling).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Catthoor (US 2020/0159809) discloses a processor that reuses processing results and avoids storing or writing intermediate computational results to a memory array.
Bannon (US 2019/0026078) discloses a processor for processing operations through a local buffer without accessing an SRAM.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J METZGER whose telephone number is (571)272-3105. The examiner can normally be reached Monday-Friday 8:30-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL J METZGER/             Primary Examiner, Art Unit 2183


Prosecution Timeline

Aug 07, 2025: Non-Final Rejection mailed (§103)
Nov 02, 2025: Response Filed
Jan 12, 2026: Final Rejection mailed (§103)
Feb 23, 2026: Response after Final Action
Mar 26, 2026: Request for Continued Examination
Apr 01, 2026: Response after Non-Final Action
Apr 09, 2026: Non-Final Rejection mailed (§103)
May 11, 2026: Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology:

18/337,723 (Patent 12632303): ACCELERATOR, METHOD OF OPERATING THE SAME, AND ELECTRONIC DEVICE INCLUDING THE SAME. Granted May 19, 2026; 2y 11m to grant.
18/814,641 (Patent 12632252): MEMORY DEVICE AND METHOD WITH PROCESSING-IN-MEMORY BLOCK. Granted May 19, 2026; 1y 8m to grant.
18/800,423 (Patent 12619463): Thread Creation on Local or Remote Compute Elements by a Multi-Threaded, Self-Scheduling Processor. Granted May 05, 2026; 1y 8m to grant.
18/889,148 (Patent 12621126): SM3 HASH ALGORITHM ACCELERATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS. Granted May 05, 2026; 1y 7m to grant.
17/514,549 (Patent 12613700): ZERO EXTENDED 52-BIT INTEGER FUSED MULTIPLY ADD AND SUBTRACT INSTRUCTIONS. Granted Apr 28, 2026; 4y 6m to grant.

Based on the examiner's 5 most recent grants.


Prosecution Projections

Expected OA rounds: 3-4
Grant probability: 90%
With interview (+8.0%): 98%
Median time to grant: 2y 7m (~5m remaining)
PTA risk: High

Based on 486 resolved cases by this examiner. Grant probability derived from career allowance rate.