Last updated: May 29, 2026
Application No. 17/944,014
PROCESSING UNIT ARCHITECTURES AND TECHNIQUES FOR REUSABLE INSTRUCTIONS AND DATA

Non-Final OA §103§112
Filed
Sep 13, 2022
Priority
Aug 21, 2020 — provisional 63/068,950 +1 more
Examiner
SADLER, NATHAN
Art Unit
2139
Tech Center
2100 — Computer Architecture & Software
Assignee
Memryx Incorporated
OA Round
6 (Non-Final)
Interview Optional

— +26.8% interview lift. Examiner has a relatively high allowance rate (71%); +26.8% interview lift. A written response may suffice.
Based on 672 resolved cases, 2023–2026
Examiner Intelligence

SADLER, NATHAN View full profile →
Grants 71% — above average
Career Allowance Rate
475 granted / 672 resolved
+15.7% vs TC avg
Strong +27% interview lift
Without
With
+26.8%
Interview Lift
resolved cases with interview
Typical timeline
2y 11m
Avg Prosecution
26 currently pending
Career history
698
Total Applications
across all art units
Statute-Specific Performance

§101
1.8%
-38.2% vs TC avg
§103
81.5%
+41.5% vs TC avg
§102
6.1%
-33.9% vs TC avg
§112
4.8%
-35.2% vs TC avg
Black line = Tech Center average estimate • Based on career data from 672 resolved cases
Office Action

§103 §112
DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
In the event a determination of the status of the application as subject to AIA  35 U.S.C. 102, 103, and 112 (or as subject to pre-AIA  35 U.S.C. 102, 103, and 112) is incorrect, any correction of the statutory basis for a rejection will not be considered a new ground of rejection if the prior art relied upon and/or the rationale supporting the rejection, would be the same under either status.  

Notice of Claim Interpretation
Claims in this application are not interpreted under 35 U.S.C. 112(f) unless otherwise noted in an office action.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 8 December 2025 has been entered.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

Claims 1-11 and 14-21 are rejected under 35 U.S.C. 112(a) as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, at the time the application was filed, had possession of the claimed invention.
Claims 1 and 11 have been amended to include the phrase “wherein the first portion of weights are memory mapped into the first on-chip memory”.  Paragraph 0019 of the specification teaches “For example, the weights of a neural network model can be very large and therefore cannot fit into the first on-chip memory.  Therefore data allocation and memory mapping can be utilized to minimize accesses to external memory.”  The specification is silent as to whether the memory mapping is used for the first portion of weights or another portion of weights.  The specification is also silent as to where the weights are memory mapped.  Arpaci-Dusseau et al. (Operating Systems: Three Easy Pieces) teaches that memory mapping is used to map a file into a region of virtual memory (page 14, paragraph 1).  .  Arpaci-Dusseau further teaches that the file is not yet in physical memory but needs to be brough into memory through demand paging (page 14, paragraph 2).  Just because memory mapping is utilized does not mean that the first portion of the weights are memory mapped.  It could be that only files containing other portions of the weights are the only files memory mapped.  Similarly, just because memory mapping is utilized does not mean that the data is memory mapped into the first on-chip memory.  The files containing the weights could instead be memory mapped into virtual memory space and then only the portions of the weights needed could be paged into external memory.  That would still minimize accesses to external memory because not all of the weights would be read into external memory.
Claims 2-10 and 14-21 are rejected based on their dependence on claim 1 or 11.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Jacob et al. (Memory Systems: Cache, DRAM, Disk) in view of Whatmough et al. (US 2021/0192323), Banville (US 2016/0173451), and Siu et al. (“Memory Requirements for Convolutional Neural Network Hardware Accelerators”).
In regards to claim 1, Jacob teaches a processing unit integrated circuit (IC) chip comprising:
a first on-chip memory configured to store reusable data and instructions (L3 cache, unified, On-chip, Table 6.3; “Locality optimizations exploit knowledge of the cache organization and operation; they either change the layout of data in memory or change the algorithm itself (and thus the order of accesses to the data), thereby improving the reuse of data and/or code in the cache.”, chapter 3, paragraph 3); and
a second on-chip memory configured to cache data and instructions stored in off-chip memory (L2 cache, unified, On-chip, Table 6.3; “In this section, we discuss the implementation of a static random-access memory (SRAM). This is the type of memory used as the building block of most caches because of its superior performance over other memory structures, specifically DRAM.”, section 5.2, paragraph 1); and
an on-chip compute circuitry (Integer and floating point units, figure 6.5).
Jacob fails to teach the reusable data and instructions including a first portion of weights, wherein the first portion of weights are memory mapped into the first on-chip memory; and
wherein the cache data includes a feature map, wherein the reusable data and instructions are stored in the first on-chip memory longer than the cache data and instructions in the second on-chip memory; and
the on-chip compute circuitry configured to compute a convolution of the feature map and the first portion of weights including one of an output retaining order, an input retaining order or a weight retaining order partial product computation of partitions of the feature map.
Whatmough teaches the reusable data and instructions including a first portion of weights (“In one embodiment, the weights are stored in writeable memory (volatile or non-volatile) on HA 170”, paragraph 0079; “Advantageously, the weights of the filters with respect to task A may be stored in read-only memory (e.g., ROM, etc.), while the ‘delta weights’ of each task-specific filter may be stored in writeable memory (e.g., flash, SRAM, etc.).”, paragraph 0077); and
wherein the cache data includes a feature map (“For a particular convolution layer implemented by a particular PE 190 of NVM HA 180, NoC interface 192 receives one or more input feature maps from the NoC, and stores the input feature maps in memory 193.”, paragraph 0087), wherein the reusable data and instructions are stored in the first on-chip memory longer than the cache data and instructions in the second on-chip memory (“Advantageously, the weights of the filters with respect to task A may be stored in read-only memory (e.g., ROM, etc.), while the ‘delta weights’ of each task-specific filter may be stored in writeable memory (e.g., flash, SRAM, etc.). In this manner, many different tasks, upgrades, improvements, etc., may be advantageously supported by a single CNN model using ‘delta-weight’ filters.”, paragraph 0077); and
the on-chip compute circuitry configured to compute a convolution of the feature map and the first portion of weights (“AMAC array 194 receives the input feature maps from memory 193 and performs a ‘full’ or normal convolution operation 410 on the input feature maps. This convolution operation uses the fixed weights that are represented within AMAC array 194.”, paragraph 0087)
in order to improve access to the weights (paragraph 0002).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jacob with Whatmough such that the reusable data and instructions including a first portion of weights; and
wherein the cache data includes a feature map, wherein the reusable data and instructions are stored in the first on-chip memory longer than the cache data and instructions in the second on-chip memory; and
the on-chip compute circuitry configured to compute a convolution of the feature map and the weights
in order to improve access to the weights (id.).
Jacob in view of Whatmough fails to teach that the first portion of weights are memory mapped into the first on-chip memory; and
the on-chip compute circuitry configured to compute a convolution of the feature map and the weights including one of an output retaining order, an input retaining order or a weight retaining order partial product computation of partitions of the feature map.
Banville teaches that data is memory mapped into the first on-chip memory (“Advantageously, at least a portion of the shared memory, such as, for example, at least a portion of the shared L2 cache, at least a portion of the shared L3 cache, etc., is configurable to be memory-mapped static random-access memory (SRAM). Shared memory, such as memory-mapped SRAM, may be used to store data, intra and inter core communication messages, etc.”, paragraph 0013) “in order to more efficiently process data” (paragraph 0014).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jacob with Whatmough and Banville such that the first portion of weights are memory mapped into the first on-chip memory “in order to more efficiently process data” (id.).
Jacob in view of Whatmough and Banville fails to teach the on-chip compute circuitry configured to compute a convolution of the feature map and the weights including one of an output retaining order, an input retaining order or a weight retaining order partial product computation of partitions of the feature map.  Siu teaches the on-chip compute circuitry configured to compute a convolution of the feature map and the weights including one of an output retaining order, an input retaining order or a weight retaining order partial product computation of partitions of the feature map (“In the inner loops (K, S, R, C) of Order 1 (Input-Major Order), each input window of size R × S × C is multiplied element-wise with each of the K filters, producing a 1 × 1 strip of output activations of length K. This computation is repeated for each output position, where the input window is shifted by a stride of m.”, section III(A), paragraph 2; “In Order 2 (Filter-Major Order), one filter is convolved across the entire input activation block, producing an output plane of size P × Q. This computation is repeated for each of the K filters. In this order, the input activation values are re-accessed multiple times for each filter.”, section III(A), paragraph 3) “to employ a computation order that exhibits temporal locality” (section III(A), paragraph 1).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jacob with Whatmough, Banville, and Siu such that the on-chip compute circuitry configured to compute a convolution of the feature map and the weights including one of an output retaining order, an input retaining order or a weight retaining order partial product computation of partitions of the feature map “to employ a computation order that exhibits temporal locality” (id.).
In regards to claim 2, Jacob further teaches that the second on-chip memory comprises on-chip volatile memory (L2 cache, On-chip, Table 6.3; “In this section, we discuss the implementation of a static random-access memory (SRAM). This is the type of memory used as the building block of most caches because of its superior performance over other memory structures, specifically DRAM.”, section 5.2, paragraph 1).
In regards to claim 3, Jacob further teaches that the on-chip volatile memory comprises on-chip static random access memory (SRAM) (L2 cache, On-chip, Table 6.3; “In this section, we discuss the implementation of a static random-access memory (SRAM). This is the type of memory used as the building block of most caches because of its superior performance over other memory structures, specifically DRAM.”, section 5.2, paragraph 1).
In regards to claim 4, Jacob further teaches that the first on-chip memory is further configured to update the stored reusable data and instructions from the off-chip memory (“A write-update policy will transparently update the various cached copies of a block: if processor A writes to a block, the write data will find its way into all cached copies of that block.”, page 241, paragraph 6).
In regards to claim 5, Jacob further teaches that the first on-chip memory is further configured to update the stored reusable data and instructions with run-time instructions from the compute circuitry (“A write-update policy will transparently update the various cached copies of a block: if processor A writes to a block, the write data will find its way into all cached copies of that block.”, page 241, paragraph 6).
In regards to claim 7, Whatmough further teaches that the second on-chip memory configured to cache a portion of the feature map (“For a particular convolution layer implemented by a particular PE 190 of NVM HA 180, NoC interface 192 receives one or more input feature maps from the NoC, and stores the input feature maps in memory 193.”, paragraph 0087) and a second portion of the weights (“any delta weights for a particular convolutional layer are stored in local memory, i.e., memory 193, of the corresponding PE 190”, paragraph 0085; “Advantageously, the weights of the filters with respect to task A may be stored in read-only memory (e.g., ROM, etc.), while the ‘delta weights’ of each task-specific filter may be stored in writeable memory (e.g., flash, SRAM, etc.).”, paragraph 0077)
In regards to claim 9, Whatmough further teach that the first on-chip memory comprises on-chip non-volatile memory (“In one embodiment, the weights are stored in writeable memory (volatile or non-volatile) on HA 170”, paragraph 0079; “To improve access, the weights may be stored in a memory that is located closer to the ANN processor, such as on-chip non-volatile memory (NVM) including, for example, flash memory, read-only memory (ROM), etc.”, paragraph 0002).
In regards to claim 10, Whatmough further teaches that the on-chip non-volatile memory comprises a non-volatile memory selected from a group consisting of resistive random-access memory (RRAM), flash memory, and magnetoresistive random-access memory (MRAM) (“To improve access, the weights may be stored in a memory that is located closer to the ANN processor, such as on-chip non-volatile memory (NVM) including, for example, flash memory, read-only memory (ROM), etc.”, paragraph 0002).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Jacob et al. (Memory Systems: Cache, DRAM, Disk) in view of Whatmough et al. (US 2021/0192323), Banville (US 2016/0173451), Siu et al. (“Memory Requirements for Convolutional Neural Network Hardware Accelerators”), and Gokmen (US 2017/0124025).
In regards to claim 6, Jacob in view of Whatmough, Banville, and Siu teaches claim 1.  Jacob in view of Whatmough, Banville, and Siu fails to teach that the processing unit IC chip comprises a resistive processing unit (RPU).  Gokmen teaches that the processing unit IC chip comprises a resistive processing unit (RPU) (“In accordance with the present principles, computer architectures are provided where single resistive cross point devices are employed as processing units to accelerate computational operations for applications, such as, e.g., neural network training algorithms and matrix operations. The single resistive cross point devices called resistive processing units (RPUs) can be organized so that the RPUs become programmable.”, paragraph 0017) “to accelerate computational operations for applications” (id.).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jacob with Whatmough, Banville, Siu, and Gokmen such that the processing unit IC chip comprises a resistive processing unit (RPU) “to accelerate computational operations for applications” (id.).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Jacob et al. (Memory Systems: Cache, DRAM, Disk) in view of Whatmough et al. (US 2021/0192323), Banville (US 2016/0173451), Siu et al. (“Memory Requirements for Convolutional Neural Network Hardware Accelerators”), and Anonymous (“A Simple Method to Reduce Off-chip Memory Accesses on Convolutional Neural Networks”).
In regards to claim 8, Jacob in view of Whatmough, Banville, and Siu teaches claim 7.  Jacob in view of Whatmough, Banville, and Siu fails to teach that an allocation of the first portion of the weights and the second portion of the weights is based on one or more of a computation order, partition scheme, a layer fusion and a skip-connection of an artificial intelligence model and storage of the weights in the off-chip memory.  Anonymous teaches that an allocation of the first portion of the weights and the second portion of the weights is based on one or more of a computation order, partition scheme, a layer fusion and a skip-connection of an artificial intelligence model and storage of the weights in the off-chip memory (“First, we skip modules configured by a long skip-connection since it is extremely difficult to manage them within on-chip memory efficiently.”, section 3, paragraph 3) in order “to reduce off-chip memory accesses” (abstract).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jacob with Whatmough, Banville, Siu, and Anonymous such that an allocation of the first portion of the weights and the second portion of the weights is based on one or more of a computation order, partition scheme, a layer fusion and a skip-connection of an artificial intelligence model and storage of the weights in the off-chip memory in order “to reduce off-chip memory accesses” (id.).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US 2020/0285942) in view of Whatmough et al. (US 2021/0192323), Banville (US 2016/0173451), and Siu et al. (“Memory Requirements for Convolutional Neural Network Hardware Accelerators”).
In regards to claim 11, Yan teaches a system comprising:
an off-chip memory configured to store weights and a feature map (“Specifically, the accelerator 210 may read input data (e.g., input feature maps and weights) from the memory 240, e.g., read into an on-chip memory (on-chip cache) in the accelerator 210, and process the input data, e.g., performing convolution on the input data, Bias Activation Pooling (BAP) operations to obtain output data, and storing the output data in the memory 240.”, paragraph 0029); and
a processing unit integrated circuit (IC) chip (“The accelerator 210 and the processor 220 are disposed on-chip, and can access the off-chip memory 240 through the interconnection 230.”, paragraph 0027) including;
an on-chip memory configured to store portions of the weights and cache a portion of the feature map (“Specifically, the accelerator 210 may read input data (e.g., input feature maps and weights) from the memory 240, e.g., read into an on-chip memory (on-chip cache) in the accelerator 210, and process the input data, e.g., performing convolution on the input data, Bias Activation Pooling (BAP) operations to obtain output data, and storing the output data in the memory 240.”, paragraph 0029); and
an on-chip compute circuitry (“The accelerator 210 is configured to implement data processing.”, paragraph 0029) configured to execute an artificial intelligence model (“The technical solution of the embodiments of the present disclosure can be applied to various neural networks, such as CNN”, paragraph 0024) including computation of a convolution of the feature set and the weights (“Specifically, the accelerator 210 may read input data (e.g., input feature maps and weights) from the memory 240, e.g., read into an on-chip memory (on-chip cache) in the accelerator 210, and process the input data, e.g., performing convolution on the input data, Bias Activation Pooling (BAP) operations to obtain output data, and storing the output data in the memory 240.”, paragraph 0029)
Yan fails to teach the on-chip memory comprising: 
a first on-chip memory configured to store a first portion of the weights; and
a second on-chip memory configured to cache a portion of the feature map and a second portion of the weights; and
the computation of the convolution of the feature map and the weights includes one of an output retaining order, an input retaining order or a weight retaining order partial product computation of partitions of the feature map, wherein the selected one of the output retaining order, input retaining order or weight retaining order and a partition scheme of the feature map is selected to reduce external memory access to the off-chip memory as a function of one or more constraints including a feature map size, a weights size, size of the second on-chip memory and a range of axes of the partitions.
Whatmough teaches the on-chip memory comprising:
a first on-chip memory configured to store a first portion of the weights (“In one embodiment, the weights are stored in writeable memory (volatile or non-volatile) on HA 170”, paragraph 0079; “Advantageously, the weights of the filters with respect to task A may be stored in read-only memory (e.g., ROM, etc.), while the ‘delta weights’ of each task-specific filter may be stored in writeable memory (e.g., flash, SRAM, etc.).”, paragraph 0077); and
a second on-chip memory configured to cache a portion of the feature map (“For a particular convolution layer implemented by a particular PE 190 of NVM HA 180, NoC interface 192 receives one or more input feature maps from the NoC, and stores the input feature maps in memory 193.”, paragraph 0087) and a second portion of the weights (“any delta weights for a particular convolutional layer are stored in local memory, i.e., memory 193, of the corresponding PE 190”, paragraph 0085; “Advantageously, the weights of the filters with respect to task A may be stored in read-only memory (e.g., ROM, etc.), while the ‘delta weights’ of each task-specific filter may be stored in writeable memory (e.g., flash, SRAM, etc.).”, paragraph 0077)
in order to improve access to the weights (paragraph 0002).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Yan with Whatmough such that the on-chip memory comprising:
a first on-chip memory configured to store a first portion of the weights; and
a second on-chip memory configured to cache a portion of the feature map and a second portion of the weights
in order to improve access to the weights (id.).
Yan in view of Whatmough fails to teach that the first portion of weights are memory mapped into the first on-chip memory; and
the computation of the convolution of the feature map and the weights includes one of an output retaining order, an input retaining order or a weight retaining order partial product computation of partitions of the feature map, wherein the selected one of the output retaining order, input retaining order or weight retaining order and a partition scheme of the feature map is selected to reduce external memory access to the off-chip memory as a function of one or more constraints including a feature map size, a weights size, size of the second on-chip memory and a range of axes of the partitions.
Banville teaches that data is memory mapped into the first on-chip memory (“Advantageously, at least a portion of the shared memory, such as, for example, at least a portion of the shared L2 cache, at least a portion of the shared L3 cache, etc., is configurable to be memory-mapped static random-access memory (SRAM). Shared memory, such as memory-mapped SRAM, may be used to store data, intra and inter core communication messages, etc.”, paragraph 0013) “in order to more efficiently process data” (paragraph 0014).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Yan with Whatmough and Banville such that the first portion of weights are memory mapped into the first on-chip memory “in order to more efficiently process data” (id.).
Yan in view of Whatmough and Banville fails to teach the computation of the convolution of the feature map and the weights includes one of an output retaining order, an input retaining order or a weight retaining order partial product computation of partitions of the feature map, wherein the selected one of the output retaining order, input retaining order or weight retaining order and a partition scheme of the feature map is selected to reduce external memory access to the off-chip memory as a function of one or more constraints including a feature map size, a weights size, size of the second on-chip memory and a range of axes of the partitions.  Siu teaches the computation of the convolution of the feature map and the weights includes one of an output retaining order, an input retaining order or a weight retaining order partial product computation of partitions of the feature map (“In the inner loops (K, S, R, C) of Order 1 (Input-Major Order), each input window of size R × S × C is multiplied element-wise with each of the K filters, producing a 1 × 1 strip of output activations of length K. This computation is repeated for each output position, where the input window is shifted by a stride of m.”, section III(A), paragraph 2; “In Order 2 (Filter-Major Order), one filter is convolved across the entire input activation block, producing an output plane of size P × Q. This computation is repeated for each of the K filters. In this order, the input activation values are re-accessed multiple times for each filter.”, section III(A), paragraph 3), wherein the selected one of the output retaining order, input retaining order or weight retaining order and a partition scheme of the feature map is selected to reduce external memory access to the off-chip memory as a function of one or more constraints including a feature map size, a weights size, size of the second on-chip memory and a range of axes of the partitions (“However, the order of computation affects when the operands (activations and weights) are accessed from memory. Therefore, it is necessary to employ a computation order that exhibits temporal locality of the operand values. Figure 3 shows two orders for the computation that enable such reuse of the weights and input activations respectively.”, section III(A), paragraph 1) “to employ a computation order that exhibits temporal locality” (section III(A), paragraph 1).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Yan with Whatmough, Banville, and Siu such that the computation of the convolution of the feature map and the weights includes one of an output retaining order, an input retaining order or a weight retaining order partial product computation of partitions of the feature map, wherein the selected one of the output retaining order, input retaining order or weight retaining order and a partition scheme of the feature map is selected to reduce external memory access to the off-chip memory as a function of one or more constraints including a feature map size, a weights size, size of the second on-chip memory and a range of axes of the partitions “to employ a computation order that exhibits temporal locality” (id.).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US 2020/0285942) in view of Whatmough et al. (US 2021/0192323), Banville (US 2016/0173451), Siu et al. (“Memory Requirements for Convolutional Neural Network Hardware Accelerators”), and Ferdman et al. (US 2019/0220734).
In regards to claim 14, Yan in view of Whatmough, Banville, and Siu teaches claim 12.  Yan in view of Whatmough, Banville, and Siu fails to teach that an allocation of the first portion of the weights and the second portion of the weights is further based on a layer fusion of the artificial intelligence model.  Ferdman teaches that an allocation of the first portion of the weights and the second portion of the weights is further based on a layer fusion of the artificial intelligence model (“Only the output feature maps 104 of the last fused layer 114 are written off chip to external memory 106. More specifically, an output data region of intermediate feature maps computed and outputted by a convolutional layer depends only on an input data region of input feature maps that are inputted to that convolutional layer. The exploitation of this data locality in the fused layer dataflow of the fused layer convolutional neural network 100 allows the data to be passed directly from one convolutional layer to the next, without writing and reading the intermediate data to and from the external memory 106.”, paragraph 0059) thereby “saving energy and/or saving computing memory during network training for CNN implementations” (paragraph 0010).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Yan with Whatmough, Siu, and Ferdman such that an allocation of the first portion of the weights and the second portion of the weights is further based on a layer fusion of the artificial intelligence model thereby “saving energy and/or saving computing memory during network training for CNN implementations” (id.).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US 2020/0285942) in view of Whatmough et al. (US 2021/0192323), Banville (US 2016/0173451), Siu et al. (“Memory Requirements for Convolutional Neural Network Hardware Accelerators”), and Anonymous (“A Simple Method to Reduce Off-chip Memory Accesses on Convolutional Neural Networks”).
In regards to claim 15, Yan in view of Whatmough, Banville, and Siu teaches claim 12.  Yan in view of Whatmough, Banville, and Siu fails to teach that the allocation of the first portion of the weights and the second portion of the weights is further based on a skip-connection.  Anonymous teaches that the allocation of the first portion of the weights and the second portion of the weights is further based on a skip-connection (“First, we skip modules configured by a long skip-connection since it is extremely difficult to manage them within on-chip memory efficiently.”, section 3, paragraph 3) in order “to reduce off-chip memory accesses” (abstract).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Yan with Whatmough, Banville, Siu, and Anonymous such that the allocation of the first portion of the weights and the second portion of the weights is further based on a skip-connection in order “to reduce off-chip memory accesses” (id.).

Claims 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US 2020/0285942) in view of Whatmough et al. (US 2021/0192323), Banville (US 2016/0173451), Siu et al. (“Memory Requirements for Convolutional Neural Network Hardware Accelerators”), and Jacob et al. (Memory Systems: Cache, DRAM, Disk).
In regards to claim 16, Whatmough further teaches that the first on-chip memory comprises on-chip non-volatile memory (“Advantageously, the weights of the filters with respect to task A may be stored in read-only memory (e.g., ROM, etc.), while the ‘delta weights’ of each task-specific filter may be stored in writeable memory (e.g., flash, SRAM, etc.).”, paragraph 0077); and
the second on-chip memory comprises on-chip volatile memory (“Advantageously, the weights of the filters with respect to task A may be stored in read-only memory (e.g., ROM, etc.), while the ‘delta weights’ of each task-specific filter may be stored in writeable memory (e.g., flash, SRAM, etc.).”, paragraph 0077).
Yan in view of Whatmough, Banville, and Siu fails to teach that the off-chip memory comprises off-chip volatile memory.  Jacob teaches that the off-chip memory comprises off-chip volatile memory (“DRAM: DRAM provides a random-access storage that is relatively large, relatively fast, and relatively cheap. It is large and cheap compared to cache, and it is fast compared to disk.”, page 3, paragraph 5) because “it is just fast enough and just cheap enough to act as an operating store” (page 3, paragraph 5).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Yan with Whatmough, Banville, Siu, and Jacob such that the off-chip memory comprises off-chip volatile memory because “it is just fast enough and just cheap enough to act as an operating store” (id.).
In regards to claim 17, Whatmough further teaches that the on-chip non-volatile memory comprises a non-volatile memory selected from a group consisting of resistive random-access memory (RRAM), flash memory, and magnetoresistive random-access memory (MRAM) (“To improve access, the weights may be stored in a memory that is located closer to the ANN processor, such as on-chip non-volatile memory (NVM) including, for example, flash memory, read-only memory (ROM), etc.”, paragraph 0002); and
the on-chip volatile memory comprises on-chip static random-access memory (SRAM) (“Advantageously, the weights of the filters with respect to task A may be stored in read-only memory (e.g., ROM, etc.), while the ‘delta weights’ of each task-specific filter may be stored in writeable memory (e.g., flash, SRAM, etc.).”, paragraph 0077).
Jacob further teaches that the off-chip volatile memory comprises off-chip dynamic random-access memory (DRAM) (“DRAM: DRAM provides a random-access storage that is relatively large, relatively fast, and relatively cheap. It is large and cheap compared to cache, and it is fast compared to disk.”, page 3, paragraph 5).
In regards to claim 18, Yan in view of Whatmough, Banville, and Siu teaches claim 11.  Yan in view of Whatmough, Banville, and Siu fails to teach that the first on-chip memory is further configured to store a first portion of instructions of the artificial intelligence model; and
the second on-chip memory is further configured to cache a second portion of instructions of the artificial intelligence model.
Jacob teaches that the first on-chip memory is further configured to store a first portion of the instructions of the artificial intelligence model (L3 cache, unified, On-chip, Table 6.3); and
the second on-chip memory is further configured to cache a second portion of instructions of the artificial intelligence model (L2 cache, unified, On-chip, Table 6.3) 
“to hide the latency to the backing store” (page 130, paragraph 6).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Yan with Whatmough, Siu, and Jacob such that the first on-chip memory is further configured to store a first portion of the instructions of the artificial intelligence model; and
the second on-chip memory is further configured to cache a second portion of instructions of the artificial intelligence model
“to hide the latency to the backing store” (id.).
In regards to claim 19, Jacob further teaches that the first on-chip memory is further configured for updating one or both of the first portion of the weights and the first portion of the instructions of the artificial intelligence model from the off-chip memory (“A write-update policy will transparently update the various cached copies of a block: if processor A writes to a block, the write data will find its way into all cached copies of that block.”, page 241, paragraph 6).
In regards to claim 20, Jacob further teaches that the first on-chip memory is further configured for updating one or both of the first portion of the weights and the first portion of the instructions of the artificial intelligence model with run-time instructions from the compute circuitry (“A write-update policy will transparently update the various cached copies of a block: if processor A writes to a block, the write data will find its way into all cached copies of that block.”, page 241, paragraph 6).

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US 2020/0285942) in view of Whatmough et al. (US 2021/0192323), Banville (US 2016/0173451), Siu et al. (“Memory Requirements for Convolutional Neural Network Hardware Accelerators”), and Choi et al. (“Topology Preserving Neural Networks that Achieve a Prescribed Feature Map Probability Density Distribution”).
In regards to claim 21, Yan in view of Whatmough, Banville, and Siu teaches claim 11.  Yan in view of Whatmough, Banville, and Siu fails to teach that the off-chip memory stores the feature map based on a mapping of the feature map to a topology of the artificial intelligence model.  Choi teaches that the off-chip memory stores the feature map based on a mapping of the feature map to a topology of the artificial intelligence model (“The network is trained from a sequence of samples of the input variable u(t). The ordered image formed after convergence is commonly denoted as a topology preserving feature map, as it preserves some notion of the proximity of the input signal features.”, section 1, paragraph 2) in order that the output weights converge (section 1, paragraph 4).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Yan with Whatmough, Banville, Siu, and Choi such that the off-chip memory stores the feature map based on a mapping of the feature map to a topology of the artificial intelligence model in order that the output weights converge (id.).

Response to Arguments
Applicant’s arguments with respect to the claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zheng (US 12,443,832) teaches storing weights and instructions in on-chip memory.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NATHAN SADLER whose telephone number is (571)270-7699. The examiner can normally be reached Monday - Friday 8am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon can be reached at (571)272-4204. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Nathan Sadler/Primary Examiner, Art Unit 2139                                                                                                                                                                                                        9 February 2026
Read full office action
Prosecution Timeline

Show 7 earlier events
Feb 13, 2025
Request for Continued Examination
Feb 14, 2025
Response after Non-Final Action
May 21, 2025
Non-Final Rejection mailed — §103, §112
Aug 20, 2025
Response Filed
Sep 08, 2025
Final Rejection mailed — §103, §112
Dec 08, 2025
Request for Continued Examination
Dec 19, 2025
Response after Non-Final Action
Feb 11, 2026
Non-Final Rejection mailed — §103, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/209,089
Patent 12638997
METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR DATA MIGRATION
2y 11m to grant Granted May 26, 2026
18/766,226
Patent 12632376
MEMORY SYSTEMS AND OPERATION METHODS THEREOF AND STORAGE DEVICES AND OPERATION METHODS THEREOF
1y 10m to grant Granted May 19, 2026
18/941,954
Patent 12632188
INITIALIZING MEMORY SYSTEMS
1y 6m to grant Granted May 19, 2026
18/797,618
Patent 12625699
APPARATUS AND METHODS FOR AUTOMATICALLY REMOVING DATA OF MOBILE DEVICES
1y 9m to grant Granted May 12, 2026
18/794,937
Patent 12619534
INTERCONNECT BASED ADDRESS MAPPING FOR IMPROVED RELIABILITY
1y 9m to grant Granted May 05, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

6-7
Expected OA Rounds
71%
Grant Probability
98%
With Interview (+26.8%)
2y 11m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 672 resolved cases by this examiner. Grant probability derived from career allowance rate.