Prosecution Insights
Last updated: April 19, 2026
Application No. 18/331,734

LARGE TENSOR TILING

Non-Final OA: §101, §103, §112

Filed: Jun 08, 2023
Examiner: MAIDO, MAGGIE T
Art Unit: 2129
Tech Center: 2100 — Computer Architecture & Software
Assignee: Microsoft Technology Licensing, LLC
OA Round: 1 (Non-Final)

Grant Probability: 64% (Moderate)
Expected OA Rounds: 1-2
Estimated Time to Grant: 4y 3m
Grant Probability with Interview: 85%

Examiner Intelligence

Career Allow Rate: 64% (23 granted / 36 resolved; +8.9% vs TC avg)
Interview Lift: +20.7% on resolved cases with an interview (strong)
Typical Timeline: 4y 3m average prosecution; 51 applications currently pending
Career History: 87 total applications across all art units

Statute-Specific Performance

§101: 25.6% (-14.4% vs TC avg)
§103: 56.1% (+16.1% vs TC avg)
§102: 2.6% (-37.4% vs TC avg)
§112: 15.3% (-24.7% vs TC avg)

Tech Center averages are estimates; based on career data from 36 resolved cases.
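For readers checking the arithmetic, the headline figures above are reproducible from the raw career counts. A minimal sketch, assuming the dashboard simply rounds the granted/resolved ratio and adds the interview lift to the base allow rate (both formulas are assumptions about the methodology, not documented):

```python
# Hypothetical reconstruction of the headline numbers from the raw counts
# shown in the Examiner Intelligence card above.

granted, resolved = 23, 36                   # "23 granted / 36 resolved"
allow_rate = granted / resolved              # career allow rate
interview_lift = 0.207                       # "+20.7% Interview Lift"

print(f"{allow_rate:.1%}")                   # 63.9%, displayed as 64%
print(f"{allow_rate + interview_lift:.0%}")  # 85%, the "With Interview" figure
```

Note that 63.9% + 20.7% ≈ 84.6%, which rounds to the 85% "With Interview" probability shown, consistent with a simple additive-lift model.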

Office Action

Rejections: §101, §103, §112
DETAILED ACTION

This action is responsive to the claims filed on 8 June 2023. Claims 1-20 are pending for examination.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 1 and analogous claims 11 and 17 recite the limitation "the input sensor data" in line 10. There is insufficient antecedent basis for this limitation in the claims. For examination purposes, the term "the input sensor data" has been construed to be "input sensor data". Claims 2-10, 12-16, and 18-20, which depend on claims 1, 11, and 17, are similarly rejected.

Claim 18 recites the limitation "the PE weight memories" in line 2. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the term "the PE weight memories" has been construed to be "PE weight memories". Claim 19, which depends on claim 18, is similarly rejected.

Claim 18 recites the limitation "the input tiles" in line 7. There is insufficient antecedent basis for this limitation in the claim.
For examination purposes, the term "the input tiles" has been construed to be "input tiles". Claim 19, which depends on claim 18, is similarly rejected.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea) without significantly more.

Step 1: This part of the eligibility analysis evaluates whether the claims fall within any statutory category (MPEP 2106.03). According to the first part of the Alice analysis, in the instant case the claims were determined to fall within the statutory categories of a method/process (claims 11-16) and a machine/system/product (claims 1-10, 17-20). Because the claims fall within one of the four categories (i.e., process, machine, manufacture, or composition of matter) (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, natural phenomenon, or abstract idea).

Step 2A Prong One: This part of the eligibility analysis evaluates whether the claims recite a judicial exception. Independent claims 1, 11, and 17 recite a judicial exception (i.e., an abstract idea enumerated in the 2019 PEG) without significantly more (Step 2A, Prong One). Under the broadest reasonable interpretation, the claim limitations cover activities classified as mental processes - concepts performed in the human mind (including an observation, evaluation, judgment, or opinion); see MPEP § 2106.04(a)(2), subsection III, and the 2019 PEG.
As evaluated below:

Claims 1, 11, 17: "the data router configured to: determine a split of the input tensor into a plurality of tiles based on the array of interconnected PEs and dimensions of the input tensor" (mental process of judgment).

Because the identified limitation falls within at least one of the groupings of abstract ideas, it is reasonable to conclude that the claims recite an abstract idea at Step 2A Prong One.

Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claims as a whole integrate the recited judicial exception into a practical application of the exception. As evaluated below:

"a systolic array comprising an array of interconnected processing elements (PEs), each PE associated with a PE data memory configured to store at least a portion of a tensor"

"split the input tensor into the plurality of tiles, including a first tile and a second tile overlapping a shared edge, by routing the input tensor data to the PE data memories that store the plurality of tiles"

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because they are directed to instructions for mere data gathering or data output; see MPEP 2106.05(g).

"a data router configured to perform tensor tiling of an input tensor"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h).

Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.
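For context on the limitation quoted above ("determine a split of the input tensor into a plurality of tiles based on the array of interconnected PEs and dimensions of the input tensor"), a minimal sketch of what such a split determination could look like. The one-tile-per-PE policy and the shapes are illustrative assumptions, not taken from the application:

```python
import math

def determine_split(tensor_shape, pe_grid):
    """Return the per-tile shape when a 2-D input tensor is split
    across a PE grid, one tile per PE (illustrative policy only)."""
    rows, cols = tensor_shape
    pe_rows, pe_cols = pe_grid
    # Ceiling division so the tiles always cover the full tensor,
    # even when the dimensions do not divide evenly.
    return math.ceil(rows / pe_rows), math.ceil(cols / pe_cols)

# A 256x256 input tensor on a 4x4 systolic array -> 64x64 tiles.
print(determine_split((256, 256), (4, 4)))  # (64, 64)
```

A one-line arithmetic rule like this is part of why the examiner characterizes the split determination as a mental process; the claim's answer is that the split is tied to the physical PE array and its memories.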
Step 2B: This part of the eligibility analysis evaluates whether the claim, as a whole, amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim (MPEP 2106.05).

First, the additional elements considered as part of the preamble, and the additional elements directed to the use of computer technology, are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technological environment; see MPEP 2106.05(h).

Second, the additional elements directed to mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are deemed insufficient to transform the judicial exception into a patentable invention because those limitations generally apply a generic computer and/or process to the judicial exception; see MPEP 2106.05(f).

Third, the claims are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception. The courts have found these types of limitations insufficient to transform the judicial exception into a patentable invention; see MPEP 2106.05(h).

Lastly, the claims are directed to data-gathering activity, as noted above, which is deemed insignificant extra-solution activity. The courts have found these types of limitations insufficient to qualify as "significantly more"; see MPEP 2106.05(g).

Furthermore, the examiner has considered the evidence in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018); see USPTO Berkheimer Memorandum (April 2018).
Examiner notes Berkheimer Option 2 - a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) as noting the well-understood, routine, conventional nature of the additional elements (e.g., limitations directed to mere data gathering): the courts have recognized such computer functions as well-understood, routine, and conventional when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity; see MPEP 2106.05(d).

The additional limitations, as analyzed, fail to integrate the judicial exception into a practical application at Step 2A and do not provide an inventive concept at Step 2B. Thus, considering the additional elements individually and in combination, and the claims as a whole, the additional elements do not provide significantly more than the abstract idea, and the claims are not patent eligible. Therefore, examining the recited elements individually and as an ordered combination, as a whole, claims 1, 11, and 17 do not recite what the courts have identified as "significantly more".

Furthermore, regarding dependent claims 2-10 (which depend from claim 1), claims 12-16 (which depend from claim 11), and claims 18-20 (which depend from claim 17), the claims are directed to a judicial exception (i.e., an abstract idea enumerated in the 2019 PEG, a law of nature, or a natural phenomenon) without significantly more, as evaluated below under Step 2A and Step 2B:

Claim 2: Incorporates the rejection of claim 1.

"an input handler configured to provide an indication of the determined split to the data router"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g).
Limitations directed to instructions for mere data gathering or data output cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 3: Incorporates the rejection of claim 1.

"wherein each PE is associated with a PE convolution engine configured to perform a convolution on a respective portion of a tile stored in the associated PE data memory"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 4: Incorporates the rejection of claim 3.

"a systolic controller configured to control each of the PE convolution engines to perform the convolution on the respective portion of the tile stored in the associated PE data memory based on the split"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claims 5, 13, 19: Incorporate the rejections of claims 3, 12, and 18, respectively.
"wherein the PE convolution engine is configured to perform the convolution on respective portions of multiple tiles stored in the associated PE data memory by reusing data in the associated PE data memory that overlaps the shared edge"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claims 6, 14, 20: Incorporate the rejections of claims 1, 11, and 17, respectively.

"wherein the data router is configured to route the input tensor to the PE data memories that store the plurality of tiles with data along the shared edge duplicated in a first PE data memory associated with a first PE and in a second PE data memory associated with a second PE"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g). Limitations directed to instructions for mere data gathering or data output cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claims 7, 15: Incorporate the rejections of claims 1 and 11, respectively.

"wherein the data router is further configured to transpose the plurality of tiles in the PE data memories by storing tile rows as columns in the PE data memories"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g).
Limitations directed to instructions for mere data gathering or data output cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claims 8, 16: Incorporate the rejections of claims 1 and 11, respectively.

"wherein each PE is further associated with a PE weight memory and wherein the data router is further configured to route weights to the PE weight memories based on the routing of the input tensor to store the plurality of tiles in the PE data memories"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g). Limitations directed to instructions for mere data gathering or data output cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 9: Incorporates the rejection of claim 1.

"wherein the data router comprises a hardware-implemented algorithm"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 10: Incorporates the rejection of claim 1.

"wherein the systolic array comprises a scalable array of interconnected PEs"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h).
Limitations directed to mere instructions indicating a field of use or technological environment cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 12: Incorporates the rejection of claim 11.

"further comprising: performing a convolution on the input tensor by performing, by a PE convolution engine associated with each PE, a convolution on respective portions of the input tiles stored in the associated PE data memory"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 18: Incorporates the rejection of claim 17.

"wherein the data router is further configured to route weights to the PE weight memories based on the routing of the input tensor to store the plurality of tiles in the PE data memories"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g).
"wherein each PE is associated with a PE convolution engine configured to perform a convolution on the input tensor by performing a convolution on respective portions of the input tiles stored in the associated PE data memory with the weights stored in the associated PE weight memories"

This recitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to instructions for mere data gathering or data output, or to instructions merely indicating a field of use or technological environment, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

The dependent claims, as analyzed above, do not recite limitations that integrate the judicial exception into a practical application. In addition, the claim limitations do not include additional elements sufficient to amount to significantly more than the judicial exception (Step 2B). Therefore, the claims do not recite any limitations, considered individually or as a whole, that the courts have identified as "significantly more" (see MPEP 2106.05), and the claims as a whole are not patent eligible. As shown above, the dependent claims do not provide any additional elements that, considered individually or as an ordered combination, amount to significantly more than the identified abstract idea. Therefore, claims 2-10, 12-16, and 18-20 are rejected under 35 U.S.C.
101 because the claimed invention is directed to a judicial exception and does not recite, when the claim elements are examined individually and as a whole, elements that the courts have identified as "significantly more" than the recited judicial exception.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering the patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention, in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C.
102(a)(2) prior art against the later invention.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yu et al. (U.S. Patent No. 11,494,321, hereinafter "Yu") in view of Taba et al. (U.S. Pre-Grant Publication No. 2020/0104718, hereinafter "Taba").

Regarding claim 1 and analogous claims 11 and 17, Yu teaches a computing system, comprising: a systolic array comprising an array of interconnected processing elements (PEs), each PE associated with a PE data memory configured to store at least a portion of a tensor ([Col. 18, Lines 33-50] FIG. 5 illustrates an example of a computing system 500 according to certain embodiments. In the illustrated example, computing system 500 includes a DMA engine 550, system memory 520, and one or more accelerators 502-1 to 502-m. Computing system 500 may include other components not specifically shown, such as a host processor. Accelerator 502-1 may be a neural network accelerator (e.g., a neural network processor or tensor processing unit), and may include a processing element array 510-1 (e.g., a systolic array) ["a systolic array comprising an array of interconnected processing elements (PEs)"], a state buffer 504-1, and a result buffer 512-1 as described above with respect to FIG. 3. Processing element array 510-1 may include an array of processing elements arranged in rows and columns. ["each PE associated with a PE data memory configured to store at least a portion of a tensor"] Each processing element is capable of performing a multiply-and-add operation. State buffer 504-1 may be used to store input data such as feature map values and weight values for processing element array 510-1, and/or may be used to store intermediate outputs that may be used in subsequent layers.); and a data router configured to perform tensor tiling of an input tensor, the data router configured to: determine a split of the input tensor into a plurality of tiles based on the array of interconnected PEs and dimensions of the input tensor ([Col.
20, Lines 45-57] The tensor copy instructions described above ["a data router configured to perform tensor tiling of an input tensor"] may be performed by a processing engine of the computing system, such as pooling engine 318 or activation engine 316 of accelerator 302. Pooling engine 318 may read from a first region of the state buffer (e.g., memory subsystem 304) and write the read data to a second region in the state buffer. The first region and the second region may have the same two-dimensional shape (e.g., the same number of partitions and the same number of bytes in each partition), or may have different two-dimensional shapes, such as different numbers of partitions and different numbers of elements (e.g., bytes) in each partition, but may have the same total size (number of partitions × number of elements in each partition).; [Col. 23, Line 65-Col. 24, Line 11] According to techniques disclosed herein, before tensor 714 is written or loaded into state buffer 710 during period P1, a processing engine 730 (e.g., pooling engine 318) may execute a tensor copy instruction to read tensor 712 from state buffer 710, reshape tensor 712 into a tensor 720, and save tensor 720 to state buffer 710. ["determine a split of the input tensor into a plurality of tiles based on the array of interconnected PEs and dimensions of the input tensor"] Tensor 720 may have the same total size (and the same content) as tensor 712, but may have a shape different from a shape of tensor 712. For example, tensor 720 may include more partitions than tensor 712 but may have fewer elements in each partition compared with tensor 712.
In one example, tensor 712 may have a shape of 32 partitions × 256 elements/partition, whereas tensor 720 may have a shape of 128 partitions × 64 elements/partition.).

Yu fails to teach: split the input tensor into the plurality of tiles, including a first tile and a second tile overlapping a shared edge, by routing the input tensor data to the PE data memories that store the plurality of tiles.

Taba teaches split the input tensor into the plurality of tiles, including a first tile and a second tile overlapping a shared edge, by routing the input tensor data to the PE data memories that store the plurality of tiles ([0019] FIG. 14 illustrates exemplary spiral paths for a 5×5 input tensor according to embodiments of the present disclosure.; [0072] Referring to FIG. 14, exemplary spiral paths for a 5×5 input tensor are illustrated ["split the input tensor into the plurality of tiles"]. In these examples, various shortcut paths are introduced for convolutions overlapping a boundary ["including a first tile and a second tile overlapping a shared edge"].; [0037] With reference now to FIG. 1, a neural core according to embodiments of the present disclosure is depicted. A neural core 100 is a tileable computational unit that computes one block of an output tensor. A neural core 100 has M inputs and N outputs. In various embodiments, M=N. To compute an output tensor block, a neural core multiplies an M×1 input tensor block 101 with an M×N weight tensor block 102 and accumulates the products into weighted sums that are stored in a 1×N intermediate tensor block 103 ["by routing the input tensor data to the PE data memories that store the plurality of tiles"]. An O×N parameter tensor block contains the O parameters that specify each of the N neuron activation functions that are applied to the intermediate tensor block 103 to produce a 1×N output tensor block 105.). Yu and Taba are considered analogous to the claimed invention because they are in the same field of machine learning.
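Yu's cited reshape example (32 partitions × 256 elements/partition into 128 partitions × 64 elements/partition) is easy to verify numerically. A small NumPy check using only the shapes quoted in the Office Action:

```python
import numpy as np

# Yu's cited example: tensor 712 is 32 partitions x 256 elements/partition;
# tensor 720 is 128 partitions x 64 elements/partition.
t712 = np.arange(32 * 256).reshape(32, 256)
t720 = t712.reshape(128, 64)

# Same total size and same content, different shape: more partitions,
# fewer elements per partition, exactly as the reference describes.
assert t712.size == t720.size == 8192
assert np.array_equal(t712.ravel(), t720.ravel())
print(t712.shape, "->", t720.shape)
```

Note this is a pure reshape: it preserves total size and content, which is the distinction the rejection turns on, since the claims require splitting into tiles that overlap along a shared edge (and thus duplicate data), something a size-preserving reshape alone does not do.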
In view of the teachings of Yu, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply the teachings of Taba to Yu in order to decompose the computation of the output data tensor into smaller problems solved on one or more neural cores (cf. Taba, [0036] In various embodiments, computation of the output data tensor as described above is decomposed into smaller problems. Each problem may then be solved on one or more neural core, or on one or more core of a conventional multicore system in parallel.).

Regarding claim 2, Yu, as modified by Taba, teaches the computing system of claim 1. Yu further teaches an input handler configured to provide an indication of the determined split to the data router ([Col. 3, Lines 23-28] The memory allocation, tensorization (converting a large tensor operation into multiple smaller tensor operations), and data transfer may be determined by a compiler ["an input handler configured to provide an indication of the determined split to the data router"] that generates and schedules instructions to be executed by the neural network processor and other execution engines of a computing system to implement a neural network model.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 3, Yu, as modified by Taba, teaches the computing system of claim 1. Yu further teaches wherein each PE is associated with a PE convolution engine configured to perform a convolution on a respective portion of a tile stored in the associated PE data memory ([Col. 10, Line 64-Col. 11, Line 8] Processing element array 310 is the computation matrix of accelerator 302.
["each PE is associated with a PE convolution engine configured to perform a convolution on a respective portion of a tile stored in the associated PE data memory"] Processing element array 310 can, for example, execute parallel integration, convolution, correlation, and/or matrix multiplication, among other things. Processing element array 310 may include multiple processing elements 311, arranged in rows and columns, such that results output by one processing element 311 can be input directly into another processing element 311. Processing elements 311 that are not on the outside edges of processing element array 310 thus can receive data to operate on from other processing elements 311, rather than from memory subsystem 304.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 4, Yu, as modified by Taba, teaches the computing system of claim 3. Yu further teaches a systolic controller configured to control each of the PE convolution engines to perform the convolution on the respective portion of the tile stored in the associated PE data memory based on the split ([Col. 15, Lines 1-20] ["a systolic controller configured to control each of the PE convolution engines to perform the convolution on the respective portion of the tile stored in the associated PE data memory based on the split"] The compiler may traverse the data flow graph and perform shape inference on the neural network model, for example, to determine the sizes of the data used for each operation. The compiler may add, to the neural network model, operations for padding the input feature map for each input channel, based on parameters of a convolution operation, such as the size of an original input feature map, the size of a filter (e.g., kernel), the stride used for the convolution, the memory alignment, and the size of the processing element array.
Optionally, the compiler may add to the neural network model operations for dividing input data into multiple partitions and dividing the convolution operation into multiple sub-operations. The compiler may map the operations of the neural network model to the computing system, such as memory subsystem 304 and processing element array 310 in accelerator 302, pooling engines 318, activation engines 316, DMA engines (not shown in FIG. 3), and the like, and generate and schedule instructions to be performed by these different components of the computing system.; [Col. 3, Lines 23-37] The memory allocation, tensorization (transforming a large tensor operation into multiple smaller tensor operations), and data transfer may be determined by a compiler that generates and schedules instructions to be executed by the neural network processor and other execution engines of a computing system to implement a neural network model. The instructions may generally include instructions for memory load operations that read input data (e.g., input feature maps) and static variables (e.g., weights, such as filter tensors for a convolutional neural network), instructions for computation operations that use the input data and the static variables to perform arithmetic operations, and memory save operations that save outputs (e.g., intermediate results) of the computation operations to the system memory to make room for other input tensors for other operations.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 5 and analogous claims 13 and 19, Yu, as modified by Taba, teaches the computing system of claim 3, the method of claim 12, and the NPU of claim 18, respectively.
Taba teaches wherein the PE convolution engine is configured to perform the convolution on respective portions of multiple tiles stored in the associated PE data memory by reusing data in the associated PE data memory that overlaps the shared edge ([0072] Referring to FIG. 14, exemplary spiral paths for a 5×5 input tensor is illustrated. In these examples, various shortcut paths are introduced for convolutions overlapping a boundary. Locations closer to the boundary incur more boundary crossings, each of which requires another partial sum to be cached. Accordingly, handling different boundary conditions requires cores to run different microcode.; [0088] Referring now to FIG. 21, a method of data distribution in an array of neural network cores is illustrated according to embodiments of the present disclosure. At 2101, by each neural core of an array of neural cores, a weight tensor is applied to a plurality of input activations by traversing a weight tensor according to a programmable path to compute partial sums. At 2102, partial sums are communicated to at least one adjacent neural core within the array via a network. At 2103, at least one output activation of a neural network layer is computed from the partial sums.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 6 and analogous claims 14, 20, Yu, as modified by Taba, teaches The computing system of claim 1, The method of claim 11, and The NPU of claim 17, respectively.
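As a technical aside (not part of the record), the shared-edge reuse recited in claim 5, against which Taba's boundary-crossing discussion is cited, is the familiar "halo" pattern: for a k×k convolution, each tile is extended by k-1 rows and columns so the data along a shared edge is resident once in PE memory and reused rather than refetched. A minimal sketch, with all names and sizes as illustrative assumptions:

```python
# Illustrative sketch only: extract overlapping tiles so that a k x k
# convolution over each tile covers the shared edge without refetching.
# The k-1 "halo" rows/columns along each shared edge are the data that
# the claim describes as reused. Names are assumptions for illustration.

def overlapping_tiles(fmap, tile, k):
    """Split fmap into `tile`-sized tiles, each extended by a k-1 halo."""
    halo = k - 1
    h, w = len(fmap), len(fmap[0])
    out = []
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            r1 = min(r0 + tile + halo, h)
            c1 = min(c0 + tile + halo, w)
            out.append([row[c0:c1] for row in fmap[r0:r1]])
    return out

fmap = [[r * 8 + c for c in range(8)] for r in range(8)]
tiles = overlapping_tiles(fmap, 4, 3)  # 4x4 tiles, 3x3 kernel -> 2-wide halo
# The last two columns of tiles[0] duplicate the first two of tiles[1]:
# that shared-edge data stays resident in PE memory and is reused.
```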
Taba teaches wherein the data router is configured to route the input tensor to the PE data memories that store the plurality of tiles with data along the shared edge duplicated in a first PE data memory associated with a first PE and in a second PE data memory associated with a second PE ([0072] Referring to FIG. 14, exemplary spiral paths for a 5×5 input tensor is illustrated. In these examples, various shortcut paths are introduced for convolutions overlapping a boundary. Locations closer to the boundary incur more boundary crossings, each of which requires another partial sum to be cached. Accordingly, handling different boundary conditions requires cores to run different microcode.; [0046] In various embodiments a global scheduler 304 is included in IPU 300. In various embodiments, a local core controller 334 is included on each core 303. In such embodiments, the direction of operations is shared between the global scheduler (chip microengine) and the local core controller (core microengine). In particular, at 311, compute instructions are loaded from model memory 301 to the neural computation unit 333 on each core 303 by global scheduler 304. At 312, parameters (e.g., neural network/synaptic weights) are loaded from model memory 301 to the neural computation unit 333 on each core 303 by global scheduler 304. At 313, neural network activation data are loaded from activation local activation memory 332 to neural computation unit 333 on each core 303 by local core controller 334. As noted above, the activations are provided to the axons of the particular neural network defined by the model, and may originate from the same or another neural computation unit, or from outside the system.
At 314, neural computation unit 333 performs the computation to generate output neuron activations as directed by local core controller 334. In particular, the computation comprises applying the input synaptic weights to the input activations. It will be appreciated that various methods are available for performing such computations, including in silico dendrites, as well as vector multiplication units. At 315, the results from computation are stored in local activation memory 332 as directed by local core controller 334. As described above, these stages may be pipelined, in order to provide efficient usage of the neural computation unit on each core. It will also be appreciated that inputs and outputs may be transferred from local activation memory 332 to global activation memory 302 according to the requirements of a given neural network.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 7 and analogous claim 15, Yu, as modified by Taba, teaches The computing system of claim 1, The method of claim 11, respectively. Taba teaches wherein the data router is further configured to transpose the plurality of tiles in the PE data memories by storing tile rows as columns in the PE data memories ([0078] Referring to FIGS. 17A-F, the weight order and input order are illustrated for exemplary horizontal-vertical paths. In these exemplary embodiments, the computation of an output traverses the input tensor by scanning each row of pixels horizontally and collecting their output in a single column, which is then combined vertically.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 8 and analogous claim 16, Yu, as modified by Taba, teaches The computing system of claim 1, The method of claim 11, respectively. 
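As a technical aside (not part of the record), the transpose operation recited in claim 7, a data router that stores tile rows as columns in the PE data memories, is a plain row-to-column store pattern. A minimal sketch, with names as illustrative assumptions:

```python
# Illustrative sketch only: write a tile into PE memory with each row
# stored as a column, i.e. a transpose of the tile.

def store_transposed(tile):
    """Return the tile with rows written as columns (a transpose)."""
    rows, cols = len(tile), len(tile[0])
    mem = [[None] * rows for _ in range(cols)]
    for r in range(rows):
        for c in range(cols):
            mem[c][r] = tile[r][c]  # row r lands in column r of PE memory
    return mem

tile = [[1, 2, 3],
        [4, 5, 6]]
assert store_transposed(tile) == [[1, 4], [2, 5], [3, 6]]
```

This matches the cited horizontal-vertical traversal in Taba's FIGS. 17A-F only at the level of row/column reordering; the reference's path-based traversal is more general.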
Yu teaches wherein each PE is further associated with a PE weight memory and wherein the data router is further configured to route weights to the PE weight memories based on the routing of the input tensor to store the plurality of tiles in the PE data memories ([Col. 11, Lines 35-50] An example of a processing element 311 is illustrated in an inset diagram in FIG. 3. As illustrated by this example, processing element 311 can include a multiplier-accumulator circuit. Inputs from the left can include, for example, input data i and a weight value w, where the input data is a value taken from either a set of input data or a set of intermediate results, and the weight value is from a set of weight values that connect one layer of the neural network to the next. A set of input data can be, for example, an image being submitted for identification or object recognition, an audio clip being provided for speech recognition, a string of text for natural language processing or machine translation, or the current state of a game requiring analysis to determine a next move, among other things. In some examples, the input data and the weight value are output to the right, for input to the next processing element 311.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 9, Yu, as modified by Taba, teaches The computing system of claim 1. Yu teaches wherein the data router comprises a hardware-implemented algorithm ([Col. 3, Line 63-Col.
4, Line 24] According to certain embodiments, to reduce the overhead of performing a large number of DMA transfers, a compiler may, after the resource allocation and instruction generation of a typical compiling process, analyze the local memory allocation and usage by the instructions and modify the instructions by, for example, replacing some DMA operations with tensor copy instructions that copy tensors within the local memory without involving the DMA controller. For example, at least some DMA operations for state buffer spilling and reloading may be replaced by tensor copy instructions that can reshape the dimensions (e.g., changing the number of partitions and the number of elements per partition but not the total number of elements) of the tensors to be spilled, and save the reshaped tensors in unused regions of the state buffer that may not have dimensions suitable for storing the original tensors. When the original tensor of a reshaped tensor needs to be used in subsequent instructions, a second tensor copy instruction may be used to change the reshaped tensor in the state buffer back to its original shape and save the tensor to the state buffer for use by the subsequent instructions. In some embodiments, the subsequent instructions may use the tensor with dimensions different from the dimensions of the original tensor and the reshaped tensor, and thus the second tensor copy instruction may read the reshaped tensor, reshape it into the desired dimensions, and save the tensor with the desired dimensions to the state buffer for use by the subsequent instruction. In this way, some DMA save instructions and DMA load instructions may be replaced by tensor copy instructions.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 10, Yu, as modified by Taba, teaches The computing system of claim 1.
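As a technical aside (not part of the record), the tensor copy instruction quoted from Yu above re-partitions a tensor (partitions × elements-per-partition) without changing the total element count, so a spilled tensor can be parked in an oddly shaped free region of the state buffer and later restored. A minimal sketch of that invariant, with all names as illustrative assumptions:

```python
# Illustrative sketch only: reshape a tensor's partition layout while
# preserving the total number of elements, mimicking the tensor copy
# instruction Yu describes. Names are assumptions for illustration.

def tensor_copy_reshape(tensor, new_partitions):
    """Re-partition a flat-ordered tensor: same elements, new row shape."""
    flat = [x for part in tensor for x in part]
    assert len(flat) % new_partitions == 0, "element count must be preserved"
    per = len(flat) // new_partitions
    return [flat[i * per:(i + 1) * per] for i in range(new_partitions)]

# Spill: a 4-partition x 8-element tensor reshaped into 2 x 16 to fit a
# free region; restore: a second copy reshapes it back to 4 x 8.
original = [[p * 8 + e for e in range(8)] for p in range(4)]
spilled = tensor_copy_reshape(original, 2)
restored = tensor_copy_reshape(spilled, 4)
assert restored == original
```

The round trip (spill then restore) is lossless precisely because only the layout changes, which is why the quoted passage can substitute these copies for DMA save/load pairs.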
Yu teaches wherein the systolic array comprises a scalable array of interconnected PEs ([Col. 18, Lines 33-45] FIG. 5 illustrates an example of a computing system 500 according to certain embodiments. In the illustrated example, computing system 500 includes a DMA engine 550, system memory 520, and one or more accelerators 502-1 to 502-m. Computing system 500 may include other components not specifically shown, such as a host processor. Accelerators 502-1 may be a neural network accelerator (e.g., a neural network processor or tensor processing unit), and may include a processing element array 510-1 (e.g., a systolic array), a state buffer 504-1, and a result buffer 512-1 as described above with respect to FIG. 3. Processing element array 510-1 may include an array of processing elements arranged in rows and columns.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 12, Yu, as modified by Taba, teaches The method of claim 11. Yu teaches further comprising: performing a convolution on the input tensor by performing, by a PE convolution engine associated with each PE, a convolution on respective portions of the input tiles stored in the associated PE data memory ([Col. 10, Line 64-Col. 11, Line 8] Processing element array 310 is the computation matrix of accelerator 302. Processing element array 310 can, for example, execute parallel integration, convolution, correlation, and/or matrix multiplication, among other things. Processing element array 310 may include multiple processing elements 311, arranged in rows and columns, such that results output by one processing element 311 can be input directly into another processing element 311.
Processing elements 311 that are not on the outside edges of processing element array 310 thus can receive data to operate on from other processing elements 311, rather than from memory subsystem 304.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 18, Yu, as modified by Taba, teaches The NPU of claim 17. Yu teaches wherein the data router is further configured to route weights to the PE weight memories based on the routing of the input tensor to store the plurality of tiles in the PE data memories ([Col. 11, Lines 35-50] An example of a processing element 311 is illustrated in an inset diagram in FIG. 3. As illustrated by this example, processing element 311 can include a multiplier-accumulator circuit. Inputs from the left can include, for example, input data i and a weight value w, where the input data is a value taken from either a set of input data or a set of intermediate results, and the weight value is from a set of weight values that connect one layer of the neural network to the next. A set of input data can be, for example, an image being submitted for identification or object recognition, an audio clip being provided for speech recognition, a string of text for natural language processing or machine translation, or the current state of a game requiring analysis to determine a next move, among other things.
In some examples, the input data and the weight value are output to the right, for input to the next processing element 311.); and wherein each PE is associated with a PE convolution engine configured to perform a convolution on the input tensor by performing a convolution on respective portions of the input tiles stored in the associated PE data memory with the weights stored in the associated PE weight memories ([Col. 10, Line 64-Col. 11, Line 8] Processing element array 310 is the computation matrix of accelerator 302. Processing element array 310 can, for example, execute parallel integration, convolution, correlation, and/or matrix multiplication, among other things. Processing element array 310 may include multiple processing elements 311, arranged in rows and columns, such that results output by one processing element 311 can be input directly into another processing element 311.). Yu and Taba are combinable for the same rationale as set forth above with respect to claim 1. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Meyer et al. (U.S. Patent No. 12260214) teaches multiple input data streams provided to respective subsets of one or more computational circuit blocks in the pipeline using bypass circuitry of the computational circuit blocks, the computation performed on multiple input data streams in the respective subsets of one or more computational circuit blocks to generate multiple output data streams corresponding to the output tensor. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAGGIE MAIDO whose telephone number is (703) 756-1953. The examiner can normally be reached M-Th: 6am - 4pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /MM/Examiner, Art Unit 2129 /MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129

Prosecution Timeline

Jun 08, 2023
Application Filed
Jan 21, 2026
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602603
MULTI-AGENT INFERENCE
2y 5m to grant Granted Apr 14, 2026
Patent 12596933
CONTEXT-AWARE ENTITY LINKING FOR KNOWLEDGE GRAPHS TO SUPPORT DECISION MAKING
2y 5m to grant Granted Apr 07, 2026
Patent 12579463
GENERATIVE REASONING FOR SYMBOLIC DISCOVERY
2y 5m to grant Granted Mar 17, 2026
Patent 12579452
EVALUATION SCORE DETERMINATION MACHINE LEARNING MODELS WITH DIFFERENTIAL PERIODIC TIERS
2y 5m to grant Granted Mar 17, 2026
Patent 12566941
EXTENSION OF EXISTING NEURAL NETWORKS WITHOUT AFFECTING EXISTING OUTPUTS
2y 5m to grant Granted Mar 03, 2026
Based on this examiner's 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
64%
Grant Probability
85%
With Interview (+20.7%)
4y 3m
Median Time to Grant
Low
PTA Risk
Based on 36 resolved cases by this examiner. Grant probability derived from career allow rate.
