DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 have been examined.
Information Disclosure Statement
The Applicant's submission of the Information Disclosure Statements dated December 3, 2024 (x5), February 6, 2025, May 6, 2025, July 22, 2025, October 6, 2025, and December 19, 2025, is acknowledged by the Examiner, and the cited references have been considered in the examination of the claims now pending. Copies of the PTOL-1449s, initialed and dated by the Examiner, are attached to the instant Office action.
Specification
The disclosure is objected to because of the following informalities:
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Appropriate correction is required. The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which Applicant may become aware in the specification.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7, 11-13, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication No. 2017/0090924 by Mishra et al. (previously cited and hereinafter referred to as “Mishra”).
Regarding claim 1, Mishra discloses:
an accelerator device comprising: a memory (Mishra discloses, at Figure 18 and related description, a processor comprising a memory, which discloses an accelerator device. See also ¶ [0060], which discloses accelerating dot product operations.); and
a compute cluster including multiple processing resources coupled with the memory, the multiple processing resources coupled …to facilitate data exchange between the multiple processing resources, respective processing resources of the multiple processing resources including a matrix accelerator configured to (Mishra discloses, at Figure 18 and related description, the processor can include multiple cores that are coupled to one another. See also ¶ [0060], which discloses accelerating dot product operations, which discloses a matrix accelerator.):
perform a dot product operation on elements of a sparse first matrix and a second matrix in response to …[an] instruction, elements of the sparse first matrix are compacted into a compressed representation including a non-zero value element and an indication of the non-zero value element, and the …instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element (Mishra discloses, at Figure 1 and related description, performing dot product operations on elements of sparse vectors, i.e., matrices, in which computations associated with the zero values are ignored, i.e., skipped. As disclosed at Figure 2 and related description, the elements are stored in a compressed representation that includes non-zero element values and indicia of the positions of the non-zero elements.); and
write output of the dot product operation… (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations.).
Mishra does not explicitly disclose the aforementioned coupling is with a data crossbar, the aforementioned dot product operations are in response to a sparse dot product instruction, and the aforementioned writing of output is to memory.
However, Mishra discloses a crossbar. See, e.g., ¶ [0054]. Mishra also discloses, at ¶ [0203], that any number of well-known techniques can be used to connect the processing resources, which encompasses using a crossbar to do so. It would have been obvious to use a crossbar because crossbars are a well-known way to improve performance by allowing any-to-any connectivity, which increases data transfer flexibility.
Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses storing data to memory. See, e.g., Figure 18 and related description. It would have been obvious to store the dot product operation output to memory in order to enable the output to be used at a later time.
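For illustration only (the function names and the paired values/indices layout below are assumptions chosen for clarity, not Mishra's actual format), the skip-zero behavior discussed above, in which a sparse operand is compacted into non-zero values plus indications of their positions and zero-valued products are never computed, can be sketched as:

```python
def compress(sparse_row):
    """Compact a sparse row into its non-zero values and their positions."""
    values, indices = [], []
    for i, v in enumerate(sparse_row):
        if v != 0:          # zero-valued elements are dropped entirely
            values.append(v)
            indices.append(i)
    return values, indices

def sparse_dot(sparse_row, dense_col):
    """Dot product that skips computations associated with zero elements."""
    values, indices = compress(sparse_row)
    acc = 0
    for v, i in zip(values, indices):
        acc += v * dense_col[i]  # only non-zero terms are multiplied
    return acc
```

For example, `sparse_dot([0, 3, 0, 2], [5, 1, 7, 4])` performs only two multiplications (3*1 + 2*4 = 11) rather than four; the stored indices also serve to select the corresponding elements of the second operand.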
Regarding claim 2, Mishra, as modified, discloses the elements of claim 1, as discussed above. Mishra also discloses:
the compressed representation is to be stored …in a compressed format (Mishra discloses, at Figure 2 and related description, the elements are stored in a compressed representation.).
Mishra does not explicitly disclose the aforementioned storing is to the memory.
However, Mishra discloses storing data to memory. See, e.g., Figure 18 and related description. It would have been obvious to store the compressed representation to memory in order to enable the data to be used at a later time.
Regarding claim 3, Mishra, as modified, discloses the elements of claim 2, as discussed above. Mishra also discloses:
the memory is a level two cache memory or a shared local memory (Mishra discloses, at Figure 18 and related description, L2 cache.).
Regarding claim 4, Mishra, as modified, discloses the elements of claim 3, as discussed above. Mishra also discloses:
…a processing resource of the multiple processing resources is configured to: load the compressed representation… into an internal memory within the processing resource (Mishra discloses, at Figure 2 and related description, the elements are stored in a compressed representation, which discloses loading the elements into an internal memory. See also, at Figure 18 and related description, a cache hierarchy, which discloses moving data between external memory and internal memory.);
load the second matrix… into the internal memory (Mishra discloses, at Figure 2 and related description, storing elements of the second vector, i.e., matrix, which discloses loading elements into an internal memory. See also, at Figure 18 and related description, a cache hierarchy, which discloses moving data between external memory and internal memory.);
perform the dot product operation via the matrix accelerator on elements from the compressed representation and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with non-zero values of the sparse first matrix stored within the compressed representation and are selected by the matrix accelerator from the elements of the second matrix stored in the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses selecting the elements.);
and write at least a portion of the output of the dot product operation to the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations to internal memory.).
Mishra does not explicitly disclose the aforementioned dot product operations are in response to a sparse dot product instruction, and the aforementioned loading is from the memory.
However, Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses, e.g., at Figure 18 and related description, moving data between external memory and various levels of cache, i.e., internal memory. It would have been obvious to load from the external memory because doing so represents utilizing well-known tradeoffs between memory capabilities and costs.
Regarding claim 5, Mishra, as modified, discloses the elements of claim 4, as discussed above. Mishra also discloses:
the matrix accelerator is configured to select the selected elements of the second matrix from a vector in the internal memory based on indications of the non-zero values of the sparse first matrix, the vector including a plurality of elements of the second matrix (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses selecting the elements. As shown at Figure 2 and related description, the values to be used are selected based on indicia of locations of non-zero values.).
Regarding claim 6, Mishra, as modified, discloses the elements of claim 5, as discussed above. Mishra also discloses:
the internal memory includes a register file or a level one cache memory (Mishra discloses, at Figure 17 and related description, L1 cache.).
Regarding claim 7, Mishra, as modified, discloses the elements of claim 6, as discussed above. Mishra also discloses:
the matrix accelerator is configured to associate elements from the compressed representation and corresponding selected elements of the second matrix with corresponding channels of input data for processing within the matrix accelerator (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses associating corresponding elements. As disclosed at ¶ [0092], the operations are put into vertical SIMD alignment, which discloses corresponding channels.).
Regarding claim 11, Mishra discloses:
a method comprising: on a general-purpose graphics processor (Mishra discloses, at Figure 18 and related description, using a general purpose graphics processing unit.):
performing a dot product operation on multiple elements of a sparse first matrix and a second matrix …[an] instruction, the dot product operation performed via a compute cluster including multiple processing resources coupled with a memory, the multiple processing resources coupled …to facilitate data exchange between the multiple processing resources, respective processing resources of the multiple processing resources including a matrix accelerator, wherein elements of the sparse first matrix are compacted into a compressed representation that includes a non-zero value element and an indication of the non-zero value element, and the …instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element (Mishra discloses, at Figure 1 and related description, performing dot product operations on elements of sparse vectors, i.e., matrices, in which computations associated with the zero values are ignored, i.e., skipped. As disclosed at Figure 2 and related description, the elements are stored in a compressed representation that includes non-zero element values and indicia of the positions of the non-zero elements. Mishra also discloses, at Figure 18 and related description, the processor can include multiple cores that are coupled to one another. See also ¶ [0060], which discloses accelerating dot product operations, which discloses a matrix accelerator.); and
writing output of the dot product operation… (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations.).
Mishra does not explicitly disclose the aforementioned dot product operations are in response to a sparse dot product instruction, the aforementioned coupling is with a data crossbar, and the aforementioned writing of output is to memory.
However, Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses a crossbar. See, e.g., ¶ [0054]. Mishra also discloses, at ¶ [0203], that any number of well-known techniques can be used to connect the processing resources, which encompasses using a crossbar to do so. It would have been obvious to use a crossbar because crossbars are a well-known way to improve performance by allowing any-to-any connectivity, which increases data transfer flexibility.
Mishra also discloses storing data to memory. See, e.g., Figure 18 and related description. It would have been obvious to store the dot product operation output to memory in order to enable the output to be used at a later time.
Regarding claim 12, Mishra, as modified, discloses the elements of claim 11, as discussed above. Mishra also discloses:
storing the compressed representation …in a compressed format (Mishra discloses, at Figure 2 and related description, the elements are stored in a compressed representation.); and
via a processing resource of the multiple processing resources and in response to the …instruction (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses performing them via a processing resource of the multiple processing resources and in response to an instruction.):
loading the compressed representation …into an internal memory within the processing resource (Mishra discloses, at Figure 1 and related description, performing dot product operations with the compressed representations, which discloses loading the compressed representation …into an internal memory within the processing resource.);
loading the second matrix …into the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations with the second matrix, which discloses loading the second matrix into internal memory.);
performing the dot product operation via the matrix accelerator on elements from the compressed representation and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with non-zero values of the sparse first matrix stored within the compressed representation and are selected by the matrix accelerator from the elements of the second matrix stored in the internal memory, wherein the matrix accelerator is configured to select the selected elements of the second matrix from a vector in the internal memory based on indications of the non-zero values of the sparse first matrix, the vector including a plurality of elements of the second matrix (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses selecting the elements.); and
writing at least a portion of the output of the dot product operation to the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations to internal memory.).
Mishra does not explicitly disclose the aforementioned dot product operations are in response to a sparse dot product instruction, and the aforementioned storing and loading use the memory.
However, Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses, e.g., at Figure 18 and related description, moving data between external memory and various levels of cache, i.e., internal memory. It would have been obvious to load from the external memory because doing so represents utilizing well-known tradeoffs between memory capabilities and costs.
Regarding claim 13, Mishra, as modified, discloses the elements of claim 12, as discussed above. Mishra also discloses:
compacting the elements of the sparse first matrix into the compressed representation within a memory of the processing resource (Mishra discloses, at Figure 2 and related description, generating a compressed representation, which discloses compacting the elements of the sparse first matrix into the compressed representation within a memory of the processing resource.).
Regarding claim 16, Mishra discloses:
a graphics processing system comprising: a memory device; a graphics processor coupled with the memory device, the graphics processor including (Mishra discloses, at Figure 18 and related description, a processor comprising a memory, which discloses a graphics processing system comprising: a memory device; a graphics processor coupled with the memory device.):
a cache memory (Mishra discloses, at Figure 18 and related description, the processor includes cache.); and
a compute cluster including multiple processing resources coupled with the cache memory, the multiple processing resources coupled …to facilitate data exchange between the multiple processing resources, respective processing resources of the multiple processing resources including a matrix accelerator configured to (Mishra discloses, at Figure 18 and related description, the processor can include multiple cores that are coupled to one another. See also ¶ [0060], which discloses accelerating dot product operations, which discloses a matrix accelerator.):
perform a dot product operation on elements of a sparse first matrix and a second matrix in response to … [an] instruction, elements of the sparse first matrix are compacted into a compressed representation including a non-zero value element and an indication of the non-zero value element, and the …instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element (Mishra discloses, at Figure 1 and related description, performing dot product operations on elements of sparse vectors, i.e., matrices, in which computations associated with the zero values are ignored, i.e., skipped. As disclosed at Figure 2 and related description, the elements are stored in a compressed representation that includes non-zero element values and indicia of the positions of the non-zero elements.); and
write output of the dot product operation… (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations.).
Mishra does not explicitly disclose the aforementioned coupling is with a data crossbar, the aforementioned dot product operations are in response to a sparse dot product instruction, and the aforementioned writing of output is to cache.
However, Mishra discloses a crossbar. See, e.g., ¶ [0054]. Mishra also discloses, at ¶ [0203], that any number of well-known techniques can be used to connect the processing resources, which encompasses using a crossbar to do so. It would have been obvious to use a crossbar because crossbars are a well-known way to improve performance by allowing any-to-any connectivity, which increases data transfer flexibility.
Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses storing data to cache. See, e.g., Figure 18 and related description. It would have been obvious to store the dot product operation output to cache in order to enable the output to be used at a later time.
Regarding claim 17, Mishra, as modified, discloses the elements of claim 16, as discussed above. Mishra also discloses:
the compressed representation is to be stored … in a compressed format (Mishra discloses, at Figure 2 and related description, the elements are stored in a compressed representation.).
Mishra does not explicitly disclose the aforementioned storing is to the cache.
However, Mishra discloses storing data to cache. See, e.g., Figure 18 and related description. It would have been obvious to store the compressed representation to cache in order to enable the data to be used at a later time.
Regarding claim 18, Mishra, as modified, discloses the elements of claim 17, as discussed above. Mishra also discloses:
…a processing resource of the multiple processing resources is configured to: load the compressed representation … into an internal memory within the processing resource (Mishra discloses, at Figure 2 and related description, the elements are stored in a compressed representation, which discloses loading the elements into an internal memory. See also, at Figure 18 and related description, a cache hierarchy, which discloses moving data between external memory and internal memory.);
load the second matrix …into the internal memory (Mishra discloses, at Figure 2 and related description, storing elements of the second vector, i.e., matrix, which discloses loading elements into an internal memory. See also, at Figure 18 and related description, a cache hierarchy, which discloses moving data between external memory and internal memory.);
perform the dot product operation via the matrix accelerator on elements from the compressed representation and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with non-zero values of the sparse first matrix stored within the compressed representation and are selected by the matrix accelerator from the elements of the second matrix stored in the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses selecting the elements.); and
write at least a portion of the output of the dot product operation to the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations to internal memory.).
Mishra does not explicitly disclose the aforementioned dot product operations are in response to a sparse dot product instruction, and the aforementioned loading is from the cache memory.
However, Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses, e.g., at Figure 18 and related description, moving data between various levels of cache, i.e., internal memory. It would have been obvious to load from the cache memory because doing so represents utilizing well-known tradeoffs between memory capabilities and costs.
Regarding claim 19, Mishra, as modified, discloses the elements of claim 18, as discussed above. Mishra also discloses:
the matrix accelerator is configured to select the selected elements of the second matrix from a vector in the internal memory based on indications of the non-zero values of the sparse first matrix, the vector including a plurality of elements of the second matrix (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses selecting the elements. As shown at Figure 2 and related description, the values to be used are selected based on indicia of locations of non-zero values.).
Regarding claim 20, Mishra, as modified, discloses the elements of claim 19, as discussed above. Mishra also discloses:
the matrix accelerator is configured to associate elements from the compressed representation and corresponding selected elements of the second matrix with corresponding channels of input data for processing within the matrix accelerator (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses associating corresponding elements. As disclosed at ¶ [0092], the operations are put into vertical SIMD alignment, which discloses corresponding channels.).
Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Mishra in view of US Publication No. 2019/0114534 by Teng et al. (previously cited and hereinafter referred to as “Teng”).
Regarding claim 8, Mishra, as modified, discloses the elements of claim 1, as discussed above. Mishra also discloses:
the sparse first matrix includes …data …and the second matrix includes …data…, and wherein the output of the dot product operation includes output… (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses the sparse first matrix data and second matrix data and output.).
Mishra does not explicitly disclose the aforementioned sparse first matrix data includes weight associated with a neural network, the aforementioned second matrix data includes input activation associated with the neural network, and the aforementioned output includes activation data associated with the neural network.
However, in the same field of endeavor (e.g., processing) Teng discloses:
weight data, input activation data, and output activation data, all associated with a neural network (Teng discloses, at Figure 3 and related description, weight data, input activation data, and output data, all associated with a neural network.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Mishra to include the neural network data disclosed by Teng in order to improve performance by using neural networks to process data.
Regarding claim 9, Mishra, as modified, discloses the elements of claim 1, as discussed above. Mishra also discloses:
the matrix accelerator includes …[an] array of processing elements (Mishra discloses, at Figure 18 and related description, the processor can include multiple cores, which discloses an array of processing elements.).
Mishra does not explicitly disclose the aforementioned array is systolic.
However, in the same field of endeavor (e.g., processing) Teng discloses:
a systolic array (Teng discloses, at Figure 5 and related description, a systolic array.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Mishra to include the systolic array disclosed by Teng in order to improve performance by simplifying control and data transfer operations.
Claims 10, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Mishra in view of US Publication No. 2021/0012197 by Simonyan et al. (previously cited and hereinafter referred to as “Simonyan”).
Regarding claim 10, Mishra, as modified, discloses the elements of claim 1, as discussed above. Mishra also discloses:
the sparse first matrix has …sparsity and elements of the sparse first matrix are compacted into a compressed representation based on the … sparsity (Mishra discloses, at Figure 2 and related description, compressing elements of a sparse matrix based on the sparsity.).
Mishra does not explicitly disclose the aforementioned sparsity is structured.
However, in the same field of endeavor (e.g., processing) Simonyan discloses:
structured sparsity (Simonyan discloses, at Figure 1 and related description, contiguous sparsity.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Mishra to include the structured sparsity disclosed by Simonyan in order to improve performance by enabling implementation with more efficient coding. See Simonyan, ¶ [0010].
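For illustration only (the block size and the returned layout below are assumptions for clarity; they are not taken from Simonyan's disclosure), one simple form of structured sparsity, in which non-zero values are constrained to fall in aligned contiguous blocks, can be sketched as follows. Because the sparsity pattern is structured, a single position per block suffices in the compressed representation, rather than one position per element:

```python
BLOCK = 2  # assumed block size for this sketch

def compress_structured(row):
    """Compact a row whose non-zeros fall in aligned blocks of size BLOCK.

    Returns (values, block_starts): one start index is stored per retained
    block, exploiting the structure of the sparsity pattern.
    """
    values, block_starts = [], []
    for start in range(0, len(row), BLOCK):
        block = row[start:start + BLOCK]
        if any(v != 0 for v in block):   # keep only blocks with non-zeros
            values.extend(block)
            block_starts.append(start)
    return values, block_starts
```

For example, `compress_structured([0, 0, 5, 6, 0, 0, 7, 8])` retains the two non-zero blocks and records only their starting positions, reducing the index overhead relative to element-wise compression.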
Regarding claim 14, Mishra, as modified, discloses the elements of claim 13, as discussed above. Mishra also discloses:
the sparse first matrix has … sparsity and elements of the sparse first matrix are compacted into a compressed representation based on the …sparsity (Mishra discloses, at Figure 2 and related description, compressing elements of a sparse matrix based on the sparsity.).
Mishra does not explicitly disclose the aforementioned sparsity is structured.
However, in the same field of endeavor (e.g., processing) Simonyan discloses:
structured sparsity (Simonyan discloses, at Figure 1 and related description, contiguous sparsity.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Mishra to include the structured sparsity disclosed by Simonyan in order to improve performance by enabling implementation with more efficient coding. See Simonyan, ¶ [0010].
Regarding claim 15, Mishra, as modified, discloses the elements of claim 14, as discussed above. Mishra also discloses:
performing a dot product operation includes performing a dot product operation on 8-bit integer elements (Mishra discloses, at Figure 1 and related description, performing dot product operations. As disclosed at ¶ [0095], the elements can be 8-bit integer elements.).
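For illustration only (the accumulator convention below is an assumption, not taken from Mishra's disclosure), a dot product on 8-bit integer elements is commonly accumulated into a wider integer so that products of values in the int8 range do not overflow:

```python
def int8_dot(a, b):
    """Dot product of two vectors of 8-bit integer elements.

    Inputs are checked against the signed 8-bit range [-128, 127];
    Python integers are unbounded, which here stands in for the wider
    (e.g., 32-bit) accumulator typically assumed for int8 arithmetic.
    """
    assert all(-128 <= x <= 127 for x in list(a) + list(b)), \
        "elements must fit in int8"
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # each product fits well within a 32-bit accumulator
    return acc
```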
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677. The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAWN DOMAN/Primary Examiner, Art Unit 2183