DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 have been examined.
Information Disclosure Statement
The Applicant's submission of the Information Disclosure Statements dated December 3, 2024 (x5), February 6, 2025, May 6, 2025, July 22, 2025, October 6, 2025, and December 19, 2025, is acknowledged by the Examiner, and the cited references have been considered in the examination of the claims now pending. Copies of the PTOL-1449s, initialed and dated by the Examiner, are attached to the instant Office action.
Specification
The disclosure is objected to because of the following informalities:
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Appropriate correction is required. The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which Applicant may become aware in the specification.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7, 11-13, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication No. 2017/0090924 by Mishra et al. (previously cited and hereinafter referred to as “Mishra”).
Regarding claim 1, Mishra discloses:
an accelerator device comprising: a memory (Mishra discloses, at Figure 18 and related description, a processor comprising a memory, which discloses an accelerator device. See also ¶ [0060], which discloses accelerating dot product operations.); and
a compute cluster including multiple processing resources coupled with the memory, the multiple processing resources coupled …to facilitate data exchange between the multiple processing resources, respective processing resources of the multiple processing resources including a matrix accelerator configured to (Mishra discloses, at Figure 18 and related description, the processor can include multiple cores that are coupled to one another. See also ¶ [0060], which discloses accelerating dot product operations, which discloses a matrix accelerator.):
perform a dot product operation on elements of a sparse first matrix and a second matrix in response to …[an] instruction, elements of the sparse first matrix are compacted into a compressed representation including a non-zero value element and an indication of the non-zero value element, and the …instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element (Mishra discloses, at Figure 1 and related description, performing dot product operations on elements of sparse vectors, i.e., matrices, in which computations associated with the zero values are ignored, i.e., skipped. As disclosed at Figure 2 and related description, the elements are stored in a compressed representation that includes non-zero element values and indicia of the positions of the non-zero elements.); and
write output of the dot product operation… (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations.).
Mishra does not explicitly disclose the aforementioned coupling is with a data crossbar, the aforementioned dot product operations are in response to a sparse dot product instruction, and the aforementioned writing of output is to memory.
However, Mishra discloses a crossbar. See, e.g., ¶ [0054]. Mishra also discloses, at ¶ [0203], that any number of well-known techniques can be used to connect the processing resources, which encompasses using a crossbar to do so. It would have been obvious to use a crossbar because crossbars are a well-known way to improve performance by allowing any-to-any connectivity, which increases data transfer flexibility.
Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses storing data to memory. See, e.g., Figure 18 and related description. It would have been obvious to store the dot product operation output to memory in order to enable the output to be used at a later time.
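For illustration only (the function names and the paired values/indices layout below are assumptions chosen for clarity, not Mishra's actual format), the skip-zero behavior discussed above, in which a sparse operand is compacted into non-zero values plus indications of their positions and zero-valued products are never computed, can be sketched as:

```python
def compress(sparse_row):
    """Compact a sparse row into its non-zero values and their positions."""
    values, indices = [], []
    for i, v in enumerate(sparse_row):
        if v != 0:          # zero-valued elements are dropped entirely
            values.append(v)
            indices.append(i)
    return values, indices

def sparse_dot(sparse_row, dense_col):
    """Dot product that skips computations associated with zero elements."""
    values, indices = compress(sparse_row)
    acc = 0
    for v, i in zip(values, indices):
        acc += v * dense_col[i]  # only non-zero terms are multiplied
    return acc
```

For example, `sparse_dot([0, 3, 0, 2], [5, 1, 7, 4])` performs only two multiplications (3*1 + 2*4 = 11) rather than four; the stored indices also serve to select the corresponding elements of the second operand.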
Regarding claim 2, Mishra, as modified, discloses the elements of claim 1, as discussed above. Mishra also discloses:
the compressed representation is to be stored …in a compressed format (Mishra discloses, at Figure 2 and related description, the elements are stored in a compressed representation.).
Mishra does not explicitly disclose the aforementioned storing is to the memory.
However, Mishra discloses storing data to memory. See, e.g., Figure 18 and related description. It would have been obvious to store the compressed representation to memory in order to enable the data to be used at a later time.
Regarding claim 3, Mishra, as modified, discloses the elements of claim 2, as discussed above. Mishra also discloses:
the memory is a level two cache memory or a shared local memory (Mishra discloses, at Figure 18 and related description, L2 cache.).
Regarding claim 4, Mishra, as modified, discloses the elements of claim 3, as discussed above. Mishra also discloses:
…a processing resource of the multiple processing resources is configured to: load the compressed representation… into an internal memory within the processing resource (Mishra discloses, at Figure 2 and related description, the elements are stored in a compressed representation, which discloses loading the elements into an internal memory. See also, at Figure 18 and related description, a cache hierarchy, which discloses moving data between external memory and internal memory.);
load the second matrix… into the internal memory (Mishra discloses, at Figure 2 and related description, storing elements of the second vector, i.e., matrix, which discloses loading elements into an internal memory. See also, at Figure 18 and related description, a cache hierarchy, which discloses moving data between external memory and internal memory.);
perform the dot product operation via the matrix accelerator on elements from the compressed representation and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with non-zero values of the sparse first matrix stored within the compressed representation and are selected by the matrix accelerator from the elements of the second matrix stored in the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses selecting the elements.);
and write at least a portion of the output of the dot product operation to the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations to internal memory.).
Mishra does not explicitly disclose the aforementioned dot product operations are in response to a sparse dot product instruction, and the aforementioned loading is from the memory.
However, Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses, e.g., at Figure 18 and related description, moving data between external memory and various levels of cache, i.e., internal memory. It would have been obvious to load from the external memory because doing so represents utilizing well-known tradeoffs between memory capabilities and costs.
Regarding claim 5, Mishra, as modified, discloses the elements of claim 4, as discussed above. Mishra also discloses:
the matrix accelerator is configured to select the selected elements of the second matrix from a vector in the internal memory based on indications of the non-zero values of the sparse first matrix, the vector including a plurality of elements of the second matrix (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses selecting the elements. As shown at Figure 2 and related description, the values to be used are selected based on indicia of locations of non-zero values.).
Regarding claim 6, Mishra, as modified, discloses the elements of claim 5, as discussed above. Mishra also discloses:
the internal memory includes a register file or a level one cache memory (Mishra discloses, at Figure 17 and related description, L1 cache.).
Regarding claim 7, Mishra, as modified, discloses the elements of claim 6, as discussed above. Mishra also discloses:
the matrix accelerator is configured to associate elements from the compressed representation and corresponding selected elements of the second matrix with corresponding channels of input data for processing within the matrix accelerator (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses associating corresponding elements. As disclosed at ¶ [0092], the operations are put into vertical SIMD alignment, which discloses corresponding channels.).
Regarding claim 11, Mishra discloses:
a method comprising: on a general-purpose graphics processor (Mishra discloses, at Figure 18 and related description, using a general purpose graphics processing unit.):
performing a dot product operation on multiple elements of a sparse first matrix and a second matrix …[an] instruction, the dot product operation performed via a compute cluster including multiple processing resources coupled with a memory, the multiple processing resources coupled …to facilitate data exchange between the multiple processing resources, respective processing resources of the multiple processing resources including a matrix accelerator, wherein elements of the sparse first matrix are compacted into a compressed representation that includes a non-zero value element and an indication of the non-zero value element, and the …instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element (Mishra discloses, at Figure 1 and related description, performing dot product operations on elements of sparse vectors, i.e., matrices, in which computations associated with the zero values are ignored, i.e., skipped. As disclosed at Figure 2 and related description, the elements are stored in a compressed representation that includes non-zero element values and indicia of the positions of the non-zero elements. Mishra also discloses, at Figure 18 and related description, the processor can include multiple cores that are coupled to one another. See also ¶ [0060], which discloses accelerating dot product operations, which discloses a matrix accelerator.); and
writing output of the dot product operation… (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations.).
Mishra does not explicitly disclose the aforementioned dot product operations are in response to a sparse dot product instruction, the aforementioned coupling is with a data crossbar, and the aforementioned writing of output is to memory.
However, Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses a crossbar. See, e.g., ¶ [0054]. Mishra also discloses, at ¶ [0203], that any number of well-known techniques can be used to connect the processing resources, which encompasses using a crossbar to do so. It would have been obvious to use a crossbar because crossbars are a well-known way to improve performance by allowing any-to-any connectivity, which increases data transfer flexibility.
Mishra also discloses storing data to memory. See, e.g., Figure 18 and related description. It would have been obvious to store the dot product operation output to memory in order to enable the output to be used at a later time.
Regarding claim 12, Mishra, as modified, discloses the elements of claim 11, as discussed above. Mishra also discloses:
storing the compressed representation …in a compressed format (Mishra discloses, at Figure 2 and related description, the elements are stored in a compressed representation.); and
via a processing resource of the multiple processing resources and in response to the …instruction (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses performing them via a processing resource of the multiple processing resources and in response to an instruction.):
loading the compressed representation …into an internal memory within the processing resource (Mishra discloses, at Figure 1 and related description, performing dot product operations with the compressed representations, which discloses loading the compressed representation …into an internal memory within the processing resource.);
loading the second matrix …into the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations with the second matrix, which discloses loading the second matrix into internal memory.);
performing the dot product operation via the matrix accelerator on elements from the compressed representation and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with non-zero values of the sparse first matrix stored within the compressed representation and are selected by the matrix accelerator from the elements of the second matrix stored in the internal memory, wherein the matrix accelerator is configured to select the selected elements of the second matrix from a vector in the internal memory based on indications of the non-zero values of the sparse first matrix, the vector including a plurality of elements of the second matrix (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses selecting the elements.); and
writing at least a portion of the output of the dot product operation to the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations to internal memory.).
Mishra does not explicitly disclose the aforementioned dot product operations are in response to a sparse dot product instruction, and the aforementioned storing and loading use the memory.
However, Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses, e.g., at Figure 18 and related description, moving data between external memory and various levels of cache, i.e., internal memory. It would have been obvious to load from the external memory because doing so represents utilizing well-known tradeoffs between memory capabilities and costs.
Regarding claim 13, Mishra, as modified, discloses the elements of claim 12, as discussed above. Mishra also discloses:
compacting the elements of the sparse first matrix into the compressed representation within a memory of the processing resource (Mishra discloses, at Figure 2 and related description, generating a compressed representation, which discloses compacting the elements of the sparse first matrix into the compressed representation within a memory of the processing resource.).
Regarding claim 16, Mishra discloses:
a graphics processing system comprising: a memory device; a graphics processor coupled with the memory device, the graphics processor including (Mishra discloses, at Figure 18 and related description, a processor comprising a memory, which discloses a graphics processing system comprising: a memory device; a graphics processor coupled with the memory device.):
a cache memory (Mishra discloses, at Figure 18 and related description, the processor includes cache.); and
a compute cluster including multiple processing resources coupled with the cache memory, the multiple processing resources coupled …to facilitate data exchange between the multiple processing resources, respective processing resources of the multiple processing resources including a matrix accelerator configured to (Mishra discloses, at Figure 18 and related description, the processor can include multiple cores that are coupled to one another. See also ¶ [0060], which discloses accelerating dot product operations, which discloses a matrix accelerator.):
perform a dot product operation on elements of a sparse first matrix and a second matrix in response to … [an] instruction, elements of the sparse first matrix are compacted into a compressed representation including a non-zero value element and an indication of the non-zero value element, and the …instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element (Mishra discloses, at Figure 1 and related description, performing dot product operations on elements of sparse vectors, i.e., matrices, in which computations associated with the zero values are ignored, i.e., skipped. As disclosed at Figure 2 and related description, the elements are stored in a compressed representation that includes non-zero element values and indicia of the positions of the non-zero elements.); and
write output of the dot product operation… (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations.).
Mishra does not explicitly disclose the aforementioned coupling is with a data crossbar, the aforementioned dot product operations are in response to a sparse dot product instruction, and the aforementioned writing of output is to cache.
However, Mishra discloses a crossbar. See, e.g., ¶ [0054]. Mishra also discloses, at ¶ [0203], that any number of well-known techniques can be used to connect the processing resources, which encompasses using a crossbar to do so. It would have been obvious to use a crossbar because crossbars are a well-known way to improve performance by allowing any-to-any connectivity, which increases data transfer flexibility.
Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses storing data to cache. See, e.g., Figure 18 and related description. It would have been obvious to store the dot product operation output to cache in order to enable the output to be used at a later time.
Regarding claim 17, Mishra, as modified, discloses the elements of claim 16, as discussed above. Mishra also discloses:
the compressed representation is to be stored … in a compressed format (Mishra discloses, at Figure 2 and related description, the elements are stored in a compressed representation.).
Mishra does not explicitly disclose the aforementioned storing is to the cache.
However, Mishra discloses storing data to cache. See, e.g., Figure 18 and related description. It would have been obvious to store the compressed representation to cache in order to enable the data to be used at a later time.
Regarding claim 18, Mishra, as modified, discloses the elements of claim 17, as discussed above. Mishra also discloses:
…a processing resource of the multiple processing resources is configured to: load the compressed representation … into an internal memory within the processing resource (Mishra discloses, at Figure 2 and related description, the elements are stored in a compressed representation, which discloses loading the elements into an internal memory. See also, at Figure 18 and related description, a cache hierarchy, which discloses moving data between external memory and internal memory.);
load the second matrix …into the internal memory (Mishra discloses, at Figure 2 and related description, storing elements of the second vector, i.e., matrix, which discloses loading elements into an internal memory. See also, at Figure 18 and related description, a cache hierarchy, which discloses moving data between external memory and internal memory.);
perform the dot product operation via the matrix accelerator on elements from the compressed representation and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with non-zero values of the sparse first matrix stored within the compressed representation and are selected by the matrix accelerator from the elements of the second matrix stored in the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses selecting the elements.); and
write at least a portion of the output of the dot product operation to the internal memory (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses writing the output of the dot product operations to internal memory.).
Mishra does not explicitly disclose the aforementioned dot product operations are in response to a sparse dot product instruction, and the aforementioned loading is from the cache memory.
However, Mishra discloses performing complex operations in response to a single instruction. See, e.g., the pseudocode shown at ¶ [0081] for performing operations in response to an instruction. It would have been obvious to modify Mishra to include a single instruction for performing the disclosed sparse dot product operations because doing so represents applying well-known design tradeoffs, such as the tradeoff between code size and instruction complexity.
Mishra also discloses, e.g., at Figure 18 and related description, moving data between various levels of cache, i.e., internal memory. It would have been obvious to load from the cache memory because doing so represents utilizing well-known tradeoffs between memory capabilities and costs.
Regarding claim 19, Mishra, as modified, discloses the elements of claim 18, as discussed above. Mishra also discloses:
the matrix accelerator is configured to select the selected elements of the second matrix from a vector in the internal memory based on indications of the non-zero values of the sparse first matrix, the vector including a plurality of elements of the second matrix (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses selecting the elements. As shown at Figure 2 and related description, the values to be used are selected based on indicia of locations of non-zero values.).
Regarding claim 20, Mishra, as modified, discloses the elements of claim 19, as discussed above. Mishra also discloses:
the matrix accelerator is configured to associate elements from the compressed representation and corresponding selected elements of the second matrix with corresponding channels of input data for processing within the matrix accelerator (Mishra discloses, at Figure 1 and related description, performing dot product operations using non-zero elements stored in a compressed representation, which discloses associating corresponding elements. As disclosed at ¶ [0092], the operations are put into vertical SIMD alignment, which discloses corresponding channels.).
Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Mishra in view of US Publication No. 2019/0114534 by Teng et al. (previously cited and hereinafter referred to as “Teng”).
Regarding claim 8, Mishra, as modified, discloses the elements of claim 1, as discussed above. Mishra also discloses:
the sparse first matrix includes …data …and the second matrix includes …data…, and wherein the output of the dot product operation includes output… (Mishra discloses, at Figure 1 and related description, performing dot product operations, which discloses the sparse first matrix data and second matrix data and output.).
Mishra does not explicitly disclose the aforementioned sparse first matrix data includes weight associated with a neural network, the aforementioned second matrix data includes input activation associated with the neural network, and the aforementioned output includes activation data associated with the neural network.
However, in the same field of endeavor (e.g., processing) Teng discloses:
weight data, input activation data, and output activation data, all associated with a neural network (Teng discloses, at Figure 3 and related description, weight data, input activation data, and output data, all associated with a neural network.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Mishra to include the neural network data disclosed by Teng in order to improve performance by using neural networks to process data.
Regarding claim 9, Mishra, as modified, discloses the elements of claim 1, as discussed above. Mishra also discloses:
the matrix accelerator includes …[an] array of processing elements (Mishra discloses, at Figure 18 and related description, the processor can include multiple cores, which discloses an array of processing elements.).
Mishra does not explicitly disclose the aforementioned array is systolic.
However, in the same field of endeavor (e.g., processing) Teng discloses:
a systolic array (Teng discloses, at Figure 5 and related description, a systolic array.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Mishra to include the systolic array disclosed by Teng in order to improve performance by simplifying control and data transfer operations.
Claims 10, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Mishra in view of US Publication No. 2021/0012197 by Simonyan et al. (previously cited and hereinafter referred to as “Simonyan”).
Regarding claim 10, Mishra, as modified, discloses the elements of claim 1, as discussed above. Mishra also discloses:
the sparse first matrix has …sparsity and elements of the sparse first matrix are compacted into a compressed representation based on the … sparsity (Mishra discloses, at Figure 2 and related description, compressing elements of a sparse matrix based on the sparsity.).
Mishra does not explicitly disclose the aforementioned sparsity is structured.
However, in the same field of endeavor (e.g., processing) Simonyan discloses:
structured sparsity (Simonyan discloses, at Figure 1 and related description, contiguous sparsity.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Mishra to include the structured sparsity disclosed by Simonyan in order to improve performance by enabling implementation with more efficient coding. See Simonyan, ¶ [0010].
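For illustration only (the block size and the returned layout below are assumptions for clarity; they are not taken from Simonyan's disclosure), one simple form of structured sparsity, in which non-zero values are constrained to fall in aligned contiguous blocks, can be sketched as follows. Because the sparsity pattern is structured, a single position per block suffices in the compressed representation, rather than one position per element:

```python
BLOCK = 2  # assumed block size for this sketch

def compress_structured(row):
    """Compact a row whose non-zeros fall in aligned blocks of size BLOCK.

    Returns (values, block_starts): one start index is stored per retained
    block, exploiting the structure of the sparsity pattern.
    """
    values, block_starts = [], []
    for start in range(0, len(row), BLOCK):
        block = row[start:start + BLOCK]
        if any(v != 0 for v in block):   # keep only blocks with non-zeros
            values.extend(block)
            block_starts.append(start)
    return values, block_starts
```

For example, `compress_structured([0, 0, 5, 6, 0, 0, 7, 8])` retains the two non-zero blocks and records only their starting positions, reducing the index overhead relative to element-wise compression.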
Regarding claim 14, Mishra, as modified, discloses the elements of claim 13, as discussed above. Mishra also discloses:
the sparse first matrix has … sparsity and elements of the sparse first matrix are compacted into a compressed representation based on the …sparsity (Mishra discloses, at Figure 2 and related description, compressing elements of a sparse matrix based on the sparsity.).
Mishra does not explicitly disclose the aforementioned sparsity is structured.
However, in the same field of endeavor (e.g., processing) Simonyan discloses:
structured sparsity (Simonyan discloses, at Figure 1 and related description, contiguous sparsity.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Mishra to include the structured sparsity disclosed by Simonyan in order to improve performance by enabling implementation with more efficient coding. See Simonyan, ¶ [0010].
Regarding claim 15, Mishra, as modified, discloses the elements of claim 14, as discussed above. Mishra also discloses:
performing a dot product operation includes performing a dot product operation on 8-bit integer elements (Mishra discloses, at Figure 1 and related description, performing dot product operations. As disclosed at ¶ [0095], the elements can be 8-bit integer elements.).
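For illustration only (the accumulator convention below is an assumption, not taken from Mishra's disclosure), a dot product on 8-bit integer elements is commonly accumulated into a wider integer so that products of values in the int8 range do not overflow:

```python
def int8_dot(a, b):
    """Dot product of two vectors of 8-bit integer elements.

    Inputs are checked against the signed 8-bit range [-128, 127];
    Python integers are unbounded, which here stands in for the wider
    (e.g., 32-bit) accumulator typically assumed for int8 arithmetic.
    """
    assert all(-128 <= x <= 127 for x in list(a) + list(b)), \
        "elements must fit in int8"
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # each product fits well within a 32-bit accumulator
    return acc
```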
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677. The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAWN DOMAN/Primary Examiner, Art Unit 2183