DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 25 November 2025 has been entered. Claims 1-20 remain
pending in the application.
Claim Construction
Regarding claim 1, the preamble is given patentable weight. Claim 1 contains the limitations “the plurality of hardware compute elements” and “the memory interface” in the body, which refer to the limitations as recited in the preamble. A person of ordinary skill in the art reading the claim would consider it in view of both the body and the preamble, and would understand it to be limited to the technological environment of a hybrid threading fabric performing the particular functions using a plurality of hardware compute elements and a memory interface. The body of the claim depends on the preamble for completeness, and the preamble gives life, meaning, and vitality to the claim. Therefore, the preamble of claim 1 is afforded patentable weight.
Regarding claim 19, the preamble is given patentable weight. Claim 19 contains the limitations “the plurality of hardware compute elements” and “the memory interface” in the body, which refer to the limitations as recited in the preamble. A person of ordinary skill in the art reading the claim would consider it in view of both the body and the preamble, and would understand it to be limited to the technological environment of a hybrid threading fabric performing the particular functions using a plurality of hardware compute elements and a memory interface. The body of the claim depends on the preamble for completeness, and the preamble gives life, meaning, and vitality to the claim. Therefore, the preamble of claim 19 is afforded patentable weight.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more. A discussion of apparatus claims 10-18 is presented first, followed by computer program product claims 19-20, and then method claims 1-9.
Regarding claim 10, under the Alice Framework Step 1 analysis, the claim falls within one of the four statutory categories of patentable subject matter: a machine.
Under the Alice Framework Step 2A Prong 1 Analysis, claim 10 recites Mathematical Concepts. The claim recites Mathematical Relationships, which is specifically identified as an exemplar in the Mathematical Concepts grouping of abstract ideas:
“the first set of multiple coordinate data structure elements describing non-zero values of an input matrix, a first coordinate data structure element of the first set of multiple coordinate data structure elements comprising a first input matrix row number, a first input matrix column number, and a first input matrix value corresponding to the first input matrix row number and the first input matrix column number;
having input vector row numbers corresponding to input matrix column numbers of the first set of multiple coordinate data structure elements;
generating a first product based on a first portion of the first set of multiple coordinate data structure elements and the first set of input vector values;
updating a first partial accumulation value corresponding to a first input matrix row based at least in part on the first product;
generating a second product based on a second portion of the first set of multiple coordinate data structure elements and the first set of input vector values, the second portion of the first set of multiple coordinate data structure elements being different than the first portion of the first set of multiple coordinate data structure elements; and
updating a second partial accumulation value corresponding to a second input matrix row based at least in part on the first product;
summing a first set of partial accumulation values corresponding to the first input matrix row across at least a portion to generate a first output vector row value.”
See specification [0042-0043] which describes the first set of multiple coordinate data structure elements. See specification [0046] which describes input vector row numbers. See specification [0048-0049] which describes generating a first and second product. See specification [0049-0052] which describes updating a first partial and second partial accumulation values. See specification [0053] which describes summing. For these reasons, the claim recites Mathematical Concepts.
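As a purely illustrative aside (not evidence of record), the recited steps track the standard coordinate-format (COO) sparse matrix-vector multiply with per-lane partial accumulation followed by a cross-lane sum. The sketch below is hypothetical; all names in it are invented for illustration and do not appear in the application.

```python
# Hypothetical COO-format sparse matrix-vector multiply (SpMV) with per-lane
# partial accumulation, mirroring the recited steps at a high level.
def spmv_coo(elements, vector, num_lanes=2):
    """elements: list of (row, col, value) coordinate data structure elements
    describing the non-zero values of the input matrix."""
    # Each "lane" keeps its own partial accumulation value per input matrix row.
    partials = [{} for _ in range(num_lanes)]
    for i, (row, col, value) in enumerate(elements):
        lane = i % num_lanes                 # split elements across lanes
        product = value * vector[col]        # vector row number == matrix column number
        partials[lane][row] = partials[lane].get(row, 0.0) + product
    # Sum partial accumulation values across lanes to form output vector rows.
    output = {}
    for lane_partials in partials:
        for row, acc in lane_partials.items():
            output[row] = output.get(row, 0.0) + acc
    return output

# Example: non-zeros (0,0)=2, (0,2)=3, (1,1)=4; input vector [1, 2, 3].
print(spmv_coo([(0, 0, 2.0), (0, 2, 3.0), (1, 1, 4.0)], [1.0, 2.0, 3.0]))
# → {0: 11.0, 1: 8.0}   (row 0: 2*1 + 3*3 = 11; row 1: 4*2 = 8)
```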
Under the Alice Framework Step 2A Prong 2 Analysis, the claim recites the combination of the following additional elements: a hardware compute element memory comprising multiple memory locations and a hardware compute element in communication with the hardware compute element memory, the hardware compute element comprising multiple parallel processing lanes, the hardware compute element programmed to execute operations, a first parallel processing lane of the hardware compute element, a second parallel processing lane of the hardware compute element, performed in parallel, a memory interface, an external memory, sending a first asynchronous read request to an external memory via the memory interface, the first asynchronous read request requesting, and sending a second asynchronous read request to the external memory via the memory interface, the second asynchronous read request requesting. A hardware compute element memory comprising multiple memory locations and a hardware compute element in communication with the hardware compute element memory, the hardware compute element comprising multiple parallel processing lanes, the hardware compute element programmed to execute operations, a first parallel processing lane of the hardware compute element, a second parallel processing lane of the hardware compute element, a memory interface, and an external memory are recited at a high level of generality and are examples of generic computing elements, and/or merely generally linked to a particular technological environment (see MPEP 2106.05(h)(vi): Limiting the abstract idea of collecting information, analyzing it, and displaying certain results of the collection analysis to data related to the electric power grid, because limiting application of the abstract idea to power-grid monitoring is simply an attempt to limit the use of the abstract idea to a particular technological environment).
The claim recites limitations which are examples of generic computing elements that result in “apply it” on a computer. The performed and sending limitations are examples of insignificant extra-solution activity, mere data gathering (see MPEP 2106.05(g): Insignificant Extra-Solution Activity). Taken alone or in combination, they fail to integrate the judicial exception into a practical application.
Under the Alice Framework Step 2B Analysis, the additional elements recited above, taken alone or in combination, do not amount to significantly more than the judicial exception. As discussed in the Step 2A Prong 2 Analysis, the hardware compute element memory comprising multiple memory locations, the hardware compute element in communication with the hardware compute element memory and comprising multiple parallel processing lanes, the hardware compute element programmed to execute operations, the memory interface, and the external memory are recited at a high level of generality; they are generic components that merely result in “apply it” on a computer. Further, the additional elements merely generally link the use of the abstract idea to a particular technological environment. The performed limitations described above as insignificant extra-solution activity are also well-understood, routine, or conventional (Patterson, David A., and John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface (5th Edition). Morgan Kaufmann, 2013 (hereinafter “Patterson”), Pg. 225, Para. 1, subword parallelism; Pg. 227, Para. 2-3, parallel). The sending limitations described above as insignificant extra-solution activity are also well-understood, routine, or conventional (Horowitz, P., & Hill, W. (2015). The Art of Electronics (3rd ed.). Cambridge University Press (hereinafter “Horowitz”), Pg. 1016, 14.4.3-A cont. on bridging paragraph on Pg. 1017; Pg. 1017, Fig. 14.22; Pg. 1018, 14.4.4-A cont. on bridging paragraph Pg. 1019; Pg. 1019, Fig. 14.25; Pg. 1021, Fig. 14.28). Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 10 is ineligible.
Under the Alice Framework Step 2A Prong 1 Analysis, claim 11 recites Mathematical Concepts. The claim recites Mathematical Relationships, which is specifically identified as an exemplar in the Mathematical Concepts grouping of abstract ideas:
“summing a second set of multiple partial accumulation values corresponding to the second input matrix row to generate a second output vector row value.”
See specification [0053] which describes summing. For these reasons, the claim recites Mathematical Concepts.
Claim 11 recites no additional elements beyond those already addressed that would require further consideration under Step 2A Prong 2 or Step 2B. Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 11 is ineligible.
Under the Alice Framework Step 2A Prong 1 Analysis, claim 12 recites Mathematical Concepts. The claim recites Mathematical Relationships, which is specifically identified as an exemplar in the Mathematical Concepts grouping of abstract ideas:
“a second set of multiple coordinate data structure elements, the second set of multiple coordinate data structure elements also describing non-zero values of the input matrix;
having input vector row numbers corresponding to input matrix column numbers of the second set of multiple coordinate data structure elements;
to update at least one of the first partial accumulation value and the second partial accumulation value using the second set of multiple coordinate data structure elements and the second set of input vector values.”
See specification [0042-0043] which describes the second set of multiple coordinate data structure elements. See specification [0046] which describes input vector row numbers. See specification [0049-0052] which describes updating multiple partial accumulation values. For these reasons, the claim recites Mathematical Concepts.
Under the Alice Framework Step 2A Prong 2 Analysis, the claim recites the combination of the following additional elements: loading to the hardware compute element a second set of multiple coordinate data structure elements, loading to the hardware compute element a second set of input vector values, using the multiple parallel processing lanes of the hardware compute element. The loading and using limitations are examples of insignificant extra-solution activity, mere data gathering (see MPEP 2106.05(g): Insignificant Extra-Solution Activity). Taken alone or in combination, they fail to integrate the judicial exception into a practical application.
Under the Alice Framework Step 2B Analysis, the additional elements recited above, taken alone or in combination, do not amount to significantly more than the judicial exception. As discussed in the Step 2A Prong 2 Analysis, the claim recites insignificant extra-solution activities. The load limitations described above as an insignificant extra-solution activity are also well-understood, routine, or conventional (Patterson, Pg. 13, Fig. 1.5, Load instructions; Pg. 265, vector load/store unit; Pg. 266, Fig. 4.3, Load instructions; Pg. 267, Para. 1; Pg. 280, Para. 1-4). The using limitations described above as an insignificant extra-solution activity are also well-understood, routine, or conventional (Patterson, Pg. 10, Single instruction stream, multiple data streams; Pg. 269, Para. 1; Pg. 273, Fig. 4.5, Pg. 271-273, Sec. Multiple Lanes). Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 12 is ineligible.
Under the Alice Framework Step 2A Prong 1 Analysis, claim 13 recites Mathematical Concepts. The claim recites Mathematical Relationships, which is specifically identified as an exemplar in the Mathematical Concepts grouping of abstract ideas:
“determining a portion of coordinate data structure elements corresponding to a first number of rows of the input matrix, the first number of rows of the input matrix comprising the first input matrix row, the portion of coordinate data structure elements comprising the first set of multiple coordinate data structure elements; and
updating at least one partial accumulation value using the portion of coordinate data structure elements corresponding to the first number of rows of the input matrix before summing the first partial accumulation value corresponding to the first input matrix row.”
See specification [0044] which describes determining. See specification [0049], [0051-0052] which describes updating. For these reasons, the claim recites Mathematical Concepts.
Claim 13 recites no additional elements beyond those already addressed that would require further consideration under Step 2A Prong 2 or Step 2B. Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 13 is ineligible.
Under the Alice Framework Step 2A Prong 1 Analysis, claim 14 recites Mathematical Concepts. The claim recites Mathematical Relationships, which is specifically identified as an exemplar in the Mathematical Concepts grouping of abstract ideas:
“the first set of input vector values comprising multiple input vector values”
See specification [0048], [0050-0052] which describes the first set of input vector values. For these reasons, the claim recites Mathematical Concepts.
Under the Alice Framework Step 2A Prong 2 Analysis, the claim recites the following additional element: having non-contiguous input vector row numbers. The having limitation is an example of generally linking the use of the abstract idea to a particular technological environment (see MPEP 2106.05(h)(vi): Limiting the abstract idea of collecting information, analyzing it, and displaying certain results of the collection analysis to data related to the electric power grid, because limiting application of the abstract idea to power-grid monitoring is simply an attempt to limit the use of the abstract idea to a particular technological environment). Taken alone or in combination, it fails to integrate the judicial exception into a practical application.
Under the Alice Framework Step 2B Analysis, the additional elements recited above, taken alone or in combination, do not amount to significantly more than the judicial exception. As discussed in the Step 2A Prong 2 Analysis, the claim recites generally linking the use of the abstract idea to a particular technological environment, non-contiguous memory. Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 14 is ineligible.
Under the Alice Framework Step 2A Prong 2 Analysis, claim 15 recites the combination of the following additional elements: executing a write operation that writes the first partial accumulation value to a first memory location and the second partial accumulation value to a second memory location that is not contiguous with the first memory location. The executing limitation is an example of insignificant extra-solution activity, mere data gathering (see MPEP 2106.05(g): Insignificant Extra-Solution Activity). Taken alone or in combination, it fails to integrate the judicial exception into a practical application.
Under the Alice Framework Step 2B Analysis, the additional elements recited above, taken alone or in combination, do not amount to significantly more than the judicial exception. As discussed in the Step 2A Prong 2 Analysis, the claim recites insignificant extra-solution activities. The executing limitations described above as an insignificant extra-solution activity are also well-understood, routine, or conventional (Patterson, Pg. 13, Fig. 1.5, store instructions; Pg. 265, vector load/store unit; Pg. 266, Fig. 4.3, store instructions; Pg. 267, Para. 1; Pg. 280, Para. 1-4). Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 15 is ineligible.
Under the Alice Framework Step 2A Prong 2 Analysis, claim 16 recites the following additional element: gather loading the first set of input vector values from non-contiguous memory locations at the hardware compute element memory. The loading limitation is an example of insignificant extra-solution activity, mere data gathering (see MPEP 2106.05(g): Insignificant Extra-Solution Activity). Taken alone or in combination, it fails to integrate the judicial exception into a practical application.
Under the Alice Framework Step 2B Analysis, the additional elements recited above, taken alone or in combination, do not amount to significantly more than the judicial exception. As discussed in the Step 2A Prong 2 Analysis, the claim recites insignificant extra-solution activities. The loading limitations described above as an insignificant extra-solution activity are also well-understood, routine, or conventional (Patterson, Pg. 279-280, Gather-Scatter Sec.). Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 16 is ineligible.
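As a purely illustrative aside (not evidence of record), a gather load reads values from non-contiguous memory locations selected by an index list. The sketch below is hypothetical; the memory contents and index values are invented for illustration.

```python
# Hypothetical illustration of a gather load: input vector values are
# collected from non-contiguous locations in a simulated memory.
memory = [10.0, 20.0, 30.0, 40.0, 50.0]   # stand-in for the compute element memory
indices = [4, 0, 2]                        # non-contiguous input vector row numbers
gathered = [memory[i] for i in indices]    # the gather load itself
print(gathered)                            # → [50.0, 10.0, 30.0]
```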
Under the Alice Framework Step 2A Prong 1 Analysis, claim 17 recites Mathematical Concepts. The claim recites Mathematical Relationships, which is specifically identified as an exemplar in the Mathematical Concepts grouping of abstract ideas: “applying a hash function to the first input matrix row number to generate a first row number hash”
See specification [0180] which describes hashing. For these reasons, the claim recites Mathematical Concepts.
Under the Alice Framework Step 2A Prong 2 Analysis, the claim recites the following additional element: writing the first partial accumulation value to a first location at the hardware compute element memory using the first row number hash. The writing limitation is an example of insignificant extra-solution activity, mere data gathering (see MPEP 2106.05(g): Insignificant Extra-Solution Activity). Taken alone or in combination, it fails to integrate the judicial exception into a practical application.
Under the Alice Framework Step 2B Analysis, the additional elements recited above, taken alone or in combination, do not amount to significantly more than the judicial exception. As discussed in the Step 2A Prong 2 Analysis, the claim recites insignificant extra-solution activities. The writing limitations described above as an insignificant extra-solution activity are also well-understood, routine, or conventional (Patterson, Pg. 13, Fig. 1.5, store instructions; Pg. 265, vector load/store unit; Pg. 266, Fig. 4.3, store instructions; Pg. 267, Para. 1; Pg. 280, Para. 1-4, for writing) (Patterson, Pg. 10, Single instruction stream, multiple data streams; Pg. 269, Para. 1; Pg. 273, Fig. 4.5, Pg. 271-273, Sec. Multiple Lanes, for parallel processing lanes). Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 17 is ineligible.
Under the Alice Framework Step 2A Prong 1 Analysis, claim 18 recites Mathematical Concepts. The claim recites Mathematical Relationships, which is specifically identified as an exemplar in the Mathematical Concepts grouping of abstract ideas: “applying the hash function to a second input matrix row number of a second coordinate data structure of the first set of multiple coordinate data structure elements to generate a second row number hash”
See specification [0180] which describes hashing. For these reasons, the claim recites Mathematical Concepts.
Under the Alice Framework Step 2A Prong 2 Analysis, the claim recites the combination of the following additional elements: writing the second partial accumulation value to a second location at the hardware compute element memory using the second row number hash, the second location being noncontiguous with the first location, and the writing of the first partial accumulation value and the second partial accumulation value being performed with a scatter write operation. The writing limitations are examples of insignificant extra-solution activity, mere data gathering (see MPEP 2106.05(g): Insignificant Extra-Solution Activity). The noncontiguous limitation is an example of generally linking the use of the abstract idea to a particular technological environment (see MPEP 2106.05(h)(vi): Limiting the abstract idea of collecting information, analyzing it, and displaying certain results of the collection analysis to data related to the electric power grid, because limiting application of the abstract idea to power-grid monitoring is simply an attempt to limit the use of the abstract idea to a particular technological environment). Taken alone or in combination, they fail to integrate the judicial exception into a practical application.
Under the Alice Framework Step 2B Analysis, the additional elements recited above, taken alone or in combination, do not amount to significantly more than the judicial exception. As discussed in the Step 2A Prong 2 Analysis, the claim recites insignificant extra-solution activities. The writing limitations described above as an insignificant extra-solution activity are also well-understood, routine, or conventional (Patterson, Pg. 13, Fig. 1.5, store instructions; Pg. 265, vector load/store unit; Pg. 266, Fig. 4.3, store instructions; Pg. 267, Para. 1; Pg. 280, Para. 1-4, for writing) (Patterson, Pg. 10, Single instruction stream, multiple data streams; Pg. 269, Para. 1; Pg. 273, Fig. 4.5, Pg. 271-273, Sec. Multiple Lanes, for parallel processing lanes). Further, the claim recites generally linking the use of the abstract idea to a particular technological environment, non-contiguous memory. Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 18 is ineligible.
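As a purely illustrative aside (not evidence of record), hashing an input matrix row number to select a write slot and then scatter-writing partial accumulation values can be sketched as follows. The hash constant, table size, row values, and names are all hypothetical and do not come from the application.

```python
# Hypothetical illustration of a hashed scatter write: a hash of each input
# matrix row number selects a (generally non-contiguous) slot in memory for
# that row's partial accumulation value.
NUM_SLOTS = 8

def row_hash(row_number):
    # Simple multiplicative hash; the constant is arbitrary.
    return (row_number * 2654435761) % NUM_SLOTS

slots = [0.0] * NUM_SLOTS                   # stand-in for compute element memory
for row, partial in [(0, 11.0), (5, 8.0)]:  # scatter write of partial values
    slots[row_hash(row)] += partial
print(row_hash(0), row_hash(5))             # the two slots are non-contiguous
```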
Claims 19-20 are directed toward a computer program product that would be executed by the device of claims 10-11. All limitations recited in claims 10-11 are practiced by the computer program product of claims 19-20, respectively. The claim 10-11 analysis applies equally to claims 19-20.
Additionally, under the Alice Framework Step 2A Prong 2 Analysis, claim 19 recites the combination of the following new additional elements: a non-transitory machine-readable medium comprising instructions thereon that, when executed by a computer architecture, causes the computer architecture to execute operations. The non-transitory machine-readable medium comprising instructions and the computer architecture are recited at a high level of generality and are examples of generic computing elements, and/or merely generally linked to a particular technological environment (see MPEP 2106.05(h)(vi): Limiting the abstract idea of collecting information, analyzing it, and displaying certain results of the collection analysis to data related to the electric power grid, because limiting application of the abstract idea to power-grid monitoring is simply an attempt to limit the use of the abstract idea to a particular technological environment). The claim recites limitations which are examples of generic computing elements that result in “apply it” on a computer. Additionally, the “when executed by a computer architecture” and “causes the computer architecture to execute operations” limitations are examples of mere instructions to apply the exception, merely resulting in “apply it” on a computer (see MPEP 2106.05(f): Mere Instructions To Apply An Exception). Taken alone or in combination, they fail to integrate the judicial exception into a practical application.
Under the Alice Framework Step 2B Analysis, the additional elements recited above, taken alone or in combination, do not amount to significantly more than the judicial exception. As discussed in the Step 2A Prong 2 Analysis, the non-transitory machine-readable medium comprising instructions and the computer architecture are recited at a high level of generality; they are generic components that merely result in “apply it” on a computer. Further, these additional elements merely generally link the use of the abstract idea to a particular technological environment. The execution limitations are mere instructions to apply the abstract idea, merely resulting in “apply it.” Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 19 is ineligible.
Claims 1-9 are directed toward a method that would be performed by the device of claims 10-18. All limitations recited in claims 10-18 are practiced by the method of claims 1-9, respectively. The claim 10-18 analysis applies equally to claims 1-9.
Additionally, under the Alice Framework Step 2A Prong 2 Analysis, claim 1 recites the following new additional element: a hybrid threading fabric. A hybrid threading fabric is recited at a high level of generality and is an example of a generic computing element, and/or merely generally linked to a particular technological environment (see MPEP 2106.05(h)(vi): Limiting the abstract idea of collecting information, analyzing it, and displaying certain results of the collection analysis to data related to the electric power grid, because limiting application of the abstract idea to power-grid monitoring is simply an attempt to limit the use of the abstract idea to a particular technological environment). The claim recites limitations which are examples of generic computing elements that result in “apply it” on a computer. Taken alone or in combination, they fail to integrate the judicial exception into a practical application.
Under the Alice Framework Step 2B Analysis, the additional element recited above, taken alone or in combination, does not amount to significantly more than the judicial exception. As discussed in the Step 2A Prong 2 Analysis, the hybrid threading fabric is recited at a high level of generality; it is a generic component that merely results in “apply it” on a computer. Further, the additional element merely generally links the use of the abstract idea to a particular technological environment. Since the claim does not include additional elements that, alone or in combination, amount to significantly more than the judicial exception, claim 1 is ineligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
A discussion of apparatus claims 10-18 is presented first, followed by computer program product claims 19-20, and then method claims 1-9.
Claims 10-16 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over G.-Y. Lueh et al., "C-for-Metal: High Performance SIMD Programming on Intel GPUs," 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Seoul, Korea (South), 2021, pp. 289-300, doi: 10.1109/CGO51591.2021.9370324 (hereinafter “Lueh”), in view of S. Yesil, A. Heidarshenas, A. Morrison and J. Torrellas, "Speeding Up SpMV for Power-Law Graph Analytics by Enhancing Locality & Vectorization," SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, 2020, pp. 1-15 (hereinafter “Yesil”), and further in view of US 2014/0052929 A1 (Gulati et al., hereinafter “Gulati”).
Regarding claim 10, Lueh teaches an apparatus comprising:
a hardware compute element memory (Pg. 289, Col. 1, Sec. I, Para. 1, SLM) comprising multiple memory locations (Pg. 297, Col. 1, Para. 1; Pg. 292, Col. 2, Sub. A, Para. 3); and
a hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate) in communication (Pg. 289, Col. 1, Sec. I, Para. 1) with the hardware compute element memory (Pg. 289, Col. 1, Sec. I, Para. 1, SLM), the hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate) comprising multiple parallel (Pg. 289, Col. 1, Sec. I, Para. 1) processing lanes (Pg. 289, Col. 1, Sec. I, Para. 2; Pg. 291, Col. 1, Sec. III, Sub. 2, the GPU contains multiple lanes, and the GPU is programmed to perform cross-lane data sharing to express data parallelism), the hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate) programmed to execute operations comprising:
requesting “selecting” a first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select), the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) describing non-zero values (Pg. 298, Col. 1, Sub. 6, dense matrices A, B, C) of an input matrix (Pg. 298, Col. 1, Sub. 6, A, B, C; Pg. 297, Col. 2, Sub. 4, A), a first coordinate data structure element (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i,j)) of the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) comprising a first input matrix row number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;), a first input matrix column number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (j); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 8 in matrix<int, 4, 8> m;), and a first input matrix value (Pg. 293, Fig. 1, m.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) corresponding to the first input matrix row number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) and the first input matrix column number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (j); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 8 in matrix<int, 4, 8> m;);
requesting “selecting” (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) a first set of input vector values (Pg. 293, Fig. 1, v.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) having input vector row numbers (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, vector type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in vector<short, 8> v;) corresponding to input matrix column numbers (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (j); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 8 in matrix<int, 4, 8> m; Pg. 297, Col. 2, Sub. 4) of the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select);
using the multiple parallel (Pg. 289, Col. 1, Sec. I, Para. 1) processing lanes (Pg. 289, Col. 1, Sec. I, Para. 2; Pg. 291, Col. 1, Sec. III, Sub. 2, the GPU contains multiple lanes, and the GPU is programmed to perform cross-lane data sharing to express data parallelism) of the compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate).
Lueh also teaches a second input matrix row (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), it does not explicitly disclose “requesting” as that term is known in the art, and it is silent regarding summing a first set of partial accumulation values corresponding to the first input matrix row across at least a portion of the multiple parallel processing lanes to generate a first output vector row value.
Additionally, while Lueh generally teaches multiple parallel processing lanes (Pg. 294, Col. 1, Sub. D, Para. 1-2), it does not explicitly disclose individual processing lanes in combination with their computations, and is silent regarding:
generating, by a first parallel processing lane of the hardware compute element, a first product based on a first portion of the first set of multiple coordinate data structure elements and the first set of input vector values;
updating a first partial accumulation value corresponding to a first input matrix row and the first parallel processing lane based at least in part on the first product;
generating, by a second parallel processing lane of the hardware compute element, a second product based on a second portion of the first set of multiple coordinate data structure elements and the first set of input vector values, the second portion of the first set of multiple coordinate data structure elements being different than the first portion of the first set of multiple coordinate data structure elements, and the generating of the first product by the first parallel processing lane and the generating of the second product by the second parallel processing lane being performed in parallel;
updating a second partial accumulation value corresponding to a second input matrix row and the second parallel processing lane based at least in part on the second product.
Additionally, while Lueh generally teaches read requests (Pg. 293, Sec. IV-B), these appear to be different from the “asynchronous read requests” as claimed. Therefore, Lueh is also silent regarding:
requesting;
a memory interface;
sending a first asynchronous read request to an external memory via the memory interface, the first asynchronous read request;
sending a second asynchronous read request to the external memory via the memory interface, the second asynchronous read request.
Yesil teaches:
generating, by a first parallel processing lane of the hardware compute element, a first product based on a first portion (Pg. 7, Col. 1, Algorithm 2, line 21-22 iterates starting at index i and loads mask for respective SIMD lane as ms value, lines 23-24 uses lane number and ms as indices to access values, line 26 computes multiplication product mul between values previously accessed by lane number and ms at starting indices of loop; “first portion” – line 10 starting chunk) of the first set of multiple coordinate data structure elements and the first set of input vector values;
updating a first partial accumulation value corresponding to a first input matrix row and the first parallel processing lane based at least in part on the first product (Pg. 7, Col. 1, Algorithm 2, after computing starting product mul in line 26 as indexed by lane, and updates row_sums in line 27 with mul and previous row_sums value);
generating, by a second parallel processing lane of the hardware compute element, a second product based on a second portion of the first set of multiple coordinate data structure elements and the first set of input vector values, the second portion of the first set of multiple coordinate data structure elements being different than the first portion of the first set of multiple coordinate data structure elements (Pg. 7, Col. 1, Algorithm 2, line 21-22 iterates starting at index i and loads mask for respective SIMD lane as ms value, lines 23-24 uses lane number and ms as indices to access values, line 26 computes multiplication product mul between values previously accessed by lane number and ms at next indices of loop; “second portion” – line 10 next chunk), and the generating of the first product by the first parallel processing lane and the generating of the second product by the second parallel processing lane being performed in parallel (Pg. 7, Col. 1, Algorithm 2, lines 9-10 chunks are processed in parallel; Col. 1, Para. 2);
updating a second partial accumulation value corresponding to a second input matrix row and the second parallel processing lane based at least in part on the second product (Pg. 7, Col. 1, Algorithm 2, after computing the next product mul in line 26 as indexed by lane, line 27 updates row_sums with mul and the previous row_sums value); and
summing a first set of (Pg. 7, Col. 1, Algorithm 2, lines 27, 30; Pg. 7, Col. 1, Para. 2) multiple partial accumulation values (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30) corresponding to the first input matrix row across at least a portion of the multiple parallel processing lanes to generate a first output vector row value (Pg. 6, Col. 2, Para. 2, output vector y; Pg. 7, Col. 1, Algorithm 2, vector y; Pg. 7, Col. 1, Para. 2).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lueh’s compute element with Yesil’s machine instructions because they are in the same field of endeavor as the claimed invention, matrix-vector operations (Abstract). Lueh teaches the basic structures of a computing architecture that performs various processes, including well-known computer architecture instructions and arithmetic; however, it does not describe in specific detail the instructions for the arithmetic and other command processes. Yesil teaches machine instructions including loading, updating, generating, and summing in a processor (Col. 2, Sec. I, Para. 2-3). Using the known techniques of Yesil’s machine instructions to improve a similar processor, to provide the desired matrix-vector operations in Lueh’s compute element, would have been obvious to one of ordinary skill in the art.
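For context, the combined operations mapped above (per-lane products over coordinate-format elements, lane-local partial accumulation per matrix row, and summing the partial accumulation values across lanes) amount to a coordinate-format (COO) sparse matrix-vector multiply. The following is a minimal illustrative sketch only, not code from Lueh or Yesil; identifiers such as `spmv_coo` and `NUM_LANES` are hypothetical:

```python
# Illustrative COO-format sparse matrix-vector multiply (SpMV).
# Each (row, col, val) triple is a "coordinate data structure element";
# elements are partitioned across simulated parallel processing lanes,
# each lane keeps its own partial accumulation value per row, and the
# per-row partial accumulation values are summed across lanes at the end.
from collections import defaultdict

NUM_LANES = 2  # hypothetical lane count

def spmv_coo(coo_elements, vector):
    # lane_partials[lane][row] holds that lane's partial accumulation value
    lane_partials = [defaultdict(float) for _ in range(NUM_LANES)]
    for idx, (row, col, val) in enumerate(coo_elements):
        lane = idx % NUM_LANES               # static partition across lanes
        product = val * vector[col]          # per-lane product
        lane_partials[lane][row] += product  # update lane-local partial sum
    # sum the partial accumulation values across lanes per matrix row
    result = defaultdict(float)
    for partials in lane_partials:
        for row, partial in partials.items():
            result[row] += partial
    return dict(result)

# Example: 2x3 matrix with non-zeros at (0,0)=1, (0,2)=2, (1,1)=3
coo = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)]
v = [4.0, 5.0, 6.0]
print(spmv_coo(coo, v))  # {0: 16.0, 1: 15.0}
```

In this sketch the lane assignment (`idx % NUM_LANES`) is one arbitrary partitioning choice; the claim language only requires that the two lanes operate on different portions of the coordinate elements in parallel.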
Yesil is further silent with disclosing:
requesting;
a memory interface;
sending a first asynchronous read request to an external memory via the memory interface, the first asynchronous read request;
sending a second asynchronous read request to the external memory via the memory interface, the second asynchronous read request.
The combination of Lueh in view of Yesil is silent regarding:
requesting;
a memory interface;
sending a first asynchronous read request to an external memory via the memory interface, the first asynchronous read request;
sending a second asynchronous read request to the external memory via the memory interface, the second asynchronous read request.
Gulati teaches:
requesting ([0031], [0042] read requests; [0028] requestors; [0048] asynchronous memory access requests);
a memory interface (Fig. 1, 110, [0033]);
sending a first asynchronous read request ([0008], [0035], [0038], [0048] asynchronous memory access requests) to an external memory (Fig. 1, 120a-d, [0034]) via the memory interface (Fig. 1, 110, [0034]), the first asynchronous read request ([0008], [0035], [0038], [0048] asynchronous memory access requests);
sending a second asynchronous read request ([0008], [0035], [0038], [0048] asynchronous memory access requests; Note: Gulati teaches a plurality of asynchronous requests, where there exists at least more than one request) to the external memory (Fig. 1, 120a-d, [0034]) via the memory interface (Fig. 1, 110, [0034]), the second asynchronous read request ([0008], [0035], [0038], [0048] asynchronous memory access requests; Note: Gulati teaches a plurality of asynchronous requests, where there exists at least more than one request).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lueh in view of Yesil’s modified compute device with Gulati’s memory interface and asynchronous reads because they are in the same field of endeavor as the claimed invention, graphical processing unit (GPU) architecture ([0029], [0034]). It would have been obvious to one of ordinary skill in the art to incorporate the memory interface, and thereby its operations of asynchronous read requests, as doing so allows the modified compute device to reduce the demands on memory bandwidth and average power consumption via the caches included in the memory interface ([0033]). Thus, a person skilled in the art would recognize that the modification with Gulati would help manage the modified compute device’s computing complexity and power consumption, and thus improve the device’s performance.
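For context, an asynchronous read request, as that term is used in the mapping above, differs from a blocking read in that the requester continues other work after issuing the request and collects the data only when needed. The following is a minimal software analogy assuming nothing about Gulati's actual hardware; all names (`read_external`, `memory_interface`) are hypothetical:

```python
# Sketch of asynchronous read requests against a simulated external
# memory: two reads are issued without blocking, other work proceeds,
# and the responses are awaited only when the data is needed.
from concurrent.futures import ThreadPoolExecutor
import time

external_memory = {0x10: 111, 0x20: 222}  # stand-in for external memory

def read_external(addr):
    time.sleep(0.01)  # simulate memory latency
    return external_memory[addr]

with ThreadPoolExecutor() as memory_interface:
    # first and second asynchronous read requests
    req1 = memory_interface.submit(read_external, 0x10)
    req2 = memory_interface.submit(read_external, 0x20)
    overlap = sum(range(10))  # compute that overlaps the in-flight reads
    total = req1.result() + req2.result()  # gather responses when needed

print(total)  # 333
```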
Regarding claim 11, the rejection of claim 10 is incorporated, and Lueh teaches the operations further comprising:
the second input matrix row (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) across at least
a portion of the multiple parallel (Pg. 289, Col. 1, Sec. I, Para. 1) processing lanes (Pg. 289, Col. 1, Sec. I, Para. 2; Pg. 291, Col. 1, Sec. III, Sub. 2, the GPU contains multiple lanes, and the GPU is programmed to perform cross-lane data sharing to express data parallelism).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), it is silent regarding summing a second set of multiple partial accumulation values corresponding to the second input matrix row across at least a portion of the multiple parallel processing lanes to generate a second output vector row value.
Yesil teaches:
summing a second set of (Pg. 7, Col. 1, Algorithm 2, lines 27, 30; Pg. 7, Col. 1, Para. 2) the multiple partial accumulation values (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30) corresponding to a second input matrix row across at least a portion of the parallel processing lanes to generate a second output vector row value (Pg. 6, Col. 2, Para. 2, output vector y; Pg. 7, Col. 1, Algorithm 2, vector y; Pg. 7, Col. 1, Para. 2).
The motivation provided with respect to claim 10 equally applies to claim 11.
Regarding claim 12, the rejection of claim 10 is incorporated, and Lueh teaches the operations further comprising:
loading “selecting” to the hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate) a second set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select), the second set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) also describing non-zero values (Pg. 298, Col. 1, Sub. 6, dense matrices A, B, C) of the input matrix (Pg. 298, Col. 1, Sub. 6, A, B, C; Pg. 297, Col. 2, Sub. 4, A);
loading “selecting” to the hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate) a second set of input vector values (Pg. 293, Fig. 1, v.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) having input vector row numbers (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, vector type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in vector<short, 8> v;) corresponding to input matrix column numbers (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (j); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 8 in matrix<int, 4, 8> m; Pg. 297, Col. 2, Sub. 4) of the second set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select); and
using the multiple parallel (Pg. 289, Col. 1, Sec. I, Para. 1) processing lanes (Pg. 289, Col. 1, Sec. I, Para. 2; Pg. 291, Col. 1, Sec. III, Sub. 2, the GPU contains multiple lanes, and the GPU is programmed to perform cross-lane data sharing to express data parallelism) of the hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), it does not explicitly disclose “loading” as that term is known in the art, and it is silent regarding updating at least one of the first partial accumulation value and the second partial accumulation value using the second set of multiple coordinate data structure elements and the second set of input vector values.
Yesil teaches:
loading (Pg. 2, Table I; Pg. 2, Col. 2, Sec. II, Sub. B; Pg. 7, Algorithm 2, lines 18, 22-24)
update at least one of the first partial accumulation value and the second partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value and the next row_sums value as indicated by starting index and next index) using the second set of multiple coordinate data structure elements and the second set of input vector values.
The motivation provided with respect to claim 10 equally applies to claim 12.
Regarding claim 13, the rejection of claim 10 is incorporated, and Lueh teaches the operations further comprising:
determining a portion of coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, vector or matrix) corresponding to a first number of rows (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) of the input matrix (Pg. 298, Col. 1, Sub. 6, A, B, C; Pg. 297, Col. 2, Sub. 4, A), the first number of rows (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) of the input matrix (Pg. 298, Col. 1, Sub. 6, A, B, C; Pg. 297, Col. 2, Sub. 4, A) comprising the first input matrix row (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;), the portion of coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, vector or matrix) comprising the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select);
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), it is silent regarding updating at least one partial accumulation value using the portion of coordinate data structure elements corresponding to the first number of rows of the input matrix before summing the first partial accumulation values corresponding to the first input matrix row.
Yesil teaches:
updating at least one partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value and the next row_sums value as indicated by starting index and next index) using the portion of the coordinate data structure elements corresponding to the first number of rows of the input matrix before summing (Pg. 7, Col. 1, Algorithm 2, lines 27, 30; Pg. 7, Col. 1, Para. 2) the first partial accumulation values (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value as indicated by starting index) corresponding to the first input matrix row.
The motivation provided with respect to claim 10 equally applies to claim 13.
Regarding claim 14, the rejection of claim 10 is incorporated, and Lueh teaches the operations further comprising:
the first set of input vector values (Pg. 293, Fig. 1, v.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) comprising multiple input vector values (Pg. 293, Fig. 1, v.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) having non-contiguous (Pg. 298, Col. 1, Sub. 7; Pg. 297, Col. 1, Para. 1; Pg. 292, Col. 2, Sub. A, Para. 3) input vector row numbers (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, vector type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in vector<short, 8> v;).
Regarding claim 15, the rejection of claim 10 is incorporated, and Lueh teaches the operations further comprising:
is not contiguous (Pg. 298, Col. 1, Sub. 7; Pg. 297, Col. 1, Para. 1; Pg. 292, Col. 2, Sub. A, Para. 3)
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), it is silent regarding executing a write operation that writes a first partial accumulation value to a first memory location and the second partial accumulation value to a second memory location that is not contiguous with the first memory location.
Yesil teaches wherein:
executing a write operation (Pg. 3, Col. 2, Para. 2) that writes the first partial accumulation value (Pg. 7, Col. 1, Algorithm 2, lines 17-18, 31-32) to a first memory location (Pg. 7, Col. 1, Algorithm 2, lines 17-18, row_ids) and the second partial accumulation value (Pg. 7, Col. 1, Algorithm 2, lines 17-18, 31-32) to a second memory location (Pg. 7, Col. 1, Algorithm 2, lines 31-32, y) that is not contiguous with the first memory location (Pg. 7, Col. 1, Algorithm 2, lines 17-18, row_ids).
The motivation provided with respect to claim 10 equally applies to claim 15.
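For context, writing the first and second partial accumulation values to non-contiguous memory locations corresponds to a scatter-style write. The following is an illustrative sketch only; `row_ids` and `y` echo the identifiers quoted from Yesil's Algorithm 2, but the code is not Yesil's:

```python
# Scatter write: each partial accumulation value is written to the
# output location named by its row id, so consecutive values may land
# at non-contiguous memory locations in y.
def scatter_write(y, row_ids, partial_sums):
    for row, value in zip(row_ids, partial_sums):
        y[row] += value  # accumulate into a non-contiguous slot of y
    return y

y = [0.0] * 6
print(scatter_write(y, [0, 4], [16.0, 15.0]))  # [16.0, 0.0, 0.0, 0.0, 15.0, 0.0]
```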
Regarding claim 16, the rejection of claim 10 is incorporated, and Lueh teaches the operations further comprising:
gather (Pg. 293, Col. 1, Para. 4, replicate) loading “selecting” (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) the first set of input vector values (Pg. 293, Fig. 1, v.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) from non-contiguous memory locations (Pg. 298, Col. 1, Sub. 7; Pg. 297, Col. 1, Para. 1; Pg. 292, Col. 2, Sub. A, Para. 3) at the hardware compute element memory (Pg. 289, Col. 1, Sec. I, Para. 1, SLM).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), it does not explicitly disclose “loading” as that term is known in the art, and it is silent regarding gather loading.
Yesil teaches:
gather loading (Pg. 2, Table I; Pg. 2, Col. 2, Sec. II, Sub. B; Pg. 7, Algorithm 2, lines 18, 22-24).
The motivation provided with respect to claim 10 equally applies to claim 16.
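For context, a gather load collects values from non-contiguous memory locations selected by an index list, here the input vector values indexed by the matrix column numbers of the coordinate elements. An illustrative sketch, not code from Lueh or Yesil:

```python
# Gather load: collect input vector values from non-contiguous
# locations, indexed by the column numbers of the coordinate elements.
def gather_load(vector, column_numbers):
    return [vector[j] for j in column_numbers]  # indexed, non-contiguous loads

v = [4.0, 5.0, 6.0, 7.0]
print(gather_load(v, [0, 3, 1]))  # [4.0, 7.0, 5.0]
```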
Claims 19-20 are directed toward a computer program product that would be executed by the device of claims 10-11. All limitations recited in claims 10-11 are practiced by the computer program product of claims 19-20, respectively. The claim 10-11 analysis equally applies to claims 19-20.
Additionally, Lueh discloses a non-transitory machine-readable medium comprising instructions (Pg. 289, Col. 2, Sec. I, Para. 4, C-for-Metal (CM)) thereon that, when executed by a computer architecture (Pg. 289, Col. 2, Sec. I, Para. 4, Intel GPUs; Pg. 289, Col. 1, Sec. I, Para. 1-2, Intel GPU), cause the computer architecture to execute operations (Pg. 289, Col. 2, Sec. I, Para. 4, operations on vectors and matrices), the computer architecture comprising a plurality of hardware compute elements (Pg. 289, Col. 1, Sec. I, Para. 1-2, work-groups which are groups of work-items).
Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Lueh in view of Yesil in view of Gulati as applied to claim 10 above, and further in view of US 10896141 B2 Cook et al. (hereinafter “Cook”).
Regarding claim 17, the rejection of claim 10 is incorporated, and Lueh teaches the operations further comprising:
the first input matrix row number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;)
the first input matrix row (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) and
the hardware compute element memory (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), it is silent regarding the update of the first partial accumulation values, writing the first partial accumulation value, a first location, applying a hash function, and to generate a first row number hash.
Yesil teaches:
the update of the first partial accumulation values (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value as indicated by starting index), writing (Pg. 3, Col. 2, Para. 2) the first partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value as indicated by starting index), and a first location (Pg. 7, Col. 1, Algorithm 2, lines 17-18, row_ids).
The motivation provided with respect to claim 10 equally applies to claim 17.
Lueh in view of Yesil is silent regarding: applying a hash function and to generate a first row number hash.
Lueh in view of Yesil in view of Gulati is silent regarding: applying a hash function and to generate a first row number hash.
Cook teaches:
applying a hash function (Col. 3, lines 51-65) and to generate a first row number hash (Col. 3, lines 51-65, hash output).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lueh in view of Yesil in view of Gulati’s modified compute device with Cook’s hashing techniques because they are in the same field of endeavor as the claimed invention, matrix-vector operations (Col. 3, lines 5-10). Lueh teaches the basic structures of a computing architecture that performs various processes, including well-known computer architecture instructions and arithmetic; however, it does not describe in specific detail the algorithms for the arithmetic and other command processes. Yesil teaches machine instructions including loading, updating, and summing in a processor (Col. 2, Sec. I, Para. 2-3). Cook teaches the remaining limitations of applying a hashing function for memory addressing (Col. 3, lines 51-65) with configurable tag and data addresses to ensure high throughput (Col. 5, lines 30-53). Using the known techniques of hashing as taught by Cook to improve Yesil’s matrix-vector operations in Lueh’s compute element would have been obvious to one of ordinary skill in the art, since one of ordinary skill in the art would recognize that the system of Lueh in view of Yesil in view of Gulati was ready for improvement to incorporate the addressing features taught by Cook.
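For context, a row number hash maps an input matrix row number, via a hash function, to an address or slot (e.g., for an accumulation table). The sketch below is purely illustrative; the multiplicative hash shown is a generic technique and is not Cook's disclosed function, and `NUM_SLOTS` is a hypothetical parameter:

```python
# Sketch: apply a hash function to an input matrix row number to
# generate a "row number hash" used as an index into a fixed-size
# table of slots.
NUM_SLOTS = 8  # hypothetical table size

def row_number_hash(row_number):
    # generic multiplicative hash (Knuth-style constant), folded into
    # 32 bits and reduced to a slot index; not Cook's actual function
    return (row_number * 2654435761) % (2**32) % NUM_SLOTS

print(row_number_hash(0), row_number_hash(1))  # 0 1
```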
Regarding claim 18, the rejection of claim 17 is incorporated, and Lueh teaches the operations further comprising:
a second input matrix row number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) of a second coordinate data structure (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i,j)) of the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) the second input matrix row (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) the hardware compute element memory (Pg. 289, Col. 1, Sec. I, Para. 1, SLM) noncontiguous (Pg. 298, Col. 1, Sub. 7; Pg. 297, Col. 1, Para. 1; Pg. 292, Col. 2, Sub. A, Para. 3) a scatter write operation (Pg. 293, Col. 2, Sub. B, Scattered read/write).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), it is silent regarding the update of the second partial accumulation values, writing the second partial accumulation value, a second location, the first location, the writing of the first partial accumulation value and the second partial accumulation value, applying the hash function, using the second row number hash, and to generate a second row number hash.
Yesil teaches:
the update of the second partial accumulation values (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the next row_sums value as indicated by the next index), writing (Pg. 3, Col. 2, Para. 2) the second partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the next row_sums value as indicated by the next index), a second location (Pg. 7, Col. 1, Algorithm 2, lines 31-32, y), the first location (Pg. 7, Col. 1, Algorithm 2, lines 17-18, row_ids), and the writing (Pg. 3, Col. 2, Para. 2) of the first partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value as indicated by starting index) and the second partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the next row_sums value as indicated by the next index).
The motivation provided with respect to claim 10 equally applies to claim 18.
Lueh in view of Yesil is silent regarding: applying the hash function, using the second row number hash, and to generate a second row number hash.
Lueh in view of Yesil in view of Gulati is silent regarding: applying the hash function, using the second row number hash, and to generate a second row number hash.
Cook teaches:
applying a hash function (Col. 3, lines 51-65), to generate a first row number hash (Col. 3, lines 51-65, hash output), and to generate a second row number hash (Col. 3, lines 51-65, hash output).
The motivation provided with respect to claim 17 equally applies to claim 18.
Claims 1-7 are rejected under 35 U.S.C. 103 as being unpatentable over Lueh in view of Yesil in view of Gulati in view of Brewer, Tony. "Hybrid Threaded Processing for Sparse Data Kernels". May 9, 2018. Micron Technology, INC. (hereinafter “Brewer”).
Regarding claim 1, Lueh teaches an apparatus comprising:
a hardware compute element memory (Pg. 289, Col. 1, Sec. 1, Para. 1, SLM) comprising multiple memory locations (Pg. 297, Col. 1, Para. 1; Pg. 292, Col. 2, Sub. A, Para. 3); and
a hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate) in communication (Pg. 289, Col. 1, Sec. I, Para. 1) with the hardware compute element memory (Pg. 289, Col. 1, Sec. I, Para. 1, SLM), the hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate) comprising multiple parallel (Pg. 289, Col. 1, Sec. I, Para. 1) processing lanes (Pg. 289, Col. 1, Sec. I, Para. 2; Pg. 291, Col. 1, Sec. III, Sub. 2, the GPU contains multiple lanes, and the GPU is programmed to perform cross-lane data sharing to express data parallelism), the hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate) programmed to execute operations comprising:
requesting “selecting” a first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select), the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) describing non-zero values (Pg. 298, Col. 1, Sub. 6, dense matrices A, B, C) of an input matrix (Pg. 298, Col. 1, Sub. 6, A, B, C; Pg. 297, Col. 2, Sub. 4, A), a first coordinate data structure element (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i,j)) of the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) comprising a first input matrix row number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;), a first input matrix column number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (j); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 8 in matrix<int, 4, 8> m;), and a first input matrix value (Pg. 293, Fig. 1, m.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) corresponding to the first input matrix row number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) and the first input matrix column number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (j); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 8 in matrix<int, 4, 8> m;);
requesting “selecting” (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) a first set of input vector values (Pg. 293, Fig. 1, v.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) having input vector row numbers (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, vector type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in vector<short, 8> v;) corresponding to input matrix column numbers (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (j); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 8 in matrix<int, 4, 8> m; Pg. 297, Col. 2, Sub. 4) of the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select);
using the multiple parallel (Pg. 289, Col. 1, Sec. I, Para. 1) processing lanes (Pg. 289, Col. 1, Sec. I, Para. 2; Pg. 291, Col. 1, Sec. III, Sub. 2, the GPU contains multiple lanes, and the GPU is programmed to perform cross-lane data sharing to express data parallelism) of the compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate).
Lueh also teaches a second input matrix row (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), it does not explicitly disclose “requesting” as that term is known in the art, and it is silent regarding summing a first set of partial accumulation values corresponding to the first input matrix row across at least a portion of the multiple parallel processing lanes to generate a first output vector row value.
Additionally, while Lueh generally teaches multiple parallel processing lanes (Pg. 294, Col. 1, Sub. D, Para. 1-2), it does not explicitly disclose individual processing lanes in combination with their computations, and is silent regarding:
generating, by a first parallel processing lane of the hardware compute element, a first product based on a first portion of the first set of multiple coordinate data structure elements and the first set of input vector values;
updating a first partial accumulation value corresponding to a first input matrix row and the first parallel processing lane based at least in part on the first product;
generating, by a second parallel processing lane of the hardware compute element, a second product based on a second portion of the first set of multiple coordinate data structure elements and the first set of input vector values, the second portion of the first set of multiple coordinate data structure elements being different than the first portion of the first set of multiple coordinate data structure elements, and the generating of the first product by the first parallel processing lane and the generating of the second product by the second parallel processing lane being performed in parallel;
updating a second partial accumulation value corresponding to a second input matrix row and the second parallel processing lane based at least in part on the second product.
Additionally, while Lueh generally teaches read requests (Pg. 293, Sec. IV-B), it appears these are different from the “asynchronous read requests” as claimed. Therefore, Lueh is also silent with disclosing:
requesting;
a hybrid threading fabric comprising a plurality of hardware compute elements and a memory interface;
sending a first asynchronous read request to an external memory via the memory interface, the first asynchronous read request;
sending a second asynchronous read request to the external memory via the memory interface, the second asynchronous read request.
Yesil teaches:
generating, by a first parallel processing lane of the hardware compute element, a first product based on a first portion (Pg. 7, Col. 1, Algorithm 2, line 21-22 iterates starting at index i and loads mask for respective SIMD lane as ms value, lines 23-24 uses lane number and ms as indices to access values, line 26 computes multiplication product mul between values previously accessed by lane number and ms at starting indices of loop; “first portion” – line 10 starting chunk) of the first set of multiple coordinate data structure elements and the first set of input vector values;
updating a first partial accumulation value corresponding to a first input matrix row and the first parallel processing lane based at least in part on the first product (Pg. 7, Col. 1, Algorithm 2, after computing starting product mul in line 26 as indexed by lane, and updates row_sums in line 27 with mul and previous row_sums value);
generating, by a second parallel processing lane of the hardware compute element, a second product based on a second portion of the first set of multiple coordinate data structure elements and the first set of input vector values, the second portion of the first set of multiple coordinate data structure elements being different than the first portion of the first set of multiple coordinate data structure elements (Pg. 7, Col. 1, Algorithm 2, line 21-22 iterates starting at index i and loads mask for respective SIMD lane as ms value, lines 23-24 uses lane number and ms as indices to access values, line 26 computes multiplication product mul between values previously accessed by lane number and ms at next indices of loop; “second portion” – line 10 next chunk), and the generating of the first product by the first parallel processing lane and the generating of the second product by the second parallel processing lane being performed in parallel (Pg. 7, Col. 1, Algorithm 2, lines 9-10 chunks are processed in parallel; Col. 1, Para. 2);
updating a second partial accumulation value corresponding to a second input matrix row and the second parallel processing lane based at least in part on the second product (Pg. 7, Col. 1, Algorithm 2, after computing the next product mul in line 26 as indexed by lane, row_sums is updated in line 27 with mul and the previous row_sums value); and
summing a first set of (Pg. 7, Col. 1, Algorithm 2, lines 27, 30; Pg. 7, Col. 1, Para. 2) multiple partial accumulation values (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30) corresponding to the first input matrix row across at least a portion of the multiple parallel processing lanes to generate a first output vector row value (Pg. 6, Col. 2, Para. 2, output vector y; Pg. 7, Col. 1, Algorithm 2, vector y; Pg. 7, Col. 1, Para. 2).
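For illustration only, the per-lane product, partial accumulation, and cross-lane summing pattern that the rejection maps onto Yesil’s Algorithm 2 can be sketched as follows. This is a hedged sketch, not Yesil’s actual code; the function name spmv_coo_lanes, the two-lane split, and the variable names are illustrative assumptions.

```python
# Sketch of SIMD-lane SpMV over a COO ("coordinate") sparse matrix.
# Each lane takes a different portion of the coordinate data structure
# elements, forms products with the input vector values, and keeps a
# per-lane partial accumulation value per matrix row; the partial
# accumulation values are then summed across lanes into the output row.

def spmv_coo_lanes(coo, x, n_rows, n_lanes=2):
    # coo: list of (row, col, value) coordinate data structure elements
    # partial[lane][row] is that lane's partial accumulation value
    partial = [[0.0] * n_rows for _ in range(n_lanes)]
    for lane in range(n_lanes):
        # each lane processes a different, disjoint portion of the elements
        for r, c, v in coo[lane::n_lanes]:
            partial[lane][r] += v * x[c]   # product, then update
    # sum partial accumulation values across lanes -> output vector rows
    return [sum(partial[lane][r] for lane in range(n_lanes))
            for r in range(n_rows)]

m = [(0, 1, 2.0), (0, 3, 1.0), (1, 0, 4.0), (1, 2, 3.0)]
x = [1.0, 2.0, 3.0, 4.0]
y = spmv_coo_lanes(m, x, n_rows=2)   # -> [8.0, 13.0]
```

In hardware the two lane loops would run concurrently; the sequential loop here only models which elements each lane touches.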
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lueh’s compute element with Yesil’s machine instructions because they are in the claimed invention’s same field of endeavor of matrix-vector operations (Abstract). Lueh teaches the basic structures of computing architecture to perform various processes, including well-known computer architecture instructions and arithmetic; however, they never go into specific detail describing the instructions of the arithmetic and other command processes. Yesil teaches machine instructions including loading, updating, generating, and summing in a processor (Col. 2, Sec. I, Para. 2-3). Using the known techniques of Yesil’s machine instructions to improve a similar processor to provide the desired matrix-vector operations in Lueh’s compute element would have been obvious to one of ordinary skill in the art.
Yesil is further silent with disclosing:
requesting;
a hybrid threading fabric comprising a plurality of hardware compute elements and a memory interface;
sending a first asynchronous read request to an external memory via the memory interface, the first asynchronous read request;
sending a second asynchronous read request to the external memory via the memory interface, the second asynchronous read request.
The combination of Lueh in view of Yesil are silent with disclosing:
requesting;
a memory interface;
sending a first asynchronous read request to an external memory via the memory interface, the first asynchronous read request;
sending a second asynchronous read request to the external memory via the memory interface, the second asynchronous read request.
Gulati teaches:
requesting ([0031], [0042] read requests; [0028] requestors; [0048] asynchronous memory access requests);
a memory interface (Fig. 1, 110, [0033]);
sending a first asynchronous read request ([0008], [0035], [0038], [0048] asynchronous memory access requests) to an external memory (Fig. 1, 120a-d, [0034]) via the memory interface (Fig. 1, 110, [0034]), the first asynchronous read request ([0008], [0035], [0038], [0048] asynchronous memory access requests);
sending a second asynchronous read request ([0008], [0035], [0038], [0048] asynchronous memory access requests; Note: Gulati teaches a plurality of asynchronous requests, where there exists at least more than one request) to the external memory (Fig. 1, 120a-d, [0034]) via the memory interface (Fig. 1, 110, [0034]), the second asynchronous read request ([0008], [0035], [0038], [0048] asynchronous memory access requests; Note: Gulati teaches a plurality of asynchronous requests, where there exists at least more than one request).
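For illustration only, the pattern of sending two asynchronous read requests to an external memory via a memory interface, as mapped onto Gulati, can be sketched as follows. The MemoryInterface class and its async_read method are illustrative assumptions, not Gulati’s disclosed structure.

```python
# Sketch: two asynchronous read requests issued through a
# memory-interface object; the caller may continue other work
# before collecting the responses.
from concurrent.futures import ThreadPoolExecutor

class MemoryInterface:
    def __init__(self, memory):
        self._memory = memory
        self._pool = ThreadPoolExecutor(max_workers=4)

    def async_read(self, addr, n):
        # returns immediately with a future; the read completes later
        return self._pool.submit(lambda: self._memory[addr:addr + n])

mem = list(range(100))            # stand-in for the external memory
mi = MemoryInterface(mem)
req1 = mi.async_read(10, 4)       # first asynchronous read request
req2 = mi.async_read(50, 2)       # second asynchronous read request
# ... the compute element may proceed with other work here ...
data1, data2 = req1.result(), req2.result()
```

The point of the sketch is only that both requests are outstanding at once, which is what distinguishes them from the blocking reads discussed for Lueh.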
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lueh in view of Yesil’s modified compute device with Gulati’s memory interface and asynchronous reads because they are in the claimed invention’s same field of endeavor of graphical processing unit (GPU) architecture ([0029], [0034]). It would have been obvious to one of ordinary skill in the art to incorporate the memory interface, and thereby its operations of asynchronous read requests, as doing so allows the modified compute device to reduce the demands on memory bandwidth and average power consumption via the caches included in the memory interface ([0033]). Thus, a person skilled in the art would recognize that modifying with Gulati would provide support to the modified compute device’s computing complexity and power consumption, and thus result in improvements of the device’s performance.
Although it appears Gulati teaches a fabric (Fig. 1, 130, [0025], [0030]), it seems that this is different than the “hybrid threading fabric” as presented in claim 1. Therefore, Gulati is silent with disclosing a hybrid threading fabric.
The combination of Lueh in view of Yesil in view of Gulati are silent with disclosing a hybrid threading fabric.
Brewer teaches a hybrid threading fabric (Pg. 12, HTF cluster in NOC 1x1 Config 1.2Ghz Simulated and in NOC 2x2 Config 1.2Ghz Simulated Figures).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lueh in view of Yesil in view of Gulati’s modified compute device with Brewer’s hybrid threading fabric because they are in the claimed invention’s same field of endeavor of graphical processing unit (GPU) architecture (Pg. 12, footnote 1). It would have been obvious to one of ordinary skill in the art to implement the modified compute device using the hybrid threading fabric, as doing so allows a significant reduction in energy consumption compared to conventional GPUs (Pg. 12, Comparison to Reference Platforms). Thus, a person skilled in the art would recognize that modifying with Brewer would provide support to the modified compute device’s power consumption, and thus result in improvements of the device’s performance.
Regarding claim 2, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Lueh teaches the operations further comprising:
the second input matrix row (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) across at least
a portion of the multiple parallel (Pg. 289, Col. 1, Sec. I, Para. 1) processing lanes (Pg. 289, Col. 1, Sec. I, Para. 2; Pg. 291, Col. 1, Sec. III, Sub. 2, the GPU contains multiple lanes, and the GPU is programmed to perform cross-lane data sharing to express data parallelism).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), they are silent with summing a second set of multiple partial accumulation values corresponding to the second input matrix row across at least a portion of the multiple parallel processing lanes to generate a second output vector row value.
Yesil teaches:
summing a second set of (Pg. 7, Col. 1, Algorithm 2, lines 27, 30; Pg. 7, Col. 1, Para. 2) the multiple partial accumulation values (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30) corresponding to a second input matrix row across at least a portion of the parallel processing lanes to generate a second output vector row value (Pg. 6, Col. 2, Para. 2, output vector y; Pg. 7, Col. 1, Algorithm 2, vector y; Pg. 7, Col. 1, Para. 2).
The motivation provided with respect to claim 1 equally applies to claim 2.
Regarding claim 3, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Lueh teaches the operations further comprising:
loading “selecting” to the hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate) a second set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select), the second set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) also describing non-zero values (Pg. 298, Col. 1, Sub. 6, dense matrices A, B, C) of the input matrix (Pg. 298, Col. 1, Sub. 6, A, B, C; Pg. 297, Col. 2, Sub. 4, A);
loading “selecting” to the hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate) a second set of input vector values (Pg. 293, Fig. 1, v.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) having input vector row numbers (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, vector type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in vector<short, 8> v;) corresponding to input matrix column numbers (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (j); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 8 in matrix<int, 4, 8> m; Pg. 297, Col. 2, Sub. 4) of the second set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select); and
using the multiple parallel (Pg. 289, Col. 1, Sec. I, Para. 1) processing lanes (Pg. 289, Col. 1, Sec. I, Para. 2; Pg. 291, Col. 1, Sec. III, Sub. 2, the GPU contains multiple lanes, and the GPU is programmed to perform cross-lane data sharing to express data parallelism) of the hardware compute element (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), they do not explicitly disclose loading as it is known in the art and are silent with update at least one of the first partial accumulation value and the second partial accumulation value using the second set of multiple coordinate data structure elements and the second set of input vector values.
Yesil teaches:
loading (Pg. 2, Table I; Pg. 2, Col. 2, Sec. II, Sub. B; Pg. 7, Algorithm 2, lines 18, 22-24)
update at least one of the first partial accumulation value and the second partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value and the next row_sums value as indicated by starting index and next index) using the second set of multiple coordinate data structure elements and the second set of input vector values.
The motivation provided with respect to claim 1 equally applies to claim 3.
Regarding claim 4, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Lueh teaches the operations further comprising:
determining a portion of coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, vector or matrix) corresponding to a first number of rows (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) of the input matrix (Pg. 298, Col. 1, Sub. 6, A, B, C; Pg. 297, Col. 2, Sub. 4, A), the first number of rows (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) of the input matrix (Pg. 298, Col. 1, Sub. 6, A, B, C; Pg. 297, Col. 2, Sub. 4, A) comprising the first input matrix row (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;), the portion of coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, vector or matrix) comprising the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select);
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), they are silent with updating at least one partial accumulation value[[s]] using the portion of [[the]] coordinate data structure elements corresponding to the first number of rows of the input matrix before summing the first partial accumulation values corresponding to the first input matrix row.
Yesil teaches:
updating at least one partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value and the next row_sums value as indicated by starting index and next index) using the portion of the coordinate data structure elements corresponding to the first number of rows of the input matrix before summing (Pg. 7, Col. 1, Algorithm 2, lines 27, 30; Pg. 7, Col. 1, Para. 2) the first partial accumulation values (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value as indicated by starting index) corresponding to the first input matrix row.
The motivation provided with respect to claim 1 equally applies to claim 4.
Regarding claim 5, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Lueh teaches the operations further comprising:
the first set of input vector values (Pg. 293, Fig. 1, v.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) comprising multiple input vector values (Pg. 293, Fig. 1, v.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) having non-contiguous (Pg. 298, Col. 1, Sub. 7; Pg. 297, Col. 1, Para. 1; Pg. 292, Col. 2, Sub. A, Para. 3) input vector row numbers (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, vector type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in vector<short, 8> v;).
Regarding claim 6, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Lueh teaches the operations further comprising:
is not contiguous (Pg. 298, Col. 1, Sub. 7; Pg. 297, Col. 1, Para. 1; Pg. 292, Col. 2, Sub. A, Para. 3)
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), they are silent with executing a write operation that writes a first partial accumulation value to a first memory location and the second partial accumulation value to a second memory location that is not contiguous with the first memory location.
Yesil teaches wherein:
executing a write operation (Pg. 3, Col. 2, Para. 2) that writes the first partial accumulation value (Pg. 7, Col. 1, Algorithm 2, lines 17-18, 31-32) to a first memory location (Pg. 7, Col. 1, Algorithm 2, lines 17-18, row_ids) and the second partial accumulation value (Pg. 7, Col. 1, Algorithm 2, lines 17-18, 31-32) to a second memory location (Pg. 7, Col. 1, Algorithm 2, lines 31-32, y) that is not contiguous with the first memory location (Pg. 7, Col. 1, Algorithm 2, lines 17-18, row_ids).
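For illustration only, the scatter-write limitation mapped above can be sketched as follows. The function name scatter_write and the chosen locations are illustrative assumptions, not Yesil’s actual Algorithm 2 code.

```python
# Sketch of a scatter write: two partial accumulation values are
# written to non-contiguous memory locations selected by row indices.
def scatter_write(memory, locations, values):
    for loc, val in zip(locations, values):
        memory[loc] = val
    return memory

y = [0.0] * 8
# locations 1 and 6 are not contiguous with one another
scatter_write(y, [1, 6], [8.0, 13.0])
```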
The motivation provided with respect to claim 1 equally applies to claim 6.
Regarding claim 7, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Lueh teaches the operations further comprising:
gather (Pg. 293, Col. 1, Para. 4, replicate) loading “selecting” (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) the first set of input vector values (Pg. 293, Fig. 1, v.select; Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 3-4, element) from non-contiguous memory locations (Pg. 298, Col. 1, Sub. 7; Pg. 297, Col. 1, Para. 1; Pg. 292, Col. 2, Sub. A, Para. 3) at the hardware compute element memory (Pg. 289, Col. 1, Sec. I, Para. 1, SLM).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), they do not explicitly disclose loading as it is known in the art and are silent with gather loading.
Yesil teaches:
gather loading (Pg. 2, Table I; Pg. 2, Col. 2, Sec. II, Sub. B; Pg. 7, Algorithm 2, lines 18, 22-24)
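For illustration only, the gather-loading limitation mapped above can be sketched as follows. The function name gather_load and the index values are illustrative assumptions, not the cited code of either reference.

```python
# Sketch of a gather load: input vector values are fetched from
# non-contiguous locations in one operation, as selected by the
# column numbers of the coordinate data structure elements.
def gather_load(vector, col_indices):
    return [vector[c] for c in col_indices]

v = [10.0, 20.0, 30.0, 40.0, 50.0]
vals = gather_load(v, [4, 0, 2])   # non-contiguous locations
```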
The motivation provided with respect to claim 1 equally applies to claim 7.
Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Lueh in view of Yesil in view of Gulati in view of Brewer as applied to claim 1 above, and further in view of Cook.
Regarding claim 8, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Lueh teaches the operations further comprising:
the first input matrix row number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;)
the first input matrix row (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) and
the hardware compute element memory (Pg. 289, Col. 1, Sec. I, Para. 1, the Intel GPU contains work-groups that are used to communicate).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), they are silent with the update of the first partial accumulation values, writing the first partial accumulation value, a first location, applying a hash function, to generate a first row number hash.
Yesil teaches:
the update of the first partial accumulation values (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value as indicated by starting index), writing (Pg. 3, Col. 2, Para. 2) the first partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value as indicated by starting index), and a first location (Pg. 7, Col. 1, Algorithm 2, lines 17-18, row_ids).
The motivation provided with respect to claim 1 equally applies to claim 8.
Lueh in view of Yesil are silent with disclosing: applying a hash function and to generate a first row number hash.
Lueh in view of Yesil in view of Gulati are silent with disclosing: applying a hash function and to generate a first row number hash.
Lueh in view of Yesil in view of Gulati in view of Brewer are silent with disclosing: applying a hash function and to generate a first row number hash.
Cook teaches:
applying a hash function (Col. 3, lines 51-65) and to generate a first row number hash (Col. 3, lines 51-65, hash output).
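For illustration only, applying a hash function to an input matrix row number to generate a row number hash usable for memory addressing can be sketched as follows. The multiplicative hash shown is an illustrative assumption; Cook’s actual function is not reproduced here.

```python
# Sketch: hash an input matrix row number into a "row number hash"
# that selects a slot in a fixed-size table (Knuth-style
# multiplicative hashing, truncated to 32 bits).
def row_number_hash(row, table_size=64):
    return (row * 2654435761) % (2**32) % table_size

slot = row_number_hash(5)   # deterministic slot in [0, table_size)
```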
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lueh in view of Yesil in view of Gulati in view of Brewer’s modified compute device with Cook’s hashing techniques because they are in the claimed invention’s same field of endeavor of matrix-vector operations (Col. 3, lines 5-10). Lueh teaches the basic structures of computing architecture to perform various processes including well-known computer architecture instructions and arithmetic, however, they never go into specific detail describing the algorithms of the arithmetic and other command processes. Yesil teaches machine instructions including loading, updating, and summing in a processor (Col. 2, Sec. I, Para. 2-3). Cook teaches the remaining limitations of applying a hashing function for memory addressing (Col. 3, lines 51-65) with configurable tag and data addresses to ensure high throughput (Col. 5, lines 30-53). Using the known techniques of hashing as taught by Cook to improve Yesil’s matrix-vector operations in Lueh’s compute element, would have been obvious to one of ordinary skill in the art, since one of ordinary skill in the art would recognize that the system of Lueh in view of Yesil in view of Gulati in view of Brewer was ready for improvement to incorporate the addressing features, as taught by Cook.
Regarding claim 9, in addition to the teachings addressed in the claim 8 analysis, the rejection of claim 8 is incorporated and Lueh teaches the operations further comprising:
a second input matrix row number (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) of a second coordinate data structure (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i,j)) of the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) the second input matrix row (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i); Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 1, 4 in matrix<int, 4, 8> m;) the hardware compute element memory (Pg. 289, Col. 1, Sec. I, Para. 1, SLM) noncontiguous (Pg. 298, Col. 1, Sub. 7; Pg. 297, Col. 1, Para. 1; Pg. 292, Col. 2, Sub. A, Para. 3) a scatter write operation (Pg. 293, Col. 2, Sub. B, Scattered read/write).
Although Lueh generally discloses selecting the first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and partial sums (Pg. 298, Col. 1, Sub. 7), they are silent with the update of the second partial accumulation values, writing the second partial accumulation value, a second location, the first location, the writing of the first partial accumulation value and the second partial accumulation value, applying the hash function, using the second row number hash, and to generate a second row number hash.
Yesil teaches:
the update of the second partial accumulation values (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the next row_sums value as indicated by the next index), writing (Pg. 3, Col. 2, Para. 2) the second partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the next row_sums value as indicated by the next index), a second location (Pg. 7, Col. 1, Algorithm 2, lines 31-32, y), the first location (Pg. 7, Col. 1, Algorithm 2, lines 17-18, row_ids), and the writing (Pg. 3, Col. 2, Para. 2) of the first partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the starting row_sums value as indicated by starting index) and the second partial accumulation value (Pg. 6, Col.1, Para. 2; Pg. 7, Col. 1, Para. 2; Pg. 7, Col. 1, Algorithm 2, line 29-30, the next row_sums value as indicated by the next index).
The motivation provided with respect to claim 1 equally applies to claim 9.
Lueh in view of Yesil are silent with disclosing: applying a hash function and to generate a first row number hash.
Lueh in view of Yesil in view of Gulati are silent with disclosing: applying a hash function and to generate a first row number hash.
Lueh in view of Yesil in view of Gulati in view of Brewer are silent with disclosing: applying a hash function and to generate a first row number hash.
Cook teaches:
applying a hash function (Col. 3, lines 51-65), to generate a first row number hash (Col. 3, lines 51-65, hash output), and to generate a second row number hash (Col. 3, lines 51-65, hash output).
The motivation provided with respect to claim 8 equally applies to claim 9.
Response to Arguments
35 USC 101. Applicant argues in substance the following:
Applicant asserts that the Specification provides "sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement." See MPEP § 2106.05(a) (Remarks Pg. 10). Applicant explains:
See Specification at ¶¶ 39, 40. The specification further discloses that the problems of sparse matrix processing in a reconfigurable compute fabric can be addressed by representing sparse matrices as coordinate data structures.
See Specification at ¶ 42. Representing sparse matrices as coordinate data structures, however, reduces the efficiency of the reconfigurable compute fabric by making it more difficult to perform operations in parallel. See Specification at ¶ 44. The specification further discloses a technique that enhances parallelization of vector operations based on coordinate data structures in a configurable compute fabric. (Remarks Pg. 11)
See Specification at ¶¶ 45-47. The described improvements allow the reconfigurable compute fabric to achieve the efficiency gains associated with representing sparse matrices as coordinate data structures while mitigating the associated loss in parallelization. See id.
Applicant submits that the specific improvements to computing technology described by the Specification are also incorporated into the claim (Remarks Pg. 12).
Applicant submits that claim 1 covers a particular solution to a problem in a
particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. See MPEP § 2106.05(a). For example, Applicant does not merely claim "improve the efficiency of a reconfigurable compute fabric processing sparse matrices." Instead, claim 1 recites a particular technique executed in a specifically recited reconfigurable compute fabric that brings about more efficient processing of sparse matrices as described (Remarks Pg. 13 bottom – Pg. 14 top).
Examiner respectfully disagrees. Claim 1, with the analysis provided by corresponding claim 15 in the rejection, recites “the particular solution” as an abstract idea and with additional elements that do not integrate the abstract idea into a practical application. Claim 1 is ineligible because the purported specific improvements to computing technology are a result of applying the abstract idea, rather than to the improvement of the technology or functioning of a computer. See MPEP 2106.04(d)(1).
With respect to “achieve the efficiency gains associated with representing sparse matrices as coordinate data structures,” it is the abstract idea, the manner of applying the mathematical relationships to express the sparse matrix in an encoded form (coordinate data structure) to perform more efficient matrix-vector multiplication, that achieves the purported improvement. See specification para. [0044], describing how the encoded form (coordinate data structure) can reduce memory usage and processor operations.
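For illustration only, the encoded form discussed in paragraph [0044] can be sketched as follows; the dense matrix shown is an illustrative assumption, not an example from the specification.

```python
# Sketch: a sparse matrix stored densely vs. as a coordinate data
# structure recording only the non-zero values with their
# (row, column) numbers -- the encoded form discussed above.
dense = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [7, 0, 0, 1],
]
coo = [(r, c, v)
       for r, row in enumerate(dense)
       for c, v in enumerate(row) if v != 0]
# Only 3 coordinate elements represent the 12-entry matrix.
```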
See MPEP 2106.05(a)(I). “Examples that the courts have indicated may not be
sufficient to show an improvement in computer-functionality”.
See MPEP 2106.05(a)(II). “It is important to keep in mind that an improvement in
the abstract idea itself (e.g. a recited fundamental economic concept) is not an improvement in technology.”
See also MPEP 2106.05. An inventive concept "cannot be furnished by the
unpatentable law of nature (or natural phenomenon or abstract idea) itself."
The encoded form of the matrix has issues with parallelization in the reconfigurable compute fabric, or other suitable processing architecture ([0044]). Applicant submits that the improvement thus lies in this parallelization, “while mitigating the associated loss in parallelization,” as a consequence of the gather and scatter operations and parallel processing lanes ([0045]). However, these limitations have been analyzed as part of the additional elements, and the “specifically recited reconfigurable compute fabric” is also analyzed as part of the additional elements, which were recited at a high level of generality and are examples of generic computing elements, and/or merely generally linked to a particular technological environment, and/or merely result in “apply it” on a computer, and/or are examples of insignificant extra-solution activity, mere data gathering, that are also well-understood, routine, or conventional.
The claimed invention does not integrate the judicial exception into a practical application because it is directed to an improvement over well-understood, routine, conventional activity. See MPEP 2106.04(d)(1). Further, the courts have identified other limitations that did not integrate the judicial exception into a practical application: (i) merely reciting the words “apply it” (or an equivalent); (ii) adding insignificant extra-solution activity; and (iii) generally linking the use to a particular technological environment. See MPEP 2106.04(d)(1).
Applicant asserts that the improvement disclosed and recited by claim 1 is not provided by the judicial exception alone. The Office asserts that the claims are directed to Mathematical Concepts. See Office Action at p. 4. Initially, Applicant does not concede that the claims are directed to Mathematical Concepts. Instead, the claims are directed to a particularly recited way of arranging and using a particular arrangement of computing hardware (i.e., a hybrid threading fabric) to improve the efficiency of the arrangement of computing hardware in a recited circumstance. Regardless, even if claim 1 recites Mathematical Concepts, Applicant submits that the improvement of claim 1 is not provided by any mathematical concepts alone. Instead, the improvement of claim 1 depends on the recited arrangement and use of a particularly-recited hardware arrangement - the hybrid threading fabric (Remarks Pg. 14 top).
Examiner respectfully disagrees. To the extent Applicant is arguing that the "particular arrangement" constitutes a particular machine, the following factors have been considered in determining whether the hybrid threading fabric recited in claim 1 provides significantly more:
The additional elements are recited at a high level of generality, and do not provide significantly more than the judicial exception. See MPEP 2106.05(b)(I).
The additional elements invoke computers (or equivalent) or other machinery merely as a tool to perform the judicial exception. See MPEP 2106.05(b)(II).
Use of the additional elements contributes only nominally or insignificantly to the execution of the claimed method (e.g., in a data gathering step or in a field-of-use limitation). See MPEP 2106.05(b)(III).
As discussed in the response to the argument above, the additional elements are recited at a high level of generality and are examples of generic computing elements, and/or are merely generally linked to a particular technological environment, and/or merely amount to "apply it" on a computer, and/or are examples of insignificant extra-solution activity (mere data gathering) that are also well-understood, routine, or conventional. The additional elements, alone and in combination, do not amount to significantly more than the judicial exception; therefore, the judicial exception is not applied with, or by use of, a particular machine.
Applicant asserts that an improvement to "the way the computer stores and retrieves data in combination with the specific data structure recited in the claims" is an eligible improvement to computer technology, and not simply the advantage of an abstract idea. See MPEP 2106.05(a); Enfish, LLC v. Microsoft Corp., 822 F.3d 1327 (Fed. Cir. 2016). Claim 1 goes even beyond this by reciting details of the arrangement and use of particular hardware units of the hybrid threading fabric (e.g., the hardware compute element, and the first and second parallel processing lanes of the hardware compute element) (Remarks Pg. 14 middle).
Examiner respectfully disagrees. The storing and retrieving appear to be recited in claim 1 as the "read request," which was analyzed as part of the additional elements as an example of insignificant extra-solution activity (mere data gathering) that is also well-understood, routine, or conventional. Therefore, the additional elements do not integrate the abstract idea into a practical application, because the asserted improvement cannot lie in activity that is well-understood, routine, and conventional.
Claim Rejections - 35 USC § 103
Applicant argues in substance the following:
Applicant asserts that claim 1 recites an arrangement that includes using a coordinate data structure representation of a sparse matrix in a hybrid threading fabric. The cited references fail to teach or suggest at least this arrangement. Lueh is silent regarding any kind of coordinate data structure. Lueh mentions coordinates at page 297, in its paragraph describing K-means clustering as a potential application of its disclosed techniques. See Lueh at p. 297. The referred-to coordinates, however, have nothing to do with a coordinate data structure representation of a sparse matrix. Instead, the coordinates referenced by Lueh are coordinates of the multi-dimensional space in which the K-means clustering occurs (Remarks Pg. 15 bottom - Pg. 16 top).
Examiner respectfully disagrees. In claim 1, consistent with the analysis provided for corresponding claim 15 in the rejection, Lueh teaches a first set of multiple coordinate data structure elements (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, select) and a first coordinate data structure element (Pg. 292, Col. 2, Sec. IV, Sub. A, Para. 4, matrix type (i,j)). The claim language is given its plain meaning as interpreted under the broadest reasonable interpretation. See MPEP 2111.01(I).
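For context, a coordinate (COO) data structure of the kind at issue stores each nonzero entry of a sparse matrix as a (row, column, value) element. The following sketch is purely illustrative; the function name and the example matrix are hypothetical and are not drawn from the claims or the cited references:

```python
# Illustrative coordinate (COO) representation of a sparse matrix.
# Each element is a (row, column, value) triple; names are hypothetical.

def to_coo(dense):
    """Collect the nonzero entries of a dense matrix as (i, j, value) triples."""
    return [(i, j, v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row)
            if v != 0]

matrix = [
    [5, 0, 0],
    [0, 0, 3],
    [0, 7, 0],
]
coo = to_coo(matrix)
# coo == [(0, 0, 5), (1, 2, 3), (2, 1, 7)]
```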
Applicant asserts that the Office's rejection is based on (i) a reference (Lueh) that is silent about any sort of coordinate data structure; and (ii) a reference (Yesil) that provides an alternative technique to overcome the disclosed shortcomings of the BCOO coordinate-based technique. Accordingly, Applicant submits that the combination of Lueh and Yesil fails to disclose or suggest the recited arrangement and utilization of "multiple coordinate data structure elements." Further, one having ordinary skill in the art would not modify a combination of Lueh, which fails to disclose coordinate data structures, and Yesil, which disparages coordinate-based processing, to arrive at the claimed combination (Remarks Pg. 17).
Examiner respectfully disagrees. Lueh, not Yesil, is relied on to teach the coordinate data structure limitation.
In response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988); In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992); and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, Lueh teaches the basic structures of a computing architecture for performing various processes, including well-known computer architecture instructions and arithmetic; however, Lueh does not describe in specific detail the instructions for that arithmetic and the other command processes. Yesil teaches machine instructions including loading, updating, generating, and summing in a processor (Col. 2, Sec. I, Para. 2-3). Using the known techniques of Yesil’s machine instructions to improve a similar processor, so as to provide the desired matrix-vector operations in Lueh’s compute element, would have been obvious to one of ordinary skill in the art.
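As background for the loading, updating, and summing operations referenced above, a sparse matrix-vector product over coordinate elements can be sketched as follows. This is a hypothetical illustration for context only; the function name and values are not taken from either reference:

```python
# Hypothetical sketch of a sparse matrix-vector product y = A @ x,
# where A is given as (row, col, value) coordinate elements.
# Illustrative only; not drawn from Lueh, Yesil, or the claims.

def spmv_coo(coo, x, n_rows):
    """Compute y = A @ x from coordinate elements of A."""
    y = [0.0] * n_rows
    for i, j, v in coo:    # load each coordinate element
        y[i] += v * x[j]   # update the running sum for row i
    return y

coo = [(0, 0, 5), (1, 2, 3), (2, 1, 7)]
x = [1.0, 2.0, 3.0]
y = spmv_coo(coo, x, 3)
# y == [5.0, 9.0, 14.0]
```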
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARKUS A VILLANUEVA whose telephone number is (703)756-1603. The examiner can normally be reached M - F 8:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARKUS ANTHONY VILLANUEVA/Examiner, Art Unit 2151
/James Trujillo/Supervisory Patent Examiner, Art Unit 2151