Prosecution Insights
Last updated: April 19, 2026
Application No. 18/222,916

GENERATING SPARSE NEURAL NETWORKS

Non-Final OA: §101, §103
Filed: Jul 17, 2023
Examiner: MAIDO, MAGGIE T
Art Unit: 2129
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Nvidia Corporation
OA Round: 1 (Non-Final)

Grant Probability: 64% (Moderate)
Expected OA Rounds: 1-2
Expected Time to Grant: 4y 3m
Grant Probability with Interview: 85%

Examiner Intelligence

Career Allow Rate: 64% of resolved cases (23 granted / 36 resolved; +8.9% vs TC avg)
Interview Lift: +20.7% for resolved cases with interview (strong)
Avg Prosecution: 4y 3m
Currently Pending: 51
Total Applications: 87 across all art units

Statute-Specific Performance

§101: 25.6% (-14.4% vs TC avg)
§103: 56.1% (+16.1% vs TC avg)
§102: 2.6% (-37.4% vs TC avg)
§112: 15.3% (-24.7% vs TC avg)

Tech Center averages are estimates. Based on career data from 36 resolved cases.

Office Action

Rejections: §101, §103
DETAILED ACTION

This action is responsive to the claims filed on 17 July 2023. Claims 1-28 are pending for examination.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-28 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea) without significantly more.

Step 1: This part of the eligibility analysis evaluates whether the claims fall within any statutory category (MPEP 2106.03). According to the first part of the Alice analysis, in the instant case, the claims were determined to be directed to one of the four statutory categories: a method/process (Claims 1-13) and a machine/system/product (Claims 14-28). Based on the claims being determined to be within one of the four categories (i.e., process, machine, manufacture, or composition of matter) (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, natural phenomenon, or abstract idea).

Step 2A Prong One: This part of the eligibility analysis evaluates whether the claims recite a judicial exception. Regarding independent claims 1, 14, and 21, the claims recite a judicial exception (i.e., an abstract idea enumerated in the 2019 PEG) without significantly more (Step 2A, Prong One).
Under the broadest reasonable interpretation, the applicant's claim limitations cover activities classified under mental processes - concepts performed in the human mind (including an observation, evaluation, judgment, or opinion); see MPEP § 2106.04(a)(2), subsection III, and the 2019 PEG. As evaluated below:

Claims 1, 14, 21: “determining at least one sparse pattern based on data stored by at least one data structure within a workload, the data comprising at least one value determined at runtime” (mental process of judgment)

Because the identified limitation(s) falls within at least one of the groupings of abstract ideas, it is reasonable to conclude that the claims recite an abstract idea at Step 2A Prong One.

Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claims as a whole integrate the recited judicial exception into a practical application of the exception. As evaluated below:

“modifying the workload based at least in part on the at least one sparse pattern”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.

Step 2B: This part of the eligibility analysis evaluates whether the claim, as a whole, amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim (MPEP 2106.05).
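To make the quoted independent-claim limitations concrete: "determining at least one sparse pattern ... at runtime" followed by "modifying the workload" can be read as inspecting runtime values and then rewriting the computation to exploit the pattern found. The sketch below is a hedged illustration only - the function names, the 2:4 structured pattern, and the magnitude-based pruning rule are assumptions of this editor, not disclosures from the application or the record:

```python
import numpy as np

def determine_sparse_pattern(tensor: np.ndarray) -> str:
    """Pick a sparse pattern from values observed at runtime.

    If every group of 4 consecutive weights already holds >= 2 zeros,
    a 2:4 structured pattern fits; otherwise fall back to unstructured.
    """
    groups = tensor.reshape(-1, 4)
    zeros_per_group = (groups == 0).sum(axis=1)
    return "2:4-structured" if (zeros_per_group >= 2).all() else "unstructured"

def modify_workload(tensor: np.ndarray, pattern: str) -> np.ndarray:
    """Modify the workload: under 2:4, keep only the two largest-magnitude
    weights in each group of four and zero the rest."""
    if pattern != "2:4-structured":
        return tensor
    groups = tensor.reshape(-1, 4).copy()
    # Indices of the two smallest-magnitude entries in each group of four.
    drop_idx = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop_idx, 0.0, axis=1)
    return groups.reshape(tensor.shape)

weights = np.array([0.0, 0.9, 0.0, -1.2, 0.5, 0.0, 0.0, 2.0])
pattern = determine_sparse_pattern(weights)
sparse_weights = modify_workload(weights, pattern)
```

Under this toy rule, each group of four weights in the modified workload is guaranteed to contain at least two zeros, which is the property structured-sparsity hardware exploits.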
First, the additional elements considered as part of the preamble and the additional elements directed to the use of computer technology are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technological environment; see MPEP 2106.05(h).

Second, the additional elements directed to mere application of the abstract idea, or to mere instructions to implement an abstract idea on a computer, are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply a generic computer and/or process to the judicial exception; see MPEP 2106.05(f).

Third, the claims are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception. The courts have found these types of limitations insufficient to transform the judicial exception into a patentable invention; see MPEP 2106.05(g).

Lastly, the claims directed to data-gathering activity, as noted above, are deemed directed to insignificant extra-solution activity. The courts have found these types of limitations insufficient to qualify as "significantly more"; see MPEP 2106.05(g). Furthermore, evidence has been considered in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018); see USPTO Berkheimer Memorandum (April 2018).
Examiner notes Berkheimer Option 2 - a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) as noting the well-understood, routine, conventional nature of the additional element(s) (e.g., limitations directed to mere data gathering): the courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity; see MPEP 2106.05(d).

The additional limitations, as analyzed, fail to integrate a judicial exception into a practical application at Step 2A and do not provide an inventive concept at Step 2B, per the analysis above. Thus, considering the additional elements individually and in combination, and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. The claims are not patent eligible. Therefore, in examining the elements recited by the limitations individually and as an ordered combination, as a whole, claims 1, 14, and 21 do not recite what the courts have identified as "significantly more".

Furthermore, regarding dependent claims 2-13 (which depend from claim 1), claims 15-20 (which depend from claim 14), and claims 22-28 (which depend from claim 21), the claims are directed to a judicial exception (i.e., an abstract idea enumerated in the 2019 PEG, a law of nature, or a natural phenomenon) without significantly more, as highlighted below by evaluating the claim limitations under Step 2A and Step 2B:

Claim 2: Incorporates the rejection of claim 1.
“wherein the at least one sparse pattern includes at least one of a structured sparse pattern or an unstructured sparse pattern”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 3: Incorporates the rejection of claim 1.

“profiling the workload to obtain at least one profile”
“using the at least one profile to determine the at least one sparse pattern”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 4: Incorporates the rejection of claim 3.

“wherein profiling the workload comprises determining at least one sparsity measure for the data stored by the at least one data structure” (mental process of judgment)

The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to be adding the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f).
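The profiling limitations of claims 3-5 amount to measuring how sparse a data structure is and then picking the candidate pattern whose associated sparsity is closest to the measurement. A hedged sketch follows; the sparsity metric (fraction of zeros), the pattern set, and all names are hypothetical illustrations, not the applicant's disclosed implementation:

```python
import numpy as np

def sparsity_measure(tensor: np.ndarray) -> float:
    """One plausible 'sparsity measure': the fraction of zero elements."""
    return float((tensor == 0).mean())

# Hypothetical set of sparse patterns, each tagged with the sparsity it implies
# (its associated "sparsity measure" in the language of claim 5).
PATTERN_SPARSITY = {"1:4-structured": 0.75, "2:4-structured": 0.5, "dense": 0.0}

def closest_pattern(measure: float) -> str:
    """Select the pattern whose associated sparsity measure is closest."""
    return min(PATTERN_SPARSITY, key=lambda p: abs(PATTERN_SPARSITY[p] - measure))

x = np.array([0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0])
m = sparsity_measure(x)   # 6 of 8 elements are zero -> 0.75
best = closest_pattern(m)
```

With six of eight elements zero, the measured sparsity is 0.75 and the 1:4 structured pattern is the nearest candidate.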
Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 5: Incorporates the rejection of claim 4.

“wherein the at least one sparsity measure comprises a first sparsity measure, a set of sparse patterns comprises the at least one sparse pattern, the set of sparse patterns is associated with a set of sparsity measures, and a second sparsity measure is associated with the at least one sparse pattern and is a closest within the set of sparsity measures to the first sparsity measure”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 6: Incorporates the rejection of claim 5.

“wherein the set of sparsity measures comprises a set of densities”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 7: Incorporates the rejection of claim 1.
“determining at least one sparsity measure for the data stored by the at least one data structure” (mental process of judgment)
“selecting one or more sparse patterns based at least in part on the at least one sparsity measure” (mental process of judgment)
“calculating metrics based at least in part on sparse data stored by the one or more candidate sparse data structures” (mental process of judgment)
“selecting the at least one sparse pattern based at least in part on the metrics” (mental process of judgment)

The recitations are directed to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea, and are considered to be adding the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f).

“decomposing the data stored by the at least one data structure into one or more candidate sparse data structures in accordance with the one or more sparse patterns”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions to implement an abstract idea on a computer or to use a computer as a tool, or directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 8: Incorporates the rejection of claim 7.
“wherein selecting the at least one sparse pattern based at least in part on the metrics comprises performing a comparison of the metrics to one or more threshold values and selecting the at least one sparse pattern based at least in part on results of the comparison” (mental process of judgment)

The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to be adding the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 9: Incorporates the rejection of claim 7.

“wherein the metrics include at least one of accuracy values, magnitude loss values, or values indicating Multiply-Accumulate operations performed”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 10: Incorporates the rejection of claim 1.
“determining at least one sparsity measure for the data stored by the at least one data structure by collecting metrics while performing the workload using test data” (mental process of judgment)
“determining the at least one sparse pattern based at least in part on the at least one sparsity measure” (mental process of judgment)

The recitations are directed to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea, and are considered to be adding the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 11: Incorporates the rejection of claim 10.

“wherein the test data comprises at least one of random data or pseudorandom data”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 12: Incorporates the rejection of claim 1.

“wherein the workload comprises a trained machine learning model”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h).
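Claims 10-11 describe deriving a sparsity measure by running the workload on (pseudo)random test data and collecting metrics. A minimal sketch of that idea, with a single ReLU layer standing in for the real workload; the layer, the metric, and all names are this editor's assumptions, not the application's code:

```python
import numpy as np

rng = np.random.default_rng(0)  # pseudorandom test-data source

def relu_layer(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy stand-in for the workload: one dense layer with ReLU."""
    return np.maximum(x @ w, 0.0)

def profile_with_test_data(w: np.ndarray, n_samples: int = 256) -> float:
    """Collect metrics while performing the workload on random test data;
    the metric here is the observed fraction of zero activations."""
    x = rng.standard_normal((n_samples, w.shape[0]))
    activations = relu_layer(x, w)
    return float((activations == 0.0).mean())

w = rng.standard_normal((8, 16))
measure = profile_with_test_data(w)
# ReLU zeroes roughly half of the activations on symmetric random input,
# so the measured sparsity should land near 0.5.
```

The measured value would then feed the pattern-selection step of claim 10's second limitation.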
Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 13: Incorporates the rejection of claim 1.

“adding at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on new data stored by a new data structure”
“adding at least one second process to the workload to cause the workload to process the plurality of sparse data structures”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g). Limitations directed to instructions for mere data gathering or data output cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 15: Incorporates the rejection of claim 14.

“profile the workload to obtain a profile comprising at least one sparsity measure for the data stored by the at least one data structure”
“use the profile to determine the configuration”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 16: Incorporates the rejection of claim 14.
“determine at least one sparsity measure for the data stored by the at least one data structure” (mental process of judgment)
“selecting at least one structured sparse pattern based at least in part on metrics calculated based at least in part on sparse data stored by the plurality of sparse data structures” (mental process of judgment)

The recitations are directed to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea, and are considered to be adding the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f).

“decomposing the at least one data structure into a plurality of sparse data structures in accordance with one or more structured sparse patterns selected based at least in part on the at least one sparsity measure”
“formulating the configuration based at least in part on the at least one structured sparse pattern”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions to implement an abstract idea on a computer or to use a computer as a tool, or directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 17: Incorporates the rejection of claim 14.
“determine at least one sparsity measure for the data stored by the at least one data structure based at least in part on the metrics” (mental process of judgment)
“use the at least one sparsity measure to determine the configuration” (mental process of judgment)

The recitations are directed to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea, and are considered to be adding the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f).

“collect metrics while the trained machine learning model performs inferencing using test data”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g). Limitations directed to mere instructions to implement an abstract idea on a computer or to use a computer as a tool, or directed to instructions for mere data gathering or data output, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 18: Incorporates the rejection of claim 14.

“wherein the workload comprises at least one neural network having one or more layers”
“transforming the workload comprises selectively deactivating or removing one or more nodes of a particular layer of the one or more layers in accordance with the configuration”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h).
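The claim 18 limitation quoted above - selectively deactivating or removing nodes of a particular layer in accordance with a configuration - corresponds to zeroing the weight columns feeding the pruned nodes. The sketch below is illustrative only; the weight layout, the configuration dictionary, and the function name are this editor's assumptions:

```python
import numpy as np

def deactivate_nodes(layer_weights: np.ndarray, nodes_to_prune) -> np.ndarray:
    """Deactivate nodes of a layer by zeroing their weight columns.

    layer_weights has shape (in_features, out_features); each output
    column corresponds to one node of the layer.
    """
    pruned = layer_weights.copy()
    pruned[:, list(nodes_to_prune)] = 0.0
    return pruned

w = np.arange(12, dtype=float).reshape(3, 4)   # toy layer with 4 nodes
config = {"prune_nodes": [1, 3]}               # hypothetical configuration
w_sparse = deactivate_nodes(w, config["prune_nodes"])
```

Zeroing columns deactivates the nodes while preserving the layer's shape; physically removing the columns instead would shrink the layer, which is the "removing" alternative the claim also recites.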
Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 19: Incorporates the rejection of claim 14.

“create at least one sparse data structure based at least in part on new data stored by a new data structure and process the at least one sparse data structure instead of the new data structure”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g). Limitations directed to instructions for mere data gathering or data output cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 20: Incorporates the rejection of claim 14.

“add at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on a new data structure”
“add at least one second process to the workload to cause the workload to process the plurality of sparse data structures”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g). Limitations directed to instructions for mere data gathering or data output cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 22: Incorporates the rejection of claim 21.
“wherein the data is first data, the at least one sparse pattern comprises a structured sparse pattern, and the one or more circuits are to provide faster processing of second data including zero values in accordance with the structured sparse pattern than of third data not including zero values in accordance with the structured sparse pattern”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 23: Incorporates the rejection of claim 22.

“profile the workload to obtain at least one sparsity measure based at least in part on the data”
“use the profile to determine the at least one sparse pattern”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 24: Incorporates the rejection of claim 23.
“selecting at least one structured sparse pattern based at least in part on metrics calculated based at least in part on the plurality of sparse data structures” (mental process of judgment)

The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to be adding the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f).

“decomposing the at least one data structure into a plurality of sparse data structures in accordance with one or more structured sparse patterns selected based at least in part on the at least one sparsity measure”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions to implement an abstract idea on a computer or to use a computer as a tool, or directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 25: Incorporates the rejection of claim 23.

“wherein the workload comprises a trained machine learning model”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h).
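The decomposition-and-selection limitations of claims 7, 8, and 24 describe producing candidate sparse data structures per pattern, calculating a metric (e.g., magnitude loss) on each candidate, and comparing the metrics to a threshold. A hedged end-to-end sketch; the n:4 pattern family, the magnitude-loss metric, the 10% threshold, and all names are assumptions of this editor, not the applicant's disclosure:

```python
import numpy as np

def project_n_of_4(tensor: np.ndarray, keep: int) -> np.ndarray:
    """Candidate sparse data structure: keep the `keep` largest-magnitude
    values in every group of 4, zeroing the rest."""
    groups = tensor.reshape(-1, 4).copy()
    drop_idx = np.argsort(np.abs(groups), axis=1)[:, : 4 - keep]
    np.put_along_axis(groups, drop_idx, 0.0, axis=1)
    return groups.reshape(tensor.shape)

def magnitude_loss(original: np.ndarray, candidate: np.ndarray) -> float:
    """Metric: fraction of total magnitude discarded by the candidate."""
    return float(np.abs(original - candidate).sum() / np.abs(original).sum())

x = np.array([0.1, 2.0, 0.05, -3.0, 1.5, 0.02, 0.01, -0.7])
candidates = {f"{k}:4-structured": project_n_of_4(x, k) for k in (1, 2)}
metrics = {name: magnitude_loss(x, cand) for name, cand in candidates.items()}

# Claim 8 style selection: compare metrics to a threshold value and pick the
# sparsest pattern whose magnitude loss stays below it.
THRESHOLD = 0.10
chosen = min((name for name, m in metrics.items() if m < THRESHOLD),
             key=lambda name: int(name[0]), default="dense")
```

Here the 1:4 candidate discards too much magnitude, so the comparison against the threshold leaves the 2:4 candidate as the selected pattern.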
“profiling the trained machine learning model comprises collecting metrics while the trained machine learning model performs inferencing using test data”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g). Limitations directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, or to instructions for mere data gathering or data output, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 26: Incorporates the rejection of claim 21.

“wherein the workload comprises at least one neural network having one or more layers”
“modifying the workload comprises selectively deactivating or removing one or more nodes of a particular layer of the one or more layers”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 27: Incorporates the rejection of claim 21.
“adding at least one process to the workload that causes the workload to create at least one sparse data structure based at least in part on new data stored by a new data structure”
“processing the at least one sparse data structure instead of the new data structure”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g). Limitations directed to instructions for mere data gathering or data output cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 28: Incorporates the rejection of claim 21.

“adding at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on new data stored by a new data structure”
“adding at least one second process to the workload to cause the workload to process sparse data stored by the plurality of sparse data structures”

These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g). Limitations directed to instructions for mere data gathering or data output cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

The dependent claims, as analyzed above, do not recite limitations that integrate the judicial exception into a practical application. In addition, the claim limitations do not include additional elements that are sufficient to amount to significantly more than the judicial exception (Step 2B).
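The two-process limitations recited across claims 13, 19-20, and 27-28 describe a producer step that turns new dense data into sparse data structures and a consumer step that processes those structures instead of the dense original. A minimal sketch of that producer/consumer pairing; the values-plus-indices representation and all names are illustrative assumptions, not the applicant's disclosed design:

```python
import numpy as np

def create_sparse_structures(new_data: np.ndarray) -> dict:
    """First added process: build a values/indices structure for the
    nonzero elements of the new data."""
    idx = np.flatnonzero(new_data)
    return {"values": new_data[idx], "indices": idx, "size": new_data.size}

def process_sparse_structures(sparse: dict, weights: np.ndarray) -> float:
    """Second added process: a dot product that touches only the nonzeros,
    processing the sparse structure instead of the dense new data."""
    return float(sparse["values"] @ weights[sparse["indices"]])

new_data = np.array([0.0, 2.0, 0.0, 0.0, -1.0, 0.0])
weights = np.arange(6, dtype=float)
sparse = create_sparse_structures(new_data)
result = process_sparse_structures(sparse, weights)
# Matches the dense computation: new_data @ weights = 2*1 + (-1)*4 = -2.0
```

The point of the pairing is that the consumer's arithmetic scales with the number of nonzeros rather than with the size of the dense structure.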
Therefore, the claims do not recite any limitations, when considered individually or as a whole, that recite what the courts have identified as "significantly more" (see MPEP 2106.05), and therefore, as a whole, the claims are not patent eligible. As shown above, the dependent claims do not provide any additional elements that, when considered individually or as an ordered combination, amount to significantly more than the identified abstract idea. Therefore, as a whole, the dependent claims do not recite what the courts have identified as "significantly more" than the recited judicial exception. Accordingly, claims 2-13, 15-20, and 22-28 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception and does not recite, when the claim elements are examined individually and as a whole, elements that the courts have identified as "significantly more" than the recited judicial exception.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4.
Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering the patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3, 12, 14-15, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Baskaran et al. (U.S. Pre-Grant Publication No. 2015/0169369, hereinafter "Baskaran"), in view of Nurvitadhi et al. (U.S. Pre-Grant Publication No. 2020/0279349, hereinafter "Nurvitadhi").

Regarding claim 1, Baskaran teaches A method comprising: determining at least one sparse pattern based on data stored by at least one data structure within a workload, the data comprising at least one value determined at runtime ([0090] Specifically, in some embodiments, each work processor has a local work queue and the work processor may dynamically based on data stored by at least one data structure within a workload acquire the workload thereof, the data comprising at least one value determined at runtime through the run-time layer and from a work pool of tasks.; [0094] Dynamic scheduling typically includes determining at least one sparse pattern identifying, at run time, patterns of non-zero data elements, and forming groups of data elements that are non-zero (852).); and

Baskaran fails to teach modifying the workload based at least in part on the at least one sparse pattern.
Nurvitadhi teaches modifying the workload based at least in part on the at least one sparse pattern ([0161] In a further embodiment, based at least in part on the at least one sparse pattern pattern recognition logic 708 may be implemented to parse through large volumes of dense data to determine segments that may be processed as sparse operations. As a result, segment logic 709 records the address locations of sparse segments identified by pattern recognition logic 708. In one embodiment, sparse segments 709 comprise pointers to the sparse segment components. As discussed above, matrix multiplications for sparse operations can be bypassed, thus modifying the workload reducing the processing load at GPU 614.). Baskaran and Nurvitadhi are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Baskaran, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Nurvitadhi to Baskaran before the effective filing date of the claimed invention in order to identify operands having a zero value and prevent scheduling of the operands having the zero value at the multiplication unit (cf. Nurvitadhi, [0038] In embodiments, mechanisms for performing sparse matrix processing mechanism is disclosed. In some embodiments, the processing mechanism includes processing elements including a scheduler to identify operands having a zero value and prevent scheduling of the operands having the zero value at the multiplication unit. In other embodiments, the processing mechanism includes pattern tracking logic to detect one or more segments of sparse data in a stored block of data and record an address location for each detected segment of sparse data. In yet other embodiments, the processing mechanism compresses sparse matrices and stores one or more frequently used sparse matrices in a sparse compressed buffer for execution for processing. 
In a further embodiment, the processing mechanism partitions a plurality of execution units (EUs) and allocates each partition of EUs to execute threads associated with a neural network layer.). Regarding claim 3, Baskaran, as modified by Nurvitadhi, teaches The method of claim 1. Baskaran teaches further comprising: profiling the workload to obtain at least one profile; and using the at least one profile to determine the at least one sparse pattern ([0085] In various embodiments, when there are multiple iterations of a computation block, if the computation block is to be distributed across processors, and using the at least one profile to determine the at least one sparse pattern if the non-zero structure or access pattern does not significantly change within the block, the first iteration of the block is executed with a dynamic task scheduling scheme. The "state" information about the processor workload (such as which portions of the computation block (or equivalently tasks) get executed on which processor), is logged or stored. In the subsequent iterations, profiling the workload to obtain at least one profile the logged/stored information about the processor workload is used to schedule statically various tasks across processors, thereby achieving the benefit of load balanced execution without incurring significant scheduling overhead. This static-plus-dynamic (or hybrid) task scheduling approach can greatly reduce the task scheduling overhead and also facilitate an improved load balance across processors. As described below, the load balance can be improved further via task/operation migration prior to the subsequent iterations.; [0086] In the case of sparse tensor computations, as mentioned earlier, there is usually a repetition of mode-specific operations and the iteration with dynamic scheduling is carefully chosen before the start of a computation block where the logged information obtained during the dynamic scheduling can be reused.). 
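The static-plus-dynamic ("hybrid") task scheduling that Baskaran [0085]-[0086] describes for claim 3 can be made concrete with a short sketch. This is an illustrative reconstruction, not code from the reference; the function names and the per-task cost model are hypothetical. The first iteration schedules tasks dynamically and logs the task-to-processor assignments, and subsequent iterations replay the log as a static schedule.

```python
# Illustrative sketch (not taken from Baskaran; names and the cost model are
# hypothetical) of static-plus-dynamic "hybrid" scheduling: iteration 1
# schedules dynamically and logs assignments; later iterations replay the log.

def dynamic_schedule(tasks, n_workers, cost):
    """First iteration: greedily assign each task to the least-loaded worker."""
    loads = [0] * n_workers
    log = {}                          # task -> worker, the logged "state"
    for t in tasks:
        w = loads.index(min(loads))   # worker that dynamically acquires the task
        loads[w] += cost(t)
        log[t] = w
    return log, loads

def static_schedule(tasks, n_workers, log, cost):
    """Subsequent iterations: reuse the log; no scheduling decisions at all."""
    loads = [0] * n_workers
    for t in tasks:
        loads[log[t]] += cost(t)
    return loads

tasks = list(range(8))
cost = lambda t: 1 + (t % 3)          # hypothetical per-task cost
log, first = dynamic_schedule(tasks, 2, cost)
assert static_schedule(tasks, 2, log, cost) == first
```

The replay reproduces the balanced load of the dynamic pass while skipping the per-task scheduling decisions, which is the overhead reduction the quoted passage credits to the hybrid approach.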
Baskaran and Nurvitadhi are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 12, Baskaran, as modified by Nurvitadhi, teaches The method of claim 1. Nurvitadhi teaches wherein the workload comprises a trained machine learning model ([0176] FIG. 9 illustrates a highly-parallel general-purpose graphics processing unit 900, according to an embodiment. In one embodiment, the general-purpose processing unit (GPGPU) 900 can be configured to be particularly efficient in processing the type of computational workload comprises a trained machine learning model workloads associated with training deep neural networks.). Baskaran and Nurvitadhi are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 14, Baskaran teaches A system comprising: at least one processor; and memory storing instructions that when executed by the at least one processor cause the at least one processor to ([0057] In another aspect, a computer system includes a first processor and a first memory coupled to the first processor. 
The first memory includes instructions which, when executed by a processing unit that includes the first processor and/or a second processor, program the processing unit for scheduling operations on a number of work processors.; [0059] In another aspect, an article of manufacture that includes a non-transitory storage medium has stored therein instructions which, when executed by a processing unit program the processing unit to schedule operations on a number of work processors.): determine a configuration based at least in part on data stored by at least one data structure within a workload at runtime ([0090] Specifically, in some embodiments, each work processor has a local work queue and the work processor may dynamically based at least in part on data stored by at least one data structure acquire the within a workload at runtime workload thereof, through the run-time layer and from a work pool of tasks.; [0094] Dynamic scheduling typically includes determine a configuration identifying, at run time, patterns of non-zero data elements, and forming groups of data elements that are non-zero (852).); and Baskaran fails to teach transform the workload into a sparse workload based at least in part on the configuration. Nurvitadhi teaches transform the workload into a sparse workload based at least in part on the configuration ([0161] In a further embodiment, into a sparse workload based at least in part on the configuration pattern recognition logic 708 may be implemented to parse through large volumes of dense data to determine segments that may be processed as sparse operations. As a result, segment logic 709 records the address locations of sparse segments identified by pattern recognition logic 708. In one embodiment, sparse segments 709 comprise pointers to the sparse segment components. As discussed above, matrix multiplications for sparse operations can be bypassed, thus transform the workload reducing the processing load at GPU 614.). 
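The bypass behavior quoted from Nurvitadhi [0161] (detect sparse segments in dense data, record their locations, and skip the corresponding multiplications) can be illustrated with a hedged sketch; the segment length and function names below are my assumptions, not the reference's API.

```python
# Hedged sketch of segment-level bypass per Nurvitadhi [0161]/[0038]; the
# segment length and helper names are assumptions, not the reference's API.

def find_zero_segments(data, seg_len):
    """Record the start offset of every segment that is entirely zero."""
    return [i for i in range(0, len(data), seg_len)
            if all(v == 0 for v in data[i:i + seg_len])]

def segmented_dot(a, b, seg_len):
    """Dot product that skips the multiply for recorded zero segments of a."""
    skip = set(find_zero_segments(a, seg_len))
    total = 0
    for i in range(0, len(a), seg_len):
        if i in skip:
            continue                  # multiplication bypassed for this segment
        total += sum(x * y for x, y in zip(a[i:i + seg_len], b[i:i + seg_len]))
    return total

a = [0, 0, 0, 0, 1, 2, 0, 0]
b = [5, 6, 7, 8, 1, 1, 9, 9]
assert segmented_dot(a, b, 4) == 3    # only the second segment is computed
```

Recording the zero-segment addresses up front is what lets the inner loop bypass work outright rather than testing each operand, which is the "reducing the processing load" effect the rejection relies on.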
Baskaran and Nurvitadhi are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 15, Baskaran, as modified by Nurvitadhi, teaches The system of claim 14. Baskaran teaches wherein the instructions, when executed by the at least one processor, cause the at least one processor to: profile the workload to obtain a profile comprising at least one sparsity measure for the data stored by the at least one data structure; and use the profile to determine the configuration ([0020] comprising at least one sparsity measure for the data stored by the at least one data structure When the input tensor is dense, the above-mentioned computations are all dense and regular. The R-Stream™ compiler has been well established as an effective source-to-source high level optimizing compiler to parallelize and optimize such dense regular computations. However when the input tensor is sparse, the computations involving (and affected by the) sparsity of the input tensor are also generally sparse and irregular.; [0085] In various embodiments, when there are multiple iterations of a computation block, if the computation block is to be distributed across processors, and use the profile to determine the configuration if the non-zero structure or access pattern does not significantly change within the block, the first iteration of the block is executed with a dynamic task scheduling scheme. The "state" information about the processor workload (such as which portions of the computation block (or equivalently tasks) get executed on which processor), is logged or stored. In the subsequent iterations, profile the workload to obtain a profile the logged/stored information about the processor workload is used to schedule statically various tasks across processors, thereby achieving the benefit of load balanced execution without incurring significant scheduling overhead. 
This static-plus-dynamic (or hybrid) task scheduling approach can greatly reduce the task scheduling overhead and also facilitate an improved load balance across processors. As described below, the load balance can be improved further via task/operation migration prior to the subsequent iterations.; [0086] In the case of sparse tensor computations, as mentioned earlier, there is usually a repetition of mode-specific operations and the iteration with dynamic scheduling is carefully chosen before the start of a computation block where the logged information obtained during the dynamic scheduling can be reused.). Baskaran and Nurvitadhi are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 21, Baskaran teaches A processor, comprising ([0057] In another aspect, a computer system includes a first processor and a first memory coupled to the first processor. The first memory includes instructions which, when executed by a processing unit that includes the first processor and/or a second processor, program the processing unit for scheduling operations on a number of work processors.; [0059] In another aspect, an article of manufacture that includes a non-transitory storage medium has stored therein instructions which, when executed by a processing unit program the processing unit to schedule operations on a number of work processors.): one or more circuits to determine at least one sparse pattern based at least in part on data stored by at least one data structure of a workload at runtime ([0090] Specifically, in some embodiments, each work processor has a local work queue and the work processor may dynamically based at least in part on data stored by at least one data structure of a workload acquire the workload thereof, at runtime through the run-time layer and from a work pool of tasks.; [0094] Dynamic scheduling typically includes one or more circuits to determine at least one sparse pattern identifying, at run time, patterns 
of non-zero data elements, and forming groups of data elements that are non-zero (852).), and

Baskaran fails to teach modify the workload based at least in part on the at least one sparse pattern.

Nurvitadhi teaches modify the workload based at least in part on the at least one sparse pattern ([0161] In a further embodiment, based at least in part on the at least one sparse pattern pattern recognition logic 708 may be implemented to parse through large volumes of dense data to determine segments that may be processed as sparse operations. As a result, segment logic 709 records the address locations of sparse segments identified by pattern recognition logic 708. In one embodiment, sparse segments 709 comprise pointers to the sparse segment components. As discussed above, matrix multiplications for sparse operations can be bypassed, thus modify the workload reducing the processing load at GPU 614.).

Baskaran and Nurvitadhi are combinable for the same rationale as set forth above with respect to claim 1.

Claims 2, 13, 20, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Baskaran, in view of Nurvitadhi, and further in view of Elango et al. (U.S. Pre-Grant Publication No. 2023/0385374, hereinafter "Elango").

Regarding claim 2, Baskaran, as modified by Nurvitadhi, teaches The method of claim 1. Baskaran, as modified by Nurvitadhi, fails to teach wherein the at least one sparse pattern includes at least one of a structured sparse pattern or an unstructured sparse pattern.

Elango teaches wherein the at least one sparse pattern includes at least one of a structured sparse pattern or an unstructured sparse pattern ([0015] wherein the at least one sparse pattern includes at least one of a structured sparse pattern Sparsity may be implemented as structured sparsity or or an unstructured sparse pattern unstructured sparsity. Unstructured sparsity allows a high degree of freedom for pruning but often is not hardware friendly.
Structured sparsity, on the other hand, can be efficiently implemented in hardware, but may lead to noticeable reduction in model accuracy.). Baskaran, Nurvitadhi, and Elango are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Baskaran and Nurvitadhi, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Elango to Baskaran before the effective filing date of the claimed invention in order to reduce variability and achieve more uniform sparsity (cf. Elango, [0021] To reduce variability and achieve more uniform sparsity, systems and methods are presented herein where a first block as pruned using fine grained balanced sparsity and the second block is pruned using coarse-grained balanced sparsity. In this way, the resulting combined sparsity is uniformly achieved without any additional computational burden. For coarse-grained sparsity, the applied sparsity percentage is applied at the level of sub-blocks, rather than at the level of individual elements. By combining these together, the patterns of the two blocks are complementary in such a way that a desired percentage of elements are maintained from each block, without the risk of oversparsifying.). Regarding claim 13, Baskaran, as modified by Nurvitadhi, teaches The method of claim 1. Baskaran, as modified by Nurvitadhi, fails to teach wherein modifying the workload comprises: adding at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on new data stored by a new data structure; and adding at least one second process to the workload to cause the workload to process the plurality of sparse data structures. 
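The structured versus unstructured distinction quoted from Elango [0015] in the claim 2 analysis above can be made concrete with a small sketch. The n:m formulation and all names here are my illustration, not Elango's disclosure: unstructured pruning keeps the largest-magnitude weights anywhere, while structured n:m pruning keeps the n largest in every group of m, which is the regularity hardware can exploit.

```python
# Minimal sketch contrasting the two sparse patterns in Elango [0015]; the
# n:m formulation and all names are my illustration, not Elango's disclosure.

def prune_unstructured(w, keep):
    """Unstructured: keep the `keep` largest-magnitude weights anywhere."""
    kept = set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:keep])
    return [w[i] if i in kept else 0 for i in range(len(w))]

def prune_structured(w, n, m):
    """Structured n:m sparsity: keep the n largest in every group of m."""
    out = []
    for g in range(0, len(w), m):
        grp = w[g:g + m]
        kept = set(sorted(range(len(grp)), key=lambda i: abs(grp[i]),
                          reverse=True)[:n])
        out.extend(v if i in kept else 0 for i, v in enumerate(grp))
    return out

w = [0.9, 0.8, 0.7, 0.1, 0.05, 0.02, 0.6, 0.3]
# Same budget (4 of 8 weights kept), different patterns:
assert prune_unstructured(w, 4) == [0.9, 0.8, 0.7, 0, 0, 0, 0.6, 0]
assert prune_structured(w, 2, 4) == [0.9, 0.8, 0, 0, 0, 0, 0.6, 0.3]
```

Note how the unstructured result concentrates the kept weights in one group while the structured result guarantees exactly two survivors per group of four, the trade-off between pruning freedom and hardware friendliness that Elango describes.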
Elango teaches wherein modifying the workload comprises: adding at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on new data stored by a new data structure; and adding at least one second process to the workload to cause the workload to process the plurality of sparse data structures ([0065] During training, both the activation and the weight matrices are dynamically changing, e.g., during each forward phase there will be new elements in the activation matrix and each backpropagation updates the weight matrix. The overall sparsity levels may be set as a constant, or may change progressively over training (e.g., decreasing step-wise based on model performance).; [0066] However, during inference, the weight matrix is fixed based on training. The activation matrix, which depends on the user input, is adding at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on new data stored by a new data structure calculated newly for each forward phase based on the newly input data. The dimensions and size of the activation matrix may essentially stay the same, but the individual elements are different for each forward phase. As such, during inference, when the adding at least one second process to the workload to cause the workload to process the plurality of sparse data structures sparsity masks are computed, the masks for the weight matrix may be reused or maintained (e.g., static), but the masks for the activation matrix may be dynamically recomputed for each forward phase (e.g., dynamic).). Baskaran, Nurvitadhi, and Elango are combinable for the same rationale as set forth above with respect to claim 2. Regarding claim 20, Baskaran, as modified by Nurvitadhi, teaches The system of claim 14. 
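The static-versus-dynamic mask behavior quoted from Elango [0065]-[0066] in the claim 13 analysis above might be rendered as follows; this is a minimal sketch, and the top-k masking rule is an assumption of mine, not Elango's method. The weight mask is computed once after training and reused, while the activation mask is recomputed for every forward pass.

```python
# Sketch of static weight mask vs. per-pass activation mask per Elango
# [0065]-[0066]; the top-k masking rule here is an assumption, not Elango's.

def topk_mask(v, k):
    """1 for the k largest-magnitude entries, 0 elsewhere."""
    kept = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [1 if i in kept else 0 for i in range(len(v))]

weights = [0.5, -0.01, 0.3, 0.02]
weight_mask = topk_mask(weights, 2)        # static: computed once, then reused

def forward(activations, k):
    act_mask = topk_mask(activations, k)   # dynamic: recomputed every pass
    return sum(w * wm * a * am
               for w, wm, a, am in zip(weights, weight_mask,
                                       activations, act_mask))

assert weight_mask == [1, 0, 1, 0]
assert abs(forward([1.0, 2.0, 3.0, 4.0], 2) - 0.9) < 1e-9
```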
Baskaran, as modified by Nurvitadhi, fails to teach wherein transforming the workload comprises: add at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on a new data structure; and add at least one second process to the workload to cause the workload to process the plurality of sparse data structures. Elango teaches wherein transforming the workload comprises: add at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on a new data structure; and add at least one second process to the workload to cause the workload to process the plurality of sparse data structures ([0065] During training, both the activation and the weight matrices are dynamically changing, e.g., during each forward phase there will be new elements in the activation matrix and each backpropagation updates the weight matrix. The overall sparsity levels may be set as a constant, or may change progressively over training (e.g., decreasing step-wise based on model performance).; [0066] However, during inference, the weight matrix is fixed based on training. The activation matrix, which depends on the user input, is add at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on a new data structure calculated newly for each forward phase based on the newly input data. The dimensions and size of the activation matrix may essentially stay the same, but the individual elements are different for each forward phase. 
As such, during inference, when the add at least one second process to the workload to cause the workload to process the plurality of sparse data structures sparsity masks are computed, the masks for the weight matrix may be reused or maintained (e.g., static), but the masks for the activation matrix may be dynamically recomputed for each forward phase (e.g., dynamic).). Baskaran, Nurvitadhi, and Elango are combinable for the same rationale as set forth above with respect to claim 2. Regarding claim 28, Baskaran, as modified by Nurvitadhi, teaches The processor of claim 21. Baskaran, as modified by Nurvitadhi, fails to teach wherein modifying the workload comprises: adding at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on new data stored by a new data structure; and adding at least one second process to the workload to cause the workload to process sparse data stored by the plurality of sparse data structures. Elango teaches wherein modifying the workload comprises: adding at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on new data stored by a new data structure; and adding at least one second process to the workload to cause the workload to process sparse data stored by the plurality of sparse data structures ([0065] During training, both the activation and the weight matrices are dynamically changing, e.g., during each forward phase there will be new elements in the activation matrix and each backpropagation updates the weight matrix. The overall sparsity levels may be set as a constant, or may change progressively over training (e.g., decreasing step-wise based on model performance).; [0066] However, during inference, the weight matrix is fixed based on training. 
The activation matrix, which depends on the user input, is adding at least one first process to the workload to cause the workload to create a plurality of sparse data structures based at least in part on new data stored by a new data structure calculated newly for each forward phase based on the newly input data. The dimensions and size of the activation matrix may essentially stay the same, but the individual elements are different for each forward phase. As such, during inference, when the adding at least one second process to the workload to cause the workload to process sparse data stored by the plurality of sparse data structures sparsity masks are computed, the masks for the weight matrix may be reused or maintained (e.g., static), but the masks for the activation matrix may be dynamically recomputed for each forward phase (e.g., dynamic).).

Baskaran, Nurvitadhi, and Elango are combinable for the same rationale as set forth above with respect to claim 2.

Claims 4-6, 10, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Baskaran, in view of Nurvitadhi, and further in view of Acar et al. (U.S. Pre-Grant Publication No. 2016/0259826, hereinafter "Acar").

Regarding claim 4, Baskaran, as modified by Nurvitadhi, teaches The method of claim 3. Baskaran, as modified by Nurvitadhi, fails to teach wherein profiling the workload comprises determining at least one sparsity measure for the data stored by the at least one data structure.

Acar teaches wherein profiling the workload comprises determining at least one sparsity measure for the data stored by the at least one data structure ([0106] Based on results of the comparison, a corresponding compressed matrix representation data structure is selected for use with the current iteration (step 460). For example, if the sparsity of the input vector is equal to or greater than a sparsity threshold value, i.e.
the vector is sufficiently sparse, then a first compressed matrix representation data structure (e.g., CSC) is selected for use during the present iteration. However, if the determining at least one sparsity measure for the data stored by the at least one data structure sparsity of the input vector is less than the sparsity threshold value, i.e. the input vector is dense, then a second compressed matrix representation data structure (e.g., CSR) is selected for use during the present iteration. Of course this may be extended to additional types of compressed matrix representations based on additional threshold values such that as the density continues to increase, other compressed matrix representations suitable for parallelized execution at higher density input vectors may be selected.). Baskaran, Nurvitadhi, and Acar are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Baskaran and Nurvitadhi, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Acar to Baskaran before the effective filing date of the claimed invention in order to improve the efficiency and speed by which operations are performed on such large scale matrices (cf. Acar, [0033] The illustrative embodiments provide mechanisms for improving the efficiency and speed by which such operations are performed on such large scale matrices. The illustrative embodiments leverage the efficiency of different matrix storage formats for ordering non-zero entries in a large data set represented by a large sparse matrix. The particular storage format used for performing iterations of the matrix vector multiplication operation is selected dynamically based on the sparsity of the matrix and/or vector involved in the matrix vector multiplication operation. 
The leveraging of these different storage formats facilitates parallel execution of partial matrix vector multiplication operations by parallel threads, execution engines, processors, or the like.). Regarding claim 5, Baskaran, as modified by Nurvitadhi and Acar, teaches The method of claim 4. Acar teaches wherein the at least one sparsity measure comprises a first sparsity measure, a set of sparse patterns comprises the at least one sparse pattern, the set of sparse patterns is associated with a set of sparsity measures, and a second sparsity measure is associated with the at least one sparse pattern and is a closest within the set of sparsity measures to the first sparsity measure ([0102] Thus, as the one sparsity measure comprises a first sparsity measure sparsity of the input vector decreases and the input vector becomes more dense with each iteration, the compressed matrix representation may be dynamically switched from one compressed matrix representation to another. Looking at it from a vector density perspective, as the density of the input vector increases with each iteration, the compressed matrix representation may be dynamically switched.; [0106] Based on results of the comparison, a corresponding compressed a set of sparse patterns comprises the at least one sparse pattern matrix representation data structure is selected for use with the current iteration (step 460). For example, the set of sparse patterns is associated with a set of sparsity measures if the a second sparsity measure is associated with the at least one sparse pattern and is a closest within the set of sparsity measures to the first sparsity measure sparsity of the input vector is equal to or greater than a sparsity threshold value, i.e. the vector is sufficiently sparse, then a first compressed matrix representation data structure (e.g., CSC) is selected for use during the present iteration. However, if the sparsity of the input vector is less than the sparsity threshold value, i.e. 
the input vector is dense, then a second compressed matrix representation data structure (e.g., CSR) is selected for use during the present iteration. Of course this may be extended to additional types of compressed matrix representations based on additional threshold values such that as the density continues to increase, other compressed matrix representations suitable for parallelized execution at higher density input vectors may be selected.). Baskaran, Nurvitadhi, and Acar are combinable for the same rationale as set forth above with respect to claim 4. Regarding claim 6, Baskaran, as modified by Nurvitadhi and Acar, teaches The method of claim 5. Acar teaches wherein the set of sparsity measures comprises a set of densities ([0105] The set of sparsity measures comprises a set of densities sparsity (or alternatively the density) of the input vector is calculated and compared to one or more sparsity (or density) threshold values (step 450). It should be appreciated that sparsity and density are alternative sides of the same characteristics. Both measure a relation between zero and non-zero values in the input vector. When the number of zero values in the input vector is greater than the number of non-zero values, the input vector is more sparse, or less dense. When the number of zero values in the input vector is less than the number of non-zero values in the input vector, then the input vector is less sparse, or more dense. Thus, sparsity or density may be evaluated in this operation.). Baskaran, Nurvitadhi, and Acar are combinable for the same rationale as set forth above with respect to claim 4. Regarding claim 10, Baskaran, as modified by Nurvitadhi, teaches The method of claim 1. 
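Acar's per-iteration test ([0105]-[0106]) reduces to computing a sparsity (or density) measure for the input vector and switching the compressed representation when it crosses a threshold. A hedged sketch, in which the 0.5 threshold and the function names are illustrative assumptions (only the CSC/CSR format names come from the quoted passage):

```python
# Illustrative reduction of Acar's threshold test ([0105]-[0106]); the 0.5
# threshold and function names are assumptions, only CSC/CSR come from Acar.

def sparsity(v):
    """Fraction of zero entries: the per-iteration sparsity measure."""
    return sum(1 for x in v if x == 0) / len(v)

def pick_format(v, threshold=0.5):
    # sufficiently sparse input vector -> first representation (CSC);
    # dense input vector -> second representation (CSR)
    return "CSC" if sparsity(v) >= threshold else "CSR"

assert pick_format([0, 0, 0, 7]) == "CSC"   # 75% zeros: sparse path
assert pick_format([1, 2, 0, 7]) == "CSR"   # 25% zeros: dense path
```

As the quoted passage notes, additional thresholds could map increasing density onto further representations; the sketch shows only the two-way case.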
Baskaran, as modified by Nurvitadhi, fails to teach further comprising: determining at least one sparsity measure for the data stored by the at least one data structure by collecting metrics while performing the workload using test data; and determining the at least one sparse pattern based at least in part on the at least one sparsity measure. Acar teaches further comprising: determining at least one sparsity measure for the data stored by the at least one data structure by collecting metrics while performing the workload using test data ([0103] FIG. 4 is a flowchart outlining an example operation for dynamically modifying the compressed matrix representation utilized for iterations of a matrix operation based on a determining at least one sparsity measure for the data stored by the at least one data structure determination of the sparsity/density of an input vector collecting metrics while performing the workload using test data using a hybrid matrix representation mechanism in accordance with one illustrative embodiment.; [0099] The parallel partial matrix vector multiplication operations 350 may be repeated until the iterations of the process converge (step 360). Iterations typically converge (step 360) based on monitoring the change in the output vector. If the output vector change becomes very small in relative terms and in magnitude, the iterations are deemed to be converged, and the system generates the output vector (step 370). Based on a benchmark set that typically represents the test cases, the iteration convergence can be also be set as a fixed number of iterations. 
For example, one could set the number of iterations to 5 based on the benchmark test, where the final output vector is generated upon execution of the fifth iteration.); and determining the at least one sparse pattern based at least in part on the at least one sparsity measure ([0102] Thus, as the sparsity of the input vector decreases and the input vector becomes more dense with each iteration, the compressed matrix representation may be dynamically switched from one compressed matrix representation to another. Looking at it from a vector density perspective, as the determining the at least one sparse pattern based at least in part on the at least one sparsity measure density of the input vector increases with each iteration, the compressed matrix representation may be dynamically switched.; [0106] Based on results of the comparison, a corresponding compressed matrix representation data structure is selected for use with the current iteration (step 460). For example, sparsity of the input vector is equal to or greater than a sparsity threshold value, i.e. the vector is sufficiently sparse, then a first compressed matrix representation data structure (e.g., CSC) is selected for use during the present iteration. However, if the sparsity of the input vector is less than the sparsity threshold value, i.e. the input vector is dense, then a second compressed matrix representation data structure (e.g., CSR) is selected for use during the present iteration. Of course this may be extended to additional types of compressed matrix representations based on additional threshold values such that as the density continues to increase, other compressed matrix representations suitable for parallelized execution at higher density input vectors may be selected.). Baskaran, Nurvitadhi, and Acar are combinable for the same rationale as set forth above with respect to claim 4. Regarding claim 17, Baskaran, as modified by Nurvitadhi, teaches The system of claim 14. 
Nurvitadhi teaches wherein the workload comprises a trained machine learning model ([0176] FIG. 9 illustrates a highly-parallel general-purpose graphics processing unit 900, according to an embodiment. In one embodiment, the general-purpose processing unit (GPGPU) 900 can be configured to be particularly efficient in processing the type of computational workloads associated with training deep neural networks [the workload comprises a trained machine learning model].), and Baskaran, as modified by Nurvitadhi, fails to teach the instructions, when executed by the at least one processor, cause the at least one processor to: collect metrics while the trained machine learning model performs inferencing using test data; determine at least one sparsity measure for the data stored by the at least one data structure based at least in part on the metrics; and use the at least one sparsity measure to determine the configuration. Acar teaches the instructions, when executed by the at least one processor, cause the at least one processor to: collect metrics while the trained machine learning model performs inferencing using test data ([0023] These networks or graphs may also be represented as large scale matrices in which indices (column and row indices) represent the nodes, and weights (or strengths) of the edges are represented by values in the matrix.; [0103] FIG. 4 is a flowchart outlining an example operation for dynamically modifying the compressed matrix representation utilized for iterations of a matrix operation based on a determination of the sparsity/density of an input vector [collect metrics] using a hybrid matrix representation mechanism in accordance with one illustrative embodiment.; [0099] The parallel partial matrix vector multiplication operations 350 may be repeated until the iterations of the process converge (step 360). Iterations typically converge (step 360) based on monitoring the change in the output vector. If the output vector change becomes very small in relative terms and in magnitude, the iterations are deemed to be converged, and the system generates the output vector (step 370). Based on a benchmark set that typically represents the test cases, the iteration convergence can also be set as a fixed number of iterations. For example, one could set the number of iterations to 5 based on the benchmark test, where the final output vector is generated upon execution of the fifth iteration.; [0116] The scores obtained from the various reasoning algorithms indicate the extent to which the potential response is inferred by the input question based on the specific area of focus of that reasoning algorithm. Each resulting score is then weighted against a statistical model. The statistical model captures how well the reasoning algorithm performed at establishing the inference between two similar passages for a particular domain during the training period of the QA system [while the trained machine learning model performs inferencing using test data].); determine at least one sparsity measure for the data stored by the at least one data structure based at least in part on the metrics ([0103] FIG. 4 is a flowchart outlining an example operation for dynamically modifying the compressed matrix representation utilized for iterations of a matrix operation based on a determination of the sparsity/density of an input vector [determine at least one sparsity measure for the data stored by the at least one data structure based at least in part on the metrics] using a hybrid matrix representation mechanism in accordance with one illustrative embodiment.; [0099] The parallel partial matrix vector multiplication operations 350 may be repeated until the iterations of the process converge (step 360). Iterations typically converge (step 360) based on monitoring the change in the output vector. If the output vector change becomes very small in relative terms and in magnitude, the iterations are deemed to be converged, and the system generates the output vector (step 370). Based on a benchmark set that typically represents the test cases, the iteration convergence can also be set as a fixed number of iterations. For example, one could set the number of iterations to 5 based on the benchmark test, where the final output vector is generated upon execution of the fifth iteration.); and use the at least one sparsity measure to determine the configuration ([0102] Thus, as the sparsity of the input vector decreases and the input vector becomes more dense with each iteration, the compressed matrix representation may be dynamically switched from one compressed matrix representation to another. Looking at it from a vector density perspective, as the density of the input vector increases with each iteration [use the at least one sparsity measure to determine the configuration], the compressed matrix representation may be dynamically switched.; [0106] Based on results of the comparison, a corresponding compressed matrix representation data structure is selected for use with the current iteration (step 460). For example, [if the] sparsity of the input vector is equal to or greater than a sparsity threshold value, i.e. the vector is sufficiently sparse, then a first compressed matrix representation data structure (e.g., CSC) is selected for use during the present iteration. However, if the sparsity of the input vector is less than the sparsity threshold value, i.e. the input vector is dense, then a second compressed matrix representation data structure (e.g., CSR) is selected for use during the present iteration. Of course this may be extended to additional types of compressed matrix representations based on additional threshold values such that as the density continues to increase, other compressed matrix representations suitable for parallelized execution at higher density input vectors may be selected.). Baskaran, Nurvitadhi, and Acar are combinable for the same rationale as set forth above with respect to claim 4. Regarding claim 19, Baskaran, as modified by Nurvitadhi, teaches The system of claim 14. Baskaran, as modified by Nurvitadhi, fails to teach wherein transforming the workload comprises causing the workload to: create at least one sparse data structure based at least in part on new data stored by a new data structure and process the at least one sparse data structure instead of the new data structure. Acar teaches wherein transforming the workload comprises causing the workload to: create at least one sparse data structure based at least in part on new data stored by a new data structure and process the at least one sparse data structure instead of the new data structure ([0034] In one illustrative embodiment, for iterations of the matrix vector multiplication operation where the sparsity of the vector [based at least in part on new data stored by a new data structure] is less than a predetermined threshold, a first matrix storage format data structure is utilized to resolve the matrix vector multiplication operation.; [0107] The iteration of the matrix operation is then executed in a parallel manner using the selected compressed matrix representation data structure [create at least one sparse data structure; process the at least one sparse data structure instead of the new data structure] (step 470). A determination is made as to whether the iterations have converged (step 480) and, if not, the operation returns to step 440 with the input vector now being the output vector of the previous iteration.
Otherwise, if the iterations have converged, then the output vector is generated as the aggregate of the output vectors of the partial matrix vector multiplication operations performed during the iterations (step 490).). Baskaran, Nurvitadhi, and Acar are combinable for the same rationale as set forth above with respect to claim 4. Claims 7-9, 27 are rejected under 35 U.S.C. 103 as being unpatentable over Baskaran, in view of Nurvitadhi, Elango, and further in view of Acar. Regarding claim 7, Baskaran, as modified by Nurvitadhi, teaches The method of claim 1. Baskaran, as modified by Nurvitadhi, fails to teach further comprising: determining at least one sparsity measure for the data stored by the at least one data structure; selecting one or more sparse patterns based at least in part on the at least one sparsity measure; decomposing the data stored by the at least one data structure into one or more candidate sparse data structures in accordance with the one or more sparse patterns; calculating metrics based at least in part on sparse data stored by the one or more candidate sparse data structures; and selecting the at least one sparse pattern based at least in part on the metrics. Elango teaches calculating metrics based at least in part on sparse data stored by the one or more candidate sparse data structures; and selecting the at least one sparse pattern based at least in part on the metrics ([0058] At 870, method 800 includes matrix multiplying the first block and second block. By applying fine-grained sparsity to the first block (e.g. weights) and applying coarse-grained sparsity to the second block (e.g., activations), the first and second blocks will have completely different sparsity patterns. 
While each corresponding pair of sub-blocks may have different levels of sparsity [based at least in part on sparse data stored by the one or more candidate sparse data structures], the differing patterns generate a combined sparsity in the matmul product that is deterministically uniform throughout the product (e.g., the same or within a threshold similarity for each block) without adding any computational cost, thus leading to increased model accuracies at the same cost [calculating metrics; selecting the at least one sparse pattern based at least in part on the metrics].; [0069] It has been shown that the loss in accuracy due to sparsity can be reduced by minimizing the one-norm of the pruned values. One approach to achieve this for structured sparsity includes computing a permutation matrix that minimizes the pruned one-norm for each respective weight matrix using a greedy reordering technique. The weight matrices may then be permuted using these permutation matrices. Structured sparsity may then be applied on top of these permuted weight matrices. This process can be adapted to both fine-grained and coarse-grained balanced sparsity patterns to further increase the pruned accuracy. Matrix elements may thus be shuffled around so that they are randomly distributed.). Baskaran, Nurvitadhi, and Elango are combinable for the same rationale as set forth above with respect to claim 2. Acar teaches further comprising: determining at least one sparsity measure for the data stored by the at least one data structure; selecting one or more sparse patterns based at least in part on the at least one sparsity measure ([0106] Based on results of the comparison, a corresponding compressed matrix representation data structure is selected for use with the current iteration (step 460). For example, if the sparsity of the input vector is equal to or greater than a sparsity threshold value, i.e. the vector is sufficiently sparse, then a first compressed matrix representation data structure (e.g., CSC) is selected for use during the present iteration. However, if the sparsity of the input vector [determining at least one sparsity measure for the data stored by the at least one data structure] is less than the sparsity threshold value, i.e. the input vector is dense, then a second compressed matrix representation data structure (e.g., CSR) is selected for use during the present iteration [selecting one or more sparse patterns based at least in part on the at least one sparsity measure]. Of course this may be extended to additional types of compressed matrix representations based on additional threshold values such that as the density continues to increase, other compressed matrix representations suitable for parallelized execution at higher density input vectors may be selected.); decomposing the data stored by the at least one data structure into one or more candidate sparse data structures in accordance with the one or more sparse patterns ([0107] The iteration of the matrix operation [decomposing the data stored by the at least one data structure into one or more candidate sparse data structures in accordance with the one or more sparse patterns] is then executed in a parallel manner using the selected compressed matrix representation data structure (step 470). A determination is made as to whether the iterations have converged (step 480) and, if not, the operation returns to step 440 with the input vector now being the output vector of the previous iteration. Otherwise, if the iterations have converged, then the output vector is generated as the aggregate of the output vectors of the partial matrix vector multiplication operations performed during the iterations (step 490).); Baskaran, Nurvitadhi, Elango, and Acar are considered to be analogous to the claimed invention because they are in the same field of machine learning.
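Claim 7 recites a candidate-evaluation pipeline: decompose the data into candidate sparse data structures according to one or more sparse patterns, calculate metrics on the resulting sparse data, and select the pattern based on those metrics. A short Python sketch of that flow follows, using N:M structured patterns and a retained-magnitude loss metric as stand-in choices (the specific patterns, the metric, and the function names are illustrative assumptions drawn from the Elango-style structured sparsity discussed above, not the claimed method; the initial sparsity-measure step is omitted for brevity):

```python
def prune_n_of_m(row, n, m):
    """Candidate decomposition: keep the n largest-magnitude values in
    each block of m entries and zero the rest (an N:M structured pattern)."""
    out = []
    for i in range(0, len(row), m):
        block = row[i:i + m]
        keep = sorted(range(len(block)), key=lambda j: -abs(block[j]))[:n]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(block))
    return out

def select_pattern(row, patterns=((1, 4), (2, 4))):
    """Decompose the data per candidate pattern, score each candidate by
    a magnitude-loss metric, and select the pattern with the least loss."""
    total = sum(abs(v) for v in row) or 1.0
    best = None
    for n, m in patterns:
        pruned = prune_n_of_m(row, n, m)
        loss = 1.0 - sum(abs(v) for v in pruned) / total  # magnitude lost
        if best is None or loss < best[0]:
            best = (loss, (n, m), pruned)
    return best

loss, pattern, pruned = select_pattern([0.9, 0.1, 0.0, 0.05, 0.8, 0.0, 0.02, 0.7])
# pattern == (2, 4): the 2:4 decomposition loses the least magnitude here
```

Comparing the loss against one or more threshold values, rather than taking the minimum, would correspond to the comparison step recited in claim 8.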
In view of the teachings of Baskaran, Nurvitadhi, and Elango, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Acar to Baskaran before the effective filing date of the claimed invention in order to improve the efficiency and speed by which operations are performed on such large scale matrices (cf. Acar, [0033] The illustrative embodiments provide mechanisms for improving the efficiency and speed by which such operations are performed on such large scale matrices. The illustrative embodiments leverage the efficiency of different matrix storage formats for ordering non-zero entries in a large data set represented by a large sparse matrix. The particular storage format used for performing iterations of the matrix vector multiplication operation is selected dynamically based on the sparsity of the matrix and/or vector involved in the matrix vector multiplication operation. The leveraging of these different storage formats facilitates parallel execution of partial matrix vector multiplication operations by parallel threads, execution engines, processors, or the like.). Regarding claim 8, Baskaran, as modified by Nurvitadhi, Elango, and Acar, teaches The method of claim 7. Acar teaches wherein selecting the at least one sparse pattern based at least in part on the metrics comprises performing a comparison of the metrics to one or more threshold values and selecting the at least one sparse pattern based at least in part on results of the comparison ([0034] In one illustrative embodiment, for iterations of the matrix vector multiplication operation where the sparsity of the vector is less than a predetermined threshold [performing a comparison of the metrics to one or more threshold values and selecting the at least one sparse pattern based at least in part on results of the comparison], a first matrix storage format data structure is utilized to resolve the matrix vector multiplication operation. For iterations of the matrix vector multiplication operation where the sparsity of the vector is equal to or greater than the predetermined threshold, a second matrix storage format data structure is utilized to resolve the matrix vector multiplication operation.). Baskaran, Nurvitadhi, Elango, and Acar are combinable for the same rationale as set forth above with respect to claim 7. Regarding claim 9, Baskaran, as modified by Nurvitadhi, Elango, and Acar, teaches The method of claim 7. Elango teaches wherein the metrics include at least one of accuracy values, magnitude loss values, or values indicating Multiply-Accumulate operations performed ([0058] At 870, method 800 includes matrix multiplying the first block and second block. By applying fine-grained sparsity to the first block (e.g. weights) and applying coarse-grained sparsity to the second block (e.g., activations), the first and second blocks will have completely different sparsity patterns. While each corresponding pair of sub-blocks may have different levels of sparsity, the differing patterns generate a combined sparsity in the matmul product that is deterministically uniform throughout the product (e.g., the same or within a threshold similarity for each block) without adding any computational cost, thus leading to increased model accuracies at the same cost [metrics include at least one of accuracy values].). Baskaran, Nurvitadhi, Elango, and Acar are combinable for the same rationale as set forth above with respect to claim 7. Regarding claim 27, Baskaran, as modified by Nurvitadhi, teaches The processor of claim 21. Baskaran, as modified by Nurvitadhi, fails to teach wherein modifying the workload comprises: adding at least one process to the workload that causes the workload to create at least one sparse data structure based at least in part on new data stored by a new data structure, and processing the at least one sparse data structure instead of the new data structure.
Elango teaches wherein modifying the workload comprises: adding at least one process to the workload that causes the workload to create at least one sparse data structure based at least in part on new data stored by a new data structure ([0065] During training, both the activation and the weight matrices are dynamically changing, e.g., during each forward phase there will be new elements in the activation matrix and each backpropagation updates the weight matrix. The overall sparsity levels may be set as a constant, or may change progressively over training (e.g., decreasing step-wise based on model performance).; [0066] However, during inference, the weight matrix is fixed based on training. The activation matrix, which depends on the user input, is calculated newly for each forward phase based on the newly input data [adding at least one process to the workload that causes the workload to create at least one sparse data structure based at least in part on new data stored by a new data structure]. The dimensions and size of the activation matrix may essentially stay the same, but the individual elements are different for each forward phase. As such, during inference, when the sparsity masks are computed, the masks for the weight matrix may be reused or maintained (e.g., static), but the masks for the activation matrix may be dynamically recomputed for each forward phase (e.g., dynamic).), and Baskaran, Nurvitadhi, and Elango are combinable for the same rationale as set forth above with respect to claim 2.
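Elango's [0065]-[0066] distinction is between a static weight mask (fixed after training and reused) and a dynamic activation mask (recomputed for each forward phase from the newly arriving input). A minimal Python sketch of that split follows; the class, the top-k masking rule, and the dot-product "layer" are illustrative assumptions standing in for Elango's masked matmul, not the reference's implementation:

```python
def topk_mask(values, k):
    """Binary mask keeping the k largest-magnitude entries."""
    keep = sorted(range(len(values)), key=lambda i: -abs(values[i]))[:k]
    return [1 if i in keep else 0 for i in range(len(values))]

class SparsifiedLayer:
    """Static weight mask computed once; activation mask recomputed for
    each forward phase from the new input (cf. Elango [0066])."""
    def __init__(self, weights, k):
        self.k = k
        self.weights = weights
        self.weight_mask = topk_mask(weights, k)  # fixed after training

    def forward(self, activations):
        act_mask = topk_mask(activations, self.k)  # dynamic, per input
        w = [wi * m for wi, m in zip(self.weights, self.weight_mask)]
        a = [ai * m for ai, m in zip(activations, act_mask)]
        return sum(wi * ai for wi, ai in zip(w, a))  # dot product on sparse data

layer = SparsifiedLayer([0.5, -0.1, 0.8, 0.05], k=2)
out = layer.forward([2.0, 0.1, 1.5, 0.2])  # 0.5*2.0 + 0.8*1.5 = 2.2
```

The per-call `topk_mask` on the activations is the "added process" at issue in claim 27: a sparse structure is created from new data on each forward pass and processed in place of the dense input.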
Acar teaches processing the at least one sparse data structure instead of the new data structure ([0034] In one illustrative embodiment, for iterations of the matrix vector multiplication operation where the sparsity of the vector is less than a predetermined threshold, a first matrix storage format data structure is utilized to resolve the matrix vector multiplication operation.; [0107] The iteration of the matrix operation [processing the at least one sparse data structure instead of the new data structure] is then executed in a parallel manner using the selected compressed matrix representation data structure (step 470). A determination is made as to whether the iterations have converged (step 480) and, if not, the operation returns to step 440 with the input vector now being the output vector of the previous iteration. Otherwise, if the iterations have converged, then the output vector is generated as the aggregate of the output vectors of the partial matrix vector multiplication operations performed during the iterations (step 490).). Baskaran, Nurvitadhi, Elango, and Acar are combinable for the same rationale as set forth above with respect to claim 7. Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Baskaran, in view of Nurvitadhi, Acar, and further in view of Katz et al. (U.S. Pre-Grant Publication No. 20220101043, hereinafter 'Katz'). Regarding claim 11, Baskaran, as modified by Nurvitadhi and Acar, teaches The method of claim 10. Baskaran, as modified by Nurvitadhi and Acar, fails to teach wherein the test data comprises at least one of random data or pseudorandom data. Katz teaches wherein the test data comprises at least one of random data or pseudorandom data ([0458] A diagram illustrating several alternative test data input options is shown in FIG. 71. The example circuit, generally referenced 1480, comprises tensor data flow path 1489 that is protected by the intralayer safety mechanism. In this example the tensor data flow path comprises IA 1483, SC 1485, and APU 1487 but may contain different circuit elements depending on the implementation. As described supra, tensor test data is occasionally injected into the tensor data flow path for detecting circuit faults. The test data may be provided by one of several sources: (1) test data 1484 stored in L3 memory 1482; (2) test data 1486 stored in a register in the cluster or elsewhere; and (3) test data (and optionally weights) generated dynamically on the fly via a test data generator 1488. In one embodiment, the test data comprises a pseudorandom binary sequence [wherein the test data comprises at least one of pseudorandom data].). Baskaran, Nurvitadhi, Acar, and Katz are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Baskaran, Nurvitadhi, and Acar, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Katz to Baskaran before the effective filing date of the claimed invention in order to apply multiple strategies involving redundancy by design, redundancy through spatial mapping as well as self-tuning procedures that modify static (weights) and monitor dynamic (activations) behavior (cf. Katz, [0017] This disclosure describes a novel invention for several safety mechanisms for use in an artificial neural network (ANN) processor. The mechanisms described herein can be deployed individually or in combination to provide a desired level of safety in the processor and the neural network it is used to implement. The invention applies multiple strategies involving redundancy by design, redundancy through spatial mapping as well as self-tuning procedures that modify static (weights) and monitor dynamic (activations) behavior.). Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Baskaran, in view of Nurvitadhi, Acar, and further in view of Sen et al. (U.S.
Pre-Grant Publication No. 20230030287, hereinafter 'Sen'). Regarding claim 16, Baskaran, as modified by Nurvitadhi, teaches The system of claim 14. Baskaran, as modified by Nurvitadhi, fails to teach wherein the instructions, when executed by the at least one processor, cause the at least one processor to: determine at least one sparsity measure for the data stored by the at least one data structure; decomposing the at least one data structure into a plurality of sparse data structures in accordance with one or more structured sparse patterns selected based at least in part on the at least one sparsity measure; selecting at least one structured sparse pattern based at least in part on metrics calculated based at least in part on sparse data stored by the plurality of sparse data structures; and formulating the configuration based at least in part on the at least one structured sparse pattern. Acar teaches wherein the instructions, when executed by the at least one processor, cause the at least one processor to: determine at least one sparsity measure for the data stored by the at least one data structure; decomposing the at least one data structure into a plurality of sparse data structures in accordance with one or more structured sparse patterns selected based at least in part on the at least one sparsity measure ([0106] Based on results of the comparison, a corresponding compressed matrix representation data structure is selected for use with the current iteration (step 460). For example, if the sparsity of the input vector is equal to or greater than a sparsity threshold value, i.e. the vector is sufficiently sparse, then a first compressed matrix representation data structure (e.g., CSC) is selected for use during the present iteration. However, if the sparsity of the input vector [determine at least one sparsity measure for the data stored by the at least one data structure] is less than the sparsity threshold value, i.e. the input vector is dense, then a second compressed matrix representation data structure (e.g., CSR) is selected for use during the present iteration [selected based at least in part on the at least one sparsity measure]. Of course this may be extended to additional types of compressed matrix representations based on additional threshold values such that as the density continues to increase, other compressed matrix representations suitable for parallelized execution at higher density input vectors may be selected.; [0107] The iteration of the matrix operation [decomposing the at least one data structure into a plurality of sparse data structures in accordance with one or more structured sparse patterns] is then executed in a parallel manner using the selected compressed matrix representation data structure (step 470). A determination is made as to whether the iterations have converged (step 480) and, if not, the operation returns to step 440 with the input vector now being the output vector of the previous iteration. Otherwise, if the iterations have converged, then the output vector is generated as the aggregate of the output vectors of the partial matrix vector multiplication operations performed during the iterations (step 490).); selecting at least one structured sparse pattern based at least in part on metrics calculated based at least in part on sparse data stored by the plurality of sparse data structures ([0034] In one illustrative embodiment, for iterations of the matrix vector multiplication operation where the sparsity of the vector is less than a predetermined threshold [selecting at least one structured sparse pattern based at least in part on metrics calculated based at least in part on sparse data stored by the plurality of sparse data structures], a first matrix storage format data structure is utilized to resolve the matrix vector multiplication operation. For iterations of the matrix vector multiplication operation where the sparsity of the vector is equal to or greater than the predetermined threshold, a second matrix storage format data structure is utilized to resolve the matrix vector multiplication operation.); and Baskaran, Nurvitadhi, and Acar are combinable for the same rationale as set forth above with respect to claim 4. Sen teaches formulating the configuration based at least in part on the at least one structured sparse pattern ([0018] Accordingly, various implementations are provided for exploiting fine-grained structured weight sparsity in deep neural networks in a computing environment [formulating the configuration]. In some implementations, a micro-architectural, dataflow and data storage support system is provided for exploiting fine-grained structured weight sparsity on systolic array-based DNN accelerators executing both native convolutions and matrix multiplications. As used herein, fine-grained structured weight sparsity [based at least in part on the at least one structured sparse pattern] is where a data structure is divided into B-size blocks and there are NZ number of non-zero elements within each of the B-size blocks, where “B” is a positive integer, where “NZ” is the count/number of non-zero valued elements in each block. In one aspect, NZ is a positive integer.). Baskaran, Nurvitadhi, Acar, and Sen are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Baskaran, Nurvitadhi, and Acar, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Sen to Baskaran before the effective filing date of the claimed invention in order to save execution time and energy by exploiting sparsity by skipping redundant zero operand multiply-accumulate (“MAC”) operation for compute savings and by avoiding the storage and accesses of zero values for memory capacity and bandwidth savings (cf.
Sen, [0017] Moreover, DNNs exhibit sparsity in their data-structures. However, savings in execution time and energy savings may be achieved by exploiting this sparsity by skipping redundant zero operand multiply-accumulate (“MAC”) operation for compute savings and by avoiding the storage and accesses of zero values for memory capacity and bandwidth savings. Additionally, a special form of data sparsity such as, for example, “fine-grained structured sparsity” can be favorably exploited on different DNN accelerators. For example, specialized pruning techniques may be used to impose fine-grained structured sparsity on weights. However, other existing techniques are unable to exploit fine-grained structured sparsity in systolic arrays-based computing systems or in native convolutions). Claims 18, 26 are rejected under 35 U.S.C. 103 as being unpatentable over Baskaran, in view of Nurvitadhi, and further in view of Liu et al. (U.S. Pre-Grant Publication No. 20160328643, hereinafter 'Liu'). Regarding claim 18, Baskaran, as modified by Nurvitadhi, teaches The system of claim 14. Baskaran, as modified by Nurvitadhi, fails to teach wherein the workload comprises at least one neural network having one or more layers, and transforming the workload comprises selectively deactivating or removing one or more nodes of a particular layer of the one or more layers in accordance with the configuration. Liu teaches wherein the workload comprises at least one neural network having one or more layers, and transforming the workload comprises selectively deactivating or removing one or more nodes of a particular layer of the one or more layers in accordance with the configuration ([0046] In this embodiment of the present invention, the trained neural network [the workload comprises at least one neural network] is simplified by introducing sparsity to the coefficients of the filters. As used herein, the term “filter” refers to the weight matrix of a particular node of a layer of the deep neural network [having one or more layers]. Accordingly, each layer has a plurality of nodes and each node has a weight matrix or “filter” that is used to filter or combine the data from the nodes of the previous layer. Sparsity can be introduced to the coefficients of the filters in different ways. For example, in a possible implementation, a percentage of coefficients having the largest magnitudes can be retained in each filter, with the remaining coefficients set to zero. In another possible implementation L1-norm minimization can be enforced in the back-propagation algorithm, which will drive a number of the coefficients in each filter to zero. Since the connections between the inputs and neurons (nodes) are made sparse, we refer to this approach as “SparseConnect”.; [0060] According to an advantageous embodiment of the present invention, the SparseConnect and ShrinkConnect methods for approximating a trained deep neural network can be combined. The SparseConnect and ShrinkConnect methods exploit different types of redundancy within a trained deep neural network. The methods complement each other and may be combined to achieve an even greater speed up. For example, in a possible implementation, a trained deep neural network can first be approximated using the ShrinkConnect method to reduce the number of nodes in each layer of the trained deep neural network [transforming the workload comprises selectively deactivating or removing one or more nodes of a particular layer of the one or more layers], followed by using the SparseConnect method (using thresholding or re-weighted L1-norm minimization) to sparsify the weights in the filters connecting each layer in the approximation of the deep neural network resulting from applying the ShrinkConnect method [in accordance with the configuration].).
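Liu's "SparseConnect" variant quoted above is a simple rule: retain a percentage of the largest-magnitude coefficients in each filter and set the rest to zero (Liu [0046]). A minimal Python sketch of that thresholding step follows; the function name, the keep fraction, and the flat-list filter representation are illustrative assumptions, not Liu's implementation:

```python
def sparse_connect(filt, keep_fraction=0.25):
    """Retain the keep_fraction of coefficients with the largest
    magnitudes in a filter (weight matrix, flattened) and zero the
    rest (cf. Liu [0046])."""
    k = max(1, int(len(filt) * keep_fraction))
    # Magnitude of the k-th largest coefficient serves as the cutoff.
    cutoff = sorted((abs(v) for v in filt), reverse=True)[k - 1]
    kept, out = 0, []
    for v in filt:
        if abs(v) >= cutoff and kept < k:  # kept < k breaks ties at the cutoff
            out.append(v)
            kept += 1
        else:
            out.append(0.0)
    return out

print(sparse_connect([0.02, -0.9, 0.1, 0.4, -0.05, 0.3, 0.01, -0.6], 0.25))
# keeps only -0.9 and -0.6; all other coefficients become 0.0
```

Liu's alternative (L1-norm minimization during back-propagation) would drive coefficients toward zero during training rather than thresholding them afterward.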
Baskaran, Nurvitadhi, and Liu are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Baskaran and Nurvitadhi, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Liu to Baskaran before the effective filing date of the claimed invention in order to reduce the computational complexity of a trained deep neural network (cf. Liu, [0006] The present invention provides a method and system for approximating a deep neural network for anatomical object detection. Embodiments of the present invention provide various methods to reduce the computational complexity of a trained deep neural network. Embodiments of the present invention perform anatomical object detection in medical image data using an approximated deep neural network that is more computationally efficient than the deep neural network originally trained for the object detection task.). Regarding claim 26, Baskaran, as modified by Nurvitadhi, teaches The system of claim 14. Baskaran, as modified by Nurvitadhi, fails to teach wherein the workload comprises at least one neural network having one or more layers, and modifying the workload comprises selectively deactivating or removing one or more nodes of a particular layer of the one or more layers. Liu teaches wherein the workload comprises at least one neural network having one or more layers, and modifying the workload comprises selectively deactivating or removing one or more nodes of a particular layer of the one or more layers ([0046] In this embodiment of the present invention, the workload comprises at least one neural network trained neural network is simplified by introducing sparsity to the coefficients of the filters. As used herein, the term “filter” refers to the weight matrix of a particular node having one or more layers of a layer of the deep neural network.
Accordingly, each layer has a plurality of nodes and each node has a weight matrix or “filter” that is used to filter or combine the data from the nodes of the previous layer. Sparsity can be introduced to the coefficients of the filters in different ways. For example, in a possible implementation, a percentage of coefficients having the largest magnitudes can be retained in each filter, with the remaining coefficients set to zero. In another possible implementation L1-norm minimization can be enforced in the back-propagation algorithm, which will drive a number of the coefficients in each filter to zero. Since the connections between the inputs and neurons (nodes) are made sparse, we refer to this approach as “SparseConnect”.; [0060] According to an advantageous embodiment of the present invention, the SparseConnect and ShrinkConnect methods for approximating a trained deep neural network can be combined. The SparseConnect and ShrinkConnect methods exploit different types of redundancy within a trained deep neural network. The methods complement each other and may be combined to achieve an even greater speed up. For example, in a possible implementation, a trained deep neural network can be first be approximated using the ShrinkConnect method to modifying the workload comprises selectively deactivating or removing one or more nodes of a particular layer of the one or more layers reduce the number of nodes in each layer of the trained deep neural network, followed by using the SparseConnect method (using thresholding or re-weighted L1-norm minimization) to in accordance with the configuration sparsify the weights in the filters connecting each layer in the approximation of the deep neural network resulting from applying the ShrinkConnect method.). Baskaran, Nurvitadhi, and Liu are combinable for the same rationale as set forth above with respect to claim 18. Claims 22-23 are rejected under 35 U.S.C. 
103 as being unpatentable over Baskaran, in view of Nurvitadhi, and further in view of Sen. Regarding claim 22, Baskaran, as modified by Nurvitadhi, teaches The processor of claim 21. Baskaran, as modified by Nurvitadhi, fails to teach wherein the data is first data, the at least one sparse pattern comprises a structured sparse pattern, and the one or more circuits are to provide faster processing of second data including zero values in accordance with the structured sparse pattern than of third data not including zero values in accordance with the structured sparse pattern. Sen teaches wherein the data is first data, the at least one sparse pattern comprises a structured sparse pattern, and the one or more circuits are to provide faster processing of second data including zero values in accordance with the structured sparse pattern than of third data not including zero values in accordance with the structured sparse pattern ([0018] Accordingly, various implementations are provided for exploiting fine-grained structured weight sparsity in deep neural networks in a computing environment. In some implementations, a micro-architectural, dataflow and data storage support system is provided for exploiting fine-grained structured weight sparsity on systolic array-based DNN accelerators executing both native convolutions and matrix multiplications. As used herein, fine-grained structured weight sparsity is where a data structure is divided into B-size blocks and there are NZ number of non-zero elements within each of the B-size blocks, where “B” is a positive integer, where “NZ” is the count/number of non-zero valued elements in each block. In one aspect, NZ is a positive integer.; [0019] Also, in general, the systolic array may take two different data structures (e.g., inputs and weights) and generates an output. 
Thus, mechanisms of the various embodiments, provide an enhanced systolic array-based DNN accelerator where one of the wherein the data is first data input operands has the fine-grained the at least one sparse pattern comprises a structured sparse pattern structured weight sparsity. Also, various implementations provide for variability of a number of NZ numbers in any of the B-size blocks. For example, if there are 10 layers in the neural network, the first layer’s weights may have two non zeros in each block and the second layer’s weights may have three non zeros in each block, etc. As such, the enhanced systolic array-based DNN accelerators enable provide faster processing flexibility or the programmability for various executions of that hardware second data including zero values in accordance with the structured sparse pattern than of third data not including zero values in accordance with the structured sparse pattern while configuring the number of non-zeros.). Baskaran, Nurvitadhi, and Sen are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Baskaran and Nurvitadhi, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Sen to Baskaran before the effective filing date of the claimed invention in order to save execution time and energy by exploiting sparsity by skipping redundant zero operand multiply-accumulate (“MAC”) operation for compute savings and by avoiding the storage and accesses of zero values for memory capacity and bandwidth savings (cf. Sen, [0017] Moreover, DNNs exhibit sparsity in their data-structures. However, savings in execution time and energy savings may be achieved by exploiting this sparsity by skipping redundant zero operand multiply-accumulate (“MAC”) operation for compute savings and by avoiding the storage and accesses of zero values for memory capacity and bandwidth savings. 
Additionally, a special form of data sparsity such as, for example, “fine-grained structured sparsity” can be favorably exploited on different DNN accelerators. For example, specialized pruning techniques may be used to impose fine-grained structured sparsity on weights. However, other existing techniques are unable to exploit fine-grained structured sparsity in systolic arrays-based computing systems or in native convolutions.). Regarding claim 23, Baskaran, as modified by Nurvitadhi and Sen, teaches The processor of claim 22. Baskaran teaches wherein the one or more circuits are to: profile the workload to obtain at least one sparsity measure based at least in part on the data; and use the profile to determine the at least one sparse pattern ([0085] In various embodiments, when there are multiple iterations of a computation block, if the computation block is to be distributed across processors, and use the profile to determine the at least one sparse pattern if the non-zero structure or access pattern does not significantly change within the block, the first iteration of the block is executed with a dynamic task scheduling scheme. The "state" information about the processor workload (such as which portions of the computation block (or equivalently tasks) get executed on which processor), is logged or stored. In the subsequent iterations, profile the workload to obtain at least one sparsity measure the logged/stored information based at least in part on the data about the processor workload is used to schedule statically various tasks across processors, thereby achieving the benefit of load balanced execution without incurring significant scheduling overhead. This static-plus-dynamic (or hybrid) task scheduling approach can greatly reduce the task scheduling overhead and also facilitate an improved load balance across processors. 
As described below, the load balance can be improved further via task/operation migration prior to the subsequent iterations.; [0086] In the case of sparse tensor computations, as mentioned earlier, there is usually a repetition of mode-specific operations and the iteration with dynamic scheduling is carefully chosen before the start of a computation block where the logged information obtained during the dynamic scheduling can be reused.). Baskaran, Nurvitadhi, and Sen are combinable for the same rationale as set forth above with respect to claim 22. Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Baskaran, Nurvitadhi, Sen, and further in view of Acar and Elango. Regarding claim 24, Baskaran, as modified by Nurvitadhi and Sen, teaches The processor of claim 23. Baskaran, as modified by Nurvitadhi and Sen, fails to teach wherein using the profile to determine the at least one sparse pattern comprises: decomposing the at least one data structure into a plurality of sparse data structures in accordance with one or more structured sparse patterns selected based at least in part on the at least one sparsity measure; and selecting at least one structured sparse pattern based at least in part on metrics calculated based at least in part on the plurality of sparse data structures. Acar teaches wherein using the profile to determine the at least one sparse pattern comprises: decomposing the at least one data structure into a plurality of sparse data structures in accordance with one or more structured sparse patterns selected based at least in part on the at least one sparsity measure ([0106] Based on results of the comparison, a corresponding compressed matrix representation data structure is selected for use with the current iteration (step 460). For example, if the sparsity of the input vector is equal to or greater than a sparsity threshold value, i.e. 
the vector is sufficiently sparse, then a first compressed matrix representation data structure (e.g., CSC) is selected for use during the present iteration. However, if the selected based at least in part on the at least one sparsity measure sparsity of the input vector is less than the sparsity threshold value, i.e. the input vector is dense, then a second compressed matrix representation data structure (e.g., CSR) is selected for use during the present iteration. Of course this may be extended to additional types of compressed matrix representations based on additional threshold values such that as the density continues to increase, other compressed matrix representations suitable for parallelized execution at higher density input vectors may be selected.; [0107] The decomposing the at least one data structure into a plurality of sparse data structures in accordance with one or more structured sparse patterns iteration of the matrix operation is then executed in a parallel manner using the selected compressed matrix representation data structure (step 470). A determination is made as to whether the iterations have converged (step 480) and, if not, the operation returns to step 440 with the input vector now being the output vector of the previous iteration. Otherwise, if the iterations have converged, then the output vector is generated as the aggregate of the output vectors of the partial matrix vector multiplication operations performed during the iterations (step 490).); and Baskaran, Nurvitadhi, Sen, and Acar are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Baskaran, Nurvitadhi, and Sen, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Acar to Baskaran before the effective filing date of the claimed invention in order to improve the efficiency and speed by which operations are performed on such large scale matrices (cf. 
Acar, [0033] The illustrative embodiments provide mechanisms for improving the efficiency and speed by which such operations are performed on such large scale matrices. The illustrative embodiments leverage the efficiency of different matrix storage formats for ordering non-zero entries in a large data set represented by a large sparse matrix. The particular storage format used for performing iterations of the matrix vector multiplication operation is selected dynamically based on the sparsity of the matrix and/or vector involved in the matrix vector multiplication operation. The leveraging of these different storage formats facilitates parallel execution of partial matrix vector multiplication operations by parallel threads, execution engines, processors, or the like.). Elango teaches selecting at least one structured sparse pattern based at least in part on metrics calculated based at least in part on the plurality of sparse data structures ([0058] At 870, method 800 includes matrix multiplying the first block and second block. By applying fine-grained sparsity to the first block (e.g. weights) and applying coarse-grained sparsity to the second block (e.g., activations), the first and second blocks will have completely different sparsity patterns. While each corresponding pairs of calculated based at least in part on the plurality of sparse data structures sub-blocks may have different levels of sparsity, the selecting at least one structured sparse pattern based at least in part on metrics differing patterns generate a combined sparsity in the matmul product that is deterministically uniform throughout the product (e.g., the same or within a threshold similarity for each block) without adding any computational cost, thus leading to increased model accuracies at the same cost.; [0069] It has been shown that the loss in accuracy due to sparsity can be reduced by minimizing the one-norm of the pruned values. 
One approach to achieve this for structured sparsity includes computing a permutation matrix that minimizes the pruned one-norm for each respective weight matrix using a greedy reordering technique. The weight matrices may then be permuted using these permutation matrices. Structured sparsity may then be applied on top of these permuted weight matrices. This process can be adapted to both fine-grained and coarse-grained balanced sparsity patterns to further increase the pruned accuracy. Matrix elements may thus be shuffled around so that they are randomly distributed.). Baskaran, Nurvitadhi, Sen, Acar, and Elango are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Baskaran, Nurvitadhi, Sen, and Acar, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Elango to Baskaran before the effective filing date of the claimed invention in order to reduce variability and achieve more uniform sparsity (cf. Elango, [0021] To reduce variability and achieve more uniform sparsity, systems and methods are presented herein where a first block as pruned using fine grained balanced sparsity and the second block is pruned using coarse-grained balanced sparsity. In this way, the resulting combined sparsity is uniformly achieved without any additional computational burden. For coarse-grained sparsity, the applied sparsity percentage is applied at the level of sub-blocks, rather than at the level of individual elements. By combining these together, the patterns of the two blocks are complementary in such a way that a desired percentage of elements are maintained from each block, without the risk of oversparsifying.). Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Baskaran, Nurvitadhi, Sen, and further in view of Acar. Regarding claim 25, Baskaran, as modified by Nurvitadhi and Sen, teaches The processor of claim 23. 
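The fine-grained balanced sparsity that Sen and Elango describe above (a weight structure divided into B-size blocks, each retaining NZ non-zero elements) reduces to the sketch below. The names, the 1-D layout, and the `block=4, keep=2` default (the common "2:4" special case) are assumptions for illustration, not either reference's implementation.

```python
import numpy as np

def prune_balanced(weights, block=4, keep=2):
    """Keep the `keep` largest-magnitude elements in each `block`-size
    block of a 1-D weight vector, zeroing the rest (N:M-style sparsity)."""
    assert weights.size % block == 0
    blocks = weights.reshape(-1, block)
    # Indices of the smallest (block - keep) magnitudes in each block.
    drop = np.argsort(np.abs(blocks), axis=1)[:, : block - keep]
    pruned = blocks.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(weights.shape)
```

Because every block ends up with exactly the same non-zero count, the hardware scheduling described by Sen stays uniform; Elango's permutation step would reorder the weights before this pruning to reduce the magnitude lost per block.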
Nurvitadhi teaches wherein the workload comprises a trained machine learning model ([0176] FIG. 9 illustrates a highly-parallel general-purpose graphics processing unit 900, according to an embodiment. In one embodiment, the general-purpose processing unit (GPGPU) 900 can be configured to be particularly efficient in processing the type of computational workload comprises a trained machine learning model workloads associated with training deep neural networks.), and Baskaran, as modified by Nurvitadhi and Sen, fails to teach profiling the trained machine learning model comprises collecting metrics while the trained machine learning model performs inferencing using test data. Acar teaches profiling the trained machine learning model comprises collecting metrics while the trained machine learning model performs inferencing using test data ([0023] These profiling the trained machine learning model networks or graphs may also be represented as large scale matrices in which indices (column and row indices) represent the nodes, and weights (or strengths) of the edges are represented by values in the matrix.; [0103] FIG. 4 is a flowchart outlining an example operation for dynamically modifying the compressed matrix representation utilized for iterations of a matrix operation based on a determination of the sparsity/density of an input vector collecting metrics using a hybrid matrix representation mechanism in accordance with one illustrative embodiment.; [0099] The parallel partial matrix vector multiplication operations 350 may be repeated until the iterations of the process converge (step 360). Iterations typically converge (step 360) based on monitoring the change in the output vector. If the output vector change becomes very small in relative terms and in magnitude, the iterations are deemed to be converged, and the system generates the output vector (step 370). 
Based on a benchmark set that typically represents the test cases, the iteration convergence can be also be set as a fixed number of iterations. For example, one could set the number of iterations to 5 based on the benchmark test, where the final output vector is generated upon execution of the fifth iteration.; [0116] The scores obtained from the various reasoning algorithms indicate the extent to which the potential response is inferred by the input question based on the specific area of focus of that reasoning algorithm. Each resulting score is then weighted against a statistical model. The statistical model captures while the trained machine learning model performs inferencing using test data how well the reasoning algorithm performed at establishing the inference between two similar passages for a particular domain during the training period of the QA system.). Baskaran, Nurvitadhi, Sen, and Acar are combinable for the same rationale as set forth above with respect to claim 24. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Park et al. (U.S. Pre-Grant Publication No. 20210303359) teaches a method for performing parallel data processing, including: receiving data for parallel processing from a data processing requestor; generating a plurality of data sub-blocks; determining a plurality of data portions in each data sub-block of the plurality of data sub-blocks; changing an order of the plurality of data portions in at least one data sub-block of the plurality of data sub-blocks; providing the plurality of data sub-blocks, including the at least one data sub-block comprising the changed order of the plurality of data portions, to a plurality of processing units for parallel processing; and receiving processed data associated with the plurality of data sub-blocks from the plurality of processing units. 
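Park's parallel-processing flow listed in the Conclusion (split received data into sub-blocks, determine portions within each sub-block, change the order of the portions in at least one sub-block, then dispatch the sub-blocks to parallel processing units) can be sketched as below. This helper is hypothetical: reversing the portion order stands in for Park's unspecified reordering, and a thread pool stands in for the plurality of processing units.

```python
from concurrent.futures import ThreadPoolExecutor

def process_in_subblocks(data, n_subblocks, n_portions, worker):
    """Split `data` into sub-blocks, reverse the portion order inside each
    sub-block, and hand the reordered sub-blocks to parallel workers."""
    step = len(data) // n_subblocks
    subblocks = [data[i * step:(i + 1) * step] for i in range(n_subblocks)]
    reordered = []
    for sb in subblocks:
        p = len(sb) // n_portions
        portions = [sb[j * p:(j + 1) * p] for j in range(n_portions)]
        # One possible "changed order": reverse the portions, then flatten.
        reordered.append([x for portion in reversed(portions) for x in portion])
    with ThreadPoolExecutor() as pool:
        # Collect the processed data from each processing unit, in order.
        return list(pool.map(worker, reordered))
```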
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAGGIE MAIDO whose telephone number is (703) 756-1953. The examiner can normally be reached M-Th: 6am - 4pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /MM/Examiner, Art Unit 2129 /MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129

Prosecution Timeline

Jul 17, 2023
Application Filed
Feb 25, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602603
MULTI-AGENT INFERENCE
2y 5m to grant Granted Apr 14, 2026
Patent 12596933
CONTEXT-AWARE ENTITY LINKING FOR KNOWLEDGE GRAPHS TO SUPPORT DECISION MAKING
2y 5m to grant Granted Apr 07, 2026
Patent 12579463
GENERATIVE REASONING FOR SYMBOLIC DISCOVERY
2y 5m to grant Granted Mar 17, 2026
Patent 12579452
EVALUATION SCORE DETERMINATION MACHINE LEARNING MODELS WITH DIFFERENTIAL PERIODIC TIERS
2y 5m to grant Granted Mar 17, 2026
Patent 12566941
EXTENSION OF EXISTING NEURAL NETWORKS WITHOUT AFFECTING EXISTING OUTPUTS
2y 5m to grant Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
64%
Grant Probability
85%
With Interview (+20.7%)
4y 3m
Median Time to Grant
Low
PTA Risk
Based on 36 resolved cases by this examiner. Grant probability derived from career allow rate.
