Prosecution Insights
Last updated: April 19, 2026
Application No. 17/576,730

OPERATIONS ON MATRIX OPERANDS IRRESPECTIVE OF WHERE OPERANDS ARE STORED IN MEMORY

Final Rejection: §101, §103, §112
Filed: Jan 14, 2022
Examiner: GUDAS, JAKOB OSCAR
Art Unit: 2151
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 2 (Final)
Grant Probability: 44% (Moderate)
OA Rounds: 3-4
To Grant: 4y 2m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 44% (grants 44% of resolved cases; 4 granted / 9 resolved; -10.6% vs TC avg)
Interview Lift: +71.1% (strong lift for resolved cases with an interview vs without)
Avg Prosecution: 4y 2m typical timeline; 28 currently pending
Career History: 37 total applications across all art units

Statute-Specific Performance

§101: 33.2% (-6.8% vs TC avg)
§103: 37.0% (-3.0% vs TC avg)
§102: 8.0% (-32.0% vs TC avg)
§112: 19.9% (-20.1% vs TC avg)

Deltas are vs the Tech Center average estimate • Based on career data from 9 resolved cases

Office Action

§101 §103 §112
Detailed Action

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This office action is final and is in response to claims filed on 11/20/2025 via amendment. Claims 1-31 are pending for examination. Claims 1-5 and 7-30 are currently amended. Claims 6 and 31 are as originally filed.

Response to Arguments

Rejections Under 35 U.S.C. 102: Applicant’s arguments with respect to claims 1, 2, 4-10, 12, 16, 17, 20, 22, 24-27, 29, and 30 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Rejections Under 35 U.S.C. 103: Applicant’s arguments with respect to claims 13-14, 19, 23, and 31 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Rejections Under 35 U.S.C. 101: Applicant has amended claims 16-24 by adding “non-transitory” in front of “machine-readable medium,” and the previous rejections on that basis have therefore been withdrawn. Applicant’s arguments regarding the 35 U.S.C. 101 rejections have been fully considered. Applicant argues: “When multiple threads performing deep learning operations operate on shared matrix data, redundant load and store operations can occur to access this data in memory, causing reduced performance. The invention provides an improvement to computer technology to address this technical problem.” Examiner respectfully disagrees. The compiler, processor, circuits, etc. merely generally link the use of the judicial exception to a particular field of use. See MPEP 2106.05(h).
Applicant further argues that “this causes deep learning mathematical operations, such as matrix multiplication and/or convolution, to have a higher data throughput through reduction of redundant load and/or store operations,” and that the invention is an “improvement in computer technology of reducing a number of load and/or store operations required during launch of a kernel to perform one or more deep learning mathematical operations.” Examiner respectfully disagrees. These purported improvements are not recited in the claim language as written. Further, it is important to note that the judicial exception alone cannot provide the improvement. Per MPEP 2106.05(a): “The improvement can be provided by one or more additional elements. See the discussion of Diamond v. Diehr, 450 U.S. 175, 187 and 191-92, 209 USPQ 1, 10 (1981) in subsection II, below. In addition, the improvement can be provided by the additional element(s) in combination with the recited judicial exception... However, it is important to keep in mind that an improvement in the abstract idea itself (e.g. a recited fundamental economic concept) is not an improvement in technology...”

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 3-5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claim 3 recites “wherein the one or more operations comprise one or more operations.” This language is confusing as to which operations are which; adding qualifiers such as “first,” “second,” “third,” etc. to the recited “one or more operations” would make the claim clearer. Claims 4 and 5 are rejected for the same reasons as claim 3.

The following is a quotation of 35 U.S.C. 112(d):

(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:

Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claims 2, 10, 17, and 26 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which they depend, or for failing to include all the limitations of the claim upon which they depend. The claims recite some variation of “wherein the one or more operations are to cause the one or more matrix operands to have a tiled layout representation in memory.”
However, claims 1, 9, 16, and 25, from which these claims depend, recite a variation of “the one or more inserted operations to perform a transformation of the data into a tiled layout in memory.” Applicant may cancel the claims, amend them to place them in proper dependent form, rewrite them in independent form, or present a sufficient showing that the dependent claims comply with the statutory requirements.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-31 are rejected under 35 U.S.C. 101 because the claimed invention is directed to abstract ideas without significantly more. With regards to claim 1, at Step 1, the claim is directed to a machine, which is a statutory category of invention. At Step 2A Prong 1, the examiner notes that the claim is directed to mental processes and/or mathematical concepts.
The claim language has been reproduced below:

A processor comprising: (mental process, evaluation) one or more circuits to cause a compiler to insert one or more operations (mental process, evaluation) prior to one or more mathematical operations (mental process, evaluation) to be performed on one or more matrix operands to rearrange where data within the one or more matrix operands are (mathematical calculation) stored in memory, the one or more inserted operations to (mental process, evaluation) perform a transformation of the data into a tiled layout (mathematical calculation) in memory with respective tiles of the tiled layout including data elements for two or more dimensions of the one or more matrix operands (mental process, evaluation; mathematical relationship) stored in contiguous memory locations irrespective of where the data elements were stored in the memory prior to the transformation (mental process, evaluation).

Each of the non-bolded limitations is a mental process and/or mathematical calculation. The “comprising” limitation is an evaluation mental process that can be performed by choosing what the processor comprises. The “to cause a compiler to insert one or more operations” limitation is an evaluation mental process that can be performed by choosing what the circuits do. The “prior to one or more mathematical operations” limitation is an evaluation mental process that can be performed by choosing when the operations are performed. The “mathematical operations to be performed” limitation is a mathematical calculation that can be performed by hand using pen and paper. The “the one or more inserted operations to” limitation is an evaluation mental process that can be performed by choosing what the operations do. The “perform a transformation of the data into a tiled layout” limitation is a mathematical calculation that can be performed by transforming the data by hand using pen and paper. The “irrespective of where the data elements were” limitation is an evaluation mental process that can be performed by choosing where to store the data without regard to where the data was stored previously.

At Step 2A Prong 2, the additional elements are bolded above. The “stored in memory,” “in memory,” “stored in contiguous memory locations,” and “stored in the memory” limitations, as claimed under the broadest reasonable interpretation (BRI), are additional elements that amount to insignificant extra-solution activity; in the context of the claim, each encompasses mere data gathering. The remaining additional elements amount to no more than components comprising mere instructions to apply the exception and do not integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Under Step 2B, the claim recites “stored in memory,” “in memory,” “stored in contiguous memory locations,” and “stored in the memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional when claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.
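For orientation, the tiled-layout transformation that claim 1 recites (tiles spanning two dimensions, with each tile's elements landing in contiguous memory regardless of the prior layout) can be sketched in a few lines of NumPy. The helper name and the 2x2 tile size are illustrative only, not drawn from the application:

```python
import numpy as np

def to_tiled_layout(a: np.ndarray, th: int, tw: int) -> np.ndarray:
    """Rearrange a row-major (H, W) matrix so each th x tw tile
    occupies contiguous memory, irrespective of the original layout."""
    h, w = a.shape
    assert h % th == 0 and w % tw == 0, "pad first if not tile-aligned"
    # (H, W) -> (H/th, th, W/tw, tw) -> (H/th, W/tw, th, tw)
    tiles = a.reshape(h // th, th, w // tw, tw).swapaxes(1, 2)
    # Copying flattens the view: tile (0,0) first, then (0,1), and so on.
    return np.ascontiguousarray(tiles)

a = np.arange(16).reshape(4, 4)  # row-major 4x4: rows are contiguous
t = to_tiled_layout(a, 2, 2)
# In the tiled buffer the first 2x2 tile's elements (0, 1, 4, 5) are
# now adjacent, even though 1 and 4 were a full row apart before.
assert t.ravel()[:4].tolist() == [0, 1, 4, 5]
```

In the original row-major layout, elements 4 and 5 sat a full row away from 0 and 1; after the transform they share one contiguous run, which is the property the claim describes.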
With regards to claim 9, it recites similar language to claim 1 and is rejected for at least the same reasons. Claim 9 is directed to the statutory category of a machine, thus also satisfying Step 1. Under Step 2A Prong 2, the additional elements are “a system” and “one or more processors.” These are no more than high-level generic computer components that amount to mere instructions to apply the exception and do not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Under Step 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 16, it recites similar language to claim 1 and is rejected for at least the same reasons. Claim 16 is directed to the statutory category of an article of manufacture, thus also satisfying Step 1. Under Step 2A Prong 2, the additional elements are “a machine-readable medium” and “one or more processors.” These are no more than high-level generic computer components that amount to mere instructions to apply the exception and do not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Under Step 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 25, it recites similar language to claim 1 and is rejected for at least the same reasons. Claim 25 is directed to the statutory category of a method, thus also satisfying Step 1. Under Steps 2A Prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claims 2, 10, and 17, they are directed to mental processes and/or mathematical concepts. The “one or more operations are to cause the one or more matrix operands to have a tiled layout” limitation is an evaluation mental process and mathematical calculation that can be performed by choosing the matrix operands to have a tiled layout. Under Step 2A Prong 2, the “in memory” limitation, as claimed under BRI, is an additional element that is insignificant extra-solution activity; in the context of the claim, it encompasses mere data gathering. The claims do not recite any other additional elements that integrate the abstract idea into a practical application, nor do they amount to significantly more than the judicial exception. Under Step 2B, the claims recite “in memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional when claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.

With regards to claim 3, it is directed to mental processes and/or mathematical concepts. The “comprise one or more operations” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations comprise.
The “insert one or more data values into the one or more matrix operands” limitation is an evaluation mental process and mathematical calculation that can be performed by inserting the data values into the matrix operands by hand using pen and paper. Under Steps 2A Prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 4, it is directed to mental processes and/or mathematical concepts. The “comprise one or more operations” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations comprise. The “group one or more data elements of the one or more matrix operands” limitation is an evaluation mental process and mathematical calculation that can be performed by grouping the data elements by hand using pen and paper. Under Steps 2A Prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 5, it is directed to mental processes and/or mathematical concepts. The “comprise one or more operations” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations comprise. The “reorder one or more data elements of the one or more matrix operands” limitation is an evaluation mental process and mathematical calculation that can be performed by ordering the matrix operands in row-major order by hand using pen and paper. Under Step 2A Prong 2, the “in memory” limitation, as claimed under BRI, is an additional element that is insignificant extra-solution activity; in the context of the claim, it encompasses mere data gathering. The claim does not recite any other additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception. Under Step 2B, the claim recites “in memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional when claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.

With regards to claim 6, it is directed to mental processes and/or mathematical concepts. The “wherein the one or more matrix operands are tensors” limitation is an evaluation mental process that can be performed by choosing what the matrix operands are and what they are used for. Under Steps 2A Prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 7, it is directed to mental processes and/or mathematical concepts. The “comprise a first set of operations to be performed” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations should be performed on and when they should be performed. The “wherein the one or more circuits are to cause the compiler to insert a second set of one or more operations subsequent to the one or more mathematical operations” limitation is an evaluation mental process that can be performed by choosing what the circuits cause the compiler to do.
The “to be performed on one or more outputs of” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations should be performed on and when they should be performed. Under Step 2A Prong 2, the remaining additional elements (the circuits, the compiler, etc.) are no more than high-level generic computer components that amount to mere instructions to apply the exception and do not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Under Step 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 8, it is directed to mental processes and/or mathematical concepts. The “operations are to cause” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations are to do. The “the one or more matrix operands to be stored in a second layout” limitation is an evaluation mental process that can be performed by choosing the layouts of the matrix operands. Under Step 2A Prong 2, the “in memory” limitation, as claimed under BRI, is an additional element that is insignificant extra-solution activity; in the context of the claim, it encompasses mere data gathering. The claim does not recite any other additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception. Under Step 2B, the claim recites “in memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional when claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.

With regards to claim 11, it is directed to mental processes and/or mathematical concepts. The “are primitive operations comprising at least an operation to” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations comprise. The “insert one or more groups of data into the one or more matrix operands to alter a shape” limitation is an evaluation mental process and mathematical calculation that can be performed by inserting the data by hand using pen and paper. Under Steps 2A Prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 12, it is directed to mental processes and/or mathematical concepts. The “are primitive operations comprising at least an operation to” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations comprise. The “group one or more data elements of the one or more matrix operands” limitation is a mathematical calculation that can be performed by grouping the data elements by hand using pen and paper. Under Steps 2A Prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 13, it is directed to mental processes and/or mathematical concepts. The “are primitive operations comprising at least an operation to” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations comprise.
The “permute one or more data elements of the one or more matrix operands such that the one or more data elements are consecutively” limitation is an evaluation mental process that can be performed by choosing how the matrix elements are stored. Under Step 2A Prong 2, the “stored in memory” limitation, as claimed under BRI, is an additional element that is insignificant extra-solution activity; in the context of the claim, it encompasses mere data gathering. The claim does not recite any other additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception. Under Step 2B, the claim recites “stored in memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional when claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.

With regards to claim 14, it is directed to mental processes and/or mathematical concepts. The “wherein the one or more matrix operands are tensors comprising” limitation is an evaluation mental process that can be performed by choosing what the matrix operands are. The “the one or more mathematical operations are to be performed on the one” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations are performed on and what they are based on. Under Steps 2A Prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.
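The permutation primitive recited in claim 13, discussed above (rearranging elements so that selected ones become consecutively stored), shows up concretely in an array's strides. A small illustrative sketch; the column choice and the use of Fortran order as the "permuted" layout are examples only:

```python
import numpy as np

# In a row-major (C-order) matrix, one column's elements are strided:
# consecutive column entries sit a full row apart in memory.
a = np.arange(12).reshape(3, 4)
assert a[:, 1].strides[0] == 4 * a.itemsize  # one full 4-element row apart

# Permuting the storage to column-major order makes each column's
# elements consecutive in the underlying buffer.
perm = np.asfortranarray(a)
assert perm[:, 1].strides[0] == perm.itemsize  # now adjacent

# Only the placement in memory changed, not the values.
assert np.array_equal(a, perm)
```

The same stride argument is why a tiled layout can make a whole 2-D tile, rather than just a row or a column, contiguous.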
With regards to claim 15, it is directed to mental processes and/or mathematical concepts. The “wherein the one or more processors are to cause the compiler to insert” limitation is an evaluation mental process that can be performed by choosing what the processors and compiler are to do. The “where the first group is to cause the one or more matrix operands to have a tiled layout representation” limitation is a mathematical calculation that can be performed by tiling the matrix operands by hand using pen and paper. The “the second group is to remove the tiled layout representation from the one or more matrix operands” limitation is a mathematical calculation that can be performed by removing the tiling of the matrix operands by hand using pen and paper. Under Step 2A Prong 2, the “in memory” limitation, as claimed under BRI, is an additional element that is insignificant extra-solution activity; in the context of the claim, it encompasses mere data gathering. The remaining additional elements (the processors, the compiler, etc.) are no more than high-level generic computer components that amount to mere instructions to apply the exception and do not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Under Step 2B, the claim recites “in memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional when claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.

With regards to claim 18, it is directed to mental processes and/or mathematical concepts.
The “cause the one or more processors to insert a first set of the one or more operations prior to one” limitation is an evaluation mental process that can be performed by choosing when the mathematical operations are inserted. The “insert a second set of the one or more operations after” limitation is an evaluation mental process that can be performed by choosing when the mathematical operations are inserted. The “the first set causing the one or more matrix operands to have a first layout” limitation is an evaluation mental process that can be performed by choosing the layouts of the matrix operands. The “the second set causing output from the one or more deep learning operations to have a second layout” limitation is an evaluation mental process that can be performed by choosing the layouts of the matrix operands. Under Step 2A Prong 2, the “in memory” limitation, as claimed under BRI, is an additional element that is insignificant extra-solution activity; in the context of the claim, it encompasses mere data gathering. The remaining additional elements (the processors, etc.) are no more than high-level generic computer components that amount to mere instructions to apply the exception and do not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Under Step 2B, the claim recites “in memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional when claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.
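Claim 18's first-layout/second-layout structure mirrors a familiar compiler pattern: pack operands into a device-friendly layout before the operation and unpack the output afterward. A minimal NumPy sketch of that round trip, assuming a 2x2 tile size and an elementwise stand-in for the deep learning operation (function names are illustrative, not from the claims):

```python
import numpy as np

def tile(a: np.ndarray, th: int, tw: int) -> np.ndarray:
    """First set of inserted operations: put the operand into a
    tiled (first) layout before the operation runs."""
    h, w = a.shape
    return np.ascontiguousarray(
        a.reshape(h // th, th, w // tw, tw).swapaxes(1, 2))

def untile(t: np.ndarray) -> np.ndarray:
    """Second set of inserted operations: restore the output to the
    original row-major (second) layout afterward."""
    bh, bw, th, tw = t.shape
    return np.ascontiguousarray(
        t.swapaxes(1, 2).reshape(bh * th, bw * tw))

a = np.arange(16, dtype=float).reshape(4, 4)
# Elementwise op standing in for a deep learning operation on tiled data.
out = untile(np.sqrt(tile(a, 2, 2)))
assert np.array_equal(out, np.sqrt(a))  # round trip preserves the result
```

The point of the pairing is that code downstream of the second set never observes the tiled layout; only the operation in the middle does.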
With regards to claims 19 and 31, they are directed to mental processes and/or mathematical concepts. The “wherein the one or more matrix operands are tensors comprising” limitation is an evaluation mental process that can be performed by choosing what the matrix operands are. Under Steps 2A Prong 2 and 2B, the claims do not recite any additional elements that integrate the abstract idea into a practical application, nor do they amount to significantly more than the judicial exception.

With regards to claim 20, it is directed to mental processes and/or mathematical concepts. The “wherein the one or more operations comprise at least one operation to” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations comprise. The “add padding data to the one or more matrix operands” limitation is a mathematical calculation that can be performed by padding the matrix operands by hand using pen and paper. Under Steps 2A Prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 21, it is directed to mental processes and/or mathematical concepts. The “wherein the one or more mathematical operations comprise at least one operation to” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations comprise. The “remove padding data to the one or more matrix operands” limitation is a mathematical calculation that can be performed by removing the padding of the matrix operands by hand using pen and paper. Under Steps 2A Prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 22, it is directed to mental processes and/or mathematical concepts. The “wherein the one or more mathematical operations comprise at least one operation to” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations comprise. The “to combine one or more data elements of the one or more matrix operands into a group” limitation is a mathematical calculation that can be performed by grouping the matrix elements by hand using pen and paper. Under Steps 2A Prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

With regards to claim 23, it is directed to mental processes and/or mathematical concepts. The “wherein the one or more mathematical operations comprise at least one operation to” limitation is an evaluation mental process that can be performed by choosing what the mathematical operations comprise. The “to permute one or more data elements of the one or more matrix operands such that each data element of the one or more data elements is consecutively” limitation is an evaluation mental process that can be performed by choosing what the order of the data elements is. Under Step 2A Prong 2, the “stored in memory” limitation, as claimed under BRI, is an additional element that is insignificant extra-solution activity; in the context of the claim, it encompasses mere data gathering. The claim does not recite any other additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception. Under Step 2B, the claim recites “stored in memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional when claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv.
Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93. With regards to claim 24, it is directed to mental processes and/or mathematical concepts. The “cause the compiler of a parallel processing library to insert the one or more mathematical operations” limitation is an evaluation mental process that can be performed by choosing what the compiler does. The “using, as input, the one or more matrix operands” limitation is an evaluation mental process that can be performed by choosing what the inputs are. The “where the one or more mathematical operations are to transform how the” limitation is an evaluation mental process that can be performed by choosing how the matrix operands are stored in memory. Under Step 2A Prong Two, the additional element is the “stored in memory” limitation, which, as claimed under the broadest reasonable interpretation (BRI), is insignificant extra-solution activity. The ‘stored’ in the context of the claim encompasses mere data gathering. The remaining additional elements (the processors, the compiler, etc.) are no more than high-level generic computer components that amount to no more than components comprising mere instructions to apply the exception and do not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Under Step 2B, the claim recites “stored in memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.
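For context on the padding limitations addressed in claims 20 and 21 (and revisited under §103 below), the add-padding and remove-padding operations can be illustrated with a short sketch; the function names and the zero-fill scheme are editorial assumptions, not the claimed implementation.

```python
# Illustrative sketch (not the claimed implementation): adding padding
# data to a matrix operand, e.g. to round it up to a tile-friendly
# size, and later removing that padding to recover the original.

def add_padding(matrix, rows, cols, fill=0):
    """Pad `matrix` with `fill` values so it becomes rows x cols."""
    padded = [row + [fill] * (cols - len(row)) for row in matrix]
    while len(padded) < rows:
        padded.append([fill] * cols)
    return padded

def remove_padding(matrix, rows, cols):
    """Undo add_padding: keep only the original rows x cols region."""
    return [row[:cols] for row in matrix[:rows]]

m = [[1, 2, 3],
     [4, 5, 6]]
p = add_padding(m, 4, 4)            # now 4x4, zero-padded
assert remove_padding(p, 2, 3) == m  # padding removal recovers m
```

The round trip (pad, operate, unpad) is the pattern the claims pair: one operation adds data to the operands, a later one strips it back out.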
With regards to claim 26, it is directed to mental processes and/or mathematical concepts. The “causing the one or more matrix operands to be stored in memory using a tiled layout representation” limitation is an evaluation mental process that can be performed by choosing how the matrix operands are stored in memory. The “for use by one or more deep learning operations” limitation is an evaluation mental process that can be performed by choosing what the operands are used for. The “where the one or more matrix operands are to be stored in memory using a tiled layout representation” limitation is an evaluation mental process that can be performed by choosing how the matrix operands are stored in memory. The “as a result of performing the one or more operations” limitation is an evaluation mental process that can be performed by choosing how the operands are used. Under Step 2A Prong Two, the additional element is the “stored in memory” limitation, which, as claimed under the broadest reasonable interpretation (BRI), is insignificant extra-solution activity. The ‘stored’ in the context of the claim encompasses mere data gathering. The claim does not recite any other additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception. Under Step 2B, the claim recites “stored in memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93. With regards to claim 27, it is directed to mental processes and/or mathematical concepts.
The “causing a compiler to insert a first set of the one or more operations into a software program” limitation is an evaluation mental process that can be performed by choosing what the processors and compiler are to do. The “insert a second set of one or more operations into the software program after” limitation is an evaluation mental process that can be performed by choosing what the processors and compiler are to do. The “where the first set is to apply one or more transformations” limitation is a mathematical calculation that can be performed by applying the transformations to the matrix operands by hand using pen and paper. The “the second set is to remove the one or more transformations from the one or more matrix operands” limitation is a mathematical calculation that can be performed by removing the transformations from the matrix operands by hand using pen and paper. Under Step 2A Prong Two and Step 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception. With regards to claim 28, it is directed to mental processes and/or mathematical concepts. The “one or more sets of data to be added to the one or more matrix operands as a result of the one or more operations” limitation is a mathematical calculation that can be performed by adding the sets of data to the operands by hand using pen and paper. Under Step 2A Prong Two and Step 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception. With regards to claim 29, it is directed to mental processes and/or mathematical concepts. The “one or more data elements of the one or more matrix operands to be grouped into tiles” limitation is a mathematical calculation that can be performed by grouping the data elements into tiles by hand using pen and paper.
Under Step 2A Prong Two and Step 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception. With regards to claim 30, it is directed to mental processes and/or mathematical concepts. The “one or more data elements of the one or more matrix operands to be permuted” limitation is a mathematical calculation that can be performed by permuting the operands by hand using pen and paper. Under Step 2A Prong Two, the additional element is the “in memory” limitation, which, as claimed under the broadest reasonable interpretation (BRI), is insignificant extra-solution activity. The ‘in memory’ in the context of the claim encompasses mere data gathering. The claim does not recite any other additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception. Under Step 2B, the claim recites “in memory,” and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1, 2, 4-10, 12-13, 16, 17, 20, 22-27, 29, and 30 are rejected as being unpatentable over Lau et al. (US 20190392297 A1) hereinafter Lau in view of Chen et al. (machine translation of CN 112069460 A) hereinafter Chen. With regards to claim 1, Lau teaches A processor comprising: one or more circuits to cause a compiler to insert one or more operations prior to (Lau [0044]: These instructions may be sent from a general purpose host processor to the DLH device. The instructions, as sent down from the host processor, may also operate on tensors. These instructions may be processed by the control logic of the DLH to feed the other units (MPU, memory, etc.). These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations)) one or more mathematical operations to be performed on one or more matrix operands (Lau [0393]: a memory 3702 coupled to processor; Lau [0118]: After retrieving the appropriate matrix subroutine, output engine 1737 may then specify or supply certain information or fields used by the matrix subroutine, if appropriate. 
For example, in some embodiments, certain information and/or fields of a matrix subroutine may be incomplete or unspecified, such as the size and/or location of the particular operands for the matrix subroutine) the one or more inserted operations to perform a transformation of the data into a tiled layout in memory (Lau [0144]: For example, matrix 1910 is logically partitioned into 2x2 blocks of matrix elements, and each 2x2 block is stored in a single entry 1902 of memory modules 1901). Lau fails to teach to rearrange where data within the one or more matrix operands are stored in memory, and with respective tiles of the tiled layout including data elements for two or more dimensions of the one or more matrix operands stored in contiguous memory locations irrespective of where the data elements were stored in the memory prior to the transformation. However, Chen teaches to rearrange where data within the one or more matrix operands are stored in memory, (Chen Page 6 Paragraph 6: It should be noted that, after partitioning the memory block of the matrix to be calculated originally, according to the introduction, the memory address corresponding to a certain column or a row of elements of the matrix is not continuous, so as to cause the un-continuous reading of the to-be calculated matrix of the original one column or a certain row of elements. 
then after rearranging the elements of the matrix to be calculated, then the elements in the matrix to be calculated corresponding to a plurality of sub-blocks can be arranged corresponding to a continuous memory address) with respective tiles of the tiled layout including data elements for two or more dimensions of the one or more matrix operands stored in contiguous memory locations irrespective of where the data elements were stored in the memory prior to the transformation (Chen Page 6 Paragraph 6: It should be noted that, after partitioning the memory block of the matrix to be calculated originally, according to the introduction, the memory address corresponding to a certain column or a row of elements of the matrix is not continuous, so as to cause the un-continuous reading of the to-be calculated matrix of the original one column or a certain row of elements. then after rearranging the elements of the matrix to be calculated, then the elements in the matrix to be calculated corresponding to a plurality of sub-blocks can be arranged corresponding to a continuous memory address). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau with rearranging the matrix data as taught by Chen. One of ordinary skill in the art would be motivated to make this combination because it would increase the speed and efficiency of the system because the electronic device reads the element of the reordered matrix to be calculated from the memory, and does not have more sub-memory block reading times because of blocking as taught by Chen (Chen Page 6 Paragraph 6). With regards to claim 2, Lau in view of Chen teaches all of the limitations of claim 1 above. 
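The tiled-layout transformation at the heart of this mapping — Lau's 2x2 blocks each occupying a single memory entry, combined with Chen's rearrangement into continuous addresses — can be sketched in miniature; the function below is an editorial illustration under those assumptions, not code from either reference.

```python
# Illustrative sketch (not from Lau or Chen): transform a row-major
# matrix into a tiled layout in which the elements of each t x t tile
# become consecutive in the flat output buffer, irrespective of where
# they sat in the original row-major order.

def to_tiled(matrix, t):
    rows, cols = len(matrix), len(matrix[0])
    flat = []
    for br in range(0, rows, t):          # walk tiles by tile-row
        for bc in range(0, cols, t):      # then by tile-column
            for r in range(br, br + t):   # emit one tile's elements
                for c in range(bc, bc + t):
                    flat.append(matrix[r][c])
    return flat

m = [[ 1,  2,  3,  4],
     [ 5,  6,  7,  8],
     [ 9, 10, 11, 12],
     [13, 14, 15, 16]]
tiled = to_tiled(m, 2)
# Each 2x2 tile's four elements are now contiguous:
# [1, 2, 5, 6, 3, 4, 7, 8, 9, 10, 13, 14, 11, 12, 15, 16]
```

In Lau's example each such group of four contiguous elements would correspond to one entry of a memory module; in Chen, this rearrangement is what lets a sub-block be read from a continuous memory address rather than at strided offsets.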
Lau further teaches wherein the one or more operations are to cause the one or more matrix operands to have a tiled layout representation in memory (Lau [0144]: For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901). With regards to claim 4, Lau in view of Chen teaches all of the limitations of claim 1 above. Lau further teaches wherein the one or more operations comprise one or more operations to group one or more data elements of the one or more matrix operands into tiles (Lau [0144]: For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901). With regards to claim 5, Lau in view of Chen teaches all of the limitations of claim 1 above. Lau further teaches wherein the one or more operations comprise one or more operations to reorder one or more data elements of the one or more matrix operands such that the one or more data elements have a row-major layout in the memory (Lau [0130]: In some embodiments, memory controller 1806 may be used to efficiently store and retrieve the elements of matrix 1810 in memory 1800. For example, memory controller 1806 may store matrix 1810 by spreading or shifting the elements of each row 1812 and column 1814 across the memory modules; Lau [0131]: A row 1812 of matrix 1810, for example, may be written to memory 1800 by storing each element of the row in a different memory module 1801 of memory 1800, but at the same entry 1802 or offset within the memory modules 1801. For example, elements A, B, C in row r1 of matrix 1810 may each be stored in entry e1 of a particular memory module 1801. Similarly, elements D, E, F in row r2 of matrix 1810 may each be stored in entry e2 of a particular memory module 1801.
Finally, elements G, H, I in row r3 of matrix 1810 may each be stored in entry e3 of a particular memory module 1801; it is clear that the matrix starts with row r1 with entries A, B, and C, then goes to row r2 with D, E, and F, etc.). With regards to claim 6, Lau in view of Chen teaches all of the limitations of claim 1 above. Lau further teaches wherein the one or more matrix operands are tensors to be used as input to one or more deep learning operations (Lau [0044]: In one implementation, a machine learning computing system may be provided that includes an application-specific integrated circuit (ASIC)-based deep learning hardware (DLH) device provided that is designed to accelerate computations for deep learning applications. The example DLH device may have the flexibility to support both batch-based and on-line training of networks. The DLH device may include a network of interconnected matrix processing units equipped with processing circuitry to perform arithmetic and convolutional operations on tensor operands). With regards to claim 7, Lau in view of Chen teaches all of the limitations of claim 1 above. Lau further teaches wherein the one or more operations comprise a first set of operations to be performed on the one or more matrix operands (Lau [0193]-[0196]: The flowchart may then proceed to block 2406 to obtain matrix operands from the matrix data... The flowchart may then proceed to block 2408 to perform a Winograd transform on the sliced matrix operand... The flowchart may then proceed to block 2410 to perform matrix multiplication using the transformed Winograd operand... 
The flowchart may then proceed to block 2412 to perform another Winograd transform on the output or partial result from the matrix multiplication operation from block 2410; (The first and second operation being the first and second Winograd transforms)) and wherein the one or more circuits are to cause the compiler to insert a second set of one or more operations subsequent to the one or more mathematical operations (Lau [0044]: These instructions may be sent from a general purpose host processor to the DLH device. The instructions, as sent down from the host processor, may also operate on tensors. These instructions may be processed by the control logic of the DLH to feed the other units (MPU, memory, etc.). These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations)) to be performed on one or more outputs of the one or more mathematical operations (Lau [0193]-[0196]: The flowchart may then proceed to block 2406 to obtain matrix operands from the matrix data... The flowchart may then proceed to block 2408 to perform a Winograd transform on the sliced matrix operand... The flowchart may then proceed to block 2410 to perform matrix multiplication using the transformed Winograd operand... The flowchart may then proceed to block 2412 to perform another Winograd transform on the output or partial result from the matrix multiplication operation from block 2410; (The first and second operation being the first and second Winograd transforms)). With regards to claim 8, Lau in view of Chen teaches all of the limitations of claim 1 above. 
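The claim 7 pattern mapped above — a first set of inserted operations applied to the operands before the mathematical operation, and a second set applied to its outputs afterward — can be sketched generically. The pipeline below is an editorial illustration of that insert-before/insert-after structure only; it is not Lau's Winograd implementation, and a transpose stands in for a generic invertible transformation.

```python
# Illustrative sketch of the insert-before / insert-after pattern:
# a "compiler" wraps a mathematical operation with a forward layout
# transformation on its inputs (first set) and the inverse
# transformation on its output (second set).

def transform(matrix):                # first set: applied to operands
    return [list(col) for col in zip(*matrix)]   # transpose

def untransform(matrix):              # second set: applied to outputs
    return [list(col) for col in zip(*matrix)]   # transpose is self-inverse

def elementwise_add(a, b):            # the mathematical operation itself
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def compiled(op, *operands):
    """Run op with the transform inserted before it and removed after."""
    out = op(*[transform(m) for m in operands])
    return untransform(out)

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# Wrapping the operation leaves its mathematical result unchanged:
assert compiled(elementwise_add, a, b) == elementwise_add(a, b)
```

The point of the sketch is structural: the operands reach the mathematical operation in the transformed layout, and the second inserted set restores the expected layout for downstream consumers.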
Lau further teaches wherein the one or more operations are to cause the one or more matrix operands to be stored in a second layout in memory different from a first layout in memory (Lau [0126]: In some embodiments, memory 1800 may store a particular matrix by spreading or shifting the elements of each particular row and column across the M separate memory modules 1801, as described further below. In this manner, each element of a particular row or column of a matrix is stored in a different memory module 1801 of memory 1800; Lau [0131]: A row 1812 of matrix 1810, for example, may be written to memory 1800 by storing each element of the row in a different memory module 1801 of memory 1800, but at the same entry 1802 or offset within the memory modules 1801; Lau [0132]: A column 1814 of matrix 1810 is written to memory 1800 using a similar approach as described above for rows, with the exception that each element of a column is stored at a different entry 1802 or offset within the memory modules 1801. For example, elements A, D, G in column c1 of matrix 1810 are respectively stored at entries e1, e2, and e3 of particular memory modules 1801; (matrices can be stored according to rows or columns)). With regards to claim 9, Lau teaches A system comprising: one or more processors to cause a compiler to insert one or more operations prior to (Lau [0044]: These instructions may be sent from a general purpose host processor to the DLH device. The instructions, as sent down from the host processor, may also operate on tensors. These instructions may be processed by the control logic of the DLH to feed the other units (MPU, memory, etc.). These instructions may include data movement (e.g.
from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations)) one or more mathematical operations to be performed on one or more matrix operands (Lau [0393]: a memory 3702 coupled to processor; Lau [0118]: After retrieving the appropriate matrix subroutine, output engine 1737 may then specify or supply certain information or fields used by the matrix subroutine, if appropriate. For example, in some embodiments, certain information and/or fields of a matrix subroutine may be incomplete or unspecified, such as the size and/or location of the particular operands for the matrix subroutine). the one or more inserted operations to perform a transformation of the data into a tiled layout in memory (Lau [0144]: For example, matrix 1910 is logically partitioned into 2x2 blocks of matrix elements, and each 2x2 block is stored in a single entry 1902 of memory modules 1901). Lau fails to teach to rearrange where data within the one or more matrix operands are stored in memory, and with respective tiles of the tiled layout including data elements for two or more dimensions of the one or more matrix operands stored in contiguous memory locations irrespective of where the data elements were stored in the memory prior to the transformation. However, Chen teaches to rearrange where data within the one or more matrix operands are stored in memory, (Chen Page 6 Paragraph 6: It should be noted that, after partitioning the memory block of the matrix to be calculated originally, according to the introduction, the memory address corresponding to a certain column or a row of elements of the matrix is not continuous, so as to cause the un-continuous reading of the to-be calculated matrix of the original one column or a certain row of elements. 
then after rearranging the elements of the matrix to be calculated, then the elements in the matrix to be calculated corresponding to a plurality of sub-blocks can be arranged corresponding to a continuous memory address) with respective tiles of the tiled layout including data elements for two or more dimensions of the one or more matrix operands stored in contiguous memory locations irrespective of where the data elements were stored in the memory prior to the transformation (Chen Page 6 Paragraph 6: It should be noted that, after partitioning the memory block of the matrix to be calculated originally, according to the introduction, the memory address corresponding to a certain column or a row of elements of the matrix is not continuous, so as to cause the un-continuous reading of the to-be calculated matrix of the original one column or a certain row of elements. then after rearranging the elements of the matrix to be calculated, then the elements in the matrix to be calculated corresponding to a plurality of sub-blocks can be arranged corresponding to a continuous memory address). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau with rearranging the matrix data as taught by Chen. One of ordinary skill in the art would be motivated to make this combination because it would increase the speed and efficiency of the system because the electronic device reads the element of the reordered matrix to be calculated from the memory, and does not have more sub-memory block reading times because of blocking as taught by Chen (Chen Page 6 Paragraph 6). With regards to claim 10, Lau in view of Chen teaches all of the limitations of claim 9 above. 
Lau further teaches wherein the one or more processors are to cause the compiler to insert the one or more operations into a software program already including the one or more mathematical operations, the one or more operations to transform the one or more matrix operands to a tiled layout representation in memory (Lau [0370]: In some embodiments, a software driver of the host computing system could be used to load the matrix subroutines; Lau [0144]: For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901). With regards to claim 12, Lau in view of Chen teaches all of the limitations of claim 9 above. Lau further teaches wherein the one or more mathematical operations are primitive operations comprising at least an operation to group one or more data elements of the one or more matrix operands into sub-matrices (Lau [0367]: In this manner, the fundamental instructions and/or commands supported by the matrix processor can be used to program matrix subroutines for more complex matrix operations; Lau [0144]: For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901). With regards to claim 13, Lau in view of Chen teaches all of the limitations of claim 9 above. Lau further teaches wherein the one or more mathematical operations are primitive operations (Lau [0367]: In this manner, the fundamental instructions and/or commands supported by the matrix processor can be used to program matrix subroutines for more complex matrix operations). Lau fails to teach comprising at least an operation to permute one or more data elements of the one or more matrix operands such that the one or more data elements are consecutively stored in memory.
However, Chen does teach comprising at least an operation to permute one or more data elements of the one or more matrix operands such that the one or more data elements are consecutively stored in memory (Chen Page 6 Paragraph 6: It should be noted that, after partitioning the memory block of the matrix to be calculated originally, according to the introduction, the memory address corresponding to a certain column or a row of elements of the matrix is not continuous, so as to cause the un-continuous reading of the to-be calculated matrix of the original one column or a certain row of elements. then after rearranging the elements of the matrix to be calculated, then the elements in the matrix to be calculated corresponding to a plurality of sub-blocks can be arranged corresponding to a continuous memory address). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau in view of Chen with rearranging the matrix data as taught by Chen. One of ordinary skill in the art would be motivated to make this combination because it would increase the speed and efficiency of the system because the electronic device reads the element of the reordered matrix to be calculated from the memory, and does not have more sub-memory block reading times because of blocking as taught by Chen (Chen Page 6 Paragraph 6). With regards to claim 16, Lau teaches A non-transitory machine-readable medium having stored thereon one or more instructions, which if performed by one or more processors, cause the one or more processors to at least: cause a compiler to insert one or more operations prior to (Lau [0044]: These instructions may be sent from a general purpose host processor to the DLH device. The instructions, as sent down from the host processor, may also operate on tensors. 
These instructions may be processed by the control logic of the DLH to feed the other units (MPU, memory, etc.). These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations)) one or more mathematical operations to be performed on one or more matrix operands (Lau [0469]: An example machine accessible storage medium may have instructions stored thereon; Lau [0118]: After retrieving the appropriate matrix subroutine, output engine 1737 may then specify or supply certain information or fields used by the matrix subroutine, if appropriate. For example, in some embodiments, certain information and/or fields of a matrix subroutine may be incomplete or unspecified, such as the size and/or location of the particular operands for the matrix subroutine) the one or more inserted operations to perform a transformation of the data into a tiled layout in memory (Lau [0144]: For example, matrix 1910 is logically partitioned into 2x2 blocks of matrix elements, and each 2x2 block is stored in a single entry 1902 of memory modules 1901). Lau fails to teach to rearrange where data within the one or more matrix operands are stored in memory, and with respective tiles of the tiled layout including data elements for two or more dimensions of the one or more matrix operands stored in contiguous memory locations irrespective of where the data elements were stored in the memory prior to the transformation. However, Chen teaches to rearrange where data within the one or more matrix operands are stored in memory, (Chen Page 6 Paragraph 6: It should be noted that, after partitioning the memory block of the matrix to be calculated originally, according to the introduction, the memory address corresponding to a certain column or a row of elements of the matrix is not continuous, so as to cause the un-continuous reading of the to-be calculated matrix of the original one column or a certain row of elements. 
then after rearranging the elements of the matrix to be calculated, then the elements in the matrix to be calculated corresponding to a plurality of sub-blocks can be arranged corresponding to a continuous memory address) with respective tiles of the tiled layout including data elements for two or more dimensions of the one or more matrix operands stored in contiguous memory locations irrespective of where the data elements were stored in the memory prior to the transformation (Chen Page 6 Paragraph 6: It should be noted that, after partitioning the memory block of the matrix to be calculated originally, according to the introduction, the memory address corresponding to a certain column or a row of elements of the matrix is not continuous, so as to cause the un-continuous reading of the to-be calculated matrix of the original one column or a certain row of elements. then after rearranging the elements of the matrix to be calculated, then the elements in the matrix to be calculated corresponding to a plurality of sub-blocks can be arranged corresponding to a continuous memory address). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau with rearranging the matrix data as taught by Chen. One of ordinary skill in the art would be motivated to make this combination because it would increase the speed and efficiency of the system because the electronic device reads the element of the reordered matrix to be calculated from the memory, and does not have more sub-memory block reading times because of blocking as taught by Chen (Chen Page 6 Paragraph 6). With regards to claim 17, Lau in view of Chen teaches all of the limitations of claim 16 above. 
Lau further teaches wherein the one or more operations are to cause the one or more matrix operands to be stored in memory with a tiled layout representation (Lau [0144]: For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901). With regards to claim 20, Lau in view of Chen teaches all of the limitations of claim 16 above. Lau further teaches wherein the one or more operations comprise at least one operation to add padding data to the one or more matrix operands (Lau [0068]: In addition to convolutions, the CSE supports data flattening for the other operations in commonly used in convolutional network such as local response normalization (LRN), local contrast normalization (LCN), max pooling, strides, filter sizing, padding, among other examples). With regards to claim 22, Lau in view of Chen teaches all of the limitations of claim 16 above. Lau further teaches wherein the one or more operations comprise at least one operation to combine one or more data elements of the one or more matrix operands into a group (Lau [0144]: For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901). With regards to claim 23, Lau in view of Chen teaches all of the limitations of claim 16 above. Lau fails to teach wherein the one or more operations comprise at least one operation to permute one or more data elements of the one or more matrix operands such that each data element of the one or more data elements is consecutively stored in memory. 
However, Chen does teach wherein the one or more mathematical operations comprise at least one operation to permute one or more data elements of the one or more matrix operands such that each data element of the one or more data elements is consecutively stored in memory (Chen Page 6 Paragraph 6: It should be noted that, after partitioning the memory block of the matrix to be calculated originally, according to the introduction, the memory address corresponding to a certain column or a row of elements of the matrix is not continuous, so as to cause the un-continuous reading of the to-be calculated matrix of the original one column or a certain row of elements. then after rearranging the elements of the matrix to be calculated, then the elements in the matrix to be calculated corresponding to a plurality of sub-blocks can be arranged corresponding to a continuous memory address). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau in view of Chen with rearranging the matrix data as taught by Chen. One of ordinary skill in the art would be motivated to make this combination because it would increase the speed and efficiency of the system because the electronic device reads the element of the reordered matrix to be calculated from the memory, and does not have more sub-memory block reading times because of blocking as taught by Chen (Chen Page 6 Paragraph 6). With regards to claim 24, Lau in view of Chen teaches all of the limitations of claim 16 above. 
Lau further teaches further comprising instructions that, if performed by the one or more processors, cause the one or more processors to cause the compiler of a parallel processing library to insert the one or more operations into a software program (Lau [0049]: Libraries of subroutines may be provided in an example DLH device to enable instructions to make use of various combinations of the subroutines to implement advance matrix arithmetic and convolution operations; Lau [0370]: In some embodiments, a software driver of the host computing system could be used to load the matrix subroutines; Lau [0051]: In some implementations, an DLH device may support parallelization and scalability by instantiating multiple processing clusters on a single DLH, as well as providing high-speed communication between chips) comprising one or more deep learning operations (Lau [0052]: In one example, an DLH device may implement arithmetic processing to support two major operational modes… These modes may be used to implement a variety of deep learning solutions) using, as input, the one or more matrix operands, (Lau [0044]: The DLH device may include a network of interconnected matrix processing units equipped with processing circuitry to perform arithmetic and convolutional operations on tensor operands) where the one or more operations are to transform how the one or more matrix operands are stored in memory for use by the one or more deep learning operations (Lau [0130]: In some embodiments, memory controller 1806 may be used to efficiently store and retrieve the elements of matrix 1810 in memory 1800. 
For example, memory controller 1806 may store matrix 1810 by spreading or shifting the elements of each row 1812 and column 1814 across the memory modules; Lau [0131]: A row 1812 of matrix 1810, for example, may be written to memory 1800 by storing each element of the row in a different memory module 1801 of memory 1800, but at the same entry 1802 or offset within the memory modules 1801. For example, elements A, B, C in row r1 of matrix 1810 may each be stored in entry e1 of a particular memory module 1801. Similarly, elements D, E, F in row r2 of matrix 1810 may each be stored in entry e2 of a particular memory module 1801. Finally, elements G, H, I in row r3 of matrix 1810 may each be stored in entry e3 of a particular memory module 1801). With regards to claim 25, Lau teaches A method comprising: causing a compiler to insert one or more operations prior to (Lau [0044]: These instructions may be sent from a general purpose host processor to the DLH device. The instructions, as sent down from the host processor, may also operate on tensors. These instructions may be processed by the control logic of the DLH to feed the other units (MPU, memory, etc.). These instructions may include data movement (e.g. from off-chip memory into on-chip memory, operands in on-chip memory, and the arithmetic operations)) one or more mathematical operations to be performed on one or more matrix operands (Lau [0118]: After retrieving the appropriate matrix subroutine, output engine 1737 may then specify or supply certain information or fields used by the matrix subroutine, if appropriate. 
For example, in some embodiments, certain information and/or fields of a matrix subroutine may be incomplete or unspecified, such as the size and/or location of the particular operands for the matrix subroutine) the one or more inserted operations to perform a transformation of the data into a tiled layout in memory (Lau [0144]: For example, matrix 1910 is logically partitioned into 2x2 blocks of matrix elements, and each 2x2 block is stored in a single entry 1902 of memory modules 1901). Lau fails to teach to rearrange where data within the one or more matrix operands are stored in memory, and with respective tiles of the tiled layout including data elements for two or more dimensions of the one or more matrix operands stored in contiguous memory locations irrespective of where the data elements were stored in the memory prior to the transformation. However, Chen teaches to rearrange where data within the one or more matrix operands are stored in memory, (Chen Page 6 Paragraph 6: It should be noted that, after partitioning the memory block of the matrix to be calculated originally, according to the introduction, the memory address corresponding to a certain column or a row of elements of the matrix is not continuous, so as to cause the un-continuous reading of the to-be calculated matrix of the original one column or a certain row of elements. 
then after rearranging the elements of the matrix to be calculated, then the elements in the matrix to be calculated corresponding to a plurality of sub-blocks can be arranged corresponding to a continuous memory address) with respective tiles of the tiled layout including data elements for two or more dimensions of the one or more matrix operands stored in contiguous memory locations irrespective of where the data elements were stored in the memory prior to the transformation (Chen Page 6 Paragraph 6: It should be noted that, after partitioning the memory block of the matrix to be calculated originally, according to the introduction, the memory address corresponding to a certain column or a row of elements of the matrix is not continuous, so as to cause the un-continuous reading of the to-be calculated matrix of the original one column or a certain row of elements. then after rearranging the elements of the matrix to be calculated, then the elements in the matrix to be calculated corresponding to a plurality of sub-blocks can be arranged corresponding to a continuous memory address). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau with rearranging the matrix data as taught by Chen. One of ordinary skill in the art would be motivated to make this combination because it would increase the speed and efficiency of the system because the electronic device reads the element of the reordered matrix to be calculated from the memory, and does not have more sub-memory block reading times because of blocking as taught by Chen (Chen Page 6 Paragraph 6). With regards to claim 26, Lau in view of Chen teaches all of the limitations of claim 25 above. 
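For illustration only (this sketch is not part of the cited references or the claims), the rearrangement Chen is cited for — rewriting a matrix so that each sub-block occupies a contiguous run of memory addresses, irrespective of where its elements sat in the original row-major layout — can be approximated as follows; the function name and tile ordering are hypothetical:

```python
# Hypothetical sketch of a Chen-style rearrangement: a row-major matrix is
# rewritten so each tile x tile sub-block occupies contiguous addresses.

def to_tiled_layout(matrix, tile):
    """Return a flat list in which each tile x tile sub-block of `matrix`
    is stored contiguously (tiles emitted in row-major tile order)."""
    rows, cols = len(matrix), len(matrix[0])
    assert rows % tile == 0 and cols % tile == 0, "dimensions must divide evenly"
    flat = []
    for br in range(0, rows, tile):          # tile-row of the matrix
        for bc in range(0, cols, tile):      # tile-column of the matrix
            for r in range(br, br + tile):   # rows inside this tile
                flat.extend(matrix[r][bc:bc + tile])
    return flat

# A 4x4 matrix split into 2x2 tiles: the elements of each tile,
# previously separated by a full row stride, become adjacent in memory.
m = [[ 1,  2,  3,  4],
     [ 5,  6,  7,  8],
     [ 9, 10, 11, 12],
     [13, 14, 15, 16]]
print(to_tiled_layout(m, 2))
# -> [1, 2, 5, 6, 3, 4, 7, 8, 9, 10, 13, 14, 11, 12, 15, 16]
```

Reading a 2×2 sub-block from the rearranged list is then a single contiguous read, which is the efficiency rationale the rejection attributes to Chen.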
Lau further teaches further comprising causing the one or more matrix operands to be stored in memory using a tiled layout representation (Lau [0144]: For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901) for use by one or more deep learning operations, (Lau [0044]: In one implementation, a machine learning computing system may be provided that includes an application-specific integrated circuit (ASIC)-based deep learning hardware (DLH) device provided that is designed to accelerate computations for deep learning applications. The example DLH device may have the flexibility to support both batch-based and on-line training of networks. The DLH device may include a network of interconnected matrix processing units equipped with processing circuitry to perform arithmetic and convolutional operations on tensor operands) where the one or more matrix operands are to be stored in memory using a tiled layout representation as a result of performing the one or more operations (Lau [0144]: For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901). With regards to claim 27, Lau in view of Chen teaches all of the limitations of claim 25 above. Lau further teaches further comprising causing a compiler to insert a first set of the one or more operations into a software program prior to a set of deep learning operations in the software program (Lau [0370]: In some embodiments, a software driver of the host computing system could be used to load the matrix subroutines; Lau [0193]-[0196]: The flowchart may then proceed to block 2406 to obtain matrix operands from the matrix data... The flowchart may then proceed to block 2408 to perform a Winograd transform on the sliced matrix operand... 
The flowchart may then proceed to block 2410 to perform matrix multiplication using the transformed Winograd operand... The flowchart may then proceed to block 2412 to perform another Winograd transform on the output or partial result from the matrix multiplication operation from block 2410; (The first and second operation being the first and second Winograd transforms)) and insert a second set of one or more operations into the software program after the set of deep learning operations, (Lau [0193]-[0196]: The flowchart may then proceed to block 2406 to obtain matrix operands from the matrix data... The flowchart may then proceed to block 2408 to perform a Winograd transform on the sliced matrix operand... The flowchart may then proceed to block 2410 to perform matrix multiplication using the transformed Winograd operand... The flowchart may then proceed to block 2412 to perform another Winograd transform on the output or partial result from the matrix multiplication operation from block 2410; (The first and second operation being the first and second Winograd transforms)) where the first set is to apply one or more transformations to the one or more matrix operands (Lau [0193]-[0196]: The flowchart may then proceed to block 2406 to obtain matrix operands from the matrix data... The flowchart may then proceed to block 2408 to perform a Winograd transform on the sliced matrix operand... The flowchart may then proceed to block 2410 to perform matrix multiplication using the transformed Winograd operand... The flowchart may then proceed to block 2412 to perform another Winograd transform on the output or partial result from the matrix multiplication operation from block 2410; (The first and second operation being the first and second Winograd transforms)) and the second set is to remove the one or more transformations from the one or more matrix operands (Lau [0193]-[0196]: The flowchart may then proceed to block 2406 to obtain matrix operands from the matrix data... 
The flowchart may then proceed to block 2408 to perform a Winograd transform on the sliced matrix operand... The flowchart may then proceed to block 2410 to perform matrix multiplication using the transformed Winograd operand... The flowchart may then proceed to block 2412 to perform another Winograd transform on the output or partial result from the matrix multiplication operation from block 2410; (The first and second operation being the first and second Winograd transforms)). With regards to claim 29, Lau in view of Chen teaches all of the limitations of claim 25 above. Lau further teaches further comprising causing one or more data elements of the one or more matrix operands to be grouped into tiles as a result of the one or more operations (Lau [0144]: For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901). With regards to claim 30, Lau in view of Chen teaches all of the limitations of claim 25 above. Lau further teaches further comprising causing one or more data elements of the one or more matrix operands to be permuted in memory as a result of the one or more operations (Lau [0130]: In some embodiments, memory controller 1806 may be used to efficiently store and retrieve the elements of matrix 1810 in memory 1800. For example, memory controller 1806 may store matrix 1810 by spreading or shifting the elements of each row 1812 and column 1814 across the memory modules; Lau [0131]: A row 1812 of matrix 1810, for example, may be written to memory 1800 by storing each element of the row in a different memory module 1801 of memory 1800, but at the same entry 1802 or offset within the memory modules 1801. For example, elements A, B, C in row r1 of matrix 1810 may each be stored in entry e1 of a particular memory module 1801. Similarly, elements D, E, F in row r2 of matrix 1810 may each be stored in entry e2 of a particular memory module 1801. 
Finally, elements G, H, I in row r3 of matrix 1810 may each be stored in entry e3 of a particular memory module 1801). Claims 3, 11, 21, and 28 are rejected as being unpatentable over Lau in view of Chen further in view of Zeng et al. (US 20110191653 A1) hereinafter Zeng. With regards to claim 3, Lau in view of Chen teaches all of the limitations of claim 1 above. Lau further teaches wherein the one or more operations comprise one or more operations [to insert one or more data values into the one or more matrix operands to] cause the one or more matrix operands to have a specific shape (Lau [0367]: In this manner, the fundamental instructions and/or commands supported by the matrix processor can be used to program matrix subroutines for more complex matrix operations, such as distributed matrix multiplication and/or convolution operations, dimension shuffle operations, reshape operations, and so forth). Lau fails to teach to insert one or more data values into the one or more matrix operands to. However, Zeng does teach to insert one or more data values into the one or more matrix operands to (Zeng [0035]: user data is zero padded so that the length with the pad is an integer multiple of circulant size). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the padding of Lau in view of Chen with inserting data values into the matrix operands as taught by Zeng. One of ordinary skill in the art would be motivated to make this combination because it would cause the matrix operands to be the same size, speeding up calculations using them. With regards to claim 11, Lau in view of Chen teaches all of the limitations of claim 9 above.
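For illustration only (not drawn from the record), the Zeng-style padding cited above — appending zeros so a data length becomes an integer multiple of a block size, with the pad stripped again afterward — can be sketched as follows; all names are hypothetical:

```python
# Hypothetical sketch of zero padding to a block-size multiple (per the
# Zeng [0035]/[0037] citations) and removal of that pad afterward.

def pad_to_multiple(data, block):
    """Append zeros until len(data) is an integer multiple of `block`.
    Returns the padded list and the number of zeros added."""
    pad = (block - len(data) % block) % block
    return data + [0] * pad, pad

def strip_pad(data, pad):
    """Remove the `pad` trailing zeros added by pad_to_multiple."""
    return data[:len(data) - pad] if pad else data

padded, pad = pad_to_multiple([7, 8, 9, 10, 11], 4)
print(padded)                  # [7, 8, 9, 10, 11, 0, 0, 0]
print(strip_pad(padded, pad))  # [7, 8, 9, 10, 11]
```

The same idea generalizes to two dimensions by padding each matrix dimension up to a multiple of the tile size, which is the "specific shape" rationale given for the claim 3 combination.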
Lau further teaches wherein the one or more operations are primitive operations comprising at least an operation to [insert one or more groups of data into the one or more matrix operands] to alter a shape of the one or more matrix operands (Lau [0367]: In this manner, the fundamental instructions and/or commands supported by the matrix processor can be used to program matrix subroutines for more complex matrix operations, such as distributed matrix multiplication and/or convolution operations, dimension shuffle operations, reshape operations, and so forth). Lau fails to teach insert one or more groups of data into the one or more matrix operands. However, Zeng does teach insert one or more groups of data into the one or more matrix operands (Zeng [0035]: user data is zero padded so that the length with the pad is an integer multiple of circulant size). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the reshaping of Lau in view of Chen with inserting groups of data into the matrix operands as taught by Zeng. One of ordinary skill in the art would be motivated to make this combination because it would cause the matrix operands to be the same size, speeding up calculations using them. With regards to claim 21, Lau in view of Chen teaches all of the limitations of claim 16 above. Lau fails to teach wherein the one or more operations comprise at least one operation to remove padding data to the one or more matrix operands. However, Zeng does teach wherein the one or more operations comprise at least one operation to remove padding data to the one or more matrix operands (Zeng [0037]: the zeros are removed from the LDPC encoded data). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau in view of Chen with removing padding as taught by Zeng. 
One of ordinary skill in the art would be motivated to make this combination because removing zeros at 508 is acceptable and even desirable in some applications. For example, in storage applications the storage capacity is of significant interest and removing the zero pads before writing to storage is desirable since storage capacity is improved as taught by Zeng (Zeng [0038]). With regards to claim 28, Lau in view of Chen teaches all of the limitations of claim 25 above. Lau further teaches further comprising causing [one or more sets of data to be added to the one or more matrix operands] as a result of the one or more operations (Lau [0367]: In this manner, the fundamental instructions and/or commands supported by the matrix processor can be used to program matrix subroutines for more complex matrix operations, such as distributed matrix multiplication and/or convolution operations, dimension shuffle operations, reshape operations, and so forth). Lau fails to teach one or more sets of data to be added to the one or more matrix operands. However, Zeng teaches one or more sets of data to be added to the one or more matrix operands (Zeng [0035]: user data is zero padded so that the length with the pad is an integer multiple of circulant size). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau in view of Chen with adding sets of data into the matrix operands as taught by Zeng. One of ordinary skill in the art would be motivated to make this combination because it would cause the matrix operands to be the same size, speeding up calculations using them. Claims 14, 19, and 31 are rejected as being unpatentable over Lau in view of Chen further in view of Brady et al. (US 20190392296 A1) hereinafter Brady. With regards to claim 14, Lau in view of Chen teaches all of the limitations of claim 9 above. 
Lau further teaches wherein the one or more matrix operands are tensors comprising a shape (Lau [0044]: The DLH device may include a network of interconnected matrix processing units equipped with processing circuitry to perform arithmetic and convolutional operations on tensor operands; Lau [0068]: The CSE may take in multiple rows of data, and re-use the data many times to flatten out 2D regions (e.g., 1105) into rows or columns (e.g., 1110) of a matrix (e.g., as illustrated in the example of FIG. 11); Lau Fig. 11: Fig. 11 shows a matrix with a height and width) and the one or more operations are to be performed on the one or more matrix operands according to the shape (Lau [0068]: The CSE may take in multiple rows of data, and re-use the data many times to flatten out 2D regions (e.g., 1105) into rows or columns (e.g., 1110) of a matrix (e.g., as illustrated in the example of FIG. 11); Lau Fig. 11: Fig. 11 shows a matrix with a height and width). Lau fails to teach and the stride. However, Brady does teach and the stride (determine attributes of the tensor (e.g., its block size, padding used in the tensor, stride applied in the operation). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau in view of Chen with the stride as taught by Brady. One of ordinary skill in the art would be motivated to make this combination because this would allow for less resources to be used when the order of the inputs matters. With regards to claim 19, Lau in view of Chen teaches all of the limitations of claim 16 above. 
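As an illustrative aside (not part of the cited references), the shape and stride attributes at issue in claims 14, 19, and 31 are related in a standard way for a row-major tensor: each dimension's stride is the number of elements skipped when that index increases by one. A minimal sketch, with hypothetical names:

```python
# Illustrative sketch: deriving row-major (C-order) strides from a tensor
# shape, and mapping a multi-dimensional index to a flat memory offset.

def row_major_strides(shape):
    """Compute element strides for a row-major layout of `shape`."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def flat_index(indices, strides):
    """Map a multi-dimensional index to its flat memory offset."""
    return sum(i * s for i, s in zip(indices, strides))

shape = (2, 3, 4)                   # e.g. batch x rows x cols
strides = row_major_strides(shape)
print(strides)                      # [12, 4, 1]
print(flat_index((1, 2, 3), strides))  # 1*12 + 2*4 + 3*1 = 23
```

Operating "according to the shape and the stride" then means the access pattern is fully determined by these two attributes, independent of where the underlying buffer lives.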
Lau further teaches wherein the one or more matrix operands are tensors comprising a shape (Lau [0044]: The DLH device may include a network of interconnected matrix processing units equipped with processing circuitry to perform arithmetic and convolutional operations on tensor operands; Lau [0068]: The CSE may take in multiple rows of data, and re-use the data many times to flatten out 2D regions (e.g., 1105) into rows or columns (e.g., 1110) of a matrix (e.g., as illustrated in the example of FIG. 11); Lau Fig. 11: Fig. 11 shows a matrix with a height and width). Lau fails to teach and the stride. However, Brady does teach and the stride (determine attributes of the tensor (e.g., its block size, padding used in the tensor, stride applied in the operation). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau in view of Chen with the stride as taught by Brady. One of ordinary skill in the art would be motivated to make this combination because this would allow for less resources to be used when the order of the inputs matters. With regards to claim 31, Lau in view of Chen teaches all of the limitations of claim 25 above. Lau further teaches wherein the one or more matrix operands are tensors comprising a plurality of data elements and at least a shape (Lau [0044]: The DLH device may include a network of interconnected matrix processing units equipped with processing circuitry to perform arithmetic and convolutional operations on tensor operands; Lau [0068]: The CSE may take in multiple rows of data, and re-use the data many times to flatten out 2D regions (e.g., 1105) into rows or columns (e.g., 1110) of a matrix (e.g., as illustrated in the example of FIG. 11); Lau Fig. 11: Fig. 11 shows a matrix with a height and width). Lau fails to teach and the stride. 
However, Brady does teach and the stride (determine attributes of the tensor (e.g., its block size, padding used in the tensor, stride applied in the operation). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau in view of Chen with the stride as taught by Brady. One of ordinary skill in the art would be motivated to make this combination because this would allow for less resources to be used when the order of the inputs matters. Claim 18 is rejected as being unpatentable over Lau in view of Chen further in view of Daga et al. (US 20190066257 A1) hereinafter Daga. With regards to claim 18, Lau in view of Chen teaches all of the limitations of claim 16 above. Lau further teaches the one or more matrix operands to have a first layout in memory (Lau [0126]: In some embodiments, memory 1800 may store a particular matrix by spreading or shifting the elements of each particular row and column across the M separate memory modules 1801, as described further below. In this manner, each element of a particular row or column of a matrix is stored in a different memory module 1801 of memory 1800; Lau [0131]: A row 1812 of matrix 1810, for example, may be written to memory 1800 by storing each element of the row in a different memory module 1801 of memory 1800, but at the same entry 1802 or offset within the memory modules 1801; (Matrices can be stored in rows)) to have a second layout in memory (Lau [0126]: In some embodiments, memory 1800 may store a particular matrix by spreading or shifting the elements of each particular row and column across the M separate memory modules 1801, as described further below. 
In this manner, each element of a particular row or column of a matrix is stored in a different memory module 1801 of memory 1800; Lau [0132]: A column 1814 of matrix 1810 is written to memory 1800 using a similar approach as described above for rows, with the exception that each element of a column is stored at a different entry 1802 or offset within the memory modules 1801. For example, elements A, D, G in column c1 of matrix 1810 are respectively stored at entries e1, e2, and e3 of particular memory modules 1801; (Matrices can be stored in columns)). Lau fails to teach further comprising instructions that, if performed by the one or more processors, cause the one or more processors to insert a first set of the one or more operations prior to one or more deep learning operations using the one or more matrix operands and insert a second set of one or more operations after the one or more deep learning operations using the one or more matrix operands, the first set causing and the second set causing output from the one or more deep learning operations.
However, Daga does teach further comprising instructions that, if performed by the one or more processors, cause the one or more processors to insert a first set of the one or more operations prior to one or more deep learning operations using the one or more matrix operands (Daga [0230]-[0231]: At block 2305, the block is divided or sliced into multiple smaller blocks or tiles being treated as independent images… At block 2307, the multiple blocks are processed through convolution and pooling at each layer of a neural network… At block 2309, the multiple blocks are merged into a single block that is much smaller than the original block without having any overlapping of areas or data; (manipulating the inputs of the deep learning operation and manipulating the data after the deep learning operation)) and insert a second set of one or more operations after the one or more deep learning operations using the one or more matrix operands, (Daga [0230]-[0231]: At block 2305, the block is divided or sliced into multiple smaller blocks or tiles being treated as independent images… At block 2307, the multiple blocks are processed through convolution and pooling at each layer of a neural network… At block 2309, the multiple blocks are merged into a single block that is much smaller than the original block without having any overlapping of areas or data; (manipulating the inputs of the deep learning operation and manipulating the data after the deep learning operation)) the first set causing (Daga [0230]-[0231]: At block 2305, the block is divided or sliced into multiple smaller blocks or tiles being treated as independent images… At block 2307, the multiple blocks are processed through convolution and pooling at each layer of a neural network… At block 2309, the multiple blocks are merged into a single block that is much smaller than the original block without having any overlapping of areas or data; (manipulating the inputs of the deep learning operation and manipulating 
the data after the deep learning operation)) and the second set causing output from the one or more deep learning operations (Daga [0230]-[0231]: At block 2305, the block is divided or sliced into multiple smaller blocks or tiles being treated as independent images… At block 2307, the multiple blocks are processed through convolution and pooling at each layer of a neural network… At block 2309, the multiple blocks are merged into a single block that is much smaller than the original block without having any overlapping of areas or data; (manipulating the inputs of the deep learning operation and manipulating the data after the deep learning operation)). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau in view of Chen with the instructions before and after deep learning operations as taught by Daga. One of ordinary skill in the art would be motivated to make this combination because it would allow for greater flexibility as the program could manipulate the operands prior to the deep learning operations, allowing for a greater number of operations to take place. Claim 15 is rejected as being unpatentable over Lau in view of Chen further in view of Daga further in view of Guo et al. (“Exploiting Locality and Parallelism with Hierarchically Tiled Arrays”) hereinafter Guo. With regards to claim 15, Lau in view of Chen teaches all of the limitations of claim 9 above. Lau further teaches wherein the one or more processors are to cause the compiler to insert (Lau [0370]: In some embodiments, a software driver of the host computing system could be used to load the matrix subroutines) the one or more matrix operands to have a tiled layout representation in memory (Lau [0144]: For example, matrix 1910 is logically partitioned into 2×2 blocks of matrix elements, and each 2×2 block is stored in a single entry 1902 of memory modules 1901). 
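For illustration only (a sketch of the Daga [0230]-[0231] divide/process/merge flow cited above, with hypothetical names and a placeholder standing in for the per-tile neural network operation):

```python
# Hypothetical sketch of inserting operations before and after a deep
# learning operation: slice the operand into tiles (first set), run the
# per-tile operation, then merge the results back (second set).

def split_rows(matrix, n):
    """First inserted operation: slice into tiles of up to n rows."""
    return [matrix[i:i + n] for i in range(0, len(matrix), n)]

def merge_rows(tiles):
    """Second inserted operation: merge tiles back into one block."""
    return [row for tile in tiles for row in tile]

def process(tile):
    """Placeholder for the deep learning operation applied per tile."""
    return [[x * 2 for x in row] for row in tile]

m = [[1, 2], [3, 4], [5, 6], [7, 8]]
out = merge_rows([process(t) for t in split_rows(m, 2)])
print(out)  # [[2, 4], [6, 8], [10, 12], [14, 16]]
```

The first set transforms the operand's layout before the deep learning operation consumes it, and the second set undoes that transformation on the output, mirroring the insert-before/insert-after structure recited in claims 15, 18, and 27.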
Lau fails to teach a first group of the one or more operations into a first location in a software program and insert a second group of one or more operations into a second location in the software program, where the first group is to cause, and the second group is to. However, Daga does teach a first group of the one or more operations into a first location in a software program (Daga [0230]-[0231]: At block 2305, the block is divided or sliced into multiple smaller blocks or tiles being treated as independent images… At block 2307, the multiple blocks are processed through convolution and pooling at each layer of a neural network… At block 2309, the multiple blocks are merged into a single block that is much smaller than the original block without having any overlapping of areas or data; (manipulating the inputs of the deep learning operation and manipulating the data after the deep learning operation)) and insert a second group of the operations into a second location in the software program, (Daga [0230]-[0231]: At block 2305, the block is divided or sliced into multiple smaller blocks or tiles being treated as independent images… At block 2307, the multiple blocks are processed through convolution and pooling at each layer of a neural network… At block 2309, the multiple blocks are merged into a single block that is much smaller than the original block without having any overlapping of areas or data; (manipulating the inputs of the deep learning operation and manipulating the data after the deep learning operation)) where the first group is to cause (Daga [0230]-[0231]: At block 2305, the block is divided or sliced into multiple smaller blocks or tiles being treated as independent images… At block 2307, the multiple blocks are processed through convolution and pooling at each layer of a neural network… At block 2309, the multiple blocks are merged into a single block that is much smaller than the original block without having any overlapping of areas or
data; (manipulating the inputs of the deep learning operation and manipulating the data after the deep learning operation)) and the second group is to (Daga [0230]-[0231]: At block 2305, the block is divided or sliced into multiple smaller blocks or tiles being treated as independent images… At block 2307, the multiple blocks are processed through convolution and pooling at each layer of a neural network… At block 2309, the multiple blocks are merged into a single block that is much smaller than the original block without having any overlapping of areas or data; (manipulating the inputs of the deep learning operation and manipulating the data after the deep learning operation)). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau in view of Chen with the instructions before and after deep learning operations as taught by Daga. One of ordinary skill in the art would be motivated to make this combination because it would allow for greater flexibility as the program could manipulate the operands prior to the deep learning operations, allowing for a greater number of operations to take place. Lau in view of Daga fails to teach remove the tiled layout representation from the one or more matrix operands. However, Guo does teach remove the tiled layout representation from the one or more matrix operands (Guo Page 47 Paragraph 1: Since in most algorithms the change of the tiling structure involves only one partition at a time, we provide an interface to add or remove one partition in one operation). Therefore, it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Lau in view of Chen further in view of Daga with removing the tiled layout as taught by Guo. 
One of ordinary skill in the art would be motivated to make this combination because it would allow the data to be reused for different purposes without having to re-input it, saving time and energy.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jakob O. Gudas, whose telephone number is (571) 272-0695. The examiner can normally be reached Monday-Thursday 7:30AM-5:00PM and Friday 7:30AM-4:00PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, James Trujillo, can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.O.G./
Examiner, Art Unit 2151

/James Trujillo/
Supervisory Patent Examiner, Art Unit 2151
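As background, the tile-split, process, and merge flow that the rejection quotes from Daga [0230]-[0231] (divide a block into independent tiles, run each tile through convolution and pooling, then merge the results into a single smaller block) can be sketched roughly as follows. This is a hypothetical illustration only: simple 2x2 average pooling stands in for Daga's convolution and pooling layers, and all function names are made up rather than taken from the reference.

```python
# Hypothetical sketch of the tile-and-merge pattern quoted from Daga
# [0230]-[0231]. Average pooling stands in for the reference's
# convolution/pooling layers; names are illustrative, not from Daga.

def split_into_tiles(block, tile):
    """Slice a square 2D block into non-overlapping tile x tile sub-blocks
    (block 2305: tiles treated as independent images)."""
    n = len(block)
    return [
        [[block[r + i][c + j] for j in range(tile)] for i in range(tile)]
        for r in range(0, n, tile)
        for c in range(0, n, tile)
    ]

def pool(tile_block):
    """Reduce one tile to its average value (stand-in for block 2307's
    per-tile convolution and pooling)."""
    vals = [v for row in tile_block for v in row]
    return sum(vals) / len(vals)

def process_block(block, tile=2):
    """Tile the block, process each tile independently, then merge the
    results into a single smaller block with no overlap (block 2309)."""
    n = len(block) // tile
    pooled = [pool(t) for t in split_into_tiles(block, tile)]
    return [pooled[r * n:(r + 1) * n] for r in range(n)]

block = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(process_block(block))  # → [[3.5, 5.5], [11.5, 13.5]]
```

Each tile is processed without reference to its neighbors, which is the property the action relies on: the pre-operation group manipulates the operands (the split), and the post-operation group manipulates the results (the merge).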

Prosecution Timeline

Jan 14, 2022
Application Filed
Jun 16, 2025
Non-Final Rejection — §101, §103, §112
Nov 20, 2025
Response Filed
Jan 23, 2026
Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602200
ANALOG MULTIPLY-ACCUMULATE UNIT FOR MULTIBIT IN-MEMORY CELL COMPUTING
2y 5m to grant Granted Apr 14, 2026
Patent 12566586
HIGH-SPEED QUANTUM RANDOM NUMBER GENERATOR BASED ON VACUUM STATE FLUCTUATION TECHNOLOGY
2y 5m to grant Granted Mar 03, 2026
Based on 2 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
44%
Grant Probability
99%
With Interview (+71.1%)
4y 2m
Median Time to Grant
Moderate
PTA Risk
Based on 9 resolved cases by this examiner. Grant probability derived from career allow rate.
