DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 8/29/2024 contains NPL entry #5, “The Compute Architecture of Intel Processor Graphics Gen9,” which the examiner has not received; it therefore has not been considered.
Specification
Abstract
The abstract of the disclosure is objected to because it does not reflect the claims filed 11/12/2024. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
Title of the invention
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested:
UTILIZING A COMPUTE PIPELINE OF A GENERAL-PURPOSE GRAPHICS PROCESSING UNIT TO PERFORM MULTIPLE MATRIX MULTIPLY OPERATIONS
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 30-38 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because claim 30 is directed to a method whose steps of fetching, determining, scheduling, and retiring amount to nothing more than software instructions. Software instructions are non-statutory under 35 U.S.C. 101.
Claims 31-38 depend from claim 30 and add further steps; for example, claim 31 adds a determining step. Claims 31-38 therefore suffer from the same deficiency as claim 30 and are rejected under the same rationale.
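For illustration only (not part of the record; every name below is invented for this sketch, not taken from the application), the four steps recited in claim 30 can be expressed as ordinary software routines, which is how the rejection characterizes them:

```python
# Hypothetical sketch of claim 30's four recited steps as plain software.
from dataclasses import dataclass, field

@dataclass
class DecodedInstruction:
    opcode: str
    matrix_shapes: list  # one entry per associated matrix multiply

@dataclass
class ComputePipeline:
    completed: list = field(default_factory=list)

    def schedule(self, command: str) -> None:
        # Stand-in for dispatch to a GPGPU compute pipeline; a scheduled
        # command is treated here as immediately completed.
        self.completed.append(command)

def fetch_and_decode(raw: str) -> DecodedInstruction:
    # "fetching and decoding a single instruction into a decoded instruction"
    opcode, *shapes = raw.split()
    return DecodedInstruction(opcode, shapes)

def determine_pipeline_commands(decoded: DecodedInstruction) -> list:
    # "determining a set of pipeline commands": one per matrix multiply
    return [f"matmul {shape}" for shape in decoded.matrix_shapes]

def run(raw: str, pipeline: ComputePipeline) -> str:
    decoded = fetch_and_decode(raw)
    commands = determine_pipeline_commands(decoded)
    for command in commands:
        pipeline.schedule(command)  # "scheduling the set of pipeline commands"
    if len(pipeline.completed) == len(commands):
        return f"retired {decoded.opcode}"  # retiring upon completion
    return "pending"

print(run("MMUL 64x64 128x128", ComputePipeline()))  # -> retired MMUL
```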
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 21-25, 30-34 and 39-40 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 7 and 11-16 of U.S. Patent No. 10,186,011 in view of BROTHERS (US2016/0358070A1).
Regarding claims 21-22, 24, 30-31, and 39-40, the patent teaches nearly all of the limitations but does not teach the following, which the analogous prior art BROTHERS does teach:
An accelerator comprising (BROTHERS: par. 35 lines 1-2);
first circuitry, second circuitry, third circuitry and fourth circuitry (BROTHERS: par. 145 lines 19-25);
multiple matrix multiply operations (BROTHERS: par. 67 lines 1-6).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the accelerator; the first, second, third, and fourth circuitry; and the multiple matrix multiply operations, as shown in BROTHERS, with the patent, for the benefit of facilitating improvement and/or optimization of the computational efficiency of the trained neural network while substantially maintaining the same input-output relationship of the trained neural network, at least with respect to established performance requirements (BROTHERS: par. 28, lines 1-6).
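For concreteness, the following is a small numerical sketch (invented for illustration; not drawn from BROTHERS or the patent) of how one logical matrix multiply expands into multiple smaller matrix multiply operations, one per output tile, the way a single decoded instruction might expand into a set of pipeline commands:

```python
# Hypothetical sketch: C = A @ B expanded into multiple independent,
# tile-sized matrix multiply operations.
import numpy as np

def tiled_matmul(A, B, tile=2):
    n, _ = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            # Each iteration is one "pipeline command": a small matrix
            # multiply producing one tile of the output.
            C[i:i + tile, j:j + tile] = A[i:i + tile, :] @ B[:, j:j + tile]
    return C

A = np.arange(16.0).reshape(4, 4)
B = np.eye(4)
assert np.allclose(tiled_matmul(A, B), A @ B)
```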
Claims of 18/819,073 compared with claims of US 10,186,011:

Application claim 21:
21. An accelerator comprising:
first circuitry to fetch and decode a single instruction into a decoded instruction, the decoded instruction associated with multiple matrix multiply operations to be performed via a compute pipeline of a general-purpose graphics processing unit;
second circuitry to determine a set of pipeline commands to perform the multiple matrix multiply operations;
third circuitry to schedule the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit; and
fourth circuitry to retire the decoded instruction in response completion of the set of pipeline commands.
Patent claims 11, 7, and 13:
11. A method of performing machine learning operations, the method comprising:
fetching and decoding a single instruction into a decoded instruction, the decoded instruction associated with a set of multiple machine learning operations to be performed via a compute pipeline of a general-purpose graphics processing unit;
determining a set of pipeline commands to perform the set of multiple machine learning operations; and
scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit.
7. The compute apparatus as in claim 6, wherein the convolution includes multiple matrix operations.
13. The method as in claim 11, additionally comprising retiring the decoded instruction in response to completion of the set of pipeline commands.

Application claim 22:
22. The accelerator of claim 21, second circuitry to analyze parameters associated with the decoded instruction to determine the set of pipeline commands to perform the multiple matrix multiply operations.
Patent claim 12:
12. The method as in claim 11, wherein determining a set of pipeline commands to perform the set of multiple machine learning operations includes analyzing parameters associated with the decoded instruction.

Application claim 23:
23. The accelerator of claim 21, wherein the single instruction is to cause the general-purpose graphics processing unit to perform a convolution for a layer of a convolutional neural network.
Patent claims 7 (quoted above) and 14:
14. The method as in claim 11, wherein the single instruction is to cause the general-purpose graphics processing unit to perform a convolution for a layer of a convolutional neural network.

Application claim 24:
24. The accelerator of claim 21, the third circuitry to schedule the set of pipeline commands to one or more of multiple compute pipelines.
Patent claim 15:
15. The method as in claim 11, wherein scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit includes scheduling the set of pipeline commands to multiple compute pipelines…

Application claim 25:
25. The accelerator of claim 24, the multiple compute pipelines include a general-purpose compute pipeline, a plurality of sparse compute pipelines, and/or a near-data compute pipeline.
Patent claim 15 (continued):
15. …the multiple compute pipelines including a general-purpose compute pipeline and at least one compute pipeline selected from a sparse compute pipeline or a near-data compute pipeline.

Application claim 30:
30. A method comprising:
fetching and decoding a single instruction into a decoded instruction, the decoded instruction associated with multiple matrix multiply operations to be performed via a compute pipeline of a general-purpose graphics processing unit;
determining a set of pipeline commands to perform the multiple matrix multiply operations;
scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit; and
retiring the decoded instruction in response completion of the set of pipeline commands.
Patent claims 11, 7, and 13 (quoted above).

Application claim 31:
31. The method of claim 30, comprising analyzing parameters associated with the decoded instruction to determine the set of pipeline commands to perform the multiple matrix multiply operations.
Patent claim 12 (quoted above).

Application claim 32:
32. The method of claim 30, wherein the single instruction is to cause the general-purpose graphics processing unit to perform a convolution for a layer of a convolutional neural network.
Patent claims 7 and 14 (quoted above).

Application claim 33:
33. The method of claim 30, wherein scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit includes scheduling the set of pipeline commands to one or more of multiple compute pipelines.
Patent claim 15 (quoted above).

Application claim 34:
34. The method of claim 33, the multiple compute pipelines include a general-purpose compute pipeline, a plurality of sparse compute pipelines, and/or a near-data compute pipeline.
Patent claim 15 (quoted above).

Application claim 39:
39. A data processing system comprising:
a memory device; and
an accelerator coupled with the memory device, the accelerator comprising:
first circuitry to fetch and decode a single instruction into a decoded instruction, the decoded instruction associated with multiple matrix multiply operations to be performed via a compute pipeline of a general-purpose graphics processing unit;
second circuitry to determine a set of pipeline commands to perform the multiple matrix multiply operations;
third circuitry to schedule the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit; and
fourth circuitry to retire the decoded instruction in response completion of the set of pipeline commands.
Patent claims 16, 11, 7, and 13:
16. A data processing system comprising: … a memory coupled to the general-purpose graphics processing unit.
11, 7, and 13 (quoted above).

Application claim 40:
40. The data processing system of claim 39, wherein:
the second circuitry is to analyze parameters associated with the decoded instruction to determine the set of pipeline commands to perform the multiple matrix multiply operations,
wherein the single instruction is to cause the general-purpose graphics processing unit to perform a convolution for a layer of a convolutional neural network; and
the third circuitry is to schedule the set of pipeline commands to one or more of multiple compute pipelines, the multiple compute pipelines including a general-purpose compute pipeline, a plurality of sparse compute pipelines, and a near-data compute pipeline.
Patent claims 12, 14, and 15 (quoted above).
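For illustration only (all names and the dispatch policy below are invented, not taken from the record), the arrangement recited in claims 24-25 and 39-40 can be sketched as one command set dispatched across several compute pipelines, with the instruction retired only when every command in the set has completed:

```python
# Hypothetical sketch: round-robin dispatch over multiple compute
# pipelines, plus a scoreboard that gates retirement on completion.
from collections import defaultdict

PIPELINES = ("general-purpose", "sparse-0", "sparse-1", "near-data")

class Scoreboard:
    def __init__(self, commands):
        self.pending = set(commands)

    def complete(self, command):
        self.pending.discard(command)

    def can_retire(self):
        # The decoded instruction retires only once the whole set is done.
        return not self.pending

def schedule(commands):
    assignment = defaultdict(list)
    for i, command in enumerate(commands):
        assignment[PIPELINES[i % len(PIPELINES)]].append(command)
    return assignment

commands = [f"matmul-{i}" for i in range(6)]
board = Scoreboard(commands)
for pipeline, cmds in schedule(commands).items():
    for cmd in cmds:  # pretend each pipeline runs its commands to completion
        board.complete(cmd)
print("retire instruction:", board.can_retire())  # True
```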
Allowable Subject Matter
Claims 26-29 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 35-38, but for the rejection under 35 U.S.C. 101 above, would be objected to as being dependent upon a rejected base claim, and would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding claims 26-29 and 35-38, the prior art of record does not teach:
26. The accelerator of claim 25, third circuitry to schedule the set of pipeline commands to one or more of the multiple compute pipelines based on a characteristic of one or more of the multiple matrix multiply operations and/or input data associated with one or more of the multiple matrix multiply operations.
27. The accelerator of claim 26, third circuitry to schedule the set of pipeline commands to one of the general-purpose compute pipeline and/or one of the plurality of sparse compute pipelines based on a presence of a sparse matrix operation within the set of pipeline commands.
28. The accelerator of claim 27, third circuitry to schedule the set of pipeline commands to one of the plurality of sparse compute pipelines based on a sparsity characteristic of the input data associated with one or more of the multiple matrix multiply operations, wherein the plurality of sparse compute pipelines include a first sparse compute unit configured for input at a first level of sparsity and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.
29. The accelerator of claim 27, third circuitry to schedule the set of pipeline commands to one of the general-purpose compute pipeline, one of the plurality of sparse compute pipelines, and/or the near-data compute pipeline based on a memory access complexity of a workload to be performed by the set of pipeline commands.
35. The method of claim 34, comprising scheduling the set of pipeline commands to one or more of the multiple compute pipelines based on a characteristic of one or more of the multiple matrix multiply operations and/or input data associated with one or more of the multiple matrix multiply operations.
36. The method of claim 35, comprising scheduling the set of pipeline commands to one of the general-purpose compute pipeline and/or one of the plurality of sparse compute pipelines based on a presence of a sparse matrix operation within the set of pipeline commands.
37. The method of claim 36, comprising scheduling the set of pipeline commands to one of the plurality of sparse compute pipelines based on a sparsity characteristic of the input data associated with one or more of the multiple matrix multiply operations, wherein the plurality of sparse compute pipelines include a first sparse compute unit configured for input at a first level of sparsity and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.
38. The method of claim 36, comprising scheduling the set of pipeline commands to one of the general-purpose compute pipeline, one of the plurality of sparse compute pipelines, and/or the near-data compute pipeline based on a memory access complexity of a workload to be performed by the set of pipeline commands.
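For illustration only, the following is a minimal sketch of the kind of selection logic recited in claims 26-29 and 35-38; the sparsity measure, the two thresholds, and the memory-access-complexity score are all invented assumptions, not taken from the application:

```python
# Hypothetical pipeline-selection sketch: route a workload to a
# general-purpose, sparse (two sparsity levels), or near-data pipeline.
import numpy as np

SPARSITY_LEVEL_1 = 0.5  # first sparse compute unit (assumed threshold)
SPARSITY_LEVEL_2 = 0.9  # second unit, greater level of sparsity (assumed)

def select_pipeline(operand: np.ndarray, memory_access_complexity: float) -> str:
    sparsity = float(np.mean(operand == 0.0))  # fraction of zero elements
    if memory_access_complexity > 0.8:
        # Claims 29/38: memory-bound workloads go to near-data compute.
        return "near-data compute pipeline"
    if sparsity >= SPARSITY_LEVEL_2:
        return "sparse compute pipeline (second, higher-sparsity unit)"
    if sparsity >= SPARSITY_LEVEL_1:
        return "sparse compute pipeline (first, lower-sparsity unit)"
    return "general-purpose compute pipeline"

dense = np.ones((4, 4))
mostly_zero = np.zeros((4, 4))
mostly_zero[0, 0] = 1.0
print(select_pipeline(dense, 0.1))        # general-purpose compute pipeline
print(select_pipeline(mostly_zero, 0.1))  # sparse ... higher-sparsity unit
```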
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
ORION (US2008/0313435A1) discloses a data processing apparatus and method for executing complex instructions. The data processing apparatus executes instructions defining operations to be performed, those instructions including at least one complex instruction defining a sequence of operations. The apparatus comprises a plurality of execution pipelines, each having a plurality of pipeline stages and arranged to perform at least one associated operation. Issue circuitry interfaces with the plurality of execution pipelines and is used to schedule performance of the operations defined by the instructions. For the at least one complex instruction, the issue circuitry is arranged to schedule a first operation in the sequence and to issue control signals to one of the execution pipelines with which that first operation is associated, those control signals including an indication of each additional operation in the sequence. Then, when performance of the first operation reaches a predetermined pipeline stage in that execution pipeline, that stage is arranged to schedule the next operation in the sequence and to issue additional control signals to a further one of the execution pipelines with which that next operation is associated, causing that next operation to be performed. This has been found to provide a particularly efficient mechanism for executing complex instructions without dedicated execution pipelines for those instructions and without increasing the complexity of the issue circuitry.
LIU (US2017/0024849A1) discloses that convolution neural networks are able to be trained using a GPU and a CPU. To efficiently utilize a device's resources, the HetNet and HybNet approaches have been developed. The HetNet approach separates batches into partitions such that the GPU and CPU process separate batches. The HybNet approach separates the layers of a convolution neural network between the GPU and CPU.
PURI (US2007/0047802A1) discloses a convolutional neural network implemented on a graphics processing unit. The network is trained through a series of forward and backward passes, with convolutional kernels and bias matrices modified on each backward pass according to a gradient of an error function. The implementation takes advantage of the parallel processing capabilities of pixel shader units on a GPU and utilizes a set of start-to-finish formulas to program the computations on the pixel shaders. Input and output to the program is done through textures, and a multi-pass summation process is used when sums are needed across pixel shader unit registers.
RAO (US2015/0187040A1) discloses a method and system for an optimization technique on two aspects of thread scheduling and dispatch when the driver is allowed to pick the scheduling attributes. The techniques rely on an enhanced GPGPU Walker hardware command and one-dimensional local identification generation to maximize thread residency.
CHUNG (US2018/0247185A1) discloses processors and methods for neural network processing. A method is provided in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit. The method includes decoding a chain of instructions received via an input queue, where the chain comprises a first instruction that can only be processed by the matrix vector unit and a sequence of instructions that can only be processed by a multifunction unit. The method includes processing the first instruction using the MVU and processing each instruction in the sequence depending upon its position in the sequence.
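For illustration only (the operations and names below are invented, not taken from CHUNG), a minimal sketch of a chained pipeline in the spirit of CHUNG's summary above, in which a matrix vector unit feeds up to three multifunction units in sequence:

```python
# Hypothetical sketch: an MVU followed by up to three chained
# multifunction units (MFUs), each applied by position in the chain.
import numpy as np

def mvu(matrix, vector):
    # First instruction: matrix-vector multiply on the MVU.
    return matrix @ vector

MFU_OPS = {"relu": lambda x: np.maximum(x, 0.0),
           "scale2": lambda x: 2.0 * x,
           "negate": lambda x: -x}

def run_chain(matrix, vector, mfu_sequence):
    x = mvu(matrix, vector)
    # Each following instruction is handled by the multifunction unit
    # whose position in the pipeline matches its position in the chain.
    for op in mfu_sequence[:3]:  # at most three multifunction units
        x = MFU_OPS[op](x)
    return x

M = np.array([[1.0, -1.0], [0.0, 2.0]])
v = np.array([3.0, 1.0])
print(run_chain(M, v, ["relu", "scale2"]))  # -> [4. 4.]
```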
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAURICE L MCDOWELL, JR whose telephone number is (571)270-3707. The examiner can normally be reached Mon-Thur & Sat: 2pm-10pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Said A. Broome can be reached at 571-272-2931. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MAURICE L. MCDOWELL, JR/Primary Examiner, Art Unit 2612