Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Drawings
Figures 26-29 should be designated by a legend such as --Prior Art-- because only that which is old is illustrated. See MPEP § 608.02(g). Corrected drawings in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. The replacement sheet(s) should be labeled “Replacement Sheet” in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion of the drawing figures. If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3-5, and 7-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to patent ineligible subject matter.
To determine if a claim is directed to patent ineligible subject matter, the Court has guided the Office to apply the Alice/Mayo test, which requires:
1. Determining if the claim falls within a statutory category;
2A. Determining if the claim is directed to a patent ineligible judicial exception consisting of a law of nature, a natural phenomenon, or abstract idea; and
2B. If the claim is directed to a judicial exception, determining if the claim recites limitations or elements that amount to significantly more than the judicial exception.
(See MPEP 2106).
Step 2A is a two-prong inquiry (see MPEP 2106.04(II)(A)). Under the first prong, examiners evaluate whether a law of nature, natural phenomenon, or abstract idea is set forth or described in the claim. Abstract ideas include mathematical concepts, certain methods of organizing human activity, and mental processes (see MPEP 2106.04(a)(2)). The second prong is an inquiry into whether the claim integrates a judicial exception into a practical application. See MPEP 2106.04(d).
During examination, examiners should apply the same eligibility analysis to all claims regardless of the number of exceptions recited therein. Unless it is clear that a claim recites distinct exceptions, such as a law of nature and an abstract idea, care should be taken not to parse the claim into multiple exceptions, particularly in claims involving abstract ideas. Accordingly, if possible, examiners should treat the claim for Prong Two and Step 2B purposes as containing a single judicial exception. See MPEP 2106.04(II)(B).
With respect to claim 1, applying step 1, the preamble of claim 1 claims a device, so this claim falls within the statutory category of a machine. In order to apply step 2A, a recitation of claim 1 is copied below. The limitations of the claim that describe an abstract idea are identified in the analysis that follows.
A program control device comprising:
a memory configured to store instructions; and
a processor configured to execute the instructions to:
convert multiple programs to be simultaneously executed by an accelerator into intermediate representations indicating computation operations to be executed by programs and memory information indicating an amount of memory required by data used in the computation operations, respectively; and
determine an execution order of multiple intermediate representations so that an amount of memory usage used by the accelerator when the accelerator executes the multiple programs simultaneously is below a threshold value of the memory based on the converted multiple intermediate representations and multiple memory information.
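For illustration only, and not as part of the claim or the record, the recited steps of converting programs into intermediate representations carrying memory information and determining an execution order that keeps accelerator memory usage below a threshold could be sketched as follows; all names and the greedy ordering policy are hypothetical.

```python
# Hypothetical sketch: convert programs to intermediate representations (IRs)
# with per-operation memory requirements, then pick an execution order that
# keeps the memory in use at any step below a threshold.

def convert(program):
    """Convert a 'program' (a list of (op_name, bytes_needed) tuples) into an
    IR: the operation list plus memory information for each operation."""
    ops = [name for name, _ in program]
    mem_info = [mem for _, mem in program]
    return ops, mem_info

def determine_order(irs, threshold):
    """At each step, admit the next pending op of each program (smallest
    memory need first) while the combined memory of the step stays at or
    below the threshold; admitted ops are deemed to run simultaneously."""
    cursors = [0] * len(irs)
    schedule = []  # each step is a list of (program_index, op_name)
    while any(c < len(ops) for c, (ops, _) in zip(cursors, irs)):
        step, used = [], 0
        candidates = sorted(
            (mem[cursors[i]], i)
            for i, (ops, mem) in enumerate(irs)
            if cursors[i] < len(ops)
        )
        for need, i in candidates:
            if used + need <= threshold:
                ops, _ = irs[i]
                step.append((i, ops[cursors[i]]))
                used += need
                cursors[i] += 1
        if not step:
            raise RuntimeError("smallest pending op exceeds the threshold")
        schedule.append(step)
    return schedule
```

In this sketch, two programs' operations are interleaved whenever their combined working sets fit under the threshold and serialized otherwise, which is one way (of many) to read the "below a threshold value of the memory" limitation.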
Under step 2A prong one, the limitations “convert multiple programs to be simultaneously executed by an accelerator into intermediate representations indicating computation operations to be executed by programs and memory information indicating an amount of memory required by data used in the computation operations, respectively” and “determine an execution order of multiple intermediate representations so that an amount of memory usage used by the accelerator when the accelerator executes the multiple programs simultaneously is below a threshold value of the memory based on the converted multiple intermediate representations and multiple memory information” are of a kind the courts have determined to be mental processes, because such language (i.e., converting, determining, and analyzing) can practically be performed in the human mind. For example:
A claim to “collecting information, analyzing it, and displaying certain results of the collection and analysis,” where the data analysis steps are recited at a high level of generality such that they could practically be performed in the human mind – Electric Power Group v. Alstom, S.A., 830 F.3d 1350, 1353-54, 119 USPQ2d 1739, 1741-42 (Fed. Cir. 2016);
An application program interface for extracting and processing information from a diversity of types of hard copy documents – Content Extraction, 776 F.3d at 1345, 113 USPQ2d at 1356.
(see MPEP 2106.04(a)(2)(III)).
Under step 2A prong two, the judicial exception has not been integrated into a practical application. Here, the claim recites two additional limitations: “a memory configured to store instructions” and “a processor configured to execute the instructions.” These additional limitations fall within the category of “apply it” because they are general purpose computer elements used in a standard capacity. See MPEP 2106.04(d), referencing MPEP 2106.05(f).
If the claim as a whole integrates the recited judicial exception into a practical application, then it would be patent eligible. Here, the claim is generally linked to accelerating code execution (see Specification [0003]-[0005]), but the claim does not provide any specific details of any form of parallel processing or accelerated code execution. Stated at this level of generality and when the claim is considered as a whole, merely describing steps that can result in improved code execution does not incorporate the judicial exception into a practical application.
Moving on to step 2B of the analysis, Examiner must consider whether each claim limitation, individually or as an ordered combination, amounts to significantly more than the abstract idea. This analysis includes determining whether an inventive concept is furnished by an element, or a combination of elements, that is beyond the judicial exception. For limitations that were categorized as “apply it” or as generally linking the use of the abstract idea to a particular technological environment or field of use, the analysis is the same as under Prong Two. Any limitations determined to be insignificant extra-solution activity would require further analysis; none were identified here.
The only additional limitations are not significantly more because they are “apply it” limitations, (see MPEP 2106.05(f) – “Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application or provide significantly more”).
After reviewing the specification and looking at the claim as a whole, there is no indication that this claim is directed to a combination of known elements arranged or organized in an unconventional manner. As drafted and under a broadest reasonable interpretation, generic computer components may be arranged in the conventional manner to perform the claimed limitations. Therefore, looking at the claim as an ordered combination, the limitations do not amount to significantly more than the abstract idea.
For the foregoing reasons, claim 1 is rejected under 35 U.S.C. 101 as being directed to patent ineligible subject matter.
With respect to claim 3, the invention is rejected for being directed to an abstract idea without significantly more. The claim recites: wherein the processor is further configured to execute the instructions to: include in the multiple intermediate representations a process of saving data used in the computation operations to memory used by a computing device other than the accelerator. The limitation modifies the “multiple intermediate representations”, which means it is a further limitation on the “convert...” and “determine...” limitations of the independent claim. In that context, placing further limitations on what is converted does not change the nature of the limitation. To the extent that the limitation implies data will be saved when the intermediate representation is eventually executed, that interpretation of the limitation would be an “apply it” limitation (see MPEP 2106.05(f) – “Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application or provide significantly more”). This judicial exception is not integrated into a practical application because there are no additional limitations outside of the abstract idea other than what has already been considered. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the same reason. For the foregoing reasons, claim 3 is rejected under 35 U.S.C. 101 as being directed to patent ineligible subject matter.
With respect to claim 4, the invention is rejected for being directed to an abstract idea without significantly more. The claim recites: wherein the memory information includes a range of the computation operations in which data is used. The limitation modifies the “memory information”, which means it is a further limitation on the “convert...” and “determine...” limitations of the independent claim. In that context, placing further limitations on what is converted does not change the nature of the limitation. This judicial exception is not integrated into a practical application because there are no additional limitations outside of the abstract idea other than what has already been considered. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the same reason. For the foregoing reasons, claim 4 is rejected under 35 U.S.C. 101 as being directed to patent ineligible subject matter.
With respect to claim 5, applying step 1, the preamble of claim 5 claims a method so this claim falls within the statutory category of a process.
Regarding the rest of the analysis, and incorporating the rejection of claim 1, claim 5 is rejected for a substantially similar rationale.
With respect to claim 7, incorporating the rejections of claim 5 and claim 3, claim 7 is rejected for a substantially similar rationale.
Claims 8-10 are rejected under 35 U.S.C. 101 because they do not conform to one of the four enumerated statutory classes. The current patent eligibility test is described in the MPEP (see MPEP 2106). Under step 1, Examiner must consider whether the claim is directed to a process, machine, manufacture, composition of matter, or an improvement thereof. Here, claims 8-10 are directed to a computer-readable recording medium, which covers media such as signals, including data per se and software per se (see MPEP 2106.03). Because these media are not patent eligible under 35 U.S.C. 101, these claims are rejected under 35 U.S.C. 101 for being patent ineligible. Note that step 1 of the patent eligibility test is easily traversed by adding the term “non-transitory” to the claim (i.e., a non-transitory computer-readable medium). The specification ([0109]) provides support for this amendment.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-10 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “CASE: a compiler-assisted SchEduling framework for multi-GPU systems” (Chen).
With respect to claim 1, Chen teaches A program control device comprising (Chameleon node, [page 24 col 2 paragraph 3 lines 2-3]; solving the Motivating Example problem, [page 18 col 1 paragraph 4]-[page 18 col 2 paragraph 1]; where FIG. 1(c) shows the system shared by App1 & App 2, [page 19]; using framework shown in FIG. 2, [page 20]): a memory configured to store instructions (128GB DRAM, [page 23 col 2 paragraph 3 lines 3-4]; storing the instructions shown graphically as the program framework shown in FIG. 2, [page 20]); and a processor configured to execute the instructions to (Intel Xeon E5-2670 CPU, [page 23 col 2 paragraph 3 line 3], executing the instructions shown graphically as the program framework shown in FIG. 2, [page 20]): convert multiple programs to be simultaneously executed by an accelerator into intermediate representations indicating computation operations to be executed by programs (see FIG. 2, the "compiler pass" applied to App 1 through App k, [page 20]; CASE leverages a compiler pass, coupled with the lazy runtime, to construct GPU tasks... . It works on the LLVM IR of applications, [page 20 col 2 paragraph 3 lines 1-3]; note the term IR stands for "intermediate representation" and the accelerator is the GPU; an example IR is shown in FIG. 4, where each line in FIG. 4 is a computation operation to be executed by programs, [page 21]; see also FIG. 1 for context, [page 19]) and memory information indicating an amount of memory required by data used in the computation operations, respectively (see FIG. 2, the "compiler pass", [page 20]; CASE leverages a compiler pass, coupled with the lazy runtime, to ... and gather their resource requirements, [page 20 col 2 paragraph 3 lines 1-3]); and determine an execution order of multiple intermediate representations (see FIG. 2, the "scheduler", [page 20], A user-level scheduler is deployed to place GPU tasks on appropriate devices based on their resource requirements, [page 23 col 1 paragraph 1 lines 1-2]; the kernels within tasks are ordered, see the example of “two successive GPU kernels”, [page 21 col 2 paragraph 2 lines 1-8]) so that an amount of memory usage used by the accelerator when the accelerator executes the multiple programs simultaneously is below a threshold value of the memory ("based on their resource requirements", [page 23 col 1 paragraph 1 line 1]; previously defined as maximum heap memory size: "maximum heap memory size used by dynamic memory allocations inside a GPU is either statically bound to a CUDA task or dynamically intercepted and bound by the lazy runtime by analyzing the call cudaDeviceSetLimit", [page 22 col 2 paragraph 1 lines 6-10]) based on the converted multiple intermediate representations and multiple memory information (The aforementioned compiler pass will automatically insert it at the beginning of each GPU task, and feed it with appropriate parameters, which contain the details about the resources required by the task, including the number of blocks, the threads per block, the total memory size, [page 23 col 1 paragraph 1 lines 7-11]).
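As a hypothetical sketch (not drawn from the reference's actual code) of the probe/scheduler interaction Chen describes, in which a compiler-inserted probe reports each task's memory requirement and a user-level scheduler returns the ID of a device with sufficient free memory:

```python
# Hypothetical sketch of a user-level scheduler placing GPU tasks on devices
# based on resource requirements conveyed by compiler-inserted probes.
# All names (Scheduler, task_begin, task_end) are illustrative only.

class Scheduler:
    def __init__(self, device_memory):
        # device_memory: free bytes per GPU, e.g. [16_000_000, 16_000_000]
        self.free = list(device_memory)

    def task_begin(self, mem_needed):
        """Probe entry point: reserve memory on the first device that can
        hold the task's working set and return its device ID."""
        for dev_id, free in enumerate(self.free):
            if free >= mem_needed:
                self.free[dev_id] -= mem_needed
                return dev_id  # device ID returned to the application
        return None  # no device currently fits; the task must wait

    def task_end(self, dev_id, mem_needed):
        """Release the reservation when the task finishes."""
        self.free[dev_id] += mem_needed
```

The application would then direct the task's kernels to the returned device via the GPU runtime, which is the placement behavior the cited passages attribute to CASE.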
With respect to claim 2, Chen teaches all of the limitations of claim 1, as noted above. Chen further teaches further comprising: the accelerator which simultaneously executes the multiple programs by executing the multiple intermediate representations according to a determined execution order (NVIDIA P100s, [page 24 col 2 paragraph 3 line 4], which are the GPUs with the CUDA runtime shown in FIG. 2, [page 20]; The device ID is returned to the application by task_begin, and then the application directs the following GPU task to that device via specific mechanisms provided by the GPU runtime system, [page 23 col 1 paragraph 1 lines 17-20], here the GPU is called “the device”; the kernels within tasks are ordered, see the example of “two successive GPU kernels”, [page 21 col 2 paragraph 2 lines 1-8]).
With respect to claim 3, Chen teaches all of the limitations of claim 1, as noted above. Chen further teaches wherein the processor is further configured to execute the instructions to: include in the multiple intermediate representations a process of saving data used in the computation operations to memory used by a computing device other than the accelerator (example program shown in FIG. 3, the “main” function is running on the CPU, see lines 7-8; the initialized variables get copied to the GPU/accelerator memory at lines 27-28; and then the result gets copied back to main memory at line 35 into variable C, so line 35 of FIG. 3 teaches the limitation, [page 21]).
With respect to claim 4, Chen teaches all of the limitations of claim 1, as noted above. Chen further teaches wherein the memory information includes a range of the computation operations in which data is used (memory information is the resource requirements for a GPU task: “constructs GPU tasks and instruments applications with one probe per task. At runtime, the probes convey tasks’ resource requirements to the scheduler before they execute”, [page 20 col 2 paragraph 1 lines 1-4]; and the “range of computation operations” is all of the computation operations required for that task, with FIG. 4 showing each computation operation in the range from 1-11, [page 21]).
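To illustrate what a “range of the computation operations in which data is used” can mean over an IR's operation list, the following hypothetical first-use/last-use (liveness) computation is offered; the representation of operations as (name, data-used) pairs is an assumption for illustration only.

```python
def live_ranges(ir_ops):
    """For each datum, compute the index range (first_use, last_use) over a
    list of IR operations, where each operation is (op_name, [data_used]).
    The range is the span of computation operations in which the datum's
    memory must remain allocated."""
    ranges = {}
    for idx, (_, data) in enumerate(ir_ops):
        for d in data:
            first, _ = ranges.get(d, (idx, idx))
            ranges[d] = (first, idx)
    return ranges
```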
With respect to claim 5, Chen teaches A program control method comprising (CASE constructs GPU tasks from CUDA programs and instruments the code with a probe before each one. At runtime, each probe conveys information about its task’s resource requirements such as memory and the number of streaming multiprocessor (SMs) needed to a user-level scheduler. The scheduler then places each task onto a suitable device by employing a policy appropriate to the system, [Abstract lines 12-20]; solving the Motivating Example problem, [page 18 col 1 paragraph 4]-[page 18 col 2 paragraph 1]; where FIG. 1(c) shows the system shared by App1 & App 2, [page 19]; using framework shown in FIG. 2, [page 20]).
Regarding the rest of claim 5, incorporating the rejection of claim 1, claim 5 is rejected for a substantially similar rationale.
With respect to claim 6, incorporating the rejections of claim 2 and claim 5, claim 6 is rejected for a substantially similar rationale.
With respect to claim 7, incorporating the rejections of claim 3 and claim 5, claim 7 is rejected for a substantially similar rationale.
With respect to claim 8, Chen teaches A computer-readable recording medium recording a program control program causing an accelerator to execute (128GB DRAM, [page 23 col 2 paragraph 3 lines 3-4]; storing the instructions shown graphically as the program framework shown in FIG. 2, [page 20]; executing the instructions shown graphically as the program framework shown in FIG. 2, [page 20], which causes an accelerator/GPU to execute).
Regarding the rest of claim 8, incorporating the rejection of claim 1, claim 8 is rejected for a substantially similar rationale.
With respect to claim 9, incorporating the rejections of claim 2 and claim 8, claim 9 is rejected for a substantially similar rationale.
With respect to claim 10, incorporating the rejections of claim 3 and claim 8, claim 10 is rejected for a substantially similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20190324810 A1 (Zhao) - A method of scheduling a dedicated processing resource includes: obtaining source code of an application to be compiled; extracting, during compiling of the source code, metadata associated with the application, the metadata indicating an amount of the dedicated processing resource required by the application; and obtaining, based on the metadata, the dedicated processing resource allocated to the application. In this manner, performance of the dedicated processing resource scheduling system and resource utilization is improved, [Abstract].
US 20150221059 A1 (Baker) - A method and system for a command processor for efficient processing of a program in a multi-processor core system with a CPU and GPU. The multi-core system includes a general purpose CPU executing commands in a CPU programming language and a graphic processing unit (GPU) executing commands in a GPU programming language. A command processor is coupled to the CPU and GPU. The command processor sequences jobs from a program for processing by the CPU or the GPU. The command processor creates commands from the jobs in a state free command format. The command processor generates a sequence of commands for execution by either the CPU or the GPU in the command format. A compiler running a meta language converts program data for the commands into a first format readable by the CPU programming language and a second format readable by the GPU programming language, [Abstract].
US 9,477,525 B2 (Munshi) - A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task, [Abstract].
US 10,831,547 B2 (Suzuki) - An accelerator control apparatus includes: a task storage part which holds an executable task(s); a data scheduler which selects a task needing a relatively small input/output data amount on a memory included in an accelerator when the task is executed by the accelerator from the executable task(s) and instructs the accelerator to prepare for data I/O on the memory for the selected task; and a task scheduler which instructs the accelerator to execute the selected task and adds a task that becomes executable upon completion of the selected task to the task storage part, wherein the data scheduler continues, depending on a use status of the memory, selection of a next task from the executable task(s) held in the task storage part and preparation of data I/O for the next task selected, [Abstract].
“Using an Intermediate Representation to map workloads on heterogeneous parallel systems” (2016-Benoit) - Current trends in computer architecture show that we are aiming toward more cores and even more heterogeneity. As an extensive knowledge of processor’s internals cannot be a prerequisite to their programming and for the sake of portability, these systems necessitate the compilation flow to evolve and cope with heterogeneity issues. This is even more so true for embedded systems. In this paper, we show how to start from a specific Intermediate Representation (IR) to explore the configuration space of program-part mappings on heterogeneous targets thanks to a GCC plugin. Then we define a simple execution model with predictable performance and use execution time from generated elementary kernels to predict the performance of the whole application. We show a very good accuracy of this framework to predict real world performance on experimental results and how we can use it to choose the best parallel configuration, [Abstract].
“TIRAMISU: A Polyhedral Compiler for Expressing Fast and Portable Code” (2018-Baghdadi) - This paper introduces TIRAMISU, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. TIRAMISU introduces a scheduling language with novel commands to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. TIRAMISU has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control of optimizations. TIRAMISU uses a four-level intermediate representation that allows full separation between the algorithms, loop transformations, data layouts, and communication. This separation simplifies targeting multiple hardware architectures with the same algorithm. We evaluate TIRAMISU by writing a set of image processing, deep learning, and linear algebra benchmarks and compare them with state-of-the-art compilers and hand-tuned libraries. We show that TIRAMISU matches or outperforms existing compilers and libraries on different hardware architectures, including multicore CPUs, GPUs, and distributed machines, [Abstract].
“A Compiler Optimization Framework For Directive-Based GPU Computing” (2016-Tian) - In Figure 3.2(b), unstructured data directives and a kernels compute construct are used. This code defines arrays to be allocated in the device memory for the remaining duration of the program, or until an exit data directive that deallocates the data is encountered. They also tell whether data should be copied from the host to the device memory at the enter data directive, and copied from the device to host memory at the exit data directive. The dynamic range of the program between the enter data directive and the matching exit data directive is the data lifetime for that data, [page 29 paragraph 2 lines 1-8]; and FIG. 3.2:
[image of FIG. 3.2], [page 30].
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL MILLER whose telephone number is (408) 918-7548. The examiner can normally be reached on Monday-Friday from 11am to 5pm (PT).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kevin Young, can be reached at telephone number (571) 270-3180. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from Patent Center and the Private Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from Patent Center or Private PAIR. Status information for unpublished applications is available through Patent Center and Private PAIR to authorized users only. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) Form at https://www.uspto.gov/patents/uspto-automated-interview-request-air-form.
/D.M./Examiner, Art Unit 2187
/KEVIN L YOUNG/Supervisory Patent Examiner, Art Unit 2194