DETAILED ACTION
Authorization for Internet Communications
The examiner encourages Applicant to submit an authorization to communicate with the examiner via the Internet by making the following statement (from MPEP 502.03):
“Recognizing that Internet communications are not secure, I hereby authorize the USPTO to communicate with the undersigned and practitioners in accordance with 37 CFR 1.33 and 37 CFR 1.34 concerning any subject matter of this application by video conferencing, instant messaging, or electronic mail. I understand that a copy of these communications will be made of record in the application file.”
Please note that the above statement can only be submitted via Central Fax, regular postal mail, or EFS Web (PTO/SB/439).
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Examiner Notes
Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-7 and 16-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding claim 1, the recited limitations “executing a plurality of at least partially concurrent workloads by multiple compute engines of a parallel processor” and “based at least in part on the executing of the at least partially concurrent workloads” do not appear in Applicant’s specification. Paragraph [0014] of the specification mentions executing workloads concurrently but does not describe the particulars of the claimed limitations.
Regarding claim 16, the recited limitation “allocate computing resources of the parallel processor for use in execution of a plurality of at least partially concurrent workloads by multiple compute engines of the parallel processor” does not appear in Applicant’s specification. Paragraph [0014] of the specification mentions executing workloads concurrently but does not describe the particulars of the claimed limitation.
Claims 2-7 and 17-20 are rejected based on their dependency on the rejected independent claims.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-7 and 16-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claims 1 and 16, the claims recite “partially concurrent workloads” and “based at least in part on the execution of the plurality of workloads,” and it is not clear what these phrases mean, in particular what is considered a “part” and what is considered “partially.” Are the concurrent workloads able to be divided into parts? If so, into how many parts? The specification does not address this, and therefore the metes and bounds of the claim limitations are not clear.
Claims 2-7 and 17-20 are rejected based on their dependency on the rejected independent claims.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-9 and 11-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Regarding claim 1, this part of the eligibility analysis evaluates whether the claim falls within any statutory category. MPEP §2106.03. The claim recites a method and is therefore directed to a process, which is one of the statutory categories of invention.
Step 2A Prong 1: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04(II) and the October 2019 Update, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim.
The limitations “calculating, at least one priority inversion heuristic indicating a priority inversion between one or more higher priority workloads and lower priority workloads of the parallel processor; and based on the at least one priority inversion heuristic,” as drafted, recite functions that, under their broadest reasonable interpretation, could reasonably be performed in the mind, including with the aid of pen and paper, but for the recitation of generic computer components. That is, the limitations as drafted recite the abstract idea of a mental process: they encompass a human mind carrying out the functions through observation, evaluation, judgment, and/or opinion, or with the aid of pen and paper. Thus, these limitations recite and fall within the “Mental Processes” grouping of abstract ideas. See MPEP §2106.04(a)(2). Accordingly, claim 1 recites a judicial exception (i.e., an abstract idea).
Step 2A, Prong 2: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. This evaluation is performed by (a) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (b) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. 2019 PEG Section III(A)(2), 84 Fed. Reg. at 54-55.
In this case, the judicial exception is not integrated into a practical application. The claim recites the additional elements “resource allocation circuitry of a parallel processor” and “scheduling circuitry of the parallel processor,” which are recited at such a high level of generality that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the recited judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea, and the claim is therefore directed to the judicial exception. See MPEP 2106.05(f).
The claim also includes the additional elements “issuing a signal to prevent allocation of at least one compute resource of the parallel processor for processing the lower priority workloads” and “executing a plurality of at least partially concurrent workloads by multiple compute engines of a parallel processor, the workloads comprising multiple higher priority workloads and multiple lower priority workloads.”
These additional elements do not integrate the judicial exception into a practical application, because they amount to no more than insignificant extra-solution activity of data input and output. Data input and output are considered well-understood, routine, and conventional activities. See MPEP 2106.05(g).
Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. MPEP 2106.05.
The claim includes the additional elements “issuing a signal to prevent allocation of at least one compute resource of the parallel processor for processing the lower priority workloads” and “executing a plurality of at least partially concurrent workloads by multiple compute engines of a parallel processor, the workloads comprising multiple higher priority workloads and multiple lower priority workloads.”
These additional elements do not add an inventive concept to the claim, because they amount to no more than insignificant extra-solution activity of data input and output. Data input and output are considered well-understood, routine, and conventional activities. See MPEP 2106.05(g).
Claim 2 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “responsive to a release condition, allowing allocation of the at least one compute resource for processing the lower priority workloads, wherein the release condition includes at least one of expiry of a timer, a reset of the parallel processor, or a quantity of the higher priority workloads being below a threshold” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
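For illustration only, and not as a construction of the claim or a description of the Applicant’s disclosure, the release condition recited in claim 2 can be modeled as a simple disjunction, since the claim requires only “at least one of” the listed events. The following Python sketch is a hypothetical reading; the names `ReleaseState`, `release_condition_met`, and `threshold` are assumptions not drawn from the specification or the cited references.

```python
from dataclasses import dataclass


@dataclass
class ReleaseState:
    """Hypothetical snapshot of the state consulted by the claimed release condition."""
    timer_expired: bool
    processor_reset: bool
    pending_high_priority: int  # quantity of higher priority workloads


def release_condition_met(state: ReleaseState, threshold: int) -> bool:
    """Return True if any one of the three recited release conditions holds.

    Claim 2 recites "at least one of" the conditions, so the checks are
    joined with `or` and any single condition is sufficient.
    """
    return (
        state.timer_expired
        or state.processor_reset
        or state.pending_high_priority < threshold
    )


# Example: no timer expiry and no reset, but only 2 higher priority workloads
# remain against a hypothetical threshold of 4, so allocation may resume.
print(release_condition_met(ReleaseState(False, False, 2), threshold=4))
```

Because the conditions are joined by `or`, any single event suffices under this reading to allow allocation for the lower priority workloads again.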
Claim 3 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the at least one priority inversion heuristic comprises one or more of a quantity of incoming higher priority workloads in a queue of the parallel processor, a quantity of in-flight higher priority workloads in at least one pipeline of the parallel processor, a quantity of incoming lower priority workloads in the at least one pipeline, a quantity of render targets of the in-flight higher priority workloads and the incoming lower priority workloads, or one or more ratios of at least a subset of the lower priority workloads to at least a subset of the higher priority workloads” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
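As an illustrative aid only, the heuristics listed in claim 3 are simple counts plus a ratio. The Python sketch below is a hypothetical reading; `PipelineSnapshot`, `priority_inversion_heuristics`, and the particular ratio chosen (incoming lower priority workloads over all queued and in-flight higher priority workloads) are assumptions, and the claim permits other subsets for the ratio.

```python
from dataclasses import dataclass


@dataclass
class PipelineSnapshot:
    """Hypothetical counts corresponding to the quantities listed in claim 3."""
    incoming_high_in_queue: int
    in_flight_high_in_pipeline: int
    incoming_low_in_pipeline: int
    render_targets: int


def priority_inversion_heuristics(s: PipelineSnapshot) -> dict[str, float]:
    """Collect the claimed quantities plus one example low-to-high ratio."""
    high_total = s.incoming_high_in_queue + s.in_flight_high_in_pipeline
    # One possible ratio of a subset of lower priority workloads to a subset
    # of higher priority workloads; guard against division by zero when no
    # higher priority work is present.
    ratio = s.incoming_low_in_pipeline / high_total if high_total else 0.0
    return {
        "incoming_high_in_queue": s.incoming_high_in_queue,
        "in_flight_high": s.in_flight_high_in_pipeline,
        "incoming_low": s.incoming_low_in_pipeline,
        "render_targets": s.render_targets,
        "low_to_high_ratio": ratio,
    }


# Example: 12 incoming lower priority workloads against 4 higher priority ones.
print(priority_inversion_heuristics(PipelineSnapshot(2, 2, 12, 3))["low_to_high_ratio"])
```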
Claim 4 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “further comprising: initiating a timer responsive to determining that allocation for a higher priority workload has failed; selectively preventing, responsive to determining that allocation for the higher priority workload is unsuccessful throughout a timer period between initiation of the timer and expiry of the timer, allocation of the at least one compute resource of the parallel processor for processing the lower priority workloads” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
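For illustration only, the timer limitation of claim 4 might be read as follows: the first failed allocation for a higher priority workload starts the timer, a success clears it, and lower priority allocation is prevented only if failures persist for the full timer period. This Python sketch is hypothetical; `StarvationTimer`, its `period` measured in abstract clock ticks, and the reset-on-success behavior are assumptions, not the Applicant’s implementation.

```python
from typing import Optional


class StarvationTimer:
    """Hypothetical model of the claim 4 timer, counted in abstract clock ticks."""

    def __init__(self, period: int) -> None:
        self.period = period
        self.start_tick: Optional[int] = None  # None means the timer is not running

    def on_high_priority_allocation(self, tick: int, succeeded: bool) -> bool:
        """Record one allocation attempt for a higher priority workload.

        Returns True if allocation for lower priority workloads should be
        prevented at this tick.
        """
        if succeeded:
            self.start_tick = None  # a success ends the failure window
            return False
        if self.start_tick is None:
            self.start_tick = tick  # first failure initiates the timer
        # Prevent only once failures have persisted for the whole timer period.
        return tick - self.start_tick >= self.period


timer = StarvationTimer(period=3)
print(timer.on_high_priority_allocation(0, succeeded=False))  # timer starts
print(timer.on_high_priority_allocation(3, succeeded=False))  # period elapsed
```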
Claim 5 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein selectively preventing allocation of the at least one compute resource is further responsive to successful allocation for at least one lower priority workload” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 6 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the higher priority workloads are geometry workloads and the lower priority workloads are pixel workloads” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 7 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the higher priority workloads are asynchronous compute workloads and the lower priority workloads are graphics workloads” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 8 is an independent processor claim rejected for the same reasons as claim 1. In particular, the claim recites the additional elements “a graphics engine,” “graphics queue,” and “shader engines.” The graphics engine, graphics queue, and shader engines are recited at a high level of generality (i.e., as a generic engine, queue, and shader engine) such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Claim 9 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the higher priority workloads are geometry workloads and the lower priority workloads are pixel workloads” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 11 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the priority inversion heuristics include at least one of a quantity of incoming higher priority workloads in the graphics queue, a quantity of in-flight higher priority workloads in the at least one graphics pipeline, a quantity of incoming lower priority workloads in the at least one graphics pipeline, a quantity of render targets of the in-flight higher priority workloads and the incoming lower priority workloads, and one or more ratios of at least a subset of the lower priority workloads to at least a subset of the higher priority workloads” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 12 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the resource allocation circuitry is further configured to allow, responsive to a release condition, allocation of the at least one compute resource for processing the lower priority workloads, wherein the release condition includes at least one of expiry of a timer, a reset of the parallel processor, or a quantity of the higher priority workloads being below a threshold” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 13 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the resource allocation circuitry is further configured to: initiate a timer responsive to determining that allocation for a higher priority workload has failed, wherein the resource allocation circuitry is configured to prevent allocation of the at least one compute resource responsive to determining that allocation for the higher priority workload is unsuccessful throughout a timer period between initiation of the timer and expiry of the timer” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 14 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the resource allocation circuitry is further configured to prevent allocation of the at least one compute resource responsive to determining that the at least one priority inversion heuristic exceeds a corresponding threshold” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 15 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “allow allocation of the at least one compute resource responsive to determining that there are less than a predetermined quantity of lower priority workloads in the graphics queue, regardless of whether the at least one priority inversion heuristic exceeds the corresponding threshold” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
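By way of illustration only, claims 14 and 15 together describe a threshold test with an override. In the hypothetical Python sketch below, `allow_low_priority_allocation`, `min_low`, and the use of a single scalar heuristic are assumptions chosen for clarity; the sketch is not offered as the Applicant’s implementation or as a construction of either claim.

```python
def allow_low_priority_allocation(heuristic: float, threshold: float,
                                  low_in_queue: int, min_low: int) -> bool:
    """Decide whether a compute resource may be allocated to lower priority work.

    Claim 15 override (as read here): if fewer than `min_low` lower priority
    workloads are queued, allow allocation regardless of the heuristic.
    Claim 14 test (as read here): otherwise, prevent allocation when the
    heuristic exceeds its corresponding threshold.
    """
    if low_in_queue < min_low:
        return True
    return heuristic <= threshold


# The heuristic exceeds its threshold, but only one lower priority workload
# is queued, so the override still allows allocation.
print(allow_low_priority_allocation(5.0, 3.0, low_in_queue=1, min_low=2))
```

Placing the override check first mirrors the claim 15 phrase “regardless of whether the at least one priority inversion heuristic exceeds the corresponding threshold.”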
Claim 16 is rejected for the same reasons as claim 1. In particular, the claim recites the additional element “a queue in at least one pipeline.” The queue in at least one pipeline is recited at a high level of generality (i.e., as a generic queue) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim also recites the additional elements “allocate computing resources of the parallel processor for use in execution of a plurality of at least partially concurrent workloads by multiple compute engines of the parallel processor, the workloads comprising multiple higher priority workloads and multiple lower priority workloads; calculate, based at least in part on the execution of the plurality of workloads”.
These additional elements do not integrate the judicial exception into a practical application, because they amount to no more than insignificant extra-solution activity of data input and output. Data input and output are considered well-understood, routine, and conventional activities. See MPEP 2106.05(g).
Claim 17 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the priority inversion heuristics include at least one of a quantity of incoming higher priority workloads in a queue, a quantity of in-flight higher priority workloads in at least one pipeline, a quantity of incoming lower priority workloads in the at least one pipeline, a quantity of render targets of the in-flight higher priority workloads and the incoming lower priority workloads, and one or more ratios of at least a subset of the lower priority workloads to at least a subset of the higher priority workloads” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 18 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the resource allocator is further configured to: initiate a timer responsive to determining that allocation for a higher priority workload has failed, and issue the signal to the scheduling circuitry responsive to determining that allocation for the higher priority workload is unsuccessful throughout a timer period between initiation of the timer and expiry of the timer” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 19 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the higher priority workloads are geometry workloads and the lower priority workloads are pixel workloads” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim 20 is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claim does not add additional elements that integrate the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the higher priority workloads are asynchronous compute workloads and the lower priority workloads are graphics workloads” does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception, because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions, and such additional elements are merely instructions to implement an abstract idea on a computer. See MPEP 2106.04(d).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 4-6, and 8-19 are rejected under 35 U.S.C. 103 as being unpatentable over Araki (U.S. PG PUB 2011/0202930) in view of Acharya et al. (U.S. PG PUB 2017/0091895).
Regarding claim 1, Araki teaches a method comprising:
calculating (see ¶ [0066] “GPU driver 28 may send a preemption notification to GPU controller 32 to indicate that another command stream (e.g., a high-priority command stream) is ready for execution. In some examples, the preemption notification may also indicate to GPU 12 which command stream to execute upon preemption of the command stream currently being executed. GPU driver 28 may send a preemption notification to GPU controller 32, for example, by writing one or more values (e.g., via operating system 30) to one or more GPU registers that are polled by GPU controller 32. The one or more GPU registers may include one or more hardware GPU registers that are located in GPU 12, one or more memory-mapped GPU registers that are located in a memory (e.g., memory 10) or any combination thereof. When writing values to a memory-mapped GPU register, CPU 6 may write the values to one or more particular memory addresses in memory 10 that are polled by GPU controller 32. In some examples, GPU driver 28 may write a first value to one or more GPU registers to activate a preemption notification (including “interrupts” in GPU 12), and write a second value to the one or more GPU registers that identifies the location of the command stream to execute upon preemption of the command stream currently being executed.”), via resource allocation circuitry (see ¶ [0106] “and a relatively high speed operation processing device 40”) of a parallel processor (see ¶ [0067] “When the instructions are thus carried out in parallel by the plurality of processors, the processing speed can be accelerated.”) and based at least in part on the executing of the at least partially concurrent workloads (see ¶ [003] “When a resource is jointly used by different processes in a multiprocessor where the processes are being executed in parallel by a plurality of processors, exclusion control of the resource is a necessary step to guarantee a consistency between the processes. The exclusion control allows the processes to use the resource exclusively, and an interval during which the exclusion control is necessary is hereinafter called an exclusion control interval.”), at least one priority inversion heuristic indicating a priority inversion between one or more of the higher priority workloads of the parallel processor and at least one of the lower priority workloads of the parallel processor (see ¶ [0187], describing priority inversion, and ¶ [0038], describing execution in parallel by using a plurality of processors).
Araki does not expressly disclose the following limitations; however, Acharya teaches:
executing a plurality of at least partially concurrent workloads by multiple compute engines of a parallel processor (see ¶ [0069] “In some examples, a programmable shader unit may include a plurality of processing units that are configured to operate in parallel, e.g., an SIMD pipeline. A programmable shader unit may have a program memory that stores shader program instructions and an execution state register, e.g., a program counter register that indicates the current instruction in the program memory being executed or the next instruction to be fetched. The programmable shader units in processing units 34 may include, for example, vertex shader stages, pixel shader stages, geometry shader stages, hull shader stages, domain shader stages, compute shader stages, and/or unified shader stages”),
the workloads comprising multiple higher priority workloads and multiple lower priority workloads (see ¶[0072] “In some examples, GPU 12 may switch between command streams of the same application if needed (e.g., a high priority stream of application 24A preempts a low priority stream of application 24A). The command streams described in this disclosure should not be considered limited to being for different applications, and may be for the same application.”);
based on the at least one priority inversion heuristic, issuing a signal to scheduling circuitry of the parallel processor to prevent allocation of at least one compute resource of the parallel processor for processing the lower priority workloads (see ¶[0074] “In this example, software application 24B has a higher scheduling priority than the scheduling priority of software application 24A. In particular, in this example, software application 24B is a user interface (UI) software application that includes one or more instructions that cause graphic content to be displayed and that demands high priority access to GPU 12 to ensure timely updates of the UI.” and ¶ [0077] “GPU driver 28 provides a preemption notification to controller 32 indicating that the high-priority command stream is ready for processing and that this command stream should preempt any other lower-priority command streams that are executing on GPU 12”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Araki with those of Acharya in order to produce an acceptable user experience with respect to the UI by executing a higher priority context ahead of a lower priority context (see ¶ [0019] of Acharya).
Regarding claim 2, Araki further teaches:
responsive to a release condition, allowing allocation of the at least one compute resource for processing the lower priority workloads (see ¶ [0009] “Therefore, a different process which similarly intends to acquire the same lock object is unable to proceed for the time being, and the process which failed to acquire the lock object has to busy-wait or sleep until the process which acquired the lock object releases the acquired lock object.”),
wherein the release condition includes at least one of expiry of a timer, a reset of the parallel processor, or a quantity of the higher priority workloads being below a threshold (see ¶ [0010] “When a length of time for the lock object to be released is larger than a process switch overhead, any process that failed to acquire the lock object sleeps, making it more advantageous to assign the processor to any other process. When the length of time until for the lock object to be released is smaller than the process switch overhead, on the other hand, any process that failed to acquire the lock object does not sleep, making it more advantageous to continue to wait for the lock object to be released while busy waiting”).
Regarding claim 4, Araki further teaches: initiating a timer responsive to determining that allocation for a higher priority workload has failed, and selectively preventing, responsive to determining that allocation for the higher priority workload is unsuccessful throughout a timer period between initiation of the timer and expiry of the timer (see ¶ [0009] “Therefore, a different process which similarly intends to acquire the same lock object is unable to proceed for the time being, and the process which failed to acquire the lock object has to busy-wait or sleep until the process which acquired the lock object releases the acquired lock object.”), allocation of the at least one compute resource of the parallel processor for processing the lower priority workloads (see ¶[0187] “When a process B with an intermediate priority order is activated while the sharable resource is being used by the process C, the process B whose priority order is higher than that of the process C is prioritized over the process A regardless of its priority order lower than that of the process A.”).
Regarding claim 5, Araki teaches wherein selectively preventing allocation of the at least one compute resource is further responsive to successful allocation for at least one lower priority workload (see ¶ [0187] “When a process B with an intermediate priority order is activated while the sharable resource is being used by the process C, the process B whose priority order is higher than that of the process C is prioritized over the process A regardless of its priority order lower than that of the process A.”).
Regarding claim 6, Araki describes higher priority workloads and lower priority workloads (see ¶ [0187] “For example, when a process A with a high priority order wants to use a sharable resource in a multiprocessing system, a process C with a low priority order is already using the sharable resource”).
Araki does not specify; however, Acharya teaches geometry workloads and pixel workloads, and that such workloads can be assigned a priority (see ¶[0025] “In general, a vertex shader stage in the GPU is typically fast and is constant for given geometry because the number of vertices is the same regardless of the size of the geometry (e.g., three vertices for a triangle regardless of the size of the triangle).” and “The pixel shader stage in the GPU may be time consuming (e.g., workload varies by resolution independent of geometry). For example, the workload of the vertex shader stage may be the same for different sized triangles, but the workload of the pixel shader stage may be different for different size triangles.” and see ¶[0027] “In the techniques described in this disclosure, the GPU executes a first set of commands via a graphics pipeline of the GPU in response to receiving a draw call (e.g., from a CPU). The draw call defines a plurality of primitives that are to be rendered by the first set of commands, and the graphics pipeline is configured to store data generated during the execution of the first set of commands in local memory of the GPU. Based on need for the GPU to execute a higher priority set of commands (e.g., second set of commands), the GPU may receive a preemption notification (e.g., from the CPU) during execution of the first set of commands and prior to rendering all of the one or more primitives, and in response to receiving the preemption notification, the GPU dynamically configures interconnection of stages the graphics pipeline of the GPU to output intermediate data generated during execution of one or more commands of the first set of commands to a memory that is external to the GPU. The GPU may then preempt the execution of the first set of commands, prior to completing the execution of the first set of commands to render the plurality of primitives of the draw call, for executing a second set of commands (e.g., the higher priority commands relative to the first set of commands).”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Araki with those of Acharya in order to produce an acceptable user experience with respect to the UI by executing a higher priority context ahead of a lower priority context (see ¶ [0019] of Acharya).
Regarding claim 8, Araki teaches a parallel processor comprising:
resource allocation circuitry (see ¶[0106] “and a relatively high speed operation processing device 40”) configured to: the at least one priority inversion heuristic indicating priority inversion between higher priority workloads and lower priority workloads of the graphics workloads (see ¶ [0187], describing priority inversion, and ¶ [0038], describing parallel processing using a plurality of processors); and
Araki does not specify; however, Acharya teaches:
a graphics engine configured to receive graphics workloads via a graphics queue and at least one graphics pipeline for processing the graphics workloads via a plurality of shader engines (see ¶ [0069] “In some examples, a programmable shader unit may include a plurality of processing units that are configured to operate in parallel, e.g., an SIMD pipeline. A programmable shader unit may have a program memory that stores shader program instructions and an execution state register, e.g., a program counter register that indicates the current instruction in the program memory being executed or the next instruction to be fetched. The programmable shader units in processing units 34 may include, for example, vertex shader stages, pixel shader stages, geometry shader stages, hull shader stages, domain shader stages, compute shader stages, and/or unified shader stages”);
calculate at least one priority inversion heuristic based on graphics workloads in at least one of the graphics queue or the at least one graphics pipeline (see ¶[0019] “When a high-priority UI command stream is queued by a host CPU to be executed on a GPU, the GPU may be executing another queued command stream associated with a different context that has a lower priority, such as, e.g., a non-UI graphics context or a context that uses a GPU to perform a general-purpose computing task (i.e., a general-purpose computing on graphics processing unit (GPGPU) task). Waiting for the lower-priority context to complete execution prior to executing the higher-priority UI command stream may not, in some cases, produce an acceptable user experience with respect to the UI.”);
issue a signal to scheduling circuitry of the parallel processor to prevent allocation of at least one shader engine of the plurality of shader engines (see ¶ [0069] “The programmable shader units in processing units 34 may include, for example, vertex shader stages, pixel shader stages, geometry shader stages, hull shader stages, domain shader stages, compute shader stages, and/or unified shader stages.”) to the lower priority workloads of the graphics workloads based on the at least one calculated priority inversion heuristic (see ¶[0074] “In this example, software application 24B has a higher scheduling priority than the scheduling priority of software application 24A. In particular, in this example, software application 24B is a user interface (UI) software application that includes one or more instructions that cause graphic content to be displayed and that demands high priority access to GPU 12 to ensure timely updates of the UI.” and ¶ [0077] “GPU driver 28 provides a preemption notification to controller 32 indicating that the high-priority command stream is ready for processing and that this command stream should preempt any other lower-priority command streams that are executing on GPU 12”; see also ¶ [0066] “In some examples, the preemption notification may also indicate to GPU 12 which command stream to execute upon preemption of the command stream currently being executed. GPU driver 28 may send a preemption notification to GPU controller 32, for example, by writing one or more values (e.g., via operating system 30) to one or more GPU registers that are polled by GPU controller 32. The one or more GPU registers may include one or more hardware GPU registers that are located in GPU 12, one or more memory-mapped GPU registers that are located in a memory (e.g., memory 10) or any combination thereof. When writing values to a memory-mapped GPU register, CPU 6 may write the values to one or more particular memory addresses in memory 10 that are polled by GPU controller 32. In some examples, GPU driver 28 may write a first value to one or more GPU registers to activate a preemption notification (including “interrupts” in GPU 12), and write a second value to the one or more GPU registers that identifies the location of the command stream to execute upon preemption of the command stream currently being executed.”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Araki with those of Acharya in order to produce an acceptable user experience with respect to the UI by executing a higher priority context ahead of a lower priority context (see ¶ [0019] of Acharya).
Regarding claim 9, Araki describes higher priority workloads and lower priority workloads (see ¶ [0187] “For example, when a process A with a high priority order wants to use a sharable resource in a multiprocessing system, a process C with a low priority order is already using the sharable resource”).
Araki does not specify; however, Acharya teaches geometry workloads and pixel workloads, and that such workloads can be assigned a priority (see ¶[0025] “In general, a vertex shader stage in the GPU is typically fast and is constant for given geometry because the number of vertices is the same regardless of the size of the geometry (e.g., three vertices for a triangle regardless of the size of the triangle).” and “The pixel shader stage in the GPU may be time consuming (e.g., workload varies by resolution independent of geometry). For example, the workload of the vertex shader stage may be the same for different sized triangles, but the workload of the pixel shader stage may be different for different size triangles.” and see ¶[0027] “In the techniques described in this disclosure, the GPU executes a first set of commands via a graphics pipeline of the GPU in response to receiving a draw call (e.g., from a CPU). The draw call defines a plurality of primitives that are to be rendered by the first set of commands, and the graphics pipeline is configured to store data generated during the execution of the first set of commands in local memory of the GPU. Based on need for the GPU to execute a higher priority set of commands (e.g., second set of commands), the GPU may receive a preemption notification (e.g., from the CPU) during execution of the first set of commands and prior to rendering all of the one or more primitives, and in response to receiving the preemption notification, the GPU dynamically configures interconnection of stages the graphics pipeline of the GPU to output intermediate data generated during execution of one or more commands of the first set of commands to a memory that is external to the GPU. The GPU may then preempt the execution of the first set of commands, prior to completing the execution of the first set of commands to render the plurality of primitives of the draw call, for executing a second set of commands (e.g., the higher priority commands relative to the first set of commands).”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Araki with those of Acharya in order to schedule graphics processing unit (GPU) processes (see ¶ [0018] of Acharya).
Regarding claim 11, Araki does not specify; however, Acharya teaches wherein the priority inversion heuristics include at least one of a quantity of incoming higher priority workloads in the graphics queue, a quantity of in-flight higher priority workloads in the at least one graphics pipeline, a quantity of incoming lower priority workloads in the at least one graphics pipeline, a quantity of render targets of the in-flight higher priority workloads and the incoming lower priority workloads, and one or more ratios of at least a subset of the lower priority workloads to at least a subset of the higher priority workloads (see ¶[0019] “When a high-priority UI command stream is queued by a host CPU to be executed on a GPU, the GPU may be executing another queued command stream associated with a different context that has a lower priority, such as, e.g., a non-UI graphics context or a context that uses a GPU to perform a general-purpose computing task (i.e., a general-purpose computing on graphics processing unit (GPGPU) task). Waiting for the lower-priority context to complete execution prior to executing the higher-priority UI command stream may not, in some cases, produce an acceptable user experience with respect to the UI.”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Araki with those of Acharya in order to produce an acceptable user experience with respect to the UI by executing a higher priority context ahead of a lower priority context (see ¶ [0019] of Acharya).
Regarding claim 12, Araki teaches wherein the resource allocation circuitry is further configured to allow, responsive to a release condition, allocation of the at least one shader engine (see ¶ [0069] “The programmable shader units in processing units 34 may include, for example, vertex shader stages, pixel shader stages, geometry shader stages, hull shader stages, domain shader stages, compute shader stages, and/or unified shader stages.”) for processing the lower priority workloads, wherein the release condition includes at least one of expiry of a timer, a reset of the parallel processor, or a quantity of the higher priority workloads being below a threshold (see ¶[0125] “Step S4 may analyze the information of the locking interval to select a method of accelerating the processing speed to a relatively high processing