Prosecution Insights
Last updated: April 19, 2026
Application No. 18/374,745

HARDWARE QUEUE PRIORITY MECHANISM

Non-Final OA — §101, §103
Filed: Sep 29, 2023
Examiner: SUN, ANDREW NMN
Art Unit: 2195
Tech Center: 2100 — Computer Architecture & Software
Assignee: Advanced Micro Devices, Inc.
OA Round: 1 (Non-Final)
Grant Probability: 67% (Favorable)
Projected OA Rounds: 1-2
Projected Time to Grant: 3y 3m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 67% (4 granted / 6 resolved; +11.7% vs TC avg) — grants above average
Interview Lift: +100.0% (allow rate with vs. without interview, across resolved cases with interview)
Avg Prosecution: 3y 3m typical timeline; 36 applications currently pending
Total Applications: 42 across all art units (career history)

Statute-Specific Performance

§101: 16.3% (-23.7% vs TC avg)
§103: 69.2% (+29.2% vs TC avg)
§102: 7.0% (-33.0% vs TC avg)
§112: 4.1% (-35.9% vs TC avg)
Deltas are relative to Tech Center average estimates; based on career data from 6 resolved cases.
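For readers checking the arithmetic, the headline examiner figures can be reproduced from the raw career counts. The 4-granted/6-resolved counts and the +11.7% delta come from this report; the rates passed to the `interview_lift` helper below are purely illustrative assumptions, chosen only to show what a +100.0% lift means:

```python
# Reproducing the headline statistics from the raw counts in this report.
# The 4/6 career counts and the +11.7% delta are report figures; the example
# rates passed to interview_lift are hypothetical, for demonstration only.

granted, resolved = 4, 6
allow_rate = granted / resolved        # career allow rate: 66.7%, shown as 67%
tc_avg_allow = allow_rate - 0.117      # report states +11.7% vs TC average

def interview_lift(rate_with: float, rate_without: float) -> float:
    """Relative change in allow rate for cases where an interview was held."""
    return (rate_with - rate_without) / rate_without

# A +100.0% lift means with-interview cases were allowed at twice the rate
# of without-interview cases (hypothetical rates, for illustration only):
lift = interview_lift(rate_with=0.8, rate_without=0.4)
print(f"{allow_rate:.1%} allow rate, {lift:+.1%} interview lift")
```

Note that with only 6 resolved cases, all of these percentages carry wide uncertainty.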

Office Action

Grounds of rejection: §101, §103
DETAILED ACTION

Claims 1-20 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections – 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.

With Respect to Claim 1:

Step 1: Claim 1 is directed to a system, which is a machine, and falls within one of the statutory categories of invention.

Step 2A, Prong One: Claim 1 recites the limitation “schedule the elements of the dispatch queue for execution at the shader circuitry based on the plurality of priority indicators.” This recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper: a person could compute priority values and determine which element of the dispatch queue is to be executed earlier than the other elements.

Step 2A, Prong Two: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements (a) “an arbitration circuit configured to…” and (b) “a dispatch queue configured to store elements to be executed by shader circuitry; store a plurality of priority indicators corresponding to respective elements of the dispatch queue.” Additional element (a) is recited at a high level of generality such that it amounts to no more than mere instructions to apply the judicial exception using generic computer components, and thus does not integrate the exception into a practical application. See MPEP § 2106.05(f).
Furthermore, additional element (b) is mere data gathering or storing and is therefore an insignificant extra-solution activity to the judicial exception. See MPEP § 2106.05(g).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, because the additional elements, considered both individually and as a combination, do not amount to significantly more than the abstract idea. As discussed above with respect to integration into a practical application, the claim recites additional elements (a) and (b). Additional element (a) is recited at a high level of generality such that it amounts to no more than mere instructions to apply the judicial exception using generic computer components, and thus does not amount to significantly more than the judicial exception. See MPEP § 2106.05(f). With regard to additional element (b), the courts have identified functions such as gathering, displaying, updating, transmitting, and storing data as well-understood, routine, conventional activity that does not amount to significantly more than the judicial exception. See MPEP § 2106.05(d).

With Respect to Claim 2:

Under Step 2A Prong 1, Claim 2 depends on Claim 1 and recites the following additional limitation: “wherein scheduling the elements of the dispatch queue for execution comprises, in response to receiving a particular priority corresponding to a particular element, identifying a next element to be sent to the shader circuitry.” This recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper.
Claim 2 does not recite any additional elements that have not already been analyzed under its parent claim.

With Respect to Claim 3:

Under Step 2A Prong 1, Claim 3 depends on Claim 2 and recites the following limitation: “wherein identifying the next element to be sent to the shader circuitry is performed prior to receiving a corresponding request for the next element to be sent to the shader circuitry.” This recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper. Claim 3 does not recite any additional elements that have not already been analyzed under its parent claims.

With Respect to Claim 4:

Under Step 2A Prong 2, Claim 4 depends on Claim 1 and recites the following additional element (a): “wherein the dispatch queue comprises a plurality of slots configured to store respective elements, and wherein the plurality of priority indicators correspond to respective slots of the plurality of slots.” This judicial exception is not integrated into a practical application: additional element (a) is not comprised of anything beyond generally linking the use of the judicial exception to a particular field of use. See MPEP § 2106.05(h). Under Step 2B, the same element (a), for the same reason, does not amount to significantly more than the judicial exception. See MPEP § 2106.05(h).
With Respect to Claim 5:

Under Step 2A Prong 1, Claim 5 depends on Claim 1 and recites the following limitation: “wherein scheduling the elements of the dispatch queue for execution comprises scheduling at least one element having a second priority in response to scheduling a threshold number of elements having a first priority, wherein the second priority is a lower priority than the first priority.” This recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper. Claim 5 does not recite any additional elements that have not already been analyzed under its parent claim.

With Respect to Claim 6:

Under Step 2A Prong 2, Claim 6 depends on Claim 1 and recites the following additional element (a): “wherein the plurality of priority indicators are identifiers of respective processes that correspond to the elements.” This judicial exception is not integrated into a practical application: additional element (a) is not comprised of anything beyond generally linking the use of the judicial exception to a particular field of use. See MPEP § 2106.05(h). Under Step 2B, the same element (a), for the same reason, does not amount to significantly more than the judicial exception. See MPEP § 2106.05(h).

With Respect to Claim 7:

Under Step 2A Prong 2, Claim 7 depends on Claim 1 and recites the following additional element (a): “wherein the plurality of priority indicators are identifiers of respective virtual machines that correspond to the elements.” This judicial exception is not integrated into a practical application.
Additional element (a) is not comprised of anything beyond generally linking the use of the judicial exception to a particular field of use, and does not integrate the exception into a practical application. See MPEP § 2106.05(h). Under Step 2B, the same element (a), for the same reason, does not amount to significantly more than the judicial exception. See MPEP § 2106.05(h).

With Respect to Claim 8:

Under Step 2A Prong 1, Claim 8 depends on Claim 1 and recites the following limitation: “and wherein scheduling the elements of the dispatch queue for execution comprises selecting, based on a priority selection signal, between an element indicated by the plurality of priority indicators and an element indicated by the second plurality of priority indicators.” This recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper.

Under Step 2A Prong 2, Claim 8 recites the following additional elements: “the arbitration circuit wherein”. Additional element (a) is recited at a high level of generality such that it amounts to no more than mere instructions to apply the judicial exception using generic computer components, and thus does not integrate the exception into a practical application. See MPEP § 2106.05(f). Furthermore, additional element (b) is mere data gathering or storing and is therefore an insignificant extra-solution activity to the judicial exception. See MPEP § 2106.05(g).
Under Step 2B, the claim recites the same additional elements. Additional element (a) is recited at a high level of generality such that it amounts to no more than mere instructions to apply the judicial exception using generic computer components, and thus does not amount to significantly more than the judicial exception. See MPEP § 2106.05(f). Furthermore, with regard to additional element (b), the courts have identified functions such as gathering, displaying, updating, transmitting, and storing data as well-understood, routine, conventional activity that does not amount to significantly more than the judicial exception. See MPEP § 2106.05(d).

With Respect to Claim 9:

Under Step 2A Prong 2, Claim 9 depends on Claim 8 and recites the following additional element (a): “wherein the priority selection signal is indicative of availability of a hardware circuit.” This judicial exception is not integrated into a practical application: additional element (a) is not comprised of anything beyond generally linking the use of the judicial exception to a particular field of use. See MPEP § 2106.05(h). Under Step 2B, the same element (a), for the same reason, does not amount to significantly more than the judicial exception. See MPEP § 2106.05(h).

With Respect to Claim 10:

Under Step 2A Prong 2, Claim 10 depends on Claim 9 and recites the following additional element (a): “wherein the hardware circuit is a memory circuit or the shader circuitry.” This judicial exception is not integrated into a practical application.
Additional element (a) is not comprised of anything beyond generally linking the use of the judicial exception to a particular field of use, and does not integrate the exception into a practical application. See MPEP § 2106.05(h). Under Step 2B, the same element (a), for the same reason, does not amount to significantly more than the judicial exception. See MPEP § 2106.05(h).

With Respect to Claim 11:

Step 1: Claim 11 is directed to a method, which is a process, and falls within one of the statutory categories of invention.

Step 2A, Prong One: Claim 11 recites the limitation “sorting priorities stored”. This recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper.

Step 2A, Prong Two: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements (a) “receiving, at an arbitration circuit,” and “at the arbitration circuit,”; (b) “executed by shader circuitry”; and (c) “and in response to a request for an element, providing a next element indicated by the sorted priorities to the shader circuitry.” Additional elements (a) and (c) are mere data gathering or storing and are therefore insignificant extra-solution activities to the judicial exception. See MPEP § 2106.05(g). Furthermore, additional element (b) is recited at a high level of generality such that it amounts to no more than mere instructions to apply the judicial exception using generic computer components, and thus does not integrate the exception into a practical application. See MPEP § 2106.05(f).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, because the additional elements, considered both individually and as a combination, do not amount to significantly more than the abstract idea. As discussed above with respect to integration into a practical application, the claim recites additional elements (a), (b), and (c). With regard to additional elements (a) and (c), the courts have identified functions such as gathering, displaying, updating, transmitting, and storing data as well-understood, routine, conventional activity that does not amount to significantly more than the judicial exception. See MPEP § 2106.05(d). Furthermore, additional element (b) is recited at a high level of generality such that it amounts to no more than mere instructions to apply the judicial exception using generic computer components, and thus does not amount to significantly more than the judicial exception. See MPEP § 2106.05(f).

With Respect to Claim 12:

Under Step 2A Prong 1, Claim 12 depends on Claim 11 and recites the following limitation: “wherein the priority indicates that the element has a first priority.” This limitation expands on the limitation “sorting priorities stored” from Claim 11. Thus, this recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper.

Under Step 2A Prong 2, Claim 12 recites the following additional element (a): “and wherein a second element currently stored at the dispatch queue has a second priority.” The additional element (a) is mere data gathering or storing.
Therefore, (a) is an insignificant extra-solution activity to the judicial exception. See MPEP § 2106.05(g). Under Step 2B, with regard to the same element (a), the courts have identified functions such as gathering, displaying, updating, transmitting, and storing data as well-understood, routine, conventional activity that does not amount to significantly more than the judicial exception. See MPEP § 2106.05(d).

With Respect to Claim 13:

Under Step 2A Prong 2, Claim 13 depends on Claim 11 and recites the following additional element (a): “wherein receiving the indication of the priority comprises extracting the priority from a received opcode corresponding to the element.” This judicial exception is not integrated into a practical application. Additional element (a) is mere data gathering or storing and is therefore an insignificant extra-solution activity to the judicial exception. See MPEP § 2106.05(g). Furthermore, additional element (a) is not comprised of anything beyond generally linking the use of the judicial exception to a particular field of use, and does not integrate the exception into a practical application. See MPEP § 2106.05(h). Under Step 2B, with regard to the same element (a), the courts have identified functions such as gathering, displaying, updating, transmitting, and storing data as well-understood, routine, conventional activity that does not amount to significantly more than the judicial exception. See MPEP § 2106.05(d).
Furthermore, additional element (a) is not comprised of anything beyond generally linking the use of the judicial exception to a particular field of use, and does not amount to significantly more than the judicial exception. See MPEP § 2106.05(h).

With Respect to Claim 14:

Under Step 2A Prong 1, Claim 14 depends on Claim 11 and recites the following limitation: “wherein the indication of the priority of the element is specified by a user.” This recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper. Claim 14 does not recite any additional elements that have not already been analyzed under its parent claim.

With Respect to Claim 15:

Under Step 2A Prong 2, Claim 15 depends on Claim 11 and recites the following additional element (a): “wherein the indication of the priority of the element is specified by a driver based on a status of a user-visible process that is to use data generated using the element”. This judicial exception is not integrated into a practical application: additional element (a) is not comprised of anything beyond generally linking the use of the judicial exception to a particular field of use. See MPEP § 2106.05(h). Under Step 2B, the same element (a), for the same reason, does not amount to significantly more than the judicial exception. See MPEP § 2106.05(h).

With Respect to Claim 16:

Step 1: Claim 16 is directed to a system, which is a machine, and falls within one of the statutory categories of invention.
Step 2A, Prong One: Claim 16 recites the limitation “and to schedule the respective elements for processing”. This recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper.

Step 2A, Prong Two: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements (a) “a bus; a first processing circuit configured to issue a plurality of commands via the bus; shader circuitry configured to process the plurality of elements; and a second processing circuit configured to receive the plurality of commands from the first processing circuit; by the shader circuitry” and (b) “a dispatch queue configured to store a plurality of elements from the plurality of commands; and an arbitration circuit configured to store a plurality of priority indicators corresponding to respective elements of the plurality of elements”. Additional element (a) is recited at a high level of generality such that it amounts to no more than mere instructions to apply the judicial exception using generic computer components, and thus does not integrate the exception into a practical application. See MPEP § 2106.05(f). Additional element (b) is mere data gathering or storing and is therefore an insignificant extra-solution activity to the judicial exception. See MPEP § 2106.05(g).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, because the additional elements, considered both individually and as a combination, do not amount to significantly more than the abstract idea.
As discussed above with respect to integration into a practical application, the claim recites additional elements (a) and (b). Additional element (a) is recited at a high level of generality such that it amounts to no more than mere instructions to apply the judicial exception using generic computer components, and thus does not amount to significantly more than the judicial exception. See MPEP § 2106.05(f). Furthermore, with regard to additional element (b), the courts have identified functions such as gathering, displaying, updating, transmitting, and storing data as well-understood, routine, conventional activity that does not amount to significantly more than the judicial exception. See MPEP § 2106.05(d).

With Respect to Claim 17:

Under Step 2A Prong 2, Claim 17 depends on Claim 16 and recites the following additional element (a): “wherein scheduling the elements of the dispatch queue for processing comprises, in response to receiving a request for an element, sending a previously identified element to the shader circuitry.” This judicial exception is not integrated into a practical application. Additional element (a) is mere data gathering or storing and is therefore an insignificant extra-solution activity to the judicial exception. See MPEP § 2106.05(g).
Under Step 2B, with regard to the same element (a), the courts have identified functions such as gathering, displaying, updating, transmitting, and storing data as well-understood, routine, conventional activity that does not amount to significantly more than the judicial exception. See MPEP § 2106.05(d).

With Respect to Claim 18:

Under Step 2A Prong 2, Claim 18 depends on Claim 17 and recites the following additional element (a): “wherein the arbitration circuit is further configured to prefetch the previously identified element.” This judicial exception is not integrated into a practical application. Additional element (a) is mere data gathering or storing and is therefore an insignificant extra-solution activity to the judicial exception. See MPEP § 2106.05(g). Under Step 2B, with regard to the same element (a), the courts have identified functions such as gathering, displaying, updating, transmitting, and storing data as well-understood, routine, conventional activity that does not amount to significantly more than the judicial exception. See MPEP § 2106.05(d).

With Respect to Claim 19:

Under Step 2A Prong 1, Claim 19 depends on Claim 16 and recites the following limitation: “wherein scheduling the elements of the dispatch queue for processing comprises, in response to receiving a request for an element, identifying a next element to send to the shader circuitry.” This recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper.
Claim 19 does not recite any additional elements that have not already been analyzed under its parent claim.

With Respect to Claim 20:

Under Step 2A Prong 1, Claim 20 depends on Claim 16 and recites the following limitation: “wherein scheduling the elements of the dispatch queue for processing comprises schedul[ing] at least one element having a second priority in response to scheduling a threshold number of elements having a first priority, wherein the second priority is a lower priority than the first priority.” This recited step, under the broadest reasonable interpretation, covers performance of the step in the human mind alone or with the aid of pen and paper. Claim 20 does not recite any additional elements that have not already been analyzed under its parent claim.

Claim Rejections – 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8-12, 14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hartog (EP 2791795 B1) in view of Havlir (US 20230050061 A1).
Regarding Claim 1, Hartog teaches a system comprising: a dispatch queue configured to store elements to be executed by shader circuitry (

[Image: media_image1.png]

Hartog discloses, “In one example, a subset of work-items in a workgroup that execute simultaneously together on a single SIMD engine can be referred to as a wavefront 136. The width of a wavefront is a characteristic of the hardware SIMD engine. As referred to herein, a workgroup is a collection of related work-items that execute on a single compute unit,” ¶ 0034, and “To efficiently process data from multiple compute inputs, arbitration occurs between pipeline queues within compute pipelines CS P0 - CS P7, as illustrated in greater detail in FIG. 3. More specifically, arbitration policies in accordance with embodiments of the present invention allocate APD resources among the multiple pipeline inputs. A shader input block (SPI) 202 provides an arbitration scheme for submitting wavefronts between compute pipelines CS P0 - CS P7 and graphics pipeline 204. Wave dispatchers 206 are connected from two compute pipelines alternate to forward the wavefronts to shader core 208. Shader core 208 executes the wavefronts,” ¶ 0074.

The claimed “dispatch queue” is mapped to the disclosed “pipeline queue” that stores wavefronts (elements) to be executed by a “shader core”. The claimed “elements” are mapped to the disclosed “work-items” that comprise a wavefront; said wavefronts are stored in the pipeline queue. This is illustrated by FIG. 2 of Hartog, which shows the compute pipelines CS P0 through CS P7 storing the wavefronts, consisting of work-items, before they are sent to the shader core 208.
The claimed “shader circuitry” is mapped to the disclosed “shader core”.); and an arbitration circuit configured to store (Hartog discloses, “To efficiently process data from multiple compute inputs, arbitration occurs between pipeline queues within compute pipelines CS P0 - CS P7, as illustrated in greater detail in FIG. 3. More specifically, arbitration policies in accordance with embodiments of the present invention allocate APD resources among the multiple pipeline inputs. A shader input block (SPI) 202 provides an arbitration scheme for submitting wavefronts between compute pipelines CS P0 - CS P7 and graphics pipeline 204,” ¶ 0074, and “For example, HWS 128 supports scheduling techniques applied to RLC 150, based upon priority level, or based on other arbitration scheduling criteria,” ¶ 0076. The claimed “arbitration circuit” is mapped to the circuitry that contains the disclosed “pipeline queues” where arbitration occurs. Here, wavefronts (elements) are scheduled for execution by the shader core based on arbitration policies.).

Hartog does not teach an arbitration circuit configured to store a plurality of priority indicators corresponding to respective elements of the dispatch queue, wherein the arbitration circuit is configured to schedule the elements of the dispatch queue for execution at the shader circuitry based on the plurality of priority indicators. However, Havlir teaches an arbitration circuit configured to store a plurality of priority indicators corresponding to respective elements of the dispatch queue, wherein the arbitration circuit is configured to schedule the elements of the dispatch queue for execution at the shader circuitry based on the plurality of priority indicators (

[Image: media_image2.png]

Havlir discloses, “Each sub-unit 220 may include multiple shaders that accept work from distributed slots in the sub-unit and use pipelines to execute the work.
For example, each shader may include a queue for each distributed hardware slot and may select work from among the queues based on work priority,” ¶ 0049, and “The control circuitry may select the first and second distribution rules based on amounts of work in the first and second sets of graphics work. The control circuitry may determine the first distribution rule based on one or more software overrides signaled by a graphics program being executed… The control circuitry may allow a logical slot with a first priority level to reclaim a hardware slot that is assigned to a logical slot with a second, lower priority level, based on one or more of the respective hold values,” ¶ 0195.

Havlir also teaches an “arbitration circuit” in the form of the disclosed “control circuitry” that manages logical slots with different priority levels. The claimed “priority indicators” are mapped to the disclosed “priority levels” associated with the logical slots. The priority levels can indicate which logical slots will be assigned to the hardware slots for work. The disclosed “control circuitry” schedules the elements for execution based on the work priority indicated by the priority levels.).

Hartog and Havlir are both considered analogous to the claimed invention because they are in the same field of computer task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide an arbitration circuit configured to store a plurality of priority indicators corresponding to respective elements of the dispatch queue, wherein the arbitration circuit is configured to schedule the elements of the dispatch queue for execution at the shader circuitry based on the plurality of priority indicators.
Doing so would help ensure that elements with a higher priority are selected first (Havlir discloses, “In some embodiments, low-priority logical slots are not allowed to reclaim hardware slots from high-priority logical slots unless there is no chance that a high-priority logical slot will use them,” ¶ 0134.). Regarding Claim 2, Hartog in view of Havlir teaches the system of claim 1, wherein scheduling the elements of the dispatch queue for execution comprises, in response to receiving a particular priority corresponding to a particular element, identifying a next element to be sent to the shader circuitry ( Hartog discloses, “At step 506, the queue arbiter at the top of the compute pipeline signals a respective CP ME 301 thread to stop on the next packet boundary when the arbiter determines a better queue is ready for processing. If it is determined that a better queue is not available, the processes continues at step 508,” ¶ 0084, and “As illustrated in FIG. 6, for each priority level, the compute pipeline maintains a last queue executed scoreboard. A return to that priority level will process the next ready queue. If only one queue is ready in a priority level, it will resume,” ¶ 0089. The claimed “next element” is mapped to the element contained in the disclosed “next ready queue”. Here, the next ready queue is identified for its element to be sent for execution, in response to receiving a priority level associated with the queue.).
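For illustration only, the scheduling behavior the rejection maps to claims 1 and 2 can be sketched in software: an arbiter stores a priority indicator per dispatch-queue slot, selects among ready slots by highest priority, and resumes round-robin within a priority level after the last slot serviced, a software analogue of Hartog's "last queue executed scoreboard" (¶ 0089). The class and method names are hypothetical and appear in neither reference.

```python
class PriorityArbiter:
    """Illustrative model: per-slot priority indicators plus per-level
    round-robin resumption (cf. Hartog ¶¶ 0089-0090). Not actual hardware."""

    def __init__(self, num_slots):
        self.priorities = [None] * num_slots   # priority indicator per slot
        self.ready = [False] * num_slots       # slot currently holds an element
        self.last_served = {}                  # priority level -> last slot index

    def store_priority(self, slot, priority):
        # Store the priority indicator for the element occupying this slot.
        self.priorities[slot] = priority
        self.ready[slot] = True

    def select_next(self):
        # Highest priority level with at least one ready slot wins.
        levels = {p for s, p in enumerate(self.priorities) if self.ready[s]}
        if not levels:
            return None
        level = max(levels)
        slots = [s for s, p in enumerate(self.priorities)
                 if self.ready[s] and p == level]
        # Round-robin within the level: resume after the last slot served.
        start = self.last_served.get(level, -1)
        slots.sort(key=lambda s: (s <= start, s))
        chosen = slots[0]
        self.last_served[level] = chosen
        self.ready[chosen] = False
        return chosen
```

With queues 0, 3, and 7 ready at the same priority, selection proceeds 0, 3, 7; if queue 5 becomes ready at that priority after queue 3 is served, it is selected after 3 and before 7, matching the ordering Hartog describes in ¶ 0090.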
Regarding Claim 3, Hartog in view of Havlir teaches the system of claim 2, wherein identifying the next element to be sent to the shader circuitry is performed prior to receiving a corresponding request for the next element to be sent to the shader circuitry ( Hartog discloses, “At step 506, the queue arbiter at the top of the compute pipeline signals a respective CP ME 301 thread to stop on the next packet boundary when the arbiter determines a better queue is ready for processing. If it is determined that a better queue is not available, the processes continues at step 508,” ¶ 0084, “At step 514, the state of the previous queue is scheduled to be saved and prefetched data is scheduled to be discarded. The CP ME can release the fetcher to select the next queue for processing,” ¶ 0087, and “As illustrated in FIG. 6, for each priority level, the compute pipeline maintains a last queue executed scoreboard. A return to that priority level will process the next ready queue. If only one queue is ready in a priority level, it will resume,” ¶ 0089. The claimed “next element” is the element contained in the disclosed “next ready queue”. The claimed “corresponding request” is mapped to the disclosed “signal” made by the queue arbiter to a respective “CP ME 301 thread” to stop on the next packet boundary to request a better queue for processing if ready. This leads to the next ready queue being selected for processing. Here, the next ready queue is identified for its element to be sent for execution, prior to the next ready queue being selected in order for its element to be sent for processing. This is illustrated in steps 506 through 514 in FIG. 5 of Hartog.).
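The claim 3 ordering, identification of the next element before any request for it arrives, can be sketched as follows. This is a hedged software analogue only; the names are illustrative and not drawn from either reference.

```python
class PrefetchingArbiter:
    """Illustrative sketch: the arbiter re-identifies the next element every
    time queue state changes, so identification always precedes the request
    (cf. Hartog steps 506-514, FIG. 5). Not actual hardware."""

    def __init__(self):
        self.pending = []        # (priority, element) pairs awaiting dispatch
        self.next_element = None # pre-identified answer to the next request

    def _identify_next(self):
        # Runs ahead of any request, analogous to the queue arbiter
        # determining a "better queue" before the pipeline asks for work.
        self.pending.sort(key=lambda entry: entry[0], reverse=True)
        self.next_element = self.pending[0][1] if self.pending else None

    def enqueue(self, priority, element):
        self.pending.append((priority, element))
        self._identify_next()    # identification happens before the request

    def request(self):
        # The request merely consumes the previously identified element.
        elem = self.next_element
        if elem is not None:
            self.pending = [(p, e) for p, e in self.pending if e != elem]
        self._identify_next()
        return elem
```

Here `request()` never searches the queue itself; it returns whatever `_identify_next()` staged earlier, which is the distinction claim 3 draws.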
Regarding Claim 4, Hartog in view of Havlir teaches the system of claim 1, wherein the dispatch queue comprises a plurality of slots configured to store respective elements, and wherein the plurality of priority indicators correspond to respective slots of the plurality of slots ( Havlir discloses, “Each distributed hardware slot may include various circuitry configured to process an assigned kick or portion thereof, including configuration registers, a work queue, circuitry configured to iterate through work in the queue (e.g., batches of compute workitems), circuitry to sequence context loads/stores, and work distribution tracking circuitry. Each sub-unit 220 may include multiple shaders that accept work from distributed slots in the sub-unit and use pipelines to execute the work. For example, each shader may include a queue for each distributed hardware slot and may select work from among the queues based on work priority,” ¶ 0049. Here, each slot is configured to store a queue that corresponds to work priority, so that work-items (elements) can be selected from among the queues based on priority.). Hartog and Havlir are both considered to be analogous to the claimed invention because they are in the same field of computer task scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide wherein the dispatch queue comprises a plurality of slots configured to store respective elements, and wherein the plurality of priority indicators correspond to respective slots of the plurality of slots. Doing so would help ensure that work that has a higher priority is selected first in order to improve the efficiency of scheduling (Havlir discloses, “Each sub-unit 220 may include multiple shaders that accept work from distributed slots in the sub-unit and use pipelines to execute the work. 
For example, each shader may include a queue for each distributed hardware slot and may select work from among the queues based on work priority,” ¶ 0049.). Regarding Claim 5, Hartog in view of Havlir teaches the system of claim 1, wherein scheduling the elements of the dispatch queue for execution comprises scheduling at least one element having a second priority in response to scheduling a threshold number of elements having a first priority, wherein the second priority is a lower priority than the first priority ( Havlir discloses, “As briefly discussed above, different logical slots may have different priority levels, e.g., as specified by software. In some embodiments, on a given mGPU, a subset of hardware slots are reserved for logical slots that meets a threshold priority (e.g., higher priority slots in a system with two priority levels),” ¶ 0131. Here, at least two priority levels exist, where one priority level is higher than the other, lower priority. If all the higher-level priority slots are taken, then an element will be scheduled on a slot with lower priority.). Hartog and Havlir are both considered to be analogous to the claimed invention because they are in the same field of computer task scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide wherein scheduling the elements of the dispatch queue for execution comprises scheduling at least one element having a second priority in response to scheduling a threshold number of elements having a first priority, wherein the second priority is a lower priority than the first priority. 
Doing so would help ensure that higher priority tasks will be selected first in order to improve the efficiency of scheduling (Havlir discloses, “In some embodiments, on a given mGPU, a subset of hardware slots are reserved for logical slots that meets a threshold priority (e.g., higher priority slots in a system with two priority levels),” ¶ 0131.). Regarding Claim 6, Hartog in view of Havlir teaches the system of claim 1, wherein the plurality of priority indicators are identifiers of respective processes that correspond to the elements ( Havlir discloses, “FIG. 19B is a diagram illustrating example tracking and status data per tracking slot, according to some embodiments. In the illustrated embodiment, circuitry 1920 maintains the following information for each tracking slot: identifier, status, data identification, dependencies, run data, and configuration. Each of these example fields is discussed in detail below. In some embodiments, the status and run data fields are read-only by software and the other fields are software configurable,” ¶ 0171. Here, each slot associated with a priority level also identifies a process (consisting of associated data such as data identification, dependencies, run data, and status) that corresponds to the slot.). Hartog and Havlir are both considered to be analogous to the claimed invention because they are in the same field of computer task scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide wherein the plurality of priority indicators are identifiers of respective processes that correspond to the elements. Doing so would help provide more information regarding the processes to the user, in order to make more informed decisions based on the information. 
(Havlir discloses, “In the illustrated embodiment, circuitry 1920 maintains the following information for each tracking slot: identifier, status, data identification, dependencies, run data, and configuration,” ¶ 0171.). Regarding Claim 8, Hartog in view of Havlir teaches the system of claim 1, wherein the arbitration circuit is further configured to store a second plurality of priority indicators, and wherein scheduling the elements of the dispatch queue for execution comprises selecting, based on a priority selection signal, between an element indicated by the plurality of priority indicators and an element indicated by the second plurality of priority indicators ( Havlir discloses, “At 1130, in the illustrated embodiment, DRA 620 first determines if the request is for a low or high priority logical slot and operates accordingly. Note that other granularities of priority may be supported in other embodiments. For a low-priority requestor, the DRA 620 generates a do_set of slots which are slots in allowed_set with a medium hold value that are owned by a low-priority logical slot,” ¶ 0119, “For a high-priority requestor, the DRA 620 generates a do_set of slots which are slots in allowed_set with a medium hold value that are owned by a high-priority logical slot,” ¶ 0120, “FIG. 13 is a block diagram illustrating a number of hardware slots of an mGPU. In some embodiments, one or more dSlots (shown in solid black in FIG. 13) are reserved for high priority logical slots and one or more dSlots (shown with horizontal shading in FIG. 13) are available to all logical slots (and are the only hardware slots available to low-priority logical slots),” ¶ 0132. The claimed “priority selection signal” is mapped to the disclosed “request” for a logical slot, which can be a low or high-priority logical slot, given that the claimed “priority indicator” is mapped to “priority level” of the slots.
Here, there are two different types of dSlots, one type associated with high priority logical slots and one type associated with low priority logical slots.). Hartog and Havlir are both considered to be analogous to the claimed invention because they are in the same field of computer task scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide wherein the arbitration circuit is further configured to store a second plurality of priority indicators, and wherein scheduling the elements of the dispatch queue for execution comprises selecting, based on a priority selection signal, between an element indicated by the plurality of priority indicators and an element indicated by the second plurality of priority indicators. Doing so would help ensure that higher priority tasks will be selected first in order to improve the efficiency of scheduling (Havlir discloses, “In some embodiments, on a given mGPU, a subset of hardware slots are reserved for logical slots that meets a threshold priority (e.g., higher priority slots in a system with two priority levels),” ¶ 0131.). Regarding Claim 9, Hartog in view of Havlir teaches the system of claim 8, wherein the priority selection signal is indicative of availability of a hardware circuit ( Havlir discloses, “Sub-units 220, in some embodiments, are scaling units that may be replicated to increase the processing capabilities of a GPU. Each GPU sub-unit 220 may be capable of independently processing instructions of a graphics program. Sub-units 220, in the illustrated embodiment, include circuitry that implements respective distributed hardware slots 230. These hardware slots may also be referred to herein as ‘dSlots.’ Each sub-unit may include multiple hardware slots 230. 
Sub-units may also be referred to herein as “mGPUs.” In some embodiments, primary control circuitry 210 assigns work from a logical slot to at most one distributed hardware slot in each sub-unit 220. In some embodiments, each sub-unit includes fragment generator circuitry, shader core circuitry configured to execute shader programs, memory system circuitry (which may include one or more caches and a memory management unit), geometry processing circuitry, and distributed workload distribution circuitry (which may coordinate with primary control circuitry 210 to distribute work to shader pipelines)” ¶ 0048, “At 1130, in the illustrated embodiment, DRA 620 first determines if the request is for a low or high priority logical slot and operates accordingly. Note that other granularities of priority may be supported in other embodiments,” ¶ 0119, and “At 1140, in the illustrated embodiment, DRA 620 adds slots to the do_set that are in the allowed_set, have a high hold value, and belong to a logical slot with a lower priority and lower age. DRA 620 finds dSlots in both the flushing_set and updated do_set. If these dSlots are sufficient to service the request, DRA 620 reclaims those dSlots and begins the cache flush invalidate for those dSlots. If not, it may cancel reclaim and restart arbitration,” ¶ 0121. The claimed “hardware circuit” is mapped to the overall circuitry that implements the sub-units, which includes “circuitry that implements respective distributed hardware slots 230”, which are also referred to as “dSlots”. Here, the request for a logical slot (claimed “priority selection signal”) indicates the availability of the “dSlots” that the logical slots can reclaim, because the request is made when the dSlots are sufficient to service the request, otherwise the reclaim request would have been canceled.). Hartog and Havlir are both considered to be analogous to the claimed invention because they are in the same field of computer task scheduling. 
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide wherein the priority selection signal is indicative of availability of a hardware circuit. Doing so would help allow using the improved efficiency and performance of a dedicated hardware circuit (Havlir discloses, “FIG. 13 is a block diagram illustrating a number of hardware slots of an mGPU. In some embodiments, one or more dSlots (shown in solid black in FIG. 13) are reserved for high priority logical slots…,” ¶ 0132.). Regarding Claim 10, Hartog in view of Havlir teaches the system of claim 9, wherein the hardware circuit is a memory circuit or the shader circuitry ( Havlir discloses, “Sub-units 220, in some embodiments, are scaling units that may be replicated to increase the processing capabilities of a GPU. Each GPU sub-unit 220 may be capable of independently processing instructions of a graphics program. Sub-units 220, in the illustrated embodiment, include circuitry that implements respective distributed hardware slots 230. These hardware slots may also be referred to herein as ‘dSlots.’ Each sub-unit may include multiple hardware slots 230. Sub-units may also be referred to herein as “mGPUs.” In some embodiments, primary control circuitry 210 assigns work from a logical slot to at most one distributed hardware slot in each sub-unit 220. In some embodiments, each sub-unit includes fragment generator circuitry, shader core circuitry configured to execute shader programs, memory system circuitry (which may include one or more caches and a memory management unit), geometry processing circuitry, and distributed workload distribution circuitry (which may coordinate with primary control circuitry 210 to distribute work to shader pipelines).” ¶ 0048. 
The circuitry implementing the dSlots is part of each sub-unit, and each sub-unit also includes memory system circuitry and shader core circuitry. Thus, the overall circuitry of the sub-units is also a memory circuit or a shader circuit.). Hartog and Havlir are both considered to be analogous to the claimed invention because they are in the same field of computer task scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide wherein the hardware circuit is a memory circuit or the shader circuitry. Doing so would help allow using the improved efficiency and performance of a dedicated hardware memory circuit (Havlir discloses, “FIG. 13 is a block diagram illustrating a number of hardware slots of an mGPU. In some embodiments, one or more dSlots (shown in solid black in FIG. 13) are reserved for high priority logical slots…,” ¶ 0132.). Regarding Claim 11, Hartog teaches a method, comprising: receiving, at an arbitration circuit, ( Hartog discloses, “As discussed above, hardware scheduler HWS 128 is configured to select a scheduled process from RLC 150 for execution on the APD, For example, HWS 128 supports scheduling techniques applied to RLC 150, based upon priority level, or based on other arbitration scheduling criteria. Additionally, KMD 110, together with SWS 112, can perform scheduling of processes to be executed on the APD. The OS SWS 112, for example, can include logic to maintain a prioritized list of processes to be executed on APD 200 as a result of arbitration,” ¶ 0076.); sorting priorities stored at the arbitration circuit based on the indication of the priority ( Hartog discloses, “The pipes can order the queues from zero to seven, and at reset the previous queue will be set to seven, resulting in Q0 → Q7 as the native ordering.
If Q0, Q3, Q7 become ready with a queue priority 7 at a quantum enabled just after reset, the queues would process in the following order Q0, Q3, Q7, Q0 etc. If Q5 showed up with the same queue priority level (7), it would get executed after Q3 and before Q7 during the next cycle,” ¶ 0090. Here, the queues are ordered by number, and then processed in an order based on queue priority level, which is essentially sorting the priorities such that the queues associated with the greater priorities are executed first.); and in response to a request for an element, providing a next element indicated by the sorted priorities to the shader circuitry ( Hartog discloses, “At step 506, the queue arbiter at the top of the compute pipeline signals a respective CP ME 301 thread to stop on the next packet boundary when the arbiter determines a better queue is ready for processing. If it is determined that a better queue is not available, the processes continues at step 508,” ¶ 0084, and “As illustrated in FIG. 6, for each priority level, the compute pipeline maintains a last queue executed scoreboard. A return to that priority level will process the next ready queue. If only one queue is ready in a priority level, it will resume,” ¶ 0089. The claimed “next element” is mapped to the element contained in the disclosed “next ready queue”. Here, the next ready queue is identified for its element to be sent for execution, in response to receiving a priority level associated with the queue. According to FIG. 5, specifically step 514, this is done in order of the sorted priorities from highest to lowest.). Hartog does not teach receiving an indication of a priority of an element to be executed by shader circuitry.
However, Havlir teaches receiving an indication of a priority of an element to be executed by shader circuitry ( Havlir discloses, “each shader may include a queue for each distributed hardware slot and may select work from among the queues based on work priority,” ¶ 0049, “The control circuitry may determine the first distribution rule based on one or more software overrides signaled by a graphics program being executed. These may include any appropriate combination of the following types of example software overrides: mask information that indicates which sub-units are available to the first set of work, a specified distribution rule, group information that indicates a group of sub-units on which the first set of work should be deployed, and policy information that indicates a scheduling policy,” ¶ 0195.). Hartog and Havlir are both considered to be analogous to the claimed invention because they are in the same field of computer task scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide receiving an indication of a priority of an element to be executed by shader circuitry. Doing so would help improve the efficiency of scheduling (Havlir discloses, “In various embodiments, the disclosed techniques may advantageously avoid non-deterministic flushing behavior, improve cache efficiency, or both,” ¶ 0141.).
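The three method steps mapped to claim 11 — receive an element with a priority indication, keep the stored priorities sorted, and provide the next element indicated by the sorted priorities on request — can be sketched as below. This is an illustrative software analogue only; the class and method names are hypothetical and appear in neither reference.

```python
import bisect

class SortedDispatch:
    """Illustrative sketch of the claim 11 method: priorities stay sorted as
    indications arrive, and requests are served in sorted-priority order
    (cf. Hartog ¶ 0090 and FIG. 5). Not actual hardware."""

    def __init__(self):
        self._entries = []  # (priority, element), kept sorted ascending

    def receive(self, priority, element):
        # Sorting step: each received indication is inserted in order,
        # so the stored priorities are always sorted.
        bisect.insort(self._entries, (priority, element))

    def on_request(self):
        # Provide the next element indicated by the sorted priorities,
        # i.e., the element with the greatest stored priority.
        if not self._entries:
            return None
        return self._entries.pop()[1]
```

Insertion keeps the list ordered at all times, so answering a request is a constant-time pop from the high-priority end rather than a search.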
Regarding Claim 12, Hartog in view of Havlir teaches the method of claim 11, wherein the priority indicates that the element has a first priority and wherein a second element currently stored at the dispatch queue has a second priority ( Havlir discloses, “Each distributed hardware slot may include various circuitry configured to process an assigned kick or portion thereof, including configuration registers, a work queue, circuitry configured to iterate through work in the queue (e.g., batches of compute workitems), circuitry to sequence context loads/stores, and work distribution tracking circuitry. Each sub-unit 220 may include multiple shaders that accept work from distributed slots in the sub-unit and use pipelines to execute the work,” ¶ 0049, and “The control circuitry may allow a logical slot with a first priority level to reclaim a hardware slot that is assigned to a logical slot with a second, lower priority level, based on one or more of the respective hold values,” ¶ 0195. Here, a first element associated with a logical slot can have a different priority level from a second element associated with another logical slot. One of these elements can be currently stored at the work queue (dispatch queue) in the hardware slot, and if the other element has a higher priority level, it can reclaim that hardware slot (and thus the work queue) from the element currently stored at the work queue in the hardware slot.). Hartog and Havlir are both considered to be analogous to the claimed invention because they are in the same field of computer task scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide wherein the priority indicates that the element has a first priority and wherein a second element currently stored at the dispatch queue has a second priority. 
Doing so would help ensure that work that has a higher priority is selected first in order to improve the efficiency of scheduling (Havlir discloses, “Each sub-unit 220 may include multiple shaders that accept work from distributed slots in the sub-unit and use pipelines to execute the work. For example, each shader may include a queue for each distributed hardware slot and may select work from among the queues based on work priority,” ¶ 0049.). Regarding Claim 14, Hartog in view of Havlir teaches the method of claim 11, wherein the indication of the priority of the element is specified by a user ( Hartog discloses, “In an alternative embodiment, an arbitration event can be created for any write to queue priority register of the compute pipeline. This method can enable a user to control the amount of work issued prior to enabling other queues of the pipe to make progress. Additionally this alternative embodiment can enable a privileged queue per CP ME,” ¶ 0094. Here, an element is written to a register in the queue. This is indicated by the user, who controls the amount of work issued to the queue. After the combination of Hartog with Havlir, the indication of the priority of the element (priority level) is also written to the queue.). Regarding Claim 16, Hartog teaches a processing system, comprising: a bus ( Hartog discloses, “In the example shown, communication infrastructure 109 interconnects the components of system 100 as needed. Communication infrastructure 109 can include (not shown) one or more of a peripheral component interconnect (PCI) bus, extended PCI (PCI-E) bus, advanced microcontroller bus architecture (AMBA) bus, accelerated graphics port (AGP), or such communication infrastructure,” ¶ 0055.
The claimed “bus” is mapped to the disclosed “one or more of a peripheral component interconnect (PCI) bus, extended PCI (PCI-E) bus, advanced microcontroller bus architecture (AMBA) bus” used in “communication infrastructure 109”.); a first processing circuit configured to issue a plurality of commands via the bus; and a second processing circuit configured to receive the plurality of commands from the first processing circuit ( Hartog discloses, “In the example shown, communication infrastructure 109 interconnects the components of system 100 as needed,” ¶ 0055, “CP 124 can be configured to process the command lists that are provided as inputs from command buffers 125, shown in FIG. 1A. In the exemplary operation of FIG. 1B, CP input 0 (124a) is responsible for driving commands into a graphics pipeline 162. CP inputs 1 and 2 (124b and 124c) forward commands to a compute pipeline 160. Also provided is a controller mechanism 166 for controlling operation of HWS 128,” ¶ 0065, “To efficiently process data from multiple compute inputs, arbitration occurs between pipeline queues within compute pipelines CS P0 - CS P7, as illustrated in greater detail in FIG. 3. More specifically, arbitration policies in accordance with embodiments of the present invention allocate APD resources among the multiple pipeline inputs. A shader input block (SPI) 202 provides an arbitration scheme for submitting wavefronts between compute pipelines CS P0 - CS P7 and graphics pipeline 204. Wave dispatchers 206 are connected from two compute pipelines alternate to forward the wavefronts to shader core 208. Shader core 208 executes the wavefronts,” ¶ 0074. The claimed “first processing circuit” is mapped to the circuitry containing the disclosed “two compute pipelines”. As seen in FIG. 1A and FIG.
1B of Hartog, it is connected to the bus used in communication infrastructure 109 in order to send work to the shader core. The claimed “second processing circuit” is mapped to the circuitry containing the disclosed “shader core”. The claimed “plurality of commands” is mapped to commands that include the disclosed “commands” that are forwarded to a compute pipeline.), and comprising: a dispatch queue configured to store a plurality of elements from the plurality of commands ( Hartog discloses, “In one example, a subset of work-items in a workgroup that execute simultaneously together on a single SIMD engine can be referred to as a wavefront 136. The width of a wavefront is a characteristic of the hardware SIMD engine. As referred to herein, a workgroup is a collection of related work-items that execute on a single compute unit,” ¶ 0034, “CP 124 can be configured to process the command lists that are provided as inputs from command buffers 125, shown in FIG. 1A. In the exemplary operation of FIG. 1B, CP input 0 (124a) is responsible for driving commands into a graphics pipeline 162. CP inputs 1 and 2 (124b and 124c) forward commands to a compute pipeline 160. Also provided is a controller mechanism 166 for controlling operation of HWS 128,” ¶ 0065, and “To efficiently process data from multiple compute inputs, arbitration occurs between pipeline queues within compute pipelines CS P0 - CS P7, as illustrated in greater detail in FIG. 3. More specifically, arbitration policies in accordance with embodiments of the present invention allocate APD resources among the multiple pipeline inputs. A shader input block (SPI) 202 provides an arbitration scheme for submitting wavefronts between compute pipelines CS P0 - CS P7 and graphics pipeline 204. Wave dispatchers 206 are connected from two compute pipelines alternate to forward the wavefronts to shader core 208. Shader core 208 executes the wavefronts,” ¶ 0074.
The claimed “plurality of elements” is mapped to the disclosed “work-items” that comprise a wavefront. Said wavefronts are stored in the pipeline queue. This is illustrated by FIG. 2 of Hartog, which shows the compute pipelines CS P0 through CS P7 that store the wavefronts that consist of work-items before they are sent to the shader core 208. The pipeline queue stores the elements via the commands that are sent to it from CP inputs 1 and 2. The claimed “dispatch queue” is mapped to the disclosed “pipeline queue” that stores wavefronts (elements) that will be executed by a “shader core”.); shader circuitry configured to process the plurality of elements ( Hartog discloses, “A shader input block (SPI) 202 provides an arbitration scheme for submitting wavefronts between compute pipelines CS P0 - CS P7 and graphics pipeline 204. Wave dispatchers 206 are connected from two compute pipelines alternate to forward the wavefronts to shader core 208. Shader core 208 executes the wavefronts,” ¶ 0074. The claimed “shader circuitry” is mapped to the disclosed “shader core”.); and an arbitration circuit configured to store ( Hartog discloses, “To efficiently process data from multiple compute inputs, arbitration occurs between pipeline queues within compute pipelines CS P0 - CS P7, as illustrated in greater detail in FIG. 3. More specifically, arbitration policies in accordance with embodiments of the present invention allocate APD resources among the multiple pipeline inputs. A shader input block (SPI) 202 provides an arbitration scheme for submitting wavefronts between compute pipelines CS P0 - CS P7 and graphics pipeline 204,” ¶ 0074. The claimed “arbitration circuit” is mapped to the circuitry that contains the disclosed “pipeline queues” where arbitration occurs.). 
Hartog does not teach an arbitration circuit configured to store a plurality of priority indicators corresponding to respective elements of the plurality of elements and to schedule the respective elements for processing by the shader circuitry based on the plurality of priority indicators. However, Havlir teaches an arbitration circuit configured to store a plurality of priority indicators corresponding to respective elements of the plurality of elements and to schedule the respective elements for processing by the shader circuitry based on the plurality of priority indicators ( Havlir discloses, “Each sub-unit 220 may include multiple shaders that accept work from distributed slots in the sub-unit and use pipelines to execute the work. For example, each shader may include a queue for each distributed hardware slot and may select work from among the queues based on work priority,” ¶ 0049, “The control circuitry may select the first and second distribution rules based on amounts of work in the first and second sets of graphics work. The control circuitry may determine the first distribution rule based on one or more software overrides signaled by a graphics program being executed… The control circuitry may allow a logical slot with a first priority level to reclaim a hardware slot that is assigned to a logical slot with a second, lower priority level, based on one or more of the respective hold values,” ¶ 0195. Havlir also teaches an “arbitration circuit” in the form of the disclosed “control circuitry” that manages logical slots with different priority levels. The claimed “priority indicators” are mapped to the disclosed “priority levels” associated with the logical slots. The priority levels can indicate which logical slots will be assigned to the hardware slots for work. The disclosed “control circuitry” schedules the elements for execution based on the work priority indicated by the priority levels.).
Hartog and Havlir are both considered to be analogous to the claimed invention because they are in the same field of computer task scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide an arbitration circuit configured to store a plurality of priority indicators corresponding to respective elements of the plurality of elements and to schedule the respective elements for processing by the shader circuitry based on the plurality of priority indicators. Doing so would help ensure that elements with a higher priority are selected first (Havlir discloses, “In some embodiments, low-priority logical slots are not allowed to reclaim hardware slots from high-priority logical slots unless there is no chance that a high-priority logical slot will use them,” ¶ 0134.). Regarding Claim 17, Hartog in view of Havlir teaches the processing system of claim 16, wherein scheduling the elements of the dispatch queue for processing comprises, in response to receiving a request for an element, sending a previously identified element to the shader circuitry ( Hartog discloses, “In step 504, the queue with the highest queue priority that is determined to be ready for processing is selected. Once selected, for example, a queue remains selected until one of the following conditions occurs,” ¶ 0083, and “At step 506, the queue arbiter at the top of the compute pipeline signals a respective CP ME 301 thread to stop on the next packet boundary when the arbiter determines a better queue is ready for processing. 
If it is determined that a better queue is not available, the processes continues at step 508,” ¶ 0084, and “At step 510, CP ME 301 performs a context switching routine and signals the fetcher to stop fetching queue data and the DC to stop dispatching wavefronts for the current queue,” ¶ 0085. The claimed “request for an element” is mapped to the disclosed “signal” made by the queue arbiter to a respective “CP ME 301 thread” to stop on the next packet boundary to request a better queue for processing if ready. This leads to the next ready queue being selected for processing. Before step 510, the element contained in the current, highest priority queue, from step 504, is prefetched for transmission to the shader core. The claimed “previously identified element” is mapped to the element contained in the disclosed “queue with the highest queue priority” from step 504. This is a previously identified element because this queue has its element prefetched before the next queue is selected after step 510.). Regarding Claim 18, Hartog in view of Havlir teaches the method of claim 17, wherein the arbitration circuit is further configured to prefetch the previously identified element ( Hartog discloses, “In step 504, the queue with the highest queue priority that is determined to be ready for processing is selected. Once selected, for example, a queue remains selected until one of the following conditions occurs,” ¶ 0083, and “At step 510, CP ME 301 performs a context switching routine and signals the fetcher to stop fetching queue data and the DC to stop dispatching wavefronts for the current queue,” ¶ 0085. Before step 510, the element contained in the current, highest priority queue, from step 504, is prefetched for transmission to the shader core.). 
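The request/prefetch behavior mapped for claims 17-18 above can be sketched roughly as follows (a simplified illustration under assumed semantics, not Hartog's actual step 504-510 logic; all names are hypothetical):

```python
class PrefetchingArbiter:
    """Hypothetical sketch of the claims 17-18 mapping: the arbiter identifies
    the highest-priority ready queue ahead of time and prefetches its next
    element, so an incoming request is answered with a previously identified
    element rather than waiting on a fresh queue selection."""

    def __init__(self, queues):
        # queues: dict mapping queue priority -> list of pending elements
        self.queues = queues
        self._prefetched = None
        self._prefetch()

    def _prefetch(self):
        # Select the ready (non-empty) queue with the highest priority and
        # pull its next element in advance of any request.
        ready = [p for p, q in self.queues.items() if q]
        self._prefetched = self.queues[max(ready)].pop(0) if ready else None

    def request_element(self):
        # On a request, send the previously identified (prefetched) element,
        # then immediately prefetch the next one.
        element, self._prefetched = self._prefetched, None
        self._prefetch()
        return element

arb = PrefetchingArbiter({2: ["hi_0", "hi_1"], 1: ["lo_0"]})
assert arb.request_element() == "hi_0"   # previously identified element
assert arb.request_element() == "hi_1"
assert arb.request_element() == "lo_0"   # falls back to the lower-priority queue
assert arb.request_element() is None
```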
Regarding Claim 19, Hartog in view of Havlir teaches the method of claim 16, wherein scheduling the elements of the dispatch queue for processing comprises, in response to receiving a request for an element, identifying a next element to send to the shader circuitry ( Hartog discloses, “At step 506, the queue arbiter at the top of the compute pipeline signals a respective CP ME 301 thread to stop on the next packet boundary when the arbiter determines a better queue is ready for processing. If it is determined that a better queue is not available, the processes continues at step 508,” ¶ 0084, “As illustrated in FIG. 6, for each priority level, the compute pipeline maintains a last queue executed scoreboard. A return to that priority level will process the next ready queue. If only one queue is ready in a priority level, it will resume,” ¶ 0089. Here, the next ready queue is identified for its element to be sent for execution, prior to the ready queue actually sending its element.). Regarding Claim 20, Hartog in view of Havlir teaches the method of claim 16, wherein scheduling the elements of the dispatch queue for processing comprises schedule at least one element having a second priority in response to scheduling a threshold number of elements having a first priority, wherein the second priority is a lower priority than the first priority ( Havlir discloses, “As briefly discussed above, different logical slots may have different priority levels, e.g., as specified by software. In some embodiments, on a given mGPU, a subset of hardware slots are reserved for logical slots that meets a threshold priority (e.g., higher priority slots in a system with two priority levels),” ¶ 0131. The claimed “threshold number” is mapped to the number of logical slots that have been reserved for a priority level, e.g., a higher priority level. 
This is a threshold number because if all of the logical slots with the higher priority level are being used and thus unavailable, a lower priority logical slot will have to be selected. Here, for Havlir’s logical slots, at least two priority levels exist, where one priority level is higher than the other. If all the higher-priority slots are taken, then an element will be scheduled on a slot with lower priority.). Hartog and Havlir are both considered to be analogous to the claimed invention because they are in the same field of computer task scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog to incorporate the teachings of Havlir and provide wherein scheduling the elements of the dispatch queue for processing comprises schedule at least one element having a second priority in response to scheduling a threshold number of elements having a first priority, wherein the second priority is a lower priority than the first priority. Doing so would help ensure that higher priority tasks will be selected first in order to improve the efficiency of scheduling (Havlir discloses, “In some embodiments, on a given mGPU, a subset of hardware slots are reserved for logical slots that meets a threshold priority (e.g., higher priority slots in a system with two priority levels),” ¶ 0131.). Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Hartog (EP 2791795 B1) in view of Havlir (US 20230050061 A1) and Kovacevic (WO 2020261180 A1). Regarding Claim 7, Hartog in view of Havlir teaches the system of claim 1. Hartog in view of Havlir does not teach wherein the plurality of priority indicators are identifiers of respective virtual machines that correspond to the elements. 
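The two-level threshold scheme mapped for claim 20 above can be sketched roughly as follows (a simplified illustration of the claim language, not Havlir's hardware-slot reservation mechanism; `high`, `low`, and `threshold` are hypothetical names):

```python
def schedule_order(high, low, threshold):
    """Hypothetical sketch of claim 20's scheme: after a threshold number of
    first-priority (high) elements have been scheduled, at least one
    second-priority (low) element is scheduled, so lower-priority work is
    not starved indefinitely."""
    order = []
    scheduled_high = 0
    while high or low:
        if high and scheduled_high < threshold:
            order.append(high.pop(0))    # first-priority element
            scheduled_high += 1
        elif low:
            order.append(low.pop(0))     # yield one slot to the lower priority
            scheduled_high = 0           # then return to first-priority work
        else:
            order.append(high.pop(0))    # only first-priority work remains
    return order

# After threshold=2 first-priority elements, one second-priority element runs.
assert schedule_order(["h0", "h1", "h2"], ["l0"], threshold=2) == ["h0", "h1", "l0", "h2"]
```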
However, Kovacevic teaches wherein the plurality of priority indicators are identifiers of respective virtual machines that correspond to the elements ( Kovacevic discloses, “A scheduler in the GPU schedules the guest VM to execute the virtual function at a scheduled time. In some embodiments, the guest VM is scheduled based on a priority associated with the guest VM and other priorities associated with other guest VMs that are ready to be scheduled,” Page 5. After the combination of Hartog in view of Havlir, with Kovacevic, the priority levels from Hartog in view of Havlir are used to represent the VM identifiers as specified by Kovacevic.). Hartog in view of Havlir, and Kovacevic, are both considered to be analogous to the claimed invention because they are in the same field of computer scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog in view of Havlir to incorporate the teachings of Kovacevic and provide wherein the plurality of priority indicators are identifiers of respective virtual machines that correspond to the elements. Doing so would help improve the performance of processing the elements (Kovacevic discloses, “FIGs 1 -13 disclose embodiments of techniques that improve the execution speed of multimedia applications, while reducing power consumption of the processing system, by allowing multiple virtual machines to share the hardware functionality provided by fixed function hardware blocks in a GPU instead of forcing all but one process to use hardware acceleration provided by software executing on a CPU,” Page 4). Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Hartog (EP 2791795 B1) in view of Havlir (US 20230050061 A1) and Gruner (US 20030009629 A1). Regarding Claim 13, Hartog in view of Havlir teaches the method of claim 11. 
Hartog in view of Havlir does not teach wherein receiving the indication of the priority comprises extracting the priority from a received opcode corresponding to the element. However, Gruner teaches wherein receiving the indication of the priority comprises extracting the priority from a received opcode corresponding to the element ( Gruner discloses, “A user can elect to have cache 80 assign priority based on a request's Opcode field or the age of the request,” ¶ 0121. Here, a priority value is determined (or extracted) based on an opcode field. After the combination of Hartog in view of Havlir, with Gruner, said opcode field corresponds to the element from Hartog in view of Havlir.). Hartog in view of Havlir, and Gruner are both considered to be analogous to the claimed invention because they are in the same field of computer scheduling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog in view of Havlir to incorporate the teachings of Gruner and provide wherein receiving the indication of the priority comprises extracting the priority from a received opcode corresponding to the element. Doing so would provide a convenient means of accessing the priority when making scheduling determinations (Gruner discloses, “A user can elect to have cache 80 assign priority based on a request's Opcode field or the age of the request. The scheduler employs the above-described descriptors to make these priority determinations,” ¶ 0121.). Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Hartog (EP 2791795 B1) in view of Havlir (US 20230050061 A1) and Kini (US 20140344821 A1). Regarding Claim 15, Hartog in view of Havlir teaches the method of claim 11. 
Hartog in view of Havlir does not teach wherein the indication of the priority of the element is specified by a driver based on a status of a user-visible process that is to use data generated using the element. However, Kini teaches wherein the indication of the priority of the element is specified by a driver based on a status of a user-visible process that is to use data generated using the element ( Kini discloses, “The software application 125 may generate requests (i.e., calls) for processing by the CUDA software stack 150 to produce a desired set of results,” ¶ 0029, “More specifically, the CUDA driver 220 may submit one or more streams (not shown) to the parallel processing subsystem 112 for execution within the parallel processing subsystem 112. Each stream may include any number and combination of work components. In particular, a CUDA stream may include one or more CPU-launched kernels 235. In general, a kernel is a function that has a defined entrance and exit and, typically, performs a computation on each element of an input list. Each CPU-launched kernel 235 is invoked by code that is executed by the CPU, such as the software application 112 [Examiner’s Note: “software application 112” should be “software application 125”, as the number 112 is already used by the “parallel processing subsystem 112”.],” ¶ 0033, “The parallel processing subsystem 112 includes advanced prioritization functionality that enables prioritization of kernels and preemption of currently executing kernels. Thus, the parallel processing subsystem 112 may schedule kernels in priority-order. And the parallel processing subsystem 112 may preempt a lower-priority kernel executing on a parallel processing subsystem 112 resource in favor of one or more higher-priority kernels,” ¶ 0034, “The CUDA driver 220 then splits the valid device priorities 222 between valid device priorities 222 for prioritizing streams and valid device priorities 222 for "child" GPU-launched kernels. 
For example, suppose that the max nesting depth 224 were N. The CUDA driver 220 would ensure that for each of the valid device priorities 222 allocated for prioritizing streams, the next (N-1) higher priority valid device priorities 222 were reserved for "child" GPU-launched kernels,” ¶ 0041, “In particular, this flexible approach enables application developers to make trade-offs between prioritization and dynamic parallelism based on the needs of the software application 125. By assigning a low max nesting depth 224, the software application 125 may use more unique valid stream priorities 212 to strategically prioritize CPU-launched kernels 235,” ¶ 0043. The claimed “driver” is mapped to the disclosed “CUDA driver”. The claimed “user-visible process” is mapped to the disclosed “software application” that generates requests for processing by the CUDA software stack to produce a desired set of results. Here, as specified by paragraph 41 of Kini, the indication of the priority of kernels is specified by the CUDA driver. Said indication is based on the status of a software application (user-visible process) that requests processing via a CPU-launched kernel, wherein said application requests a priority for the kernel. Said software application uses the data generated using the CPU-launched kernel, in the form of the disclosed “desired set of results”.). Hartog in view of Havlir, and Kini, are both considered to be analogous to the claimed invention because they are in the same field of computer resources. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hartog in view of Havlir to incorporate the teachings of Kini and provide wherein the indication of the priority of the element is specified by a driver based on a status of a user-visible process that is to use data generated using the element. 
Doing so would help allow for increased flexibility and efficiency in processing each element in a determined schedule based on priority (Kini discloses, “Advantageously, if a stream is associated with a particular valid device priority 222, then the CUDA driver 220 is configured to submit all CPU-launched kernels 235 within the stream at the particular valid device priority 222. Thus, the CUDA driver 220 may submit CPU-launched kernels 235 that initiate dynamic parallelism (i.e., are configured to launch "child" GPU-launched kernels) at valid device priorities 222 other than the lowest valid device priority 222. In contrast, in prior-art approaches to prioritizing streams, CPU-launched kernels that initiate dynamic parallelism are submitted at the lowest priority included in the valid device priorities, thereby limiting the effectiveness of assigning priorities to streams,” ¶ 0046.). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Guthrie et al. (US 20080140936 A1): Method For Priority Scheduling And Priority Dispatching Of Store Conditional Operations In A Store Queue Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW SUN whose telephone number is (571)272-6735. The examiner can normally be reached Monday-Friday 8:00-5:00. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /ANDREW NMN SUN/Examiner, Art Unit 2195 /Aimee Li/Supervisory Patent Examiner, Art Unit 2195
Prosecution Timeline

Sep 29, 2023
Application Filed
Mar 17, 2026
Non-Final Rejection — §101, §103 (current)
