DETAILED ACTION
Claims 1-20 are presented for examination.
Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).
Claims 1, 6-9, 14, 15, and 20 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Smith et al. (U.S. Pat. No. 9535815 B2, hereinafter Smith) in view of Vaz (U.S. Pat. Pub. No. 2023/0185635 A1).
As per claim 1, Smith teaches the limitations substantially as claimed, including an apparatus, the apparatus comprising:
control circuitry configured to (the architecture and/or functionality of the various previous figures may be implemented in the context of a general computer system, a circuit board system, a game console system dedicated for entertainment purposes, an application-specific system, and/or any other desired system in at least col. 15, lines 31-42):
receive different sets (The tasks are loaded into the graphics memory and an instruction is transmitted to the parallel processing unit to launch the execution of the tasks. In one embodiment, the thread blocks are assigned to an available core by a macro-scheduler (i.e., task management unit), and a micro-scheduler (i.e., scheduler unit) manages the execution of up to 16 thread blocks substantially simultaneously for a given partition of a core in at least col. 5, lines 42-61) of graphics work (A system, method, and computer program product are provided for collecting trace information based on a graphics processing unit workload in at least col. 2, lines 6-19) and schedule sets of graphics work for execution on distributed hardware resources (Each core may be partitioned, where each partition includes a micro-scheduler unit that manages the execution of a number of thread blocks or warps. A thread block or a warp is a plurality of related threads based on a single-instruction, multiple-thread (SIMT) architecture in at least col. 3, lines 38-59 and the PPU 200 implements a SIMD (Single-Instruction, Multiple-Data) architecture where each thread block (i.e., warp) in a grid is concurrently executed on a different data set by different threads in the thread block in at least col. 7, line 54 – col. 8, line 3), including a first set of work that depends on [other] work (the micro-scheduler unit also ensures that any dependencies are resolved for each thread in the thread block prior to dispatching the next instruction for that thread block. Thus, the micro-scheduler tracks a status for each of the thread blocks managed by the micro-scheduler unit in at least col. 3, line 60 – col. 4, line 8 and An active grid is transferred to the pending pool when execution of the active grid is blocked by a dependency in at least col. 7, lines 40-53 and col. 12, lines 12-27);
in response to a release signal from the [other] work that indicates that the [other] work has reached a first processing point, initiate processing of the first set of work (Stall reasons may include, but are not limited to, that the thread block is waiting on a dependency barrier (e.g., waiting for a memory request to be fulfilled, waiting for synchronization between thread blocks, etc.) in at least col. 12, lines 12-27 and Pending grids are transferred to the active grid pool by the TMU 215 when a pending grid is eligible to execute, i.e., has no unresolved data dependencies in at least col. 7, lines 39-53. Examiner notes that thread blocks/grids wait at a dependency barrier to synchronize with the thread blocks on which they depend, and pending grids are executed only when there are no unresolved data dependencies; thus, if the first work is eligible to execute because it has no unresolved data dependencies, it must have been signaled that the second work, on which the first work depends, reached a first processing point.);
stall processing of the first set of work in response to reaching a gate point in the first set of work (The thread state information may indicate a status of each thread block managed by the scheduler unit 310 including indicating whether each thread block is active (i.e., an instruction for the thread block is dispatched during the current clock cycle) or inactive and, if the thread block is inactive, a stall vector that encodes a reason why the thread block is inactive. Stall reasons may include, but are not limited to, that the thread block is waiting on a dependency barrier (e.g., waiting for a memory request to be fulfilled, waiting for synchronization between thread blocks, etc.), that the thread block was ready to issue but wasn’t selected to issue that clock cycle, that the dispatch unit is stalled (i.e., there are no available resources to execute the particular instruction), that the instruction is waiting for a texture or memory value to be generated by the texture unit, and the like in at least col. 12, lines 12-27 and An active grid is transferred to the pending pool when execution of the active grid is blocked by a dependency in at least col. 7, lines 39-53); and
resume processing of the first set of work in response to an end signal for the [other] work (Stall reasons may include, but are not limited to, that the thread block is waiting on a dependency barrier (e.g., waiting for a memory request to be fulfilled, waiting for synchronization between thread blocks, etc.), that the thread block was ready to issue but wasn’t selected to issue that clock cycle, that the dispatch unit is stalled (i.e., there are no available resources to execute the particular instruction), that the instruction is waiting for a texture or memory value to be generated by the texture unit, and the like in at least col. 12, lines 12-27 and Pending grids are transferred to the active grid pool by the TMU 215 when a pending grid is eligible to execute, i.e., has no unresolved data dependencies in at least col. 7, lines 39-53).
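For illustration only, the sequence recited in claim 1 (initiate on a release signal, stall at a gate point, resume on an end signal) may be sketched as a small state machine. This is a minimal sketch under stated assumptions and is not taken from Smith or from the application: the names DependentWork, State, on_release_signal, on_gate_point, and on_end_signal are hypothetical, and the handling of an end signal that arrives before the gate point is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_RELEASE = auto()
    RUNNING = auto()
    STALLED_AT_GATE = auto()
    RESUMED = auto()


class DependentWork:
    """Hypothetical model of the 'first set of work' recited in claim 1."""

    def __init__(self):
        self.state = State.WAITING_FOR_RELEASE
        self.end_seen = False

    def on_release_signal(self):
        # The other work reached its first processing point: initiate processing.
        if self.state is State.WAITING_FOR_RELEASE:
            self.state = State.RUNNING

    def on_gate_point(self):
        # Reaching the gate point stalls the work. Assumption: if the end
        # signal has already arrived, the work continues without stalling.
        if self.state is State.RUNNING:
            self.state = State.RESUMED if self.end_seen else State.STALLED_AT_GATE

    def on_end_signal(self):
        # The end signal for the other work resumes a stalled dependent.
        self.end_seen = True
        if self.state is State.STALLED_AT_GATE:
            self.state = State.RESUMED
```

In this sketch a gate point reached before any release signal has no effect, because processing of the dependent work has not yet been initiated.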
While Smith generally teaches stalling and resuming processing of graphics work, Smith does not specifically teach a particular first work that depends on a particular second work and signaling therebetween.
However, in analogous art, Vaz teaches a first set of work that depends on a second set of work; in response to a release signal from the second set of work; in response to an end signal for the second set of work (a timeline semaphore is used to coordinate or synchronize mixed workloads, where a first API can signal timeline semaphore when it is done processing a part of a first workload and a second API, which has been waiting on said timeline semaphore to be signaled so that it reaches or exceeds a threshold value to indicate it can start processing a second workload (where said first and second workloads are related to running said application). In at least one embodiment, a timeline semaphore enables computing resources such as GPU or CPU threads to be allocated at particular times to coordinate workload processing and/or to control resource access. In at least one embodiment, computing resources refer to hardware such as a CPU or GPU or software such as threads, streams, and queues running on hardware in at least ¶ [0050] and if an active task is idle on GPC 2418, such as while waiting for a data dependency to be resolved, then the active task is evicted from GPC 2418 and returned to a pending task pool while another task in the pending task pool is selected and scheduled for execution on GPC 2418 in at least ¶ [0248]).
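For illustration only, the timeline semaphore behavior relied upon from Vaz (a waiter proceeds once a signaled value reaches or exceeds a threshold) may be sketched as follows. This is a minimal sketch using Python's standard threading module, not Vaz's implementation or any graphics API; the names TimelineSemaphore, signal, wait, and run_demo are hypothetical, and the rule that the counter never decreases is an assumption.

```python
import threading


class TimelineSemaphore:
    """Hypothetical counter-based semaphore: waiters block until value >= threshold."""

    def __init__(self, initial=0):
        self._value = initial
        self._cond = threading.Condition()

    def signal(self, value):
        # Assumption: the timeline only moves forward, so lower values are ignored.
        with self._cond:
            if value > self._value:
                self._value = value
                self._cond.notify_all()

    def wait(self, threshold, timeout=None):
        # Returns True once the value reaches the threshold, False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= threshold, timeout)


def run_demo():
    sem = TimelineSemaphore()
    log = []

    def second_workload():
        sem.wait(1)  # blocks until the first workload signals value 1
        log.append("second workload started")

    worker = threading.Thread(target=second_workload)
    worker.start()
    log.append("first workload part done")
    sem.signal(1)
    worker.join()
    return log
```

Because the second workload cannot pass its wait until the value 1 has been signaled, the log order in run_demo is the same on every run.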
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the particular first work that depends on a particular second work and signaling therebetween of Vaz with the systems and methods of Smith, resulting in a system in which a particular set of dependent jobs signals when each job reaches a point at which the other must stall or resume. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of sharing resources efficiently and in an organized manner such that waste is reduced (Vaz ¶ [0002]).
As per claim 6, Smith teaches that the control circuitry supports multiple classes of gate points, including: a first gate point class that corresponds to a point at which one or more execution state load SIMD groups have been allocated resources but have not executed instructions; and a second gate point class that corresponds to a point at which one or more work SIMD groups have been allocated resources but have not executed instructions (Col. 12, Lines 12-27).
As per claim 7, Smith teaches a plurality of single-instruction multiple-data pipelines configured to execute instructions; and fixed-function circuitry configured to control the single-instruction multiple-data pipelines to perform operations for at least one of the following types of programs: graphics shader programs; and machine learning programs (Col. 2, Line 45 – Col. 3, Line 3).
As per claim 8, Vaz teaches that the apparatus is a computing device that further comprises: a display; a central processing unit; and a network interface (Figure 9).
As per claims 9 and 14, they are medium claims with no further limitations beyond those rejected above. Therefore, they are rejected for the same reasons.
As per claims 15 and 20, they are method claims with no further limitations beyond those rejected above. Therefore, they are rejected for the same reasons.
Claims 2, 3, 10, and 16-18 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Smith and Vaz, as applied to claim 1 above, and further in view of Nordquist (U.S. Pat. No. 7594095 B1).
As per claim 2, Vaz teaches a late release signal that indicates that all SIMD groups have completed for a given set of work (Paragraph [0084] teaches signaling on completion).
Smith and Vaz do not teach that the control circuitry is configured to receive multiple types of release signals, including an early release signal that indicates that all SIMD groups have been launched for a given set of work.
However, Nordquist teaches that the control circuitry is configured to receive multiple types of release signals, including an early release signal that indicates that all SIMD groups have been launched for a given set of work (Col. 2, Lines 18-36).
It would have been obvious to one of ordinary skill in the art at the time of the filing of the application to combine the teachings of Nordquist with those of Smith and Vaz in order to allow for Smith’s and Vaz’s apparatus to provide greater information sharing, which could increase the efficiency of the apparatus, thereby potentially increasing buy-in among prospective users.
As per claim 3, Nordquist teaches that the control circuitry is configured to allow a dependent set of work to initiate processing based on an early release signal from one or more first types of sets of work and based on a late release signal from one or more second types of sets of work (Col. 2, Lines 18-36).
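For illustration only, the distinction between the early release signal (all SIMD groups launched) and the late release signal (all SIMD groups completed) recited in claims 2 and 3 may be sketched as follows. This is a minimal sketch and is not drawn from Nordquist, Vaz, or the application; the names ParentWork, release_on, and child_may_start are hypothetical, and selecting the signal type per parent through a string field is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ParentWork:
    """Hypothetical parent set of work made up of SIMD groups."""
    total_groups: int
    release_on: str  # "early" or "late": which signal releases dependents
    launched: int = 0
    completed: int = 0

    def launch_group(self):
        if self.launched < self.total_groups:
            self.launched += 1

    def complete_group(self):
        # A group can complete only after it has been launched.
        if self.completed < self.launched:
            self.completed += 1

    @property
    def early_release(self):
        # Early release: every SIMD group has been launched.
        return self.launched == self.total_groups

    @property
    def late_release(self):
        # Late release: every SIMD group has completed.
        return self.completed == self.total_groups

    def releases_child(self):
        return self.early_release if self.release_on == "early" else self.late_release


def child_may_start(parents):
    # The dependent work starts once each parent has issued its required signal.
    return all(p.releases_child() for p in parents)
```

Under these assumptions, a dependent set of work with one parent of each type starts only after the first parent has launched all of its groups and the second parent has completed all of its groups.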
As per claim 10, it is a medium claim with no further limitations beyond those rejected above. Therefore, it is rejected for the same reasons.
As per claims 16-18, they are method claims with no further limitations beyond those rejected above. Therefore, they are rejected for the same reasons.
Claims 4, 5, 11-13, and 19 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Smith and Vaz, as applied to claim 1 above, and further in view of Khullar (U.S. Pat. Pub. No. 2021/0311727 A1).
As per claim 4, Smith and Vaz do not teach that the control circuitry is configured to enforce indicated hard dependencies for which a parent set of work must complete before initiating processing for a child set of work and indicated soft dependencies for which processing may be initiated for a child set of work based on a release signal from a parent set of work, prior to completion of the parent set of work.
However, Khullar teaches that the control circuitry is configured to enforce indicated hard dependencies for which a parent set of work must complete before initiating processing for a child set of work and indicated soft dependencies for which processing may be initiated for a child set of work based on a release signal from a parent set of work, prior to completion of the parent set of work (Paragraph [0005]).
It would have been obvious to one of ordinary skill in the art at the time of the filing of the application to combine the teachings of Khullar with those of Smith and Vaz in order to allow for Smith’s and Vaz’s apparatus to provide greater flexibility when dealing with dependent work, which could increase the efficiency of the apparatus, thereby potentially increasing buy-in among prospective users.
As per claim 5, Khullar teaches that the control circuitry is configured to track both hard and soft dependencies using dependency matrix circuitry (Paragraph [0049]).
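For illustration only, tracking hard and soft dependencies in a single matrix, as recited in claims 4 and 5, may be sketched as follows. This is a minimal software sketch and does not represent Khullar's circuitry or the claimed dependency matrix circuitry; the names Dep, DependencyMatrix, add, release, complete, and may_start are hypothetical, and treating completion as also satisfying a soft dependency is an assumption.

```python
from enum import IntEnum


class Dep(IntEnum):
    NONE = 0
    SOFT = 1  # child may start on a release signal from the parent
    HARD = 2  # child may start only after the parent completes


class DependencyMatrix:
    """Hypothetical matrix: deps[child][parent] holds the dependency kind."""

    def __init__(self, n):
        self.deps = [[Dep.NONE] * n for _ in range(n)]
        self.released = [False] * n
        self.completed = [False] * n

    def add(self, child, parent, kind):
        self.deps[child][parent] = kind

    def release(self, parent):
        self.released[parent] = True

    def complete(self, parent):
        # Assumption: a completed parent also counts as released.
        self.released[parent] = True
        self.completed[parent] = True

    def may_start(self, child):
        for parent, kind in enumerate(self.deps[child]):
            if kind == Dep.HARD and not self.completed[parent]:
                return False
            if kind == Dep.SOFT and not self.released[parent]:
                return False
        return True
```

In this sketch a release signal alone never satisfies a hard dependency, which is the distinction claim 4 draws between the two dependency types.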
As per claims 11-13, they are medium claims with no further limitations beyond those rejected above. Therefore, they are rejected for the same reasons.
As per claim 19, it is a method claim with no further limitations beyond those rejected above. Therefore, it is rejected for the same reasons.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Gregory Kessler whose telephone number is (571)270-7762. The examiner can normally be reached M-Th 8:30 - 5, Alternate Fridays 8:30-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bradley Teets, can be reached at (571)272-3338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GREGORY A KESSLER/
Primary Examiner, Art Unit 2197