Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office Action is in response to the amendment filed 03/09/2026.
Claims 1-22 are pending in this application.
Claims 1, 9 and 16 are independent claims.
Claims 1 and 9-16 are currently amended.
This Office Action is made final.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 9-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
As to claim 9, it recites “at least one computer-readable medium having stored thereon”. Paragraph 471 of the specification states: “The computer-readable medium may include, but is not limited to, magnetic disks, optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or other type of computer-readable medium suitable for storing electronic instructions. Moreover, embodiments may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer”. Because the specification does not exclude a transitory “signal” that stores computer-readable code for a relatively short amount of time, the broadest reasonable interpretation in light of the specification encompasses a computer-readable medium that is a signal per se. Thus, the claim is not directed to eligible subject matter.
Claims 10-15 are rejected for being dependent on claim 9 and for the same reasons mentioned above.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 6, 7, 9-11, 14-18, 21 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Valerio (US 2021/0263785 A1) in view of Suzuki (US 2014/0082637 A1).
As per claim 1, Valerio teaches an apparatus comprising:
processing circuitry comprising graphics processing circuitry having a plurality of processing resources; (Valerio [0058] FIG. 2C illustrates a graphics processing unit (GPU) 239 that includes dedicated sets of graphics processing resources arranged into multi-core groups 240A-240N. While the details of only a single multi-core group 240A are provided, it will be appreciated that the other multi-core groups 240B-240N [plurality of processing resources] may be equipped with the same or similar sets of graphics processing resources.)
memory for storage of data including data for graphics processing; (Valerio [0233] System memory 1908 may be made available to other components within the computing device 1900. For example, any data (e.g., input graphics data) received from various interfaces to the computing device 1900 (e.g., keyboard and mouse, printer port, Local Area Network (LAN) port, modem port, etc.) or retrieved from an internal storage element of the computer device 1900 (e.g., hard disk drive) are often temporarily queued into system memory 1908 prior to being operated upon by the one or more processor(s) in the implementation of a software program.).
wherein the graphics processor is to:
receive a request for establishment of a local team barrier for a thread team, the thread team being allocated to a first processing resource, the thread team including a plurality of threads; (Valerio [0259] Some embodiments pertain to Example 1 that includes an apparatus to facilitate thread barrier synchronization, comprising a plurality of processing resources to execute a plurality of execution threads included in a thread workgroup [thread team] and barrier synchronization hardware to assign a first named barrier to a first set of the plurality of execution threads in the thread workgroup, assign a second named barrier to a second set of the plurality of execution threads in the thread workgroup, synchronize execution of the first set of execution threads via the first named barrier and synchronize execution of the second set of execution threads via the second named barrier. See also Fig. 23, which shows the barriers being created (open_named_barrier) and the waiting process through to the barriers being closed (bottom of Fig. 23).)
Valerio discloses more than one thread barrier. For purposes of examination, only one thread barrier is considered.
determine requirements and designated threads for the local team barrier; (Valerio [0249] To implement named barriers, barrier synchronization mechanism 2130 receives a global name and maps the global name to a name that is local (or local name) [local barrier] to a sub-slice 2005 that is to be used as a named barrier. [0250] In a further embodiment, barrier synchronization mechanism 2130 causes each thread of a named barrier to open and acquire (or assigned) a handle, which enables gateway 2150 to register physical thread identifiers (or IDs) as part of the named barrier. In this embodiment, the first thread may set the global state of the barrier, while numPthreads and numCthreads set the production and consumption counters 2157 [requirements based on number of threads] within gateway 2150. [0255] FIG. 22 is a flow diagram illustrating one embodiment of a process for performing a barrier synchronization process. At processing block 2210, threads in a dispatched thread group are assigned to named barriers (e.g., barrier A and barrier B). At processing block 2220, the threads in the sub-group are executed)
establish the local team barrier in a local register of the first processing resource based at least in part on the requirements and designated threads for the local barrier. (Valerio [0252] According to one embodiment, a gateway counter associated with the named barrier is incremented when a thread transmits a signal to a named barrier. Additionally, production/consumption thread counters are incremented according to a flag set in the API. In a further embodiment, a thread is either waiting for production to be completed or consumption to be completed whenever the thread invokes a wait for named barrier. In such an embodiment, an EU 2110 [processing resource] register tracks notifications of multiple named barriers and makes a thread ready once it a notification is received for this named barrier from gateway 2150. Once a named barrier is closed, gateway counters 2157 and EU 2110 notification registers for this named barrier are reset so that the next workgroup can use the barrier. [0253] EU 2110 hardware may map each named barrier to a bit location in this register).
Valerio does not teach wherein one or more threads of the thread team is executed in a time sliced manner using the first processing resource.
However, Suzuki teaches wherein one or more threads of the thread team is executed in a time sliced manner using the first processing resource (Suzuki [0067] The scheduler unit 59 controls the switching of the threads when one processor executes the plural threads, switching the threads by the time slice execution. The time slice execution can cause even one processor to appear to be plural pseudo terminals operating concurrently with each other. [0096] The receiving unit 58 checks whether any thread currently waiting for the barrier process is present. If the receiving unit 58 determines that no thread currently waiting for the barrier process is present (step S52: NO), the receiving unit 58 ends the series of process steps of the operation executed when the process result is received. [0097] On the other hand, if the receiving unit 58 determines that a thread currently waiting for the barrier process is present (step S52: YES), the receiving unit 58 acquires from the memory 54, the process information 61 corresponding to the ID (identification information) of the thread currently waiting for the barrier process (step S53) and acquires from the memory 54, the process information 61 of the subsequent process of the process indicated by the process information 61 acquired at step S53 (step S54). For example, in the example depicted in FIG. 2, if the thread currently executing the process 1-1 is waiting for the barrier process, the receiving unit 58 acquires from the memory 54, the process information 61 of the process 2-1 that is the subsequent process of the process 1-1.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Suzuki with the system of Valerio to execute threads in a time-sliced manner. One having ordinary skill in the art would have been motivated to incorporate Suzuki into the system of Valerio for the purpose of executing processes in a sequence (Suzuki paragraph 33).
As per claim 2, Valerio teaches wherein the local team barrier includes one or more threads designated as signalers to signal a barrier state and one or more threads designated as waiters to wait for a barrier state. (Valerio [0250] In this embodiment, the first thread may set the global state of the barrier, while numPthreads and numCthreads set the production and consumption counters 2157 within gateway 2150. Subsequently, producer and consumer threads use the same named barrier to signal to (or wait for) wait for each other. In one embodiment, a producer thread first signals the availability of a resource using the named barrier, while the consumer thread waits for the signal from producer).
As per claim 3, Valerio teaches wherein determining requirements for the local team barrier includes determining whether the local team barrier is a one-to-many, many-to-one, or many-to-many barrier. (Valerio Fig 23 open_named_barrier ( bufO_barr, numPThreads=16,numCThreads=16) [0250] In a further embodiment, barrier synchronization mechanism 2130 causes each thread of a named barrier to open and acquire (or assigned) a handle, which enables gateway 2150 to register physical thread identifiers (or IDs) as part of the named barrier. In this embodiment, the first thread may set the global state of the barrier, while numPthreads and numCthreads set the production and consumption counters 2157 within gateway 2150. Subsequently, producer and consumer threads use the same named barrier to signal to (or wait for) wait for each other. In one embodiment, a producer thread first signals the availability of a resource using the named barrier, while the consumer thread waits for the signal from producer.)
The examiner interprets these scenarios, and the associated figures in the specification (Figs. 30-32), as corresponding to the number of producer and consumer threads specified when the thread barrier is created. Valerio shows this, as cited above.
As per claim 6, Valerio teaches wherein establishing the local barrier includes establishing the local barrier in one of a plurality of slots for local team barriers in the local register of the first processing resource. (Valerio [0250] In a further embodiment, barrier synchronization mechanism 2130 causes each thread of a named barrier to open and acquire (or assigned) a handle, which enables gateway 2150 to register physical thread identifiers (or IDs) as part of the named barrier. In this embodiment, the first thread may set the global state of the barrier, while numPthreads and numCthreads set the production and consumption counters 2157 within gateway 2150. Subsequently, producer and consumer threads use the same named barrier to signal to (or wait for) wait for each other. In one embodiment, a producer thread first signals the availability of a resource using the named barrier, while the consumer thread waits for the signal from producer. [0252] According to one embodiment, a gateway counter associated with the named barrier is incremented when a thread transmits a signal to a named barrier. Additionally, production/consumption thread counters are incremented according to a flag set in the API. In a further embodiment, a thread is either waiting for production to be completed or consumption to be completed whenever the thread invokes a wait for named barrier. In such an embodiment, an EU 2110 register tracks notifications of multiple named barriers and makes a thread ready once it a notification is received for this named barrier from gateway 2150. Once a named barrier is closed, gateway counters 2157 and EU 2110 notification registers for this named barrier are reset so that the next workgroup can use the barrier).
As per claim 7, Valerio teaches wherein establishing the local team barrier includes setting one or more of a plurality of scoreboard bits to set barrier states and setting one or more of a plurality of mask bits to enable or disable the scoreboard bits. (Valerio [0253] According to one embodiment, EUs 2110 implement a wait mechanism to wait on a particular named barrier. As discussed above, an architectural register (e.g., 32 bits) is used to track notification of multiple named barriers per thread. EU 2110 hardware may map each named barrier to a bit location in this register. Accordingly, a bit mask is implemented to track which named barrier has arrived. In one embodiment, a wait(n0.1) state waits for gateway 2150 notification of a named barrier associated with bit position 1 of n0, while a wait(null) state waits for gateway 2150 notification of any barrier associated with this thread).
As to claims 9 and 16, they are rejected for the same reasons as claim 1.
As to claims 10 and 17, they are rejected for the same reasons as claim 2.
As to claims 11 and 18, they are rejected for the same reasons as claim 3.
As to claims 14 and 21, they are rejected for the same reasons as claim 6.
As to claims 15 and 22, they are rejected for the same reasons as claim 7.
With respect to claim 9 (and dependent claims 10-15), paragraph 241 of Valerio mentions a computer-readable storage medium.
Claims 4, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Valerio (US 2021/0263785 A1) in view of Suzuki (US 2014/0082637 A1) in further view of Deng (US 2021/0089580 A1).
As per claim 4, Valerio and Suzuki do not teach wherein determining requirements for the local team barrier includes determining which of the threads of the thread team is designated as a main thread for the local team barrier.
However, Deng teaches wherein determining requirements for the local team barrier includes determining which of the threads of the thread team is designated as a main thread for the local team barrier. (Deng [0039] Multi-threaded barriers are set on the main thread to wait for returning of search entities of each thread, and it is determined that the search for the current layer is completed after all threads return, and the main thread is allowed to pass the multi-threaded barriers).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Deng with the system of Valerio and Suzuki to designate a main thread. One having ordinary skill in the art would have been motivated to incorporate Deng into the system of Valerio and Suzuki for the purpose of searching a graph in a multi-threaded asynchronous manner (Deng paragraph 38).
The examiner believes this is consistent with what is disclosed in the specification ([0410] FIG. 31 illustrates a one-to-many team barrier 3100, which in this illustration may include a single signaler (Team Local ID 0 (designated as the main thread) in this example, but any of the local team threads may be the waiter).)
As to claims 12 and 19, they are rejected for the same reasons as claim 4.
Claims 5, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Valerio (US 2021/0263785 A1) in view of Suzuki (US 2014/0082637 A1) in further view of Kottapalli (US 2014/0337857 A1).
As per claim 5, Valerio and Suzuki do not teach wherein the graphics processor is further to engage the local team barrier in execution of an application, the local team barrier to provide synchronization of the threads of the thread team.
However, Kottapalli teaches wherein the graphics processor is further to engage the local team barrier in execution of an application, the local team barrier to provide synchronization of the threads of the thread team. (Kottapalli [Abstract] a first processor core; a second processor core coupled to the first processor core; a shared cache coupled to the first processor core and the second processor core to store a plurality of thread synchronization variables; and logic, responsive to a fetchset instruction comprising a single instruction of an instruction set architecture, to overwrite a thread synchronization variable stored in the shared cache while a cache line including the plurality of thread synchronization variables is prevented from being copied into either of the first and second processor cores when a corresponding one of a plurality of threads reaches a barrier, wherein each of the plurality of thread synchronization variables is to indicate a synchronization status for one of the plurality of threads at the barrier)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Kottapalli with the system of Valerio and Suzuki to provide synchronization of the threads of the thread team. One having ordinary skill in the art would have been motivated to incorporate Kottapalli into the system of Valerio and Suzuki for the purpose of providing improved manners of synchronization between multiple threads (Kottapalli paragraph 08).
As to claims 13 and 20, they are rejected for the same reasons as claim 5.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Valerio (US 2021/0263785 A1) in view of Suzuki (US 2014/0082637 A1) in further view of Gupta (US 2015/0187042 A1).
As per claim 8, Valerio and Suzuki do not teach wherein the thread team is a sub-portion of a thread group, the thread group including a plurality of hardware threads to be executed by the plurality of processing resources.
However, Gupta teaches wherein the thread team is a sub-portion of a thread group, the thread group including a plurality of hardware threads to be executed by the plurality of processing resources. (Gupta Fig 1 and Fig 2 Block 204 (Receiving, at a plurality of processing elements, a plurality of threads that form one or more thread groups) [0026] At 200, the method receives, at a graphics processor, a workload from a host processor. The workload may be partitioned into a plurality of threads. At 202, receiving, at a plurality of processing elements, a plurality of thread that form one or more thread/work groups. The plurality of processing elements may be arranged into one or more local thread/work groups. The workload may be executed across the one or more local thread groups. At 206, synchronizing, at a global barrier in communication with the plurality of processing elements, the processing of the workload across the one or more thread groups. The one or more thread groups may form a global thread group. As discussed above, the global barrier allows the workload to be executed concurrently across the one or more thread groups thereby improving system efficiency.)
The examiner is interpreting this according to what is disclosed in the specification ([0378] In some embodiments, a thread team may be a sub-portion of a thread group 2850, with a thread group comprising a certain number of thread teams. For example, thread group 2850 may include thread team 2855 together with thread teams 2851, 2852, 2853, 2854, etc., where each thread team may be allocated to a particular processing resource 2830-2835).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Gupta with the system of Valerio and Suzuki to provide a thread team. One having ordinary skill in the art would have been motivated to incorporate Gupta into the system of Valerio and Suzuki for the purpose of ensuring that all thread groups have completed before a task that requires data from multiple thread groups proceeds (Gupta paragraph 07).
Response to Arguments
Applicant's arguments filed on 03/09/2026 have been fully considered but they are not persuasive.
Applicant’s arguments with respect to claims 1, 9 and 16 have been considered but are moot because they do not apply to the new ground of rejection, which relies on the newly cited Suzuki reference.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MEHRAN KAMRAN whose telephone number is (571)272-3401. The examiner can normally be reached on 9-5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, April Blair, can be reached at (571)270-1014. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MEHRAN KAMRAN/
Primary Examiner, Art Unit 2196