Prosecution Insights
Last updated: April 19, 2026
Application No. 18/792,866

SCHEDULING OF THREADS FOR EXECUTION UTILIZING LOAD BALANCING OF THREAD GROUPS

Non-Final OA: §103 and nonstatutory double patenting (DP)
Filed: Aug 02, 2024
Examiner: HSU, JONI
Art Unit: 2611
Tech Center: 2600 — Communications
Assignee: Intel Corporation
OA Round: 1 (Non-Final)

Grant Probability: 87% (Favorable)
Predicted OA Rounds: 1-2
Estimated Time to Grant: 2y 9m
Grant Probability With Interview: 95%

Examiner Intelligence

Career Allow Rate: 87% (above average); 741 granted / 848 resolved; +25.4% vs TC avg
Interview Lift: +7.2% (moderate) among resolved cases with interview
Typical Timeline: 2y 9m avg prosecution; 34 applications currently pending
Career History: 882 total applications across all art units

Statute-Specific Performance

§101: 8.4% (-31.6% vs TC avg)
§103: 59.7% (+19.7% vs TC avg)
§102: 11.4% (-28.6% vs TC avg)
§112: 3.1% (-36.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 848 resolved cases.

Office Action

Grounds: §103; nonstatutory double patenting
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on October 24, 2024 was filed after the filing date of the application on August 2, 2024. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement.
See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 21, 22, 24, 25, 27-29, 31, 32, 34-36, 38, and 40 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 5-8, 10, 12, and 13 of U.S. Patent No. 12,086,602 in view of Rao (US 20150187040A1).

As per Claim 21, patent Claims 1, 2, and 7 essentially cover the limitations of Claim 21, as shown in the table below.
However, the patent claims do not expressly recite: a cache memory to store data; receive an application kernel for the scheduling of thread groups; the barrier usage data is for the application kernel. However, Rao teaches a cache memory to store data [0036], and receiving an application kernel for the scheduling of thread groups, where the barrier usage data is for the application kernel ([0028, 0030], Fig. 2; [0025, 0029]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the patent claims to include a cache memory to store data, receiving an application kernel for the scheduling of thread groups, and the barrier usage data being for the application kernel, as suggested by Rao. It is well-known in the art that a cache memory speeds up performance by avoiding slow data retrieval from primary sources. It is well-known in the art that an application kernel acts as a crucial intermediary between user software and computer hardware, managing resources and ensuring stability and security through privileged access and system calls.

As per Claims 22, 24, and 25, these claims are covered by the limitations of patent Claims 1, 5, and 6 respectively. As per Claim 27, Claim 27 is covered by the limitations of patent Claim 3. As per Claims 28, 29, 31, 32, and 34, these claims are similar in scope to Claims 21, 22, 24, 25, and 27 respectively, and therefore are rejected under the same rationale. As per Claims 35, 36, 38, and 40, these claims are similar in scope to Claims 21, 22, 24, and 27 respectively, and therefore are rejected under the same rationale.

Claims 21, 23, 24, 28, 30, 31, 35, 37, and 38 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 5 of U.S. Patent No. 11,768,687 in view of Rao (US 20150187040A1).

As per Claim 21, patent Claim 1 essentially covers the limitations of Claim 21, as shown in the table below.
However, the patent claims do not expressly recite a cache memory to store data. However, Rao teaches a cache memory to store data [0036]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the patent claims to include a cache memory to store data as suggested by Rao. It is well-known in the art that a cache memory speeds up performance by avoiding slow data retrieval from primary sources. As per Claims 23-24, these claims are each covered by the limitations of patent Claim 1. As per Claims 28, 30, and 31, these claims are similar in scope to Claims 21, 23, and 24 respectively, and therefore are rejected under the same rationale. As per Claims 35, 37, and 38, these claims are similar in scope to Claims 21, 23, and 24 respectively, and therefore are rejected under the same rationale. Claims 21, 23-25, 28, 30-32, 35, 37, and 38 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 3, 5, 8, 10, 11, and 13 of U.S. Patent No. 11,397,585 in view of Rao (US 20150187040A1). As per Claim 21, patent Claim 1 essentially covers the limitations of Claim 21, as shown in the table below. However, the patent claims do not expressly recite a cache memory to store data. However, Rao teaches a cache memory to store data [0036]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the patent claims to include a cache memory to store data as suggested by Rao. It is well-known in the art that a cache memory speeds up performance by avoiding slow data retrieval from primary sources. As per Claims 23-25, these claims are covered by the limitations of patent Claims 2, 3, and 5 respectively. As per Claims 28 and 30-32, these claims are similar in scope to Claims 21 and 23-25 respectively, and therefore are rejected under the same rationale. 
As per Claims 35, 37, and 38, these claims are similar in scope to Claims 21, 23, and 24 respectively, and therefore are rejected under the same rationale.

Claims 21, 22, 24, 25, 27-29, 31, 32, 34-36, 38, and 40 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 7, and 8 of U.S. Patent No. 10,922,085 in view of Rao (US 20150187040A1).

As per Claim 21, patent Claim 1 essentially covers the limitations of Claim 21, as shown in the table below. However, the patent claims do not expressly recite a cache memory to store data. However, Rao teaches a cache memory to store data [0036]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the patent claims to include a cache memory to store data as suggested by Rao. It is well-known in the art that a cache memory speeds up performance by avoiding slow data retrieval from primary sources.

As per Claim 22, Claim 22 is covered by the limitations of patent Claim 2. As per Claims 24-25, these claims are each covered by the limitations of patent Claim 1. As per Claim 27, Claim 27 is covered by the limitations of patent Claim 2. As per Claims 28, 29, 31, 32, and 34, these claims are similar in scope to Claims 21, 22, 24, 25, and 27 respectively, and therefore are rejected under the same rationale. As per Claims 35, 36, 38, and 40, these claims are similar in scope to Claims 21, 22, 24, and 27 respectively, and therefore are rejected under the same rationale.

Claims 21, 23-25, 28, 30-32, 35, 37, and 38 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-4, 7, and 9-11 of U.S. Patent No. 10,599,438 in view of Rao (US 20150187040A1).

As per Claim 21, patent Claim 1 essentially covers the limitations of Claim 21, as shown in the table below. However, the patent claims do not expressly recite a cache memory to store data.
However, Rao teaches a cache memory to store data [0036]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the patent claims to include a cache memory to store data as suggested by Rao. It is well-known in the art that a cache memory speeds up performance by avoiding slow data retrieval from primary sources.

As per Claims 23-25, these claims are covered by the limitations of patent Claims 2-4 respectively. As per Claims 28 and 30-32, these claims are similar in scope to Claims 21 and 23-25 respectively, and therefore are rejected under the same rationale. As per Claims 35, 37, and 38, these claims are similar in scope to Claims 21, 23, and 24 respectively, and therefore are rejected under the same rationale.

Claims 21, 23-25, 28, 30-32, 35, 37, and 38 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 5, 8, and 10-12 of U.S. Patent No. 10,310,861 in view of Rao (US 20150187040A1).

As per Claim 21, patent Claim 1 essentially covers the limitations of Claim 21, as shown in the table below. However, the patent claims do not expressly recite a cache memory to store data. However, Rao teaches a cache memory to store data [0036]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the patent claims to include a cache memory to store data as suggested by Rao. It is well-known in the art that a cache memory speeds up performance by avoiding slow data retrieval from primary sources.

As per Claims 23-25, these claims are covered by the limitations of patent Claims 2, 3, and 5 respectively. As per Claims 28 and 30-32, these claims are similar in scope to Claims 21 and 23-25 respectively, and therefore are rejected under the same rationale.
As per Claims 35, 37, and 38, these claims are similar in scope to Claims 21, 23, and 24 respectively, and therefore are rejected under the same rationale.

Claim correspondence (application 18/792,866 claims mapped to patent claims; a blank cell means that claim is not rejected over that patent):

| 18/792,866 | 21 | 22 | 23 | 24 | 25 | 27 |
|---|---|---|---|---|---|---|
| 12,086,602 | 1, 2, 7 | 1 | | 5 | 6 | 3 |
| 11,768,687 | 1 | | 1 | 1 | | |
| 11,397,585 | 1 | | 2 | 3 | 5 | |
| 10,922,085 | 1 | 2 | | 1 | 1 | 2 |
| 10,599,438 | 1 | | 2 | 3 | 4 | |
| 10,310,861 | 1 | | 2 | 3 | 5 | |

| 18/792,866 | 28 | 29 | 30 | 31 | 32 | 34 |
|---|---|---|---|---|---|---|
| 12,086,602 | 1, 2, 7 | 8 | | 12 | 13 | 10 |
| 11,768,687 | 5 | | 5 | 5 | | |
| 11,397,585 | 8 | | 10 | 11 | 13 | |
| 10,922,085 | 7 | 8 | | 7 | 7 | 8 |
| 10,599,438 | 7 | | 9 | 10 | 11 | |
| 10,310,861 | 8 | | 10 | 11 | 12 | |

| 18/792,866 | 35 | 36 | 37 | 38 | 40 |
|---|---|---|---|---|---|
| 12,086,602 | 1, 2, 7 | 1 | | 5 | 3 |
| 11,768,687 | 1 | | 1 | 1 | |
| 11,397,585 | 1 | | 2 | 3 | |
| 10,922,085 | 1 | 2 | | 1 | 2 |
| 10,599,438 | 1 | | 2 | 3 | |
| 10,310,861 | 1 | | 2 | 3 | |

Limitation mapping for Claim 21:

| 18/792,866 (Claim 21) | 12,086,602 (Claims 1, 2, and 7) |
|---|---|
| A graphics processor comprising: | A processor comprising: (Claim 1); the processor is a graphics processor (Claim 7) |
| a plurality of graphics cores; | a plurality of multiprocessors, each multiprocessor including a plurality of cores (Claim 1) |
| a cache memory to store data; and | Taught by Rao |
| scheduling circuitry to schedule thread groups for execution; wherein the scheduling circuitry is to: | scheduling hardware to schedule the multiple threads…schedule the threads of the plurality of thread groups (Claim 1) |
| receive an application kernel for scheduling of thread groups; | Taught by Rao |
| identify barrier usage data for the application kernel, the barrier usage data representing a magnitude of barrier messages in the application kernel; and | determine the magnitude of barrier messages used in each of the plurality of thread groups (Claim 2); the application kernel taught by Rao |
| schedule a plurality of thread groups for execution by one or more processors based at least in part on the barrier usage data. | determine barrier weight values based on the determined magnitude of barriers used…schedule the threads of the plurality of thread groups to the plurality of multiprocessors based at least in part on the determined barrier weight values. (Claim 1) |

| 18/792,866 (Claim 21) | 11,768,687 (Claim 1) |
|---|---|
| A graphics processor comprising: | a graphics processor, including |
| a plurality of graphics cores; | a plurality of multiprocessors; |
| a cache memory to store data; and | Taught by Rao |
| scheduling circuitry to schedule thread groups for execution; wherein the scheduling circuitry is to: | a scheduler to schedule the plurality of threads…the scheduler is to prioritize scheduling threads of a thread group |
| receive an application kernel for scheduling of thread groups; | scheduling threads of a thread group based upon usage of barriers by the application kernel |
| identify barrier usage data for the application kernel, the barrier usage data representing a magnitude of barrier messages in the application kernel; and | determine a magnitude of barrier messages in the application kernel |
| schedule a plurality of thread groups for execution by one or more processors based at least in part on the barrier usage data. | scheduling threads of a thread group based upon usage of barriers…including, upon a determination that the identified barrier usage data indicates a low usage of barriers…prioritizing the scheduling of threads utilizing the load balancing of the plurality of threads across the plurality of multiprocessors. |

| 18/792,866 (Claim 21) | 11,397,585 (Claim 1) |
|---|---|
| A graphics processor comprising: | graphics processor includes: |
| a plurality of graphics cores; | a plurality of workgroup processors |
| a cache memory to store data; and | Taught by Rao |
| scheduling circuitry to schedule thread groups for execution; wherein the scheduling circuitry is to: | each wavefront including a plurality of threads, and a scheduler to schedule a plurality of wavefronts for execution…wherein the scheduler is to |
| receive an application kernel for scheduling of thread groups; identify barrier usage data for the application kernel, the barrier usage data representing a magnitude of barrier messages in the application kernel; and | analyze an application kernel to determine a magnitude of barrier messages in the application kernel and generate barrier usage data having a value corresponding to the magnitude of barrier messages in the application kernel; |
| schedule a plurality of thread groups for execution by one or more processors based at least in part on the barrier usage data. | schedule a plurality of wavefronts for execution by the plurality of workgroup processors according to a scheduling policy, the scheduling policy being based at least in part on the barrier usage data |

| 18/792,866 (Claim 21) | 10,922,085 (Claim 1) |
|---|---|
| A graphics processor comprising: | graphics processor includes: |
| a plurality of graphics cores; | a plurality of streaming multiprocessors, each streaming multiprocessor including a plurality of cores |
| a cache memory to store data; and | Taught by Rao |
| scheduling circuitry to schedule thread groups for execution; wherein the scheduling circuitry is to: | a scheduler to schedule a plurality of thread groups for execution…wherein the scheduler is to |
| receive an application kernel for scheduling of thread groups; identify barrier usage data for the application kernel, the barrier usage data representing a magnitude of barrier messages in the application kernel; and | analyze an application kernel to determine a magnitude of barrier messages in the application kernel and generate barrier usage data having a value corresponding to the magnitude of barrier messages in the application kernel; |
| schedule a plurality of thread groups for execution by one or more processors based at least in part on the barrier usage data. | schedule a plurality of thread groups for execution by the plurality of streaming multiprocessors according to a scheduling policy, the scheduling policy being based at least in part on load balancing of thread groups…wherein the scheduler is to perform load balancing by performing one or more cost functions based at least in part on…the barrier usage data |

| 18/792,866 (Claim 21) | 10,599,438 (Claim 1) |
|---|---|
| A graphics processor comprising: | graphics processor includes: |
| a plurality of graphics cores; | a plurality of streaming multiprocessors, each streaming multiprocessor including a plurality of cores |
| a cache memory to store data; and | Taught by Rao |
| scheduling circuitry to schedule thread groups for execution; wherein the scheduling circuitry is to: | a scheduler to schedule a plurality of thread groups for execution…wherein the scheduler is to |
| receive an application kernel for scheduling of thread groups; identify barrier usage data for the application kernel, the barrier usage data representing a magnitude of barrier messages in the application kernel; and | analyze an application kernel to determine a magnitude of barrier messages in the application kernel and generate barrier usage data having a value corresponding to the magnitude of barrier messages in the application kernel; |
| schedule a plurality of thread groups for execution by one or more processors based at least in part on the barrier usage data. | schedule a plurality of thread groups for execution by the plurality of streaming multiprocessors according to a scheduling policy, the scheduling policy being based at least in part on the barrier usage data |

| 18/792,866 (Claim 21) | 10,310,861 (Claim 1) |
|---|---|
| A graphics processor comprising: | a graphics processor, including: |
| a plurality of graphics cores; | a plurality of multiprocessors; |
| a cache memory to store data; and | Taught by Rao |
| scheduling circuitry to schedule thread groups for execution; wherein the scheduling circuitry is to: | scheduler is to prioritize scheduling threads of a thread group |
| receive an application kernel for scheduling of thread groups; identify barrier usage data for the application kernel, the barrier usage data representing a magnitude of barrier messages in the application kernel; and | analyze an application kernel to determine a magnitude of barrier messages in the application kernel and generate barrier usage data having a value corresponding to the magnitude of barrier messages in the application kernel; |
| schedule a plurality of thread groups for execution by one or more processors based at least in part on the barrier usage data. | prioritize scheduling threads of a thread group to a same microprocessor of the plurality of multiprocessors upon a determination that the barrier usage data indicates a high magnitude of barrier messages in the thread group. |

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

    A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 21, 28, and 35 is/are rejected under 35 U.S.C. 103 as being unpatentable over Rao (US 20150187040A1) in view of Mei (US009652284B2).
As per Claim 21, Rao teaches a graphics processor (108) comprising: a plurality of graphics cores (110) (GPU 108 includes a plurality of execution units 110, [0020]); a cache memory to store data (DRAM cache hits, [0036]); and scheduling circuitry to schedule thread groups for execution, wherein the scheduling circuitry (select scheduling attributes by selecting aspects of thread scheduling and dispatch, [0012]) is to: receive an application kernel for scheduling of thread groups; identify barrier usage data for the application kernel, the barrier usage data representing a magnitude of barriers in the application kernel; and schedule a plurality of thread groups for execution by one or more processors based at least in part on the barrier usage data.

(At block 208, it is determined if a barrier is used in the kernel; if a barrier is used in the kernel, process flow continues to block 210. At block 210, the number of work groups available based on the number of barriers within the computing system, WGSBmin, is computed as: WGSBmin = (Threads_per_slice * SIMD_size) / Barriers_per_slice, where Barriers_per_slice is the number of barriers, [0028, 0030], Fig. 2. Any of the functionalities of the CPU 102 may be implemented in a processor; the functionality may be implemented in logic implemented in a specialized graphics processing unit, or in any other device, [0025]. A local work group may be processed using several execution units, and a barrier may be used to synchronize processing of a local work group across multiple execution units. In the event there are not enough barriers to synchronize the processing of each local work group within the global work group, the scheduler will stall dispatching new threads until there is enough available resource; as a result, a local work group size is selected so that a minimum number of barriers are used, [0029].)

However, Rao does not teach the barrier usage data representing a magnitude of barrier messages.
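For context, the WGSBmin computation quoted from Rao ¶[0030] can be sketched in a few lines of Python. This is a minimal sketch only: the function name, the integer-division choice, and the example parameter values are illustrative assumptions, not taken from Rao.

```python
def wgsb_min(threads_per_slice: int, simd_size: int, barriers_per_slice: int) -> int:
    """Sketch of Rao's formula for the number of work groups available
    given the barriers in the system:
        WGSBmin = (Threads_per_slice * SIMD_size) / Barriers_per_slice
    Integer division is an assumption here, on the reasoning that a
    scheduler deals in whole work groups."""
    if barriers_per_slice == 0:
        # Per Rao's Fig. 2, the barrier-based bound only applies when
        # the kernel actually uses barriers (block 208).
        raise ValueError("kernel uses no barriers; formula does not apply")
    return (threads_per_slice * simd_size) // barriers_per_slice

# Illustrative (made-up) hardware parameters:
print(wgsb_min(threads_per_slice=56, simd_size=8, barriers_per_slice=16))  # 28
```

The point of the bound is visible in the sketch: fewer available barriers per slice shrinks the number of work groups that can be in flight, which is why Rao selects a local work group size that minimizes barrier usage.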
However, Mei teaches the barrier usage data representing a magnitude of barrier messages (compiler 18 may insert calls to the divergence barrier instruction into the code of kernel 20; compiler 18 may analyze kernel 20 and determine at least one of a location in the program where divergence is likely to occur and a location that would significantly impact performance, and may insert divergence barrier instructions at at least one of those locations; compiler 18 may insert divergence barrier instructions into the instructions of kernel 20 at run-time at at least one of a location where thread divergence is likely to occur and a location that would significantly impact performance, col. 9, line 60 - col. 10, line 5).

Since Rao teaches the barrier usage data representing a magnitude of barriers [0028, 0030] (Fig. 2), this teaching of barrier messages from Mei can be implemented into the device of Rao so that the barrier usage data represents a magnitude of barrier messages. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Rao to include the barrier usage data representing a magnitude of barrier messages because Mei suggests that barrier messages that are barrier instructions are needed so that the processor can execute the barrier instructions (col. 9, line 60 - col. 10, line 5).

27. As per Claim 28, Claim 28 is similar in scope to Claim 21, and therefore is rejected under the same rationale.

28. As per Claim 35, Claim 35 is similar in scope to Claim 21, except that Claim 35 is directed to an apparatus comprising: a memory for storage of data for processing; and a plurality of processors including the graphics processor of Claim 21.
Rao teaches an apparatus comprising: a memory for storage of data for processing (process the local dataset within the local memory, [0036]); and a plurality of processors including the graphics processor (computing device 100 may include a central processing unit (CPU) 102, [0019]; computing device 100 may also include a graphics processing unit (GPU) 108, [0020]). Thus, Claim 35 is rejected under the same rationale as Claim 21 along with these additional teachings from Rao.

29. Claim(s) 22, 29, and 36 is/are rejected under 35 U.S.C. 103 as being unpatentable over Rao (US 20150187040A1) and Mei (US009652284B2) in view of Kothari (US006243863B1).

30. As per Claim 22, Rao and Mei are relied upon for the teachings as discussed above relative to Claim 21. Rao teaches wherein scheduling of the plurality of thread groups is based on use of barriers in a thread group, as discussed in the rejection for Claim 21. However, Rao and Mei do not teach wherein scheduling of the plurality of thread groups is further based on a thread policy, the thread policy providing weight values based on use of barriers in a thread group.

However, Kothari teaches that in order to minimize the parallel execution time, load balance and communication are the main issues. Communication cost has three components: contention, latency, and volume. Latency and volume components are reduced by collapsing the sync/exchange points. Typically, the latency is high because of the high message startup cost. Therefore, there is an advantage in collapsing the number of sync/exchange points and aggregating smaller messages into a single large message. Code may have hundreds of sync/exchange points; in practice, these can (and, for efficiency, must) be collapsed into a small number of sync/exchange points (col. 20, lines 1-14). Thus, the latency is high when there are a large number of sync/exchange points.
Thus, when there are a large number of sync/exchange points, they must be collapsed into a small number of sync/exchange points, and the latency is low when there are a small number of sync/exchange points. When there are a small number of sync/exchange points, there is no need to collapse them since the latency is already low.

Thus, it would have been obvious to one of ordinary skill in the art that a weight value is assigned based on the magnitude of sync/exchange points: a high weight value is assigned when there are a large number of sync/exchange points, and a small weight value is assigned when there are a small number of sync/exchange points. When there is a high weight value, the processor knows to collapse the sync/exchange points into a small number of sync/exchange points; when there is a small weight value, the processor knows that the sync/exchange points do not need to be collapsed. Thus, Kothari teaches providing weight values based on use of barriers.

Since Rao teaches wherein scheduling of the plurality of thread groups is based on use of barriers in a thread group, as discussed in the rejection for Claim 21, this teaching from Kothari can be implemented into the device of Rao so that scheduling of the plurality of thread groups is further based on a thread policy, the thread policy providing weight values based on use of barriers in a thread group. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Rao and Mei so that scheduling of the plurality of thread groups is further based on a thread policy, the thread policy providing weight values based on use of barriers in a thread group, because Kothari suggests that this reduces latency (col. 20, lines 1-14).

31. As per Claims 29 and 36, these claims are each similar in scope to Claim 22, and therefore are rejected under the same rationale.
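The weight-value rationale drawn from Kothari above can be illustrated with a short sketch. Everything here is an illustrative assumption rather than anything from the reference: the threshold, the 0/1 weight encoding, and the function names are hypothetical.

```python
def barrier_weight(num_sync_points: int, high_threshold: int = 100) -> int:
    """Assign a weight value from the magnitude of sync/exchange points.
    High weight (1) when sync/exchange points are numerous, low weight (0)
    otherwise. The threshold is a made-up example, not from Kothari."""
    return 1 if num_sync_points >= high_threshold else 0

def plan(num_sync_points: int) -> str:
    # High weight: collapse many sync/exchange points into a few larger
    # exchanges, amortizing the high message-startup (latency) cost.
    # Low weight: latency is already low, so no collapsing is needed.
    if barrier_weight(num_sync_points) == 1:
        return "collapse sync/exchange points"
    return "leave sync/exchange points as-is"

print(plan(300))
print(plan(12))
```

The sketch mirrors the examiner's inference: the weight value is merely a label for the barrier magnitude that lets the scheduler pick between the two behaviors Kothari describes.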
Claim(s) 23, 30, and 37 is/are rejected under 35 U.S.C. 103 as being unpatentable over Rao (US 20150187040A1) and Mei (US009652284B2) in view of Sheaffer (US 20100332716A1).

33. As per Claim 23, Rao and Mei are relied upon for the teachings as discussed above relative to Claim 21. However, Rao and Mei do not teach wherein identifying the barrier usage data includes the scheduling circuitry to parse thread metadata for the application kernel to obtain the barrier usage data.

However, Sheaffer teaches wherein identifying the barrier usage data includes the scheduling circuitry to parse thread metadata for the application kernel to obtain the barrier usage data (different threads may simultaneously access (store and/or load) different metadata for the same corresponding data and at the same physical address as the data without one thread observing the other thread's metadata, [0018]; a single metadata load instruction using the same address as the datum for which a filter property bit is being checked may be used by software to determine if a memory barrier sequence should be executed, [0055]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Rao and Mei so that identifying the barrier usage data includes the scheduling circuitry to parse thread metadata for the application kernel to obtain the barrier usage data because Sheaffer suggests that this way, a significant overall speedup in a software transactional memory implementation is possible [0055].

34. As per Claims 30 and 37, these claims are each similar in scope to Claim 23, and therefore are rejected under the same rationale.

35. Claim(s) 24, 25, 31, 32, and 38 is/are rejected under 35 U.S.C. 103 as being unpatentable over Rao (US 20150187040A1) and Mei (US009652284B2) in view of Phull (US 20120233486A1).

36. As per Claim 24, Rao and Mei are relied upon for the teachings as discussed above relative to Claim 21.
However, Rao and Mei do not teach wherein scheduling the plurality of thread groups is further based on load balancing of the thread groups across the one or more processors. However, Phull teaches wherein scheduling the plurality of thread groups is further based on load balancing of the thread groups across the one or more processors (load balancing, [0021], the use of the phase-specific dependencies permits the determination of a minimal amount of data to be reassigned from one processor to another to achieve a substantial balancing effect per data unit transferred, [0023]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Rao and Mei so that scheduling the plurality of thread groups is further based on load balancing of the thread groups across the one or more processors because Phull suggests that this improves the efficiency of parallelized processing on the processors [0021]. 37. As per Claim 25, Rao and Mei do not teach wherein the load balancing includes performing on one or more cost functions, the one or more cost functions being based at least in part on the barrier usage data. However, Phull teaches wherein the load balancing [0021] includes performing on one or more cost functions [0023], the one or more cost functions being based at least in part on the barrier usage data (Scotch partitioning is employed, as it attempts to minimize the number of processor boundaries (i.e. sync points between processors), [0039]). This would be obvious for the reasons given in the rejection for Claim 24. 38. As per Claims 31-32, these claims are similar in scope to Claims 24-25 respectively, and therefore are rejected under the same rationale. As per Claim 38, Claim 38 is similar in scope to Claim 24, and therefore is rejected under the same rationale. 39. Claim(s) 27, 34, and 40 is/are rejected under 35 U.S.C. 
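A hedged sketch of the load-balancing idea in the Claim 24-25 rationale: thread groups are assigned across processors by a cost function that depends in part on barrier usage. The greedy strategy, cost weights, and names below are assumptions for illustration, not Phull's disclosed partitioning method.

```python
# Illustrative greedy load balancer: each thread group's cost combines
# its thread count with a penalty per barrier (sync point). Weights and
# strategy are invented for this sketch.
def balance(groups, num_procs, barrier_cost=2.0):
    """Assign (name, threads, barriers) tuples to the least-loaded processor."""
    loads = [0.0] * num_procs
    assignment = {}
    for name, threads, barriers in groups:
        cost = threads + barrier_cost * barriers  # cost function
        target = min(range(num_procs), key=lambda p: loads[p])
        loads[target] += cost
        assignment[name] = target
    return assignment, loads

groups = [("g0", 64, 4), ("g1", 32, 0), ("g2", 64, 4), ("g3", 32, 0)]
assignment, loads = balance(groups, num_procs=2)
print(assignment, loads)  # both processors end up with load 104.0
```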
103 as being unpatentable over Rao (US 20150187040A1) and Mei (US009652284B2) in view of Unno (US007971029B2). 40. As per Claim 27, Rao and Mei are relied upon for the teachings as discussed above relative to Claim 21. However, Rao and Mei do not teach wherein scheduling the plurality of thread groups for execution includes selecting a number of processors for scheduling of threads in a thread group based on the magnitude indicated by the barrier usage data. However, Unno teaches wherein scheduling the plurality of thread groups for execution includes selecting a number of processors for scheduling of threads in a thread group based on the magnitude indicated by the barrier usage data (N, which is a total number of the barrier blades satisfies: N≥2M·X where M represents a total number of the processor cores, and X represents a total number of logic processors configured by a processor core, col. 9, lines 49-54). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Rao and Mei so that scheduling the plurality of thread groups for execution includes selecting a number of processors for scheduling of threads in a thread group based on the magnitude indicated by the barrier usage data because Unno suggests that this way, the number of synchronization processes to be executed by processor cores for realizing barrier synchronization can be reduced so that barrier synchronization can be accelerated (col. 3, line 67-col. 4, line 3). 41. As per Claims 34 and 40, these claims are each similar in scope to Claim 27, and therefore are rejected under the same rationale. Allowable Subject Matter 42. 
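The relation quoted from Unno, N ≥ 2M·X (N barrier blades, M processor cores, X logical processors per core), can be checked with a trivial helper. The function name and example values are illustrative only.

```python
# Minimum barrier blades per the relation quoted from Unno: N >= 2*M*X.
def min_barrier_blades(cores: int, logical_per_core: int) -> int:
    """Smallest N satisfying N >= 2*M*X."""
    return 2 * cores * logical_per_core

# Illustrative values: 4 cores, 2 logical processors each.
print(min_barrier_blades(4, 2))  # -> 16
```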
Claims 26, 33, and 39 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and terminal disclaimers are filed to overcome the double patenting rejections discussed above. 43. Following is a statement of reasons for the indication of allowable subject matter: Prior art taken singly or in combination does not teach or suggest the combination of all the limitations of Claim 26 and base Claim 21, and in particular, does not teach wherein the scheduling circuitry is to perform the scheduling of the plurality of thread groups without regard to barrier usage upon determining that barrier usage for the application kernel is below a certain level. Claims 33 and 39 are each similar in scope to Claim 26, and thus also contain allowable subject matter. Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONI HSU whose telephone number is (571)272-7785. The examiner can normally be reached M-F 10am-6:30pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. 
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. JH /JONI HSU/Primary Examiner, Art Unit 2611

Prosecution Timeline

Aug 02, 2024
Application Filed
Jan 22, 2026
Non-Final Rejection — §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592028
METHODS AND DEVICES FOR IMMERSING A USER IN AN IMMERSIVE SCENE AND FOR PROCESSING 3D OBJECTS
2y 5m to grant Granted Mar 31, 2026
Patent 12586306
METHOD, ELECTRONIC DEVICE, AND COMPUTER PROGRAM PRODUCT FOR MODELING OBJECT
2y 5m to grant Granted Mar 24, 2026
Patent 12586260
CREATING IMAGE ENHANCEMENT TRAINING DATA PAIRS
2y 5m to grant Granted Mar 24, 2026
Patent 12581168
A METHOD FOR A MEDIA FILE GENERATING AND A METHOD FOR A MEDIA FILE PROCESSING
2y 5m to grant Granted Mar 17, 2026
Patent 12561850
IMAGE GENERATION WITH LEGIBLE SCENE TEXT
2y 5m to grant Granted Feb 24, 2026
Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
87%
Grant Probability
95%
With Interview (+7.2%)
2y 9m
Median Time to Grant
Low
PTA Risk
Based on 848 resolved cases by this examiner. Grant probability derived from career allow rate.
