Last updated: May 29, 2026

Application No. 17/957,907

Executing Kernel Workgroups Across Multiple Compute Unit Types

Non-Final OA §103

Filed: Sep 30, 2022

Examiner: WAI, ERIC CHARLES

Art Unit: 2195

Tech Center: 2100 (Computer Architecture & Software)

Assignee: Advanced Micro Devices, Inc.

OA Round: 3 (Non-Final)

Interview Optional

Interview lift: +27.1%. An interview has already been conducted in this application's prosecution history. This examiner has an 82% grant rate with a +27.1% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.

Based on 645 resolved cases, 2023–2026

Examiner Intelligence

Examiner: WAI, ERIC CHARLES

Career allowance rate: 82% (530 granted / 645 resolved), above average; +27.2% vs TC avg

Interview lift: +27.1% (strong), comparing resolved cases with and without an interview

Typical timeline: 3y 8m average prosecution; 18 currently pending

Career history: 671 total applications across all art units

Statute-Specific Performance

§101: 3.1% (-36.9% vs TC avg)

§103: 85.1% (+45.1% vs TC avg)

§102: 4.9% (-35.1% vs TC avg)

§112: 3.1% (-36.9% vs TC avg)

Tech Center averages are estimates. Based on career data from 645 resolved cases.

Office Action

§103

DETAILED ACTION
Claims 1-8, 10-14, 16-19, and 21-23 are presented for examination.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/05/2026 has been entered.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-3, 5-6, 8, 10-14, 16, and 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hsu (US PG Pub No. 2015/0363239 A1) further in view of Lee et al. (US PG Pub No. 2016/0321102 A1).

Regarding claim 1, Hsu teaches a method comprising:
maintaining, on a computing device, a first workgroup and at least one other workgroup ([0024], “employ a variable-group-size partitioning scheme for partitioning a task into a plurality of sub-tasks, where the task comprises a kernel and a plurality of data items to be processed by the kernel”; [0026-27]; [0036], wherein workgroups can be defined based on locality), each maintained workgroup targeted for execution on a first type of compute unit of the computing device ([0030], “some kernels may prefer one of the computing devices 102 and 104 for execution, and some kernels may prefer the other of the computing devices 102 and 104 for execution” and wherein “the performance of executing a first kernel in the computing device (e.g., CPU) 102 may be better than that of executing the same first kernel in the computing device (e.g., GPU) 104, and the performance of executing a second kernel in the computing device (e.g., GPU) 104 may be better than that of executing the same second kernel in the computing device (e.g., CPU) 102”);
executing the first workgroup on the second type of compute unit at least in part overlapping with an execution of the at least one other workgroup on the first type of compute unit ([0029], wherein the task split into sub-tasks 304_1 and 304_2 include the same kernel; [0037], wherein the dynamic task scheduler 100 dispatches the sub-task 304_1 (which includes the kernel 312 and the first portion A0 of the data items 314) to the computing device 102 (e.g. CPU), and dispatches the sub-task 304_2 (which includes the kernel 312 and the second portion A1 of the data items 314) to the computing device 104 (e.g. GPU)).
Hsu does not teach executing the first workgroup responsive to identifying that a second type of compute unit of the computing device is idle.
Lee teaches using an idle checker for determining whether a second core is idle before migrating a task ([0007]; [0023]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to determine whether a core is idle in advance of migration. One would be motivated by the desire to ensure that the second core is available to process the task.

Regarding claim 2, Hsu does not teach wherein each maintained workgroup includes multiple threads of a kernel on the computing device.
Hsu does teach multiple tasks/subtasks of the kernel ([0007-10]). It is old and well known that tasks can be implemented as a single thread or as multiple threads. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to try implementing the workgroup using multiple threads of the kernel. A patent claim can be proved obvious merely by showing that the combination of elements was obvious to try. When there is a design need or market pressure to solve a problem and there are a finite number of identified, predictable solutions, a person of ordinary skill has good reason to pursue the known options within his or her technical grasp. If this leads to the anticipated success, it is likely the product is not of innovation but of ordinary skill and common sense. KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398 (2007).

Regarding claim 3, Hsu teaches wherein the first type of compute unit is a graphics processing unit core and the second type of compute unit is a central processing unit core ([0023]) or the first type of compute unit is the central processing unit core and the second type of compute unit is the graphics processing unit core.

Regarding claim 5, Lee teaches receiving a request from the second type of compute unit to execute a maintained workgroup, wherein the identifying that the second type of compute unit is idle is in response to the request; and receiving, from the second type of compute unit ([0007]; [0023]), an execution complete notification indicating completion of the executing of the first workgroup ([0084]).

Regarding claims 8, 10-11, 13-14, 16 and 18, they are the system and device claims of claims 1-3 and 5 above. Therefore, they are rejected for the same reasons as claims 1-3 and 5 above. 

Claim(s) 6, 12, 19, and 22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hsu (US PG Pub No. 2015/0363239 A1) in view of Lee et al. (US PG Pub No. 2016/0321102 A1), further in view of Yeh et al. (US PG Pub No. 2016/0183281 A1).

Regarding claims 6 and 22, Hsu and Lee do not teach receiving a request from the second type of compute unit to execute a maintained workgroup; and communicating a rejection response to the second type of compute unit; wherein the rejection response is based at least in part on a rate at which the first type of compute unit is executing workgroups or a number of workgroups remaining for the first type of compute unit to execute.
Yeh teaches stopping the offloading of tasks/users onto another resource when the on-time throughput requirements of those tasks/users are being met ([0077]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to reject a request to offload tasks to the second type of compute unit based at least in part on a rate at which the first type of compute unit is executing workgroups. One would be motivated by the desire to stop the offloading process when it is not necessary, such as when on-time throughput requirements are being met, as taught by Yeh ([0077]).

Regarding claims 12 and 19, they are the system and device claims of claims 1 and 6 above. Therefore, they are rejected for the same reasons as claims 1 and 6 above.

Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hsu (US PG Pub No. 2015/0363239 A1) in view of Lee et al. (US PG Pub No. 2016/0321102 A1), further in view of Subramaniam et al. (US Pat No. 11,100,028). 

Regarding claim 7, Hsu and Lee do not teach wherein the first type of compute unit and the second type of compute unit are included on a single accelerated processing unit that includes multiple chiplets in a single package, the first type of compute unit is included on a first chiplet of the multiple chiplets and the second type of compute unit is included on a second chiplet of the multiple chiplets, the first type of compute unit is a parallel accelerated processor core, and the second type of compute unit is a central processing unit core.

Subramaniam teaches the use of chiplets for carrying out component functions such as communication, memory, I/O, and hardware acceleration, wherein some chiplets can also serve as application-specific ICs (ASICs), processor cores, field programmable gate arrays (FPGAs), serializers/deserializers (SerDes), network flow processors (NFPs), and reduced instruction set computers (RISCs) (col 3 lines 4-10). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the first type of compute unit and the second type of compute unit on a single accelerated processing unit that includes multiple chiplets in a single package. One would be motivated by the desire to aggregate various functional components, in the form of chiplets, in a heterogeneous manner as dictated by the larger IC chip design, as taught by Subramaniam (col 3 lines 11-14).

Claim(s) 4 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hsu (US PG Pub No. 2015/0363239 A1) in view of Lee et al. (US PG Pub No. 2016/0321102 A1), further in view of Visconti et al. (US PG Pub No. 2020/0209946 A1).

Regarding claim 4, Hsu and Lee do not teach wherein identifying that the second type of compute unit is idle includes determining that the second type of compute unit has not executed at least one instruction during a threshold number of execution cycles or amount of time.
Visconti teaches determining idleness using idleness criteria, by comparing utilization thresholds, defined per time division, of one or more of at least one processor ([0007]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to determine idleness based on compute units not executing at least one instruction during a threshold amount of time. One would be motivated by the desire to use any of a number of common methods for determining idleness, as taught by Visconti.

Regarding claim 17, it is the device claim of claim 4 above. Therefore, it is rejected for the same reasons as claim 4 above. 

Claim(s) 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hsu (US PG Pub No. 2015/0363239 A1) in view of Lee et al. (US PG Pub No. 2016/0321102 A1), further in view of Hux et al. (US PG Pub No. 2014/0089905 A1).

Regarding claim 23, Hsu and Lee do not teach the front end processing core is configured to retrieve, from a command packet in a queue, a first binary pointer corresponding to first code compiled for a first instruction set architecture of the first set of compute units and a second binary pointer corresponding to second code compiled for a second instruction set architecture of the second set of compute units; and the synchronization circuitry is configured to communicate the second binary pointer to the compute unit of the second set of compute units for execution of the first workgroup.
Hux teaches a heterogeneous computing platform wherein each address space of the distinct GPU and CPU devices may utilize a different data structure and potentially reside upon distinct memories, within which the underlying executable instruction resides, but the provided architecture 103, through the device aware vtable handle 171 and subsequently referenced pointers, will yield the appropriate location within the appropriate data structure ([0057]). Hux further teaches that each of the multiple devices has specific binary executable instructions ([0069]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to retrieve a first binary pointer corresponding to first code compiled for a first instruction set architecture of the first set of compute units and a second binary pointer corresponding to second code compiled for a second instruction set architecture of the second set of compute units, with the synchronization circuitry configured to communicate the second binary pointer to the compute unit of the second set of compute units for execution of the first workgroup. One would be motivated by the desire to ensure that device-specific code is properly executed.

Response to Arguments
Applicant's arguments filed 03/05/2026 have been fully considered but they are not persuasive. 
Regarding claim 1, Applicant argues the following on page 11 of Remarks: 
In Hsu, when the sub-tasks are partitioned and dispatched to different computing devices of "a heterogeneous computing system," the sub-tasks are not maintained as workgroups much less each targeted for one type of compute unit. Amended claim 1, by contrast, recites "maintaining, on a computing device, a first workgroup and at least one other workgroup, each maintained workgroup targeted for execution on a first type of compute unit." Hsu does not disclose or suggest this approach.
Examiner disagrees. Hsu is directed toward executing a single task across a heterogeneous processing system comprising a CPU and GPU. Hsu teaches partitioning a task into sub-tasks 304_1 and 304_2 (i.e. workgroups) wherein each sub-task comprises the same kernel ([0029]). Hsu is very clear that each kernel is targeted for execution on either a first or second type of computing device by performing better either on the first or second type of computing device ([0030]). Hsu is very clear that the sub-tasks 304_1 and 304_2, each comprising the same kernel, are then executed overlapped across computing devices 102 (e.g. CPU) and 104 (e.g. GPU) ([0037]).

Allowable Subject Matter
Claim 21 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC C WAI whose telephone number is (571)270-1012. The examiner can normally be reached Monday - Friday 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Eric C Wai/Primary Examiner, Art Unit 2195


Prosecution Timeline

(4 earlier events not shown)

Aug 22, 2025: Examiner Interview Summary

Sep 07, 2025: Response Filed

Dec 17, 2025: Final Rejection mailed (§103)

Mar 04, 2026: Applicant Interview (Telephonic)

Mar 04, 2026: Examiner Interview Summary

Mar 05, 2026: Request for Continued Examination

Mar 13, 2026: Response after Non-Final Action

Mar 18, 2026: Non-Final Rejection mailed (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology:

17/904,824 (Patent 12639093): VIRTUALIZATION METHOD, DEVICE, BOARD CARD AND COMPUTER-READABLE STORAGE MEDIUM. Granted May 26, 2026 (3y 9m to grant).

17/922,277 (Patent 12632319): SYSTEMS, APPARATUS, AND METHODS TO CONFIGURE HARDWARE BASED ON APPLICATION RATIOS. Granted May 19, 2026 (3y 6m to grant).

18/607,953 (Patent 12632301): DATA PROCESSING ACROSS APPLICATIONS IN CLOUD ENVIRONMENTS. Granted May 19, 2026 (2y 2m to grant).

18/153,460 (Patent 12608229): CONTROL SYSTEM AND REQUEST PROCESSING METHOD IN CONTROL SYSTEM. Granted Apr 21, 2026 (3y 3m to grant).

17/821,543 (Patent 12602261): CONTAINER SCHEDULING ACCORDING TO PREEMPTING A SET OF PREEMPTABLE CONTAINERS DEPLOYED IN A CLUSTER. Granted Apr 14, 2026 (3y 7m to grant).

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA rounds: 3-4

Grant probability: 82%

With interview (+27.1%): 99%

Median time to grant: 3y 8m (~0m remaining)

PTA risk: High

Based on 645 resolved cases by this examiner. Grant probability derived from career allowance rate.