Prosecution Insights
Last updated: April 19, 2026
Application No. 17/957,907

Executing Kernel Workgroups Across Multiple Compute Unit Types

Non-Final OA §103
Filed: Sep 30, 2022
Examiner: WAI, ERIC CHARLES
Art Unit: 2195
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Advanced Micro Devices, Inc.
OA Round: 3 (Non-Final)
Grant Probability: 82% (Favorable)
OA Rounds: 3-4
To Grant: 3y 9m
With Interview: 99%

Examiner Intelligence

Grants 82% (above average)
Career Allow Rate: 82% (529 granted / 644 resolved; +27.1% vs TC avg)
Interview Lift: +27.2% (strong; resolved cases with interview)
Avg Prosecution: 3y 9m (typical timeline; 27 currently pending)
Total Applications: 671 (career history, across all art units)

Statute-Specific Performance

§101: 15.7% (-24.3% vs TC avg)
§103: 50.0% (+10.0% vs TC avg)
§102: 11.4% (-28.6% vs TC avg)
§112: 14.4% (-25.6% vs TC avg)
Based on career data from 644 resolved cases. In the original chart, a black line marks the Tech Center average estimate.
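The "vs TC avg" deltas can be turned back into the Tech Center baseline they imply. The short sketch below assumes the deltas are simple percentage-point differences (the page does not say so explicitly); the dictionary and function names are illustrative only.

```python
# Statute-specific figures copied from the page: (examiner %, delta vs TC avg).
# ASSUMPTION: each delta is a percentage-point difference, examiner - TC average.
STATUTE_STATS = {
    "§101": (15.7, -24.3),
    "§103": (50.0, +10.0),
    "§102": (11.4, -28.6),
    "§112": (14.4, -25.6),
}


def implied_tc_average(examiner_pct, delta_pct):
    """Recover the Tech Center average implied by an examiner figure and its delta."""
    return round(examiner_pct - delta_pct, 1)


implied = {
    statute: implied_tc_average(pct, delta)
    for statute, (pct, delta) in STATUTE_STATS.items()
}

for statute, baseline in implied.items():
    print(f"{statute}: implied TC average = {baseline}%")
```

Under that assumption, all four rows imply the same baseline, which suggests the page compares every statute against a single Tech Center average estimate rather than a per-statute one.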

Office Action

§103
DETAILED ACTION
Claims 1-8, 10-14, 16-19, and 21-23 are presented for examination.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 03/05/2026 has been entered.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-3, 5-6, 8, 10-14, 16, and 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hsu (US PG Pub No. 2015/0363239 A1) further in view of Lee et al. (US PG Pub No. 2016/0321102 A1).
Regarding claim 1, Hsu teaches a method comprising: maintaining, on a computing device, a first workgroup and at least one other workgroup ([0024], “employ a variable-group-size partitioning scheme for partitioning a task into a plurality of sub-tasks, where the task comprises a kernel and a plurality of data items to be processed by the kernel”; [0026-27]; [0036], wherein workgroups can be defined based on locality), each maintained workgroup targeted for execution on a first type of compute unit of the computing device ([0030], “some kernels may prefer one of the computing devices 102 and 104 for execution, and some kernels may prefer the other of the computing devices 102 and 104 for execution” and wherein “the performance of executing a first kernel in the computing device (e.g., CPU) 102 may be better than that of executing the same first kernel in the computing device (e.g., GPU) 104, and the performance of executing a second kernel in the computing device (e.g., GPU) 104 may be better than that of executing the same second kernel in the computing device (e.g., CPU) 102”); executing the first workgroup on the second type of compute unit at least in part overlapping with an execution of the at least one other workgroup on the first type of compute unit ([0029], wherein the task is split into sub-tasks 304_1 and 304_2 that include the same kernel; [0037], wherein the dynamic task scheduler 100 dispatches the sub-task 304_1 (which includes the kernel 312 and the first portion A0 of the data items 314) to the computing device 102 (e.g. CPU), and dispatches the sub-task 304_2 (which includes the kernel 312 and the second portion A1 of the data items 314) to the computing device 104 (e.g. GPU)).
Hsu does not teach executing the first workgroup responsive to identifying that a second type of compute unit of the computing device is idle. Lee teaches using an idle checker for determining whether a second core is idle before migrating a task ([0007]; [0023]).
It would have been obvious to one of ordinary skill before the effective filing date of the invention to determine whether a core is idle in advance of migration. One would be motivated by the desire to ensure that the second core is available to process the task.
Regarding claim 2, Hsu does not teach wherein each maintained workgroup includes multiple threads of a kernel on the computing device. Hsu does teach multiple tasks/subtasks of the kernel ([0007-10]). It is old and well known that tasks can be implemented as a single or multiple threads. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to try implementing the workgroup using multiple threads of the kernel. A patent claim can be proved obvious merely by showing that the combination of elements was obvious to try. When there is a design need or market pressure to solve a problem and there are a finite number of identified, predictable solutions, a person of ordinary skill has good reason to pursue the known options within his or her technical grasp. If this leads to the anticipated success, it is likely the product is not of innovation but of ordinary skill and common sense. KSR v. Teleflex.
Regarding claim 3, Hsu teaches wherein the first type of compute unit is a graphics processing unit core and the second type of compute unit is a central processing unit core ([0023]) or the first type of compute unit is the central processing unit core and the second type of compute unit is the graphics processing unit core.
Regarding claim 5, Lee teaches receiving a request from the second type of compute unit to execute a maintained workgroup, wherein the identifying that the second type of compute unit is idle is in response to the request; and receiving, from the second type of compute unit ([0007]; [0023]), an execution complete notification indicating completion of the executing of the first workgroup ([0084]).
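To make the limitations of claims 1 and 5 easier to picture, here is a minimal Python sketch of the behaviour they recite as quoted above: workgroups are maintained for a first type of compute unit, an idle second-type unit requests one, both types execute in overlapping fashion, and each unit reports completion. This is an illustration only, not the applicant's implementation and not the method of Hsu or Lee. Every name (`WorkgroupScheduler`, `request_workgroup`, `notify_complete`, `run_unit`, `execute`) is hypothetical, and Python threads stand in for hardware compute units.

```python
import queue
import threading


class WorkgroupScheduler:
    """Hypothetical holder of workgroups that are all targeted at one unit type."""

    def __init__(self, workgroups):
        self._pending = queue.Queue()
        for wg in workgroups:
            self._pending.put(wg)
        self._lock = threading.Lock()
        self.results = {}
        self.completed = []  # (unit_type, workgroup) completion notifications

    def request_workgroup(self):
        """Called by a unit that has nothing to run; the request signals idleness."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None

    def notify_complete(self, unit_type, wg, result):
        """Record an execution-complete notification from a compute unit."""
        with self._lock:
            self.results[wg] = result
            self.completed.append((unit_type, wg))


def run_unit(scheduler, unit_type, kernel):
    """Loop of one simulated compute unit: request, execute, notify."""
    while True:
        wg = scheduler.request_workgroup()
        if wg is None:
            return
        scheduler.notify_complete(unit_type, wg, kernel(wg))


def execute(workgroups, kernel):
    """Run every workgroup, letting a second unit type help when it is idle."""
    scheduler = WorkgroupScheduler(workgroups)
    # ASSUMPTION: "gpu" is the targeted (first) type, "cpu" the idle helper.
    units = [
        threading.Thread(target=run_unit, args=(scheduler, unit_type, kernel))
        for unit_type in ("gpu", "cpu")
    ]
    for unit in units:
        unit.start()
    for unit in units:
        unit.join()
    return scheduler.results, scheduler.completed


if __name__ == "__main__":
    results, completed = execute(range(8), lambda wg: wg * wg)
    print(results)
    print(completed)
```

Which unit ends up running which workgroup depends on thread timing, so the split between the two types varies from run to run; only the set of results is deterministic.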
Regarding claims 8, 10-11, 13-14, 16 and 18, they are the system and device claims of claims 1-3 and 5 above. Therefore, they are rejected for the same reasons as claims 1-3 and 5 above.
Claim(s) 6, 12, 19, and 22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hsu (US PG Pub No. 2015/0363239 A1) in view of Lee et al. (US PG Pub No. 2016/0321102 A1), further in view of Yeh et al. (US PG Pub No. 2016/0183281 A1).
Regarding claims 6 and 22, Hsu and Lee do not teach receiving a request from the second type of compute unit to execute a maintained workgroup; and communicating a rejection response to the second type of compute unit; wherein the rejection response is based at least in part on a rate at which the first type of compute unit is executing workgroups or a number of workgroups remaining for the first type of compute unit to execute. Yeh teaches stopping the offloading of tasks/users onto another resource when on-time throughput requirements of those tasks/users are being met ([0077]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to reject a request to offload tasks to the second type of compute unit based at least in part on a rate at which the first type of compute unit is executing workgroups. One would be motivated by the desire to stop the offloading process when it is not necessary such as when on-time throughput requirements are being met such as taught by Yeh ([0077]).
Regarding claims 12 and 19, they are the system and device claims of claims 1 6 above. Therefore, they are rejected for the same reasons as claims 1 6 above.
Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hsu (US PG Pub No. 2015/0363239 A1) in view of Lee et al. (US PG Pub No. 2016/0321102 A1), further in view of Subramaniam et al. (US Pat No. 11,100,028).
Regarding claim 7, Hsu and Lee do not teach wherein the first type of compute unit and the second type of compute unit are included on a single accelerated processing unit that includes multiple chiplets in a single package, the first type of compute unit is included on a first chiplet of the multiple chiplets and the second type of compute unit is included on a second chiplet of the multiple chiplets, the first type of compute unit is a parallel accelerated processor core, and the second type of compute unit is a central processing unit core. Subramaniam teaches the use of chiplets for carrying out such component functions as communication, memory, I/O, hardware acceleration wherein some chiplets can also serve as application-specific ICs (ASICs), processor cores, field programmable gate arrays (FPGAs), serializers/deserializers (SerDes), network flow processors (NFPs), reduced instruction set computers (RISCs) (col 3 lines 4-10). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the first type of compute unit and the second type of compute unit on a single accelerated processing unit that includes multiple chiplets in a single package. One would be motivated by the desire to aggregate various functional components, in the form of chiplets, in a heterogeneous manner as dictated by the larger IC chip design as taught by Subramaniam (col 3 lines 11-14).
Claim(s) 4 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hsu (US PG Pub No. 2015/0363239 A1) in view of Lee et al. (US PG Pub No. 2016/0321102 A1), further in view of Visconti et al. (US PG Pub No. 2020/0209946 A1).
Regarding claim 4, Hsu and Lee do not teach wherein identifying that the second type of compute unit is idle includes determining that the second type of compute unit has not executed at least one instruction during a threshold number of execution cycles or amount of time.
Visconti teaches determining idleness using idleness criteria by comparing utilization thresholds, defined per time division, of one or more of at least one processor ([0007]). It would have been obvious to one of ordinary skill before the effective filing date of the invention to determine idleness based on compute units not executing at least one instruction during a threshold amount of time. One would be motivated by the desire to use any number of common methods for determining idleness as taught by Visconti.
Regarding claim 17, it is the device claim of claim 4 above. Therefore, it is rejected for the same reasons as claim 4 above.
Claim(s) 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hsu (US PG Pub No. 2015/0363239 A1) in view of Lee et al. (US PG Pub No. 2016/0321102 A1), further in view of Hux et al. (US PG Pub No. 2014/0089905 A1).
Regarding claim 23, Hsu and Lee do not teach the front end processing core is configured to retrieve, from a command packet in a queue, a first binary pointer corresponding to first code compiled for a first instruction set architecture of the first set of compute units and a second binary pointer corresponding to second code compiled for a second instruction set architecture of the second set of compute units; and the synchronization circuitry is configured to communicate the second binary pointer to the compute unit of the second set of compute units for execution of the first workgroup. Hux teaches a heterogeneous computing platform wherein each address space of the distinct GPU and CPU devices may utilize a different data structure and potentially reside upon distinct memories, within which the underlying executable instruction resides, but the provided architecture 103, through the device aware vtable handle 171 and subsequently referenced pointers, will yield the appropriate location within the appropriate data structure ([0057]).
Hux further teaches that each of the multiple devices has specific binary executable instructions ([0069]). It would have been obvious to one of ordinary skill before the effective filing date of the invention to retrieve a first binary pointer corresponding to first code compiled for a first instruction set architecture of the first set of compute units and a second binary pointer corresponding to second code compiled for a second instruction set architecture of the second set of compute units; and the synchronization circuitry is configured to communicate the second binary pointer to the compute unit of the second set of compute units for execution of the first workgroup. One would be motivated by the desire to ensure that device-specific code is properly executed.
Response to Arguments
Applicant's arguments filed 03/05/2026 have been fully considered but they are not persuasive.
Regarding claim 1, Applicant argues the following on page 11 of Remarks:
In Hsu, when the sub-tasks are partitioned and dispatched to different computing devices of "a heterogeneous computing system," the sub-tasks are not maintained as workgroups much less each targeted for one type of compute unit. Amended claim 1, by contrast, recites "maintaining, on a computing device, a first workgroup and at least one other workgroup, each maintained workgroup targeted for execution on a first type of compute unit." Hsu does not disclose or suggest this approach.
Examiner disagrees. Hsu is directed toward executing a single task across a heterogeneous processing system comprising a CPU and GPU. Hsu teaches partitioning a task into sub-tasks 304_1 and 304_2 (i.e. workgroups) wherein each sub-task comprises the same kernel ([0029]). Hsu is very clear that each kernel is targeted for execution on either a first or second type of computing device by performing better either on the first or second type of computing device ([0030]).
Hsu is very clear that the sub-tasks 304_1 and 304_2, each comprising the same kernel, are then executed overlapped across computing devices 102 (e.g. CPU) and 104 (e.g. GPU) ([0037]).
Allowable Subject Matter
Claim 21 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC C WAI whose telephone number is (571)270-1012. The examiner can normally be reached Monday - Friday 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Eric C Wai/ Primary Examiner, Art Unit 2195
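Two of the dependent-claim limitations discussed in the action are mechanical enough to sketch in a few lines: the idle test of claim 4 (no instruction executed during a threshold number of cycles) and the rejection response of claims 6 and 22 (based on the first-type execution rate or the number of workgroups remaining). The Python below is a hedged illustration of the claim language as quoted in the action. The names `UnitState`, `is_idle`, and `should_reject`, the parameter names, and in particular the direction of the rejection policy (reject when little work remains or when the first-type unit is already fast enough) are assumptions made for the example, not details taken from the application or the cited references.

```python
from dataclasses import dataclass


@dataclass
class UnitState:
    """Hypothetical bookkeeping for one compute unit."""
    last_instruction_cycle: int  # cycle at which the unit last executed an instruction


def is_idle(unit, current_cycle, threshold_cycles):
    """Claim 4 style test: no instruction executed for at least threshold_cycles."""
    return current_cycle - unit.last_instruction_cycle >= threshold_cycles


def should_reject(remaining_workgroups, first_type_rate, min_remaining, rate_target):
    """Claims 6/22 style policy for answering a request from a second-type unit.

    ASSUMPTION: the request is rejected when too few workgroups remain to be
    worth offloading, or when the first-type unit already meets its rate target.
    """
    too_little_work = remaining_workgroups < min_remaining
    fast_enough = first_type_rate >= rate_target
    return too_little_work or fast_enough


if __name__ == "__main__":
    cpu = UnitState(last_instruction_cycle=100)
    print(is_idle(cpu, current_cycle=164, threshold_cycles=64))
    print(should_reject(remaining_workgroups=2, first_type_rate=1.0,
                        min_remaining=4, rate_target=10.0))
```

A time-based variant of `is_idle` would compare timestamps instead of cycle counts; the claim language quoted in the action covers either form.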

Prosecution Timeline

Sep 30, 2022: Application Filed
Jun 10, 2025: Non-Final Rejection (§103)
Aug 12, 2025: Interview Requested
Aug 19, 2025: Applicant Interview (Telephonic)
Aug 22, 2025: Examiner Interview Summary
Sep 07, 2025: Response Filed
Dec 12, 2025: Final Rejection (§103)
Mar 04, 2026: Applicant Interview (Telephonic)
Mar 04, 2026: Examiner Interview Summary
Mar 05, 2026: Request for Continued Examination
Mar 13, 2026: Response after Non-Final Action
Mar 15, 2026: Non-Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602261: CONTAINER SCHEDULING ACCORDING TO PREEMPTING A SET OF PREEMPTABLE CONTAINERS DEPLOYED IN A CLUSTER (2y 5m to grant; granted Apr 14, 2026)
Patent 12602248: METHOD AND DEVICE OF LAUNCHING AN APPLICATION IN BACKGROUND (2y 5m to grant; granted Apr 14, 2026)
Patent 12585498: SYSTEM AND METHOD FOR RESOURCE MANAGEMENT IN DYNAMIC SYSTEMS (2y 5m to grant; granted Mar 24, 2026)
Patent 12585503: UNIFIED RESOURCE MANAGEMENT ARCHITECTURE FOR WORKLOAD SCHEDULERS (2y 5m to grant; granted Mar 24, 2026)
Patent 12579001: REINFORCEMENT LEARNING SPACE STATE PRUNING USING RESTRICTED BOLTZMANN MACHINES (2y 5m to grant; granted Mar 17, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 82%
With Interview: 99% (+27.2%)
Median Time to Grant: 3y 9m
PTA Risk: High
Based on 644 resolved cases by this examiner. Grant probability derived from career allow rate.
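The derivation mentioned above can be checked against the page's own numbers. The sketch below computes the career allow rate from the 529 granted / 644 resolved counts. It also shows one possible way the with-interview figure might be reached, by adding the interview lift and capping the result at 99%. That cap is purely an assumption: the page does not explain how the 99% is obtained, and the function names are illustrative.

```python
def allow_rate_percent(granted, resolved):
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved


def with_interview_percent(base_percent, lift_points, cap=99.0):
    """ASSUMPTION: add the lift in percentage points, then cap the result."""
    return min(base_percent + lift_points, cap)


if __name__ == "__main__":
    base = allow_rate_percent(529, 644)
    print(f"Career allow rate: {base:.1f}%")
    print(f"With interview (assumed cap): {with_interview_percent(base, 27.2):.0f}%")
```

Without some cap, 82% plus a 27.2-point lift would exceed 100%, so the displayed 99% is presumably a ceiling or the lift is measured in some other way.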