Prosecution Insights
Last updated: April 19, 2026
Application No. 18/195,230

DYNAMIC LOAD BALANCING OF COMPUTE ASSETS AMONG DIFFERENT COMPUTE CONTEXTS

Final Rejection: §103, §DP
Filed: May 09, 2023
Examiner: CHU JOY, JORGE A
Art Unit: 2195
Tech Center: 2100 — Computer Architecture & Software
Assignee: Intel Corporation
OA Round: 2 (Final)
Grant Probability: 77% (Favorable)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 3y 1m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 77% (above average), 314 granted / 408 resolved, +22.0% vs TC avg
Interview Lift: +37.3% higher allow rate in resolved cases with an interview
Typical Timeline: 3y 1m average prosecution, 41 applications currently pending
Career History: 449 total applications across all art units
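The headline percentages above are straightforward ratios over the examiner's resolved docket. A minimal sketch, using only the counts shown on this page and deriving the Tech Center baseline from the stated +22.0% delta, reproduces them:

```python
# Reproduce the examiner statistics shown above from the raw counts.
granted, resolved = 314, 408              # career totals from this page
allow_rate = granted / resolved           # displayed as 77%
tc_avg_estimate = allow_rate - 0.220      # page states +22.0% vs TC avg

print(f"Career allow rate: {allow_rate:.1%}")
print(f"Implied TC average: {tc_avg_estimate:.1%}")
```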

Statute-Specific Performance

§101: 11.0% (-29.0% vs TC avg)
§103: 55.3% (+15.3% vs TC avg)
§102: 3.2% (-36.8% vs TC avg)
§112: 19.6% (-20.4% vs TC avg)
Comparisons are against a Tech Center average estimate; based on career data from 408 resolved cases.

Office Action

§103 §DP
DETAILED ACTION

Claims 1-20 are cancelled. Claims 21, 22, and 24-41 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA.
A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claim 21 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. US 11,726,826 B2 in view of Cowperthwaite et al. (US 2016/0239333 A1). The differences are shown in bold below.

Instant Application:

21. An apparatus comprising: a graphics processing unit (GPU) including a plurality of processing resources and circuitry to dispatch at least one command for execution by the GPU, wherein the circuitry is to: based on a configuration, permit execution of a first command of the at least one command by one or more particular processing resources of the GPU based on a source of the first command, wherein the configuration is to indicate one or more particular processing resources of the GPU to be exclusively allocated to execute commands from one or more particular sources and indicate to save state after utilization of the exclusively allocated one or more particular processing resources of the GPU for access by a second command from the one or more particular sources.

41. (New) The apparatus of claim 21, wherein based on utilization of a second configuration, the circuitry is to change one or more particular processing resources of the GPU permitted to perform operations for the source of the first command.

US 11,726,826 B2:

1. An apparatus comprising: a graphics processing unit (GPU) including a plurality of processing resources and circuitry to dispatch at least one command for execution by the GPU, wherein the circuitry is to: based on a configuration, limit execution of a first command of the at least one command to one or more particular processing resources of the GPU based on a source of the first command, wherein the configuration is to indicate one or more particular processing resources of the GPU permitted to execute commands from one or more particular sources and wherein based on utilization of a second configuration, the circuitry is to change one or more particular processing resources of the GPU permitted to perform operations for the source of the first command.
Patent ‘826 does not explicitly teach “wherein the configuration is to indicate one or more particular processing resources of the GPU to be exclusively allocated to execute commands from one or more particular sources and indicate to save state after utilization of the exclusively allocated one or more particular processing resources of the GPU for access by a second command from the one or more particular sources.”

However, Cowperthwaite teaches this limitation ([0013]; [0015]; see also [0019] and [0022]):

[0013]: “Additionally, the graphics microcontroller may maintain a corresponding GPU state unique to each VM, e.g., a set of GPU configuration parameters associated with the VM. Upon a scheduled transition of GPU workload execution from a first VM to a second VM, the graphics microcontroller may save the GPU state unique to the first VM from the GPU and provide (or restore) the corresponding GPU state of the second VM to the GPU. In an embodiment, the graphics microcontroller may save each configuration of the GPU in a memory-based storage, e.g., one or more sets of virtual or physical registers, and may provide or restore to the GPU a particular GPU state of a particular VM, the particular GPU state corresponding to a driver interface (vdriver) that has facilitated transmission of one or more GPU workloads from the particular VM. Each VM's GPU state may be saved/restored from the memory that is accessible to the GPU.”

[0015]: “In operation, the GPU microcontroller 102 may schedule workloads of each of the guest VMs 110.sub.1, 110.sub.2, . . . , 110.sub.n to be executed by the GPU 130. Each VM 110.sub.1, 110.sub.2, . . . , 110.sub.n may be scheduled, in corresponding schedule slots, to exclusively access the GPU 102 or a portion thereof, e.g., rendering engine 132, media effects engine 134, video encode engine 142, or other engine or portion of the GPU 130, according to a schedule implemented by the GPU microcontroller 102. For example, the VM 110.sub.1 may be scheduled, in a first schedule slot, to exclusively access the GPU 102 and to execute a first workload, e.g., a first set of instructions supplied by the VM 110.sub.1, to be executed by the GPU 130. Prior to execution of the first workload by the GPU 102, the GPU microcontroller 102 may retrieve a first GPU state, e.g., a first set of parameter values specific to VM 110.sub.1 with which to configure the GPU 130. Thus, the GPU 130 may be configured prior to execution of the first workload issued from the VM 110.sub.1. The first GPU state may be previously stored, by the vdriver interface 106, in, e.g., the aperture memory 140 (e.g., within memory portion 150 or within other memory accessible to the GPU microcontroller 102). Each of the VMs 110.sub.1-110.sub.n may have a corresponding set of parameter values (GPU state) stored (e.g., in the memory portion 150) by the vdriver interface 106. When a particular VM.sub.i is scheduled to access the GPU 130 (or portion thereof), the GPU state of the VM.sub.i may be recalled by the GPU microcontroller 102 in coordination with the vdriver interface 106 to provide or restore a corresponding configuration of the GPU 130 (or portion thereof) prior to execution of a task by the GPU 130 or portion thereof.”

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Cowperthwaite with the teachings of ‘826 to schedule execution of VMs/sources on exclusive GPUs or portions thereof based on time slots.
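The mechanism Cowperthwaite describes in [0013] and [0015] (save the outgoing VM's GPU state on a scheduled slot transition, restore the incoming VM's) amounts to a context switch keyed by VM. A rough sketch of that behavior, with all identifiers as hypothetical stand-ins rather than the reference's actual API:

```python
# Illustrative save/restore context switch in the style of
# Cowperthwaite [0013]/[0015]. All names are hypothetical stand-ins.
saved_states = {}                       # per-VM GPU state (cf. memory portion 150)
current = {"vm": None, "gpu_state": None}

def switch_to(vm, default_state):
    """On a scheduled slot transition, save the outgoing VM's GPU state
    and restore (or initialize) the incoming VM's GPU state."""
    if current["vm"] is not None:
        saved_states[current["vm"]] = current["gpu_state"]   # save
    current["vm"] = vm
    current["gpu_state"] = saved_states.get(vm, dict(default_state))  # restore

switch_to("VM1", {"clock": "high"})
current["gpu_state"]["clock"] = "low"   # VM1 configures the GPU its way
switch_to("VM2", {"clock": "high"})     # VM1's state is saved on transition
switch_to("VM1", {"clock": "high"})     # VM1's state is restored intact
print(current["gpu_state"])             # {'clock': 'low'}
```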
The modification would have been motivated by the desire to reduce latencies associated with the traditional model for GPU workload scheduling and to provide flexible access to the GPU. (See at least [0012].)

Claim 29 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 9 of U.S. Patent No. US 11,726,826 B2 in view of Cowperthwaite et al. (US 2016/0239333 A1). For brevity purposes, a single table is shown for claims 21 and 41.

Claim 35 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 15 of U.S. Patent No. US 11,726,826 B2 in view of Cowperthwaite et al. (US 2016/0239333 A1).

Claim 21 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. US 11,074,109 B2 in view of Cowperthwaite et al. (US 2016/0239333 A1). The differences are shown in bold below.

Instant Application:

21. An apparatus comprising: a graphics processing unit (GPU) including a plurality of processing resources and circuitry to dispatch at least one command for execution by the GPU, wherein the circuitry is to: based on a configuration, permit execution of a first command of the at least one command by one or more particular processing resources of the GPU based on a source of the first command, wherein the configuration is to indicate one or more particular processing resources of the GPU to be exclusively allocated to execute commands from one or more particular sources and indicate to save state after utilization of the exclusively allocated one or more particular processing resources of the GPU for access by a second command from the one or more particular sources.

US 11,074,109 B2:

1. An apparatus comprising: a memory device; a graphics processing unit (GPU) including a plurality of execution units; a thread scheduler to dispatch at least one thread for execution by at least a portion of the execution units of the GPU; and a mapping table to identify one or more particular execution units of the GPU that are limited to execute commands from one or more particular sources, wherein to dispatch at least one thread for execution by at least a portion of the execution units of the GPU, the thread scheduler is to: in response to receipt of a first command from a first source: access the mapping table to identify one or more particular execution units of the GPU to execute one or more threads associated with a command from the first source and allocate the identified one or more particular execution units of the GPU to execute one or more threads associated with the first command and in response to receipt of a first command from a second source while a thread associated with the first command is executing and after allocation of the execution units of the GPU to execute the one or more threads associated with the first command, allocate a first portion of the execution units to perform the one or more threads associated with the first command and allocate a second portion of the execution units to perform the one or more threads associated with the first command from the second source, wherein: one or more particular execution units of the GPU are limited to execute commands from one or more particular sources and the first portion of the execution units are limited to the identified one or more particular execution units of the GPU to execute the one or more threads associated with the first command.
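The ‘109 claim recites a mapping table consulted at dispatch time: the scheduler looks up which execution units are limited to a source, then allocates among those. A minimal sketch of that lookup-then-allocate step, with illustrative names only:

```python
# Sketch of a '109-style mapping table: source -> execution units
# limited to that source. All identifiers are illustrative only.
mapping_table = {
    "source_A": [0, 1, 2],   # EUs reserved for threads from source A
    "source_B": [3],         # EU reserved for threads from source B
}

def dispatch_thread(source, busy):
    """Allocate the first free execution unit the table permits for
    `source`; return None if every permitted EU is occupied."""
    for eu in mapping_table.get(source, []):
        if eu not in busy:
            busy.add(eu)
            return eu
    return None

busy = set()
print(dispatch_thread("source_A", busy))   # 0
print(dispatch_thread("source_A", busy))   # 1
print(dispatch_thread("source_B", busy))   # 3
```

Because each source is limited to its own table entries, two sources can run concurrently on disjoint portions of the execution units, as in the claim's first/second-portion language.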
Patent ‘109 does not explicitly teach “wherein the configuration is to indicate one or more particular processing resources of the GPU to be exclusively allocated to execute commands from one or more particular sources and indicate to save state after utilization of the exclusively allocated one or more particular processing resources of the GPU for access by a second command from the one or more particular sources.”

However, Cowperthwaite teaches this limitation in the passages quoted above with respect to the rejection over the ‘826 patent ([0013]; [0015]; see also [0019] and [0022]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Cowperthwaite with the teachings of ‘109 to schedule execution of VMs/sources on exclusive GPUs or portions thereof based on time slots.
The modification would have been motivated by the desire to reduce latencies associated with the traditional model for GPU workload scheduling and to provide flexible access to the GPU. (See at least [0012].)

Claim 29 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 10 of U.S. Patent No. US 11,074,109 B2 in view of Cowperthwaite et al. (US 2016/0239333 A1). For brevity purposes, a single table is shown for claim 21.

Claim 35 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. US 11,074,109 B2 in view of Cowperthwaite et al. (US 2016/0239333 A1).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21, 22, 24-30, 32-36, and 38-40 are rejected under 35 U.S.C. 103 as being unpatentable over Gandhi et al. (US 2017/0256018 A1) in further view of Cowperthwaite et al. (US 2016/0239333 A1). Gandhi was cited in the previous Office Action.

Regarding claim 21, Gandhi teaches the invention as claimed, including an apparatus comprising: a graphics processing unit (GPU) including a plurality of processing resources and circuitry to dispatch at least one command for execution by the GPU ([0017] In particular, processing system 100 may include a GPU 102 comprising a processor 104 and a memory 106. Processor 104 may be a multi-core processor (e.g., processor 104 may be a 4,000 core processor, an 8,000 core processor, etc.). Memory 106 may be a random access memory, for example, that stores instructions that are executable by processor 104.; [0018] GPU gatekeeper 108 mediates access to GPU 102 on processing system 100 to ensure that applications (APP) 110, 112 can share the resources of GPU 102 (including processor 104 and memory 106) in a fair and secure/protected manner. GPU gatekeeper 108 also enables sharing of GPU 102 (or multiple GPUs) along both space and time dimensions. That is, GPU gatekeeper 108 partitions (i.e., slices) GPU 102 and its resources (i.e., processor 104 and memory 106) by amount of processing power (i.e., number of cores) and amount of memory (i.e., size in bytes). A slice is a specific portion of a hardware unit (e.g., 1 core of a processor or 1 byte of memory) of a GPU (e.g., GPU 102).; [0032]), wherein the circuitry is to: based on a configuration, permit execution of a first command of the at least one command by one or more particular processing resources of the GPU based on a source of the first command (Fig. 3; [0030] At block 304, method 300 includes receiving a first request, which may include a minimum and/or maximum amount of resources, from a first application of the plurality of applications for first requested GPU resources, the GPU resources comprising a processor and a memory…At block 308, method 300 includes determining whether the first request and/or the second request can be fulfilled, for example, while satisfying a fairness policy, if any. At block 310, method 300 includes getting the availability of the GPU capacity to determine whether the requests can be fulfilled at block 308.
At block 312, method 300 includes, responsive to determining that the first requested GPU resources are available, allocating a first slice of the GPU resources with a first requested amount of resources to the first application; wherein the “configuration” corresponds to the minimum and maximum amount of resources and the “source” corresponds to the first application).

Gandhi does not explicitly teach wherein the configuration is to indicate one or more particular processing resources of the GPU to be exclusively allocated to execute commands from one or more particular sources and indicate to save state after utilization of the exclusively allocated one or more particular processing resources of the GPU for access by a second command from the one or more particular sources.

However, Cowperthwaite teaches this limitation in the passages quoted above with respect to the double patenting rejections ([0013]; [0015]; see also [0019] and [0022]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Cowperthwaite with the teachings of Gandhi to schedule execution of VMs/sources on exclusive GPUs or portions thereof based on time slots. The modification would have been motivated by the desire to reduce latencies associated with the traditional model for GPU workload scheduling and to provide flexible access to the GPU. (See at least [0012].)

Regarding claim 22, Gandhi teaches wherein in response to receipt of a third command from a second source while the first command is executing, the circuitry is to: based on the configuration indicating that the second source is permitted to execute a command on the one or more particular processing resources that execute the first command: permit the first command to complete execution on the one or more particular processing resources and allocate the second command for execution on one or more particular processing resources that previously executed the first command ([0030] At block 304, method 300 includes receiving a first request, which may include a minimum and/or maximum amount of resources, from a first application of the plurality of applications for first requested GPU resources, the GPU resources comprising a processor and a memory. In examples, the processor comprises multiple GPU cores, wherein each GPU core of the multiple GPU cores comprises a plurality of hardware threads.
At block 306, method 300 includes receiving a second request, which may also include a minimum and/or maximum amount of resources, from a second application of the plurality of applications for second GPU resources. At block 308, method 300 includes determining whether the first request and/or the second request can be fulfilled, for example, while satisfying a fairness policy, if any. At block 310, method 300 includes getting the availability of the GPU capacity to determine whether the requests can be fulfilled at block 308. At block 312, method 300 includes, responsive to determining that the first requested GPU resources are available, allocating a first slice of the GPU resources with a first requested amount of resources to the first application, and, responsive to determining that the second requested GPU resources are available, allocating a second slice of the GPU resources with a second requested amount of resources to the second application. At block 314, method 300 includes enabling the first application and the second application to execute concurrently within the first slice of the GPU and the second slice of the GPU respectively. Method 300 continues to block 316 and ends.).

Regarding claim 24, Gandhi teaches wherein the circuitry is to: in response to detection of completion of the third command and an unexecuted command from the source of the first command, allocate one or more of the processing resources formerly allocated to execute the third command to perform the unexecuted command from the second source ([0030]; [0033] It should be understood that the processes depicted in FIG. 3 represent illustrations, and that other processes may be added or existing processes may be removed, finish executing, modified, or rearranged without departing from the scope and spirit of the present disclosure.).

Regarding claim 25, Gandhi teaches wherein the first command is associated with a compute context or a render context and based on completion of execution of the first command, permit access to the compute context or the render context for use by the second command ([0034]; [0042] In some aspects of the present disclosure, processing system 20 includes a graphics processing unit 37. Graphics processing unit 37 is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. In general, graphics processing unit 37 is very efficient at manipulating computer graphics and image processing, and has a highly parallel structure that makes it more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel.). In addition, Cowperthwaite teaches wherein the saved state comprises the compute context or the render context ([0013], quoted above).

Regarding claim 26, Gandhi teaches wherein the circuitry is to: in response to receipt of a third command associated with a third source, allocate one or more of the processing resources to perform the third command based on the configuration ([0030]; [0033] Additional processes also may be included. For example, method 300 may include receiving a third request from a third application of the plurality of applications and, responsive to determining that the third requested GPU resources are available, allocating a third slice of the GPU resources with a third requested amount of resources to the third application. Method 300 may further include enabling the third application to execute on the third slice concurrently with the first application executing on the first slice and the second application executing on the second slice. It should be understood that the processes depicted in FIG. 3 represent illustrations, and that other processes may be added or existing processes may be removed, finish executing, modified, or rearranged without departing from the scope and spirit of the present disclosure.).

Regarding claim 27, Gandhi teaches comprising a memory device communicatively coupled to the circuitry to dispatch at least one command for execution by the GPU (Fig. 1, GPU Gatekeeper, GPU, Memory 106), wherein the memory device is to store the configuration ([0028] Once the mapping of applications 210, 212 to GPUs 264, 274, 276, 284 is complete, sharing at each GPU 264, 274, 276, 284 is managed by the respective local gatekeepers 262, 272, 282. Further, each application 210, 212 may be responsible for distributing its computation and data on the multiple GPUs 264, 274, 276, 284. In examples, GPU distributed gatekeeper 250 may also be responsible for ensuring that an application (e.g., applications 210, 212) also gets access to at least one central processing unit (CPU) core on each node that hosts a GPU slice, which may guide the placement of applications on GPUs.; Fig. 4, 408).

Regarding claim 28, Gandhi teaches comprising one or more of: a general purpose graphics processing unit or a central processing unit (Fig. 1, GPU; [0002]), wherein the general purpose graphics processing unit or the central processing unit is to execute a process that is a source of the first command (Fig. 3; [0030]).

Regarding claim 29, it is a method-type claim having similar limitations as claim 21 above. Therefore, it is rejected under the same rationale.

Regarding claim 30, it is a method-type claim having similar limitations as claim 22 above. Therefore, it is rejected under the same rationale.

Regarding claim 32, Gandhi teaches comprising: in response to receipt of a third command from a second source while the first command is executing: based on the first configuration indicating that the second source is permitted to execute a command on the one or more particular processing resources that execute the first command: permitting the first command to complete execution on the first set of one or more particular processing resources, allocating the third command for execution on one or more particular processing resources that previously executed the first command, and in response to completion of the third command and an unexecuted command from the one or more particular sources of the first command, based on the first configuration, allocating one or more of the particular processing resources formerly allocated to execute the third command to perform the unexecuted command from the second source ([0021-22] discusses taking turns using resources and concurrent execution; [0030] At block 304, method 300 includes receiving a first request, which
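Gandhi's method 300 (blocks 304-316) is a request/allocate loop over GPU slices: each application asks for a minimum and/or maximum amount of resources, and the gatekeeper grants a slice only if remaining capacity covers at least the minimum. The sketch below is an illustrative reading of that flow; the class and field names are assumptions, not Gandhi's code:

```python
# Illustrative gatekeeper in the style of Gandhi's method 300:
# applications request min/max resources; a slice is allocated when
# the remaining capacity can satisfy at least the minimum.
class Gatekeeper:
    def __init__(self, cores, mem_bytes):
        self.free = {"cores": cores, "mem": mem_bytes}
        self.slices = {}                  # app -> allocated slice

    def request(self, app, min_res, max_res):
        """Blocks 304-312: grant up to the maximum, deny if even the
        minimum cannot be met from the free capacity."""
        grant = {}
        for k in ("cores", "mem"):
            if self.free[k] < min_res[k]:
                return None               # cannot meet the minimum
            grant[k] = min(max_res[k], self.free[k])
        for k, v in grant.items():
            self.free[k] -= v
        self.slices[app] = grant          # block 314: apps run concurrently
        return grant

gk = Gatekeeper(cores=4000, mem_bytes=8 << 30)   # e.g., a 4,000-core GPU
print(gk.request("app1", {"cores": 1000, "mem": 1 << 30},
                 {"cores": 2000, "mem": 2 << 30}))
```

Granted slices are disjoint, so concurrently executing applications never contend for the same cores or bytes, which is the fairness-and-isolation point of the gatekeeper.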
may include a minimum and/or maximum amount of resources, from a first application of the plurality of applications for first requested GPU resources, the GPU resources comprising a processor and a memory. In examples, the processor comprises multiple GPU cores, wherein each GPU core of the multiple GPU cores comprises a plurality of hardware threads. At block 306, method 300 includes receiving a second request, which may also include a minimum and/or maximum amount of resources from a second application of the plurality of applications for second GPU resources. At block 308, method 300 includes determining whether the first request and/or the second request can be fulfilled, for example, while satisfying a fairness policy, if any. At block 310, method 300 includes getting the availability of the GPU capacity to determine whether the requests can be fulfilled at block 308. At block 312, method 300 includes, responsive to determining that the first requested GPU resources are available, allocating a first slice of the GPU resources with a first requested amount of resources to the first application, and, responsive to determining that the second requested GPU resources are available, allocating a second slice of the GPU resources with a second requested amount of resources to the second application. At block 314, method 300 includes enabling the first application and the second application to execute concurrently within the first slice of the GPU and the second slice of the GPU respectively. Method 300 continues to block 316 and ends.; [0033] Additional processes also may be included. For example, method 300 may include receiving a third request from a third application of the plurality of applications and, responsive to determining that the third requested GPU resources are available, allocating a third slice of the GPU resources with a third requested amount of resources to the third application. 
Method 300 may further include enabling the third application to execute on the third slice concurrently with the first application executing on the first slice and the second application executing on the second slice. It should be understood that the processes depicted in FIG. 3 represent illustrations, and that other processes may be added or existing processes may be removed, finish executing, modified, or rearranged without departing from the scope and spirit of the present disclosure.). Regarding claim 33, it is a method type claim having similar limitations as claim 25 above. Therefore, it is rejected under the same rationale above. Regarding claim 34, Gandhi teaches comprising: in response to receipt of a fourth command associated with a third source, allocating one or more of the particular processing resources to perform the fourth command based on the first configuration ([0030] At block 304, method 300 includes receiving a first request, which may include a minimum and/or maximum amount of resources, from a first application of the plurality of applications for first requested GPU resources, the GPU resources comprising a processor and a memory. In examples, the processor comprises multiple GPU cores, wherein each GPU core of the multiple GPU cores comprises a plurality of hardware threads. At block 306, method 300 includes receiving a second request, which may also include a minimum and/or maximum amount of resources from a second application of the plurality of applications for second GPU resources. At block 308, method 300 includes determining whether the first request and/or the second request can be fulfilled, for example, while satisfying a fairness policy, if any. At block 310, method 300 includes getting the availability of the GPU capacity to determine whether the requests can be fulfilled at block 308. 
At block 312, method 300 includes, responsive to determining that the first requested GPU resources are available, allocating a first slice of the GPU resources with a first requested amount of resources to the first application, and, responsive to determining that the second requested GPU resources are available, allocating a second slice of the GPU resources with a second requested amount of resources to the second application. At block 314, method 300 includes enabling the first application and the second application to execute concurrently within the first slice of the GPU and the second slice of the GPU respectively. Method 300 continues to block 316 and ends.; [0033] Additional processes also may be included. For example, method 300 may include receiving a third request from a third application of the plurality of applications and, responsive to determining that the third requested GPU resources are available, allocating a third slice of the GPU resources with a third requested amount of resources to the third application. Method 300 may further include enabling the third application to execute on the third slice concurrently with the first application executing on the first slice and the second application executing on the second slice. It should be understood that the processes depicted in FIG. 3 represent illustrations, and that other processes may be added or existing processes may be removed, finish executing, modified, or rearranged without departing from the scope and spirit of the present disclosure.; [0075] The descriptions of the various examples of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described techniques. 
The terminology used herein was chosen to best explain the principles of the present techniques, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the techniques disclosed herein. i.e., fourth request). Regarding claim 35, it is a media/product type claim having similar limitations as claim 21 above. Therefore, it is rejected under the same rationale above. The additional limitation “At least one non-transitory computer-readable medium comprising instructions thereon, that if executed by one or more processors, cause the one or more processors to” is taught by Gandhi in at least Claim 17 “a non-transitory storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method”. Regarding claim 36, it is a media/product type claim having similar limitations as claim 22 above. Therefore, it is rejected under the same rationale above. Regarding claim 38, it is a media/product type claim having similar limitations as claim 32 above. Therefore, it is rejected under the same rationale above. Regarding claim 39, it is a media/product type claim having similar limitations as claim 25 above. Therefore, it is rejected under the same rationale above. Regarding claim 40, it is a media/product type claim having similar limitations as claim 34 above. Therefore, it is rejected under the same rationale above. Regarding claim 41, Gandhi teaches wherein based on utilization of a second configuration, the circuitry is to change one or more particular processing resources of the GPU permitted to perform operations for the source of the first command ([0026] GPU distributed gatekeeper 250 communicates periodically with local gatekeepers 262, 272, 282 to determine their resource availability in terms of free GPU cores and memory. 
Applications 210, 212 specify their requirements to GPU distributed gatekeeper 250, which can then perform matching of resource requirements (from applications 210, 212) to resource availability (on GPUs 264, 274, 276, 284). GPU distributed gatekeeper 250 can also optimize performance by deciding how much of each GPU 264, 274, 276, 284 should be given to each application 210, 212, and which applications 210, 212 should be co-located.; [0027] GPU distributed gatekeeper 250 can also act in a distributed manner and expose the resource availability of GPUs 264, 274, 276, 284 to applications 210, 212, who can then decide how much of the resources each application 210, 212 wishes to utilize. Once a decision is made, the application can request specific slices of specific GPUs. In the example of FIG. 2, application 210 is mapped to GPU 264 on node 260 and to GPU 274 on node 270 while application 212 is mapped to GPU 276 on node 270 and to GPU 284 on node 280.). Claims 31 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Gandhi and Cowperthwaite as cited above, in further view of Jiao (US 2010/0123717 A1). Jiao was cited in an IDS. Regarding claim 31, neither Gandhi nor Cowperthwaite teaches after completion of the first command, flushing a write data buffer to a cache and invalidating state and constant caches associated with the one or more particular processing resources that executed the first command. However, Jiao teaches after completion of the first command, flushing a write data buffer to a cache and invalidating state and constant caches associated with the one or more particular processing resources that executed the first command ([0049] The cache/control device 60 may further include a scheduler (not shown), such as one which is similar to the scheduler 55 shown in FIG. 3A, for scheduling tasks of the EUs 56. The scheduler in this embodiment also handles the assignment of tasks to different EUs 56 and to individual threads of the EUs 56. 
As the tasks are completed, the scheduler removes or drops the task from cache 62 and indicates that certain thread slots are not occupied. When empty thread slots are available, the scheduler assigns additional tasks to these threads.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jiao of removing data from completed tasks to make space for new tasks. The modification would have been motivated by the desire to flush a write data buffer of the second portion from execution of the one or more threads associated with the first command to a cache and invalidate state and constant caches associated with the second portion in order to handle the assignment of tasks or threads to different EUs efficiently. Regarding claim 37, it is a media/product type claim having similar limitations as claim 31 above. Therefore, it is rejected under the same rationale above.

Response to Arguments

Applicant's arguments with respect to claims 21, 22, and 24-41 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. 
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to JORGE A CHU JOY-DAVILA whose telephone number is (571)270-0692. The examiner can normally be reached Monday-Friday, 6:00am-5:00pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee J Li can be reached at (571)272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /JORGE A CHU JOY-DAVILA/Primary Examiner, Art Unit 2195
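The slice-allocation flow of Gandhi's method 300 (blocks 304-316), quoted repeatedly in the rejections above, amounts to: receive per-application resource requests, determine whether each can be fulfilled from the available GPU capacity, and allocate a slice of the requested size so applications execute concurrently. A minimal sketch of that flow follows; all identifiers are hypothetical illustrations, not taken from the reference.

```python
# Hedged sketch of Gandhi's gatekeeper-style slice allocation
# (method 300, blocks 304-316): requests are granted only if the
# requested cores and memory fit the remaining GPU capacity.

class GpuGatekeeper:
    def __init__(self, total_cores, total_memory):
        self.free_cores = total_cores
        self.free_memory = total_memory
        self.slices = {}  # app_id -> (cores, memory) of the granted slice

    def request_slice(self, app_id, cores, memory):
        # Blocks 308/310: check availability against current free capacity.
        if cores > self.free_cores or memory > self.free_memory:
            return False
        # Block 312: allocate a slice with the requested amount of resources.
        self.free_cores -= cores
        self.free_memory -= memory
        self.slices[app_id] = (cores, memory)
        return True

    def release_slice(self, app_id):
        # Return a finished application's slice to the free pool.
        cores, memory = self.slices.pop(app_id)
        self.free_cores += cores
        self.free_memory += memory
```

Under this sketch, a third (or fourth) request is handled identically to the first two, which is the reading the examiner applies to the "third command"/"fourth command" limitations of claims 26, 32, and 34.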

Prosecution Timeline

May 09, 2023
Application Filed
Aug 10, 2023
Response after Non-Final Action
Sep 13, 2025
Non-Final Rejection — §103, §DP
Nov 25, 2025
Interview Requested
Dec 17, 2025
Applicant Interview (Telephonic)
Dec 17, 2025
Response Filed
Dec 19, 2025
Examiner Interview Summary
Jan 26, 2026
Final Rejection — §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602244
OFFLOADING PROCESSING TASKS TO DECOUPLED ACCELERATORS FOR INCREASING PERFORMANCE IN A SYSTEM ON A CHIP
2y 5m to grant Granted Apr 14, 2026
Patent 12596565
USER ASSIGNED NETWORK INTERFACE QUEUES
2y 5m to grant Granted Apr 07, 2026
Patent 12591821
DYNAMIC ADJUSTMENT OF WELL PLAN SCHEDULES ON DIFFERENT HIERARCHICAL LEVELS BASED ON SUBSYSTEMS ACHIEVING A DESIRED STATE
2y 5m to grant Granted Mar 31, 2026
Patent 12585490
MIGRATING VIRTUAL MACHINES WHILE PERFORMING MIDDLEBOX SERVICE OPERATIONS AT A PNIC
2y 5m to grant Granted Mar 24, 2026
Patent 12579065
LIGHTWEIGHT KERNEL DRIVER FOR VIRTUALIZED STORAGE
2y 5m to grant Granted Mar 17, 2026
Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
77%
Grant Probability
99%
With Interview (+37.3%)
3y 1m
Median Time to Grant
Moderate
PTA Risk
Based on 408 resolved cases by this examiner. Grant probability derived from career allow rate.
