Prosecution Insights
Last updated: April 19, 2026
Application No. 18/334,723

System and Method for Token-based Graphics Processing Unit (GPU) Utilization

Non-Final OA: §101, §103
Filed: Jun 14, 2023
Examiner: NGUYEN, BRANDON A
Art Unit: 2195
Tech Center: 2100 — Computer Architecture & Software
Assignee: Microsoft Technology Licensing, LLC
OA Round: 1 (Non-Final)
Grant Probability: Favorable
Expected OA Rounds: 1-2
Time to Grant: 3y 3m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift; based on resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Total Applications: 10 (10 currently pending, across all art units)

Statute-Specific Performance

§101: 16.7% (-23.3% vs TC avg)
§103: 66.7% (+26.7% vs TC avg)
§102: 12.5% (-27.5% vs TC avg)
§112: 4.2% (-35.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 0 resolved cases.

Office Action

§101, §103
Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

2. 35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

3. Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Independent claims 1, 9, and 15 recite the limitation "determining a token utilization for the workload data based upon, at least in part, the maximum number of KV cache blocks available for the workload data". Under the broadest reasonable interpretation in view of the specification, this determining step could reasonably be performed in the mind, including with the aid of pen and paper, but for the recitation of generic resource utilization, given Equations 1 and 2 of the specification. Thus, the limitation falls within the "Mental Processes" grouping of abstract ideas under Prong One. Under Prong Two, this judicial exception is not integrated into a practical application. The claim does not include additional elements sufficient to amount to significantly more than the judicial exception, because they are recited at a high level of generality such that the claim amounts to no more than the general allocation of resources to a processor. Under Step 2B, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
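For context, the limitation the examiner characterizes as a mental process amounts to a simple ratio computation. A minimal sketch of that kind of calculation, assuming a block-to-token conversion factor (the function name, the 16-token block size, and the figures are illustrative assumptions; Equations 1 and 2 of the specification are not reproduced in the Office Action):

```python
def token_utilization(processing_tokens: int, max_kv_blocks: int,
                      tokens_per_block: int = 16) -> float:
    # Hypothetical illustration: utilization as the ratio of tokens the
    # workload needs to tokens the KV cache can hold. The 16-token block
    # size is an assumption, not taken from the application.
    tokens_available = max_kv_blocks * tokens_per_block
    return processing_tokens / tokens_available

# 2048 processing tokens against 256 blocks of 16 tokens each
print(token_utilization(2048, 256))  # 0.5
```

A computation of this shape, absent more, is the sort of pen-and-paper arithmetic the rejection targets under the Mental Processes grouping.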
As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of the dependent claims (a simulation engine in claim 2; determining processing units' limitations in claims 3 and 4; converting available memory into available tokens in claim 5; simple division of processing tokens by available tokens in claim 6; and creating and deploying a performance configuration in claims 7 and 8) amount to no more than a prediction of resource usage, general hardware usage data, general resource allocation, and simple mental processes, respectively. The recitation of generic resource prediction and allocation to apply the judicial exception, and mere hardware utilization data, do not amount to significantly more and thus cannot provide an inventive concept. Accordingly, the claims are not patent eligible under 35 U.S.C. 101.

Claim Rejections - 35 USC § 103

4. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5. Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Maschmeyer et al., Pub. No. US-20240311192-A1, with a priority date to Provisional Application No. 63/501,854 (hereafter Maschmeyer), in view of Shulman, Pub. No. US-20240356535-A1 (hereafter Shulman).

6.
Regarding claim 1, Maschmeyer teaches "A computer-implemented method, executed on a computing device, comprising:

processing workload data associated with processing a plurality of requests for an artificial intelligence (AI) model on a processing unit; ([0061]-[0063] teach an LLM generating outputs based on user input (prompts) using processing units. [0074] teaches converting resource usage to a token count, such that it teaches workload data)

determining a maximum number of key-value (KV) cache blocks available for the workload data by simulating the workload data using a simulation engine; ([0009]-[0013] teach a prediction ML model (the simulation engine) determining a maximum resource capacity available within the LLM. See [0073]-[0075] for more specific details)

determining a token utilization for the workload data based upon, at least in part, the maximum number of KV cache blocks available for the workload data; ([0074] teaches that the total resource usage parameter may be the token count, i.e., the tokens used in generating a response)

and allocating processing unit resources for the AI model based upon, at least in part, the token utilization. ([0140]-[0144] and Figs. 3A-3D teach a progress bar that represents the number of resources to be allocated for the output)."

Maschmeyer does not explicitly teach that the maximum amount of resources/memory available is a key-value cache. Shulman teaches memory comprising key-value stores, such that it teaches the limitation ([0040] teaches that memory may be an assortment of storage mechanisms, including a key-value store). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Shulman to Maschmeyer as evidence that resource capacity may be represented as a key-value cache.
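The maximum-capacity determination mapped to Maschmeyer's prediction model can be pictured as a memory-budget calculation. A hedged sketch of one plausible reading (the function name, memory figures, and per-block size are all illustrative assumptions, not drawn from either reference):

```python
GiB, MiB = 1024 ** 3, 1024 ** 2

def max_kv_cache_blocks(gpu_memory_bytes: int, model_weights_bytes: int,
                        block_bytes: int) -> int:
    # Assumed sketch: the KV cache blocks that fit in the GPU memory
    # left over after the model weights are loaded.
    free_bytes = gpu_memory_bytes - model_weights_bytes
    if free_bytes <= 0:
        raise ValueError("model does not fit on the GPU")
    return free_bytes // block_bytes

# Illustrative figures: a 24 GiB GPU, 14 GiB of weights, 2 MiB per KV block
print(max_kv_cache_blocks(24 * GiB, 14 * GiB, 2 * MiB))  # 5120
```

Whether a static budget of this kind is equivalent to the claimed simulation-engine determination is precisely the sort of mapping an applicant might contest.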
A person having ordinary skill in the art would have been motivated to make this combination for the purpose of more effective storage within processors, especially when using Graphics Processing Units (GPUs). Together, Shulman in combination with Maschmeyer teaches every limitation of the claimed invention. Since the teachings were analogous art known at the time of filing, one of ordinary skill could have applied them to achieve expected results.

7. Regarding claim 2, the combination of Maschmeyer teaches "The computer-implemented method of claim 1, wherein processing the workload data includes mirroring a plurality of requests received for processing by the AI model on the processing unit to the simulation engine." ([0133]-[0135] teach a resource prediction model estimating resources for a given prompt, such that it teaches mirroring the requests onto a simulation engine.)

8. Regarding claim 3, the combination implicitly teaches "The computer-implemented method of claim 1, wherein determining the token utilization includes determining a processing unit memory utilization limit." ([0073]-[0075] teach that the computer system's memory and computing power dictate the token limit, such that the memory utilization limit must be determined before determining the token limit.)

9. Regarding claim 4, the combination teaches "The computer-implemented method of claim 1, wherein determining the token utilization includes determining a processing unit computing utilization limit." ([0073]-[0075] also implicitly teach this, similar to claim 3.)

10. Regarding claim 5, the combination teaches "The computer-implemented method of claim 1, wherein determining the maximum number of KV cache blocks available for the workload data includes converting the maximum number of KV cache blocks available into a number of tokens available.
([0074]-[0076] teach that total resource usage represents a token count with respect to the available tokens remaining)."

11. Regarding claim 6, the combination teaches "The computer-implemented method of claim 5, wherein determining the token utilization includes determining a number of processing tokens" ([0139]-[0140] and [00001] teach the equation used to determine total utilization. [0073] also teaches: "For example, an LLM may have a token limit of 4097 tokens which can be shared between a prompt and a response (e.g., a prompt that uses 4000 tokens will limit the response to a maximum of 97 tokens)." [0055]-[0056] teach how input tokens are broken down into processing tokens.)

12. Regarding claim 7, the combination teaches "The computer-implemented method of claim 6, wherein determining the token utilization includes determining a performance configuration for the workload data based upon, at least in part, the number of tokens available and the number of processing tokens" ([0140]-[0150] and Figs. 3A-3D and 4 teach generating a visual representation of the configuration and recommendations/warnings based on user input and the calculated token utilization.)

13. Regarding claim 8, the combination teaches "The computer-implemented method of claim 7, wherein allocating the processing unit resources for the AI model includes allocating processing unit resources for the AI model using the performance configuration." ([0140]-[0145] teach a visual representation of total resources used based on token utilization, such that the performance configuration is already used to allocate the resources as soon as the user continues with the prompt.)

14. Claim 9 is similar to claims 1 and 2, and is therefore rejected for similar reasons.
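The token-limit sharing quoted from Maschmeyer [0073] is straightforward arithmetic; a one-line sketch (the function name is assumed; the figures come from the quoted example):

```python
def max_response_tokens(token_limit: int, prompt_tokens: int) -> int:
    # Prompt and response share a single token limit, per the quoted
    # example; an oversized prompt leaves no room for a response.
    return max(token_limit - prompt_tokens, 0)

# The quoted example: a 4097-token limit with a 4000-token prompt
print(max_response_tokens(4097, 4000))  # 97
```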
Claim 9 is directed towards "A computing system (Maschmeyer [0023], computing system) comprising: a memory; and a processor configured to process workload data associated with processing a plurality of requests for an artificial intelligence (AI) model on a graphics processing unit (GPU) (Maschmeyer [0064], the processor may be a GPU) by mirroring a plurality of requests received for processing by the AI model on the GPU to a simulation engine, to determine a maximum number of key-value (KV) cache blocks available for the workload data by simulating the workload data using the simulation engine, to determine a token utilization for the workload data based upon, at least in part, the maximum number of KV cache blocks available for the workload data, and to allocate GPU resources for the AI model based upon, at least in part, the token utilization."

15. Claim 10 is similar to claim 3 and is therefore rejected for similar reasons.

16. Claim 11 is similar to claim 4 and is therefore rejected for similar reasons.

17. Claim 12 is similar to claim 5 and is therefore rejected for similar reasons.

18. Claim 13 is similar to claim 6 and is therefore rejected for similar reasons.

19. Claim 14 is similar to claim 7 and is therefore rejected for similar reasons.

20. Claim 15 is similar to claim 1 and is therefore rejected for similar reasons.
Claim 15 is directed towards "A computer program product (Maschmeyer [0188]) residing on a non-transitory computer readable medium (Maschmeyer [0023]) having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform operations comprising: processing workload data associated with processing a plurality of requests for an artificial intelligence (AI) model on a graphics processing unit (GPU); determining a maximum number of key-value (KV) cache blocks available for the workload data by simulating the workload data using a simulation engine; converting the maximum number of KV cache blocks available into a maximum number of tokens available; determining a token utilization for the workload data based upon, at least in part, the number of tokens available; and allocating GPU resources for the AI model based upon, at least in part, the token utilization."

21. Claim 16 is similar to claim 2 and is therefore rejected for similar reasons.

22. Claim 17 is similar to claims 3 and 10, and is therefore rejected for similar reasons.

23. Claim 18 is similar to claims 4 and 11, and is therefore rejected for similar reasons.

24. Claim 19 is similar to claims 6 and 13, and is therefore rejected for similar reasons.

25. Claim 20 is similar to claims 7 and 14, and is therefore rejected for similar reasons.

Conclusion

26. Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON A. NGUYEN, whose telephone number is (571) 272-6074. The examiner can normally be reached Mon-Fri, 10am-6pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Aimee Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRANDON NGUYEN/
Examiner, Art Unit 2195

/Aimee Li/
Supervisory Patent Examiner, Art Unit 2195

Prosecution Timeline

Jun 14, 2023
Application Filed
Jan 14, 2026
Non-Final Rejection — §101, §103 (current)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: Favorable
Median Time to Grant: 3y 3m
PTA Risk: Low
Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
