Prosecution Insights
Last updated: April 19, 2026
Application No. 18/772,080

ADVANCED SEMANTIC CACHING WITH CDN FOR RAG-BASED LLM APPLICATIONS

Non-Final OA (§101)
Filed: Jul 12, 2024
Examiner: PATEL, SHREYANS A
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Microsoft Technology Licensing, LLC
OA Round: 1 (Non-Final)
Grant Probability: 89% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 3m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 89% (359 granted / 403 resolved; +27.1% vs TC avg, above average)
Interview Lift: +7.4% across resolved cases with interview (moderate)
Avg Prosecution: 2y 3m (typical timeline; 46 currently pending)
Total Applications: 449 across all art units (career history)

Statute-Specific Performance

§101: 21.3% (-18.7% vs TC avg)
§103: 36.0% (-4.0% vs TC avg)
§102: 22.6% (-17.4% vs TC avg)
§112: 8.8% (-31.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 403 resolved cases.

Office Action

§101
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101. Claims 1, 9 and 17 are directed to an abstract idea of organizing, comparing, and distributing information based on semantic similarity and access permissions. The claimed steps of receiving a natural language prompt, matching the prompt to a semantically similar cached response, determining whether a user has access permission, and sending a response upon satisfaction of that condition constitute mental processes and methods of organizing human activity that could be performed conceptually by a human or with pen-and-paper equivalents. In particular, semantically comparing a request to previously stored information and deciding whether to provide that information based on authorization criteria are longstanding information-management practices that fall squarely within abstract concepts such as information retrieval, classification, and conditional dissemination.

The claims do not integrate the identified abstract idea into a practical application. Although the claim recites implementation in a "content delivery network (CDN)" and references a "RAG-based large language model," these elements are invoked only as generic computing environments in which the abstract idea is executed. The claim does not recite any specific improvement to CDN operation, cache architecture, access-control mechanisms, or large language model technology.
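As a rough illustration of the flow the rejection characterizes (receive a prompt, match it against semantically similar cached entries, and release the cached answer only if the user may access its source documents), the sketch below uses a toy bag-of-characters embedding. Every name and the 0.9 threshold are illustrative assumptions, not claim language:

```python
import math

def embed(text):
    # Stand-in for a real sentence encoder: bag-of-characters counts.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def lookup(prompt, cache, user_doc_perms, threshold=0.9):
    """Serve a cached answer for a semantically similar prompt, or None on miss."""
    q = embed(prompt)
    best = max(cache, key=lambda e: cosine(q, e["embedding"]), default=None)
    if best is None or cosine(q, best["embedding"]) < threshold:
        return None  # miss: the request would be forwarded to the RAG pipeline
    # Gate on access to the source documents used to generate the cached answer.
    if not best["source_docs"] <= user_doc_perms:
        return None  # withhold the cached answer from unauthorized users
    return best["answer"]
```

A paraphrased query then hits the cached entry only for users whose permission set covers all of the entry's source documents; any other user falls through as a miss.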
The steps are functionally described at a high level and amount to applying the abstract idea using conventional computer components to achieve predictable results, rather than effecting a technological improvement. The claims do not include additional elements sufficient to amount to significantly more than the judicial exception because they are (i) mere instructions to implement the idea on a computer, and/or (ii) recitations of generic computer structure that performs generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry. Viewed as a whole, these additional claim elements do not provide meaningful limitations transforming the abstract idea into a patent-eligible application such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter. There is further no improvement to the computing device.

Dependent claims 2-8, 10-16 and 18-20 further recite an abstract idea performable by a human and do not amount to significantly more than the abstract idea, as they do not provide steps other than what is conventionally known in data retrieval systems.

Claims 2, 10 and 18 are directed to the abstract idea of conditional information routing based on access permissions, which constitutes an organizing and decision-making process applied using generic computer components without a technological improvement.

Claims 3, 11 and 19 merely refine the abstract idea by specifying withholding cached information while forwarding a request, which remains a mental process for controlling information flow and does not add significantly more than Claim 1.
Claims 4, 12 and 20 are directed to the abstract idea of associating information with classification tags to control access, a fundamental data organization technique implemented using a generic database.

Claims 5 and 13 recite the abstract idea of administrative control over stored information via purge commands, which reflects routine information lifecycle management rather than a technological improvement.

Claims 6 and 14 are directed to the abstract idea of labeling information with document identifiers to track provenance, which is a longstanding information categorization practice performed on generic computing infrastructure.

Claims 7 and 15 apply the abstract idea by removing stored information based on document classifications and control commands, which is a conventional data governance operation lacking an inventive concept.

Claims 8 and 16 are directed to the abstract idea of selectively responding to a request using previously stored information instead of recomputing it, which is a basic caching and decision-making practice implemented on generic computer technology.

Allowable Subject Matter

Claims 1-20 would be allowable if the Applicant can overcome the § 101 abstract-idea rejection set forth above. The following is a statement of reasons for the indication of allowable subject matter:

Dang et al. teaches a semantic caching system for question-answering in which natural-language user queries are received, converted into embedding representations, and compared against a cached question-answer dataset to identify semantically similar prior questions, after which the corresponding cached answers are returned without re-executing the underlying generation pipeline. Dang further discloses that cached answers are generated offline by identifying candidate documents from a document corpus and applying a transformer-based model to extract or generate answers from those documents, and that the cached mappings are stored in a database for efficient reuse.
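The tagging, provenance, and purge operations recited in the dependent claims (classification tags, document identifiers, and purge commands) can be sketched as a small cache store. The class and field names below are illustrative assumptions, not language from the claims:

```python
class SemanticCache:
    """Cache entries tagged with source-document IDs and a classification."""

    def __init__(self):
        self.entries = []

    def add(self, question, answer, doc_ids, classification):
        self.entries.append({
            "question": question,
            "answer": answer,
            "doc_ids": set(doc_ids),           # provenance tracking (claims 6/14 style)
            "classification": classification,  # access-control tag (claims 4/12/20 style)
        })

    def purge_by_document(self, doc_id):
        # Control command removing every entry derived from a given document
        # (claims 5/13 and 7/15 style lifecycle management).
        self.entries = [e for e in self.entries if doc_id not in e["doc_ids"]]

    def purge_by_classification(self, classification):
        self.entries = [e for e in self.entries
                        if e["classification"] != classification]
```

An administrator purging a revoked document or an entire classification would invalidate every cached answer derived from it, without touching unrelated entries.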
Tewari et al. teaches a secure content delivery system architecture using cached content. A user request is associated with user authentication or authorization data (e.g., secure URLs or hash-based tokens), and a content server determines whether the requesting user is authorized to access the requested cached content before delivering it. Tewari therefore teaches CDN-level request handling, association of requests with user-specific authorization information, and conditional delivery of cached content based on access permission.

Neither Dang nor Tewari discloses evaluating user access rights to the underlying source documents prior to returning a cached answer; Tewari limits authorization checks to the requested cached content rather than to the provenance documents used during response generation. Additionally, neither reference explicitly teaches a "RAG-based LLM."

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Fu et al. ("GPTCache: An Open-Source Semantic Cache for LLM Applications Enabling Faster Answers and Cost Savings"; 2023) – The rise of ChatGPT has led to the development of artificial intelligence (AI) applications, particularly those that rely on large language models (LLMs). However, recalling LLM APIs can be expensive, and the response speed may slow down during LLMs' peak times, causing frustration among developers. Potential solutions to this problem include using better LLM models or investing in more computing resources. However, these options may increase product development costs and decrease development speed. GPTCache is an open-source semantic cache that stores LLM responses to address this issue. When integrating an AI application with GPTCache, user queries are first sent to GPTCache for a response before being sent to LLMs like ChatGPT. If GPTCache has the answer to a question, it quickly returns the answer to the user without having to query the LLM.
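The hash-based-token authorization Tewari is cited for can be illustrated with a minimal HMAC check: the origin signs a per-user token for a cached resource, and the edge verifies it before serving. The shared key and function names are assumptions for illustration, not details from the reference:

```python
import hashlib
import hmac

SECRET = b"key-shared-by-origin-and-edge"  # assumed out-of-band provisioning

def sign_url(path, user_id):
    """Origin-side: derive a per-user token for a cached resource."""
    msg = f"{path}|{user_id}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def authorized(path, user_id, token):
    """Edge-side: verify the token before serving the cached content."""
    return hmac.compare_digest(sign_url(path, user_id), token)
```

Note that, as the examiner observes, a check of this kind authorizes the requested cached object itself; it says nothing about the user's rights to the source documents from which a cached answer was generated.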
This approach saves costs on API recalls and makes response times much faster. For instance, integrating GPTCache with the GPT service offered by OpenAI can increase response speed 2-10 times when the cache is hit. Moreover, network fluctuations will not affect GPTCache's response time, making it highly stable. This paper presents GPTCache and its architecture, how it functions and performs, and the use cases for which it is most advantageous.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571) 270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHREYANS A. PATEL
Primary Examiner
Art Unit 2659

/SHREYANS A PATEL/
Examiner, Art Unit 2659

Prosecution Timeline

Jul 12, 2024
Application Filed
Feb 02, 2026
Non-Final Rejection — §101
Mar 11, 2026
Examiner Interview Summary
Mar 11, 2026
Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586597
ENHANCED AUDIO FILE GENERATOR
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12586561
TEXT-TO-SPEECH SYNTHESIS METHOD AND SYSTEM, A METHOD OF TRAINING A TEXT-TO-SPEECH SYNTHESIS SYSTEM, AND A METHOD OF CALCULATING AN EXPRESSIVITY SCORE
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12548549
ON-DEVICE PERSONALIZATION OF SPEECH SYNTHESIS FOR TRAINING OF SPEECH RECOGNITION MODEL(S)
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12548583
ACOUSTIC CONTROL APPARATUS, STORAGE MEDIUM AND ACOUSTIC CONTROL METHOD
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12536988
SPEECH SYNTHESIS METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM
Granted Jan 27, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 89%
With Interview: 96% (+7.4%)
Median Time to Grant: 2y 3m
PTA Risk: Low
Based on 403 resolved cases by this examiner. Grant probability derived from career allow rate.
