Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-14 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Laban et al, “Beyond the Chat: Executable and Verifiable Text-Editing with LLMs,” September 2023 (09/27/2023), pp. 1-36, hereinafter “Beyond the Chat…”, in view of Lee et al (20250140245).
As per claim 1, “Beyond the Chat…” teaches a method implemented by one or more processors, the method comprising:
receiving, via a client device, a user query for content generation; in response to receiving the user query for content generation (see section 4):
processing the user query using a first LLM to generate a first LLM output (see section 4: LLM-based conversational interface, “the system replies to a user query (also called a message or a prompt) with an answer formatted in plain text”);
causing initial content, that is based on the first LLM output, to be rendered via a user interface of the client device (section 4, Fig. 3);
generating a text prompt based on the first LLM output, the text prompt further including a request for one or more focused edits (see section 4.1.3: “Each of the four edit-suggesting components is implemented with a single prompt to an LLM”);
providing the text prompt to a second LLM that is less computationally efficient than the first LLM (section 4.2.1, Fig. 3: “the user prompted the system with: ‘let’s give some dish examples…’”);
receiving, in response to providing the text prompt, one or more focused edits to replace one or more initial segments in the initial content with one or more updated segments (section 4: “For all prompts, the LLM is expected to return a valid JSON string that follows a predefined schema and contains the list of executable edit suggestions from the system”; section 4.2.1, Fig. 3: “…and the system responded with three dish suggestions which are displayed both as an executable edit”);
wherein the one or more focused edits are generated using the second LLM to perform one or more iterations of content refinement based on the text prompt (see section 4: “For all prompts, the LLM is expected to return a valid JSON string that follows a predefined schema and contains the list of executable edit suggestions from the system”);
and causing the one or more focused edits to the initial content to be visually rendered via the user interface, resulting in revised content responsive to the user query for content generation (Figure 3, section 4.2.1).
As per claim 1, “Beyond the Chat…” teaches the use of LLMs and further discusses accessing search engines via a browser/internet (see section 4.3.2, as an example). Furthermore, the ChatGPT 4.0 used in “Beyond the Chat…” uses both a pre-trained LLM and a supervised fine-tuned model (e.g., see Naik et al, “ChatGPT: Comprehending its Operational Structure, AI Techniques, Working, Features, and Limitations,” IEEE ICTBIG 2023). What is not explicitly taught is the structure/location of the larger LLM and the smaller, tailored LLM. Lee et al (20250140245) teaches storing larger, generalized LLMs on a server and downloading smaller, tailored LLMs to client devices (abstract; para 0065, discussing the general LLM and a personalized LLM; para 0067, the personalized LLM downloaded to the client device). Therefore, it would have been obvious to one of ordinary skill in the art of LLM implementations to expand the client/browser structure in “Beyond the Chat…” to include availability of a larger LLM on a server and a downloadable, smaller, specialized LLM on the client device, so that the user could have closer/faster access to a smaller LLM that is personalized for that user (see Lee et al (20250140245), para 0066, especially the comments regarding ‘personalized content’ and ‘personal characteristics’).
As per claim 2, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 1, wherein the one or more initial segments each corresponds to a sentence, and wherein causing the one or more focused edits to the initial version of the document to be visually rendered via the user interface comprises: causing a first focused edit to the initial content to be visually rendered via the user interface, wherein the first focused edit is generated during a first iteration of content refinement and replaces a first initial sentence in the initial content with a first updated sentence (see “Beyond the Chat…”, p. 10, Figure 4, under “Brainstorm”);
and causing a second focused edit to be visually rendered via the user interface, wherein the second focused edit is generated during a second iteration of content refinement and replaces a second initial sentence in the initial content with a second updated sentence, the second initial sentence being different from the first initial sentence (see “Beyond the Chat…”, p. 9, Design Choice #1, suggesting more than one possible edit for a span of text).
As per claim 3, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 2, wherein the text prompt is processed as input using the second LLM during the first iteration to generate the first focused edit, and wherein a second text prompt different from the text prompt is processed as input using the second LLM during the second iteration to generate the second focused edit (see “Beyond the Chat…”, p. 10, the use of OpenAI’s GPT-4 LLM – examiner notes it is notoriously well known that ChatGPT 4.0 uses both a pre-trained LLM as well as a supervised fine-tuned model; e.g., see Naik et al, “ChatGPT: Comprehending its Operational Structure, AI Techniques, Working, Features, and Limitations,” IEEE ICTBIG 2023).
As per claim 4, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 3, wherein the second text prompt corresponds to the text prompt incorporating the first focused edit to the initial content (see “Beyond the Chat…”, p. 10, “For all prompts…”, teaching a further prompt; in view of ChatGPT 4.0 – see Naik et al, “ChatGPT: Comprehending its Operational Structure, AI Techniques, Working, Features, and Limitations”).
As per claim 5, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 1, further comprising: receiving, via the user interface, a first user edit while the one or more focused edits are being applied to the initial content; and causing the first user edit to be applied to the initial document while the one or more focused edits are being applied to the initial content (see “Beyond the Chat…”, feedback loop of user corrections – p. 11, section 4.2.2).
As per claim 6, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 1, further comprising: receiving, via the user interface, a second user edit to the revised content; and causing the second user edit to be applied to the revised content (see “Beyond the Chat…”, performing further second edits – see Figure 5 and p. 12, section 4.3.1).
As per claim 7, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 1, wherein the request for one or more focused edits is a request for replacing a single sentence or a single paragraph in the initial content during each iteration of the one or more iterations (see “Beyond the Chat…”, p. 9, Fig. 3 – sentence replacement, and Design Choices 1-4).
As per claim 8, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 1, wherein the text prompt includes the initial content generated based on the first LLM output (see “Beyond the Chat…”, p. 9, Fig. 3, showing LLM suggestions).
As per claims 9 and 10, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 1, wherein the text prompt further includes a particular sentence refinement request to refine a particular initial sentence in the initial content and is generated based on the first LLM output indicating the particular sentence to be refined (see “Beyond the Chat…”, refining sentences – p. 9, Fig. 3, as editing is based on dish selections).
As per claim 11, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 1, wherein generating the text prompt based on the user query and the initial content comprises: filtering the initial content to redact privacy information from the initial content, thereby generating a redacted version of the initial content; and generating the text prompt to include the redacted version of the initial content (see “Beyond the Chat…”, p. 9, Fig. 3, text editor panel allowing deletes; and p. 11, “Markers”, for hidden information).
As per claim 12, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 1, further comprising: generating a training instance for the first LLM, wherein the training instance includes: the user query for content generation as a training instance input (see “Beyond the Chat…”, p. 10, the use of OpenAI’s GPT-4 LLM – examiner notes it is notoriously well known that ChatGPT 4.0 uses both a pre-trained LLM as well as a supervised fine-tuned model; e.g., see Naik et al, “ChatGPT: Comprehending its Operational Structure, AI Techniques, Working, Features, and Limitations,” IEEE ICTBIG 2023),
and the revised content as a ground truth output (see “Beyond the Chat…”, using the client side as the source of truth – p. 9, Design Choice #4).
As per claim 13, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches the method of claim 12, further comprising: training the first LLM using the generated training instance, including: processing the user query for content generation as input using the first LLM, to generate an output of the first LLM, comparing the output of the first LLM with the ground truth output, and updating one or more weights of the first LLM based on comparing the output of the first LLM with the ground truth output (see “Beyond the Chat…”, comparing LLM results with ground truth, and using a weighting of “0”, i.e., ignoring system suggestions and going with the user edits – p. 9, Design Choice #4).
Claims 14 and 16-19 are method claims whose steps are found throughout method claims 1-13 above and, as such, claims 14 and 16-19 are similar in scope and content to claims 1-13 above; therefore, claims 14 and 16-19 are rejected under similar rationale as presented against claims 1-13 above. Further to claim 14, the combination of “Beyond the Chat…” in view of Lee et al (20250140245) teaches a second, smaller LLM distinct from the generalized LLM and located/downloaded on the client device (see Lee et al (20250140245), abstract, paras 0065-0067, as mapped above against claim 1).
Claim 20 is a method claim whose steps are found throughout, piecemeal, in method claims 1-13 above and, as such, claim 20 is similar in scope and content to the claim features found throughout claims 1-13 above; therefore, claim 20 is rejected under similar rationale as presented against claims 1-13 above. Further to claim 20, Lee et al (20250140245) teaches the smaller, personalized LLM residing on the client device (see Lee et al (20250140245), abstract, paras 0065-0067, and as mapped against claim 1 above).
Response to Arguments
Applicant’s arguments with respect to the claim(s) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Examiner notes the introduction of the Lee et al (20250140245) reference, which teaches the concept of dual LLMs with a larger, generalized LLM at a server and a smaller, personalized LLM downloaded to the client.
Furthermore, examiner notes, that the general notion of storing larger speech recognition models on a server, and downloading a smaller recognition model to a client device, has long been established in the art. See the additional references, teaching such limitations, listed in the conclusion section of the office action.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see related art listed on the PTO-892 form. Furthermore, the following references were found to be pertinent to certain claim/specification features:
Sharma et al (20020143551) teaches a server storing a larger model and a smaller model at the client device (para 0053).
Cross Jr et al (20060122836) teaches a smaller speech model at the client device and a larger model on a server (para 0007, 0009).
Hofer (20160300566) teaches prior art showing larger speech recognition processing on servers and smaller processing on client devices (para 0002).
Sontag et al (20250139384) teaches the use of two LLMs (para 0025), one more focused than the generic LLM (see para 0026), to be used in an editable visual report (Fig. 1).
Safavi et al (20250131189) teaches a writing assistant that displays editable/suggested changes (Fig. 1) using a plurality of language models (Fig. 2; see paras 0028, 0040).
Harang et al (20250131261) teaches the use of multiple LLMs tied to special tokens (para 0049).
Gonsalves et al (20250045336) teaches a framework for editable media projects (Fig. 1, abstract, para 0027) using multiple data sources synchronized to a search engine (Fig. 3) and using multiple LLMs (para 0005).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/Michael N Opsasnick/Primary Examiner, Art Unit 2658 03/27/2026