DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 1 is objected to because of the following informalities: the misspelled word “attention-procession” should be replaced with “attention-processing”. Appropriate correction is required.
Claim 8 is objected to because of the following informalities: claim 8 recites “wherein the prompt KV-cache and generation KV-cache are separate are access over separate memory buses”; the meaning of this limitation is unclear. Appropriate correction is required.
Claim 9 is objected to because of the following informalities: Claim 9 recites “the prompt memory” in line 1. There is insufficient antecedent basis for this limitation in the claim. Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Blacklock et al. (US 2025/0173561 A1), hereinafter “Blacklock”.
As per claim 1, Blacklock teaches a method for implementing a neural large language model comprising:
“processing a plurality of tokens by a prompt attention-processing subsystem having a prompt KV-cache, thereby populating the prompt KV-cache with values associated with the token processing by the prompt-attention processing subsystem” at [0064]-[0066] and Fig. 5;
(Blacklock teaches that input text 502, which includes a prompt, is provided to the tokenizer 504, and the tokenizer divides the input text 502 into multiple tokens 506. The LLM 512 processes the plurality of tokens 506 and writes the generated tokens 514 to memory, such as a KV tensor buffer 516 (i.e., the “prompt KV-cache”))
“transferring the prompt KV-cache into a generation KV-cache of a generation attention processing subsystem upon completion of the prompt-attention processing by the prompt-attention processing subsystem” at [0066] and Fig. 5;
(Blacklock teaches that the internal state KV$ is a data structure referred to as KV cache 518 (i.e., the “generation KV-cache”), which may represent keys (K) and values (V) of the previously generated token. The internal state KV$ 518 may be read from the KV tensor buffer 516 and appended to the previous input token 506)
“generating by the generation attention processing subsystem an output sequence based on the transferred KV-cache” at [0066] and Fig. 5.
(Blacklock teaches that the detokenizer 524 generates output text 526 (e.g., a sequence of characters, words, or phrases) based on the KV cache 518)
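For the convenience of the record, the following Python sketch illustrates the flow mapped above: a prefill pass populates a prompt KV-cache, that cache is copied into a separate generation KV-cache, and a decode loop produces the output sequence from the transferred entries. This is an illustration only, not code from Blacklock or from the present application; all names (prompt_attention, transfer, generate) and the single-head, projection-only attention are hypothetical simplifications.
```python
import numpy as np

D = 16                                    # illustrative head dimension
rng = np.random.default_rng(0)
W_k = rng.standard_normal((D, D))         # key projection (hypothetical)
W_v = rng.standard_normal((D, D))         # value projection (hypothetical)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def prompt_attention(prompt_embeddings):
    # Prefill: compute K/V for every prompt token at once, populating the
    # prompt KV-cache (cf. KV tensor buffer 516, Fig. 5).
    return {"K": prompt_embeddings @ W_k, "V": prompt_embeddings @ W_v}

def transfer(prompt_kv_cache):
    # Copy the populated prompt KV-cache into the generation subsystem's
    # own KV-cache upon completion of prompt processing (cf. KV cache 518).
    return {name: arr.copy() for name, arr in prompt_kv_cache.items()}

def generate(gen_kv_cache, x, steps=3):
    # Decode loop: attention reads the transferred entries, and each new
    # token's K/V are appended to the generation KV-cache.
    outputs = []
    for _ in range(steps):
        scores = (gen_kv_cache["K"] @ x) / np.sqrt(D)    # (cached_tokens,)
        x = softmax(scores) @ gen_kv_cache["V"]          # attention output
        gen_kv_cache["K"] = np.vstack([gen_kv_cache["K"], x @ W_k])
        gen_kv_cache["V"] = np.vstack([gen_kv_cache["V"], x @ W_v])
        outputs.append(x)
    return outputs

prompt = rng.standard_normal((8, D))      # eight stand-in token embeddings
output_sequence = generate(transfer(prompt_attention(prompt)), prompt[-1])
```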
As per claim 2, Blacklock teaches the method of claim 1, further comprising “encoding a prompt into the plurality of tokens” at [0064]-[0065].
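Purely as an illustration of this limitation (a trivial whitespace tokenizer standing in for Blacklock's tokenizer 504, whose actual algorithm is not reproduced here):
```python
def tokenize(text: str) -> list[str]:
    # Trivial stand-in for a real subword tokenizer: encode a prompt
    # into a plurality of tokens by splitting on whitespace.
    return text.split()

tokens = tokenize("Summarize the following document.")
# -> ['Summarize', 'the', 'following', 'document.']
```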
As per claim 3, Blacklock teaches the method of claim 1, wherein “the prompt attention-processing subsystem and the generation attention processing subsystem is multi-headed, thereby providing multi-headed neural processing as part of the prompt attention processing subsystem and the generation attention processing subsystem” at [0069].
As per claim 4, Blacklock teaches the method of claim 3, wherein “the multi-headed prompt attention processing subsystem and the multi-headed generation processing subsystem use the same weight values for the multi-headed neural processing” at [0043]-[0050].
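The shared-weight mapping can be illustrated as both subsystems projecting through one and the same set of multi-head weight tensors. The sketch below is hypothetical and not drawn from Blacklock's disclosure:
```python
import numpy as np

D, H = 16, 4                              # illustrative model dim / head count
rng = np.random.default_rng(1)
# One set of per-head projection weights, shared by the prompt (prefill)
# and generation (decode) subsystems: neither phase holds its own copy.
W_k = rng.standard_normal((H, D, D // H))
W_v = rng.standard_normal((H, D, D // H))

def project_heads(x, W):
    # x: (tokens, D) -> per-head projections of shape (H, tokens, D // H)
    return np.einsum("td,hdk->htk", x, W)

prompt_K = project_heads(rng.standard_normal((5, D)), W_k)   # prefill phase
decode_K = project_heads(rng.standard_normal((1, D)), W_k)   # decode phase
assert prompt_K.shape == (H, 5, D // H) and decode_K.shape == (H, 1, D // H)
```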
As per claim 5, Blacklock teaches the method of claim 2, further comprising “segmenting the prompt into a plurality of token segments, wherein each token segment is processed by the prompt attention processing subsystem thereby generating prompt segment KV-cache values, stored in the prompt KV-cache, for each of the plurality of token segments, and wherein the prompt segment KV-cache value, for each token segment, are transferred to the generation KV-cache upon completion of the processing of each of the plurality of token segments by the prompt-attention processing subsystem” at [0064]-[0066] and Fig. 5.
As per claim 6, Blacklock teaches the method of claim 5, wherein “each token within a token segment is processed in parallel by the prompt attention processing subsystem” at [0024]-[0027].
As per claim 7, Blacklock teaches the method of claim 6, wherein “one hundred and twenty-eight tokens are processed in parallel” at [0024]-[0027].
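The segmentation of claims 5-7 can be illustrated as chunked prefill with per-segment transfer, sketched below with hypothetical names; the 128-token segment width is taken from claim 7 itself, not from a quoted passage of Blacklock:
```python
import numpy as np

SEGMENT = 128                             # tokens per segment (claim 7)
D = 16
rng = np.random.default_rng(2)
W_k, W_v = rng.standard_normal((D, D)), rng.standard_normal((D, D))

def segmented_prefill(prompt_embeddings, gen_kv_cache):
    # Process the prompt one segment at a time. Within a segment, K/V for
    # all tokens come from a single batched matmul (the "in parallel"
    # mapping); each segment's entries are transferred to the generation
    # KV-cache as soon as that segment completes, not after the full prompt.
    for start in range(0, len(prompt_embeddings), SEGMENT):
        seg = prompt_embeddings[start:start + SEGMENT]
        gen_kv_cache["K"].append(seg @ W_k)   # per-segment transfer
        gen_kv_cache["V"].append(seg @ W_v)
    return gen_kv_cache

cache = segmented_prefill(rng.standard_normal((300, D)), {"K": [], "V": []})
# A 300-token prompt yields segments of 128, 128, and 44 tokens,
# i.e. three separate transfers into the generation KV-cache.
```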
As per claim 8, Blacklock teaches the method of claim 1, wherein “the prompt KV-cache and generation KV-cache are separate are access over separate memory buses” at [0108]-[0111].
As per claim 9, Blacklock teaches the method of claim 8, wherein “the prompt memory is high bandwidth memory” at [0066].
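Software cannot depict physical memory buses, but the “separate caches” aspect of claims 8-9 can be suggested by two distinct buffer allocations. The following analogue is an illustration only; in the mapped hardware reading, the prompt-side buffer would reside in high bandwidth memory reached over its own bus:
```python
import numpy as np

# Two distinct buffers rather than views of one allocation; in the mapped
# hardware reading, each would sit in a different physical memory (e.g.,
# the prompt KV-cache in high bandwidth memory) behind its own bus.
prompt_kv = np.zeros((1024, 16))
generation_kv = np.zeros((1024, 16))
assert prompt_kv.ctypes.data != generation_kv.ctypes.data  # separate storage
```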
Claims 10-20 recite limitations similar to those of claims 1-8 and are therefore rejected for the same reasons.
Conclusion
Examiner's Note: Examiner has cited particular paragraph numbers in the reference applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claims, other passages and figures may apply as well. In preparing responses, Applicant is respectfully requested to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passages as taught by the prior art or disclosed by the Examiner.
If the claimed invention is amended, Applicant is respectfully requested to indicate the portion(s) of the specification that dictate(s) the structure relied upon for proper interpretation, and also to verify and ascertain the metes and bounds of the claimed invention.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KHANH B PHAM whose telephone number is (571)272-4116. The examiner can normally be reached Monday - Friday, 8am to 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sanjiv Shah, can be reached at (571)272-4098. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KHANH B PHAM/Primary Examiner, Art Unit 2166
February 4, 2026