DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 1 is objected to because of the following informalities: the misspelled word “attention-procession” should be replaced with “attention-processing”. Appropriate correction is required.
Claim 8 is objected to because of the following informalities: claim 8 recites “wherein the prompt KV-cache and generation KV-cache are separate are access over separate memory buses”; the meaning of this limitation is unclear. Appropriate correction is required.
Claim 9 is objected to because of the following informalities: Claim 9 recites “the prompt memory” in line 1. There is insufficient antecedent basis for this limitation in the claim. Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Blacklock et al. (US 2025/0173561 A1), hereinafter “Blacklock”.
As per claim 1, Blacklock teaches a method for implementing a neural large language model comprising:
“processing a plurality of tokens by a prompt attention-processing subsystem having a prompt KV-cache, thereby populating the prompt KV-cache with values associated with the token processing by the prompt-attention processing subsystem” at [0064]-[0066] and Fig. 5;
(Blacklock teaches that input text 502, which includes a prompt, is provided to the tokenizer 504, and the tokenizer divides the input text 502 into multiple tokens 506. The LLM 512 processes the plurality of tokens 506 and writes the generated tokens 514 to memory, such as a KV tensor buffer 516 (i.e., the “prompt KV-cache”))
“transferring the prompt KV-cache into a generation KV-cache of a generation attention processing subsystem upon completion of the prompt-attention processing by the prompt-attention processing subsystem” at [0066] and Fig. 5;
(Blacklock teaches that the internal state KV$ is a data structure referred to as KV cache 518 (i.e., the “generation KV-cache”), which may represent keys (K) and values (V) of the previously generated token. The internal state KV$ 518 may be read from the KV tensor buffer 516 and appended to the previous input token 506)
“generating by the generation attention processing subsystem an output sequence based on the transferred KV-cache” at [0066] and Fig. 5.
(Blacklock teaches that the detokenizer 524 generates output text 526 (e.g., a sequence of characters, words, or phrases) based on the KV cache 518)
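For the convenience of the record, the following Python sketch illustrates the flow mapped above: a prefill pass populates a prompt KV-cache, that cache is copied into a separate generation KV-cache, and a decode loop produces the output sequence from the transferred entries. This is an illustration only, not code from Blacklock or from the present application; all names (prompt_attention, transfer, generate) and the single-head, projection-only attention are hypothetical simplifications.
```python
import numpy as np

D = 16                                    # illustrative head dimension
rng = np.random.default_rng(0)
W_k = rng.standard_normal((D, D))         # key projection (hypothetical)
W_v = rng.standard_normal((D, D))         # value projection (hypothetical)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def prompt_attention(prompt_embeddings):
    # Prefill: compute K/V for every prompt token at once, populating the
    # prompt KV-cache (cf. KV tensor buffer 516, Fig. 5).
    return {"K": prompt_embeddings @ W_k, "V": prompt_embeddings @ W_v}

def transfer(prompt_kv_cache):
    # Copy the populated prompt KV-cache into the generation subsystem's
    # own KV-cache upon completion of prompt processing (cf. KV cache 518).
    return {name: arr.copy() for name, arr in prompt_kv_cache.items()}

def generate(gen_kv_cache, x, steps=3):
    # Decode loop: attention reads the transferred entries, and each new
    # token's K/V are appended to the generation KV-cache.
    outputs = []
    for _ in range(steps):
        scores = (gen_kv_cache["K"] @ x) / np.sqrt(D)    # (cached_tokens,)
        x = softmax(scores) @ gen_kv_cache["V"]          # attention output
        gen_kv_cache["K"] = np.vstack([gen_kv_cache["K"], x @ W_k])
        gen_kv_cache["V"] = np.vstack([gen_kv_cache["V"], x @ W_v])
        outputs.append(x)
    return outputs

prompt = rng.standard_normal((8, D))      # eight stand-in token embeddings
output_sequence = generate(transfer(prompt_attention(prompt)), prompt[-1])
```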
As per claim 2, Blacklock teaches the method of claim 1, further comprising “encoding a prompt into the plurality of tokens” at [0064]-[0065].
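Purely as an illustration of this limitation (a trivial whitespace tokenizer standing in for Blacklock's tokenizer 504, whose actual algorithm is not reproduced here):
```python
def tokenize(text: str) -> list[str]:
    # Trivial stand-in for a real subword tokenizer: encode a prompt
    # into a plurality of tokens by splitting on whitespace.
    return text.split()

tokens = tokenize("Summarize the following document.")
# -> ['Summarize', 'the', 'following', 'document.']
```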
As per claim 3, Blacklock teaches the method of claim 1, wherein “the prompt attention-processing subsystem and the generation attention processing subsystem is multi-headed, thereby providing multi-headed neural processing as part of the prompt attention processing subsystem and the generation attention processing subsystem” at [0069].
As per claim 4, Blacklock teaches the method of claim 3, wherein “the multi-headed prompt attention processing subsystem and the multi-headed generation processing subsystem use the same weight values for the multi-headed neural processing” at [0043]-[0050].
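The shared-weight mapping can be illustrated as both subsystems projecting through one and the same set of multi-head weight tensors. The sketch below is hypothetical and not drawn from Blacklock's disclosure:
```python
import numpy as np

D, H = 16, 4                              # illustrative model dim / head count
rng = np.random.default_rng(1)
# One set of per-head projection weights, shared by the prompt (prefill)
# and generation (decode) subsystems: neither phase holds its own copy.
W_k = rng.standard_normal((H, D, D // H))
W_v = rng.standard_normal((H, D, D // H))

def project_heads(x, W):
    # x: (tokens, D) -> per-head projections of shape (H, tokens, D // H)
    return np.einsum("td,hdk->htk", x, W)

prompt_K = project_heads(rng.standard_normal((5, D)), W_k)   # prefill phase
decode_K = project_heads(rng.standard_normal((1, D)), W_k)   # decode phase
assert prompt_K.shape == (H, 5, D // H) and decode_K.shape == (H, 1, D // H)
```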
As per claim 5, Blacklock teaches the method of claim 2, further comprising “segmenting the prompt into a plurality of token segments, wherein each token segment is processed by the prompt attention processing subsystem thereby generating prompt segment KV-cache values, stored in the prompt KV-cache, for each of the plurality of token segments, and wherein the prompt segment KV-cache value, for each token segment, are transferred to the generation KV-cache upon completion of the processing of each of the plurality of token segments by the prompt-attention processing subsystem” at [0064]-[0066] and Fig. 5.
As per claim 6, Blacklock teaches the method of claim 5, wherein “each token within a token segment is processed in parallel by the prompt attention processing subsystem” at [0024]-[0027].
As per claim 7, Blacklock teaches the method of claim 6, wherein “one hundred and twenty-eight tokens are processed in parallel” at [0024]-[0027].
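The segmentation of claims 5-7 can be illustrated as chunked prefill with per-segment transfer, sketched below with hypothetical names; the 128-token segment width is taken from claim 7 itself, not from a quoted passage of Blacklock:
```python
import numpy as np

SEGMENT = 128                             # tokens per segment (claim 7)
D = 16
rng = np.random.default_rng(2)
W_k, W_v = rng.standard_normal((D, D)), rng.standard_normal((D, D))

def segmented_prefill(prompt_embeddings, gen_kv_cache):
    # Process the prompt one segment at a time. Within a segment, K/V for
    # all tokens come from a single batched matmul (the "in parallel"
    # mapping); each segment's entries are transferred to the generation
    # KV-cache as soon as that segment completes, not after the full prompt.
    for start in range(0, len(prompt_embeddings), SEGMENT):
        seg = prompt_embeddings[start:start + SEGMENT]
        gen_kv_cache["K"].append(seg @ W_k)   # per-segment transfer
        gen_kv_cache["V"].append(seg @ W_v)
    return gen_kv_cache

cache = segmented_prefill(rng.standard_normal((300, D)), {"K": [], "V": []})
# A 300-token prompt yields segments of 128, 128, and 44 tokens,
# i.e. three separate transfers into the generation KV-cache.
```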
As per claim 8, Blacklock teaches the method of claim 1, wherein “the prompt KV-cache and generation KV-cache are separate are access over separate memory buses” at [0108]-[0111].
As per claim 9, Blacklock teaches the method of claim 8, wherein “the prompt memory is high bandwidth memory” at [0066].
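Software cannot depict physical memory buses, but the “separate caches” aspect of claims 8-9 can be suggested by two distinct buffer allocations. The following analogue is an illustration only; in the mapped hardware reading, the prompt-side buffer would reside in high bandwidth memory reached over its own bus:
```python
import numpy as np

# Two distinct buffers rather than views of one allocation; in the mapped
# hardware reading, each would sit in a different physical memory (e.g.,
# the prompt KV-cache in high bandwidth memory) behind its own bus.
prompt_kv = np.zeros((1024, 16))
generation_kv = np.zeros((1024, 16))
assert prompt_kv.ctypes.data != generation_kv.ctypes.data  # separate storage
```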
Claims 10-20 recite limitations similar to those of claims 1-8 and are therefore rejected for the same reasons.
Conclusion
Examiner's Note: Examiner has cited particular paragraph numbers in the reference applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claims, other passages and figures may apply as well. In preparing responses, Applicant is respectfully requested to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passages as taught by the prior art or disclosed by the Examiner.
If the claimed invention is amended, Applicant is respectfully requested to indicate the portion(s) of the specification that dictate(s) the structure relied upon for proper interpretation, and also to verify and ascertain the metes and bounds of the claimed invention.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KHANH B PHAM whose telephone number is (571)272-4116. The examiner can normally be reached Monday - Friday, 8am to 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sanjiv Shah, can be reached at (571)272-4098. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KHANH B PHAM/Primary Examiner, Art Unit 2166
February 4, 2026