Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 13 recites a method, comprising:
receiving a prompt for a large language model, the prompt including an input text and a target length (Receiving a prompt encompasses observations that can practically be performed in the human mind and therefore encompasses a mental process); and
generating… an output text that includes a number of words equal to the target length within a user determined tolerance based on a length guidance vector that encodes the target length (Generating an output text that includes a target length number of words encompasses evaluations, judgements, and opinions that can practically be performed in the human mind and therefore encompasses a mental process. Alternatively, generating an output text based on a length guidance vector that encodes the target length encompasses mathematical concepts, as specifically encoding the target length using a vector is a mathematical operation using a specific mathematical relationship, calculation, formula, or equation).
As shown above, claim 13 recites one or more abstract ideas. These will be considered as a single abstract idea for further analysis. MPEP 2106.04.
This judicial exception is not integrated into a practical application because claim 13 only recites one additional element: the large language model. However, the claim merely recites generating the output text “in the large language model”. The large language model is used to generally apply the abstract idea without placing any limits on how the large language model functions. The claim does not identify where the large language model is stored and/or processed, and does not recite any computer elements capable of executing the large language model. Considering the claim as a whole, the additional element of a large language model does not integrate the recited judicial exception into a practical application.
Claim 13 does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as noted above, the only additional element is the recited large language model, which amounts to mere instructions to generally apply the abstract idea without placing any limits on how the large language model functions. The recited large language model does not provide an inventive concept.
Claim 1 is directed to a system comprising a generic computer that includes a processor and a memory, the memory including instructions executable by the processor to perform the same method recited in claim 13. As noted in MPEP 2106.04(a)(2), a claim that requires a computer may still recite a mental process if the claim merely performs the mental process on a generic computer. At best, the recited computer would merely serve as a tool to perform the mental process. The recited computer does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception because the claim as a whole amounts to no more than mere instructions to implement the abstract idea on a computer or merely use the computer as a tool to perform the abstract idea.
With respect to the dependent claims, claims 2 and 14 merely further specify the output length and do not recite any additional elements.
Claims 3 and 15 recite a tokenizer that generates tokens that represent words in the input text. However, tokenizing text encompasses evaluations, judgements, and opinions that can practically be performed in the human mind and therefore encompasses a mental process. The recited tokenizer is used to generally apply the abstract idea without placing any limits on how the tokenizer functions. Claims 3 and 15 therefore do not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception because the claim as a whole amounts to no more than mere instructions to implement the abstract idea on a computer or merely use the computer as a tool to perform the abstract idea.
Claims 4 and 16 further define the large language model to include an embedding block. However, the claims merely specify that the embedding block comprises further mathematical relationships including token vectors and position vectors. Claims 4 and 16 therefore do not include any additional elements that would integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
Claims 5 and 17 recite a decoder that generates the length guidance vector. However, the recited decoder is used to generally apply the abstract idea without placing any limits on how the decoder functions. Claims 5 and 17 therefore do not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception because the claim as a whole amounts to no more than mere instructions to implement the abstract idea on a computer or merely use the computer as a tool to perform the abstract idea.
Claims 6-11 and 18-20 merely further define mathematical concepts and relationships. Claims 6-11 and 18-20 do not include any additional elements.
Claim 12 requires the determined tolerance to be selected by the user, determined during training of the large language model, and configurable at inference time. In other words, the user provides the determined tolerance at various points in time. Providing a determined tolerance encompasses observations, evaluations, judgements, and opinions that can practically be performed in the human mind and therefore encompasses a mental process. Claim 12 further merely requires the user to specify a determined tolerance at the time of training the large language model, but does not encompass the performance of the training of the large language model. Claim 12 therefore does not include additional elements.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4-5, 12-14, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (Length Control in Abstractive Summarization by Pretraining Information Selection, hereinafter “Liu”), in view of Kikuchi et al. (Controlling Output Length in Neural Encoder-Decoders, hereinafter “Kikuchi”).
In regard to claim 1, Liu discloses a system, comprising:
a computer that includes a processor and a memory (computer comprising a GPU and RAM, section 3.3, final paragraph), the memory including instructions executable by the processor to:
receive a prompt for a large language model, the prompt including an input text and a target length (input to the model is a source document and the desired length, section 2.1); and
generate, in the large language model, an output text that includes a number of words equal to the target length based on a length guidance vector that encodes the target length (the model outputs a summary text, section 2.1; the model utilizes a length-aware attention mechanism that utilizes a length-based one-hot vector to weight the attention scores used to select the output text, section 2.2).
Liu further discloses performing soft length control tests, in which a range of lengths between a minimum length and a maximum length is generated (section 3.3). However, Liu does not expressly disclose using a user determined tolerance.
Kikuchi discloses a method for controlling the output length of a large language model used for summarization that allows a user to set a determined tolerance around a selected length (the user sets an allowable range between minimum and maximum lengths, and the language model discards results outside of that range, section 4.2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to allow the user to specify a target length tolerance, because it allows the model flexibility to plan the generated output text sequences within the specified range, as taught by Kikuchi (section 4.2).
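For illustration only, the length-conditioned attention weighting described in the Liu mapping above (a length-based one-hot vector used to weight attention scores, section 2.2) might be sketched as follows. The array shapes, the name `length_aware_attention`, and the elementwise selection are assumptions for the sketch, not Liu's exact formulation.

```python
import numpy as np

def length_aware_attention(scores, target_bin, num_bins):
    """Weight attention scores with a one-hot length encoding (sketch).

    scores: (num_bins, seq_len) attention scores, one row per length bin.
    target_bin: index of the desired length bin.
    Returns a normalized attention distribution over the sequence.
    """
    one_hot = np.zeros(num_bins)
    one_hot[target_bin] = 1.0
    # The one-hot vector selects the length-conditioned attention scores.
    weighted = one_hot @ scores              # shape: (seq_len,)
    # Normalize into an attention distribution (softmax).
    exp = np.exp(weighted - weighted.max())
    return exp / exp.sum()
```

The one-hot multiplication picks out the row of scores corresponding to the desired length bin, which is then normalized and used to select the output text.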
In regard to claim 2, Liu discloses the input text includes more words than the target length and the output text is based on the input text (summarization, section 1 and Table 1).
In regard to claim 4, Liu discloses the large language model includes an embedding block that includes an array that includes token vectors and a position vector that encodes the position of the token vectors in the array (the transformer seq2seq summarization model encodes length information using a one-hot vector which is applied to the token vector attention matrix A, sections 2.1 and 2.2).
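Purely for illustration, the generic embedding block recited in claim 4 (an array of token vectors combined with a position vector encoding the positions of the token vectors) might be sketched as follows. The sinusoidal position encoding and all names here are assumptions chosen for the sketch, not limitations drawn from Liu or from the claims.

```python
import numpy as np

def embed(token_ids, token_table, dim):
    """Embedding block sketch: token vectors plus position vectors."""
    tokens = token_table[token_ids]            # (seq_len, dim) token vectors
    seq_len = len(token_ids)
    positions = np.arange(seq_len)[:, None]    # position of each token vector
    dims = np.arange(dim)[None, :]
    # One common choice: sinusoidal position encoding.
    angles = positions / (10000 ** (2 * (dims // 2) / dim))
    pos = np.where(dims % 2 == 0, np.sin(angles), np.cos(angles))
    return tokens + pos
```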
In regard to claim 5, Liu discloses a decoder generates the length guidance vector (the one-hot vector is generated in the decoder, section 2.2).
In regard to claim 12, Liu discloses a tolerance determined during training of the large language model (see Table 4 and section 3.5, test data is divided into different sets according to length range during training).
However, Liu does not expressly disclose the tolerance is selected by the user.
Kikuchi discloses a method for controlling the output length of a large language model used for summarization that allows a user to set a determined tolerance around a selected length (the user sets an allowable range between minimum and maximum lengths, and the language model discards results outside of that range, section 4.2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to allow the user to specify a target length tolerance, because it allows the model flexibility to plan the generated output text sequences within the specified range, as taught by Kikuchi (section 4.2).
In regard to claim 13, Liu discloses a method, comprising:
receiving a prompt for a large language model, the prompt including an input text and a target length (input to the model is a source document and the desired length, section 2.1); and
generating, in the large language model, an output text that includes a number of words equal to the target length based on a length guidance vector that encodes the target length (the model outputs a summary text, section 2.1; the model utilizes a length-aware attention mechanism that utilizes a length-based one-hot vector to weight the attention scores used to select the output text, section 2.2).
Liu further discloses performing soft length control tests, in which a range of lengths between a minimum length and a maximum length is generated (section 3.3). However, Liu does not expressly disclose using a user determined tolerance.
Kikuchi discloses a method for controlling the output length of a large language model used for summarization that allows a user to set a determined tolerance around a selected length (the user sets an allowable range between minimum and maximum lengths, and the language model discards results outside of that range, section 4.2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to allow the user to specify a target length tolerance, because it allows the model flexibility to plan the generated output text sequences within the specified range, as taught by Kikuchi (section 4.2).
In regard to claim 14, Liu discloses the input text includes more words than the target length and the output text is based on the input text (summarization, section 1 and Table 1).
In regard to claim 16, Liu discloses the large language model includes an embedding block that includes an array that includes token vectors and a position vector that encodes the position of the token vectors in the array (the transformer seq2seq summarization model encodes length information using a one-hot vector which is applied to the token vector attention matrix A, sections 2.1 and 2.2).
In regard to claim 17, Liu discloses a decoder generates the length guidance vector (the one-hot vector is generated in the decoder, section 2.2).
Claims 3 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Liu, in view of Kikuchi, and further in view of Saito et al. (U.S. Patent Application Pub. No. 2022/0366140, hereinafter “Saito”).
In regard to claims 3 and 15, Liu discloses operating on a document that has been divided into tokens (section 2.1), but Liu and Kikuchi do not expressly disclose receiving the input text by a tokenizer that generates tokens that represent words in the input text.
Saito discloses a length controllable summarizer method that comprises receiving the input text by a tokenizer that generates tokens that represent words in the input text (a source text is divided using a BERT tokenizer, paragraph [0048]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate tokens that represent words in the input text using a tokenizer, because Liu requires tokens as input to the model, and the BERT tokenizer disclosed by Saito has achieved state-of-the-art performance, as taught by Saito (paragraph [0048]).
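As an illustration only, a minimal word-level tokenizer of the kind generically recited in claims 3 and 15 might look like the following sketch. Saito discloses a BERT subword tokenizer rather than this simple whitespace split; the function name and lowercasing are assumptions for the sketch.

```python
def word_tokenize(text):
    """Minimal tokenizer sketch: generates tokens that represent
    words in the input text (illustrative; Saito uses a BERT
    subword tokenizer, which splits words further into subwords)."""
    return text.lower().split()

tokens = word_tokenize("Controlling output length in summarization")
# tokens is a list of one token per whitespace-separated word
```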
Allowable Subject Matter
Claims 6-11 and 18-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the rejections under 35 U.S.C. 101 were overcome.
The following is a statement of reasons for the indication of allowable subject matter:
In regard to claims 6 and 18, Miculicich et al. (Summarization with Precise Length Control, hereinafter “Miculicich”) represents the closest prior art. Miculicich discloses a length control method for summarization that utilizes a reversed position encoding to provide the model information about how many tokens should be decoded in each decoding step (section 3.1). This is somewhat similar to the linear length guidance vector of claim 6. However, Miculicich and the additional prior art of record do not disclose or suggest determining a scalar multiple starting at zero at an origin of the length guidance vector and ending at one at an entry equal to the target length, and the remainder of entries in the length guidance vector set to zeros.
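For illustration only, the linear length guidance vector recited in claims 6 and 18 (a scalar multiple ramping from zero at the origin to one at the entry equal to the target length, with remaining entries set to zeros) can be sketched as follows; the function name and vector length are assumptions.

```python
import numpy as np

def linear_length_guidance(target_len, total_len):
    """Linear length guidance vector sketch: entries ramp from 0 at
    the origin to 1 at the entry equal to the target length; the
    remainder of the entries are set to zeros."""
    v = np.zeros(total_len)
    v[:target_len + 1] = np.linspace(0.0, 1.0, target_len + 1)
    return v
```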
In regard to claims 7 and 19, Takase et al. (Positional Encoding to Control Output Sequence Length, hereinafter “Takase”) represents the closest prior art. Takase discloses a transformer model for summarization that extends the sinusoidal position encoder with length-difference positional encoding and length-ratio positional encoding. This is somewhat similar to the sinusoidal length guidance vector of claim 7. However, Takase and the additional prior art of record do not disclose or suggest determining a scalar multiple equal to a sinusoidal function starting at zero at an origin of the length guidance vector, having a value of one at an entry equal to one-half the target length, and returning to zero at the entry equal to the target length, and a remainder of entries in the length guidance vector set to zeros.
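For illustration only, the sinusoidal length guidance vector recited in claims 7 and 19 (zero at the origin, one at one-half the target length, returning to zero at the target length, with remaining entries zeros) can be sketched as a half-period sine; the function name and vector length are assumptions.

```python
import numpy as np

def sinusoidal_length_guidance(target_len, total_len):
    """Sinusoidal length guidance vector sketch: a half-sine that is
    0 at the origin, 1 at target_len/2, and returns to 0 at the
    entry equal to the target length; remaining entries are zeros."""
    v = np.zeros(total_len)
    idx = np.arange(target_len + 1)
    v[:target_len + 1] = np.sin(np.pi * idx / target_len)
    return v
```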
In regard to claims 8 and 20, while Liu and Kikuchi both disclose an allowable range around a target length, they only disclose using a single length guidance vector when generating a summary. Liu, Kikuchi, and the additional prior art of record do not disclose or suggest modifying the language model to include a first length guidance vector and a second length guidance vector when the large language model receives a first target length and a second target length indicating a range of target lengths.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Yu et al., Nguyen et al., Song et al., Xie et al., Jie et al., Saito et al., and Federico et al. disclose additional length controlled large language models.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616. The examiner can normally be reached M-F 8AM-3PM, 4PM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
BLA 2/5/26
/BRIAN L ALBERTALLI/Primary Examiner, Art Unit 2656