DETAILED ACTION
1. This action is responsive to Application No. 18/751,047, filed 6/21/2024. All claims have been examined and are currently pending.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
3. Claims 4-5 and 9-10 are objected to because of the following informalities: the claims recite a finite state machine (FSA), whereas the specification discloses a finite state automaton. Further, the abbreviation used in the claims, FSA, also corresponds to finite state automaton. Appropriate correction is required for consistency.
Claim 18 recites "the method of claim 11," where it appears it should read "the method of claim 13." Appropriate correction is required.
Claim Rejections - 35 USC § 102
4. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
5. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
6. Claims 13-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Scholak et al (2022/0358125).
Regarding claim 13 Scholak teaches A method for decoding tokens provided by a language model (0011 decoding…of the language model), comprising:
receiving a sequence of tokens from a language model (0011: natural language query; 40; 44: may define a number of tokens…configured to output n potential next tokens);
performing a constrained search for a subsequent token (0011: constrains the output of the language model; at each decoding step of the language model, the model generates a predicted next token; 39; 44);
integrating the subsequent token and the sequence of tokens into two or more new sequences of tokens (41: identifies one or more valid potential translations; selects the highest scoring); and
selecting one of the two or more new sequences of tokens based on a probability score
(0011: the DSL parser may also score and rank the set of partial potential translations at each auto-regressive decoding step, at the conclusion of the decoding process, or any combination thereof, based on confidence values generated by the language model for the tokens of the partial potential translation, based on the analysis of the partial potential translation by the DSL parser, or any combination thereof. As such, by incrementally parsing at each decoding step, the DSL parser enables the NLQ-to-DSLQ translation system to “fail early” with respect to invalid and low-scoring translations as they are being generated, which reduces overall computational resource usage and enables the expended computational resources to be focused on generating and validating the most promising potential translations;
41: identifies one or more valid potential translations; selects the highest scoring.).
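For illustration only, the constrained-decoding-with-early-failure scheme described above (score candidate continuations, prune those the parser rejects, keep the highest-scoring sequences) can be sketched as follows. All names are hypothetical and are not drawn from the cited references; the language model is stood in for by a scoring callback.

```python
# Illustrative sketch of constrained beam search: at each decoding step,
# a validity check (standing in for the DSL parser) prunes candidate
# continuations so invalid translations "fail early."

def constrained_beam_search(score_next, is_valid_prefix, beam_width, max_len):
    """score_next(seq) -> {token: log-score}; stand-in for a language model.
    is_valid_prefix(seq) -> bool; stand-in for the incremental parser."""
    beams = [((), 0.0)]  # (token sequence, cumulative log-score)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in score_next(seq).items():
                new_seq = seq + (tok,)
                if is_valid_prefix(new_seq):  # parser check: fail early
                    candidates.append((new_seq, score + logp))
        if not candidates:
            break
        # keep only the highest-scoring valid sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return max(beams, key=lambda b: b[1])[0]
```

In this sketch, selecting "one of the two or more new sequences of tokens based on a probability score" corresponds to the final `max` over the surviving beams.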
Regarding claim 14 Scholak teaches The method of claim 13, wherein a probability score is determined for each of the new sequences of tokens (11: score; 41).
Regarding claim 15 Scholak teaches The method of claim 13, wherein a plurality of token sequences is received from the language model and a plurality of subsequent tokens are integrated with each of the plurality of token sequences (0011 generates predicted next token; 44: n potential next tokens).
Regarding claim 16 Scholak teaches The method of claim 15, wherein a probability score is determined for each new token sequence generated from the plurality of token sequences and the plurality of tokens (11: score; 41).
Regarding claim 17 Scholak teaches The method of claim 16, wherein selecting one of the plurality of new sequences includes selecting a subset of the plurality of new token sequences having the highest probability score (11; 41: identifies one or more valid potential translations; selects the highest scoring).
Regarding claim 18 Scholak teaches The method of claim 11, further comprising:
determining that the constrained search results in no allowable tokens (11; 40: invalid or low-scoring); and
changing a token selection from a previous iteration of decoding (52: may be reduced in rank or entirely ejected from the beam in subsequent decoding steps of the language model).
Claim Rejections - 35 USC § 103
7. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
8. Claims 1, 6, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Scholak in view of Hildebrandt et al (2010/0312755) in further view of Ren et al (2022/0269431).
Regarding claim 1 Scholak et al (2022/0358125) teaches A method for performing constrained decoding by a language model (0011: language model and…parser that constrains the output of the language model; decoding step of the language model), comprising:
accessing a {context free} grammar {(CFG)} (11: grammar);
{converting the CFG to a byte-level CFG;}
{constructing} a {byte-level} representation of a tokenizer vocabulary of the language model (11: at each decoding step of the language model, the model generates a predicted next token; 44 define a number of tokens in the vocabulary); and
parsing a {byte} sequence by a parsing mechanism to determine if the {byte} sequence corresponds to an allowed string prefix {according to the byte-level CFG} (11: At each decoding step of the language model, the model generates a predicted next token for each of a set of partial potential translations of the NLQ. The DSL parser evaluates each of the partial potential translations generated by the model at each decoding step based on a set of stored DSL rules, which define valid terminology, syntax, grammar, and/or other constraints of the DSL.);
but does not specifically teach
accessing a context free grammar (CFG);
converting the CFG to a byte-level CFG;
constructing a byte-level representation of a tokenizer vocabulary of the language model; and
parsing a byte sequence by a parsing mechanism to determine if the byte sequence corresponds to an allowed string prefix according to the byte-level CFG.
In a similar field of endeavor, Hildebrandt et al (2010/0312755) teaches a context free grammar and a representation of the compressed grammar
([0037] First of all, there is a description of the compression of data through the production of a context-free grammar according to an embodiment of the invention.
[0041] The context-free grammar to be produced for data to be compressed can additionally be obtained by means of so-called context compression. In context compression, a multiplicity of (basic) rules K.sub.1 to K.sub.n is either predetermined or used from a previously created grammar, which can then be referenced to produce a new, context-free grammar from the data currently to be compressed. Therefore, the rules of context grammar K.sub.1 to K.sub.n can be used both to create new rules and also in start rule S.sub.0.
[0042] After compression has been carried out by means of the context-free grammar, for further improvement of this first compression, a code is then used to store the grammar, wherein frequent symbols are assigned shorter code words than infrequent symbols. For this purpose, it is possible, for example, to use a Huffman code).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Hildebrandt's grammar compression, using reduced components to minimize storage requirements and enable more efficient processing, for improved decoding by the language model.
Scholak teaches the benefits of reducing data:
[0042] In addition to constraining the output of the language model 304 into the DSL, it may be appreciated that the NLQ-to-DSLQ translation system 302 also offers advantages in terms of selection of the language model 304. For example, in one embodiment, a NLQ-to-DSLQ translation system 302 having a smaller language model 304 (e.g., T5-base model) in combination with the DSL parser 306 performed better at NLQ-to-DSL translation than a comparable translation system having a larger language model 304 (e.g., T5-large) without the DSL parser 306. As such, by including and applying the DSL parser 306, the disclosed NLQ-to-DSLQ translation system 302 can enable enhanced translation performance using smaller language models, which consume fewer computing resources during operation.
Thus, one could look to Hildebrandt for reduction, with the benefits discussed below:
[0003] The compression of digital data by electronic means, i.e. in an electronic system for information processing or data transfer, is used above all to economize on storage space and transmission capacity. Especially in cases where large volumes of digital data are transferred over data networks, compression is important not only for the efficient use of existing transmission capacities, for example of available bandwidth, but also in order to speed up the data transfer process. Yet also in relation to the storage of large volumes of digital data of the order of gigabytes or even terabytes, such as in databases, efficient compression is frequently necessary in order to reduce the amount of storage space that would be required for the uncompressed digital data, thereby making it possible to economize on technical resources.
Scholak and Hildebrandt do not specifically teach converting to a byte-level representation, where Ren teaches ([0078] The compression is a byte-level data reduction technology. A concept of the compression is to use an encoding technology to represent longer data in a shorter encoded format to reduce a data size.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Ren to further compress the data and to represent longer data in a shorter encoded format to reduce a data size (Ren 78).
Ren 0003: Deduplication and compression are key technologies in the storage industry. A storage device performs deduplication and compression, so that an amount of actually stored data can be reduced, storage space occupied by the data in the storage device can be reduced, and storage efficiency of the storage device can be improved.
Thus, the combination of Scholak, Hildebrandt, and Ren would teach:
accessing a context free grammar (CFG);
converting the CFG to a byte-level CFG;
constructing a byte-level representation of a tokenizer vocabulary of the language model; and
parsing a byte sequence by a parsing mechanism to determine if the byte sequence corresponds to an allowed string prefix according to the byte-level CFG.
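For illustration only, the byte-level limitations above (constructing a byte-level representation of a tokenizer vocabulary and testing whether a byte sequence is an allowed string prefix) can be sketched as follows. This sketch is hypothetical and greatly simplified: the "grammar" is reduced to a finite set of allowed strings rather than a full byte-level CFG, and all names are illustrative.

```python
# Illustrative sketch: map each vocabulary token to its UTF-8 bytes,
# then admit only tokens whose bytes extend a valid string prefix.
# The finite allowed-string set here stands in for a byte-level CFG.

def byte_vocab(tokenizer_vocab):
    """Byte-level representation of a tokenizer vocabulary."""
    return {tok: tok.encode("utf-8") for tok in tokenizer_vocab}

def is_allowed_prefix(byte_seq, allowed_strings):
    """True if byte_seq is a prefix of some allowed string's encoding."""
    return any(s.encode("utf-8").startswith(byte_seq) for s in allowed_strings)

def allowed_next_tokens(prefix_bytes, vocab_bytes, allowed_strings):
    """Tokens whose byte encoding keeps the sequence an allowed prefix."""
    return [tok for tok, b in vocab_bytes.items()
            if is_allowed_prefix(prefix_bytes + b, allowed_strings)]
```

Working at the byte level lets the prefix test apply uniformly even when tokens split multi-byte characters, which is one motivation for a byte-level rather than character-level representation.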
Regarding claim 6 Scholak, Hildebrandt, and Ren teach A non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor (Scholak figures 1-4) to perform constrained decoding by a language model, the method comprising:
accessing a context free grammar (CFG);
converting the CFG to a byte-level CFG;
constructing a byte-level representation of a tokenizer vocabulary of the language model; and
parsing a byte sequence by a parsing mechanism to determine if the byte sequence corresponds to an allowed string prefix according to the byte-level CFG.
Claim 6 recites limitations similar to claim 1 and is rejected for similar rationale and reasoning.
Regarding claim 11 Scholak, Hildebrandt, and Ren teach A system for performing constrained decoding by a language model (Scholak fig 3-4; 0037: server hosts), comprising:
one or more servers, wherein each server includes a memory and a processor (Scholak fig 3,4; para 33; 37); and
one or more modules stored in the memory and executed by at least one of the one or more processors (fig 3,4; para 33; 37) to
access a context free grammar (CFG),
convert the CFG to a byte-level grammar,
construct a byte-level representation of a tokenizer vocabulary of the language model, and
parse a byte-level representation of a tokenizer vocabulary of the language model.
Claim 11 recites limitations similar to claim 1 and is rejected for similar rationale and reasoning.
9. Claims 2 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Scholak in view of Hildebrandt et al (2010/0312755) in further view of Ren et al (2022/0269431) in further view of Buchholz (2007/0016398).
Regarding claim 2 Scholak, Hildebrandt, and Ren do not specifically teach where Buchholz (2007/0016398) teaches The method of claim 1, wherein parsing includes parsing each incrementally generated string during left-to-right decoding of the language model ([0066] As discussed above the parsing method of the present invention determines the heads and grammatical roles of tokens strictly from left to right, i.e. in the first step, it determines which role the first token takes and which other token is the first token's head, in the second step it determines the same for the second token, and so on until the last token.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Buchholz with Scholak to allow for proper parsing, presenting a reasonable expectation of success. Scholak already teaches a parser for parsing, and one could look to Buchholz to ensure the parsing is done in a manner consistent with the specific linguistic requirements.
Claim 7 recites limitations similar to claim 2 and is rejected for similar rationale and reasoning.
10. Claims 3-5, 8-10, 12 are rejected under 35 U.S.C. 103 as being unpatentable over Scholak in view of Hildebrandt et al (2010/0312755) in further view of Ren et al (2022/0269431) in further view of Levit et al (2015/0325235).
Regarding claim 3 Scholak, Hildebrandt, and Ren teach
The method of claim 1, wherein parsing includes performing {lattice} parsing {on a lattice} representing a plurality of tokens to determine a set of byte sequences that are accepted by the byte-level CFG (rejected for similar rationale and reasoning as claim 1);
But do not specifically teach where Levit teaches
wherein parsing includes performing lattice parsing on a lattice ([0028] Storage 106 may also store information about parsed representations of corpora (i.e., parses). In some embodiments, corpora parses are stored as a lattice structure, as described in connection to parsing component 124. Information about the parses may include tokens created from words, entities, or phrases of a corpus; statistics associated with the tokens; and tags, which may identify the token type. In some embodiments, tokens are tagged by parsing component 124 to represent a type of sequences of words,; [0037] In an embodiment, parsing component 124 determines a “lattice” data structure of nonlinear sequences of corpus elements. The lattice data structure is a directed graph providing a compact representation of a number of alternative parses. Each path through the lattice produces a different parse of the corpus, and each path is associated with a probability.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate lattice-based parsing for improved parsing (of Scholak), allowing the system to better determine the potential results, while presenting a reasonable expectation of success in allowing the parsing to still be completed.
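For illustration only, the lattice concept cited from Levit (a directed graph of alternative parses where each path carries a probability) can be sketched as follows. The structure and all names are hypothetical; the enumeration is brute-force purely for brevity.

```python
# Illustrative lattice sketch: each decoding slot holds alternative
# (token, probability) options; each path through the slots is one
# candidate parse. An accept predicate stands in for the parser, and
# the highest-probability accepted path is returned.
from itertools import product

def best_accepted_path(lattice, accepts):
    """lattice: list of slots, each a list of (token, prob) options."""
    best, best_p = None, 0.0
    for combo in product(*lattice):
        tokens = [tok for tok, _ in combo]
        p = 1.0
        for _, q in combo:
            p *= q  # path probability is the product along the path
        if accepts(tokens) and p > best_p:
            best, best_p = tokens, p
    return best
```

A production system would score and prune paths incrementally over the shared graph rather than enumerating them, which is what makes the lattice a compact representation of many alternative parses.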
Regarding claim 4 Scholak, Hildebrandt, and Ren teach
The method of claim 1, further comprising {minimizing a finite state machine (FSA) that} represents a set of possible byte sequences corresponding to the language model's vocabulary (rejected for similar rationale and reasoning as claim 1)
But do not specifically teach where Levit teaches further comprising minimizing a finite state machine (FSA) ([0031] Entity definitions may also comprise implicitly defined instances of entity-types. In particular, for certain entity-types, it is not efficient to explicitly enumerate all possible instances of the entity-type. For example, while all (or most) actors could be explicitly included in a definition for the actor entity-type, it is not efficient to enumerate all possible phone numbers, temporal information, such as dates and times, or other combinatorial entity-types. Therefore, in some embodiments, these entities may be implicitly defined by combinatorial models that can provide the entity definition. For example, a finite state machine (FSM) or similar model may be used. ).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the finite state machine for improved and more efficient decoding (for determining possible, potential results), while presenting a reasonable expectation of success, thus teaching minimizing a finite state machine (FSA) that represents a set of possible byte sequences corresponding to the language model's vocabulary.
Regarding claim 5 Scholak, Hildebrandt, and Ren teach potential byte sequences that can be parsed,
But do not specifically teach, where Levit teaches minimized FSA and simultaneous parsing using lattice parsing (28; 31; 37).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the finite state machine and lattice for improved and more efficient decoding (for determining possible, potential results), while presenting a reasonable expectation of success, teaching
wherein the minimized FSA includes a plurality of potential byte sequences that can be parsed simultaneously using lattice parsing.
Scholak, Hildebrandt, and Ren already teach byte representations, and parsing byte sequences to determine (linguistically) allowed outputs. Incorporating Levit and a finite state machine and lattice parsing would allow for parsing using the provided structures, which are linguistic tools to process all possible alternatives and paths to present a collection of best or most relevant outputs.
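For illustration only, the shrinking of an acceptor over a set of byte sequences (e.g., a tokenizer vocabulary) can be sketched as follows. This is suffix-sharing (merging states with identical suffix languages) rather than full Hopcroft minimization, but it shows why the automaton becomes smaller; all names are hypothetical.

```python
# Illustrative sketch: build a trie over byte sequences, then merge
# states whose outgoing transitions (and hence suffix languages) are
# identical, yielding a smaller acceptor for the same set.

def build_trie(byte_seqs):
    root = {}
    for seq in byte_seqs:
        node = root
        for b in seq:  # iterating bytes yields ints 0-255
            node = node.setdefault(b, {})
        node[-1] = {}  # -1 marks an accepting state
    return root

def merge_equivalent(node, registry):
    """Return a canonical shared node per distinct suffix language."""
    canon = {b: merge_equivalent(child, registry) for b, child in node.items()}
    key = tuple(sorted((b, id(c)) for b, c in canon.items()))
    if key not in registry:
        registry[key] = canon
    return registry[key]

def count_states(node, seen=None):
    if seen is None:
        seen = set()
    if id(node) not in seen:
        seen.add(id(node))
        for child in node.values():
            count_states(child, seen)
    return len(seen)
```

Shared suffixes such as "-at" and "-art" collapse into single states, so the merged automaton accepts exactly the same byte sequences with far fewer states.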
Claims 8 and 12 recite limitations similar to claim 3 and are rejected for similar rationale and reasoning.
Claim 9 recites limitations similar to claim 4 and is rejected for similar rationale and reasoning.
Claim 10 recites limitations similar to claim 5 and is rejected for similar rationale and reasoning.
Conclusion
11. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: See PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAUN A ROBERTS whose telephone number is (571)270-7541. The examiner can normally be reached Monday-Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAUN ROBERTS/Primary Examiner, Art Unit 2655