Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 17-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because the broadest reasonable interpretation of the claimed computer readable media includes a signal embodiment, and as such the claims are considered to comprise a signal per se.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Claims 1, 13, and 17 are directed to a system, method, etc. for optimizing a probability score for determining an end of a sentence or phrase. The claims rely on well-understood, routine, and conventional structures such as a processor, memory, data structure, etc. to instruct the system along methods by which the likelihood of arriving at a terminal point of a generative sentence or sequence of tokens is accumulated and/or exponentially grown or dampened. The claims are considered a manner by which data resolves more data, in this case the accumulative increase of an end-of-sentence token likelihood and/or a scalar multiple which accelerates, decelerates, etc. the likelihood, such as based on a determined or desired length of the sentence, sequence, etc.; the claims are also considered a stand-in for human behavior, as the claim steps are substantially similar to the manner in which a human being might be asked to make up a sentence but to keep it short. Further construals of the claims exist in which the determining of tokens can be considered a mathematical process, and in such a case all that remains of the claims must be considered embodiments of mathematical formulae and/or mathematical calculations. As such, the claims cannot be considered to integrate the judicial exceptions of an abstract idea such as data per se or programs per se, nor the judicial exception of human activity and/or mental processes such as operations performed in the human mind, human activity, human behavior, etc., as the claims do not include substantially more than the performance of such exceptions upon a computer claimed at a high level of generality and based on models intended to mimic or replicate human cognitive processes. There also exist reasonable interpretations of the claimed subject matter under which the claims are merely mathematics. As such, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as any media, processors, or instructions are not claimed in a manner sufficient to constitute significantly more. Dependent claims 2-12, 14-16, and 18-20 do not remedy these deficiencies and are similarly rejected, as the claims further address additional subject matter which may be seen as the generation of data from data; a stand-in for human behavior; and/or human application of agency in concert with assistive instructions, mathematical concepts, AI models, etc.
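By way of illustration only, the characterized accumulation of an end-of-sentence likelihood and its scaling by a scalar multiple can be expressed in a few lines of arithmetic (a hypothetical sketch with values chosen by the Examiner, not a reproduction of any claimed implementation):

    # Hypothetical sketch of the characterized concept: an EOS likelihood that is
    # accumulated each step and a scalar multiple that accelerates or decelerates it.
    p_eos = 0.05      # illustrative starting end-of-sentence likelihood
    scale = 1.5       # illustrative scalar multiple (> 1 accelerates; < 1 decelerates)
    for step in range(8):
        p_eos = min(1.0, p_eos * scale)   # accumulative increase of the EOS likelihood
        print(step, round(p_eos, 3))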
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 10, and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee: 20210110259 further in view of Song: 20210110259.
Regarding claim 1
Lee teaches:
A computing device comprising a processor system and memory, wherein the computing device implements a generative artificial intelligence (“AI”) model (Lee: ¶ 3, 4, 59, 92, 104: an encoder-decoder neural network operative as a sequence-to-sequence model for generating an output sequence such as a translated or transformed sentence, such as by ranking output tokens in sequential order or conducting beam search; said ranking or search based on token probabilities) configured to perform operations comprising:
accepting, at the generative AI model, input (Lee: Abstract; ¶ 57-60; Fig 1: system generates a context vector 113 from an input, tokenizes same);
producing, with the generative AI model, input tokens that encode the input (Lee: Abstract; ¶ 57-60; Fig 1: system generates a context vector 113 from an input and tokenizes same, such as with a neural network implementation suitable to generate tokens therefrom; in Lee Figure 1 this is discussed in the context of a decoder, but the functionality is the same: tokens are created which allow the neural network to produce probability or likelihood scores based thereon; further, decoders 120, 130 obtain tokens based on the encoder data, that is, a layer of the decoder model operates to encode tokens); and
producing, with the generative AI model, a text response using the input tokens that encode the input (Lee: ¶ 3, 4, 56-60; Fig 1: using a stream of input tokens decoders 120, 130 produce a stream of output tokens in a sequential manner based on iteratively calculated likelihoods thereof),
wherein the producing the text response includes multiple iterations of output token generation (id.), and
wherein a probability of an end-of-sentence (“EOS”) token over successive iterations among the multiple iterations of output token generation eventually causes an EOS token to be generated as output (Lee: Abstract; ¶ 57-60; Fig 1, 6: in this way the system iteratively predicts a next word(s) or token(s) for each time step based on a probability thereof, thereby predictively generating output of next words, tokens, etc. until an EOS token is predicted, occurs, etc.).
Lee strongly suggests but does not explicitly teach increases in probability of an EOS token among successive iterations of the multiple iterations of the taught output token generation; Lee discloses a probability of a candidate token as a predicted word, token, etc. (Lee: ¶ 59) and accepts as a next token the token with the overall greatest adjusted probability or likelihood (Lee: ¶ 60, 92, 103, 104, 115), wherein a next token ultimately and terminally comprises an EOS token, but in Lee such likelihoods drift and are therefore not disclosed as increasing in successive iterations.
In a related field of endeavor Song teaches a transformer model generative of output text (Song: Abstract) by producing, with the generative AI model, input tokens that encode the input (Song: Abstract; ¶ 36; Fig 4: a transformer neural model receives tokenized text as an input and generates target summary text as output); producing, with the generative AI model, a text response using the input tokens that encode the input (id.),
wherein a probability of a token over successive iterations among the multiple iterations of output token generation eventually causes an EOS token to be generated as output, and comprising applying a diminishing reward value to candidate words in a sequence of words to thereby maintain the sequence at or about a length threshold (Song: ¶ 3, 4, 36; Claim 1). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to adjust the likelihoods of an EOS token in the Lee system and method utilizing an effective length penalty as taught or suggested by Song to bound the length of the Lee and/or Song output sequences such that, as the length of a beam or sequence rank increases, the probability of any particular word or token decreases and the probability of an EOS token increases, and for at least the purpose of managing the communication style of output sequences by limiting overall sentence, sequence, etc. length to a small number; one of ordinary skill in the art would have expected only predictable results therefrom.
Such a token would have been selected by a beam search (Song: ¶ 3, 4, 14), such as implemented by a logits processor module (logits_process.py) comprising an exponential decay length penalty (ExponentialDecayLengthPenalty(LogitsProcessor)) which adds an increasing length penalty beginning at a start index representative of the point at which the probability of an EOS token increases, thereby encouraging shorter sequences and disincentivizing sequences over certain lengths.
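For clarity of record, a minimal sketch of how such an exponential decay length penalty may operate is set out below; the function, variable names, and values are the Examiner's own illustration, operate on token probabilities, and are not reproduced from any particular code base:

    def exponential_decay_length_penalty(probs, eos_id, cur_len, start_index, factor):
        # Illustrative sketch: once the generated length passes start_index, the EOS
        # probability is boosted by an exponentially growing amount so that overlong
        # sequences are increasingly disfavored during beam search or sampling.
        # (Renormalization of the remaining probabilities is omitted for brevity.)
        if cur_len > start_index:
            probs[eos_id] = min(1.0, probs[eos_id] * (factor ** (cur_len - start_index)))
        return probs

    # Example usage with illustrative values: a vocabulary of 5 tokens where index 4 is EOS.
    probs = [0.3, 0.3, 0.2, 0.15, 0.05]
    print(exponential_decay_length_penalty(probs, eos_id=4, cur_len=12, start_index=8, factor=1.3))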
Regarding claim 2
Lee in view of Song teaches or suggests:
The computing device of claim 1, wherein the generative AI model is a large language model (Lee: ¶ 3, etc.: a generative encoder-decoder model); (Song: Abstract: a generative transformer model), and wherein a text encoder of the generative AI model accepts the input and produces the input tokens that encode the input (Lee: Abstract; ¶ 57-60; Fig 1: system generates a context vector 113 from an input and tokenizes same, such as with a neural network implementation suitable to generate tokens therefrom; in Lee Figure 1 this is discussed in the context of a decoder, but the functionality is the same: tokens are created which allow the neural network to produce probability or likelihood scores based thereon; further, decoders 120, 130 obtain tokens based on the encoder data, that is, a layer of the decoder model operates to encode tokens). Examiner considers Lee and/or Song to teach or suggest the implementation of the taught steps as a large language model in as much as Lee teaches an encoder-decoder model (Lee: ¶ 3, etc.) and Song teaches a generative transformer model (Song: Abstract). The claim is considered obvious over Lee as modified by Song as addressed in the base claim, as it would have been obvious to apply the further teaching of Lee and/or Song to the modified device of Lee and Song; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 3
Lee in view of Song teaches or suggests:
The computing device of claim 1, wherein the generative AI model is a language model (Lee: ¶ 3, etc.); (Song: Abstract, etc.), and wherein an encoder and/or text encoder of the generative AI model accept the input and produce the input tokens that encode the input. Examiner takes official notice that adapting a vision model to comprise a vision language model adapted to encode, tokenize text, etc. was well known in the art before the filing date of the instant application and would have comprised an obvious inclusion for at least the purpose of integrating modalities such as vision for the performance of multi-modal predictive tasks; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 4
Lee in view of Song teaches or suggests:
The computing device of claim 1, wherein a text decoder of the generative AI model produces the text response (Lee: Abstract; ¶ 57-60; Fig 1: system generates a context vector 113 from an input and tokenizes same, such as with a neural network implementation suitable to generate tokens therefrom; in Lee Figure 1 this is discussed in the context of a decoder, but the functionality is the same: tokens are created which allow the neural network to produce probability or likelihood scores based thereon; further, decoders 120, 130 obtain tokens based on the encoder data, that is, a layer of the decoder model operates to encode tokens); (Song: Abstract; ¶ 36; Fig 4: a transformer neural model receives tokenized text as an input and generates target summary text as output). The claim is considered obvious over Lee as modified by Song as addressed in the base claim, as it would have been obvious to apply the further teaching of Lee and/or Song to the modified device of Lee and Song; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 5
Lee in view of Song teaches or suggests:
The computing device of claim 1, wherein the producing the text response includes, in each of the multiple iterations of output token generation:
increasing the probability of the EOS token (Lee: Abstract; ¶ 57-60; Fig 1, 6: the system iteratively predicts a next word(s) or token(s) until an EOS token comprises the next output token, such as based on determined probability); (Song: ¶ 3, 4, Claim 1: a length threshold and/or diminishing reward value of words iteratively increases the probability of a shorter sentence and increases the respective probability of the Lee EOS token);
producing one or more output tokens, wherein each of the one or more output tokens is the EOS token or a text token representing one or more words (Lee: Abstract; ¶ 57-60; Fig 1, 6);
if the EOS token was produced, completing the producing the text response (Lee: ¶ 59); and otherwise, the EOS token not having been produced, continuing in a next iteration among the multiple iterations of output token generation (Lee: Abstract; ¶ 57-60; Fig 1, 6: the system iteratively predicts a next word(s) or token(s) for each time step based on a probability thereof, thereby predictively generating output of next words, tokens, etc. until an EOS token is predicted, occurs, etc.). The claim is considered obvious over Lee as modified by Song as addressed in the base claim as it would have been obvious to apply the further teaching of Lee and/or Song to the modified device of Lee and Song; one of ordinary skill in the art would have expected only predictable results therefrom.
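For clarity of record, the iterative behavior mapped above may be illustrated by the following toy sketch (the vocabulary, probabilities, and function are the Examiner's own illustration and are not drawn from the claims or the references):

    import random

    EOS_TOKEN = "<eos>"
    VOCAB = ["a", "brief", "caption"]

    def generate(max_steps=20, p_eos=0.05, growth=0.3, seed=0):
        # Toy illustration of the mapped loop: the EOS probability increases each
        # iteration, one token is produced per iteration, and generation completes
        # when the EOS token is produced; otherwise the next iteration continues.
        rng = random.Random(seed)
        output = []
        for _ in range(max_steps):
            p_eos = 1.0 - (1.0 - p_eos) ** (1.0 + growth)   # increasing EOS probability
            if rng.random() < p_eos:
                output.append(EOS_TOKEN)                    # EOS produced: response complete
                break
            output.append(rng.choice(VOCAB))                # text token produced: continue
        return output

    print(generate())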
Regarding claim 6
Lee in view of Song teaches or suggests:
The computing device of claim 1, wherein the probability of the EOS token increases in the successive iterations according to an exponential growth factor. Examiner takes official notice that the utility of an exponential length penalty parameter was well known in the art before the effective filing date of the instant invention and would have comprised an obvious inclusion for at least the purpose of operating with a max length parameter and thereby limiting overall length of an output sentence or sequence; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 10
Lee in view of Song teaches or suggests:
The computing device of claim 1, wherein the probability of the EOS token is a prior probability (Song: ¶ 3, etc.: such as adjusting a preset reward value based on predefined length of an summarization output). The claim is considered obvious over Lee as modified by Song as addressed in the base claim as it would have been obvious to apply the further teaching of Lee and/or Song to the modified device of Lee and Song; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claims 13, 17—the claims are considered to recite substantially similar subject matter to that of claim 1 and are similarly rejected.
Regarding claims 14, 18—the claims are considered to recite substantially similar subject matter to that of claim 3 and are similarly rejected.
Regarding claims 15, 19—the claims are considered to recite substantially similar subject matter to that of claim 5 and are similarly rejected.
Regarding claims 16, 20—the claims are considered to recite substantially similar subject matter to that of claim 6 and are similarly rejected.
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Lee: 20210110259 further in view of Song: 20210110259 as applied to claims 1-6 supra and further in view of the Hugging Face Transformers model, version 4.24, hereinafter Hug (available at least 2022; significant portions available online at https://huggingface.co/docs/transformers/v4.24.0/en/internal/generation_utils#transformers.ExponentialDecayLengthPenalty; significant portions provided by Examiner as Transformers 4_24, copyright 2022; additional supportive references for the presence of max_length and length_penalty parameters provided as Constrained Beam Search, copyright 2022; and Clarity in Beam Search, copyright 2/2024).
Regarding claim 7
Lee in view of Song teaches or suggests:
The computing device of claim 6, wherein the probability of the EOS token increases in the successive iterations according to: P_EOS(s_1, …, s_(k−t)) = 1 − (1 − P_EOS(s_1, …, s_(k−t−1)))^(1+α), wherein k represents a target limit on count of output tokens, t represents a counter that decreases in the successive iterations, P_EOS(s_1, …, s_(k−t)) represents the probability of the EOS token in a current iteration of the multiple iterations, P_EOS(s_1, …, s_(k−t−1)) represents the probability of the EOS token in a previous iteration of the multiple iterations, and α represents a hyper parameter that controls a rate of the exponential growth factor. This is considered obvious: Lee in view of Song teaches an iterative method for constructing an output term by term (Lee: Abstract; ¶ 57-60; Fig 1, 6: in this way the system iteratively predicts a next word(s) or token(s), including an end-of-sentence token, for each time step based on a probability thereof); (Song: ¶ 6, 35, 36: output sequence iterated in a per word, token, etc. fashion and based on probabilities thereof); comprising a target limit on output tokens (Song: ¶ 35; Claim 1: a predicted length threshold) and iterating term by term such as by count (Lee: Abstract; ¶ 57-60; Fig 1, 6); (Song: ¶ 6, 35, 36).
Lee in view of Song strongly suggests but does not explicitly teach adjusting the probability increase as claimed, particularly with respect to a count of output tokens wherein the count iterates upon a target limit or maximum count of output tokens, or threshold thereof, and wherein the calculation comprises a probability of the EOS token in a previous iteration of the multiple iterations and includes a hyper parameter that controls a rate of the exponential growth factor of the ending likelihood.
In a related field of endeavor Hug teaches a system and method for conducting beam search, such as for output of a sequence of words, tokens, etc., and comprising a count variable with respect to a maximum number of tokens, words, etc. in a sequence to be generated (Hug: mSearchScorer, BeamSearchScorer, etc.: max_length thresholds the length of a sequence); this latently represents a strict increase in the likelihood of a final token of the sequence, particularly at small values; and wherein the likelihood of the end of the sequence is iterated stepwise by applying an exponential growth factor to a previous value of a likelihood of a final token. It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the parameterization of the claimed features as taught or suggested by Hug to realize the claimed subject matter and arrive at the exponentially increased EOS token likelihood of Lee in view of Song for at least the purpose of managing the communication style of output sequences by limiting overall sentence, sequence, etc. length to a small number; one of ordinary skill in the art would have expected only predictable results therefrom.
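To illustrate the behavior of the recited relationship, a worked example follows; the starting probability and α value are arbitrary and chosen by the Examiner for illustration only:

    # Worked example of the recited recurrence with illustrative values
    # p_eos = 0.05 and alpha = 0.5; the probability strictly increases toward 1
    # in successive iterations for any alpha > 0 and any starting value in (0, 1).
    p_eos = 0.05
    alpha = 0.5
    for i in range(1, 6):
        p_eos = 1.0 - (1.0 - p_eos) ** (1.0 + alpha)
        print(i, round(p_eos, 4))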
Regarding claim 8
Lee in view of Song in view of Hug teaches or suggests:
The computing device of claim 7, wherein the hyper parameter that controls the rate of the exponential growth factor is in a range of (0, …, 1). While Lee in view of Song in view of Hug discusses the value of exponential growth as ranging from negative to positive, normalization in the manner claimed would have been obvious as a matter of design choice on the part of a creator or optimizer of a code base such as Hug; one of ordinary skill in the art would have expected only predictable results therefrom.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Lee: 20210110259 further in view of Song: 20210110259 as applied to claims 1-6 supra, further in view of the Hugging Face Transformers model, version 4.24, hereinafter Hug, as applied to claims 7 and 8 supra, and further in view of Seybold: 20250209794 hereinafter Sey.
Regarding claim 9
Lee in view of Song in view of Hug teaches or suggests:
The computing device of claim 7, wherein the hyper parameter that controls the rate of the exponential growth factor (Hug: mSearchScorer, BeamSearchScorer, etc.) has been set in a training process (Hug: mSearchScorer, BeamSearchScorer, etc.) comprising: receiving a training set comprising text data (Lee: ¶ 3, 4, 59, 67, 68, 92, 104: system for training a neural network to generate particular sequences of tokens); (Song: ¶ 3, 4, 6, 14, 35, 36: system for training a neural network to generate particular sequences of tokens); and adjusting the generative AI model using the training set, including adjusting the hyper parameter that controls the rate of the exponential growth factor (Song: ¶ 3, 35: the algorithm adjusts a reward value to increase and/or decrease an overall length in concert with a defined length threshold); (Hug: mSearchScorer, BeamSearchScorer, etc.: such as for variance of training routines to tune a model toward/away from a particular parameter, or value thereof, such as length, penalty, etc. parameters).
Lee in view of Song in view of Hug does not explicitly teach the system operative as a vision language model operative to receive a training set comprising images and text captions, each of the text captions being associated with an image among the images; and adjusting the generative AI model using the training set, including adjusting the hyper parameter that controls the rate of the exponential growth factor.
In a related field of endeavor Sey teaches a large scale model trained upon a training set comprising images and text captions, each of the text captions being associated with an image among video data comprising a stream of images, wherein the system is trained to associate a caption with a frame of the video (Sey: Abstract; ¶ 35). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the Lee in view of Song in view of Hug system and method to train a captioning system such as that taught or suggested by Sey, such as by varying the length and penalty parameters of Lee in view of Song in view of Hug, for at least the purpose of generating a sufficiently clear but parsimonious caption corresponding to frames of the Sey videos; one of ordinary skill in the art would have expected only predictable results therefrom.
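As a general illustration of setting such a hyper parameter in a training or tuning process, the following generic sketch is offered; the stopping rule, candidate values, and target length below are the Examiner's own assumptions, not features of the cited references:

    def expected_length(alpha, p_eos=0.05, max_steps=50):
        # Illustrative proxy: the step at which the EOS probability first exceeds 0.5
        # under the exponential growth recurrence, used here as a stand-in objective.
        for step in range(1, max_steps + 1):
            p_eos = 1.0 - (1.0 - p_eos) ** (1.0 + alpha)
            if p_eos >= 0.5:
                return step
        return max_steps

    # Select the growth hyper parameter whose expected output length is closest to a
    # target caption length of 12 tokens (target chosen for illustration only).
    target_length = 12
    candidates = [a / 10 for a in range(1, 10)]
    best_alpha = min(candidates, key=lambda a: abs(expected_length(a) - target_length))
    print(best_alpha)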
Claims 11, 12 rejected under 35 U.S.C. 103 as being unpatentable over Lee: 20210110259 further in view of Song: 20210110259 as applied to claims 1-6 supra and further in view of Seybold: 20250209794 hereinafter Sey.
Regarding claim 11
Lee in view of Song does not explicitly teach:
The computing device of claim 1, wherein the operations further comprise, during inference using the generative AI model: identifying one or more representative units of video; ranking the one or more representative units; and based on results of the ranking the one or more representative units, selecting a particular representative of the one or more representative units, wherein the particular representative unit is provided to the generative AI model as the input.
In a related field of endeavor Sey teaches a large scale model trained upon a training set comprising images and text captions, each of the text captions being associated with an image among video data comprising a stream of images, wherein the system is trained to associate a caption with a frame of the video (Sey: Abstract; ¶ 35); wherein the operations further comprise, during inference using the generative AI model: identifying one or more representative units of video (Sey: Abstract; ¶ 35: such as associating a particular frame of a video with a caption in order to create additional training data or to operate the model); ranking the one or more representative units (Sey: ¶ 90, 92; Fig 3A: such as determining a frame most similar to a seed image); and based on results of the ranking the one or more representative units, selecting a particular representative of the one or more representative units (Sey: ¶ 90, 92; Fig 3A, 3B: such as by providing a selected frame to an annotation system), wherein the particular representative unit is provided to the generative AI model as the input (Sey: ¶ 90, 92, 106: such as by determining score matches of particular frames and transferring a caption thereto).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the Lee in view of Song system and method to train and operate a captioning system such as that taught or suggested by Sey for at least the purpose of captioning frames of the Sey videos and/or generating similar captions for similar frames; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 12
Lee in view of Song teaches or suggests:
The computing device of claim 1, wherein the generative AI model has been trained in a training process comprising: receiving an initial training set comprising text (Song: Abstract; ¶ 38); updating the initial training set, including distilling the initial text into final text (Song: Abstract; ¶ 38: model trained on input and generated output), wherein a given final text among the final texts is: generated using a corresponding initial text among the initial texts; more concise than the corresponding initial text caption (Song: ¶ 3, 4, 33-36: such as based on adjusting length values of the word reward algorithm of the length reward); and adjusting the ML model using the updated training set (Song: ¶ 3, 4, 33-36, 38: system iteratively improves a word output by iterative adjustment of training data based thereon).
Lee in view of Song does not explicitly teach the computing device of claim 1, wherein the generative AI model has been trained in a training process comprising: receiving an initial training set comprising images and initial text captions, each of the initial text captions being associated with an image among the images; updating the initial training set, including distilling the initial text captions into final text captions, wherein a given final text caption among the final text captions is: generated using a corresponding initial text caption among the initial text captions; more concise than the corresponding initial text caption; and associated with the image that is associated with the corresponding initial text caption; and adjusting the ML model using the updated training set.
In a related field of endeavor Sey teaches:
The computing device of claim 1, wherein the generative AI model has been trained in a training process comprising: receiving an initial training set comprising images and initial text captions (Sey: Abstract; ¶ 35), each of the initial text captions being associated with an image among the images (Sey: Abstract; ¶ 35: such as training data received or extended by the system);
updating the initial training set, including distilling the initial text captions into final text captions (Sey: ¶ 90, 92, 106; Fig 3A, 3B: such as by determining text captions to correspond to additional frames), wherein a given final text caption among the final text captions is: generated using a corresponding initial text caption among the initial text captions (Sey: ¶ 90, 92, 106; Fig 3A, 3B: such as by transferring the initial text to an additional frame). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to combine the text captioning system of Sey with the generative language system of Lee in view of Song to realize a system for replacing captions or generating additional captions for frames of the Sey video which, when generated by varying the reward algorithm taught by Lee in view of Song, would operate to generate texts more concise or more verbose than the corresponding initial text caption, for at least the purpose of generating captions targeted to particular audiences or in particular styles; one of ordinary skill in the art would have expected only predictable results therefrom.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL C MCCORD whose telephone number is (571)270-3701. The examiner can normally be reached 7:30-6:30 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CAROLYN EDWARDS can be reached at (571) 270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PAUL C MCCORD/Primary Examiner, Art Unit 2692