DETAILED ACTION
Receipt of Applicant’s Amendment, filed February 5, 2026, is acknowledged.
Claims 1, 2, 5, 6, 8, 9, 11, 12, 16, and 18-20 were amended.
Claims 1-20 are pending in this office action.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 2-3, 5, 12-13, and 15 are objected to because of the following informalities. Appropriate correction is required.
With regard to claims 2 and 12, claim 2 recites “a respective LLM service comprises at least one or a combination of following metrics of a respective response generated by the respective LLM service: …” This claim language appears to contain grammatical issues which render the meaning of the claim unclear.
For examination purposes this claim limitation has been construed to mean -- a respective LLM service comprises at least one or a combination of the following metrics of a respective response generated by the respective LLM service: …--.
With regard to claims 5 and 15, claim 5 recites “wherein obtaining machine-learning the contextual information further comprises:…” This claim limitation lacks antecedent basis and appears to contain grammatical issues.
The text “obtaining machine learning the contextual information” contains grammatical issues which render the meaning of the claim unclear. This is believed to be a result of a typo. Furthermore, parent claim 1 recites “obtaining contextual information”, and “applying a machine learning contextual bandit model”. It is unclear if the claim is attempting to refer to the contextual information previously recited, the machine-learning contextual bandit model previously recited, or attempting to define a new claim element.
For examination purposes this claim limitation has been construed to mean -- wherein obtaining the contextual information further comprises: … -- as the claim amendment appears to contain a typo.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 8 and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
With regard to claims 8 and 18, claim 8 recites “The computer-implemented method of claim 1, further comprising, training the machine-learning contextual bandit model is trained by a process of:
obtaining a training dataset including one or more training examples, a training example obtained from logs for a previously submitted query processed by the LLM service or another LLM service, the training example including a training set of features and a known outcome for the previously submitted query; and
training one or more parameters of the machine-learning contextual bandit model on the one or more training examples of the training dataset.”
This claim appears to contain grammatical issues (e.g., “training the machine-learning contextual bandit model is trained by a process of: …”). The claim appears to recite the training twice in a way that renders the meaning of the claim unclear.
This claim limitation lacks antecedent basis. The parent claim recites training the machine-learning contextual bandit model, a training dataset including one or more training examples, a training example for a previously submitted query, a known outcome for the previously submitted query, and the training comprising updating parameters of the machine-learning contextual bandit model based on the one or more training examples. All of these terms (underlined and italicized in the claim language above) lack antecedent basis, as it is unclear whether applicant is defining new elements or referring to the previously recited elements.
This claim limitation also lacks antecedent basis, as it is unclear how many training steps applicant is claiming. Based on the details of the training, one may reasonably read the claim as attempting to further define the training step recited in the parent claim. Yet the claim language is written in a manner that suggests the method of claim 1 is being modified to further include the training step of claim 8, suggesting that applicant is reciting a new training step.
The body of the claim recites yet a third training step (e.g., “training one or more parameters”). One of ordinary skill in the art would recognize that updating parameters is part of the training process. Yet within the claims this is written as a separate step.
It is unclear how many training steps are being recited, or the relationship between the operations of the training as recited.
For examination purposes this claim limitation has been construed to mean -- wherein the training of the machine-learning contextual bandit model further comprises:
wherein the training examples are obtained from logs,
wherein the previously submitted query was processed by another LLM service,
wherein the training example includes a training set of features --.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6, 8-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Slivkins [2011/0264639] in view of Padgett [2024/0160902].
With regard to claim 1 Slivkins teaches A computer-implemented (Slivkins, ¶62 “server computers”) method comprising:
[a] receiving, from a user as the user (Slivkins, ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110”) of a client device as the client 110 (Slivkins, ¶16), a query as the received queries (Slivkins, ¶16) for (Please note that what the query is for is an intended use of the query and does not impose a functional limitation on the claimed device) an application as the search engine 140 (Slivkins, ¶16) [[with access to one or more of a set of LLM services]];
[b] obtaining contextual information (Slivkins, ¶23 “A context may be a set of documents that have been ranked for higher slots in a results page. In”) as a set of features (Slivkins, ¶23 “Contexts may be selected based on a variety of document features and attributes including topics, keywords, and click-through rates, for example”), the set of features related to the query (Slivkins, ¶21 “Any number of methods or techniques for determining the similarity of documents may be used, such as using text features, queries that result in clicks on the documents, and keywords associated with the documents, for example.”) or the user (Slivkins, ¶23 “In addition, a context may further include information such as information about a user's interests or other information.”) of the client device as client 110 (Slivkins, ¶15);
[c] applying a machine-learning (Please note that while Slivkins does not explicitly use the term ‘machine-learning’, one of ordinary skill in the art would recognize the multi-armed bandit recited in ¶25 as an ML algorithm) contextual as context (Slivkins, ¶23) bandit model (Slivkins, ¶25 “In some implementations, the document selection algorithm may be a "multi-armed bandit" algorithm.”) to the set of features (Slivkins, ¶23 “Contexts may be selected based on a variety of document features and attributes including topics, keywords, and click-through rates, for example”) to generate a set of predicted scores as index values (Slivkins, ¶27 “each strategy may have an associated index value.”), wherein each predicted score in the set of predicted scores indicates how effective as the upper confidence bound of a click-through rate (Slivkins, ¶27 “The index value associated with a strategy may represent an upper confidence bound of a clickthrough rate of a document randomly selected from the strategy for a particular context S consistent with the strategy. The index value may be a measure of the number of times that the strategy has been selected by a document selection algorithm, as well as how many times the strategy has been selected by a user.”) a response as the document selected (Id) from a respective [[LLM service]] as the strategy that selected the document (Id) in the set of [[LLM services]] as the set of strategies (Slivkins, ¶49) is with respect to a desired outcome as positive feedback (Slivkins, ¶33 “The document selector 130 may use the user interaction data to provide positive and negative feedback to the strategies used by the instances of the document selection algorithm.
For the case where the user interaction data indicates that a document indicated by one of the slots of the results page 200 was selected, positive feedback may be provided to the strategy used by a document selection algorithm to select the indicated document”), wherein the machine-learning contextual bandit model (Slivkins, ¶25 “In some implementations, the document selection algorithm may be a "multi-armed bandit" algorithm.”; ¶23) is trained as removing strategies (Slivkins, ¶58 “The strategy is removed from the plurality of strategies at 511 . The strategy associated with the selected slot may be removed from the plurality of strategies by the document selector 130”), generating second strategies (Slivkins, ¶59 “The second plurality of strategies may be generated by the document selector 130 using the strategy associated with the selected slot. In some implementations, the document selector 130 may generate the second plurality… of strategies by generating a strategy for one or more combinations of subtrees of u and subtrees of u' from the removed strategy.”), or adjusting values associated with a strategy (¶33 “the document selector 130 may provide the positive feedback by adjusting or increasing the index value associated with the strategy”) using a training dataset including one or more training examples as context tree Tc, which may be from a previous instance (Slivkins, ¶59 “As described previously, a strategy may include a subtree u of the context tree Tc and a subtree u' of the document tree TD·”; ¶28 “By executing the document selecting algorithms sequentially, each instance of the document selection algorithm may consider the documents selected by previous instances of the document selection algorithm when selecting a document.”), a training example representing (Please note that the claim does not require that the training example include the previous query, merely that it represents the previous query) a previously submitted query as the query to 
which the previous instance of documents are responsive to (Slivkins, ¶28; ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110. The search engine 140 may search for document and other media that is responsive to the query by searching a search corpus 163 using the received query.”; ¶20 “The document selector 130 may select one or more documents for placement on the results page 200 from a set of documents that is responsive to a received query. The set of documents may have been determined by the”) processed by an [[LLM service]] and a known outcome as the determination of certainty (Slivkins, ¶57) which may be based on user selection of the document in response to the query in the previous round (Slivkins, ¶25 “For each round, an instance of the document selection algorithm may select a document from the set of responsive documents for a slot. If a user selects an indication of a document provided for a slot on the results page 200, then the document selection for the slot may receive a payout or some other positive feedback”) for the previously submitted query as the query to which the document response is being evaluated, e.g. the query to which the document is identified as a positive or negative response for (Slivkins, ¶28 “By executing the document selecting algorithms sequentially, each instance of the document selection algorithm may consider the documents selected by previous instances of the document selection algorithm when selecting a document. As described above, it may be assumed that a user reviews indicated documents on the results page 200 beginning with the top most slot (i.e., slot 201). When a user views a document indicated by a slot, it may also be assumed that the user rejected the documents indicated by the preceding slots.”; ¶16 “receive queries from users using clients”), the training as removing strategies (Slivkins, ¶58 “The strategy is removed from the plurality of strategies at 511.
The strategy associated with the selected slot may be removed from the plurality of strategies by the document selector 130”), generating second strategies (Slivkins, ¶59 “The second plurality of strategies may be generated by the document selector 130 using the strategy associated with the selected slot. In some implementations, the document selector 130 may generate the second plurality… of strategies by generating a strategy for one or more combinations of subtrees of u and subtrees of u' from the removed strategy.”), or adjusting values associated with a strategy (¶33 “the document selector 130 may provide the positive feedback by adjusting or increasing the index value associated with the strategy”) comprising updating parameters as strategies are generated based on the combination of the subtrees of u and u` (Slivkins, ¶59) of the machine-learning contextual bandit model as the multi-armed bandit (Slivkins, ¶23; ¶25; ¶55) based on the one or more training examples as context tree Tc, which may be from a previous instance (Slivkins, ¶59; ¶28);
[d] selecting an [[LLM service]] as selecting a strategy (Slivkins, ¶49 “A strategy is selected at 405. The strategy may be selected from the plurality of strategies by the document selector 130”) in the set of [[LLM services]] as the plurality of strategies (Slivkins, ¶49) based at least on the set of predicted scores as index values (Slivkins, ¶49 “the selected strategy may have a maximal index value among the strategies that include the received context”) from the machine-learning contextual (Slivkins, ¶23 “context”) bandit model (Slivkins, ¶25 “multi-armed bandit”);
[e] [[generating a prompt for input to the selected LLM service, wherein the prompt for the selected LLM service is generated based on a prompt template in a prompt library database associated with the selected LLM service]];
[f] receiving, from the selected [[LLM service]] as a document (Slivkins, ¶41 “A document is selected for each slot of the results page at 307.”) generated by executing the selected [[LLM service]] as the selected strategy 407 as part of document selector 130 (Slivkins, ¶41 “The documents may be selected by the document selector 130 from the plurality of responsive documents. In some implementations, each document may be selected by an instance of a document selection algorithm such as multiarmed bandits algorithm.”; ¶50 “A document is selected using the selected strategy at 407. The document may be selected by the document selector 130.”) on the prompt as the responsive documents are associated with the query (Slivkins, ¶39 “the responsive documents may contain one or more keywords that match or are otherwise associated with one or more of the terms of the query.”; Please note that one of ordinary skill in the art would recognize a prompt as a query).
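Please note, for illustration only and not as part of the claim mapping: the index-value selection characterized above (Slivkins, ¶27, ¶49), an upper confidence bound on click-through rate with the maximal index value chosen, can be sketched as follows. The function names, the exploration constant, and the statistics layout are hypothetical assumptions, not anything recited in the reference.

```python
import math

def ucb_index(mean_reward: float, pulls: int, total_pulls: int, c: float = 2.0) -> float:
    """Upper confidence bound on a strategy's click-through rate: the
    observed mean plus an exploration bonus that shrinks as the strategy
    is selected more often (cf. the index value of Slivkins ¶27)."""
    if pulls == 0:
        return float("inf")  # an untried strategy is selected first
    return mean_reward + math.sqrt(c * math.log(total_pulls) / pulls)

def select_strategy(stats: list) -> int:
    """Return the position of the strategy with the maximal index value.
    `stats` is a list of (mean_reward, times_selected) pairs."""
    total = sum(pulls for _, pulls in stats) or 1
    scores = [ucb_index(mean, pulls, total) for mean, pulls in stats]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, `select_strategy([(0.10, 50), (0.30, 5), (0.0, 0)])` returns 2, because the untried strategy receives an unbounded index and is explored first.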
Slivkins does not explicitly teach [a] an application with access to one or more of a set of LLM services; [c]… applying a machine-learning… a respective LLM service in the set of LLM services… an LLM service… [d] selecting an LLM service in the set of LLM services; [e] generating a prompt for input to the selected LLM service, wherein the prompt for the selected LLM service is generated based on a prompt template in a prompt library database associated with the selected LLM service; [f] the selected LLM service… the selected LLM service.
[a] receiving, from a user of a client device (Padgett, ¶75 “In some cases, the input option for the prompt may be a free form text box enabling the user to provide any text input”), a query (Padgett, ¶75 “the input option may permit the input of non-text prompts, including image data, voice data, video data, or other multimedia”) for an application (Padgett, ¶75 “A web page or mobile app may provide the interface.”) with access to one or more of a set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”);
[c] applying at least one machine-learning (Padgett, ¶47 “Some concepts in ML-based language models are now discussed.”) contextual … model (Padgett, ¶124 “the output or result generated by a generative AI model may be obtained through interpreting the intent and context of the prompt.”) to the set of features (Padgett, ¶54 “The encoder 52 serves to encode the embeddings 60 into feature vectors 62 that represent the latent features of the embeddings 60”) to generate a set of predicted scores as the similarity, e.g. closeness (Padgett, ¶53 “The embedding 60 represents the text segment corresponding to the token 56 in a way such that embeddings corresponding to semantically-related text are closer to each other in a vector space than embeddings corresponding to semantically-unrelated text.”), wherein each predicted score in the set of predicted scores indicates how effective as how similar (Id) a response as the target embedding that the token is being compared to (Id) from a respective LLM service in the set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”) is with respect to a desired outcome as the text segment representing the prompt, which is being compared to the target segment (Padgett, ¶53 “The embedding 60 represents the text segment corresponding to the token 56 in a way such that embeddings corresponding to semantically-related text are closer to each other in a vector space than embeddings corresponding to semantically-unrelated text.”), wherein the machine-learning (Padgett, ¶47 “Some concepts in ML-based language models are now discussed.”) contextual … model (Padgett, ¶124 “the output or result generated by a generative AI model may be obtained through interpreting the intent and context of the prompt.”) is trained using a training dataset including one or more training examples (Padgett, ¶30 “The generative AI models are typically trained using a large data set of example training 
data.”), a training example representing a previously submitted query as a preexisting input prompt (Padgett, ¶62 “The generative AI model 102 is configured to take an input prompt, typically in text form but may also possibly include images or other media inputs. The model 102 creates an output related to the input prompt.”; ¶82 “In one example, the repository of pre-existing items is the set of training data or a subset of the training data”; ¶109 “In one example, the training data set is split into two portions, a first one of which is used as training data and a second one of which is used as input prompts.”) processed by an LLM service and a known outcome as the ground truth labeling (Padgett, ¶42 “Training data may be annotated with ground truth labels (e.g. each data entry in the training dataset may be paired with a label)”) for the previously submitted query as a preexisting input prompt (Padgett, ¶62; ¶82; ¶109), the training comprising updating parameters as updating the parameters of the ML (Padgett, ¶43 “The parameters of the ML model are updated based on a difference between the generated output value and the desired target value. For example, if the value outputted by the ML model is excessively high, the parameters may be adjusted so as to lower the output value in future training iterations. An objective function is a way to quantitatively represent how close the output value is to the target value. An objective function represents a quantity ( or one or more quantities) to be optimized ( e.g., minimize a loss or maximize a reward) in order to bring the output value as close to the target value as possible. 
The goal of training the ML model typically is to minimize a loss function or maximize a reward function.”; ¶44 “a new set of hyperparameters may be determined based on the measured performance of one or more of the trained ML models, and the first step of training”; ¶45 “Backpropagation is used to adjust (also referred to as update) the value of the parameters in the ML model, with the goal of optimizing the objective function”) of the machine-learning contextual bandit model based on the one or more training examples as the known output values and the desired target values (Id);
[d] … an LLM service in the set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”)…;
[e] generating a prompt for input to the selected LLM service (Padgett, ¶59 “A computing system may generate a prompt that is provided as input to the LLM via its API”), wherein the prompt for the selected LLM service is generated based on a prompt template (Padgett, ¶124 “In some cases, this may include use of a prompt template. A prompt template may specify that prompts have a certain structure or constrained intents, or that acceptable prompts exclude certain classes of subject matter or intent, such as the production of results or outputs that are violent, pornographic, etc.”) in a prompt library database (Padgett, ¶32 “This might be more common in instances where that section of code is a commonly-accepted template or precedent, such as a standard incorporation of certain libraries or setting of initial variables or other parameters”) associated with the selected LLM service (Padgett, ¶59); and
[f] receiving, from the selected LLM service, a response generated by executing the selected LLM service on the prompt (Padgett, ¶60 “The prompt generated by the computing system is provided to the language model or LLM and the output (e.g., token sequence) generated by the language model or LLM is communicated back to the computing system.”).
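Please note, for illustration only: the training cycle quoted above from Padgett (¶43-¶45), generate an output value, measure its difference from the desired target value, and adjust parameters so the output moves closer to the target, can be sketched as a minimal gradient-descent loop. The linear scorer, learning rate, and epoch count are hypothetical assumptions for illustration.

```python
def train_linear_scorer(examples, lr=0.1, epochs=100):
    """Update the weights of a linear scorer from (features, known_outcome)
    pairs, mirroring the cycle of Padgett ¶43-¶45: generate an output value,
    compare it with the desired target value, and adjust parameters so that
    future outputs move closer to the target (squared-error objective)."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for features, target in examples:
            predicted = sum(w * x for w, x in zip(weights, features))
            error = predicted - target  # difference from the desired target
            # gradient step: lower the output when it is excessively high,
            # raise it when it is too low
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights
```

Trained on the pair ([1.0, 0.0], 1.0), the first weight converges toward 1.0 while an unused feature's weight stays at zero.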
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to have implemented the search strategies taught by Slivkins as LLM search systems as taught by Padgett, as doing so may generate better output (Padgett, ¶59 “A prompt can include one or more examples of the desired output provides the LLM with additional information to enable the LLM to better generate output according to the desired output.”). Such AI systems have significant advantages in generating responses and other types of outputs (Padgett, ¶4; ¶30).
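Please note, for illustration only and not as a characterization of either reference: the claim-1 flow as construed in the proposed combination (limitations [c]-[f]: score the feature set, select the highest-scoring LLM service, generate a prompt from that service's template, and receive its response) might be sketched as below. Every name and data structure here is hypothetical.

```python
def route_query(query: str, features: list, services: dict) -> str:
    """Hypothetical sketch of the construed claim-1 flow. `services` maps a
    service name to a scoring weight vector, a prompt template, and a
    callable standing in for that LLM service's API."""
    # [c] apply a scoring model (a linear stand-in here) to the feature set
    def predicted_score(name: str) -> float:
        return sum(w * x for w, x in zip(services[name]["weights"], features))

    # [d] select the service with the highest predicted score
    selected = max(services, key=predicted_score)

    # [e] generate a prompt from the selected service's stored template
    prompt = services[selected]["template"].format(query=query)

    # [f] receive the response generated by executing the service on the prompt
    return services[selected]["call"](prompt)
```

With two hypothetical services whose weights favor different features, the query is routed to whichever service scores higher on the supplied feature vector.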
With regard to claims 2 and 12 the proposed combination further teaches obtaining values for the set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”) for one or more performance metrics as positive/negative feedback (Slivkins, ¶55 discusses positive feedback; ¶56 discusses negative feedback), wherein the one or more performance metrics for a respective LLM service (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”) comprises at least one or a combination (Please note only one of the following is required by the claim language) of the following metrics as positive/negative feedback (Slivkins, ¶55 discusses positive feedback; ¶56 discusses negative feedback) of a respective response generated by the respective LLM service:
an expected quality as click-through rate (Slivkins, ¶21 “click-through rate is a metric that describes a percentage of times a document is selected or clicked on when it is displayed”) of the respective response as the document (Slivkins, ¶21; Please note this claim limitation has been construed to mean --a respective response--) generated by the respective LLM service as the strategy which selected the document, e.g. the LLM (Slivkins, ¶50; Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”), a latency of the respective LLM service for generating the respective response, and a cost of the respective LLM service for generating the respective response, and
wherein the LLM service (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”) is selected further based on the values for the one or more performance metrics (Slivkins, ¶57 “After a strategy has been proven successful based on its index value, the document selector 130 may “zoom in" to the strategy by replacing the strategy associated with the selected slot with a plurality of smaller strategies based on the strategy associated with the selected slot. If the strategy has been used or selected a threshold number of times, then the method 500 may continue at 511. Otherwise, the method 500 may end at 517”).
With regard to claims 3 and 13 the proposed combination further teaches generating a set of prompts for input to the set of LLM services;
receiving, from the set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”), a set of responses by executing the set of prompts (Padgett, ¶59 “A computing system may generate a prompt that is provided as input to the LLM via its API”) by the set of LLM services (Slivkins, ¶50 “A document is selected using the selected strategy at 407. The document may be selected by the document selector 130. In some implementations, the document may be randomly selected from the selected strategy”; Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”); and
obtaining the values for the one or more performance metrics from the set of responses (Slivkins, ¶54 “An indication of a selection is received at 503. The indication of a selection may be received by the document selector 130 and may be an indication of a selection made by a user to a slot containing an indicator of a document ( e.g., a link). For example, a user may have clicked on a URL in one of the slots of the results page 200”).
With regard to claims 4 and 14 the proposed combination further teaches wherein the set of features includes at least one or a combination of information obtained from the query as text features, e.g. keywords (Slivkins, ¶21 “Any number of methods or techniques for determining the similarity of documents may be used, such as using text features, queries that result in clicks on the documents, and keywords associated with the documents, for example.”; ¶23), one or more characteristics of the user as the user’s interests (Slivkins, ¶23 “A context may be a set of documents that have been ranked for higher slots in a results page. In addition, a context may further include information such as information about a user's interests or other information.”), an order status of the user with an online system, and an issue category as topic (Slivkins, ¶23 “Contexts may be selected based on a variety of document features and attributes including topics, keywords, and click-through rates, for example”) assigned to the query (Slivkins, ¶21).
With regard to claims 5 and 15 the proposed combination further teaches wherein obtaining machine-learning (Please see the objection above regarding claim interpretation) the contextual information further comprises:
applying a machine-learning embedding model to a text (Slivkins, ¶21 “using text features”) or image as image data (Slivkins, ¶16 “The search corpus 163 may comprise an index of documents such as webpages, advertisements, product descriptions, image data, video data, map data, etc.”) obtained from the query (Slivkins, ¶21 “Any number of methods or techniques for determining the similarity of documents may be used, such as using text features, queries that result in clicks on the documents, and keywords associated with the documents, for example. The similarity data 135 may be generated by the document selector 130 or may be provided to the document selector 130 by the search engine 140, for example”) to generate a query embedding mapping the query in a latent space (Padgett, ¶54 “The encoder 52 serves to encode the embeddings 60 into feature vectors 62 that represent the latent features of the embeddings 60. The encoder 52 may encode positional information (i.e., information about the sequence of the input) in the feature vectors 62”), wherein the set of features includes the query embedding (Padgett, ¶52 “Tokenization, in the context of language models and NLP, refers to the process of parsing textual input ( e.g., a character, a word, a phrase, a sentence, a paragraph, etc.) into a sequence of shorter segments that are converted to numerical representations referred to as tokens ( or "compute tokens").”).
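Please note, for illustration only: an embedding that maps a query into a latent space in which semantically related text lands closer together (Padgett, ¶53-¶54) can be mocked up as below. The bucket-hash "model" is a toy stand-in for a trained encoder; all names here are hypothetical assumptions.

```python
import math

def embed(text: str, dim: int = 8) -> list:
    """Toy stand-in for a machine-learning embedding model: bucket each token
    by a simple hash so that texts sharing tokens land closer together in the
    vector space. A real system would use a trained encoder."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(ch) for ch in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit-normalized query embedding

def cosine(a: list, b: list) -> float:
    """Cosine similarity of two unit-normalized embeddings."""
    return sum(x * y for x, y in zip(a, b))
```

Under this sketch, queries that share tokens score higher: the similarity of "reset my password" to "password reset help" exceeds its similarity to an unrelated query.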
With regard to claims 6 and 16 the proposed combination further teaches wherein when the query is a user support query (Slivkins, ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110”; Please note that the content of the query, e.g. that it is a ‘support’ query has been identified as non-functional descriptive material describing an intended use for the claimed query. The content of the query does not impose a functional impact on how the search operation is performed, how the feedback is obtained, or the rest of the claimed device), each predicted score as index values (Slivkins, ¶27 “each strategy may have an associated index value.”) from the machine-learning (Please note that while Slivkins does not explicitly use the term ‘machine-learning’, one of ordinary skill in the art would recognize the multi-armed bandit recited in ¶25 as an ML algorithm) contextual as context (Slivkins, ¶23) bandit model (Slivkins, ¶25 “In some implementations, the document selection algorithm may be a "multi-armed bandit" algorithm.”) indicates how effective as the upper confidence bound of a click-through rate (Slivkins, ¶27 “The index value associated with a strategy may represent an upper confidence bound of a clickthrough rate of a document randomly selected from the strategy for a particular context S consistent with the strategy.
The index value may be a measure of the number of times that the strategy has been selected by a document selection algorithm, as well as how many times the strategy has been selected by a user.”) the response as the document selected (Id) from the respective LLM service as the strategy that selected the document (Slivkins, ¶27; Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”) is for achieving the desired outcome of user satisfaction as positive feedback (Slivkins, ¶33 “The document selector 130 may use the user interaction data to provide positive and negative feedback to the strategies used by the instances of the document selection algorithm. For the case where the user interaction data indicates that a document indicated by one of the slots of the results page 200 was selected, positive feedback may be provided to the strategy used by a document selection algorithm to select the indicated document”) after the user support query has been addressed (Slivkins, ¶54 “An indication of a selection is received at 503. The indication of a selection may be received by the document selector 130 and may be an indication of a selection made by a user to a slot containing an indicator of a document ( e.g., a link). For example, a user may have clicked on a URL in one of the slots of the results page 200.”).
With regard to claims 8 and 18 the proposed combination further teaches wherein the machine-learning contextual bandit model is trained by a process (Please note this claim limitation has been construed as further defining the training of the parent claim, see the 112b above) of:
obtaining a training dataset including one or more training examples as context tree Tc, which may be from a previous instance (Slivkins, ¶59; ¶28; Please note this claim limitation has been construed as referring to the same limitation in the parent claim, please see the 112b above), a training example (Id; Please note this claim limitation has been construed as referring to the same limitation in the parent claim, please see the 112b above. The prior art mapping is as put forth in the parent claim for the same limitation) obtained from logs as the tracked similarity measure between the item and the search for the item (Padgett, ¶78 “The comparison may be carried out between the result and each item in the repository in turn. The system may track the highest similarity measure and the corresponding item, in a search for the item from the repository that is most similar to the result. … a search-based operation in which the result is used as a search query and the repository returns one or more items that are determined to be closest to the result”) for a previously submitted query as the query to which the document response is being evaluated, e.g. the query to which the document is identified as a positive or negative response for (¶28; ¶16) processed by the LLM service as a first strategy (Slivkins, ¶57) which may be a LLM (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”; Please note this claim limitation has been construed as referring to the same limitation in the parent claim, please see the 112b above. 
The prior art mapping is as put forth in the parent claim for the same limitation) or another as a second strategy (Slivkins, ¶57) LLM service as the second strategy may be a LLM (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”), the training example including a training set of features as context tree Tc (Slivkins, ¶59 “As described previously, a strategy may include a subtree u of the context tree Tc and a subtree u' of the document tree TD.”) and a known outcome as the determination of certainty (Slivkins, ¶57) which may be based on user selection of the document in response to the query in the previous round (Slivkins, ¶25; Please note this claim limitation has been construed as referring to the same limitation in the parent claim, please see the 112b above.) for the previously submitted query as the query to which the document response is being evaluated, e.g. the query to which the document is identified as a positive or negative response for (Slivkins, ¶28; ¶16; Please note this claim limitation has been construed as referring to the same limitation in the parent claim, please see the 112b above. The prior art mapping is as put forth in the parent claim for the same limitation); and
training one or more parameters as strategies are generated based on the combination of the subtrees of u and u' (Slivkins, ¶59) of the machine-learning contextual bandit model as the multi-armed bandit (Slivkins, ¶23; ¶25; ¶55) on the one or more training examples of the training dataset as context tree Tc, which may be from a previous instance (Slivkins, ¶59; ¶28; Please note this claim limitation has been construed as referring to the same limitation in the parent claim, please see the 112b above.).
With regard to claims 9 and 19 the proposed combination further teaches obtaining an indication that the user performed the desired outcome after receiving the response;
generating a training as the data is used to generate the second set of strategies (Slivkins, ¶59) example including the set of features as context tree Tc (Slivkins, ¶59 “As described previously, a strategy may include a subtree u of the context tree Tc and a subtree u' of the document tree TD.”) and a label as positive feedback (Slivkins, ¶55) indicating the desired outcome was performed as the document being selected (Slivkins, ¶57 “In some implementations, the determination may be based on whether the strategy has been used or selected more than a threshold number of times.”); and
training as removing strategies (¶58) or generating second strategies (¶59) one or more parameters as strategies are generated based on the combination of the subtrees of u and u' (¶59) of the machine-learning (Slivkins, ¶25) contextual as context (Slivkins, ¶23) bandit model as the multi-armed bandit (Slivkins, ¶25; ¶55) on the training example as the evaluated results (Slivkins, ¶57; ¶59).
With regard to claim 10 the proposed combination further teaches [g] prompting the selected LLM (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”) service to process subsequent queries as receiving queries, e.g. a plurality of queries (Slivkins, ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110”) from the user as the user (Slivkins, ¶16) of the client device as the client 110 (Slivkins, ¶16).
With regard to claim 11 Slivkins teaches A non-transitory computer-readable storage medium storing computer instructions (Slivkins, ¶63 “Computer-executable instructions, such as program modules, being executed by a computer may be used.”), wherein the computer instructions, when executed by one or more processors (¶62 “multiprocessor systems, microprocessor-based systems”), cause the one or more processors to perform operations comprising:
[a] receiving, from a user as the user (Slivkins, ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110”) of a client device as the client 110 (Slivkins, ¶16), a query as the received queries (Slivkins, ¶16) for (Please note that what the query is for is an intended use of query and does not impose a functional limitation on the claimed device) an application as the search engine 140 (Slivkins, ¶16) [[with access to one or more of a set of LLM services]];
[b] obtaining contextual information (Slivkins, ¶23 “A context may be a set of documents that have been ranked for higher slots in a results page. In”) as a set of features (Slivkins, ¶23 “Contexts may be selected based on a variety of document features and attributes including topics, keywords, and click-through rates, for example”), the set of features related to the query (Slivkins, ¶21 “Any number of methods or techniques for determining the similarity of documents may be used, such as using text features, queries that result in clicks on the documents, and keywords associated with the documents, for example.”) or the user (Slivkins, ¶23 “In addition, a context may further include information such as information about a user's interests or other information.”) of the client device as client 110 (Slivkins, ¶15);
[c] applying a machine-learning (Please note that while Slivkins does not explicitly use the term ‘machine-learning’, one of ordinary skill in the art would recognize the multi-armed bandit recited in ¶25 as a ML algorithm) contextual as context (Slivkins, ¶23) bandit model (Slivkins, ¶25 “In some implementations, the document selection algorithm may be a "multi-armed bandit" algorithm.”) to the set of features (Slivkins, ¶23 “Contexts may be selected based on a variety of document features and attributes including topics, keywords, and click-through rates, for example”) to generate a set of predicted scores as index values (Slivkins, ¶27 “each strategy may have an associated index value.”), wherein each predicted score in the set of predicted scores indicates how effective as the upper confidence bound of a click-through rate (¶27 “The index value associated with a strategy may represent an upper confidence bound of a clickthrough rate of a document randomly selected from the strategy for a particular context S consistent with the strategy. The index value may be a measure of the number of times that the strategy has been selected by a document selection algorithm, as well as how many times the strategy has been selected by a user.”) a response as the document selected (Id) from a respective [[LLM service]] as the strategy that selected the document (Id) in the set of [[LLM services]] as the set of strategies (Slivkins, ¶49) is with respect to a desired outcome as positive feedback (Slivkins, ¶33 “The document selector 130 may use the user interaction data to provide positive and negative feedback to the strategies used by the instances of the document selection algorithm. 
For the case where the user interaction data indicates that a document indicated by one of the slots of the results page 200 was selected, positive feedback may be provided to the strategy used by a document selection algorithm to select the indicated document”), wherein the machine-learning contextual bandit model (Slivkins, ¶25 “In some implementations, the document selection algorithm may be a "multi-armed bandit" algorithm.”; ¶23) is trained as removing strategies (¶58 “The strategy is removed from the plurality of strategies at 511. The strategy associated with the selected slot may be removed from the plurality of strategies by the document selector 130”), generating second strategies (¶59 “The second plurality of strategies may be generated by the document selector 130 using the strategy associated with the selected slot. In some implementations, the document selector 130 may generate the second plurality… of strategies by generating a strategy for one or more combinations of subtrees of u and subtrees of u' from the removed strategy.”), or adjusting values associated with a strategy (¶33 “the document selector 130 may provide the positive feedback by adjusting or increasing the index value associated with the strategy”) using a training dataset including one or more training examples as context tree Tc, which may be from a previous instance (Slivkins, ¶59 “As described previously, a strategy may include a subtree u of the context tree Tc and a subtree u' of the document tree TD.”; ¶28 “By executing the document selecting algorithms sequentially, each instance of the document selection algorithm may consider the documents selected by previous instances of the document selection algorithm when selecting a document.”), a training example representing (Please note that the claim does not require that the training example include the previous query, merely that it represents the previous query) a previously submitted query as the query to which the previous 
instance of documents are responsive to (Slivkins, ¶28; ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110. The search engine 140 may search for document and other media that is responsive to the query by searching a search corpus 163 using the received query.”; ¶20 “The document selector 130 may select one or more documents for placement on the results page 200 from a set of documents that is responsive to a received query. The set of documents may have been determined by the”) processed by an [[LLM service]] as the determination of certainty (Slivkins, ¶57) which may be based on user selection of the document in response to the query in the previous round (Slivkins, ¶25 “For each round, an instance of the document selection algorithm may select a document from the set of responsive documents for a slot. If a user selects an indication of a document provided for a slot on the results page 200, then the document selection for the slot may receive a payout or some other positive feedback”) for the previously submitted query as the query to which the document response is being evaluated, e.g. the query to which the document is identified as a positive or negative response for (¶28 “By executing the document selecting algorithms sequentially, each instance of the document selection algorithm may consider the documents selected by previous instances of the document selection algorithm when selecting a document. As described above, it may be assumed that a user reviews indicated documents on the results page 200 beginning with the top most slot (i.e., slot 201). When a user views a document indicated by a slot, it may also be assumed that the user rejected the documents indicated by the preceding slots.”; ¶16 “receive queries from users using clients”), the training as removing strategies (¶58 “The strategy is removed from the plurality of strategies at 511. 
The strategy associated with the selected slot may be removed from the plurality of strategies by the document selector 130”), generating second strategies (¶59 “The second plurality of strategies may be generated by the document selector 130 using the strategy associated with the selected slot. In some implementations, the document selector 130 may generate the second plurality… of strategies by generating a strategy for one or more combinations of subtrees of u and subtrees of u' from the removed strategy.”), or adjusting values associated with a strategy (¶33 “the document selector 130 may provide the positive feedback by adjusting or increasing the index value associated with the strategy”) comprising updating parameters as strategies are generated based on the combination of the subtrees of u and u' (Slivkins, ¶59) of the machine-learning contextual bandit model as the multi-armed bandit (Slivkins, ¶23; ¶25; ¶55) based on the one or more training examples as context tree Tc, which may be from a previous instance (Slivkins, ¶59; ¶28);
[d] selecting an [[LLM service]] as selecting a strategy (Slivkins, ¶49 “A strategy is selected at 405. The strategy may be selected from the plurality of strategies by the document selector 130”) in the set of [[LLM services]] as the plurality of strategies (Slivkins, ¶49) based at least on the set of predicted scores as index values (Slivkins, ¶49 “the selected strategy may have a maximal index value among the strategies that include the received context”) from the machine-learning contextual (Slivkins, ¶23 “context”) bandit model (Slivkins, ¶25 “multi-armed bandit”);
[e] [[generating a prompt for input to the selected LLM service, wherein the prompt for the selected LLM service is generated based on a prompt template in a prompt library database associated with the selected LLM service]];
[f] receiving, from the selected [[LLM service]], a response as a document (Slivkins, ¶41 “A document is selected for each slot of the results page at 307.”) generated by executing the selected [[LLM service]] as the selected strategy 407 as part of document selector 130 (¶41 “The documents may be selected by the document selector 130 from the plurality of responsive documents. In some implementations, each document may be selected by an instance of a document selection algorithm such as multiarmed bandits algorithm.”; ¶50 “A document is selected using the selected strategy at 407. The document may be selected by the document selector 130.”) on the prompt as the responsive documents are associated with the query (Slivkins, ¶39 “the responsive documents may contain one or more keywords that match or are otherwise associated with one or more of the terms of the query.”; Please note that one of ordinary skill in the art would recognize a prompt as a query); and
[g] prompting the selected [[LLM service]] to process subsequent queries as receiving queries, e.g. a plurality of queries (Slivkins, ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110”) from the user as the user (Slivkins, ¶16) of the client device as the client 110 (Slivkins, ¶16).
Slivkins does not explicitly teach [a] an application with access to one or more of a set of LLM services; [c]… applying a machine-learning… a respective LLM service in the set of LLM services… an LLM service… [d] selecting an LLM service in the set of LLM services; [e] generating a prompt for input to the selected LLM service, wherein the prompt for the selected LLM service is generated based on a prompt template in a prompt library database associated with the selected LLM service; [f] the selected LLM service… the selected LLM service; [g] the selected LLM service.
[a] receiving, from a user of a client device (Padgett, ¶75 “In some cases, the input option for the prompt may be a free form text box enabling the user to provide any text input”), a query (Padgett, ¶75 “the input option may permit the input of non-text prompts, including image data, voice data, video data, or other multimedia”) for an application (Padgett, ¶75 “A web page or mobile app may provide the interface.”) with access to one or more of a set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”);
[c] applying at least one machine-learning (Padgett, ¶47 “Some concepts in ML-based language models are now discussed.”) contextual … model (Padgett, ¶124 “the output or result generated by a generative AI model may be obtained through interpreting the intent and context of the prompt.”) to the set of features (Padgett, ¶54 “The encoder 52 serves to encode the embeddings 60 into feature vectors 62 that represent the latent features of the embeddings 60”) to generate a set of predicted scores as the similarity, e.g. closeness (Padgett, ¶53 “The embedding 60 represents the text segment corresponding to the token 56 in a way such that embeddings corresponding to semantically-related text are closer to each other in a vector space than embeddings corresponding to semantically-unrelated text.”), wherein each predicted score in the set of predicted scores indicates how effective as how similar (Id) a response as the target embedding that the token is being compared to (Id) from a respective LLM service in the set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”) is with respect to a desired outcome as the text segment representing the prompt, which is being compared to the target segment (Padgett, ¶53 “The embedding 60 represents the text segment corresponding to the token 56 in a way such that embeddings corresponding to semantically-related text are closer to each other in a vector space than embeddings corresponding to semantically-unrelated text.”), wherein the machine-learning (Padgett, ¶47 “Some concepts in ML-based language models are now discussed.”) contextual … model (Padgett, ¶124 “the output or result generated by a generative AI model may be obtained through interpreting the intent and context of the prompt.”) is trained using a training dataset including one or more training examples (Padgett, ¶30 “The generative AI models are typically trained using a large data set of example training 
data.”), a training example representing a previously submitted query as a preexisting input prompt (Padgett, ¶62 “The generative AI model 102 is configured to take an input prompt, typically in text form but may also possibly include images or other media inputs. The model 102 creates an output related to the input prompt.”; ¶82 “In one example, the repository of pre-existing items is the set of training data or a subset of the training data”; ¶109 “In one example, the training data set is split into two portions, a first one of which is used as training data and a second one of which is used as input prompts.”) processed by an LLM service and a known outcome as the ground truth labeling (Padgett, ¶42 “Training data may be annotated with ground truth labels (e.g. each data entry in the training dataset may be paired with a label)”) for the previously submitted query as a preexisting input prompt (Padgett, ¶62; ¶82; ¶109), the training comprising updating parameters as updating the parameters of the ML (¶43 “The parameters of the ML model are updated based on a difference between the generated output value and the desired target value. For example, if the value outputted by the ML model is excessively high, the parameters may be adjusted so as to lower the output value in future training iterations. An objective function is a way to quantitatively represent how close the output value is to the target value. An objective function represents a quantity (or one or more quantities) to be optimized (e.g., minimize a loss or maximize a reward) in order to bring the output value as close to the target value as possible. 
The goal of training the ML model typically is to minimize a loss function or maximize a reward function.”; ¶44 “a new set of hyperparameters may be determined based on the measured performance of one or more of the trained ML models, and the first step of training”; ¶45 “Backpropagation is used to adjust (also referred to as update) the value of the parameters in the ML model, with the goal of optimizing the objective function”) of the machine-learning contextual bandit model based on the one or more training examples as the known output values and the desired target values (Id);
[d] … an LLM service in the set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”)…;
[e] generating a prompt for input to the selected LLM service (Padgett, ¶59 “A computing system may generate a prompt that is provided as input to the LLM via its API”), wherein the prompt for the selected LLM service is generated based on a prompt template (Padgett, ¶124 “In some cases, this may include use of a prompt template. A prompt template may specify that prompts have a certain structure or constrained intents, or that acceptable prompts exclude certain classes of subject matter or intent, such as the production of results or outputs that are violent, pornographic, etc.”) in a prompt library database (¶32 “This might be more common in instances where that section of code is a commonly-accepted template or precedent, such as a standard incorporation of certain libraries or setting of initial variables or other parameters”) associated with the selected LLM service (Padgett, ¶59); and
[f] receiving, from the selected LLM service, a response generated by executing the selected LLM service on the prompt (Padgett, ¶60 “The prompt generated by the computing system is provided to the language model or LLM and the output (e.g., token sequence) generated by the language model or LLM is communicated back to the computing system.”); and
[g] the selected LLM service (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have implemented the search strategies taught by Slivkins as LLM search systems as taught by Padgett as it may generate better output (Padgett, ¶59 “A prompt can include one or more examples of the desired output provides the LLM with additional information to enable the LLM to better generate output according to the desired output.”). Such AI systems have significant advantages in generating responses and other types of outputs (Padgett, ¶4; ¶30).
With regard to claim 20 Slivkins teaches A computer system, comprising one or more processors (¶62 “multiprocessor systems, microprocessor-based systems”); and
a non-transitory computer-readable storage medium storing computer instructions (Slivkins, ¶63 “Computer-executable instructions, such as program modules, being executed by a computer may be used.”), wherein the computer instructions, when executed by the one or more processors, cause the one or more processors to perform operations as perform particular tasks (Slivkins, ¶63) comprising:
[a] receiving, from a user as the user (Slivkins, ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110”) of a client device as the client 110 (Slivkins, ¶16), a query as the received queries (Slivkins, ¶16) for (Please note that what the query is for is an intended use of query and does not impose a functional limitation on the claimed device) an application as the search engine 140 (Slivkins, ¶16) [[with access to one or more of a set of LLM services]];
[b] obtaining contextual information (Slivkins, ¶23 “A context may be a set of documents that have been ranked for higher slots in a results page. In”) as a set of features (Slivkins, ¶23 “Contexts may be selected based on a variety of document features and attributes including topics, keywords, and click-through rates, for example”), the set of features related to the query (Slivkins, ¶21 “Any number of methods or techniques for determining the similarity of documents may be used, such as using text features, queries that result in clicks on the documents, and keywords associated with the documents, for example.”) or the user (Slivkins, ¶23 “In addition, a context may further include information such as information about a user's interests or other information.”) of the client device as client 110 (Slivkins, ¶15);
[c] applying a machine-learning (Please note that while Slivkins does not explicitly use the term ‘machine-learning’, one of ordinary skill in the art would recognize the multi-armed bandit recited in ¶25 as a ML algorithm) contextual as context (Slivkins, ¶23) bandit model (Slivkins, ¶25 “In some implementations, the document selection algorithm may be a "multi-armed bandit" algorithm.”) to the set of features (Slivkins, ¶23 “Contexts may be selected based on a variety of document features and attributes including topics, keywords, and click-through rates, for example”) to generate a set of predicted scores as index values (Slivkins, ¶27 “each strategy may have an associated index value.”), wherein each predicted score in the set of predicted scores indicates how effective as the upper confidence bound of a click-through rate (¶27 “The index value associated with a strategy may represent an upper confidence bound of a clickthrough rate of a document randomly selected from the strategy for a particular context S consistent with the strategy. The index value may be a measure of the number of times that the strategy has been selected by a document selection algorithm, as well as how many times the strategy has been selected by a user.”) a response as the document selected (Id) from a respective [[LLM service]] as the strategy that selected the document (Id) in the set of [[LLM services]] as the set of strategies (Slivkins, ¶49) is with respect to a desired outcome as positive feedback (Slivkins, ¶33 “The document selector 130 may use the user interaction data to provide positive and negative feedback to the strategies used by the instances of the document selection algorithm. 
For the case where the user interaction data indicates that a document indicated by one of the slots of the results page 200 was selected, positive feedback may be provided to the strategy used by a document selection algorithm to select the indicated document”), wherein the machine-learning contextual bandit model (Slivkins, ¶25 “In some implementations, the document selection algorithm may be a "multi-armed bandit" algorithm.”; ¶23) is trained as removing strategies (¶58 “The strategy is removed from the plurality of strategies at 511. The strategy associated with the selected slot may be removed from the plurality of strategies by the document selector 130”), generating second strategies (¶59 “The second plurality of strategies may be generated by the document selector 130 using the strategy associated with the selected slot. In some implementations, the document selector 130 may generate the second plurality… of strategies by generating a strategy for one or more combinations of subtrees of u and subtrees of u' from the removed strategy.”), or adjusting values associated with a strategy (¶33 “the document selector 130 may provide the positive feedback by adjusting or increasing the index value associated with the strategy”) using a training dataset including one or more training examples as context tree Tc, which may be from a previous instance (Slivkins, ¶59 “As described previously, a strategy may include a subtree u of the context tree Tc and a subtree u' of the document tree TD.”; ¶28 “By executing the document selecting algorithms sequentially, each instance of the document selection algorithm may consider the documents selected by previous instances of the document selection algorithm when selecting a document.”), a training example representing (Please note that the claim does not require that the training example include the previous query, merely that it represents the previous query) a previously submitted query as the query to which the previous 
instance of documents are responsive to (Slivkins, ¶28; ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110. The search engine 140 may search for document and other media that is responsive to the query by searching a search corpus 163 using the received query.”; ¶20 “The document selector 130 may select one or more documents for placement on the results page 200 from a set of documents that is responsive to a received query. The set of documents may have been determined by the”) processed by an [[LLM service]] as the determination of certainty (Slivkins, ¶57) which may be based on user selection of the document in response to the query in the previous round (Slivkins, ¶25 “For each round, an instance of the document selection algorithm may select a document from the set of responsive documents for a slot. If a user selects an indication of a document provided for a slot on the results page 200, then the document selection for the slot may receive a payout or some other positive feedback”) for the previously submitted query as the query to which the document response is being evaluated, e.g. the query to which the document is identified as a positive or negative response for (¶28 “By executing the document selecting algorithms sequentially, each instance of the document selection algorithm may consider the documents selected by previous instances of the document selection algorithm when selecting a document. As described above, it may be assumed that a user reviews indicated documents on the results page 200 beginning with the top most slot (i.e., slot 201). When a user views a document indicated by a slot, it may also be assumed that the user rejected the documents indicated by the preceding slots.”; ¶16 “receive queries from users using clients”), the training as removing strategies (¶58 “The strategy is removed from the plurality of strategies at 511. 
The strategy associated with the selected slot may be removed from the plurality of strategies by the document selector 130”), generating second strategies (¶59 “The second plurality of strategies may be generated by the document selector 130 using the strategy associated with the selected slot. In some implementations, the document selector 130 may generate the second plurality… of strategies by generating a strategy for one or more combinations of subtrees of u and subtrees of u' from the removed strategy.”), or adjusting values associated with a strategy (¶33 “the document selector 130 may provide the positive feedback by adjusting or increasing the index value associated with the strategy”) comprising updating parameters as strategies are generated based on the combination of the subtrees of u and u' (Slivkins, ¶59) of the machine-learning contextual bandit model as the multi-armed bandit (Slivkins, ¶23; ¶25; ¶55) based on the one or more training examples as context tree Tc, which may be from a previous instance (Slivkins, ¶59; ¶28);
[d] selecting an [[LLM service]] as selecting a strategy (Slivkins, ¶49 “A strategy is selected at 405. The strategy may be selected from the plurality of strategies by the document selector 130”) in the set of [[LLM services]] as the plurality of strategies (Slivkins, ¶49) based at least on the set of predicted scores as index values (Slivkins, ¶49 “the selected strategy may have a maximal index value among the strategies that include the received context”) from the machine-learning contextual (Slivkins, ¶23 “context”) bandit model (Slivkins, ¶25 “multi-armed bandit”);
[e] …
[f] receiving, from the selected … as a document (Slivkins, ¶41 “A document is selected for each slot of the results page at 307.”) generated by executing the selected … as the selected strategy 407 as part of document selector 130 (¶41 “The documents may be selected by the document selector 130 from the plurality of responsive documents. In some implementations, each document may be selected by an instance of a document selection algorithm such as multiarmed bandits algorithm.”; ¶50 “A document is selected using the selected strategy at 407. The document may be selected by the document selector 130.”) on the prompt as the responsive documents are associated with the query (Slivkins, ¶39 “the responsive documents may contain one or more keywords that match or are otherwise associated with one or more of the terms of the query.”; Please note that one of ordinary skill in the art would recognize a prompt as a query); and
[g] prompting the selected … as receiving queries, e.g. a plurality of queries (Slivkins, ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110”) from the user as the user (Slivkins, ¶16) of the client device as the client 110 (Slivkins, ¶16).
Slivkins does not explicitly teach [a] an application with access to one or more of a set of LLM services; [c]… applying a machine-learning… a respective LLM service in the set of LLM services… an LLM service… [d] selecting an LLM service in the set of LLM services; [e] generating a prompt for input to the selected LLM service, wherein the prompt for the selected LLM service is generated based on a prompt template in a prompt library database associated with the selected LLM service; [f] the selected LLM service… the selected LLM service [g] the selected LLM service.
[a] receiving, from a user of a client device (Padgett, ¶75 “In some cases, the input option for the prompt may be a free form text box enabling the user to provide any text input”), a query (Padgett, ¶75 “the input option may permit the input of non-text prompts, including image data, voice data, video data, or other multimedia”) for an application (Padgett, ¶75 “A web page or mobile app may provide the interface.”) with access to one or more of a set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”);
[c] applying at least one machine-learning (Padgett, ¶47 “Some concepts in ML-based language models are now discussed.”) contextual … model (Padgett, ¶124 “the output or result generated by a generative AI model may be obtained through interpreting the intent and context of the prompt.”) to the set of features (Padgett, ¶54 “The encoder 52 serves to encode the embeddings 60 into feature vectors 62 that represent the latent features of the embeddings 60”) to generate a set of predicted scores as the similarity, e.g. closeness (Padgett, ¶53 “The embedding 60 represents the text segment corresponding to the token 56 in a way such that embeddings corresponding to semantically-related text are closer to each other in a vector space than embeddings corresponding to semantically-unrelated text.”), wherein each predicted score in the set of predicted scores indicates how effective as how similar (Id) a response as the target embedding that the token is being compared to (Id) from a respective LLM service in the set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”) is with respect to a desired outcome as the text segment representing the prompt, which is being compared to the target segment (Padgett, ¶53 “The embedding 60 represents the text segment corresponding to the token 56 in a way such that embeddings corresponding to semantically-related text are closer to each other in a vector space than embeddings corresponding to semantically-unrelated text.”), wherein the machine-learning (Padgett, ¶47 “Some concepts in ML-based language models are now discussed.”) contextual … model (Padgett, ¶124 “the output or result generated by a generative AI model may be obtained through interpreting the intent and context of the prompt.”) is trained using a training dataset including one or more training examples (YYY, ¶30 “The generative AI models are typically trained using a large data set of example training 
data.”), a training example representing a previously submitted query as a preexisting input prompt (YYY, ¶62 “The generative AI model 102 is configured to take an input prompt, typically in text form but may also possibly include images or other media inputs. The model 102 creates an output related to the input prompt.”; ¶82 “In one example, the repository of pre-existing items is the set of training data or a subset of the training data”; ¶109 “In one example, the training data set is split into two portions, a first one of which is used as training data and a second one of which is used as input prompts.”) processed by an LLM service and a known outcome as the ground truth labeling (YYY, ¶42 “Training data may be annotated with ground truth labels (e.g. each data entry in the training dataset may be paired with a label)”) for the previously submitted query as a preexisting input prompt (YYY, ¶62; ¶82; ¶109), the training comprising updating parameters as updating the parameters of the ML (¶43 “The parameters of the ML model are updated based on a difference between the generated output value and the desired target value. For example, if the value outputted by the ML model is excessively high, the parameters may be adjusted so as to lower the output value in future training iterations. An objective function is a way to quantitatively represent how close the output value is to the target value. An objective function represents a quantity ( or one or more quantities) to be optimized ( e.g., minimize a loss or maximize a reward) in order to bring the output value as close to the target value as possible. 
The goal of training the ML model typically is to minimize a loss function or maximize a reward function.”; ¶44 “a new set of hyperparameters may be determined based on the measured performance of one or more of the trained ML models, and the first step of training”; ¶45 “Backpropagation is used to adjust (also referred to as update) the value of the parameters in the ML model, with the goal of optimizing the objective function”) of the machine-learning contextual bandit model based on the one or more training examples as the known output values and the desired target values (Id);
[d] … an LLM service in the set of LLM services (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”)…;
[e] generating a prompt for input to the selected LLM service (Padgett, ¶59 “A computing system may generate a prompt that is provided as input to the LLM via its API”), wherein the prompt for the selected LLM service is generated based on a prompt template (Padgett, ¶124 “In some cases, this may include use of a prompt template. A prompt template may specify that prompts have a certain structure or constrained intents, or that acceptable prompts exclude certain classes of subject matter or intent, such as the production of results or outputs that are violent, pornographic, etc.”) in a prompt library database (¶32 “This might be more common in instances where that section of code is a commonly-accepted template or precedent, such as a standard incorporation of certain libraries or setting of initial variables or other parameters”) associated with the selected LLM service (Padgett, ¶59); and
[f] receiving, from the selected LLM service, a response generated by executing the selected LLM service on the prompt (Padgett, ¶60 “The prompt generated by the computing system is provided to the language model or LLM and the output (e.g., token sequence) generated by the language model or LLM is communicated back to the computing system.”); and
[g] the selected LLM service (Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have implemented the search strategies taught by Slivkins as LLM search systems as taught by Padgett as it may generate better output (Padgett, ¶59 “A prompt can include one or more examples of the desired output provides the LLM with additional information to enable the LLM to better generate output according to the desired output.”). Such AI systems have significant advantages in generating responses and other types of outputs (Padgett, ¶4; ¶30).
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Slivkins in view of Padgett and Gupta [2022/0222728].
With regard to claims 7 and 17, the proposed combination further teaches wherein when the query (Slivkins, ¶16 “The search engine 140 may be configured to receive queries from users using clients such as the client 110”) includes an opportunity to respond with a message for a … (Slivkins, ¶41 “A document is selected for each slot of the results page at 307.”), each predicted score as index values (Slivkins, ¶27 “each strategy may have an associated index value.”) indicates how effective as the upper confidence bound of a click-through rate (Slivkins, ¶27 “The index value associated with a strategy may represent an upper confidence bound of a clickthrough rate of a document randomly selected from the strategy for a particular context S consistent with the strategy. The index value may be a measure of the number of times that the strategy has been selected by a document selection algorithm, as well as how many times the strategy has been selected by a user.”) the response as the document selected (Id) from the respective LLM service as the strategy that selected the document (Slivkins, ¶27; Padgett, ¶28; ¶47; “unless stated otherwise, "language model" encompasses LLMs.”; ¶48 “LLM”; ¶49 “GPT”) is for the desired outcome as positive feedback (Slivkins, ¶33 “The document selector 130 may use the user interaction data to provide positive and negative feedback to the strategies used by the instances of the document selection algorithm. For the case where the user interaction data indicates that a document indicated by one of the slots of the results page 200 was selected, positive feedback may be provided to the strategy used by a document selection algorithm to select the indicated document”) of the … as monitoring the user interaction, e.g. 
if the user clicks the result (Slivkins, ¶19 “After the generated results page 200 is provided to a user by the search engine 140, interactions between the user and the results page 200 may be monitored and stored as interaction data 165.”; ¶32 “After receiving the generated results page 200, the document selector 130 may receive user interaction data based on the user's interaction with the results page 200. For example, the user may select an indicated document from one of the slots of the results page 200,”).
Slivkins does not explicitly teach that the result is a sponsored item, or that the monitored interaction is the user adding the sponsored item to a user’s order.
Gupta teaches wherein when the query includes an opportunity to respond with a message for a sponsored item as sponsored items (Gupta, ¶29 “execute the models to determine item recommendations for items (e.g., sponsored items and non-sponsored items) to display to the customer”; Please note this claim limitation has been construed in light of Paragraph [19] of the original specification which recites “An "item", as used herein, means a good or product that can be provided to the customer through the online system 140.”), …each predicted score as the similarity score generated based on feature vectors (Gupta, ¶31 “the tensors are generated based on semantic similarities between the items and sponsored items.”; ¶28; ¶33) indicates how effective the response … is for the desired outcome of the user adding the sponsored item to a user’s order as clickthrough rates, such as the item being added to a shopping cart (Gupta, ¶30 “For example, the user session data may identify item impressions, item clicks, items added to an online shopping cart, conversions, clickthrough rates, item recommendations viewed, and/or item recommendations clicked during an ongoing browsing session (e.g., the user data identifies real-time events).”).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have implemented the search result system taught by the proposed combination to include sponsored items as taught by Gupta, as it yields the predictable results of providing items that the customer finds interesting, thereby increasing the relevance and revenue of the website hosting the items (Gupta, ¶2 “in some examples, the item recommendation systems may provide recommendations for items that the customer finds irrelevant or is not interested in at positions that the user is most likely to interact with, losing out on revenue that may have been collected with other more relevant items recommended in those positions.”). One of ordinary skill in the art would recognize that the system taught by Slivkins ensures that the produced results are relevant to the user and to what the user is looking for, and would be a useful system for retailers to use to promote sales of items to customers as it ensures the items are relevant (Gupta, ¶3), which may maximize revenue (Gupta, ¶3). Within the proposed combination, the results returned to the user may include the sponsored items taught by Gupta, as the means of determining relevance within the references (Gupta and Slivkins) is substantially similar (e.g. feature vector similarity in an embedded space, as determined by an ML model).
Response to Arguments
Applicant's arguments filed February 5, 2026 have been fully considered but they are not persuasive.
With regard to claim 1, applicant argues that Slivkins does not teach or suggest a machine-learning contextual bandit model trained on previously submitted queries processed by LLM services. Specifically, applicant argues that Slivkins teaches a multi-armed bandit for selecting document selection strategies, not for selecting among LLM services.
In response, it is noted that one of ordinary skill in the art would recognize the multi-armed bandit as a specific form of machine learning, one in which each ‘arm’ (within the prior art, Slivkins refers to the arms as ‘strategies’) is an ML algorithm. The multi-armed bandit selects the best strategy (e.g. the best ML algorithm ‘arm’) to perform the desired operation (e.g. the search operation within Slivkins). Within the proposed combination, the multi-armed bandit is modified to include LLM models (as taught by Padgett), as these are known machine learning models that one of ordinary skill in the art would reasonably know how to implement within the multi-armed bandit structure.
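For illustration of the examiner's characterization only (this sketch is not drawn from Slivkins, Padgett, or the claims; the function names and the Bernoulli click-feedback model are hypothetical), a UCB-style multi-armed bandit selects the arm with the maximal index value, an upper confidence bound on the observed reward rate, and updates that arm's statistics from feedback:

```python
import math
import random

def select_arm(counts, rewards, t):
    """UCB1: pick the arm whose index (mean reward plus a confidence
    bonus) is maximal. counts[i] is the number of times arm i was
    pulled; rewards[i] is its cumulative reward; t is the round."""
    for i, n in enumerate(counts):
        if n == 0:  # try every arm at least once
            return i
    ucb = [rewards[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
           for i in range(len(counts))]
    return max(range(len(counts)), key=lambda i: ucb[i])

def run(true_rates, rounds=5000, seed=0):
    """Simulate rounds of selection with 0/1 feedback (e.g. a click),
    where each arm's true payout probability is given in true_rates."""
    rng = random.Random(seed)
    k = len(true_rates)
    counts, rewards = [0] * k, [0.0] * k
    for t in range(1, rounds + 1):
        arm = select_arm(counts, rewards, t)
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts
```

Under this sketch the arms could equally be document-selection strategies (as in Slivkins) or candidate LLM services (as in the proposed combination); the bandit structure is agnostic to what each arm is.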
Applicant's arguments appear to analyze the references individually. One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Applicant’s arguments do not address the proposed combination on record.
Regarding the teaching of “training a model on logs”, this claim limitation is newly added in the claim amendments and addressed in the rejection above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMANDA WILLIS whose telephone number is (571)270-7691. The examiner can normally be reached Monday-Friday 8am-2pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ajay Bhatia can be reached at 571-272-3906. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AMANDA L WILLIS/ Primary Examiner, Art Unit 2156