Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Amendments
This action is in response to amendments filed January 16, 2026, in which Claims 1, 7, and 14 are amended. No claims are cancelled or added. The amendments have been entered, and Claims 1-20 are currently pending.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. Applicant has not complied with one or more conditions for receiving the benefit of an earlier filing date under 35 U.S.C. 120 as follows:
The later-filed application must be an application for a patent for an invention which is also disclosed in the prior application (the parent or original nonprovisional application or provisional application). The disclosure of the invention in the parent application and in the later-filed application must be sufficient to comply with the requirements of 35 U.S.C. 112(a) or the first paragraph of pre-AIA 35 U.S.C. 112, except for the best mode requirement. See Transco Products, Inc. v. Performance Contracting, Inc., 38 F.3d 551, 32 USPQ2d 1077 (Fed. Cir. 1994).
The disclosures of the prior-filed applications, Application Nos. 18/633,293, 18/661,519, 18/661,532, and 18/812,913, fail to provide adequate support or enablement in the manner provided by 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, for one or more claims of this application. Specifically, none of these applications discloses a hierarchical set of constraints for routing requests to candidate LLM models with the details required by the independent claims.
The claims are thus given an effective filing date of August 15, 2025.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 7-20 are rejected under 35 U.S.C. 103 as being unpatentable over Odland, US PG Pub 2024/0427994, in view of Upadhyay, US PG Pub 2025/0265504, and further in view of Kuperman, US Patent 12,236,193.
Regarding Claim 7, Odland teaches a method comprising: receiving, in response to a request to generate an output using a set of AI models, a plurality of session-specific data elements (Odland, Fig. 8, elements 805, 810, 815, 820, “Receive a question and an identity of a user of the question … a user profile … privacy rules”) including prior interaction data (Odland, [0061], “the routing rules may route a request as a function of … historical performance of available LLMs”), system environment parameters (Odland, [0074], “a current location and eating habits of the user from the user profile database”), and computational context values (Odland, [0108], “permission to send the question to an unsecure domain … the routing engine may generate the routing of a question based on the selected LLM, the permission of the user, and the content of the question”); determining a first subset of constraints and a second subset of constraints for routing the request to an AI model of the set of AI models, the first subset of constraints comprising privacy and data handling protocols (Odland, Fig. 8, elements 825 and 840 show a hierarchy in which first-priority privacy rules, with data handling protocols at [0142]-[0143], are always executed before the second-priority routing classification rules) … selecting … at least one candidate AI model for the request … wherein each of the plurality of candidate AI models satisfies the first subset of constraints (Odland, Fig. 8, element 840, “Select one or more LLM based on a routing classification model” where the privacy determination was previously made); and routing the request to the at least one candidate AI model to cause the at least one candidate AI model to generate the output for the request (Odland, Fig. 8, element 835, “Generate a routing based on the selection” & [0051], “to receive answers … from the PHI approved LLM and the public LLM”).
While Odland teaches a hierarchy of constraints/rules for determining a routing of queries to various LLMs, Odland does not teach the specific constraints in the second subset of constraints (i.e., latency thresholds, model response requirements, and resource allocation limitations); nor does Odland teach weights for a multi-variable optimization in order to determine the routing; nor does Odland teach system performance feedback indicating that a performance metric is beyond a respective threshold for automatically selecting a different AI model.
However, Upadhyay, also in the art of routing requests to LLMs, teaches a subset of constraints that comprises processing latency thresholds, model response requirements (Upadhyay, [0026], “multi-variate routing … the router can additionally optimize for other parameters like latency, tone” where “tone” is a model response requirement), and resource allocation limitations (Upadhyay, [0078], “model selection module which functions to select a candidate model based on … latency preferences … accuracy … computing resource cost”). Upadhyay further teaches updating weights, using the plurality of session-specific data elements, for a multi-variable optimization (Upadhyay, [0078], “The selection parameters can be received as user preferences (e.g. alongside the runtime prompt) … [and] can include a set of weights representing varying priorities of conflicting goals”); executing the multi-variable optimization, using the updated weights, across a plurality of candidate AI models of the set of AI models, wherein each of the plurality of candidate AI models … optimizes the second subset of constraints (Upadhyay, [0063], “a multivariate optimization to select a candidate model” & [0078], “selection parameters are a set of weights each corresponding to a different optimization target in a multivariate optimization”); and in response to receiving system performance feedback relating to at least one constraint in the second subset, automatically selecting a different LLM from among the plurality of candidate LLMs to improve the at least one constraint (Upadhyay, [0078], “the selection parameters can be … learned based on user feedback” i.e., the weights are updated, causing a different LLM to optimize the updated target objective).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use multi-objective optimization with the optimization targets and weights of Upadhyay as the second-level routing model in Odland. The motivation to do so is to choose a model which best reflects user preferences (Upadhyay, [0078]).
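By way of illustration only, and not as a characterization of any reference's actual implementation, the combined two-level routing discussed above (a hard first-subset privacy filter followed by a weighted multi-variable optimization over second-subset metrics) may be sketched as follows; all model names, metric names, and values are hypothetical:

```python
# Hypothetical sketch: hard first-subset (privacy) filtering,
# then a weighted multi-variable score over second-subset metrics.

def route(candidates, weights):
    """candidates: dicts with a 'privacy_ok' flag and metric scores
    (higher is better); weights: per-metric priorities."""
    # First subset: privacy/data-handling constraints are absolute.
    eligible = [m for m in candidates if m["privacy_ok"]]
    # Second subset: weighted multi-variable optimization.
    def score(m):
        return sum(weights[k] * m[k] for k in weights)
    return max(eligible, key=score)

models = [
    {"name": "A", "privacy_ok": True,  "speed": 0.9, "accuracy": 0.6},
    {"name": "B", "privacy_ok": True,  "speed": 0.4, "accuracy": 0.9},
    {"name": "C", "privacy_ok": False, "speed": 1.0, "accuracy": 1.0},
]
# Accuracy-heavy weights select B; C is excluded despite its metrics,
# because it fails the first-subset constraint.
print(route(models, {"speed": 0.2, "accuracy": 0.8})["name"])  # B
```

Changing the weights (e.g., favoring speed) selects a different eligible model, which is the mechanism by which updated weights cause a different LLM to be chosen.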
While the Odland/Upadhyay combination teaches a system for routing requests to appropriate LLMs utilizing feedback, the combination does not specifically teach using the feedback in such a way that a different model is selected when the system performance feedback indicates that a performance metric for the at least one constraint has deviated beyond a respective threshold. However, Kuperman, also in the art of routing LLM requests, teaches this limitation (Kuperman, Claim 1, “for a difference between the first user feedback score and the second user feedback score being less than a threshold, selecting the first LLM as the LLM, and for the difference being equal to or greater than the threshold, selecting the second LLM”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to switch LLMs when the feedback indicates that one LLM is significantly (i.e., by more than a threshold, rather than only slightly) better than another. The motivation to do so is to maintain the other preference criteria until the respective criteria become significantly bad enough (Kuperman, column 8, lines 18-21, “If a difference between satisfaction scores for an open-source LLM and a non-open source LLM is smaller than a threshold, the open source LLM is selected” that is, the open-source constraint is obeyed as long as the feedback is not worse than a threshold on another constraint).
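For illustration only (hypothetical names and values; not a characterization of Kuperman's actual implementation), the threshold-gated switching discussed above may be sketched as: the preferred model is retained unless feedback shows the alternative is better by at least a threshold margin.

```python
# Hypothetical sketch: switch models only when the feedback-score
# deviation meets or exceeds a respective threshold.

def select(preferred_score, alternative_score, threshold):
    # Retain the preferred model for small deviations; switch when the
    # alternative is better by at least the threshold.
    if alternative_score - preferred_score >= threshold:
        return "alternative"
    return "preferred"

print(select(0.70, 0.72, threshold=0.05))  # preferred (gap too small)
print(select(0.70, 0.80, threshold=0.05))  # alternative
```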
Regarding Claim 8, the Odland/Upadhyay/Kuperman combination of Claim 7 teaches the method of Claim 7 (and thus the rejection of Claim 7 is incorporated). The combination has not yet been shown to teach, but Upadhyay does teach, in response to receiving system performance feedback relating to at least one constraint in the second subset, automatically selecting a different AI model from among the plurality of candidate AI models to improve the at least one constraint (Upadhyay, [0078], “the selection parameters can be … learned based on user feedback” i.e. the weights are updated, causing a different LLM to optimize the updated target objective, as well as [0053], “metrics can be received from … a user device (e.g. for latency)” and used to score the models, thus to improve the optimization). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to receive feedback regarding the model’s performance, and then to use that feedback to improve the optimization of the constraints (Upadhyay, [0150], “to identify the best candidate model” based on user preferences).
Regarding Claim 9, the Odland/Upadhyay/Kuperman combination of Claim 8 teaches the method of Claim 8 (and thus the rejection of Claim 8 is incorporated). The combination can be used more than one time, and learns from each use (Upadhyay, [0030], “increases the probability that a high-scoring model can be selected from a subsequent prompt … router improvements can be tuned towards prompts coming from a particular user … ” & [0078], “The selection parameters can be … learned based on user feedback (e.g. user metrics)”). Using the router on a subsequent query, after the metrics and weights have been learned in order to improve performance, thus teaches retrieving an updated plurality of session-specific data elements; dynamically updating the weights, using the updated plurality of session-specific data elements, for the multi-variable optimization, increasing a weight associated with an objective within the multi-variable optimization corresponding to the at least one constraint in the second subset (i.e. learning the selection parameters based on feedback), re-executing the multi-variable optimization using the increased weight and the updated plurality of session-specific data elements to generate an updated candidate set of AI models; and selecting, as the different AI model, a candidate AI model from the plurality of candidate AI models that provides improved performance for the at least one constraint of the second subset and satisfies the first set of constraints (Upadhyay, Fig. 12, increased weight chooses a different operating point with increased performance for the now higher-weighted attribute) by using the router on a new query with user preferences that have been updated based on feedback.
Regarding Claim 10, the Odland/Upadhyay/Kuperman combination of Claim 7 teaches the method of Claim 7 (and thus the rejection of Claim 7 has been incorporated). The combination has not yet been shown to teach, but the combination as described does teach, monitoring the prior interaction data (Odland, [0061], “the routing rules may route a request as a function of … historical performance of available LLMs”), the system environment parameters (Upadhyay, Fig. 14, “User metrics” are monitored, also see [0050], “Metrics can include … metadata or operation characteristics” & [0061], “physical location”), and the computational context values for the request (Upadhyay, [0050], “Metrics can include: performance scores (e.g. measurements of qualities of prompt-response pairs”); determining revised weights for the multi-variable optimization based on changes in the prior interaction data, the system environment parameters, or the computational context values for the request (Upadhyay, [0078], “The selection parameters can be … learned based on user feedback (e.g. user metrics)”); and applying the revised weights in performing the multi-variable optimization to adjust a relative importance of each objective associated with the second subset of constraints (Upadhyay, [0153], “user specific selection parameters” that is, parameters are learned and learned parameters are used for additional queries from that user).
Regarding Claim 11, the Odland/Upadhyay/Kuperman combination of Claim 7 teaches the method of Claim 7 (and thus the rejection of Claim 7 has been incorporated). The combination has not yet been shown to teach, but the combination as described does teach comparing, for each of the plurality of candidate AI models that satisfy the first subset of constraints, results of the multi-variable optimization with the updated weights; and selecting the at least one candidate model that most closely satisfies the second subset of constraints in accordance with the updated weights (Upadhyay, [0153], “a set of metrics for each candidate model are aggregated (e.g. using a weighted average weighted by selection parameters) to determine an aggregate metric for the respective candidate model, then a candidate model is selected based on the aggregate performance metric meeting a constraint” with [0152], “a candidate model is selected based on being associated with a highest performance score”).
Regarding Claim 12, the Odland/Upadhyay/Kuperman combination of Claim 7 teaches the method of Claim 7 (and thus the rejection of Claim 7 has been incorporated). Odland further teaches prior to executing the multi-variable optimization, filtering the plurality of session-specific data elements to exclude data elements that do not satisfy the privacy and data handling protocols of the first subset of constraints, such that only compliant session-specific data elements are used (Odland, Fig. 7, elements 705, 710, “Receive a content and a user profile for removing PHI” where PHI is personal health information, e.g., under privacy constraints of the first subset of constraints, with “Remove PHI from the content based on sanitation rules”) for dynamically updating the weights for the multi-variable optimization (in the Odland/Upadhyay combination, the “user profile” provides the context information for determining the user preference weights, and the user profile is sanitized, thus only compliant session-specific data elements are used for dynamically updating the weights).
Regarding Claim 13, the Odland/Upadhyay/Kuperman combination of Claim 7 teaches the method of Claim 7 (and thus the rejection of Claim 7 has been incorporated). The combination has not yet been shown to teach, but Upadhyay teaches, determining a context complexity score derived from the plurality of session-specific data elements (Upadhyay, [0054], “metrics are predicted based on prompt information …[which] can include prompt length” where length is a complexity score) and applying a complexity threshold for selecting the at least one candidate AI model based on the context complexity score (Upadhyay, [0054], “a latency can predicted for a given candidate model based on the prompt length” with [0078], “selection parameters are constraints on metrics (e.g. maximum allowable cost, latency, etc.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the prompt length/context complexity score of Upadhyay in the combination as a further constraint upon which to choose an LLM. The motivation to do so is to adhere to the maximum allowable latency that a user may desire (Upadhyay, [0078]).
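For illustration only (hypothetical function, field names, and rates; not a characterization of Upadhyay's actual implementation), using prompt length as a context complexity score to screen candidates against a maximum-latency constraint may be sketched as:

```python
# Hypothetical sketch: prompt length as a crude complexity score,
# with predicted latency used as a screening constraint.

def eligible_models(prompt, models, max_latency):
    complexity = len(prompt.split())  # word count as complexity score
    # Predicted latency grows with complexity at a per-model rate;
    # models exceeding the maximum allowable latency are excluded.
    return [m["name"] for m in models
            if m["ms_per_token"] * complexity <= max_latency]

models = [{"name": "small", "ms_per_token": 2},
          {"name": "large", "ms_per_token": 10}]
print(eligible_models("route this fairly long request now",
                      models, max_latency=50))  # ['small']
```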
Claims 14-20 recite a system comprising: a storage device; and one or more processors communicatively coupled to the storage device storing instructions thereon, that cause the one or more processors to perform precisely the methods of Claims 7-13, respectively. As Odland teaches a system upon which to perform their method (Odland, Fig. 2, elements 205, “Processor” & 230, 225, 255, “Memory”), Claims 14-20 are rejected for reasons set forth in the rejections of Claims 7-13, respectively.
Claims 1-6 are rejected under 35 U.S.C. 103 as being unpatentable over Odland, in view of Upadhyay and Kuperman, and further in view of Liu, “OptLLM: Optimal Assignment of Queries to Large Language Models.”
Regarding Claim 1, Odland teaches one or more non-transitory computer-readable storage medium comprising instructions recorded thereon, wherein the instructions, when executed by at least one data processor of a system (Odland, [0182], “a program of instructions tangibly encoded on a non-transitory computer readable medium … the instructions may be execute on a processor”), cause the system to: receive, in response to a request to generate an output using large language models (LLMs), a plurality of session-specific data elements (Odland, Fig. 8, elements 805, 810, 815, 820, “Receive a question and an identity of a user of the question … a user profile … privacy rules”) including prior interaction data (Odland, [0061], “the routing rules may route a request as a function of … historical performance of available LLMs”), system environment parameters (Odland, [0074], “a current location and eating habits of the user from the user profile database”), and computational context values (Odland, [0108], “permission to send the question to an unsecure domain … the routing engine may generate the routing of a question based on the selected LLM, the permission of the user, and the content of the question”), wherein the plurality of session-specific data elements is updated based upon each request (Odland, Fig. 8, element 805, a new user profile is received for each question, i.e., based upon each request, also see [0019], “dynamically source the patient data”) and each response (Odland, [0019], “some embodiments may advantageously select a LLM to deliver the QAP based on historical answers/results”); determine a hierarchy of operational constraints for routing the request to an LLM, the hierarchy of operational constraints comprising a first subset of constraints that comprises privacy and data handling protocols and a second subset of constraints (Odland, Fig. 8, elements 825 and 840 show a hierarchy in which first-priority privacy rules, with data handling protocols at [0142]-[0143], are always executed before the second-priority routing classification rules) … select … at least one candidate LLM for the request … wherein each of the plurality of candidate LLMs satisfies the first subset of constraints (Odland, Fig. 8, element 840, “Select one or more LLM based on a routing classification model” where the privacy determination was previously made); route the request to the at least one candidate LLM to cause the at least one candidate LLM to generate the output for the request (Odland, Fig. 8, element 835, “Generate a routing based on the selection” & [0051], “to receive answers … from the PHI approved LLM and the public LLM”).
While Odland teaches a hierarchy of constraints/rules for determining a routing of queries to various LLMs, Odland does not teach the specific constraints in the second subset of constraints (i.e., latency thresholds, model response requirements, and resource allocation limitations); nor does Odland teach weights for a multi-variable optimization in order to determine the routing; nor does Odland teach to automatically select a different LLM in response to receiving system performance feedback relating to at least one constraint in the second subset of constraints.
However, Upadhyay, also in the art of routing requests to LLMs, teaches a subset of constraints that comprises processing latency thresholds, model response requirements (Upadhyay, [0026], “multi-variate routing … the router can additionally optimize for other parameters like latency, tone” where “tone” is a model response requirement), and resource allocation limitations (Upadhyay, [0078], “model selection module which functions to select a candidate model based on … latency preferences … accuracy … computing resource cost”). Upadhyay further teaches to dynamically update weights, using the plurality of session-specific data elements, for a multi-variable optimization (Upadhyay, [0078], “The selection parameters can be received as user preferences (e.g. alongside the runtime prompt) … [and] can include a set of weights representing varying priorities of conflicting goals”); execute the multi-variable optimization, using the dynamically updated weights, across a plurality of candidate LLMs (Upadhyay, [0063], “a multivariate optimization to select a candidate model” & [0078], “selection parameters are a set of weights each corresponding to a different optimization target in a multivariate optimization”); and in response to receiving system performance feedback relating to at least one constraint in the second subset, automatically select a different LLM from among the plurality of candidate LLMs to improve the at least one constraint (Upadhyay, [0078], “the selection parameters can be … learned based on user feedback” i.e., the weights are updated, causing a different LLM to optimize the updated target objective).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use multi-objective optimization with the optimization targets and weights of Upadhyay as the second-level routing model in Odland. The motivation to do so is to choose a model which best reflects user preferences (Upadhyay, [0078]).
While the Odland/Upadhyay combination teaches a system for routing requests to appropriate LLMs utilizing feedback, the combination does not specifically teach using the feedback in such a way that a different model is selected when the system performance feedback indicates that a performance metric for the at least one constraint has deviated beyond a respective threshold. However, Kuperman, also in the art of routing LLM requests, teaches this limitation (Kuperman, Claim 1, “for a difference between the first user feedback score and the second user feedback score being less than a threshold, selecting the first LLM as the LLM, and for the difference being equal to or greater than the threshold, selecting the second LLM”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to switch LLMs when the feedback indicates that one LLM is significantly (i.e., by more than a threshold, rather than only slightly) better than another. The motivation to do so is to maintain the other preference criteria until the respective criteria become significantly bad enough (Kuperman, column 8, lines 18-21, “If a difference between satisfaction scores for an open-source LLM and a non-open source LLM is smaller than a threshold, the open source LLM is selected” that is, the open-source constraint is obeyed as long as the feedback is not worse than a threshold on another constraint).
The Odland/Upadhyay/Kuperman combination thus teaches multi-objective optimization for selecting a routing to an LLM, but is silent regarding whether the multi-objective optimization optimizes such that any further improvement of one constraint in the second subset causes degradation of at least one other constraint in the second subset, i.e., that selecting a different LLM results in the degradation of at least one other constraint in the second subset. However, Liu, also in the art of routing requests to LLMs via multi-objective optimization, teaches multi-objective optimization with these properties (Liu, pg. 3, 1st column, 2nd paragraph, “This study employs the concept of Pareto dominance … a solution is said to dominate another solution if all objective values are at least as good and strictly better in at least one objective … the Pareto front … [is] the optimal trade-off between the conflicting objectives” see Figs. 1 & 4 & [0072] of the instant disclosure). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to optimize the multi-objective problem of the Odland/Upadhyay combination using Pareto-optimal methods, such as those of Liu. The motivation to do so is that “Pareto dominance … is the most common evaluation criterion in multi-objective optimization problems” (Liu, pg. 3, 1st column, 2nd paragraph).
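For illustration only (hypothetical functions and objective values; not a characterization of Liu's actual implementation), the Pareto-dominance property discussed above may be sketched as: a model on the Pareto front cannot improve one objective without degrading another.

```python
# Hypothetical sketch of Pareto dominance among candidate models,
# with objectives expressed so that higher is better.

def dominates(a, b):
    """a dominates b if a is at least as good in every objective
    and strictly better in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    # Keep each point not dominated by any other point.
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Objectives: (accuracy, 1/latency), both higher-is-better.
pts = [(0.9, 0.2), (0.6, 0.8), (0.5, 0.5), (0.8, 0.6)]
print(pareto_front(pts))  # [(0.9, 0.2), (0.6, 0.8), (0.8, 0.6)]
```

Here (0.5, 0.5) is excluded because (0.8, 0.6) dominates it; each remaining point trades one objective against the other.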
Regarding Claim 2, the Odland/Upadhyay/Kuperman/Liu combination of Claim 1 teaches the one or more non-transitory computer readable storage medium of Claim 1 (and thus the rejection of Claim 1 is incorporated). The combination can be used more than one time, and learns from each use (Upadhyay, [0030], “increases the probability that a high-scoring model can be selected from a subsequent prompt … router improvements can be tuned towards prompts coming from a particular user … ” & [0078], “The selection parameters can be … learned based on user feedback (e.g. user metrics)”). Using the router on a subsequent query, after the metrics and weights have been learned in order to improve performance, thus teaches retrieving an updated plurality of session-specific data elements; dynamically updating the weights, using the updated plurality of session-specific data elements, for the multi-variable optimization, increasing a weight associated with an objective within the multi-variable optimization corresponding to the at least one constraint in the second subset (i.e., learning the selection parameters based on feedback); re-executing the multi-variable optimization using the increased weight and the updated plurality of session-specific data elements to generate an updated candidate set of LLMs; and selecting, as the different LLM, a candidate LLM from the plurality of candidate LLMs that provides improved performance for the at least one constraint of the second subset and satisfies the first set of constraints (Upadhyay, Fig. 12, increased weight chooses a different operating point with increased performance for the now higher-weighted attribute) by using the router on a new query with user preferences that have been updated based on feedback.
Regarding Claim 3, the Odland/Upadhyay/Kuperman/Liu combination of Claim 1 teaches the one or more non-transitory computer readable storage medium of Claim 1 (and thus the rejection of Claim 1 is incorporated). The combination has not yet been shown to teach, but the combination as described does teach, monitoring the prior interaction data (Odland, [0061], “the routing rules may route a request as a function of … historical performance of available LLMs”), the system environment parameters (Upadhyay, Fig. 14, “User metrics” are monitored, also see [0050], “Metrics can include … metadata or operation characteristics” & [0061], “physical location”), and the computational context values for the request (Upadhyay, [0050], “Metrics can include: performance scores (e.g. measurements of qualities of prompt-response pairs”); determining revised weights for the multi-variable optimization based on changes in the prior interaction data, the system environment parameters, or the computational context values for the request (Upadhyay, [0078], “The selection parameters can be … learned based on user feedback (e.g. user metrics)”); and applying the revised weights in performing the multi-variable optimization to adjust a relative importance of each objective associated with the second subset of constraints (Upadhyay, [0153], “user specific selection parameters” that is, parameters are learned and learned parameters are used for additional queries from that user).
Regarding Claim 4, the Odland/Upadhyay/Kuperman/Liu combination of Claim 1 teaches the one or more non-transitory computer readable storage medium of Claim 1 (and thus the rejection of Claim 1 is incorporated). The combination has not yet been shown to teach, but the combination as described does teach comparing, for each of the plurality of candidate LLMs that satisfy the first subset of constraints, results of the multi-variable optimization with the updated weights; and selecting the at least one candidate LLM that most closely satisfies the second subset of constraints in accordance with the updated weights (Upadhyay, [0153], “a set of metrics for each candidate model are aggregated (e.g. using a weighted average weighted by selection parameters) to determine an aggregate metric for the respective candidate model, then a candidate model is selected based on the aggregate performance metric meeting a constraint” with [0152], “a candidate model is selected based on being associated with a highest performance score”).
Regarding Claim 5, the Odland/Upadhyay/Kuperman/Liu combination of Claim 1 teaches the one or more non-transitory computer readable storage medium of Claim 1 (and thus the rejection of Claim 1 is incorporated). Odland further teaches prior to executing the multi-variable optimization, filtering the plurality of session-specific data elements to exclude data elements that do not satisfy the privacy and data handling protocols of the first subset of constraints, such that only compliant session-specific data elements are used (Odland, Fig. 7, elements 705, 710, “Receive a content and a user profile for removing PHI” where PHI is personal health information, e.g., under privacy constraints of the first subset of constraints, with “Remove PHI from the content based on sanitation rules”) for dynamically updating the weights for the multi-variable optimization (in the Odland/Upadhyay combination, the “user profile” provides the context information for determining the user preference weights, and the user profile is sanitized, thus only compliant session-specific data elements are used for dynamically updating the weights).
Regarding Claim 6, the Odland/Upadhyay/Kuperman/Liu combination of Claim 1 teaches the one or more non-transitory computer readable storage medium of Claim 1 (and thus the rejection of Claim 1 is incorporated). The combination has not yet been shown to teach, but Upadhyay teaches, determining a context complexity score derived from the plurality of session-specific data elements (Upadhyay, [0054], “metrics are predicted based on prompt information …[which] can include prompt length” where length is a complexity score) and applying a complexity threshold for selecting the at least one candidate LLM based on the context complexity score (Upadhyay, [0054], “a latency can predicted for a given candidate model based on the prompt length” with [0078], “selection parameters are constraints on metrics (e.g. maximum allowable cost, latency, etc.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the prompt length/context complexity score of Upadhyay in the combination as a further constraint upon which to choose an LLM. The motivation to do so is to adhere to the maximum allowable latency that a user may desire (Upadhyay, [0078]).
Response to Arguments
Applicant’s arguments filed January 16th, 2026, have been fully considered, but are not fully persuasive.
Applicant’s arguments with respect to the prior art rejections of the independent claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. New reference Kuperman teaches selecting a different LLM when performance feedback below a threshold is received.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
Sun et al., “Interval Multiobjective Optimization with Memetic Algorithms” also teaches only updating the solution to a multi-objective optimization problem (i.e. choosing another LLM to which to route the request based on multiple constraints) when the performance for a particular constraint falls below a threshold level of performance, in order to save computation cycles.
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN M SMITH whose telephone number is (469) 295-9104. The examiner can normally be reached Monday - Friday, 8:00 AM - 4:00 PM Pacific.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRIAN M SMITH/Primary Examiner, Art Unit 2122