Prosecution Insights
Last updated: April 19, 2026
Application No. 18/436,261

INTELLIGENT STEWARD PLATFORM FOR VALIDATION OF LARGE LANGUAGE MODEL (LLM) OUTPUTS

Status: Final Rejection (§103)
Filed: Feb 08, 2024
Examiner: MCLEAN, IAN SCOTT
Art Unit: 2654
Tech Center: 2600 — Communications
Assignee: BANK OF AMERICA CORPORATION
OA Round: 2 (Final)
Grant Probability: 43% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 2m
With Interview: 74%

Examiner Intelligence

Career Allow Rate: 43% (19 granted / 44 resolved; -18.8% vs TC avg)
Interview Lift: +31.0% among resolved cases with interview (strong)
Avg Prosecution: 3y 2m (typical timeline)
Career History: 84 total applications across all art units; 40 currently pending

Statute-Specific Performance

§101: 9.9% (-30.1% vs TC avg)
§103: 60.0% (+20.0% vs TC avg)
§102: 27.2% (-12.8% vs TC avg)
§112: 2.1% (-37.9% vs TC avg)
Comparisons are against a Tech Center average estimate; based on career data from 44 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

2. Applicant's arguments filed 1/2/2025 have been fully considered but they are not persuasive.

a) Applicant argues that Gupta allegedly discloses only a binary classification and therefore fails to teach a tolerable classification defined as an intersection of acceptable and non-acceptable regimes. This argument is not persuasive. Gupta explicitly discloses evaluating generated responses using graded evaluation outputs, including similarity scores and score ranges (e.g., less than 20%, 20-50%, 50-80% and greater than 80%). See Gupta ¶[0075]-[0076]. Such graded ranges inherently disclose intermediate or borderline classifications, which correspond to a tolerable regime positioned between acceptable and non-acceptable outputs. This corresponds to at least three classification outputs. Gupta does not use the terms acceptable, tolerable, and non-acceptable; however, the classification ranges clearly disclose those exact ranges and classifications.

b) Applicant further argues that Gupta compares model outputs, not a delta between “historical information” and “updated information,” and therefore fails to disclose this limitation. This argument is not persuasive. Gupta discloses receiving updated information to a stored data table and identifying a delta between the updated information and historical information via transaction log change data that records changes with respect to the previous version of the data table, see ¶[0033]-[0035]. Gupta also discloses identifying a delta value between textual information inputs by comparing an expected output to a generated response and calculating a similarity score via vector similarity techniques, see ¶[0047]-[0049] and ¶[0059].
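To make the cited mechanism concrete: the graded score ranges in (a) and the vector-similarity delta in (b) can be sketched as below. This is an illustrative Python sketch only; the cosine measure, the regime names, and the cutoffs are assumptions for illustration, not code or exact thresholds from Gupta.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Vector similarity between an expected output and a generated response."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify_regime(score: float) -> str:
    """Map a graded similarity score onto three regimes (hypothetical cutoffs,
    loosely mirroring graded ranges such as <50%, 50-80%, >80%)."""
    if score > 0.8:
        return "acceptable"
    if score >= 0.5:
        return "tolerable"  # borderline band between the two extremes
    return "non-acceptable"

# Toy embeddings for an expected response and a generated response.
expected = [0.1, 0.9, 0.3]
generated = [0.12, 0.85, 0.35]
print(classify_regime(cosine_similarity(expected, generated)))
```

The point of the sketch is only that graded score ranges yield more than two classification outputs, which is the examiner's response to argument (a).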
c) Applicant argues that Pedersen retrains the classification model itself and therefore fails to teach an additional model layered on top of a closed loop steward model. This argument is not persuasive. The examiner agrees that Gupta does not expressly disclose updating a separate, dynamically updated machine learning layer without updating the LLM steward model. However, Pedersen expressly discloses this functionality and it would have been obvious to one of ordinary skill in the art to incorporate Pedersen’s teaching into Gupta’s system. Pedersen discloses a system in which a classification model is refined through user feedback without altering the underlying base model, by retraining or updating a secondary model component that operates on top of existing model outputs (Pedersen ¶[0042], ¶[0068]). Specifically, Pedersen teaches collecting feedback regarding classification correctness and using that feedback to retrain machine learning models responsible for classification, thereby dynamically adjusting classification behavior, which could be applied to a separate layer in Gupta without affecting the LLM steward model. Gupta discloses an evaluation pipeline in which LLM outputs are analyzed, scored, and classified according to defined regimes and ranges. Gupta further discloses continuous intake of updated information used to refine evaluation behavior. It would have been obvious to one of ordinary skill in the art to update a separate evaluation or classification layer as taught by Pedersen.
The motivation for doing so is that it “may provide an improved user experience for users of any network-accessible platform that allows its users to exchange user-generated content.” Accordingly, it would have been obvious to modify Gupta’s evaluation system to include Pedersen’s dynamically updated machine-learning layer positioned on top of the LLM steward model, such that regime classifications are adjusted based on feedback-driven updates to the additional model while leaving the LLM steward model unchanged, as recited in the claim.

d) Applicant argues that a closed loop model as recited distinguishes from the prior art. This argument is not persuasive. A closed loop model as recited is not limited to continuously retraining the same underlying model weights in real time. Rather, the claim requires that outputs of the system are evaluated and that information derived from that evaluation is fed back into the system to influence subsequent operations. Gupta satisfies a closed loop model because the system generates LLM outputs, evaluates those outputs using scoring functions and user-provided ground truth feedback, and stores the evaluation results and feedback for use in subsequent evaluations and comparisons. This feedback-driven evaluation cycle is a closed loop even if the underlying base LLM is not itself retrained. The claims only require that the steward system operates in a feedback-informed loop, which Gupta expressly discloses and Pedersen supports.

e) Applicant lastly argues that Gupta actually teaches away with regard to the obviousness for claim 2. The Examiner respectfully disagrees. While Gupta does not state that it modifies the generated text by removing it, Gupta discloses evaluating LLM outputs against predetermined acceptability criteria and identifying specific portions requiring correction. Modifying identified unacceptable portions to conform with the applicable regime represents a predictable variation of what Gupta already does.
Pedersen further discloses updating classified text that is to be presented to a user based on such classifications. Specifically, Pedersen receives unclassified text, classifies the text using trained machine learning models, determines whether the text should be moderated for a user, and in response causes the text presented on the client machine to be modified. Newly added amendments to claims 8 and 18 are taught by newly cited Ignatyev. In view of the arguments above, the rejection of claims 1-20 is maintained.

Claim Rejections - 35 USC § 103

3. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.
Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

4. Claims 1-7, 9-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta (US 2025/0124236) in view of Pedersen (US 2020/0142999).

Regarding Claim 1: Gupta discloses a computing platform comprising: at least one processor (Gupta: Fig. 7 discloses a processor); a communication interface communicatively coupled to the at least one processor (Gupta: ¶[0082]-[0083] discloses network interface device 720 coupled to the processor through data bus 708); and memory storing computer-readable instructions that, when executed by the at least one processor (Gupta: ¶[0082]-[0083]), cause the computing platform to: train, using historical information indicating a plurality of regimes for large language model (LLM) outputs, an LLM steward model, wherein training the LLM steward model configures the LLM steward model to generate LLM validation information indicating three different classifications of LLM outputs, wherein the classifications comprise acceptable, tolerable, and non-acceptable, wherein the tolerable classification comprises an intersection of the acceptable and non-acceptable classifications, and wherein the LLM steward model is a closed loop model (Gupta: ¶[0040], ¶[0060]-[0061] discloses training an evaluation LLM to classify LLM outputs using predefined evaluation functions, including toxicity, hallucination, and quality metrics, producing validation results that categorize outputs across multiple graded outcome ranges rather than a single binary classification. Gupta further discloses continuous evaluation and feedback of LLM outputs through the evaluation pipeline in Fig.
6) receive updated information associated with the plurality of regimes (Gupta: ¶[0070]-[0075] discloses continuously receiving updated evaluation data and user-provided information associated with the evaluation of LLM outputs, including updated datasets and feedback used by the system during operation); identify a delta value between the historical information and the updated information (Gupta: ¶[0033]-[0035], ¶[0047]-[0049] and ¶[0059] discloses receiving updated information to a stored data table and identifying a delta between the updated information and historical information via transaction log change data that records changes with respect to the previous version of the data table; Gupta also discloses identifying a delta value between textual information inputs by comparing an expected output to a generated response and calculating a similarity score via vector similarity techniques); update, based on the delta value, the plurality of regimes to adjust corresponding classifications of acceptable, tolerable, and non-acceptable (Gupta: ¶[0047]-[0049], ¶[0059] and ¶[0074]-[0075] discloses maintaining data logs of prior evaluated responses, receiving provided expected responses or labels for newly generated outputs, and computing similarity scores and differences between stored evaluation records and newly labeled outputs. These comparisons are performed between previously stored evaluation data and newly received input information, thereby identifying a delta value between historical input information and updated input information (the generated response and the expected response), where the delta value is the similarity score quantifying the difference between the generated output and the expected response.
By comparatively evaluating generated responses against user-provided expected responses, assigning a similarity score to each response, and then using that score to identify, highlight, and distinguish deficient and acceptable portions of the responses, Gupta treats higher scores as more acceptable, therefore adjusting response acceptability categories based on the evaluation score); input, into an LLM, an LLM prompt, wherein inputting the LLM prompt causes the LLM to generate an LLM output (Gupta: ¶[0045] explicitly discloses inputting a prompt into an LLM to generate a response); input the LLM output into the LLM steward model and the additional model, wherein inputting the LLM output into the LLM steward model and the additional model causes the LLM steward model to output the LLM validation information (Gupta: ¶[0046], ¶[0060] and ¶[0061] discloses sending the generated responses to an evaluation LLM to generate validation results); and based on outputting LLM validation information indicating that the LLM output is acceptable or tolerable, send the LLM output to a user device for presentation (Gupta: ¶[0070]-[0073] discloses that once validation results are produced, the evaluated LLM output is sent to the client device and displayed to the user). Gupta does not explicitly disclose wherein updating the plurality of regimes comprises updating an additional model that is dynamically updated, and wherein the additional model is a layer added on top of the LLM steward model. However, Pedersen discloses this limitation (Pedersen: ¶[0042], ¶[0068] discloses implementing classification adjustments through a dynamically updated secondary machine learning layer that refines classification behavior).
It would have been obvious to one of ordinary skill in the art before the effective filing date to try combining the evaluation system of Gupta, which evaluates large language model outputs, with Pedersen’s feedback-based retraining method because both references address improving classification accuracy. Pedersen provides a known, predictable technique (user-feedback-driven retraining) for updating model parameters. It would have been obvious to one of ordinary skill in the art to update a separate evaluation or classification layer as taught by Pedersen. The motivation for doing so is that it “may provide an improved user experience for users of any network-accessible platform that allows its users to exchange user-generated content.”

Regarding Claim 2: The proposed combination of Gupta in view of Pedersen further discloses the computing platform of claim 1, wherein the memory stores additional computer readable instructions that, when executed by the at least one processor, cause the computing platform to: based on outputting LLM validation information indicating that the LLM output is non-acceptable, update the LLM output to conform with a corresponding subset of the plurality of regimes (Gupta: Figs. 3A-C and Fig. 5 disclose replacing otherwise visible comment text with moderated content when the text fails to satisfy applicable user preference or moderation criteria). It would have been obvious to one of ordinary skill in the art to update the LLM output to conform with a corresponding subset of the plurality of regimes. Gupta discloses generating outputs from one or more LLMs, evaluating those outputs, and presenting the results to a user via a user interface. However, Gupta differs from the claimed invention in that it does not explicitly modify the actual output text itself to bring it into compliance with predetermined regimes; it only marks which portions are acceptable or not.
Pedersen discloses moderation techniques to update the output so that it conforms to a specific subset. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gupta’s system such that, upon outputting validation information indicating that the LLM output is non-acceptable, a moderation technique is applied to make the text conform to a particular regime. The suggestion/motivation for doing so is: “Human moderators cannot moderate user-generated content effectively when the user community is large” as disclosed in Pedersen ¶[0003].

Regarding Claim 3: The proposed combination of Gupta in view of Pedersen further discloses the computing platform of claim 1, wherein the memory stores additional computer readable instructions that, when executed by the at least one processor, cause the computing platform to: update, via a dynamic feedback loop and based on feedback received from the user device, the LLM steward model (Gupta: ¶[0076] discloses a UI that includes a feedback mechanism which allows users to indicate the quality of a generated response; Pedersen: ¶[0042] discloses that user feedback is actually used to update the evaluation model). It would have been obvious to one of ordinary skill in the art to combine Gupta’s evaluation and feedback system with Pedersen’s dynamic retraining approach so that user feedback received through Gupta’s interface would be used to continuously update the LLM steward model. This simple addition is straightforward and predictable, improving model performance over time by automatically adapting based on real-world user interactions.
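The claim 3 combination describes a frozen steward scorer with a separate, dynamically updated layer whose behavior shifts with user feedback. A minimal sketch of that shape follows; the class, method names, and threshold-adjustment rule are hypothetical illustrations, not structures disclosed by either reference.

```python
class FeedbackLayer:
    """Secondary layer on top of a frozen steward model (base weights untouched)."""

    def __init__(self, threshold: float = 0.5, step: float = 0.05):
        self.threshold = threshold
        self.step = step

    def classify(self, steward_score: float) -> str:
        """Turn the frozen steward model's score into a regime classification."""
        return "acceptable" if steward_score >= self.threshold else "non-acceptable"

    def update(self, steward_score: float, user_says_acceptable: bool) -> None:
        """Closed loop: feedback adjusts only this layer, never the steward model."""
        predicted = self.classify(steward_score)
        if predicted == "non-acceptable" and user_says_acceptable:
            self.threshold -= self.step   # loosen the boundary
        elif predicted == "acceptable" and not user_says_acceptable:
            self.threshold += self.step   # tighten the boundary

layer = FeedbackLayer()
layer.update(0.48, user_says_acceptable=True)  # user disagrees, layer loosens
print(layer.classify(0.48))
```

The design point the sketch illustrates is the one argued in section (c): classification behavior changes via the added layer while the underlying model's weights stay fixed.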
Regarding Claim 4: The proposed combination of Gupta in view of Pedersen further discloses the computing platform of claim 1, wherein the historical information includes one or more of: text information, images, speech information, structured information, three dimensional signals, literature information, cultural information, social information, geographical information, legal information, or linguistic information (Gupta: ¶[0024]-[0025] discloses that the LLM is trained on large amounts of data from various sources including websites, articles, posts on the web, images, audio, etc.).

Regarding Claim 5: The proposed combination of Gupta in view of Pedersen further discloses the computing platform of claim 1, wherein each of the regimes defines content that, when included in an output from the LLM, is one or more of: acceptable, tolerable, or non-acceptable (Gupta: ¶[0060]-[0061] classifies the LLM output as toxic/non-toxic).

Regarding Claim 6: The proposed combination of Gupta in view of Pedersen further discloses the computing platform of claim 1, wherein outputting the LLM validation information comprises: identifying one or more regimes, of the plurality of regimes, associated with the LLM prompt (Gupta: ¶[0046]-[0048] teaches applying different evaluation functions, such as toxicity detection or keyword similarity, to classify and score outputs; these evaluation functions correspond to the claimed regimes, where each regime represents a set of rules or thresholds used to assess a response), identifying a location of the LLM output within the one or more regimes associated with the LLM prompt (Gupta: ¶[0070]-[0071] discloses identifying the exact location of issues within the LLM output by highlighting specific words or portions), based on identifying that the LLM output is within an acceptable regime or a tolerable regime, outputting an indication that the LLM output is acceptable (Gupta: ¶[0060] discloses generating a toxicity score and comparing it to a threshold; if the score is below the threshold, the response is considered acceptable or tolerable and an indication (e.g., a green highlight or score) is provided), and based on identifying that the LLM output is within a non-acceptable regime, outputting an indication that the LLM output is non-acceptable (Gupta: ¶[0060], ¶[0070]-[0071] teaches classifying an LLM output as non-acceptable when it exceeds a threshold and displaying a visual indicator, such as a red highlight).

Regarding Claim 7: The proposed combination of Gupta in view of Pedersen further discloses the computing platform of claim 6, wherein the LLM steward model comprises a foundational model, and wherein identifying the one or more regimes associated with the LLM prompt comprises: identifying a plurality of overlapping clusters, within the foundational model, that characterize the LLM prompt, and identifying regimes corresponding to the plurality of overlapping clusters (Gupta: ¶[0042] discloses that the LLM grading model operates as a distributed system of clusters in the data layer, where different clusters process portions of prompts to execute evaluation tasks; specifically, Gupta explains that the query processing module provides prompts to appropriate clusters and receives results from those clusters, and these clusters naturally overlap as multiple tasks such as toxicity detection, hallucination detection, and keyword similarity share these resources).
Regarding Claim 9: The proposed combination of Gupta in view of Pedersen further discloses the computing platform of claim 1, wherein outputting the LLM validation information comprises: generating a confidence score indicating a confidence that the LLM output is acceptable or non-acceptable, comparing the confidence score to a confidence threshold, based on identifying that the confidence score meets or exceeds the confidence threshold, outputting the LLM validation information, and based on identifying that the confidence score fails to meet or exceed the confidence threshold, sending a request to the user device for additional information for use in updating the confidence score (Gupta: ¶[0059]-[0061] discloses calculating similarity scores and toxicity scores and further discloses that these scores are compared to a threshold to classify output; ¶[0062] discloses generating visual indicators to represent that an output meets or exceeds the evaluation threshold, which are displayed to the user; ¶[0075]-[0077] discloses a feedback mechanism allowing users to give positive/negative input on borderline or uncertain outputs in order to update and improve the evaluation process).

Regarding Claim 10: The proposed combination of Gupta in view of Pedersen further discloses the computing platform of claim 1, wherein the LLM corresponds to a chatbot (Gupta: ¶[0043] discloses that the evaluation functions and system can be applied to chatbot applications as part of NLP tasks).

Regarding Claim 11: Claim 11 has been analyzed with regard to claim 1 (see rejection above) and is rejected for the same reasons of obviousness as used above. Regarding Claim 12: Claim 12 has been analyzed with regard to claim 2 (see rejection above) and is rejected for the same reasons of obviousness as used above. Regarding Claim 13: Claim 13 has been analyzed with regard to claim 3 (see rejection above) and is rejected for the same reasons of obviousness as used above.
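The confidence-gating flow recited in claim 9 and mapped above (emit validation information when confidence clears a threshold, otherwise request additional information from the user) reduces to a simple gate. A minimal sketch follows; the threshold value and return labels are assumptions for illustration, not values from the claim or references.

```python
def confidence_gate(confidence: float, threshold: float = 0.7) -> str:
    """Gate validation output on a confidence threshold (hypothetical cutoff)."""
    if confidence >= threshold:
        return "output_validation_info"
    # Below threshold: ask the user device for more information
    # to use in updating the confidence score.
    return "request_additional_info_from_user"

print(confidence_gate(0.9))  # comfortably above the threshold
print(confidence_gate(0.4))  # borderline output, so ask the user
```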
Regarding Claim 14: Claim 14 has been analyzed with regard to claim 4 (see rejection above) and is rejected for the same reasons of obviousness as used above. Regarding Claim 15: Claim 15 has been analyzed with regard to claim 5 (see rejection above) and is rejected for the same reasons of obviousness as used above. Regarding Claim 16: Claim 16 has been analyzed with regard to claim 6 (see rejection above) and is rejected for the same reasons of obviousness as used above. Regarding Claim 17: Claim 17 has been analyzed with regard to claim 7 (see rejection above) and is rejected for the same reasons of obviousness as used above. Regarding Claim 19: Claim 19 has been analyzed with regard to claim 9 (see rejection above) and is rejected for the same reasons of obviousness as used above. Regarding Claim 20: Claim 20 has been analyzed with regard to claim 1 (see rejection above) and is rejected for the same reasons of obviousness as used above. It is noted that Gupta discloses a non-transitory computer readable medium at least at ¶[0015].

5. Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta in view of Pedersen, and further in view of Ignatyev (US 2017/0236182).

Regarding Claim 8: The proposed combination of Gupta in view of Pedersen further discloses the computing platform of claim 7, wherein the plurality of overlapping clusters are identified based on an internet protocol (IP) address of a user submitting the LLM prompt (Gupta: ¶[0016]-[0017] discloses a network and data processing service that connect to a user client device; ¶[0039] discloses a web-based interface where the data may be submitted and where the results may be obtained, and all web protocols require an IP address).
The combination of Gupta and Pedersen does not explicitly disclose wherein: a first cluster of the plurality of overlapping clusters represents a geographic region, a second cluster of the plurality of overlapping clusters represents a cultural group, and an intersection of the first cluster and the second cluster represents members of the cultural group within the geographic region. However, Ignatyev discloses wherein: a first cluster of the plurality of overlapping clusters represents a geographic region (Ignatyev: Fig. 4a step 408, ¶[0079] discloses identifying clusters based on IP addresses, where the IP address represents geographic location), a second cluster of the plurality of overlapping clusters represents a cultural group (Ignatyev: ¶[0094] teaches that ethnicity and nationality are demographic (cultural) groupings), and an intersection of the first cluster and the second cluster represents members of the cultural group within the geographic region (Ignatyev: Fig. 4a steps 408-410, ¶[0096] and ¶[0142] disclose a geographic cluster, a cultural cluster, and their intersection, i.e., the ethnicity/nationality of people living in a region). Gupta discloses generating outputs from one or more LLMs, evaluating those outputs, and presenting the results to a user via a user interface. However, Gupta differs from the claimed invention in that it does not cluster users representing geographic regions and cultural groups. Ignatyev discloses doing this based on an IP address. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate this functionality.
The suggestion/motivation for doing so is explained in Ignatyev, which explains that conventional approaches relying solely on transaction data or coarse IP-based aggregation are “imprecise” and “too uncertain to rely upon,” and therefore teaches combining geographic location information with demographic characteristics in ¶[0004]-[0008].

Regarding Claim 18: Claim 18 has been analyzed with regard to claim 8 (see rejection above) and is rejected for the same reasons of obviousness as used above.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to IAN SCOTT MCLEAN whose telephone number is (703)756-4599. The examiner can normally be reached "Monday - Friday 8:00-5:00 EST, off Every 2nd Friday". Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan, can be reached at (571) 272-6338.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /IAN SCOTT MCLEAN/Examiner, Art Unit 2654 /HAI PHAN/Supervisory Patent Examiner, Art Unit 2654

Prosecution Timeline

Feb 08, 2024
Application Filed
Sep 26, 2025
Non-Final Rejection — §103
Jan 02, 2026
Response Filed
Feb 06, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602553: SPEECH TRANSLATION METHOD, DEVICE, AND STORAGE MEDIUM
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12494199: VOICE INTERACTION METHOD AND ELECTRONIC DEVICE
Granted Dec 09, 2025 (2y 5m to grant)
Patent 12443805: Systems and Methods for Multilingual Data Processing and Arrangement on a Multilingual User Interface
Granted Oct 14, 2025 (2y 5m to grant)
Patent 12437144: Content Recommendation Method and User Terminal
Granted Oct 07, 2025 (2y 5m to grant)
Patent 12400644: DYNAMIC LANGUAGE MODEL UPDATES WITH BOOSTING
Granted Aug 26, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 43%
With Interview: 74% (+31.0%)
Median Time to Grant: 3y 2m
PTA Risk: Moderate
Based on 44 resolved cases by this examiner. Grant probability derived from career allow rate.
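The headline projections above appear to combine additively (43% base rate plus the +31.0% interview lift gives the 74% figure). A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Assumption: the "With Interview" figure is the career allow rate
# plus the interview lift, consistent with 43% + 31.0% = 74% above.
base_grant_probability = 0.43  # examiner's career allow rate
interview_lift = 0.31          # lift observed for resolved cases with interview

with_interview = base_grant_probability + interview_lift
print(f"{with_interview:.0%}")
```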
