DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 07/09/2024, 01/15/2025, and 10/23/2025 were filed in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ruochen Zhao et al ("Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-Battles and Committee Discussions", ARXIV.ORG, Cornell University Library, 201 Olin Library Cornell University Ithaca, NY 14853, 30 May 2024 (2024-05-30), XP091772881, IDS supplied).
Regarding Claim 1, Ruochen Zhao et al discloses a non-transitory computer-readable media storing computer instructions which when executed by one or more processors of a device cause the device to evolve a large language model (LLM)-based chatbot (Peer Battle) (page 2, Figure 2, Section 3.2) over at least one iteration (Overall, the peer battle consists of 3 rounds, where the candidates take turns to speak) (page 2, Figure 2, Section 3.2) that includes: presenting, by a large language model (LLM)-based evaluator, a question to a LLM-based chatbot during a dialog with the LLM-based chatbot (For debate questions, as using a static dataset could incur data contamination concerns and result in unfair evaluations, we ask an LLM examiner agent to dynamically generate questions. The examiner agent could be any capable LLM) (page 4, Section 3.1) comprised of a sequence of question and answer pairs (The process is illustrated in Figure 2. In the first round, A gives an initial response to the examiner's question; B criticizes the weaknesses in A's response and raises a targeted follow-up question; and A responds to B's question) (page 2, Figure 2, Section 3.2); receiving, by the LLM-based evaluator, an answer to the question from the LLM-based chatbot (In the first round, A gives an initial response to the examiner's question) (page 2, Figure 2, Section 3.2: Peer Debate); evaluating, by the LLM-based evaluator, the answer according to one or more evaluation metrics and a ground truth (Given the questions, the LLM-produced answers are compared to ground-truth answers using metrics such as accuracy) (page 1, Section 1); determining, by the LLM-based evaluator, that a result of the evaluation is unsatisfactory (Candidate A (powered by Yi-34B-Chat) gives a wrong answer as it miscounts occurrences for repeated letters and miscalculates factorials) (page 9, Section 5.1); and presenting, by the LLM-based evaluator, a follow-up question to the LLM-based chatbot designed to 
encourage a new answer of the LLM-based chatbot (The opponent B (powered by Claude-3-Haiku) quickly and precisely points out these two issues and skillfully raised a follow-up that targets A's weaknesses: "how about the word 'BANANA'?") (page 9, Section 5.1) to be satisfactory with respect to the ground truth and to cause an optimization of the LLM-based chatbot (Given the questions, the LLM-produced answers are compared to ground-truth answers using metrics such as accuracy) (page 1, Section 1).
Regarding Claim 2, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein the LLM-based chatbot is evolved over a plurality of iterations each corresponding to a different question and answer pair in the sequence of question and answer pairs (Overall, the peer battle consists of 3 rounds, where the candidates take turns to speak. The entire dialogue history is visible to both candidates. The process is illustrated in Figure 2. In the first round, A gives an initial response to the examiner's question; B criticizes the weaknesses in A's response and raises a targeted follow-up question; and A responds to B's question. In the second round, A and B are reversed: B gives an initial response to the examiner's question (without seeing A's response); A criticizes and raises questions; and B responds to A's question. In the third round, A and B cross-examine each other. A starts by criticizing B's previous loopholes and raises follow-up questions. After responding, B also criticizes A's loopholes and raises questions. A concludes the battle by responding again. In this process, both A and B get an equal number of each action to ensure fairness. To further reduce position bias, A and B's order is randomly shuffled at the beginning of each debate) (pages 4 and 5, Section 3.2).
Regarding Claim 3, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein when the LLM-based evaluator determines that a result of the evaluation for a given question and answer pair is satisfactory with respect to the ground truth (Given the questions, the LLM-produced answers are compared to ground-truth answers using metrics such as accuracy) (page 1, Section 1), then the LLM-based evaluator begins a next iteration of the plurality of iterations (Overall, the peer battle consists of 3 rounds, where the candidates take turns to speak. The entire dialogue history is visible to both candidates. The process is illustrated in Figure 2. In the first round, A gives an initial response to the examiner's question; B criticizes the weaknesses in A's response and raises a targeted follow-up question; and A responds to B's question. In the second round, A and B are reversed: B gives an initial response to the examiner's question (without seeing A's response); A criticizes and raises questions; and B responds to A's question. In the third round, A and B cross-examine each other. A starts by criticizing B's previous loopholes and raises follow-up questions. After responding, B also criticizes A's loopholes and raises questions. A concludes the battle by responding again. In this process, both A and B get an equal number of each action to ensure fairness. To further reduce position bias, A and B's order is randomly shuffled at the beginning of each debate) (pages 4 and 5, Section 3.2).
Regarding Claim 4, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein the evaluating of the answer is further performed according to prior question and answer pairs occurring in the dialog (In the third round, A and B cross-examine each other. A starts by criticizing B's previous loopholes and raises follow-up questions. After responding, B also criticizes A's loopholes and raises questions. A concludes the battle by responding again. In this process, both A and B get an equal number of each action to ensure fairness. To further reduce position bias, A and B's order is randomly shuffled at the beginning of each debate) (pages 4 and 5, Section 3.2).
Regarding Claim 5, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein the one or more evaluation metrics include one or more automatically calculable natural language processing (NLP) measures (This is a competitive chatbot arena. You are competing against another chatbot assistant in a debate and being judged by a committee on factors such as helpfulness, relevance, accuracy, depth, and creativity) (Page 15, Section A.1.2, Prompts).
Regarding Claim 6, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein evaluating, by the LLM-based evaluator, the answer according to the one or more evaluation metrics and the ground truth includes: calculating a score for the answer based on the one or more evaluation metrics and the ground truth (For logical-reasoning questions that have ground-truth answers (reasoning, code, math), LLM-as-a-judge is known to show weak performances in judging the quality of responses. We adopt prior approaches to establish the reference-based judge [32]. Specifically, we utilize the strongest model (according to the current ranking) to generate a reference answer and provide it to the judge when evaluating the peer battle) (page 5, Section 3.3).
Regarding Claim 7, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein the result of the evaluation is unsatisfactory when the score is below a predefined threshold (In the first round, the committee is initialized with MMLU [15] scores to approximate LLM performances. They will first be asked to read through the battle history, elaborate judgment reasons, and give a verdict on whether A is better, or B is better, or if there is a tie) (page 5, Section 3.3).
Regarding Claim 8, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein the LLM-based evaluator presents up to a threshold number of follow-up questions until the new answer of the LLM-based chatbot is evaluated to be satisfactory with respect to the ground truth (Each pair of candidates engage in 40 peer battles, with 5 questions from each of the 8 categories. The questions are generated by GPT-4. As each battle consists of 3 rounds (each candidate speaks for 4 times), we expect the competition scale to be approximately the same as MT-Bench (80 questions, each candidate speaks twice)) (Page 6, Section 4.1).
Regarding Claim 9, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein when the LLM-based evaluator presents the threshold number of follow-up questions without the new answer of the LLM-based chatbot being evaluated as satisfactory with respect to the ground truth, then an error analysis is caused to be performed on the LLM-based chatbot (Each pair of candidates engage in 40 peer battles, with 5 questions from each of the 8 categories. The questions are generated by GPT-4. As each battle consists of 3 rounds (each candidate speaks for 4 times), we expect the competition scale to be approximately the same as MT-Bench (80 questions, each candidate speaks twice)) (Page 6, Section 4.1).
Regarding Claim 10, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein the LLM-based chatbot is initially trained on a dataset comprised of individual question and answer pairs (One line of research conducts automatic evaluation with static datasets. Among these, static datasets with predefined metrics, such as GSM8k [9] and MMLU [15], are constructed with aspect-specific input-output pairs, such as questions and their corresponding answers) (page 1, Section 1).
Regarding Claim 11, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein the LLM-based chatbot is evolved to include a multi-turn question and answer dataset (Secondly, two candidate LLMs interact with each other and engage in a multi-round peer battle by answering the seed question individually, criticizing the opponent's weaknesses, and raising targeted follow-up queries to challenge the opponent further) (page 2, Section 1).
Regarding Claim 12, Ruochen Zhao et al discloses the non-transitory computer-readable media, wherein the device is further caused to: output the evolved LLM-based chatbot for use (A noticeable example is Chatbot Arena [32], which is a crowdsourced voting platform that gathers anonymous votes on LLM performances and calculates ELO scores to rank these models) (page 2, Section 1).
Claims 13 and 20 are rejected for the same reason as claim 1.
Claim 14 is rejected for the same reason as claim 2.
Claim 15 is rejected for the same reason as claim 3.
Claim 16 is rejected for the same reason as claim 4.
Claim 17 is rejected for the same reason as claim 5.
Claim 18 is rejected for the same reason as claim 6.
Claim 19 is rejected for the same reason as claim 7.
Cited Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Barron et al. (US 2024/0311407) discloses an artificial intelligence agricultural advisor chatbot system powered by large language models (LLMs) and customized for the agricultural domain using a blend of agricultural datasets can include tools providing custom context relevant to user queries.
Gado et al. (US 2025/0384280) discloses training data generation for large language model (LLM) training and/or benchmarking.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SATWANT K SINGH whose telephone number is (571)272-7468. The examiner can normally be reached Monday through Friday, 9:00 AM to 6:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Paras D Shah, can be reached at (571)270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SATWANT K SINGH/Primary Examiner, Art Unit 2653