Prosecution Insights
Last updated: April 19, 2026
Application No. 18/742,627

DYNAMIC EVALUATION SYSTEM FOR RESPONSIBLE AI IN LARGE LANGUAGE MODELS

Non-Final OA: §101, §102, §103, §112
Filed: Jun 13, 2024
Examiner: MANOHARAN, SHASHIDHAR SHANKAR
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Microsoft Technology Licensing, LLC
OA Round: 1 (Non-Final)
Grant Probability: 100% (Favorable)
OA Rounds: 1-2
To Grant: 2y 9m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 100% (2 granted / 2 resolved), above average (+38.0% vs TC avg)
Interview Lift: +0.0% (minimal), based on resolved cases with interview
Avg Prosecution: 2y 9m (typical timeline)
Total Applications: 16 career (14 currently pending), across all art units

Statute-Specific Performance

§101: 24.1% (-15.9% vs TC avg)
§103: 55.2% (+15.2% vs TC avg)
§102: 8.6% (-31.4% vs TC avg)
§112: 12.1% (-27.9% vs TC avg)
Based on career data from 2 resolved cases.

Office Action

Rejections: §101, §102, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 18 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 18 recites the limitation "language model" in line one of the claim when referring to independent system claim 17. There is insufficient antecedent basis for this limitation in the claim, as claim 17 refers to a "language model-based application," which is a distinct entity from a language model itself: an application is a software wrapper, while the model is the underlying engine. The examiner suggests amending the claim to recite "configuration settings for a language model of the LM-based application" in order to overcome this rejection.

Claim Rejections - 35 USC § 101

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a mental process without significantly more.
Independent claims 1, 13, and 17 recite a process that, as drafted under its broadest reasonable interpretation (BRI), covers a feedback-driven testing loop in which conversation parameters are adjusted to evaluate compliance with a guideline. For example, under the BRI the method claim relates to:

- receiving initial conversation parameters including persona settings and an RAI guideline (a person can receive and review testing rules and a persona profile mentally or via pen and paper);
- generating a conversational-input prompt requesting a conversational input (a person can draft a prompt or question based on rules mentally or via pen and paper);
- providing the prompt to a language model and receiving a response (a person can provide a prompt to an AI, such as ChatGPT, and read the response);
- storing the input and response as a simulated conversation (a person can record a conversation transcript via pen and paper);
- generating feedback based on the stored conversation (a person can analyze a conversation to determine its effectiveness mentally or via pen and paper);
- adjusting conversation parameters based on the feedback (a person can decide to change a persona's traits or the testing approach for the next round of testing mentally or via pen and paper); and
- evaluating RAI compliance based on whether the conversation violated the guideline (a person can judge whether a recorded conversation violates a set of moral or safety rules mentally or via pen and paper).

As described above, these limitations can be carried out as a series of mental steps. The judicial exception is not integrated into a practical application because the only additional elements recited are a system comprising a computer processor and memory, which is general-purpose hardware used as a tool to implement the mental process, and instructions that are conventional components utilizing the basic functions of a computer.
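To make the loop the examiner characterizes concrete, the enumerated limitations can be sketched as a small test harness. This is a hypothetical illustration only: the names (`run_rai_test`, `query_lm`, `query_app`, `judge`) and the parameter-adjustment rule are assumptions, not language from the application or the cited art.

```python
# Hypothetical sketch of the claimed feedback-driven RAI testing loop.
# All names and the adjustment heuristic are illustrative assumptions.

def run_rai_test(params, query_lm, query_app, judge, rounds=2):
    """Run `rounds` simulated conversations, adjusting parameters between rounds.

    params    -- dict with 'persona' settings and an 'rai_guideline' string
    query_lm  -- callable: prompt -> simulated-user utterance (the language model)
    query_app -- callable: utterance -> response (the LM-based application under test)
    judge     -- callable: (conversation, guideline) -> True if guideline violated
    Returns True if the final conversation is compliant.
    """
    conversations = []
    for _ in range(rounds):
        # Generate a conversational-input prompt from the current parameters.
        prompt = (f"Acting as persona {params['persona']}, "
                  f"probe for violations of: {params['rai_guideline']}")
        user_input = query_lm(prompt)            # receive simulated user input
        response = query_app(user_input)         # transmit to application under test
        conversation = (user_input, response)    # store as a simulated conversation
        conversations.append(conversation)
        # Generate feedback and adjust parameters for the next round:
        # if no violation was found, make the persona more adversarial.
        if not judge(conversation, params['rai_guideline']):
            params['persona'] = params['persona'] + '+adversarial'
    # Evaluate compliance based on the final conversation.
    violated = judge(conversations[-1], params['rai_guideline'])
    return not violated
```

Whether such a loop is "practically performable in the mind" is exactly the dispute the §101 rejection raises; the sketch only shows the claimed data flow, not a position on eligibility.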
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as described above, the only additional elements recited are a system comprising a computer processor and memory, which is general-purpose hardware used as a tool to implement the mental process. The remaining dependent claims fail to add patent-eligible subject matter to the independent claims:

- Claims 2 and 17 simply add a system description to the prompt, which a human can perform mentally and/or with pen and paper.
- Claims 3 and 4 simply add an adversarial seed or logged prior conversation, which is a routine "referencing of past examples" that a human can perform mentally and/or with pen and paper.
- Claims 5 and 9 simply add the organization of conversations into sets, which a human can perform mentally and/or with pen and paper.
- Claims 6, 7, and 8 simply add the generation of metrics such as relevance or diversity, which is a standard "statistical or qualitative assessment" that a human can perform mentally and/or with pen and paper.
- Claims 10, 15, and 19 simply add the use of an evaluation prompt to determine violations, which is a routine mental process of "asking a third party for an opinion" or "checking against a checklist" that a human can perform.
- Claim 11 simply limits the environment to a "chatbot," which is a generic medium that does not shift the mental process into a practical application (a human can interact with a chatbot (e.g., ChatGPT) to perform the recited evaluation steps).
- Claims 12 and 14 simply add specific personality traits (e.g., openness, extraversion), which is an activity used in human role-playing that a human can perform mentally.
- Claim 16 simply adds the generation of a compliance score, which is a basic "mathematical tallying" task that a human can perform mentally and/or with pen and paper.
- Claim 18 simply adds the adjustment of model configuration settings (top-p, temperature), which is a conventional "fine-tuning" step that a human can perform via a standard user interface.
- Claim 20 simply adds that parameters are received through a configuration interface, which is a generic computer component for data entry; a human can perform this by entering or receiving instructions through a standard digital form or a physical checklist.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-6, 9-17, and 19-20 are rejected under 35 U.S.C. 102(a)(1) and 35 U.S.C. 102(a)(2) as being anticipated by Akkiraju et al. (hereinafter Akkiraju) (US 11190464 B2) (paragraph numbers from attached copy).
Regarding claim 1, Akkiraju teaches:

A system for testing responsible artificial intelligence (RAI) compliance of a language model (LM) based application, the system comprising (Akkiraju, P[3]: "The method may additionally include one or more processors determining an assessment of the performance of the customer service agent based on the interaction between the customer service agent and the chatbot." (determining an assessment of the performance reads on testing compliance of the LM-based application); P[22]: "Customer service agents may be trained to identify the fourth customer chatbot and to apply company anti-fraud policies."; P[7]: "Many organizations spend tremendous resources to provide extensive training to their agents pertaining to products, services, and guidelines for dealing with concerns and requests from customers." (anti-fraud policies and guidelines for dealing with concerns of customers read on RAI compliance)):

at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the system to perform operations comprising (Akkiraju, P[47]: "computer processor(s) 404, memory 406, persistent storage 408" and P[50]: "Integrated learning environment program 106 is stored in persistent storage 408 for execution by one or more of the respective computer processors 404," reads on a system with a processor and memory configured to perform program operations.)

receiving initial conversation parameters including at least one or more initial persona settings for a simulated user and an RAI guideline (Akkiraju, P[37]: "integrated learning environment program 106 receives a customer persona" and P[7]: "guidelines for dealing with concerns and requests from customers," reads on receiving initial simulation parameters including behavioral personas and organizational guidelines.)

generating a first conversational-input prompt requesting a first conversational input, wherein the first conversational-input prompt includes at least a portion of the initial conversation parameters (Akkiraju, P[39]: "determines a chatbot behavior and responses using a customer service interaction data and the persona" and P[40]: "initiates the training by sending the message 'My flight is Delayed!'," reads on generating a prompt or initial message for the simulation based on the received persona parameters.)

providing the generated first conversational-input prompt as input to a language model (Akkiraju, P[42]: "encoder 604 is a neural network that maps a variable-length input sequence 602 to fixed-length vector 606," reads on providing an input sequence to a neural network/language model component of the simulation.)

receiving, from the language model in response to the first conversational-input prompt, the requested first conversational input (Akkiraju, P[24]: "customer chatbots module 108 may generate multiple responses and select at least one response according to the target style, tone, or persona," reads on receiving the generated conversational output from the model based on the prompt.)

transmitting the first conversational input to the LM-based application (Akkiraju, P[42]: "a simulated customer sends variable-length input sequence 602" and "to the customer service agent," reads on sending the generated input to the target application being tested/trained.)

receiving a first conversational response from the LM-based application in response to the first conversational input (Akkiraju, P[40]: "Ben sends the message 'I am sorry to hear that. I am here to help. What's your flight number?' in reply," reads on receiving the response from the target application/agent.)

storing the first conversational input and the first conversational response as a first simulated conversation (Akkiraju, P[14]: "Database 112 is a repository for data" and "Customer service interaction data may include text and/or speech conversations between customers and customer service agents," reads on recording the input/response exchange in a repository.)

generating feedback based on the stored first simulated conversation (Akkiraju, P[25]: "Feedback generation module 110 provides feedback" and "by continuous assessment of the style and performance of the interaction," reads on generating feedback based on the analysis of the recorded conversation.)

adjusting one or more of the initial conversation parameters based on the generated feedback (Akkiraju, P[34]: "customer chatbots module 108 adjusts simulated customer styles and tasks based on the evaluation results" and "by feedback generation module 110," reads on modifying the conversation parameters/styles based on the feedback results.)

generating a second conversational-input prompt requesting a second conversational input, wherein the second conversational-input prompt includes at least a portion of the adjusted conversation parameters (Akkiraju, P[41]: "may continue processing at operation 260 to determine a chatbot behavior and responses" and "based on the interaction between the chatbot and the customer service agent" (reads on continuing the simulation for subsequent turns, which inherently includes the steps of generating a second prompt based on the adjusted behavior, providing it to the model, receiving/transmitting the subsequent turn's inputs and responses, and storing the resulting conversation similar to the first iteration); Akkiraju, P[39]: "determines a chatbot behavior and responses using a customer service interaction data and the persona" and P[40]: "initiates the training by sending the message 'My flight is Delayed!'," reads on generating a prompt or initial message for the simulation based on the received persona parameters.)

providing the generated second conversational-input prompt as input to the language model (Akkiraju, P[42]: "encoder 604 is a neural network that maps a variable-length input sequence 602 to fixed-length vector 606," reads on providing an input sequence to a neural network/language model component of the simulation.)

receiving, from the language model in response to the second conversational-input prompt, the requested second conversational input (Akkiraju, P[24]: "customer chatbots module 108 may generate multiple responses and select at least one response according to the target style, tone, or persona," reads on receiving the generated conversational output from the model based on the prompt.)

transmitting the second conversational input to the LM-based application (Akkiraju, P[42]: "a simulated customer sends variable-length input sequence 602" and "to the customer service agent," reads on sending the generated input to the target application being tested/trained.)

receiving a second conversational response from the LM-based application in response to the second conversational input (Akkiraju, P[40]: "Ben sends the message 'I am sorry to hear that. I am here to help. What's your flight number?' in reply," reads on receiving the response from the target application/agent.)

storing the second conversational input and the second conversational response as a second simulated conversation (Akkiraju, P[14]: "Database 112 is a repository for data" and "Customer service interaction data may include text and/or speech conversations between customers and customer service agents," reads on recording the input/response exchange in a repository.)

and evaluating an RAI compliance of the LM-based application based on at least whether the second simulated conversation violated the RAI guideline (Akkiraju, P[22]: "identify the fourth customer chatbot" and "to apply company anti-fraud policies"; Akkiraju, P[42]: "generates a score for Ben's performance…generates a score of 0.8 for the reply provided by Ben" (reads on evaluating the performance/compliance of the application by determining if the interaction violated established policies or guidelines; anti-fraud reads on RAI).)
Regarding claim 2, Akkiraju teaches the system described in claim 1. Akkiraju further teaches: the initial conversation parameters further include a system description of the LM-based application; and the first conversational-input prompt further includes the system description (Akkiraju, P[14]: "Customer service interaction data may further include contextual information," P[39]: "include models to directly consider the persona, task, and context as additional constraints," and P[40]: "the task scenario includes an airline customer in an international flight schedule where the first flight has been delayed," reads on providing a description of the system environment (the airline flight schedule and delay parameters) as part of the initial simulation parameters.)

Regarding claim 3, Akkiraju teaches the system described in claim 1. Akkiraju further teaches: the initial conversation parameters further include an adversarial seed including at least one of an example topic, example query, or example conversation; and the first conversational-input prompt further includes the adversarial seed (Akkiraju, P[14]: "Customer service interaction data may include text and/or speech conversations," and P[37]: "derived from the customer service interaction data by performing unsupervised machine learning methods to extract tone and persona styles," reads on using specific examples from prior conversations to seed the current simulation; text and/or speech conversations reads on example conversation, and tone and persona styles reads on example topic.)

Regarding claim 4, Akkiraju teaches the system described in claim 3. Akkiraju further teaches: the adversarial seed includes at least a portion of a logged prior conversation with the LM-based application (Akkiraju, P[14]: "Customer service interaction data may include text and/or speech conversations between customers and customer service agents," and "collected from customer service call centers, social media, and/or any other source," reads on using logged historical interactions as the basis for the simulation seed.)

Regarding claim 5, Akkiraju teaches the system described in claim 1. Akkiraju further teaches: the first simulated conversation is stored in a first set of simulated conversations, and the second simulated conversation is stored in a second set of simulated conversations (Akkiraju, P[14]: "Database 112 is a repository for data," and P[35]: "store the style and performance of a customer service agent with respect to different types of customers and task scenarios during the training process," reads on categorizing and storing interactions into distinct data sets for analysis.)

Regarding claim 6, Akkiraju teaches the system described in claim 5. Akkiraju further teaches: generating the feedback includes generating one or more metrics for the first set of simulated conversations (Akkiraju, P[41]: "determining the similarity... by automatic metrics," and P[42]: "generates a score for Ben's performance," reads on calculating quantitative metrics based on the stored conversation sets.)

Regarding claim 9, Akkiraju teaches the system described in claim 5. Akkiraju further teaches: evaluating an RAI compliance includes evaluating whether each simulated conversation in the second set of conversations violated the RAI guideline (Akkiraju, P[31]: "evaluates the performance of an agent ranging from utterance level to task level," and P[34]: "adjusts simulated customer styles and tasks based on the evaluation results," reads on assessing a full set of interactions against the required guidelines.)

Regarding claim 10, Akkiraju teaches the system described in claim 1. Akkiraju further teaches:

generating an evaluation prompt including the RAI guideline, the second simulated conversation, and an instruction for the language model to determine if the second simulated conversation violated the RAI guideline (Akkiraju, P[33]: "provide multiple responses" "shown as a multiple-choice question" and "ask the customer service agent to select the best response," reads on generating an evaluative prompt with instructions to judge the content against the defined scenario/guideline.)

providing the evaluation prompt to the language model (Akkiraju, P[42]: "encoder 604 is a neural network that maps a variable-length input sequence 602" and "simulated customer sends variable-length input sequence 602," reads on providing the generated evaluative sequence to a language model/neural network component.)

receiving, from the language model in response to the evaluation prompt, a response indicating whether the second simulated conversation violated the RAI guideline (Akkiraju, P[44]: "identifies the high score for the provided reply" and P[22]: "apply company anti-fraud policies," reads on receiving an output from the model indicating whether the interaction complied with or violated the established policies/guidelines; anti-fraud reads on RAI.)
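The claim-10 "evaluation prompt" pattern (guideline plus transcript plus an instruction for the model to judge violation) can be sketched as follows. The function name and prompt wording are illustrative assumptions, not language from the application or the cited art.

```python
# Hypothetical sketch of the claim-10 evaluation prompt (an "LM-as-judge"
# style pattern). All names and wording are illustrative assumptions.

def build_evaluation_prompt(guideline, conversation):
    """Assemble a prompt asking a language model whether `conversation`
    (a list of (speaker, utterance) pairs) violates `guideline`."""
    transcript = "\n".join(f"{speaker}: {utterance}"
                           for speaker, utterance in conversation)
    return (
        f"RAI guideline: {guideline}\n"
        f"Conversation:\n{transcript}\n"
        "Did the conversation violate the guideline? Answer YES or NO."
    )
```

The assembled string would then be sent to the language model, and the YES/NO reply read back as the violation signal, matching the three limitations mapped above.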
Regarding claim 11, Akkiraju teaches the system described in claim 1. Akkiraju further teaches: the LM-based application is a chatbot (Akkiraju, P[1]: "customer care training using chatbots," and P[8]: "training may be performed by using a chatbot," reads on the application under test being a chatbot.)

Regarding claim 12, Akkiraju teaches the system described in claim 1. Akkiraju further teaches: the persona settings include a setting for at least one of a conscientiousness trait, an openness trait, an extraversion trait, a neuroticism trait, or an agreeableness trait (Akkiraju, P[18]: "unsupervised machine learning methods to extract tone and persona styles," P: "enthusiastic when communicating," and P[28]: "attempt to avoid confrontation," reads on assigning multiple specific behavioral traits associated with an extraversion trait (being enthusiastic) and an agreeableness trait (avoiding confrontation) to the simulated persona.)

Regarding claim 13, claim 13 recites the computer-implemented method corresponding to the system presented in claim 1. Akkiraju further teaches:

A computer-implemented method for testing responsible artificial intelligence (RAI) compliance of a language model (LM) based application, the method comprising (Akkiraju, Abstract):

generating feedback for a first set of simulated conversations with the LM-based application (Akkiraju, P[0025]: "Feedback generation module 110 provides feedback" and "by continuous assessment of the style and performance of the interaction," reads on generating feedback based on a recorded interaction set.)

based on the feedback, adjusting one or more conversation parameters, wherein the conversation parameters include one or more persona settings for a simulated user (Akkiraju, P[0034]: "customer chatbots module 108 adjusts simulated customer styles and tasks based on the evaluation results," reads on modifying the persona behavioral settings based on previous turns.)

generating a second set of simulated conversations, wherein generating the second set of simulated conversations comprises (Akkiraju, P[0041]: "integrated learning environment program 106 may continue processing at operation 260 to determine a chatbot behavior and responses based on the interaction between the chatbot and the customer service agent," reads on generating a subsequent turn or set of interactions by returning to the generation process after an initial assessment):

generating a conversational-input prompt requesting a conversational input, wherein the conversational-input prompt includes an RAI guideline and the persona settings (Akkiraju, P[0039]: "determines a chatbot behavior and responses using a customer service interaction data and the persona" and P[0040]: "initiates the training by sending the message," reads on generating a subsequent turn prompt based on the persona and task guidelines.)

providing the generated conversational-input prompt as input to a language model; receiving, from the language model in response to the conversational-input prompt, the requested conversational input (Akkiraju, P[0042]: "encoder 604 is a neural network that maps a variable-length input sequence 602 to fixed-length vector 606," and P[0024]: "generate multiple responses and select at least one response," reads on utilizing an internal model to generate the simulated user's input.)

transmitting the conversational input to the LM-based application (Akkiraju, P[0042]: "a simulated customer sends variable-length input sequence 602 (e.g., 'My flight is Delayed!') to the customer service agent," reads on the act of sending the model-generated input sequence to the target agent/application being tested.)

receiving a conversational response from the LM-based application in response to the conversational input (Akkiraju, P[0042]: "a simulated customer sends variable-length input sequence 602" and "to the customer service agent," reads on sending the input to the target application and receiving the reply.)

storing the conversational response as part of a simulated conversation of the second set of simulated conversations (Akkiraju, P[0014]: "Database 112 is a repository for data" and "include text and/or speech conversations," reads on recording the subsequent interaction in a repository.)

and evaluating an RAI compliance of the second set of simulated conversations (Akkiraju, P[0022]: "identify the fourth customer chatbot" and "to apply company anti-fraud policies," reads on evaluating the performance of the interaction set against established guidelines.)

Regarding claim 14, Akkiraju teaches the methods described in claim 13. Akkiraju further teaches: the persona settings include settings for at least two of a conscientiousness trait, an openness trait, an extraversion trait, a neuroticism trait, or an agreeableness trait (Akkiraju, P[18]: "unsupervised machine learning methods to extract tone and persona styles," P[28]: "enthusiastic when communicating," and P[19]: "attempt to avoid confrontation," reads on assigning multiple specific behavioral traits associated with an extraversion trait (being enthusiastic when communicating) and an agreeableness trait (avoiding confrontation) to the simulated persona.)
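The Big Five persona settings recited in claims 12 and 14 amount to a small configuration structure fed into the prompt. A minimal sketch, assuming numeric trait values in [0, 1] and a hypothetical mapping from traits to prompt language (neither the field layout nor the thresholds appear in the record):

```python
# Hypothetical persona-settings structure for the Big Five traits of
# claims 12/14. Field names, ranges, and thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class PersonaSettings:
    openness: float = 0.5           # each trait assumed in [0.0, 1.0]
    conscientiousness: float = 0.5
    extraversion: float = 0.5
    agreeableness: float = 0.5
    neuroticism: float = 0.5

    def to_prompt_fragment(self):
        """Turn high trait values into descriptive prompt text."""
        traits = []
        if self.extraversion > 0.7:
            traits.append("enthusiastic when communicating")
        if self.agreeableness > 0.7:
            traits.append("avoids confrontation")
        return "; ".join(traits) or "neutral demeanor"
```

The two trait descriptions mirror the behaviors the examiner quotes from Akkiraju ("enthusiastic when communicating," "attempt to avoid confrontation"); the numeric encoding itself is an assumption.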
Regarding claim 15, Akkiraju teaches the computer-implemented method corresponding to the system presented in claim 10, and claim 15 is rejected under the same grounds stated above.

Regarding claim 16, Akkiraju teaches the methods described in claim 15. Akkiraju further teaches: generating an RAI compliance score based on whether the conversational response violated the RAI guideline (Akkiraju, P[41]: "determining the similarity" "by automatic metrics," and P[42]: "generates a score for Ben's performance," reads on producing a numerical assessment of adherence to guidelines.)

Regarding claim 17, Akkiraju teaches:

a system for testing responsible artificial intelligence (RAI) compliance of a language model (LM) based application, the system comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the system to perform operations comprising (Akkiraju, P[0047]: "computer processor(s) 404, memory 406, persistent storage 408" and P[0050]: "stored in persistent storage 408 for execution by one or more of the respective computer processors 404," reads on a system with a processor and memory configured to perform program operations.)

receiving initial conversation parameters, wherein the initial conversation parameters comprise: an RAI guideline; one or more persona settings for a simulated user; and a system description of the LM-based application (Akkiraju, P[0037]: "integrated learning environment program 106 receives a customer persona" and P[0007]: "guidelines for dealing with concerns" and P[0014]: "Customer service interaction data may further include contextual information," reads on receiving initial parameters including personas, guidelines, and context or descriptions of the system environment.)

generating a first set of simulated conversations, with the LM-based application, based on the initial conversation parameters (Akkiraju, P[0039]: "determines a chatbot behavior and responses using a customer service interaction data and the persona" and P[0040]: "initiates the training by sending the message," reads on generating an initial set of simulated exchanges based on the input persona and task.)

generating feedback based on the first set of simulated conversations (Akkiraju, P[0025]: "Feedback generation module 110 provides feedback" and "by continuous assessment of the style and performance of the interaction," reads on generating feedback derived from the analysis of a recorded conversation set.)

adjusting one or more of the conversation parameters based on the generated feedback (Akkiraju, P[0034]: "customer chatbots module 108 adjusts simulated customer styles and tasks based on the evaluation results," reads on modifying behavioral or task parameters based on the assessed results of previous interactions.)

generating a second set of simulated conversations, with the LM-based application, based on the adjusted conversation parameters (Akkiraju, P[0041]: "may continue processing at operation 260 to determine a chatbot behavior and responses based on the interaction between the chatbot and the customer service agent," reads on generating a subsequent set of interactions after the parameters have been modified.)

and evaluating an RAI compliance of the LM-based application based on whether the second set of simulated conversations violated the RAI guideline (Akkiraju, P[0022]: "identify the fourth customer chatbot" and "to apply company anti-fraud policies," reads on evaluating the compliance of the interaction set by determining if a specific policy or guideline violation occurred.)

Regarding claim 19, Akkiraju teaches the system corresponding to the system presented in claim 10, and claim 19 is rejected under the same grounds stated above.
Regarding claim 20, Akkiraju teaches the system described in claim 17 Akkiraju further teaches: The initial conversation parameters are received through a configuration interface (Akkiraju, P[16]: "User interface 116" "display text, documents, web browser windows, user options," and "control sequences the user employs to control the program," reads on receiving testing parameters through a GUI/interface.) Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. This application currently names joint inventors. 
In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 7, 8 are rejected under 35 U.S.C. 103 as being unpatentable over Akkiraju et al. (hereinafter Akkiraju) (US 11190464 B2) in view of Kang et al. (hereinafter Kang) (US 10303978 B1) (Paragraph numbers from attached copy).

Regarding claim 7, Akkiraju teaches the methods described in claim 6. Akkiraju teaches: The one or more metrics include at least one of a relevance metric, an adversarial metric (Akkiraju, P[0032]: "similarity between the responses generated by the model and the responses provided by the agent by automatic metrics (e.g., BLEU, Tf-idf, word2vec)," reads on the relevance metric; Akkiraju, P[0022]: "identify the fourth customer chatbot" and "to apply company anti-fraud policies," reads on an adversarial metric used to detect deceptive or non-compliant behavior.)

However, Akkiraju does not teach a diversity and coverage metric.

Kang teaches: a diversity and coverage metric (Kang, Abstract: "calculating... a coverage metric value and a diversity metric value," reads on the diversity and coverage metric.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Akkiraju in view of Kang.
Doing so would have provided the specific clustering methods and diversity and coverage metrics of Kang (Kang, Abstract, P[0038]) with the high-level architecture for deriving persona styles and behavior taxonomies from conversational data of Akkiraju (Akkiraju, P[0018], P[0037]). This would generate diversity and coverage metrics to ensure that a training or testing dataset is sufficiently broad and representative of the varied user interactions required for robust system evaluation.

Regarding claim 8, the combination of Akkiraju and Kang teaches the methods described in claim 7. Akkiraju further teaches: The one or more metrics include the diversity and coverage metric (Akkiraju, P[0018]: "unsupervised machine learning methods to extract tone and persona styles," reads on using unsupervised methods to categorize and ensure a variety of styles in a taxonomy.)

However, Akkiraju does not teach: The diversity and coverage metric is generated based on a cluster analysis of embeddings for the simulated conversations.

Kang teaches: The diversity and coverage metric is generated based on a cluster analysis of embeddings for the simulated conversations (Kang, Abstract: "a coverage metric value and a diversity metric value" and P[0038]: "unsupervised learning (e.g., using K-means clustering)," reads on generating the specific metrics via a cluster analysis algorithm performed on the conversational data.)

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Akkiraju et al. (hereinafter Akkiraju) (US 11190464 B2) in view of Gurgu et al. (hereinafter Gurgu) (US 20230297887 A1).

Regarding claim 18, Akkiraju teaches the system corresponding to the system presented in claim 17.
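As context for the Kang citation against claims 7-8, a cluster-based coverage and diversity computation over conversation embeddings might look like the following minimal sketch. The K-means routine and the particular metric definitions (populated-cluster fraction for coverage, normalized entropy of cluster sizes for diversity) are illustrative assumptions for this example, not Kang's disclosed formulas, and the toy 2-D "embeddings" stand in for real conversation vectors.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Tiny K-means: returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return labels

def coverage_and_diversity(labels, k):
    """Coverage: fraction of clusters populated.
    Diversity: normalized entropy of cluster sizes (0 = all in one cluster)."""
    counts = [labels.count(j) for j in range(k)]
    nonempty = [c for c in counts if c > 0]
    coverage = len(nonempty) / k
    n = len(labels)
    entropy = -sum((c / n) * math.log(c / n) for c in nonempty)
    diversity = entropy / math.log(k) if k > 1 else 1.0
    return coverage, diversity

# Two tight groups of "conversation embeddings" plus one outlier.
embeddings = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
              (5.0, 5.1), (5.1, 5.0), (9.0, 0.0)]
labels = kmeans(embeddings, k=3)
cov, div = coverage_and_diversity(labels, k=3)
print(round(cov, 2), round(div, 2))
```

A low coverage or diversity score under a scheme like this would signal that the simulated conversation set is concentrated in a few behavioral regions, which is the gap in the test set that the combination rationale above is directed at.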
Gurgu teaches: the conversation parameters further include configuration settings for the language model, including at least one of a top-p value, a top-k value, or a temperature (Gurgu, P[0111]: "the hyperparameters that may be controlled include, for example, temperature, top-k and top-p," (reads on temperature, top-k and top-p values respectively) and P[0112]: "the temperature hyperparameter is used to control randomness of the generated output," reads on specifically naming and utilizing these probabilistic configuration settings for the language model.);

Gurgu does not teach: And adjusting the conversation parameters includes adjusting at least one of the configuration settings for the language model.

Akkiraju further teaches: And adjusting the conversation parameters includes adjusting at least one of the configuration settings for the language model (Akkiraju, P[0034]: "customer chatbots module 108 adjusts simulated customer styles and tasks" and "increases the uncertainty in customer responses simulated by chatbots," reads on the functional act of modifying model generation behavior based on performance.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Akkiraju in view of Gurgu. Doing so would have provided the specific hyperparameter selection and optimization techniques of Gurgu (Gurgu, P[0014]-P[0015], P[0111]) with the high-level architecture for iterative chatbot simulation and agent training of Akkiraju (Akkiraju, Abstract), thus improving the precision of the automated feedback loop and providing more granular control over the probabilistic generation of simulated user inputs through the adjustment of temperature, top-p, or top-k values in a real-time conversation analysis system.
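The three Gurgu-named settings all operate on the model's next-token probability distribution: temperature rescales the logits before the softmax, top-k keeps only the k most likely tokens, and top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches p. The following self-contained sketch illustrates those transforms on toy logits; the values and the exact filtering conventions (e.g., renormalizing after filtering) are assumptions for illustration, not Gurgu's implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def apply_temperature(logits, temperature):
    """Low temperature sharpens the distribution; high temperature flattens it."""
    return [x / temperature for x in logits]

def top_k_filter(probs, k):
    """Zero out all but the k highest-probability tokens, then renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    s = sum(filtered)
    return [p / s for p in filtered]

def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose mass reaches p, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, total = set(), 0.0
    for i in order:
        keep.add(i)
        total += probs[i]
        if total >= p:
            break
    filtered = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    s = sum(filtered)
    return [q / s for q in filtered]

logits = [2.0, 1.0, 0.5, -1.0]                    # toy next-token scores
base = softmax(logits)
sharp = softmax(apply_temperature(logits, 0.5))   # peakier than base
flat = softmax(apply_temperature(logits, 2.0))    # flatter than base
tk = top_k_filter(base, k=2)                      # only top 2 tokens survive
tp = top_p_filter(base, p=0.9)                    # nucleus of cumulative 0.9
print(tk)
```

Adjusting any of these between simulation rounds changes how varied the simulated user inputs are, which is the "granular control over the probabilistic generation" the combination rationale above relies on.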
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHASHIDHAR S MANOHARAN whose telephone number is (571)272-6772. The examiner can normally be reached M-F 8:00-4:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHASHIDHAR SHANKAR MANOHARAN/
Examiner, Art Unit 2655

/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655

Prosecution Timeline

Jun 13, 2024
Application Filed
Feb 05, 2026
Non-Final Rejection — §101, §102, §103, §112 (current)


Prosecution Projections

1-2
Expected OA Rounds
100%
Grant Probability
99%
With Interview (+0.0%)
2y 9m
Median Time to Grant
Low
PTA Risk
Based on 2 resolved cases by this examiner. Grant probability derived from career allow rate.
