Last updated: May 29, 2026
Application No. 19/349,743
SYSTEM AND METHOD FOR EVALUATING ARTIFICIAL INTELLIGENCE AGENTS DEPLOYED IN AN ENTERPRISE COMPUTING ENVIRONMENT

Final Rejection: §103, §112
Filed: Oct 03, 2025
Priority: Jul 16, 2024 (provisional 63/672,148, +1 more)
Examiner: LEY, SALLY THI
Art Unit: 2147
Tech Center: 2100 (Computer Architecture & Software)
Assignee: KPMG LLP
OA Round: 2 (Final)
This examiner grants 19% of cases after interview (+33.3% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 36 resolved cases, 2023–2026
Examiner Intelligence

LEY, SALLY THI
Career Allowance Rate: 19% (7 granted / 36 resolved), -35.6% vs TC avg
Interview Lift: +33.3% (resolved cases with vs. without interview)
Avg Prosecution (typical timeline): 4y 8m
Currently pending: 17
Statute-Specific Performance

§101: 10.3% (-29.7% vs TC avg)
§103: 83.2% (+43.2% vs TC avg)
§102: 3.8% (-36.2% vs TC avg)
§112: 2.7% (-37.3% vs TC avg)
Based on career data from 36 resolved cases
Office Action

§103 §112
DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Status of Claims
	This Office Action is in response to the communication filed on 17 February 2026.
	Claims 1-20 are being considered on the merits.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: 
agent aggregation unit – in claim 1
agent evaluation unit – in claims 1, 2, 12, and 13
agent scoring unit – in claims 1 and 6-10
intervention unit – in claims 1, 11, 12, and 20
agent selection unit – in claims 2 and 13
Each of these units is referred to multiple times throughout the specification. None of the claimed units includes any corresponding structure. Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover any corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 2, and 6-13  are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention. 
In claim 1, claim limitation “agent aggregation unit” invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. No clear association between a structure and the function can be found in the specification. As a result, the specification is unclear as to whether the structure of the agent aggregation unit is a single processor or multiple processors, or some other structure altogether. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
In claims 1, 2, 12, and 13, claim limitation “agent evaluation unit” invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. No clear association between a structure and the function can be found in the specification. As a result, the specification is unclear as to whether the structure of the agent evaluation unit is a single processor or multiple processors, or some other structure altogether. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
In claims 1 and 6-10, claim limitation “agent scoring unit” invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. No clear association between a structure and the function can be found in the specification. As a result, the specification is unclear as to whether the structure of the agent scoring unit is a single processor or multiple processors, or some other structure altogether. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
In claims 1, 11, 12, and 20, claim limitation “intervention unit” invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. No clear association between a structure and the function can be found in the specification. As a result, the specification is unclear as to whether the structure of the intervention unit is a single processor or multiple processors, or some other structure altogether. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
In claims 2 and 13, claim limitation “Agent selection unit” invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. No clear association between a structure and the function can be found in the specification. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph. As noted above, throughout claims 1-9, the “Neural Network Unit” is sometimes more specifically referred to as a “Trained Neural Network Unit” or an “Active Learning Neural Network Unit” and for examination purposes, such additional specificity will be treated as descriptive labels.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 10-13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Siebel et al. (US2024/0202225A1; hereinafter “Siebel”) in view of Xia et al. (“An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping,” arXiv:2404.05388v3 [cs.SE], 15 May 2024; hereinafter “Xia”).

Regarding claim 1, Siebel teaches: 
A computer-implemented artificial intelligence (AI) agent evaluation system for evaluating an AI agent associated with an enterprise computing environment, comprising (Siebel, para. 0021: “An architecture for enterprise generative AI is disclosed herein to transform interactions with enterprise information that fundamentally change the human-computer interaction (HCI) model for enterprise software. Enterprises running sensitive workloads in both cloud-native, on premise, or air-gapped environments can implement enterprise generative AI architecture to generate enterprise-wide insights using tool to rapidly locate and retrieve with agents that develop and coordinate complex operations in response to simple intuitive input.”)
a memory for storing computer executing instructions, and (Siebel, para. 0205: “The storage 1508 includes any storage configured to retrieve and store data. Some examples of the storage 1508 include flash drives, hard drives, optical drives, cloud storage, and/or magnetic tape. Each of the memory system 1506 and the storage system 1508 comprises a computer-readable medium, which stores instructions or programs executable by processor 1504.”)
a processor configured to execute the computer executable instructions stored in the memory to implement: (Siebel, para. 0205: “The storage 1508 includes any storage configured to retrieve and store data. Some examples of the storage 1508 include flash drives, hard drives, optical drives, cloud storage, and/or magnetic tape. Each of the memory system 1506 and the storage system 1508 comprises a computer-readable medium, which stores instructions or programs executable by processor 1504.”)
an agent aggregation module configured to aggregate (Siebel, paras. 0055, 0070 and fig. 5: “FIG. 5 depicts a diagram of an example enterprise generative artificial intelligence system 402 according to some embodiments. In the example of FIG. 5, the enterprise generative artificial intelligence system 402 includes a management module 502, an orchestrator module 504, a retrieval agent module 506-1, an unstructured data retriever agent module, 506-2, a structured data retriever agent module 506-3, a type system retriever agent module 506-4, a machine learning insight module 506-5, a timeseries processing agent 506-6, an API agent module 506-7, a math agent module 506-8, a visualization agent module 506-9, a code generation agent module 506-10, an unstructured data retrieval tool 508-1, an structured data retrieval tool 508-2, a text processing tool module 508-3, an image processing tool module 508-4, a timeseries processing tool module 508-5, an API tool module 508-6, a visualization tool module 508-7, an optimizer tool module 508-8, a filter tool module 508-9, a projections tool module 508-10, a group tool module 508-11, an order tool module 508-12, a limit tool module 508-13, code generation tool module 508-14, a comprehension module 510, a chunking module 512, an enterprise access control module 514, an artificial intelligence traceability module 516, a parallelization module 520, model generation module 522, a model deployment module 524, a model optimization module 526, an interface module 528, a communication module 530, vector datastore(s) 540, model registry datastore(s) 550, feature datastore(s) 560, and enterprise generative artificial intelligence system datastore(s) 570.” “the agent modules 506 include a variety of different example agent modules 506-1 to 506-N. It will be appreciated that these are shown by way of example, and various embodiments may include different agents instead of, or in addition to, the agents 506-1 to 506-N. 
In some embodiments, each of the agents 506 comprises hardware and/or software, and include one or more large language models, one or more other machine learning models, and/or functions, to provide reasoning functionality to accomplish a prescribed set of tasks. It will be appreciated that reference to an agent module may refer to the agent itself and/or the component that generates and/or executes the agent. In some embodiments, the orchestrator is a type of agent and may be referred to as an orchestrator agent. Accordingly, reference to orchestrator may refer to the orchestrator itself and/or the component that generates and/or executes the orchestrator.” Examiner notes that Siebel teaches a computer with several agents, including an orchestrator module to route inputs to any of an aggregation of agents) together a plurality of AI agents in real-time within the enterprise computing environment, and (Siebel, paras. 0066 and 0132: “In some embodiments, the orchestrator 504 may combine (e.g., stitch) outputs/results from various agents to create a unified output” “In some embodiments, the model optimization module 526 can retrain models (e.g., transformer-based natural language machine learning models) periodically, on-demand, and/or in real-time.”)
an agent evaluation module configured to aggregate (Siebel, paras. 0055, 0070 and fig. 5: “FIG. 5 depicts a diagram of an example enterprise generative artificial intelligence system 402 according to some embodiments. In the example of FIG. 5, the enterprise generative artificial intelligence system 402 includes a management module 502, an orchestrator module 504, a retrieval agent module 506-1, an unstructured data retriever agent module, 506-2, a structured data retriever agent module 506-3, a type system retriever agent module 506-4, a machine learning insight module 506-5, a timeseries processing agent 506-6, an API agent module 506-7, a math agent module 506-8, a visualization agent module 506-9, a code generation agent module 506-10, an unstructured data retrieval tool 508-1, an structured data retrieval tool 508-2, a text processing tool module 508-3, an image processing tool module 508-4, a timeseries processing tool module 508-5, an API tool module 508-6, a visualization tool module 508-7, an optimizer tool module 508-8, a filter tool module 508-9, a projections tool module 508-10, a group tool module 508-11, an order tool module 508-12, a limit tool module 508-13, code generation tool module 508-14, a comprehension module 510, a chunking module 512, an enterprise access control module 514, an artificial intelligence traceability module 516, a parallelization module 520, model generation module 522, a model deployment module 524, a model optimization module 526, an interface module 528, a communication module 530, vector datastore(s) 540, model registry datastore(s) 550, feature datastore(s) 560, and enterprise generative artificial intelligence system datastore(s) 570.” “the agent modules 506 include a variety of different example agent modules 506-1 to 506-N. It will be appreciated that these are shown by way of example, and various embodiments may include different agents instead of, or in addition to, the agents 506-1 to 506-N. 
In some embodiments, each of the agents 506 comprises hardware and/or software, and include one or more large language models, one or more other machine learning models, and/or functions, to provide reasoning functionality to accomplish a prescribed set of tasks. It will be appreciated that reference to an agent module may refer to the agent itself and/or the component that generates and/or executes the agent. In some embodiments, the orchestrator is a type of agent and may be referred to as an orchestrator agent. Accordingly, reference to orchestrator may refer to the orchestrator itself and/or the component that generates and/or executes the orchestrator.” Examiner notes that Siebel teaches a computer with several agents, including an orchestrator module to route inputs to any of an aggregation of agents) a selected AI agent of the plurality of AI agents using multi-dimensional evaluation data (Siebel, paras. 0028 and 0141: “An orchestrator agent (or, simply, orchestrator) can pre-process the input in step 104. Pre-processing can include, for example, acronym handling, translation handling, punctuation handling, input identification (e.g., identifying different portions of the input 102 for processing by different agents).
The orchestrator can use a multimodal model (e.g., large language model) to further process the input 102 to create a plan for determining a result (step 112) for the input.” “In some embodiments, the configuration, coordination, and cooperation of the orchestrator module 504, agents 506, tools 508, and/or other modules of the enterprise generative artificial intelligence system 402 (e.g., comprehension module 510) enables the enterprise generative artificial intelligence system 402 to provide a multi-hop architecture that enables complex reasoning over multiple agents 506, tools 508, and data sources (e.g., vector datastores, feature datastores, data models, enterprise datastores, unstructured data sources, structured data sources, and the like).”), wherein the evaluation data includes operational behavior data (Siebel, para. 0131: “In some embodiments, reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones.”) and trustworthiness data (Siebel, para. 0171: “The enterprise generative artificial intelligence system can, in some embodiments, use the feedback to improve the accuracy and/or reliability of the system.”), wherein the agent evaluation unit implements a structured evaluation methodology specifically designed for AI agent performance assessment in enterprise environments (Siebel, para. 0083: “In some embodiments, a type system is designed to be used by different computing systems, application developers, data scientists, operations personnel, and/or other users, to build applications, develop and execute machine learning algorithms, and manage and monitor the state of jobs running on a type system (e.g., an enterprise generative artificial intelligence system in some embodiments).”), and wherein the agent evaluation module includes 
an agent scoring module (Siebel, para. 0070: “the agent modules 506 include a variety of different example agent modules 506-1 to 506-N. It will be appreciated that these are shown by way of example, and various embodiments may include different agents instead of, or in addition to, the agents 506-1 to 506-N. In some embodiments, each of the agents 506 comprises hardware and/or software, and include one or more large language models, one or more other machine learning models, and/or functions, to provide reasoning functionality to accomplish a prescribed set of tasks. It will be appreciated that reference to an agent module may refer to the agent itself and/or the component that generates and/or executes the agent. In some embodiments, the orchestrator is a type of agent and may be referred to as an orchestrator agent. Accordingly, reference to orchestrator may refer to the orchestrator itself and/or the component that generates and/or executes the orchestrator.”) for determining a total agent evaluation score for the selected AI agent using a category-based framework (Siebel, para. 0151: “Based on the selected projections, the structured data agent can select different tools to generate a structured data retrieval specification query. In the example of FIG. 7, the structed data agent selects a filter tool (e.g., filter 508-9) filter the selected types based on the selected projections (step 726), a grouping tool (e.g., group tool module 508-11) to group the types (e.g., the filtered types) and/or projections,” Examiner notes that Siebel teaches a tool, which can be an agent, using a category-based framework in the form of groupings), wherein the total agent evaluation score is determined from a plurality of category evaluation scores by aggregating a plurality of agent specific categories (Siebel, para. 0077: “For example, the agent 506-2 can calculate and assign relevance scores (e.g., using a machine learning relevance model) for each of the retrieved data records.
The relevance score can be relative to the other retrieved data records. For example, the least relevant data record may be assigned a minimum value (e.g., 0) and the most relevant data record may be assigned a maximum value (e.g., 100). The unstructured data retriever agent module 506-2 may filter out documents that are relevant (or the documents that are not relevant). For example, the unstructured data retriever agent module 506-2 may filter out data records that have a relevance score below a configurable threshold value (e.g., 50). In some embodiments, the number of data records that the unstructured data retriever agent module 506-2 can retrieve for a particular input or query can be user or system defined, and also may be configurable.” Examiner notes Siebel teaches calculating relevance scores as well as grouping types or projections at para. 0151 cited above), wherein each of the plurality of agent specific categories has associated therewith a category evaluation score is calculated according to predetermined metrics for the respective category, (Xia, pg. 3 left column: “Model accuracy evaluation complements non-functional Quality/Risk evaluation. This process hinges on testing which systematically measures a model’s functional accuracy against predefined expectations. Evaluating model accuracy necessitates testing across selected test cases (i.e., a test suite), incorporating datasets that reflect real-world complexities and metrics for quantifying accuracy. Metrics range from general (precision, recall, F1 score) to domain-specific (e.g., BLEU for natural language processing [13]), tailored to the model’s purpose.” Examiner notes Xia teaches evaluation of AI models that are domain-specific against predefined expectations), wherein the plurality of agent specific categories includes categories associated with the operational behavior (Siebel, para. 0131, supra) and the trustworthiness (Siebel, para. 0171, supra) of the selected AI agent in enterprise deployment contexts, (Siebel, para. 0141: “In some embodiments, the configuration, coordination, and cooperation of the orchestrator module 504, agents 506, tools 508, and/or other modules of the enterprise generative artificial intelligence system 402 (e.g., comprehension module 510) enables the enterprise generative artificial intelligence system 402 to provide a multi-hop architecture that enables complex reasoning over multiple agents 506, tools 508, and data sources (e.g., vector datastores, feature datastores, data models, enterprise datastores, unstructured data sources, structured data sources, and the like).”), and
an intervention module (Siebel, para. 0070: “the agent modules 506 include a variety of different example agent modules 506-1 to 506-N. It will be appreciated that these are shown by way of example, and various embodiments may include different agents instead of, or in addition to, the agents 506-1 to 506-N. In some embodiments, each of the agents 506 comprises hardware and/or software, and include one or more large language models, one or more other machine learning models, and/or functions, to provide reasoning functionality to accomplish a prescribed set of tasks. It will be appreciated that reference to an agent module may refer to the agent itself and/or the component that generates and/or executes the agent. In some embodiments, the orchestrator is a type of agent and may be referred to as an orchestrator agent. Accordingly, reference to orchestrator may refer to the orchestrator itself and/or the component that generates and/or executes the orchestrator.”) configured to automatically initiate an AI-based intervention (Siebel, para. 0056: “The management module 502 can perform operations manually (e.g., by a user interacting with a GUI) and/or automatically (e.g., triggered by one or more of the modules 504-530)”) in the enterprise computing environment in response to the total agent evaluation score falling below a threshold value, (Siebel, para. 0105: “In some implementations, features of one or more large language models of the comprehension module 510 define conditions or functions that determine if more information is needed to satisfy the initial input or if there is enough information to satisfy the initial input. The large language models of the comprehension module 510 may also define stopping conditions that indicate a stopping threshold condition indicating a maximum number of iterations that may be performed before the iterative process is terminated.” Examiner notes that Siebel teaches features of an LLM (i.e., an AI model) that prevent the model from providing a result, i.e., intervening, when the threshold of initial input is not met).
wherein the agent evaluation module (Siebel, para. 0070, supra) is configured to enhance computational efficiency and decision accuracy within the enterprise computing environment (Siebel, para. 0231: “Such operation may serve to improve the computational efficiency of providing an output in response to an input”) by dynamically adapting agent tasking and deployment (Siebel, para. 0065: “The orchestrator 504 may also select and swap models as needed. For example, the orchestrator 504 may change out models (e.g., data models, large language models, machine learning models) of the enterprise generative artificial intelligence system 402 at or during run-time in addition to before or after run-time. For example, the orchestrator 504, agents 506, and comprehension module 510 may use particular sets of machine learning models for one domain and other models for different domains. The orchestrator 504 may select and use the appropriate models for a given domain and/or input.”)  based on the total agent evaluation score wherein the system provides real-time agent performance monitoring and automated corrective action capabilities for maintaining optimal AI agent deployment in the enterprise environments. (Siebel, para. 0077, supra and para. 0132: “In some embodiments, the model optimization module 526 can retrain models (e.g., transformer-based natural language machine learning models) periodically, on-demand, and/or in real-time. In some example implementations, corresponding candidate model (e.g., candidate transformer-based natural language machine learning models) can be trained based on the user selections and the model optimization module 526 can replace some or all of the models with one or more candidate models that have been trained on the received user selections.”)
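For orientation, the relevance scoring quoted from Siebel, para. 0077 (least relevant record assigned 0, most relevant assigned 100, records below a configurable threshold such as 50 filtered out) can be sketched as follows. This is a minimal illustration only: min-max scaling is an assumption about how the relative scores might be produced, and the function names and record identifiers are hypothetical and do not appear in Siebel or in the application.

```python
from typing import Dict


def scale_relevance(raw: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale raw scores so the lowest maps to 0 and the highest to 100.

    Assumption: Siebel only says scores are relative to the other records;
    min-max scaling is one way to obtain that behavior.
    """
    if not raw:
        return {}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # Assumption: when all records tie, give each the maximum value.
        return {record_id: 100.0 for record_id in raw}
    return {
        record_id: 100.0 * (value - lo) / (hi - lo)
        for record_id, value in raw.items()
    }


def filter_by_threshold(
    scores: Dict[str, float], threshold: float = 50.0
) -> Dict[str, float]:
    """Keep only records whose scaled score is at or above the threshold."""
    return {rid: s for rid, s in scores.items() if s >= threshold}


# Hypothetical raw model outputs for four retrieved records.
raw_scores = {"doc_a": 1.0, "doc_b": 5.0, "doc_c": 2.0, "doc_d": 4.0}
scaled = scale_relevance(raw_scores)
kept = filter_by_threshold(scaled, threshold=50.0)
```

With these hypothetical inputs, `doc_a` and `doc_c` fall below the threshold and are dropped, while `doc_b` and `doc_d` are retained.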
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Xia into Siebel. Siebel teaches systems and methods for managing, by an orchestrator, a plurality of agents to generate a response to an input; Xia teaches a framework for AI system evaluation. One of ordinary skill would have been motivated to combine the teachings of Xia into Siebel in order to obtain meaningful and actionable insights, ensuring the relevance and applicability of benchmarking and evaluation of AI to real-world deployments (Xia, sec. 2). 

Regarding claim 2, Siebel teaches claim 1 above. Siebel further teaches: 
The computer-implemented system of claim 1, wherein the agent evaluation module further comprises (Siebel, para. 0070, supra)
an agent selection module (Siebel, para. 0070, supra) configured to select, for assignment to a task, one or more of the AI agents of the plurality of AI agents (Siebel, para. 0065: “The orchestrator 504 may also select and swap models as needed. For example, the orchestrator 504 may change out models (e.g., data models, large language models, machine learning models) of the enterprise generative artificial intelligence system 402 at or during run-time in addition to before or after run-time. For example, the orchestrator 504, agents 506, and comprehension module 510 may use particular sets of machine learning models for one domain and other models for different domains. The orchestrator 504 may select and use the appropriate models for a given domain and/or input.”) having the total agent evaluation score that satisfies one or more predefined selection criteria, (Siebel, para. 0077, supra). 
wherein the agent evaluation module (Siebel, para. 0070, supra) enables dynamic and automated control of agent deployment (Siebel, para. 0065, supra) based on the total agent evaluation score (Siebel, para. 0077, supra), thereby improving operational efficiency of the enterprise computing environment (Siebel, para. 0168: “More specifically, if the comprehension module 906 determines that it needs additional information to satisfy the initial input, it can generate context-specific data (or, simply, “context”) that will inform future iterations of the process and help the system more efficiently and accurately satisfy the initial input.”)  and decision accuracy of the selected AI agent (Siebel, para. 0042: “The orchestrator 342 may utilize the various agents, large language models, and other features to generate an accurate and reliable (e.g., without hallucination) answer to the user query 362.”)

Regarding claim 3, Siebel teaches claim 2 above. Xia further teaches: 
The computer-implemented system of claim 2, wherein the plurality of agent specific categories includes a planning accuracy category (Xia, sec. 3.1.3.1: “Accuracy - Model Evaluation: Testing. Model accuracy evaluation complements non-functional Quality/Risk evaluation. This process hinges on testing which systematically measures a model’s functional accuracy against predefined expectations”), a tool precision category (Xia, sec. 1: “For example, evaluating object recognition model for parking does not alone ensure the safety of autonomous vehicles, which also needs precise manoeuvring and obstacle avoidance.”), a knowledge reliability category (Xia, sec. 3.2.2: “It employs a systematic approach to compare these systems against recognised benchmarks (e.g., EU AI Act, ISO/IEC 25010:2023), focusing on quality attributes significant for General AI’s broad application range—such as reliability and security—as well as adherence to ethical principles like privacy, and transparency.”), a safety and compliance category (Xia, sec. 3.1.4: “external mechanisms set to define operational boundaries and ensure safety compliance, over (in-)model guardrails integrated during training.”), and a human collaboration category (Xia, sec. 3.3: “ This analysis considers the roles and responsibilities of organisation-level stakeholders across the AI supply chain (Xia et al., 2024), as detailed in Fig. 2, illustrating the need for evaluations that span the entirety of the development lifecycle and engage all relevant stakeholders”)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Xia into Siebel. Siebel teaches systems and methods for managing, by an orchestrator, a plurality of agents to generate a response to an input; Xia teaches a framework for AI system evaluation. One of ordinary skill would have been motivated to combine the teachings of Xia into Siebel in order to obtain meaningful and actionable insights, ensuring the relevance and applicability of benchmarking and evaluation of AI to real-world deployments (Xia, sec. 2). 

Regarding claim 4, Siebel as modified teaches claim 2 above. Xia further teaches: 
The computer-implemented system of claim 2, wherein the trustworthiness categories of the plurality of agent specific categories includes a safety and compliance category (Xia, sec. 3.1.4: “Clear and understandable guardrails enhance user trust and facilitate better interaction with the AI system; and Standards alignment, ensuring that guardrails adhere to established safety, ethical, and regulatory frameworks, making them not just technically sound but also ethically responsible and compliant with legal requirements.”) and a knowledge reliability category (Xia, sec. 3.1.3.2: “Subsequent testing, utilising a tailored selection of datasets and metrics, probes the model’s adaptability and output quality in both typical and edge-case scenarios, providing a comprehensive view of its accuracy”). 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Xia into Siebel, as set forth above with respect to claim 3. 

Regarding claim 5, Siebel as modified teaches claim 4 above. Xia further teaches: 
The computer-implemented system of claim 4, wherein the operational behavior categories of the plurality of agent specific categories includes a planning accuracy category (Xia, sec. 3.1.3.1: “Accuracy - Model Evaluation: Testing. Model accuracy evaluation complements non-functional Quality/Risk evaluation. This process hinges on testing which systematically measures a model’s functional accuracy against predefined expectations”), a tool precision category (Xia, sec. 1: “For example, evaluating object recognition model for parking does not alone ensure the safety of autonomous vehicles, which also needs precise manoeuvring and obstacle avoidance.”), and a human collaboration category (Xia, sec. 3.3: “ This analysis considers the roles and responsibilities of organisation-level stakeholders across the AI supply chain (Xia et al., 2024), as detailed in Fig. 2, illustrating the need for evaluations that span the entirety of the development lifecycle and engage all relevant stakeholders”). 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Xia into Siebel, as modified, as set forth above with respect to claim 1. 

Regarding claim 6, Siebel as modified teaches claim 5 above. Siebel further teaches: 
The computer-implemented system of claim 5, wherein the agent scoring module (Siebel, para. 0070, supra) is configured to employ a scoring logic framework that assigns a selected weight to each of the agent specific categories (Siebel, para. 0106: “In some implementations, the context comprises a concatenation and/or annotation of one or more segments of data records, and/or embeddings associated therewith, along with a mapping of the concatenations and/or annotations. For example, the mapping may indicate relationships between different segments, a weighted or relative value associated with the different segments, and/or the like.”).

Regarding claim 7, Siebel as modified teaches claim 6 above. Siebel further teaches:
The computer-implemented system of claim 6, wherein the agent scoring module (Siebel, para. 0070, supra) is configured to automatically and dynamically adjust the weights assigned to each agent specific category based on one or more weighting factors and an operational context of the AI agent. (Siebel, paras. 0106, 0116 and 0117: “In some implementations, the context comprises a concatenation and/or annotation of one or more segments of data records, and/or embeddings associated therewith, along with a mapping of the concatenations and/or annotations. For example, the mapping may indicate relationships between different segments, a weighted or relative value associated with the different segments, and/or the like.” “In some implementations, the chunking module 512 generates enriched embeddings. For example, the chunking module 512 may generate enriched embeddings based on the contextual information, data records, and/or data record segments. An enriched embedding may comprise a vector value based on an embedding vector and the contextual information. In some embodiments, an enriched embedding comprises the embedding vector value along with the contextual metadata including the contextual information” “In some embodiments, the chunking module 512 may perform some or all of the functionality described herein periodically (e.g., in batches), on-demand, and/or in real time. For example, the chunking module 512 may periodically trigger, on-demand trigger, manually trigger, and/or automatically trigger, the chunking described herein.”)

Regarding claim 10, Siebel as modified teaches claim 7 above. Siebel further teaches:
The computer-implemented system of claim 7, wherein the agent scoring module (Siebel, para. 0070, supra) applies the scoring logic framework (Siebel, para. 0106, supra) such that:
the category evaluation score (Siebel, para. 0024, supra) associated with the planning category (Xia, sec. 3.1.3.1, supra) is determined by analyzing a correct number of steps executed by the AI agent and then dividing the correct number of steps by a total number of steps; (Siebel, para. 0171: “The enterprise generative artificial intelligence system can, in some embodiments, use the feedback to improve the accuracy and/or reliability of the system” Examiner notes accuracy is always a proportion, i.e., dividing correct by total). 
the category evaluation score (Siebel, para. 0024, supra) associated with the tool precision category (Xia, sec. 1, supra) is determined by comparing a number of correct tool invocations performed by the AI agent to a total number of tool invocations attempted by the AI agent; (Siebel, para. 0171: “The enterprise generative artificial intelligence system can, in some embodiments, use the feedback to improve the accuracy and/or reliability of the system” Examiner notes accuracy is always a proportion, i.e., dividing correct by total).
the category evaluation score (Siebel, para. 0024, supra) associated with the knowledge reliability category (Xia, sec. 3.1.3.2, supra) is determined based on one or more measurable indicators including a staleness check (Siebel, para. 0223: “The additional retrieval requests may comprise requests for additional data (which may be different to the retrieved data) which can be used to check, corroborate and/or validate the retrieved data and/or the one or more responses. For example, the additional retrieval requests may comprise requests configured to retrieve similar data to the retrieved data but from one or more different data sources. The additional data may comprise data from different data domains (and/or different portions of the same data domains) to the data domains (and/or respective portions of the data domains) from which the retrieved data was retrieved. The additional data may therefore provide alternative data to the retrieved data, based on which the retrieved data and/or the one or more responses can be checked, corroborated and/or validated.”) and a contradiction rate; (Siebel, para. 0123: “For example, the artificial intelligence traceability module 516 may identify data records that contradict each other (e.g., one of the data records indicate that John Doe is an employee at Acme corporation and another data record indicates that John Doe works at a different company) and provide a notification that the output was generated based on contradictory on conflicting information.”)
the category evaluation score (Siebel, para. 0024, supra) associated with the safety and compliance category (Xia, sec. 3.1.4) is determined by determining a proportion of interactions that occur without triggering a safety incident; (Siebel, para. 0171: “The enterprise generative artificial intelligence system can, in some embodiments, use the feedback to improve the accuracy and/or reliability of the system” Examiner notes accuracy is always a proportion, i.e., dividing correct by total); and 
the category evaluation score (Siebel, para. 0024, supra) associated with the human collaboration category (Xia, sec. 3.3) is determined based on one or more of an escalation accuracy, a feedback reinforcement effectiveness, an oversight compliance, and a transparency quality (Xia, sec. 3.1.3.2: “Benchmarking distinguishes General AI’s evaluation by employing standardised criteria and metrics tailored to its versatile nature, contrasting with Narrow AI’s focused scope. This approach ensures General AI models excel in adaptability, transparency, and interoperability—attributes critical for operation across varied contexts—while also meticulously assessing complex risks like copyright infringement due to their extensive training data”).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Xia into Siebel, as set forth above with respect to claim 3.
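
For illustration only, the proportion-style metrics recited in claim 10 (correct steps over total steps, correct tool invocations over attempted invocations, and incident-free interactions over all interactions) can be sketched as one ratio computation. The function name `proportion_score`, the zero-denominator convention, and the sample counts are hypothetical; they are not taken from the claims, Siebel, or Xia.

```python
def proportion_score(correct: int, total: int) -> float:
    """Return correct / total as a proportion in [0, 1].

    Assumption (not from the record): an agent with no recorded
    attempts scores 0.0 rather than raising an error.
    """
    if correct < 0 or total < 0 or correct > total:
        raise ValueError("counts must satisfy 0 <= correct <= total")
    if total == 0:
        return 0.0
    return correct / total


# Hypothetical counts for a single agent.
planning_accuracy = proportion_score(correct=8, total=10)    # correct steps / total steps
tool_precision = proportion_score(correct=18, total=20)      # correct / attempted tool invocations
safety_compliance = proportion_score(correct=99, total=100)  # incident-free / all interactions
```

The knowledge reliability and human collaboration categories are recited as depending on several indicators rather than one ratio, so they are left out of this sketch.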

Regarding claim 11, Siebel as modified teaches claim 7 above. Siebel further teaches: 
The computer-implemented system of claim 7, wherein the intervention module (Siebel, para. 0070, supra) automatically performs an AI-based intervention when the total evaluation score is below a selected threshold, (Siebel, para. 0175: “For example, the first agent and/or tool may use an artificial intelligence-based similarity search (e.g., ANN algorithm) to identify and retrieve passages that have similar embedding values (e.g., closest in the vector space based on one or more threshold values).”) and wherein the AI-based intervention includes an agent related corrective action. (Siebel, para. 0131: “In general, a reinforcement learning agent is able to perceive and interpret its environment, take actions and learn through trial and error. Reinforcement Learning uses algorithms and models to determine optimal behavior in an environment to obtain maximum reward. This optimal behavior is learned through interactions with the environment and observations of how it responds. In the absence of a supervisor, the learner must independently discover the sequence of actions that maximize the reward. This discovery process is akin to a trial-and-error search.”)

Regarding claim 12, Siebel teaches: 
A computer-implemented method for evaluating an AI agent associated with an enterprise computing environment, the method comprising (Siebel, para. 0021: “An architecture for enterprise generative AI is disclosed herein to transform interactions with enterprise information that fundamentally change the human-computer interaction (HCl) model for enterprise software. Enterprises running sensitive workloads in both cloud-native, on premise, or air-gapped environments can implement enterprise generative AI architecture to generate enterprise-wide insights using tool to rapidly locate and retrieve with agents that develop and coordinate complex operations in response to simple intuitive input.”)  
aggregating, using computer-executable instructions stored in a non-transient memory and executed by a processor (Siebel, para. 0205: “The storage 1508 includes any storage configured to retrieve and store data. Some examples of the storage 1508 include flash drives, hard drives, optical drives, cloud storage, and/or magnetic tape. Each of the memory system 1506 and the storage system 1508 comprises a computer-readable medium, which stores instructions or programs executable by processor 1504.”) implementing an agent aggregation module (Siebel, para. 0070, supra), a plurality of AI agents in real-time within the enterprise computing environment, and (Siebel, para. 0066 and 0132: “In some embodiments, the orchestrator 504 may combine (e.g., stitch) outputs/results from various agents to create a unified output” “In some embodiments, the model optimization module 526 can retrain models (e.g., transformer-based natural language machine learning models) periodically, on-demand, and/or in real-time.”)
evaluating, using computer-executable instructions stored in a non-transient memory and executed by a processor (Siebel, para. 0205: “The storage 1508 includes any storage configured to retrieve and store data. Some examples of the storage 1508 include flash drives, hard drives, optical drives, cloud storage, and/or magnetic tape. Each of the memory system 1506 and the storage system 1508 comprises a computer-readable medium, which stores instructions or programs executable by processor 1504.”) implementing with an agent evaluation module (Siebel, para. 0070, supra), a selected AI agent of the plurality of AI agents using multi-dimensional evaluation data (Siebel, para. 0028 and 0141: “An orchestrator agent (or, simply, orchestrator) can pre-process the input in step 104. Pre-processing can include, for example, acronym handling, translation handling, punctuation handling, input identification (e.g., identifying different portions of the input 102 for processing by different agents). The orchestrator can use a multimodal model (e.g., large language model) to further process the input 102 to create a plan for determining a result (step 112) for the input.” “In some embodiments, the configuration, coordination, and cooperation of the orchestrator module 504, agents 506, tools 508, and/or other modules of the enterprise generative artificial intelligence system 402 (e.g., comprehension module 510) enables the enterprise generative artificial intelligence system 402 to provide a multi-hop architecture that enables complex reasoning over multiple agents 506, tools 508, and data sources (e.g., vector datastores, feature datastores, data models, enterprise datastores, unstructured data sources, structured data sources, and the like).”), wherein the evaluation data includes operational behavior data (Siebel, para. 0131: “In some embodiments, reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones.”) and trustworthiness data (Siebel, para. 0171: “The enterprise generative artificial intelligence system can, in some embodiments, use the feedback to improve the accuracy and/or reliability of the system.”), wherein the agent evaluation unit implements a structured evaluation methodology specifically designed for AI agent performance assessment in enterprise environments (Siebel, para. 0083: “In some embodiments, a type system is designed to be used by different computing systems, application developers, data scientists, operations personnel, and/or other users, to build applications, develop and execute machine learning algorithms, and manage and monitor the state of jobs running on a type system (e.g., an enterprise generative artificial intelligence system in some embodiments).”), and wherein the agent evaluation module is configured to:
determining, using computer-executable instructions stored in the non-transient memory and executed by the processor (Siebel, para. 0205: “The storage 1508 includes any storage configured to retrieve and store data. Some examples of the storage 1508 include flash drives, hard drives, optical drives, cloud storage, and/or magnetic tape. Each of the memory system 1506 and the storage system 1508 comprises a computer-readable medium, which stores instructions or programs executable by processor 1504.”) implementing with an agent scoring module (Siebel, para. 0070, supra), a total agent evaluation score for the selected AI agent using a category-based framework (Siebel, para. 0151: “Based on the selected projections, the structured data agent can select different tools to generate a structured data retrieval specification query. In the example of FIG. 7, the structed data agent selects a filter tool (e.g., filter 508-9) filter the selected types based on the selected projections (step 726), a grouping tool (e.g., group tool module 508-11) to group the types (e.g., the filtered types) and/or projections,” Examiner notes Siebel teaches a tool, which can be an agent, using a category-based framework in the form of groupings), wherein the total agent evaluation score is determined by aggregating a plurality of category evaluation scores from a plurality of agent specific categories (Siebel, para. 0077: “For example, the agent 506-2 can calculate and assign relevance scores (e.g., using a machine learning relevance model) for each of the retrieved data records. The relevance score can be relative to the other retrieved data records. For example, the least relevant data record may be assigned a minimum value (e.g., 0) and the most relevant data record may be assigned a maximum value (e.g., 100). The unstructured data retriever agent module 506-2 may filter out documents that are relevant (or the documents that are not relevant). For example, the unstructured data retriever agent module 506-2 may filter out data records that have a relevance score below a configurable threshold value (e.g., 50). In some embodiments, the number of data records that the unstructured data retriever agent module 506-2 can retrieve for a particular input or query can be user or system defined, and also may be configurable.” Examiner notes Siebel teaches calculating relevance scores as well as grouping types or projections at para. 0151 cited above), wherein each category evaluation score is calculated according to predetermined metrics for the respective category (Xia, pg. 3 left column: “Model accuracy evaluation complements non-functional Quality/Risk evaluation. This process hinges on testing which systematically measures a model’s functional accuracy against predefined expectations. Evaluating model accuracy necessitates testing across selected test cases (i.e., a test suite), incorporating datasets that reflect real-world complexities and metrics for quantifying accuracy. Metrics range from general (precision, recall, F1 score) to domain-specific (e.g., BLEU for natural language processing [13]), tailored to the model’s purpose.” Examiner notes Xia teaches evaluation of AI models that are domain-specific against predefined expectations),
wherein the plurality of agent specific categories includes categories associated with the operational behavior (Siebel, para. 0131, supra) and the trustworthiness (Siebel, para. 0171) of the selected AI agent in enterprise deployment contexts (Siebel, para. 0141: “In some embodiments, the configuration, coordination, and cooperation of the orchestrator module 504, agents 506, tools 508, and/or other modules of the enterprise generative artificial intelligence system 402 (e.g., comprehension module 510) enables the enterprise generative artificial intelligence system 402 to provide a multi-hop architecture that enables complex reasoning over multiple agents 506, tools 508, and data sources (e.g., vector datastores, feature datastores, data models, enterprise datastores, unstructured data sources, structured data sources, and the like).”); and
automatically initiating, using computer-executable instructions stored in the non-transient memory and executed by the processor (Siebel, para. 0205: “The storage 1508 includes any storage configured to retrieve and store data. Some examples of the storage 1508 include flash drives, hard drives, optical drives, cloud storage, and/or magnetic tape. Each of the memory system 1506 and the storage system 1508 comprises a computer-readable medium, which stores instructions or programs executable by processor 1504.”) implementing with an intervention module (Siebel, para. 0070, supra), an AI-based intervention (Siebel, para. 0056: “The management module 502 can perform operations manually (e.g., by a user interacting with a GUI) and/or automatically (e.g., triggered by one or more of the modules 504-530)”)  in the enterprise computing environment in response to the total agent evaluation score (Siebel, para. 0077, supra) falling below a threshold value (Siebel, para. 0105: “In some implementations, features of one or more large language models of the comprehension module 510 define conditions or functions that determine if more information is needed to satisfy the initial input or if there is enough information to satisfy the initial input. The large language models of the comprehension module 510 may also define stopping conditions that indicate a stopping threshold condition indicating a maximum number of iterations that may be performed before the iterative process is terminated.” Examiner notes Siebel teaches features of an LLM (i.e. an AI model) that prevents the model from providing a result i.e. intervening when the threshold of initial input is not met).  
wherein the agent evaluation module (Siebel, para. 0070, supra) is configured to enhance computational efficiency and decision accuracy within the enterprise computing environment (Siebel, para. 0231: “Such operation may serve to improve the computational efficiency of providing an output in response to an input”) by dynamically adapting agent tasking and deployment (Siebel, para. 0065: “The orchestrator 504 may also select and swap models as needed. For example, the orchestrator 504 may change out models (e.g., data models, large language models, machine learning models) of the enterprise generative artificial intelligence system 402 at or during run-time in addition to before or after run-time. For example, the orchestrator 504, agents 506, and comprehension module 510 may use particular sets of machine learning models for one domain and other models for different domains. The orchestrator 504 may select and use the appropriate models for a given domain and/or input.”) based on the total agent evaluation score and providing real-time agent performance monitoring and automated corrective action capabilities for maintaining optimal AI agent deployment in the enterprise environments (Siebel, para. 0077, supra and para. 0132: “In some embodiments, the model optimization module 526 can retrain models (e.g., transformer-based natural language machine learning models) periodically, on-demand, and/or in real-time. In some example implementations, corresponding candidate model (e.g., candidate transformer-based natural language machine learning models) can be trained based on the user selections and the model optimization module 526 can replace some or all of the models with one or more candidate models that have been trained on the received user selections.”)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Xia into Siebel. Siebel teaches systems and methods for managing, by an orchestrator, a plurality of agents to generate a response to an input; Xia teaches a framework for AI system evaluation. One of ordinary skill would have been motivated to combine the teachings of Xia into Siebel in order to obtain meaningful and actionable insights, ensuring the relevance and applicability of benchmarking and evaluation of AI to real-world deployments (Xia, sec. 2). 
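
For illustration only, the threshold-triggered intervention step recited in claim 12 (initiating an intervention when the total agent evaluation score falls below a threshold value) can be sketched as a simple comparison over monitored agents. The names `agents_needing_intervention` and `total_scores`, the agent identifiers, and the threshold of 0.6 are hypothetical and do not come from the claims or the cited references; how the total score is computed is outside this sketch.

```python
def agents_needing_intervention(total_scores: dict[str, float],
                                threshold: float) -> list[str]:
    """Return the agents whose total evaluation score is below the threshold.

    Assumption: a score exactly equal to the threshold does NOT trigger
    an intervention ("falling below" is read as a strict comparison).
    """
    return sorted(
        agent_id
        for agent_id, score in total_scores.items()
        if score < threshold
    )


# Hypothetical total scores for three monitored agents.
scores = {"agent-a": 0.85, "agent-b": 0.45, "agent-c": 0.60}
flagged = agents_needing_intervention(scores, threshold=0.6)
```

With these sample values only `agent-b` is flagged, since `agent-c` sits exactly at the threshold.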

Regarding claim 13, Siebel teaches claim 12 above. Siebel further teaches:
The computer-implemented method of claim 12, further comprising selecting, using computer-executable instructions stored in the non-transient memory and executed by the processor (Siebel, para. 0205: “The storage 1508 includes any storage configured to retrieve and store data. Some examples of the storage 1508 include flash drives, hard drives, optical drives, cloud storage, and/or magnetic tape. Each of the memory system 1506 and the storage system 1508 comprises a computer-readable medium, which stores instructions or programs executable by processor 1504.”) implementing an agent selection module of the agent evaluation module (Siebel, para. 0070, supra), for assignment to a task, one or more of the AI agents of the plurality of AI agents (Siebel, para. 0065: “The orchestrator 504 may also select and swap models as needed. For example, the orchestrator 504 may change out models (e.g., data models, large language models, machine learning models) of the enterprise generative artificial intelligence system 402 at or during run-time in addition to before or after run-time. For example, the orchestrator 504, agents 506, and comprehension module 510 may use particular sets of machine learning models for one domain and other models for different domains. The orchestrator 504 may select and use the appropriate models for a given domain and/or input.”) having the total agent evaluation score that satisfies one or more predefined selection criteria, (Siebel, para. 0077, supra).
wherein the agent evaluation module (Siebel, para. 0070, supra) enables dynamic and automated control of agent deployment (Siebel, para. 0065, supra) based on the total agent evaluation score (Siebel, para. 0077, supra), thereby improving operational efficiency of the enterprise computing environment (Siebel, para. 0168: “More specifically, if the comprehension module 906 determines that it needs additional information to satisfy the initial input, it can generate context-specific data (or, simply, “context”) that will inform future iterations of the process and help the system more efficiently and accurately satisfy the initial input.”)  and decision accuracy of the selected AI agent. (Siebel, para. 0042: “The orchestrator 342 may utilize the various agents, large language models, and other features to generate an accurate and reliable (e.g., without hallucination) answer to the user query 362.”)


Regarding claim 20, Siebel as modified teaches claim 7 above. Siebel further teaches: 
The computer-implemented method of claim 7, further comprising automatically performing, with the intervention module (Siebel, para. 0070, supra), an AI-based intervention when the total evaluation score is below a selected threshold (Siebel, para. 0175: “For example, the first agent and/or tool may use an artificial intelligence-based similarity search (e.g., ANN algorithm) to identify and retrieve passages that have similar embedding values (e.g., closest in the vector space based on one or more threshold values).”), and wherein the AI-based intervention includes an agent related corrective action. (Siebel, para. 0131: “In general, a reinforcement learning agent is able to perceive and interpret its environment, take actions and learn through trial and error. Reinforcement Learning uses algorithms and models to determine optimal behavior in an environment to obtain maximum reward. This optimal behavior is learned through interactions with the environment and observations of how it responds. In the absence of a supervisor, the learner must independently discover the sequence of actions that maximize the reward. This discovery process is akin to a trial-and-error search.”)

Claims 8-9 and 14-19 are rejected under 35 U.S.C. 103 as being unpatentable over Siebel, in view of Xia, and further in view of Aull-Hyde (“An experiment on the consistency of aggregated comparison matrices in AHP,” European Journal of Operational Research, Volume 171, Issue 1, 2006, Pages 290-295, ISSN 0377-2217, https://doi.org/10.1016/j.ejor.2004.06.037; hereinafter, “Aull-Hyde”)

Regarding claim 8, Siebel as modified teaches claim 7 above. Aull-Hyde further teaches:
The computer-implemented system of claim 7, wherein the agent scoring module (Siebel, para. 0070, supra) employs a weighted aggregation technique to determine the total agent evaluation score based on the category evaluation scores of the plurality of agent specific categories. (Aull-Hyde, sec. 5: “Computation of Saaty’s consistency measure includes normalization of the matrix A, determination of the weight vector w by averaging row elements of the normalized matrix A and then multiplication of the matrix A and vector w. For a 3 × 3 comparison matrix, these computations do not significantly affect the variability of the elements of the aggregate comparison matrix. In order to achieve consistency in an aggregate comparison matrix, the higher variability of the elements in a 3 × 3 aggregated comparison matrix must be offset by increasing the group size.”) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Aull-Hyde into Siebel, as modified. Aull-Hyde teaches threshold values for various dimensions of the aggregated comparison matrix. One of ordinary skill would have been motivated to combine Aull-Hyde into Siebel in order to calculate an acceptable inconsistency level for aggregate matrices given a sufficiently large group size (Aull-Hyde, sec. 5).
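As a hedged illustration of the computation described in the Aull-Hyde passage quoted for claim 8 (normalize the comparison matrix A, average the rows of the normalized matrix to obtain the weight vector w, then multiply A by w), the sketch below derives weights from a pairwise comparison matrix and applies them to category scores. The function names, the example matrix, and the example scores are assumptions made for illustration; neither reference supplies code, and the consistency index formula (λmax − n)/(n − 1) is Saaty's standard definition rather than text quoted in this action.

```python
def ahp_weights(A):
    """Weight vector w: normalize each column of A, then average each row."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    normalized = [[A[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return [sum(row) / n for row in normalized]


def consistency_index(A, w):
    """Saaty's consistency index, using the product A*w to estimate lambda_max."""
    n = len(A)
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Aw[i] / w[i] for i in range(n)) / n
    return (lambda_max - n) / (n - 1)


def weighted_total(scores, w):
    """Weighted aggregation of category scores into one total score."""
    return sum(s * wi for s, wi in zip(scores, w))


# Hypothetical, perfectly consistent 3 x 3 comparison matrix.
A = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
w = ahp_weights(A)                          # approximately [4/7, 2/7, 1/7]
ci = consistency_index(A, w)                # approximately 0 for a consistent matrix
total = weighted_total([0.9, 0.6, 0.3], w)  # hypothetical category scores
```

A consistency index near zero indicates that the pairwise judgments agree with one another; larger values indicate the kind of inconsistency that Aull-Hyde discusses for aggregated matrices.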

Regarding claim 9, Siebel as modified teaches claim 7 above. Aull-Hyde further teaches:
The computer-implemented system of claim 7, wherein the agent scoring module (Siebel, para. 0070, supra) employs an unweighted technique that determines an arithmetic mean of the category evaluation scores (Aull-Hyde, sec. 1: “ When aggregating individual priorities (AIP), either the geometric mean method (GMM) or the weighted arithmetic mean method (WAMM) is suitable.”)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Aull-Hyde into Siebel, as modified, as set forth above with respect to claim 8.
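To make the distinction between the aggregation methods concrete, the following sketch compares an unweighted arithmetic mean (the technique recited in claim 9) with the weighted arithmetic mean and geometric mean named in the Aull-Hyde passage. The scores and the helper `weighted_arithmetic_mean` are hypothetical; `fmean` and `geometric_mean` are from Python's standard `statistics` module.

```python
from statistics import fmean, geometric_mean


def weighted_arithmetic_mean(scores, weights):
    """Weighted arithmetic mean; with equal weights it reduces to the plain mean."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


scores = [0.8, 0.6, 0.7]                                  # hypothetical category scores
unweighted = fmean(scores)                                # plain arithmetic mean
equal_weighted = weighted_arithmetic_mean(scores, [1, 1, 1])
geometric = geometric_mean(scores)                        # never exceeds the arithmetic mean
```

With equal weights, the weighted arithmetic mean and the unweighted arithmetic mean coincide, which is the relationship at issue between the cited passage and the claimed unweighted technique.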

Regarding claim 14, Siebel as modified teaches claim 13 above. Xia further teaches:
The computer-implemented method of claim 13, wherein the plurality of agent specific categories includes a planning accuracy category (Xia, sec. 3.1.3.1: “Accuracy - Model Evaluation: Testing. Model accuracy evaluation complements non-functional Quality/Risk evaluation. This process hinges on testing which systematically measures a model’s functional accuracy against predefined expectations”), a tool precision category (Xia, sec. 1: “For example, evaluating object recognition model for parking does not alone ensure the safety of autonomous vehicles, which also needs precise manoeuvring and obstacle avoidance.”), a knowledge reliability category (Xia, sec. 3.2.2: “It employs a systematic approach to compare these systems against recognised benchmarks (e.g., EU AI Act, ISO/IEC 25010:2023), focusing on quality attributes significant for General AI’s broad application range—such as reliability and security—as well as adherence to ethical principles like privacy, and transparency.”), a safety and compliance category (Xia, sec. 3.1.4: “external mechanisms set to define operational boundaries and ensure safety compliance, over (in-)model guardrails integrated during training.”), and a human collaboration category (Xia, sec. 3.3: “ This analysis considers the roles and responsibilities of organisation-level stakeholders across the AI supply chain (Xia et al., 2024), as detailed in Fig. 2, illustrating the need for evaluations that span the entirety of the development lifecycle and engage all relevant stakeholders”).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Xia into Siebel, as modified, as set forth above with respect to claim 3.

Regarding claim 15, Siebel as modified teaches claim 13 above. Xia further teaches:
The computer-implemented method of claim 13, wherein the trustworthiness categories of the plurality of agent specific categories includes a safety and compliance category (Xia, sec. 3.1.4: “Clear and understandable guardrails enhance user trust and facilitate better interaction with the AI system; and Standards alignment, ensuring that guardrails adhere to established safety, ethical, and regulatory frameworks, making them not just technically sound but also ethically responsible and compliant with legal requirements.”) and a knowledge reliability category (Xia, sec. 3.1.3.2: “Subsequent testing, utilising a tailored selection of datasets and metrics, probes the model’s adaptability and output quality in both typical and edge-case scenarios, providing a comprehensive view of its accuracy”), and the operational behavior categories (Siebel, para. 0131: “In some embodiments, reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones.”) of the plurality of agent specific categories includes a planning accuracy category (Xia, sec. 3.1.3.1: “Accuracy - Model Evaluation: Testing. Model accuracy evaluation complements non-functional Quality/Risk evaluation. This process hinges on testing which systematically measures a model’s functional accuracy against predefined expectations”), a tool precision category, (Xia, sec. 1: “For example, evaluating object recognition model for parking does not alone ensure the safety of autonomous vehicles, which also needs precise manoeuvring and obstacle avoidance.”), and a human collaboration category (Xia, sec. 3.3: “ This analysis considers the roles and responsibilities of organisation-level stakeholders across the AI supply chain (Xia et al., 2024), as detailed in Fig. 2, illustrating the need for evaluations that span the entirety of the development lifecycle and engage all relevant stakeholders”).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Xia into Siebel, as set forth above with respect to claim 3.

Regarding claim 16, Siebel as modified teaches claim 15 above. Siebel further teaches:
The computer-implemented method of claim 15, further comprising employing a scoring logic framework for assigning a selected weight to each of the agent specific categories. (Siebel, sec. 0106: “ In some implementations, the context comprises a concatenation and/or annotation of one or more segments of data records, and/or embeddings associated therewith, along with a mapping of the concatenations and/or annotations. For example, the mapping may indicate relationships between different segments, a weighted or relative value associated with the different segments, and/or the like.”)

Regarding claim 17, Siebel as modified teaches claim 16 above. Siebel further teaches:
The computer-implemented method of claim 16, further comprising automatically and dynamically adjusting the weights assigned to each agent specific category based on one or more weighting factors and an operational context of the AI agent. (Siebel, sec. 0106, 0116 and 117: “In some implementations, the context comprises a concatenation and/or annotation of one or more segments of data records, and/or embeddings associated therewith, along with a mapping of the concatenations and/or annotations. For example, the mapping may indicate relationships between different segments, a weighted or relative value associated with the different segments, and/or the like.” “In some implementations, the chunking module 512 generates enriched embeddings. For example, the chunking module 512 may generate enriched embeddings based on the contextual information, data records, and/or data record segments. An enriched embedding may comprise a vector value based on an embedding vector and the contextual information. In some embodiments, an enriched embedding comprises the embedding vector value along with the contextual metadata including the contextual information” “In some embodiments, the chunking module 512 may perform some or all of the functionality described herein periodically (e.g., in batches), on-demand, and/or in real time. For example, the chunking module 512 may periodically trigger, on-demand trigger, manually trigger, and/or automatically trigger, the chunking described herein.”)
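One way to picture the dynamic weight adjustment recited in claim 17 is to scale each category's base weight by a context-dependent factor and then renormalize so the weights again sum to one. This is only a sketch under assumptions: the function `adjust_weights`, the category keys, and the factor values are hypothetical and are not drawn from Siebel or from the application.

```python
def adjust_weights(base_weights, context_factors):
    """Scale base weights by per-category context factors, then renormalize to sum to 1.

    Categories missing from context_factors keep a neutral factor of 1.0.
    """
    scaled = {cat: w * context_factors.get(cat, 1.0) for cat, w in base_weights.items()}
    total = sum(scaled.values())
    if total <= 0:
        raise ValueError("adjusted weights must sum to a positive value")
    return {cat: value / total for cat, value in scaled.items()}


# Hypothetical base weights and a context that doubles the emphasis on safety.
base = {"planning_accuracy": 0.25, "tool_precision": 0.25, "safety_and_compliance": 0.5}
adjusted = adjust_weights(base, {"safety_and_compliance": 2.0})
```

Renormalizing keeps the total score on the same scale regardless of how many categories a given operational context emphasizes.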

Regarding claim 18, Siebel as modified teaches claim 17 above. Aull-Hyde further teaches: 
The computer-implemented method of claim 17, further comprising applying a weighted aggregation technique to determine the total agent evaluation score based on the category evaluation scores of the plurality of agent specific categories. (Aull-Hyde, sec. 5: “Computation of Saaty’s consistency measure includes normalization of the matrix A, determination of the weight vector w by averaging row elements of the normalized matrix A and then multiplication of the matrix A and vector w. For a 3 × 3 comparison matrix, these computations do not significantly affect the variability of the elements of the aggregate comparison matrix. In order to achieve consistency in an aggregate comparison matrix, the higher variability of the elements in a 3 × 3 aggregated comparison matrix must be offset by increasing the group size.”)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Aull-Hyde into Siebel, as modified, as set forth above with respect to claim 8.

Regarding claim 19, Siebel as modified teaches claim 18 above. Siebel further teaches:
The computer-implemented method of claim 18, further comprising applying the scoring logic framework (Siebel, sec. 0106, supra) such that:
the category evaluation score (Siebel, para. 0024, supra) associated with the planning category (Xia, sec. 3.1.3.1, supra) is determined by analyzing a correct number of steps executed by the AI agent and then dividing the correct number of steps by a total number of steps; (Siebel, para. 0171: “The enterprise generative artificial intelligence system can, in some embodiments, use the feedback to improve the accuracy and/or reliability of the system”; examiner notes that accuracy is always a proportion, i.e., dividing correct by total).
the category evaluation score (Siebel, para. 0024, supra) associated with the tool precision category (Xia, sec. 1, supra) is determined by comparing a number of correct tool invocations performed by the AI agent to a total number of tool invocations attempted by the AI agent; (Siebel, para. 0171: “The enterprise generative artificial intelligence system can, in some embodiments, use the feedback to improve the accuracy and/or reliability of the system”; examiner notes that accuracy is always a proportion, i.e., dividing correct by total).
the category evaluation score (Siebel, para. 0024, supra) associated with the knowledge reliability category (Xia, sec. 3.1.3.2, supra) is determined based on one or more measurable indicators including a staleness check (Siebel, para. 0223: “The additional retrieval requests may comprise requests for additional data (which may be different to the retrieved data) which can be used to check, corroborate and/or validate the retrieved data and/or the one or more responses. For example, the additional retrieval requests may comprise requests configured to retrieve similar data to the retrieved data but from one or more different data sources. The additional data may comprise data from different data domains (and/or different portions of the same data domains) to the data domains (and/or respective portions of the data domains) from which the retrieved data was retrieved. The additional data may therefore provide alternative data to the retrieved data, based on which the retrieved data and/or the one or more responses can be checked, corroborated and/or validated.”) and a contradiction rate; (Siebel, para. 0123: “For example, the artificial intelligence traceability module 516 may identify data records that contradict each other (e.g., one of the data records indicate that John Doe is an employee at Acme corporation and another data record indicates that John Doe works at a different company) and provide a notification that the output was generated based on contradictory on conflicting information.”).
the category evaluation score (Siebel, para. 0024, supra) associated with the safety and compliance category (Xia, sec. 3.1.4) is determined by determining a proportion of interactions that occur without triggering a safety incident; (Siebel, para. 0171: “The enterprise generative artificial intelligence system can, in some embodiments, use the feedback to improve the accuracy and/or reliability of the system”; examiner notes that accuracy is always a proportion, i.e., dividing correct by total); and
the category evaluation score (Siebel, para. 0024, supra) associated with the human collaboration category (Xia, sec. 3.3) is determined based on one or more of an escalation accuracy, a feedback reinforcement effectiveness, an oversight compliance, and a transparency quality (Xia, sec. 3.1.3.2: “Benchmarking distinguishes General AI’s evaluation by employing standardised criteria and metrics tailored to its versatile nature, contrasting with Narrow AI’s focused scope. This approach ensures General AI models excel in adaptability, transparency, and interoperability—attributes critical for operation across varied contexts—while also meticulously assessing complex risks like copyright infringement due to their extensive training data”).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Xia into Siebel, as set forth above with respect to claim 3.
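The proportion-based scores recited in claim 19 (correct steps over total steps, correct tool invocations over attempted invocations, and interactions without a safety incident over all interactions) can be sketched as simple ratios. The function names and input counts below are hypothetical and are offered only to illustrate the arithmetic; they are not taken from the application or from the cited references.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator after validating that it is a proper proportion."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= numerator <= denominator:
        raise ValueError("numerator must lie between 0 and the denominator")
    return numerator / denominator


def planning_accuracy(correct_steps, total_steps):
    """Correct steps executed by the agent divided by total steps."""
    return ratio(correct_steps, total_steps)


def tool_precision(correct_invocations, total_invocations):
    """Correct tool invocations divided by attempted tool invocations."""
    return ratio(correct_invocations, total_invocations)


def safety_score(total_interactions, safety_incidents):
    """Proportion of interactions that occurred without a safety incident."""
    return ratio(total_interactions - safety_incidents, total_interactions)


# Hypothetical counts for a single evaluation window.
planning = planning_accuracy(8, 10)
precision = tool_precision(9, 12)
safety = safety_score(200, 3)
```

Each function returns a value between 0 and 1, so the results can be fed directly into a weighted or unweighted aggregation of category scores.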
Response to Applicant Remarks
35 USC § 112(b)
	Towards the bottom of page 9 of applicant’s remarks, applicant states that the claims have been amended to recite specific structural language conveying that the claimed elements comprise computer-executable instructions stored in memory that, when executed by a processor, implement the recited modules.
	However, a processor executing instructions stored in memory to implement a module does not provide structure or make the boundaries of the module itself clear. 

35 USC § 102/103
	At the bottom of page 11 of applicant’s remarks, applicant states that independent claims 1 and 12 have been amended to include features not disclosed by Siebel. In light of applicant’s amendments, the previously asserted § 102 rejections for claims 1 and 12 have been withdrawn. They now stand rejected pursuant to § 103. 
	On page 12 of applicant’s remarks, applicant states that dependent claims have been amended to include features not taught by Siebel, Xia, or Aull-Hyde. Applicant further argues that one of ordinary skill in the art would not have combined the teachings of Siebel with the teachings of Xia or Aull-Hyde. 
	First, in light of applicant’s amendments, the rejections have been updated as set forth in the rejection above. Second, with respect to the combination of Siebel with Xia: Siebel teaches systems and methods for managing, by an orchestrator, a plurality of agents to generate a response to an input; Xia teaches a framework for AI system evaluation. There is no reason one of ordinary skill in the art would not combine the two in order to obtain meaningful and actionable insights, ensuring the relevance and applicability of benchmarking and evaluation of AI to real-world deployments. With respect to Aull-Hyde: Aull-Hyde teaches threshold values for various dimensions of the aggregated comparison matrix. There is no reason one of ordinary skill would not have been motivated to combine Aull-Hyde into Siebel in order to calculate an acceptable inconsistency level for aggregate matrices given a sufficiently large group size. 
	Applicant argues that adding evaluation features would conflict with the efficiency goals of the references. However, Siebel teaches orchestrating a plurality of agents which can be adapted to perform in different contexts—the integration of evaluating systems or of threshold values is not contrary to Siebel. 
 
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sally T. Ley whose telephone number is (571)272-3406. The examiner can normally be reached Monday - Thursday, 10:00am - 6:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo, can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/STL/Examiner, Art Unit 2147                                                                                                                                                                                                        
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147
Prosecution Timeline

Oct 03, 2025
Application Filed
Nov 17, 2025
Non-Final Rejection mailed — §103, §112
Feb 17, 2026
Response Filed
Mar 27, 2026
Final Rejection mailed — §103, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/981,796
Patent 12632746
A METHOD AND APPARATUS FOR DISPLAYING CATEGORIZED CARBON EMISSIONS
3y 6m to grant Granted May 19, 2026
16/733,393
Patent 12443830
COMPRESSED WEIGHT DISTRIBUTION IN NETWORKS OF NEURAL PROCESSORS
5y 9m to grant Granted Oct 14, 2025
16/835,892
Patent 12135927
EXPERT-IN-THE-LOOP AI FOR MATERIALS DISCOVERY
4y 7m to grant Granted Nov 05, 2024
17/992,958
Patent 11880776
GRAPH NEURAL NETWORK (GNN)-BASED PREDICTION SYSTEM FOR TOTAL ORGANIC CARBON (TOC) IN SHALE
1y 2m to grant Granted Jan 23, 2024
Study what changed to get past this examiner. Based on 4 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 19%
With Interview (+33.3%): 53%
Median Time to Grant: 4y 8m (~4y 0m remaining)
PTA Risk: Moderate
Based on 36 resolved cases by this examiner. Grant probability derived from career allowance rate.