Last updated: May 29, 2026
Application No. 18/620,771
TRAINING A MACHINE LEARNING MODEL BASED ON AGGREGATING ANNOTATED COMMUNICATION CONTENT

Final Rejection: §101, §103
Filed: Mar 28, 2024
Examiner: SMITH, SEAN THOMAS
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: The Toronto-Dominion Bank
OA Round: 2 (Final)
Interview Optional

Interview lift: +25.0%. An interview has already been conducted in this application's prosecution history. This examiner has an 86% grant rate with a +25.0% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 7 resolved cases, 2023–2026
Examiner Intelligence

SMITH, SEAN THOMAS
Career Allowance Rate: 86% (above average); 6 granted / 7 resolved; +23.7% vs TC avg
Interview Lift: +25.0% (strong; resolved cases with vs. without interview)
Typical timeline: 2y 9m average prosecution; 26 currently pending
Statute-Specific Performance

§101: 3.1% (-36.9% vs TC avg)
§103: 92.8% (+52.8% vs TC avg)
§102: 1.0% (-39.0% vs TC avg)
§112: 3.1% (-36.9% vs TC avg)
Based on career data from 7 resolved cases
Office Action

DETAILED ACTION
This Office Action is responsive to the amendments and arguments filed on March 12, 2026. Claims 1-3 and 6-17 are amended, claims 4-5 and 18-20 are cancelled, and claims 21-25 are added. Claims 1-3, 6-17 and 21-25 are pending and have been examined. This action is made FINAL.
Any previous objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on April 7 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Arguments
With regard to rejections made under 35 U.S.C. 101, Applicant argues, "Claim 1 recites a specific machine-learning architecture that transforms interaction content into vector embeddings stored in a vector database, retrieves a subset of vectors based on the labels, executes a drift detection model using the subset and policy content, and provides a response-generating model based on the detected drift. These steps are not capable of being performed in the human mind and cannot be carried out with pen and paper… Furthermore, the claim does not merely analyze policy compliance. Instead, the claim recites a specific computational mechanism for correcting an LLM / chatbot to generate responses / outputs that adhere to a correct policy implementation based on the drift. The drift determination is performed by a machine learning model operating on vectorized representations and policy content, not by a human reviewing text. The focus of the claim is therefore a technical improvement in machine-learning-based system alignment rather than an abstract idea," (page 9 of Remarks).
Applicant’s arguments with respect to mental processes have been considered but are not persuasive.
When read as a whole, the claims recite a judicial exception in the form of a mental process or organization of human activity; that being the steps of “receiving interaction content…” “annotate the interaction content…” “aggregate the interaction content…” “convert the aggregated content…” “identify a drift…” and “output responses…” These are steps that may be carried out by a human actor, and the claims include only broad mathematical operations (vectorization, label comparison) and generic computer hardware as an automation tool for those steps. The limitations of the claim do not present a practical application of the mental process, nor do they illustrate a technical improvement, as the “drift detection model” and “response-generating model” are recited broadly and called only to perform their expected functions and provide expected results, with no technical improvements to their function described. As such, the claims are directed to an abstract idea – data labeling and compliance analysis – implemented on a generic computer without an inventive concept. Accordingly, the rejections under 35 U.S.C. 101 are maintained.
Additionally, claim 17 as amended may still be interpreted to include transitory signals, and thus, the rejection under 35 U.S.C. 101 as a signal per se is maintained.
With regard to rejections made under 35 U.S.C. 102, Applicant argues, "Kelkar fails to anticipate or render obvious the features of Claim 1, because Kelkar fails to describe at least, 'convert the aggregated content into vectors that are labelled with policies discussed in the aggregated content and store the vectors in a vector database, retrieve a subset of vectors from the vector database based on a comparison of the policy identifier to the labelled policies of the vectors, identify a drift between a current policy implementation and the policy content based on execution of a drift detection model on the subset of vectors and the policy content, and provide a response-generating model to output responses associated with the policy content based on the identified drift and the policy content,'… Accordingly, Kelkar has multiple significant deficiencies with respect to Claim 1," (page 13 of Remarks).
Applicant’s argument is moot, as new grounds of rejection, commensurate with the amendments to the claims, are raised in view of references Voyles and Ajmera. Further details are provided below.
Claim Objections
Claims 7 and 13 are objected to because of the following informalities: The claims recite "at least one ML model" without first defining the abbreviation "ML". Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-3, 6-17 and 21-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite a mental process that can be performed in the human mind or with the aid of pen and paper. This judicial exception is not integrated into a practical application because a computer is invoked merely as a tool to execute an abstract idea. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because an abstract idea is merely applied on a generic computer without any element that would otherwise preclude performance of the abstract idea as a mental process.
Regarding claim 1, the claim recites “An apparatus comprising: a memory; and a processor coupled to the memory, the processor configured to: receive interaction content and a policy identifier of policy content from an interaction session between devices, annotate the interaction content with the policy identifier, aggregate the interaction content with previously received and annotated interaction content to generate aggregated content, convert the aggregated content into vectors that are labelled with policies discussed in the aggregated content and store the vectors in a vector database, retrieve a subset of vectors from the vector database based on a comparison of the policy identifier to the labelled policies of the vectors, identify a drift between a current policy implementation and the policy content based on execution of a drift detection model on the subset of vectors and the policy content, and provide a response-generating model to output responses associated with the policy content based on the identified drift and the policy content.”
The limitations of “receive interaction content…” “annotate the interaction content…” “aggregate the interaction content…” “convert the aggregated content…” “retrieve a subset of vectors…” “identify a drift…” and “…output responses associated with the policy content…” as drafted cover mental activities which can be performed in the mind or with the aid of pen and paper. Taken individually, or as a whole, these limitations describe acts which are equivalent to human mental work of data organization and compliance analysis.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the steps of the claimed invention can be performed mentally, and the technological elements describe only generic computer hardware as a tool for carrying out the steps. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claim 2, the claim depends from claim 1, and thus recites the limitations of claim 1, “wherein the processor is further configured to identify a topic of the interaction session and a posted interaction within the interaction session that comprises upvotes, assign a weight to the posted interaction based on the upvotes, and train the response-generating model based on the weight assigned to the posted interaction.”
Taken individually, or as a whole with claim 1, these limitations describe acts which are equivalent to human mental work of data organization and compliance analysis. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claim 3, the claim depends from claim 1, and thus recites the limitations of claim 1, “wherein the processor is further configured to identify a topic of the interaction session and a posted interaction within the interaction session that comprises downvotes, assign a weight to the posted interaction based on the downvotes, and train the response-generating model based on execution of the response-generating model on the weight assigned to the posted interaction.”
Taken individually, or as a whole with claim 1, these limitations describe acts which are equivalent to human mental work of data organization and compliance analysis. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claim 6, the claim depends from claim 1, and thus recites the limitations of claim 1, “wherein the processor is configured to determine a geographic location of a source of the interaction content, and retrieve the subset of vectors based on the geographic location of the source.”
Taken individually, or as a whole with claim 1, these limitations describe acts which are equivalent to human mental work of data organization and compliance analysis. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claim 7, the claim depends from claim 1, and thus recites the limitations of claim 1, “wherein the processor is further configured to determine a topic of the interaction session based on execution of at least one ML model on the interaction session, and train the response-generating model based on the topic of the interaction session.”
Taken individually, or as a whole with claim 1, these limitations describe acts which are equivalent to human mental work of data organization and compliance analysis. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claim 8, the claim depends from claim 1, and thus recites the limitations of claim 1, “wherein the processor is configured to add the policy identifier to a portion of the interaction session which discusses a corresponding policy, prior to training the response-generating model.”
Taken individually, or as a whole with claim 1, these limitations describe acts which are equivalent to human mental work of data organization and compliance analysis. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claims 9-11 and 14-16, method claims 9-11 and 14-16 and apparatus claims 1-3 and 6-8 are related as a method and apparatus of using the same, with each apparatus element’s function corresponding to the method step. Accordingly, claims 9-11 and 14-16 are similarly rejected under the same rationale as applied to claims 1-3 and 6-8.
Regarding claim 12, the claim depends from claim 9, and thus recites the limitations of claim 9, “wherein the interaction content comprises a plurality of posted interactions to the interaction session, and the method further comprises identifying upvotes and downvotes assigned to the plurality of posted interactions, generating a ranking of the plurality of posted interactions based on the upvotes and downvotes, assigning weights to the plurality of posted interactions based on the ranking, and training the response-generating model based on the weights assigned to the plurality of posted interactions.”
Taken individually, or as a whole with claim 9, these limitations describe acts which are equivalent to human mental work of data organization and compliance analysis. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
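For illustration only, the ranking and weighting steps recited in claim 12 can be sketched in Python as follows. Neither the claim nor this Office Action specifies a scoring rule; the net-vote score, the linear rank weights, and the names `Post` and `rank_and_weight` are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical record for one posted interaction."""
    text: str
    upvotes: int
    downvotes: int


def rank_and_weight(posts):
    """Rank posts by net votes and attach rank-based weights.

    Assumption: score = upvotes - downvotes, and weights fall off
    linearly with rank and sum to 1. Other schemes are equally possible.
    """
    ranked = sorted(posts, key=lambda p: p.upvotes - p.downvotes, reverse=True)
    n = len(ranked)
    total = n * (n + 1) / 2  # sum of 1..n, used to normalise the weights
    return [(post, (n - i) / total) for i, post in enumerate(ranked)]
```

The resulting weights could then be passed to a training routine as per-example weights; that step is outside this sketch.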
Regarding claim 13, the claim depends from claim 9, and thus recites the limitations of claim 9, “further comprising determining an accuracy of a posted interaction based on execution of at least one ML model on the policy and the posted interaction, assigning a weight to the posted interaction based on the accuracy, and training the response-generating model based on execution of the response-generating model on the weight assigned to the posted interaction.”
Taken individually, or as a whole with claim 9, these limitations describe acts which are equivalent to human mental work of data organization and compliance analysis. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claim 17, computer-readable medium claim 17 and method claim 1 are related as method and computer-readable medium for performing the same, with each computer-readable medium element’s function corresponding to the method step. Accordingly, claim 17 is similarly rejected under the same rationale applied to claim 1.
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because they are drawn to a signal per se, as recited in the "computer-readable storage medium" of the preambles. Paragraphs [0004], [0007], [0010], [0012], [0176], [0177], [0193] and [0196] of the Specification disclose a computer-readable medium, but do not limit that medium to non-transitory embodiments. Hence, one of ordinary skill in the art can interpret such a medium to include transitory signals. Accordingly, the claims are not patent eligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claims 1, 7, 9, 15, 17, 21-22 and 24-25 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 12,282,743 to Kelkar et al. (hereinafter, “Kelkar”) in view of U.S. Patent Application Publication 2024/0012841 to Voyles et al. (hereinafter, “Voyles”) and further in view of U.S. Patent Application Publication 2025/0181899 to Ajmera et al. (hereinafter, “Ajmera”).
Regarding claims 1, 9 and 17, Kelkar teaches a method, computer-readable medium and an apparatus comprising: a memory (Claim 12, "A system for automatically generating a configuration for an autonomous conversational system, the system comprising: one or more processors; one or more memory storage devices storing executable instructions thereon,"); and
a processor coupled to the memory (Claim 12, "A system for automatically generating a configuration for an autonomous conversational system, the system comprising: one or more processors;"), the processor configured to:
receive interaction content and a policy identifier of policy content from an interaction session between devices (column 3, line 35, "One key idea in the invention is that most entities have logs or records of past conversations that have already occurred between users and human agents, and an AI based system that uses a discovery based approach to inspect these logs and bootstrap itself into an ACAI system by learning to mimic what real human agents have communicated and done in the past when interacting with users," and column 5, line 14, "FIG. 6 lists a sample PII-removed, annotated conversation between a User and the Agent where the User begins the conversation by wanting to cancel their subscription to an entity's services and rejects the Agent's counter-offer for lowering the price of the subscription, prompting the Agent to finally cancel their subscription. The figure shows the turn-level auto-intent, turn-level auto-response and sentence-level auto-intent annotations predicted by the Generative DNN model at each turn of the conversation."),
annotate the interaction content with the policy identifier (column 5, line 38, "The Annotation module for Conversation Level Auto-Topics and Auto-Subtopics (206) (FIG. 2) uses a Generative DNN model (206-a) to annotate each conversation with an auto-topic and auto-subtopic. The topic is the general theme of the conversation, and the subtopic is the primary task that was accomplished or the primary outcome of the conversation."),
aggregate the interaction content with previously received and annotated interaction content to generate aggregated content (column 8, line 48, "To improve over time, the ACAI system periodically repeats the discovery process using the conversations between the user where the user requested an agent handoff and the agent has handled the user request, because these very conversations have now become historical user agent conversations."), and
provide a response-generating model to output responses associated with the policy content based on the identified drift and the policy content (column 2, line 18, "Dialog Manager is the component that keeps track of the overall progress of the conversation, since it can involve multiple back and forth turns, history of what the user has said and the [Natural Language Understanding] has understood, and what the [Natural Language Generation] has responded with, and uses the history and state of the conversation as context for enabling the Conversational AI system to taking certain actions if needed, such as retrieving information from a database or an API.").
Kelkar does not explicitly teach “convert the aggregated content into vectors that are labelled with policies discussed in the aggregated content and store the vectors in a vector database,” or “identify a drift between a current policy implementation and the policy content based on execution of a drift detection model on the subset of vectors and the policy content,” and thus, Voyles is introduced.
Voyles teaches convert the aggregated content into vectors that are labelled with policies discussed in the aggregated content and store the vectors in a vector database (paragraph [0053], "In some examples, method 100 can begin at step 102, wherein step 102 comprises receiving a dataset representing model operations executed by a trained model. The dataset may include various data received by the trained model from a user of the model, such as user input data associated with one or more user queries or user conversations with the model. The dataset may also include various data generated by the trained model during model operations. For instance, the dataset may include model output data associated with one or more outputs from the trained model to the user of the model. Additionally or alternatively, the dataset may include or one or more pieces of data generated by the model responsive to user input data, such as vector data representative of the user input data, an intent classification of the user input data, or other data associated with the user input data. In some examples, the dataset further includes one or more labels applied by the trained model or by a user of the trained model, such as one or more labels associated with a particular intent classification of the data," and paragraph [0059], "As data is received representing model operations executed by the trained model, the data may be stored in a shared memory accessible by the trained model and the drift detection system… Data stored in the shared memory may include the user input data itself (e.g., raw natural language user queries input into a chatbot during a conversation with a user), vector data representative of the user input data, a predicted intent classification associated with the user input data, model output data (e.g., a natural language output from a trained model responsive to a user query), and/or other information about the model operations of the trained model."), and
identify a drift between a current policy implementation and the policy content based on execution of a drift detection model on the subset of vectors and the policy content (paragraph [0058], "In some examples, the drift detection system receives datasets in an iterative process. The various datasets may be analyzed individually and comparatively by the drift detection system to identify differences in the data included within each dataset," and paragraph [0072], "The drift detection system may compare the result data of a first dataset with the result data of a second dataset to identify differences between the result data of the first dataset and the result data of the second dataset that may be indicative of concept drift or concept evolution.").
Kelkar and Voyles are considered analogous because they are each concerned with adaptation of machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Kelkar with the teachings of Voyles for the purpose of improving adaptation performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
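As a rough illustration of the dataset-to-dataset comparison quoted from Voyles above, the following Python sketch flags drift when the mean embedding of a current dataset moves away from that of a reference dataset. The cosine-distance statistic, the 0.1 threshold, and all function names are assumptions chosen for this sketch; they are not taken from Voyles or from the application.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def detect_drift(reference, current, threshold=0.1):
    """Return (drifted, distance) for two datasets of embedding vectors.

    Assumption: drift is declared when the distance between the two
    dataset means exceeds a hypothetical fixed threshold.
    """
    distance = cosine_distance(mean_vector(reference), mean_vector(current))
    return distance > threshold, distance
```

A production drift detector would more likely use a distributional test over the full datasets rather than a comparison of means; the sketch only shows the shape of the comparison.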
The combination of Kelkar and Voyles does not teach “retrieve a subset of vectors from the vector database based on a comparison of the policy identifier to the labelled policies of the vectors,” and thus, Ajmera is introduced.
Ajmera teaches retrieve a subset of vectors from the vector database based on a comparison of the policy identifier to the labelled policies of the vectors (paragraph [0071], "The search module 534 may be used to execute search requests on the embedding database 536 by making an API call to retrieve relevant embeddings using, for example, a similarity search or semantic similarity search that compares one or more embeddings of a prompt to embeddings in the embedding database to determine the relevant embeddings. The search module may also use tags, labels, filters, or other techniques to specify a selection of embeddings from which a relevant embedding may be retrieved. In this way, an input query may be received by the system, processed by the query processing module 540 using the embedding generation module 538 to generate an embedding for the query. The search module may identify and/or retrieve one or more embeddings from the embedding database 536.").
Kelkar, Voyles and Ajmera are considered analogous because they are each concerned with historical or aggregate training for machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the combination of Kelkar and Voyles with the vector retrieval of Ajmera for the purpose of improving retrieval efficiency. A person of ordinary skill in the art has good reason to pursue the known options within the field of the invention, here retrieving related records rather than a complete or random set. If this leads to the anticipated success, it is likely the product not of innovation but of ordinary skill and common sense.
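The label-based retrieval discussed above, in which a selection of embeddings is narrowed by tags or labels before a similarity search, can be sketched in Python with a small in-memory store. The `Record` type, the `policy_label` field, the function names, and the example policy identifier are hypothetical and stand in for whatever a real vector database would expose.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stored embedding with its policy label."""
    vector: list
    policy_label: str
    text: str


def cosine_similarity(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_by_policy(records, query_vector, policy_id, top_k=3):
    """Keep records whose label equals policy_id, then rank by similarity."""
    candidates = [r for r in records if r.policy_label == policy_id]
    candidates.sort(
        key=lambda r: cosine_similarity(r.vector, query_vector),
        reverse=True,
    )
    return candidates[:top_k]
```

For example, calling `retrieve_by_policy(records, query, "P-1")` would return only records labelled `"P-1"`, most similar first. Filtering before ranking keeps the similarity search confined to the labelled subset.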
Regarding claims 7 and 15, Kelkar further teaches an apparatus, method and storage medium wherein the processor is further configured to determine a topic of the interaction session based on execution of at least one ML model on the interaction session (column 5, line 38, "The Annotation module for Conversation Level Auto-Topics and Auto-Subtopics (206) (FIG. 2) uses a Generative DNN model (206-a) to annotate each conversation with an auto-topic and auto-subtopic."), and
train the response-generating model based on the topic of the interaction session (column 7, line 49, "In one embodiment of the present invention, the ACAI system configuration has now been created using the Configuration Generator (211) and it contains stories annotated with intents, responses, fulfillment actions and slots. The ACAI system then self-trains itself using this configuration and yields an NLU model, a DM model, an NLG model and a configured Action Server.").
Regarding claim 21, Voyles teaches an apparatus configured to identify an incorrectly performed action associated with the policy based on the drift detection model, and train the response-generating model to learn a correct policy implementation of the incorrectly performed action (paragraph [0050], "The systems and methods may be implemented in a drift detection system that receives data about the operation of a trained model, and then uses the data to determine the drift and evolution of the intent classifications associated with user inputs into the trained model over time. When concept drift or concept evolution is detected in the intent classification of user inputs by the trained model, a drift summary report can be transmitted, prompting an administrator or developer of the model with instructions to update or retrain the model to correct the detected concept drift and/or concept evolution.").
Kelkar and Voyles are considered analogous because they are each concerned with adaptation of machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Kelkar with the teachings of Voyles for the purpose of improving adaptation performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 22, Voyles further teaches an apparatus configured to generate training content that describes a correct policy implementation based on the identified drift (paragraph [0100], "If concept drift is present, the instructions may include information about the character and extent of the concept drift. For instance, the instructions may include information about the difference in maximum mean discrepancy of the dataset(s), a least-squares density difference of the dataset(s), a vector quantization of the dataset(s), a partitioning of the dataset(s) (e.g., partitioning of the dataset(s) into one or more clusters), a divergence of the dataset(s), or an uncertainty of the dataset(s)."), and
execute the response-generating model on the training content to train the response-generating model (paragraph [0113], "Additionally or alternatively, the trained model may be configured to update automatically after receiving the transmitted instructions from the drift detection system (e.g., by accessing the instructions in a shared memory after the instructions have been transmitted to the shared memory by the drift detection system). For instance, transmitting the instruction to update training of the trained model may include transmitting executable program code configured to cause the trained model to be retrained when the code is executed by one or more processors.").
Kelkar and Voyles are considered analogous because they are each concerned with adaptation of machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Kelkar with the teachings of Voyles for the purpose of improving adaptation performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 24, Voyles further teaches an apparatus configured to arrange the vectors within the vector database into clusters according to shared policy identifiers to enable retrieval of vectors having a same policy identifier (paragraph [0064], "In some examples, the data processing operations include performing a spatial clustering analysis to visualize the relationship between the user input data and intent classifications in a training dataset used to train the model. In the spatial clustering analysis, the drift detection system may analyze various information about the dataset, such as the predicted intent classifications of data in the dataset, the likely intent classification of the data in the dataset determined by the trained model, and whether the data in the dataset is likely to fit within the predefined user intent classifications of the trained model. The spatial clustering analysis generated by the drift detection system may be used to evaluate datasets from the trained model and/or to predict whether a particular dataset or group of datasets is likely to cause concept drift or concept evolution in the trained model and to determine when model retraining is needed.").
Kelkar and Voyles are considered analogous because they are each concerned with adaptation of machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Kelkar with the teachings of Voyles for the purpose of improving adaptation performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 25, Kelkar teaches an apparatus configured to automatically train the response-generating model in response to identification of the drift (column 6, line 4, "Furthermore, the Generative DNN (206-a) discovers most auto-topics and auto-subtopics from the logs and it is good enough to be deployed, however, it is possible that some topics and subtopics are missed and will result in the user requiring a hand-off to a human agent. This can be addressed in the automatic retraining stage. As new flows are discovered with new historical conversation logs, the Generative DNN (206-a) discovers more auto-topics and auto-subtopics thereby increasing the coverage with time and eventually leading to convergence.").
Claims 2-3, 10-13 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Kelkar, Voyles and Ajmera as applied to claims 1 and 9 above, further in view of WIPO Publication WO 2021/111267 to Chakraborti et al. (hereinafter, "Chakraborti").
Regarding claim 2, the combination of Kelkar, Voyles and Ajmera does not explicitly teach an apparatus to include “identify a topic of the interaction session and a posted interaction within the interaction session that comprises upvotes,” “assign a weight…” or “train the response-generating model based on the weight assigned to the posted interaction,” and thus, Chakraborti is introduced.
Chakraborti teaches an apparatus wherein the processor is further configured to identify a topic of the interaction session and a posted interaction within the interaction session that comprises upvotes (paragraph [0061], "Features may include author name, author status, upvotes, downvotes, length of response, comments to the response by other users indicating that the solution works, etc., or any combination thereof. In general, a feature contains information that may be used to assess the quality (e.g., efficiency, effectiveness, etc.) of the answer to a question and may act as a reward signal to a RL system."),
assign a weight to the posted interaction based on the upvotes (paragraph [0042], "Once the query is matched to content in the knowledge store (e.g., question answer pairs), NLP engine 110 and RL model 115 are configured to generate a policy based on this content and to rank the (state, action) pairs of the policy (e.g., based on upvotes, author name, author status, replies, etc.) present within the forum," and paragraph [0057], "During training, a query may be formulated by the RL software agent 105. The RL software system identifies a policy generated by a RL model, wherein the state corresponds or is similar to the state of the RL software agent. Once a match between the current state and policy state is performed, a correcting action may be selected, and the action performed in the sandbox. If the action resolves the query, a reward is received (e.g., successfully resolving a user computing issue such as opening a file). On the other hand, if the action fails, the RL software agent will learn that the selected action did not resolve the issue, and will not select this action for the corresponding state in the future."), and
train the response-generating model based on the weight assigned to the posted interaction (paragraph [0062], "The RL model may be trained based on these features to efficiently determine a series of actions leading to a desired goal or solution.").
Kelkar, Voyles, Ajmera and Chakraborti are considered analogous because they are each concerned with training machine models for dialog. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the combination of Kelkar, Voyles and Ajmera with the teachings of Chakraborti for the purpose of improving model performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 3, Chakraborti further teaches an apparatus wherein the processor is further configured to identify a topic of the interaction session and a posted interaction within the interaction session that comprises downvotes (paragraph [0061], "Features may include author name, author status, upvotes, downvotes, length of response, comments to the response by other users indicating that the solution works, etc., or any combination thereof. In general, a feature contains information that may be used to assess the quality (e.g., efficiency, effectiveness, etc.) of the answer to a question and may act as a reward signal to a RL system."),
assign a weight to the posted interaction based on the downvotes (paragraph [0042], "Once the query is matched to content in the knowledge store (e.g., question answer pairs), NLP engine 110 and RL model 115 are configured to generate a policy based on this content and to rank the (state, action) pairs of the policy (e.g., based on upvotes, author name, author status, replies, etc.) present within the forum," and paragraph [0057], "During training, a query may be formulated by the RL software agent 105. The RL software system identifies a policy generated by a RL model, wherein the state corresponds or is similar to the state of the RL software agent. Once a match between the current state and policy state is performed, a correcting action may be selected, and the action performed in the sandbox. If the action resolves the query, a reward is received (e.g., successfully resolving a user computing issue such as opening a file). On the other hand, if the action fails, the RL software agent will learn that the selected action did not resolve the issue, and will not select this action for the corresponding state in the future."), and
train the response-generating model based on execution of the response-generating model on the weight assigned to the posted interaction (paragraph [0022], "The RL software agent may be trained using reinforcement learning, wherein data from the knowledge database provides information pertaining to actions. The RL software agent may be tested in a sandbox environment, which is typically a non-production environment using the simulator, prior to deployment.").
Kelkar, Voyles, Ajmera and Chakraborti are considered analogous because they are each concerned with training machine models for dialog. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the combination of Kelkar, Voyles and Ajmera with the teachings of Chakraborti for the purpose of improving model performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 10, Chakraborti teaches a method comprising identifying a topic of the interaction session and a posted interaction within the interaction session that comprises upvotes (paragraph [0061], "Features may include author name, author status, upvotes, downvotes, length of response, comments to the response by other users indicating that the solution works, etc., or any combination thereof. In general, a feature contains information that may be used to assess the quality (e.g., efficiency, effectiveness, etc.) of the answer to a question and may act as a reward signal to a RL system."),
assigning a weight to the posted interaction based on the upvotes (paragraph [0042], "Once the query is matched to content in the knowledge store (e.g., question answer pairs), NLP engine 110 and RL model 115 are configured to generate a policy based on this content and to rank the (state, action) pairs of the policy (e.g., based on upvotes, author name, author status, replies, etc.) present within the forum," and paragraph [0057], "During training, a query may be formulated by the RL software agent 105. The RL software system identifies a policy generated by a RL model, wherein the state corresponds or is similar to the state of the RL software agent. Once a match between the current state and policy state is performed, a correcting action may be selected, and the action performed in the sandbox. If the action resolves the query, a reward is received (e.g., successfully resolving a user computing issue such as opening a file). On the other hand, if the action fails, the RL software agent will learn that the selected action did not resolve the issue, and will not select this action for the corresponding state in the future."), and
training the response-generating model based on execution of the response-generating model on the weight assigned to the posted interaction (paragraph [0022], "The RL software agent may be trained using reinforcement learning, wherein data from the knowledge database provides information pertaining to actions. The RL software agent may be tested in a sandbox environment, which is typically a non-production environment using the simulator, prior to deployment.").
Kelkar, Voyles, Ajmera and Chakraborti are considered analogous because they are each concerned with training machine models for dialog. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the combination of Kelkar, Voyles and Ajmera with the teachings of Chakraborti for the purpose of improving model performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 11, Chakraborti teaches a method comprising identifying a topic of the interaction session and a posted interaction within the interaction session that comprises downvotes (paragraph [0061], "Features may include author name, author status, upvotes, downvotes, length of response, comments to the response by other users indicating that the solution works, etc., or any combination thereof. In general, a feature contains information that may be used to assess the quality (e.g., efficiency, effectiveness, etc.) of the answer to a question and may act as a reward signal to a RL system."),
assigning a weight to the posted interaction based on the downvotes (paragraph [0042], "Once the query is matched to content in the knowledge store (e.g., question answer pairs), NLP engine 110 and RL model 115 are configured to generate a policy based on this content and to rank the (state, action) pairs of the policy (e.g., based on upvotes, author name, author status, replies, etc.) present within the forum," and paragraph [0057], "During training, a query may be formulated by the RL software agent 105. The RL software system identifies a policy generated by a RL model, wherein the state corresponds or is similar to the state of the RL software agent. Once a match between the current state and policy state is performed, a correcting action may be selected, and the action performed in the sandbox. If the action resolves the query, a reward is received (e.g., successfully resolving a user computing issue such as opening a file). On the other hand, if the action fails, the RL software agent will learn that the selected action did not resolve the issue, and will not select this action for the corresponding state in the future."), and
training the response-generating model based on the weight assigned to the posted interaction (paragraph [0062], "The RL model may be trained based on these features to efficiently determine a series of actions leading to a desired goal or solution.").
Kelkar, Voyles, Ajmera and Chakraborti are considered analogous because they are each concerned with training machine models for dialog. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the combination of Kelkar, Voyles and Ajmera with the teachings of Chakraborti for the purpose of improving model performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 12, Chakraborti further teaches a method wherein the interaction content comprises a plurality of posted interactions to the interaction session, and the method further comprises identifying upvotes and downvotes assigned to the plurality of posted interactions (paragraph [0023], "NLP engine 110 is able to identify the portion of the forum corresponding to a question, one or more answers to that question, along with features associated with each answer that may be provided to the RL model 115 in order for the RL model to rank the answers for a particular question. Features may include author name, author status, upvotes, downvotes, length of response, comments to the response by other users indicating that the solution works, etc., or any combination thereof."),
generating a ranking of the plurality of posted interactions with respect to each other based on the upvotes and downvotes assigned to the plurality of posted interactions (paragraph [0022], "NLP engine 110 is able to identify the portion of the forum corresponding to a question, one or more answers to that question, along with features associated with each answer that may be provided to the RL model 115 in order for the RL model to rank the answers for a particular question. Features may include author name, author status, upvotes, downvotes, length of response, comments to the response by other users indicating that the solution works, etc., or any combination thereof."),
assigning weights to the plurality of posted interactions based on the ranking (paragraph [0042], "Once the query is matched to content in the knowledge store (e.g., question answer pairs), NLP engine 110 and RL model 115 are configured to generate a policy based on this content and to rank the (state, action) pairs of the policy (e.g., based on upvotes, author name, author status, replies, etc.) present within the forum."), and
training the response-generating model based on the weights assigned to the plurality of posted interactions (paragraph [0062], "The RL model may be trained based on these features to efficiently determine a series of actions leading to a desired goal or solution.").
Kelkar, Voyles, Ajmera and Chakraborti are considered analogous because they are each concerned with training machine models for dialog. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the combination of Kelkar, Voyles and Ajmera with the teachings of Chakraborti for the purpose of improving model performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 13, Chakraborti further teaches a method comprising determining an accuracy of a posted interaction based on execution of at least one ML model on the policy and the posted interaction (paragraph [0054], "RL agent 105, which learns through reinforcement learning, may learn policies (state, action) pairs ranked by a RL system that map a (state, action) pair to a value that signifies the usefulness of performing the action in the current state," and paragraph [0055], "Agents 620 may be provided as part of a package along with a domain simulator 120, which may comprise test environment 625 and emulator 630. The emulator 630 may be deployed in the test environment 625 (e.g., a sandbox) to allow testing without being deployed in a real system. The sandbox environment allows both learning and benchmarking/performance."),
assigning a weight to the posted interaction based on the accuracy (paragraph [0054], "RL agent 105, which learns through reinforcement learning, may learn policies (state, action) pairs ranked by a RL system that map a (state, action) pair to a value that signifies the usefulness of performing the action in the current state."), and
training the ML model based on execution of the ML model on the posted interaction and the weight assigned to the posted interaction (paragraph [0055], "Agents 620 may be provided as part of a package along with a domain simulator 120, which may comprise test environment 625 and emulator 630. The emulator 630 may be deployed in the test environment 625 (e.g., a sandbox) to allow testing without being deployed in a real system. The sandbox environment allows both learning and benchmarking/performance.").
Kelkar, Voyles, Ajmera and Chakraborti are considered analogous because they are each concerned with training machine models for dialog. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the combination of Kelkar, Voyles and Ajmera with the teachings of Chakraborti for the purpose of improving model performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 23, Chakraborti further teaches an apparatus configured to assign a weight to at least one vector based on upvotes or downvotes posted to a message window in association with interaction content represented by the at least one vector (paragraph [0022], "NLP engine 110 is able to identify the portion of the forum corresponding to a question, one or more answers to that question, along with features associated with each answer that may be provided to the RL model 115 in order for the RL model to rank the answers for a particular question. Features may include author name, author status, upvotes, downvotes, length of response, comments to the response by other users indicating that the solution works, etc., or any combination thereof," and paragraph [0042], "Once the query is matched to content in the knowledge store (e.g., question answer pairs), NLP engine 110 and RL model 115 are configured to generate a policy based on this content and to rank the (state, action) pairs of the policy (e.g., based on upvotes, author name, author status, replies, etc.) present within the forum."), and
train the response-generating model based on the weight (paragraph [0062], "The RL model may be trained based on these features to efficiently determine a series of actions leading to a desired goal or solution.").
Kelkar, Voyles, Ajmera and Chakraborti are considered analogous because they are each concerned with training machine models for dialog. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the combination of Kelkar, Voyles and Ajmera with the teachings of Chakraborti for the purpose of improving model performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Kelkar, Voyles and Ajmera as applied to claims 1 and 9 above, and further in view of U.S. Patent Application Publication 2023/0252980 to Kumar (hereinafter, "Kumar").
Regarding claims 6 and 14, the combination of Kelkar, Voyles and Ajmera does not teach an apparatus or method to include “the processor is configured to determine a geographic location of a source of the interaction content,” or “retrieve the subset of vectors based on the geographic location of the source,” and thus Kumar is introduced.
Kumar teaches an apparatus and method wherein the processor is configured to determine a geographic location of a source of the interaction content (paragraph [0027], "One or more embodiments determine that a set of conversations, across different types of communication channels, correspond to a same transaction. The system extracts data from each of the set of conversations and stores the extracted data in association with the same transaction. The transaction may include a transaction name. For example, a transaction name may include a geographic location and type of the transaction,"), and
retrieve the subset of vectors based on the geographic location of the source (paragraph [0072], "The conversation segments may be mapped to memory locations, such that selection of a particular interface element of a graphical user interface (GUI) representing a particular candidate transaction retrieves content associated with the conversation segments mapped to a particular candidate transaction.").
Kelkar, Voyles, Ajmera and Kumar are considered analogous because they are each concerned with training machine models for dialog. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the combination of Kelkar, Voyles and Ajmera with the teachings of Kumar for the purpose of improving model performance for certain users. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Kelkar, Voyles and Ajmera in view of China Invention Application 116894498 to Ruan et al. (hereinafter, "Ruan").
Regarding claims 8 and 16, the combination of Kelkar, Voyles and Ajmera does not explicitly teach an apparatus or method “wherein the processor is configured to add the policy identifier to a portion of the interaction session which discusses a corresponding policy, prior to training the response-generating model,” and thus, Ruan is introduced.
Ruan teaches an apparatus and method wherein the processor is configured to add the policy identifier to a portion of the interaction session which discusses a corresponding policy, prior to training the response-generating model (page 2, "According to one aspect of the embodiment of the present application, a policy identification method based on query dialog is provided, and the method comprises: inputting the inquiry conversation of the target object into the network model to obtain the strategy type of the inquiry conversation and the current conversation turn of the inquiry conversation; if the policy type output by the network model is the pocket bottom policy, fine tuning the pocket bottom policy output by the network model to obtain the target policy type to which the inquiry conversation belongs, wherein the network model is the network model obtained by training the network model training method.").
Kelkar, Voyles, Ajmera and Ruan are considered analogous because they are each concerned with training machine models for dialog. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the combination of Kelkar, Voyles and Ajmera with the teachings of Ruan for the purpose of improving model performance. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
U.S. Patent 10,949,454 to Conley et al.
U.S. Patent 11,151,328 to Prasad et al.
U.S. Patent 11,580,968 to Gupta et al.
U.S. Patent 11,741,956 to Chaudhary et al.
U.S. Patent 11,960,514 to Taylert et al.
U.S. Patent Application Publication 2018/0165582 to Cha.
U.S. Patent Application Publication 2019/0180196 to Terry et al.
U.S. Patent Application Publication 2021/0150385 to Mallette et al.
U.S. Patent Application Publication 2022/0327120 to Yanez Lucero et al.
U.S. Patent Application Publication 2023/0188480 to Menon et al.
U.S. Patent Application Publication 2023/0289167 to Gangireddy.
U.S. Patent Application Publication 2023/0316138 to Nguyen et al.
U.S. Patent Application Publication 2025/0028879 to Barr et al.
U.S. Patent Application Publication 2025/0104700 to Henault-Ethier et al.
U.S. Patent Application Publication 2025/0210034 to Karla et al.
U.S. Patent Application Publication 2025/0219968 to Gong et al.
Canadian Patent Application CA 3085315 to Mazza et al.
European Patent Specification EP 3596727 to Thomson et al.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN T SMITH whose telephone number is (571)272-6643. The examiner can normally be reached Monday - Friday 8:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, PIERRE-LOUIS DESIR, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SEAN THOMAS SMITH/ Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/ Supervisory Patent Examiner, Art Unit 2659
Prosecution Timeline

Mar 28, 2024: Application Filed
Dec 12, 2025: Non-Final Rejection mailed (§101, §103)
Jan 21, 2026: Examiner Interview Summary
Jan 21, 2026: Applicant Interview (Telephonic)
Mar 12, 2026: Response Filed
Apr 22, 2026: Final Rejection mailed (§101, §103; current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/338,033 (Patent 12626056): GENERATING NATURAL LANGUAGE MODEL INSIGHTS FOR DATA CHARTS USING LIGHT LANGUAGE MODELS DISTILLED FROM LARGE LANGUAGE MODELS. 2y 10m to grant; granted May 12, 2026.
18/393,807 (Patent 12602540): LEVERAGING A LARGE LANGUAGE MODEL ENCODER TO EVALUATE PREDICTIVE MODELS. 2y 3m to grant; granted Apr 14, 2026.
18/092,987 (Patent 12530534): SYSTEM AND METHOD FOR GENERATING STRUCTURED SEMANTIC ANNOTATIONS FROM UNSTRUCTURED DOCUMENT. 3y 0m to grant; granted Jan 20, 2026.
Study what changed to get past this examiner. Based on 3 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 86%
With Interview (+25.0%): 99%
Median Time to Grant: 2y 9m (~6m remaining)
PTA Risk: Moderate
Based on 7 resolved cases by this examiner. Grant probability derived from career allowance rate.