Detailed Action
This Office Action is in response to the remarks entered on 03/03/2026. Claims 1, 3-7, 12, and 14-18 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Amended claims were received on 03/03/2026. Claim Objections have been withdrawn.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Gao et al. (Gao et al, “S3: Social-network Simulation System with Large Language Model-Empowered Agents”, 19 Oct 2023, hereinafter ‘Gao’) in view of Zheng et al. (Zheng et al, “STEVE-EYE: EQUIPPING LLM-BASED EMBODIED AGENTS WITH VISUAL PERCEPTION IN OPEN WORLDS”, 7 Dec 2023, hereinafter ‘Zheng’) in view of Agarwal et al. (Agarwal et al, “AI-powered decision making for the bank of the future”, 2021, hereinafter ‘Agarwal’) and further in view of Farooq et al. (US 20210182739 A1 hereinafter ‘Farooq’).
Regarding claim 1,
Gao teaches:
A method for training and operating an artificial intelligence digital twin system with large language model (LLM) agents, comprising: ([Gao, page 1, Abstract, line 3-10] discloses generating human-like LLM models that emulate a genuine human within the social network)
collecting a plurality of … ([Gao, page 5, 3.2 Social Network Environment, line 9-22] discloses collecting real data with users, social connections (i.e., behavioral data), and textual posts (i.e., text communication; these can also be interpreted as the computational device interactions, as a device is required to post texts in social media) in social media and capturing user demographic features from textual information using an LLM, with a particular emphasis on predicting Age, Gender, and Occupation to generate a more authentic representation of the user’s actions and interactions)
, wherein the digital representation emulates the user’s communication patterns and decision-making processes based on the collected; ([Gao, page 1, Abstract, line 3-10] discloses generating human-like LLM models that emulate a genuine human (a digital twin of the user) within the social network. [Gao, page 5, 3.3 Individual-level Simulation, line 3-5] discloses that individual simulations are performed to simulate emotion, attitude, and interaction behavior. In the paragraphs that follow, [Gao, page 5, 3.3.1 Emotion Simulation] discloses simulating emotion using an LLM, [Gao, page 6, 3.3.2 Attitude Simulation, line 13-15] discloses inputting the user profiles and user history and using a Markov process to simulate the attitude, [Gao, page 7, line 1-5] discloses inputting the user’s profile to the models and generating content using the models, and [Gao, page 7, 3.3.4 Interactive Behavior Simulation] discloses having LLMs that correspond to emotion simulation, attitude simulation, content-generation behavior simulation, and interactive behavior simulation. The LLMs and Markov process are interpreted as the ensemble model. [Gao, page 2, 3rd para, line 6-14] discloses that the textual outputs can influence the environment and interact (communicate) with other agents and users)
Gao does not specifically disclose:
collecting a plurality of multimodal data streams … comprising financial transactions
processing the plurality of multimodal data streams through specialized data processing pipelines;
training a plurality of specialized artificial intelligence models specialized to perform different specific tasks using the processed multimodal data streams;
combining the plurality of specialized artificial intelligence models into an ensemble model architecture to create a digital representation of the user, … based on the collected multimodal data streams and autonomously executes actions through authenticated interfaces with external systems including communication platforms, banking systems, and property management tools on behalf of the user
operating the ensemble model in a tethered mode; and
measuring one or more performance metrics of the ensemble model operating in the tethered mode;
transitioning operating the ensemble model to an autonomous untethered mode responsive to achieving predetermined performance thresholds in the tethered mode.
Zheng teaches:
collecting a plurality of multimodal data streams ([Zheng, page 5, Fig. 4] and [Zheng, page 5, 3.2 Modal Architecture, line 14-20] disclose receiving multimodal data streams including visual data and text data)
processing the plurality of multimodal data streams through specialized data processing pipelines; ([Zheng, page 5, Fig. 4], [Zheng, page 5, 3.2 Modal Architecture, line 1-8] and [Zheng, 4.1 Experimental Setup, page 6, line 1-6] disclose utilizing a visual tokenizer for visual data and a text tokenizer for text data)
training a plurality of specialized artificial intelligence models specialized to perform different specific tasks using the processed multimodal data streams; ([Zheng, page 5, Fig. 4], [Zheng, page 5, 3.2 Modal Architecture, line 1-20] and [Zheng, 4.1 Experimental Setup, page 6, line 1-6] disclose utilizing a visual tokenizer (specialized AI model) for visual data and a text tokenizer for text data. The visual tokenizer is VQ-GAN, which is a trainable artificial intelligence model. The LLM backbone is built upon a decoder-only architecture with causal transformers and performs specialized tasks including generating responses based on input tokens. The text tokenizer-LLM backbone is the specialized AI model specialized to perform the specific task. [Zheng, page 6, line 1-12] discloses that the training phase involves fine-tuning token embeddings)
combining the plurality of specialized artificial intelligence models into an ensemble model architecture … based on the collected multimodal data streams and autonomously executes actions through authenticated interfaces with external systems including communication platforms ([Zheng, page 5, Fig. 4], [Zheng, page 5, 3.2 Modal Architecture, line 1-8] and [Zheng, 4.1 Experimental Setup, page 6, line 1-6] disclose utilizing a visual tokenizer for visual data and a text tokenizer for text data (specialized AI models). The visual tokenizer is VQ-GAN, which is a trainable artificial intelligence model)
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Gao and Zheng, to use the method of Zheng of receiving multimodal input data and utilizing a plurality of specialized AI models to process the multimodal input data to implement the AI system of Gao. The suggestion and/or motivation for doing so is to improve the efficiency of the LLM system [Zheng, page 3, 2.2 Large Multimodal Models (LMMs), line 6-11].
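For illustration only, the modality-specific processing mapped above (a visual tokenizer for visual data and a text tokenizer for text data) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Zheng: the names `tokenize_text`, `tokenize_image`, `PIPELINES`, and `process_stream` are hypothetical, and the coarse pixel bucketing merely stands in for a learned visual tokenizer such as VQ-GAN.

```python
from typing import Callable, Dict, List


def tokenize_text(payload: str) -> List[str]:
    # Hypothetical text pipeline: lowercase and split on whitespace.
    return payload.lower().split()


def tokenize_image(payload: List[List[int]]) -> List[str]:
    # Hypothetical visual pipeline: bucket each 0-255 pixel value into one of
    # four codes. This is an assumed stand-in for a learned visual tokenizer.
    return [f"<img_{value // 64}>" for row in payload for value in row]


# One specialized pipeline per modality (names are assumptions).
PIPELINES: Dict[str, Callable] = {
    "text": tokenize_text,
    "image": tokenize_image,
}


def process_stream(modality: str, payload) -> List[str]:
    # Route each data stream to the pipeline registered for its modality,
    # so that every modality ends up in a common token format.
    if modality not in PIPELINES:
        raise ValueError(f"no pipeline registered for modality: {modality}")
    return PIPELINES[modality](payload)
```

Under these assumptions, both modalities are reduced to token sequences that a single language model backbone could consume.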
Gao in view of Zheng does not specifically disclose:
collecting a plurality of multimodal data streams … comprising financial transactions
autonomously executes actions through authenticated interfaces with external systems including communication platforms, banking systems, and property management tools on behalf of the user
operating the ensemble model in a tethered mode; and
measuring one or more performance metrics of the ensemble model operating in the tethered mode;
transitioning operating the ensemble model to an autonomous untethered mode responsive to achieving predetermined performance thresholds in the tethered mode.
Agarwal teaches:
collecting a plurality of data streams … comprising financial transactions ([Agarwal, page 4, left col, line 4 – right col, line 46], [Agarwal, page 9, left col, Augmented AA/ML models with edge capabilities, line 1 – right col, line 25], [Agarwal, page 10, left col, line 1-2] and [Agarwal, page 11, Exhibit 6] disclose a decisioning layer of the AI system that executes autonomously to assist with and/or make decisions in real time. The paragraphs and the figure disclose autonomously making decisions (executing actions) on behalf of a human (e.g., a bank agent or employee) based on the data collected from the raw data lake, which includes internal structured data such as payment behavior (financial transactions), product holding, and clickstream data)
autonomously executes actions through authenticated interfaces with external systems including ([Agarwal, page 4, left col, line 4 – right col, line 46], [Agarwal, page 9, left col, Augmented AA/ML models with edge capabilities, line 1 – right col, line 25], [Agarwal, page 10, left col, line 1-2] and [Agarwal, page 11, Exhibit 6] disclose a decisioning layer of the AI system that executes autonomously to assist with and/or make decisions in real time. The paragraphs and the figure disclose autonomously making decisions (executing actions) on behalf of a human (e.g., a bank agent or employee) based on the data collected from the raw data lake, which includes internal structured data such as payment behavior (financial transactions), product holding, and clickstream data. The banking system itself can be interpreted as the property management tools, as the broadest reasonable interpretation of ‘property’ includes money, loans, bonds, etc. [Agarwal, page 2, right col, line 16-26] The engagement layer is the authenticated interface with external systems such as mobile app, contact center, website … and Martech stack, and the Data mgmt. platform (DMP) can also be interpreted as the interface)
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Gao, Zheng and Agarwal, to use the method of Agarwal of receiving multimodal input data comprising financial transactions and autonomously executing actions through interfaces with external systems including banking systems to implement the AI system of Gao. The suggestion and/or motivation for doing so is to extend the usability of the LLM system to financial management.
However, Gao, Zheng, and Agarwal do not specifically disclose:
operating the ensemble model in a tethered mode; and
measuring one or more performance metrics of the ensemble model operating in the tethered mode;
transitioning operating the ensemble model to an autonomous untethered mode responsive to achieving predetermined performance thresholds in the tethered mode.
Farooq teaches:
operating the ensemble model in a tethered mode; and ([Farooq, 0024 and 0027] discloses running the generated Random Forest Classifier 104 on the observation data 122. The testing phase is interpreted as the tethered mode, as the testing is done before the propagation (untethered mode) and performed on the collected observation data)
measuring one or more performance metrics of the ensemble model operating in the tethered mode; ([Farooq, 0024 and 0027] The training phase operates on all observations from various time periods to determine the Out-of-Bag scores and validation score (e.g., a percentage of correct answers) during the testing phase based on whether the Random Forest Classifier (i.e., ensemble model) 104 correctly identifies conditions of the electronic devices based on the observations from the second subset of the time periods. The testing phase is interpreted as the tethered mode, as the testing is done before the propagation (untethered mode) and performed on the collected observation data)
transitioning operating the ensemble model to an autonomous untethered mode responsive to achieving predetermined performance thresholds in the tethered mode. ([Farooq, 0028] If the validation score and the OOB score determined based on the method disclosed in [Farooq, 0027] and [Farooq, 0024] meet their respective thresholds, the Random Forest Classifier is propagated (i.e., untethered mode). [Farooq, 0029] discloses details about how the server propagates the evaluated Random Forest Classifier 104 to vehicle 116)
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Gao, Zheng, Agarwal and Farooq, to use the method of Farooq of deploying a trained machine learning model that has achieved predetermined performance thresholds to implement the AI system of Gao. The suggestion and/or motivation for doing so is to prevent the waste of computing power on small external computers (the computers where the agents are deployed) and to improve the efficiency of the machine learning system by training the agent only on a large computing system specialized for machine learning model training.
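For illustration only, the threshold-gated transition relied upon above (propagating the classifier only when both the validation score and the OOB score meet their thresholds) can be sketched as follows. This is a minimal sketch under stated assumptions: the names `Thresholds` and `select_mode` and the numeric threshold values are hypothetical and do not appear in Farooq or in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Assumed, illustrative thresholds; the cited reference does not fix values here.
    min_validation_score: float
    min_oob_score: float


def select_mode(validation_score: float, oob_score: float, thresholds: Thresholds) -> str:
    # The model remains in the tethered (testing) mode unless both measured
    # metrics meet their respective predetermined thresholds.
    meets_validation = validation_score >= thresholds.min_validation_score
    meets_oob = oob_score >= thresholds.min_oob_score
    return "untethered" if (meets_validation and meets_oob) else "tethered"
```

Under these assumptions, a model scoring 0.95 on validation and 0.80 on OOB against thresholds of 0.90 and 0.85 would remain tethered, because only one of the two metrics meets its threshold.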
Regarding claim 12, Gao in view of Zheng in view of Agarwal and further in view of Farooq teaches:
A system for training and operating an artificial intelligence digital twin system with LLM agents, comprising: a processor; a network communication device positioned in communication with the processor and operable to communicate across a computerized network; and a non-transitory computer-readable storage medium having store thereon software that, when executed by the processor, is operable to: ([Gao, page 1, 1 Introduction, line 1-3] discloses that the simulation is performed using a computer which contains a CPU and memories. [Gao, page 5, 3.2 Social Network Environment, line 9-22] discloses collecting real data with users, social connections (i.e., behavioral data), and textual posts (i.e., text communication; these can also be interpreted as the computational device interactions, as a device is required to post texts in social media) in social media and capturing user demographic features from textual information using an LLM, with a particular emphasis on predicting Age, Gender, and Occupation to generate a more authentic representation of the user’s actions and interactions. [Gao, page 11, Prediction Result Evaluation, line 1-7] discloses collecting English blogs during the training and testing process)
Claim 12 is a system claim having limitations similar to those of claim 1 above. Therefore, claim 12 is rejected under the same rationale as claim 1.
Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Gao in view of Zheng in view of Agarwal in view of Farooq in view of Malik et al. (US 20190043500 A1, hereinafter ‘Malik’) and further in view of Chien et al. (US 20020066021 A1 hereinafter ‘Chien’).
Regarding claim 3, Gao in view of Zheng teaches:
wherein processing the multimodal data streams comprises: converting the data into a standardized data format; ([Zheng, page 5, Fig. 4], [Zheng, page 5, 3.2 Modal Architecture, line 1-8] and [Zheng, 4.1 Experimental Setup, page 6, line 1-6] discloses utilizing a visual tokenizer for visual data and a text tokenizer for text data (specialized AI models). The token is the standardized data format)
However, Gao in view of Zheng in view of Agarwal and further in view of Farooq does not specifically disclose:
generating processed data by cleaning and normalizing the standardized data;
extracting one or more features for model training from the processed data through one or more of: text extraction, audio transcription, video processing, and image processing;
encrypting the processed data; and
storing the encrypted data in a distributed storage system.
Malik teaches:
generating processed data by cleaning and normalizing the standardized data; ([Malik, 0028] discloses preprocessing the audio data through a) data cleaning, b) data integration, c) data transformation (normalization), and d) data reduction)
extracting one or more features for model training from the processed data through one or more of: text extraction, audio transcription, video processing, and image processing; ([Malik, 0028] discloses preprocessing the audio data through a) data cleaning, b) data integration, c) data transformation (normalization), and d) data reduction. The data reduction is interpreted as the text extraction process)
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Gao, Zheng, Agarwal, Farooq, and Malik, to use the data pre-processing technique of Malik to implement the AI system of Gao. The suggestion and/or motivation for doing so is to improve the efficiency of the AI system by transforming the input data into a data type that is easier for the ML model to process.
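For illustration only, the cleaning and normalization steps mapped above can be sketched as follows. This is a minimal sketch under stated assumptions: dropping missing samples and applying min-max scaling are assumed choices made for the example, and the names `clean` and `min_max_normalize` are hypothetical rather than drawn from Malik.

```python
from typing import List, Optional


def clean(samples: List[Optional[float]]) -> List[float]:
    # Assumed cleaning rule: discard missing (None) samples.
    return [s for s in samples if s is not None]


def min_max_normalize(samples: List[float]) -> List[float]:
    # Assumed normalization: rescale values linearly into the range [0, 1].
    if not samples:
        return []
    lowest, highest = min(samples), max(samples)
    if highest == lowest:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in samples]
    return [(s - lowest) / (highest - lowest) for s in samples]
```

Under these assumptions, `min_max_normalize(clean([2.0, None, 4.0, 6.0]))` yields `[0.0, 0.5, 1.0]`.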
Gao in view of Zheng in view of Agarwal in view of Farooq and further in view of Malik does not specifically disclose:
encrypting the processed data; and
storing the encrypted data in a distributed storage system.
Chien teaches:
encrypting the processed data; and ([Chien, 0243] The preprocessing module preprocesses the file by modifying the files and names of directories and encrypts the file)
storing the encrypted data in a distributed storage system. ([Chien, 0243] The preprocessing module preprocesses the file by modifying the files and names of directories and encrypts the file. The encrypted files are stored in an import table)
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Gao, Zheng, Agarwal, Farooq, Malik, and Chien, to use the encryption method of Chien to implement the AI system of Gao. The suggestion and/or motivation for doing so is to improve the security of the AI system.
Claim 14 is a system claim having limitations similar to those of claim 3 above. Therefore, claim 14 is rejected under the same rationale as claim 3.
Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Gao in view of Zheng in view of Agarwal in view of Farooq and further in view of Khanzada et al. (US 20220037022 A1, hereinafter ‘Khanzada’).
Regarding claim 4, Gao teaches:
wherein the plurality of specialized models comprises at least one of: a language model trained on at least one of written communications and verbal interactions; ([Gao, page 5, 3.2 Social Network Environment, line 9-22] discloses collecting real data with users, social connections (i.e., behavioral data), and textual posts (i.e., text communication; these can also be interpreted as the computational device interactions, as a device is required to post texts in social media) in social media and capturing user demographic features from textual information using an LLM, with a particular emphasis on predicting Age, Gender, and Occupation to generate a more authentic representation of the user’s actions and interactions. [Gao, page 11, Prediction Result Evaluation, line 1-7] discloses collecting English blogs during the training and testing process)
Gao in view of Zheng in view of Agarwal in view of Farooq does not specifically disclose an audio model trained on vocal characteristics; a video model trained on at least one of facial mannerisms and facial expressions; and an image model trained on at least one of visual recognition and scene understanding.
Khanzada teaches:
an audio model trained on vocal characteristics; ([Khanzada, 0019] discloses utilizing a plurality of modality classifiers that perform cough classification, deep breathing analysis, temporal data analysis, facial video, fingertip video, and biometric image classification. [Khanzada, 0038] discloses that the classifiers may be trained using machine learning techniques. [Khanzada, 0042 and 0044] collectively disclose training the model using audio data)
a video model trained on at least one of facial mannerisms and facial expressions; and ([Khanzada, 0019] discloses utilizing a plurality of modality classifiers that perform cough classification, deep breathing analysis, temporal data analysis, facial video, fingertip video, and biometric image classification. [Khanzada, 0038] discloses that the classifiers may be trained using machine learning techniques. [Khanzada, 0026 and 0072] collectively disclose using facial video and/or image data from a user’s smartphone)
an image model trained on at least one of visual recognition and scene understanding. ([Khanzada, 0019] discloses utilizing a plurality of modality classifiers that perform cough classification, deep breathing analysis, temporal data analysis, facial video, fingertip video, and biometric image classification. [Khanzada, 0038] discloses that the classifiers may be trained using machine learning techniques. [Khanzada, 0026 and 0072] collectively disclose using facial video and/or image data from a user’s smartphone)
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Gao, Zheng, Agarwal, Farooq, and Khanzada, to use the method of Khanzada of using a plurality of models specialized to perform different specific tasks to implement the AI system of Gao. The suggestion and/or motivation for doing so is to improve the performance of AI agent generation systems by utilizing machine learning models that are each specialized to perform a specific task.
Claim 15 is a system claim having limitations similar to those of claim 4 above. Therefore, claim 15 is rejected under the same rationale as claim 4.
Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Gao in view of Zheng in view of Agarwal in view of Farooq and further in view of Pfister et al. (US 20230154614 A1 hereinafter ‘Pfister’).
Regarding claim 5, Gao in view of Zheng in view of Agarwal and further in view of Farooq teaches:
wherein training the plurality of specialized artificial intelligence models comprises: ([Farooq, 0021] Each of the decision trees 104a-104n is trained on a different portion of the observation data 122, which makes each decision tree specialized for that portion of the data. The plurality of decision tree classifiers 104a-104n are trained and make determinations based on different inputs, which indicates that each classifier is different and specialized (i.e., custom training orchestration) for the input observation. Each model in 104a-104n before the training and testing phase is a base model)
implementing custom training orchestration for each specialized artificial intelligence model of the plurality of specialized artificial intelligence models; ([Farooq, 0021-0022] A plurality of decision tree classifiers 104a-104n are trained and make determinations based on different inputs, which indicates that each classifier is different and specialized (i.e., custom training orchestration) for the input observation. [Farooq, 0026] The decision tree classifiers are combined using majority voting)
validating the performance of each specialized artificial intelligence model against one or more quality thresholds; and ([Farooq, 0024 and 0027] The training phase operates on all observations from various time periods to determine the Out-of-Bag scores and validation score (e.g., a percentage of correct answers) during the testing phase based on whether the Random Forest Classifier (i.e., ensemble model) 104 correctly identifies conditions of the electronic devices based on the observations from the second subset of the time periods)
implementing continuous learning capabilities based on user interactions with tools and in a real-world environment. ([Farooq, 0041] The training system 106 determines whether to retrain the Random Forest Classifier 104 based on the state data and new unseen data from the vehicles 116. The retraining process based on new unseen data (i.e., real world environment data) and new state data (i.e., user interaction data) is interpreted as ‘continuous learning capabilities’)
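For illustration only, the majority voting by which the individual decision tree classifiers are combined into a single ensemble prediction can be sketched as follows. This is a minimal sketch under stated assumptions: the name `majority_vote` is hypothetical, each classifier is modeled as a plain callable, and ties are resolved in favor of the label that was voted first, which is an assumed tie-breaking rule not taken from Farooq.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# A classifier is modeled as a callable from an observation to a label (assumption).
Classifier = Callable[[Tuple[float, ...]], str]


def majority_vote(classifiers: Sequence[Classifier], observation: Tuple[float, ...]) -> str:
    # Collect one vote per classifier and return the most frequent label.
    if not classifiers:
        raise ValueError("at least one classifier is required")
    votes = Counter(clf(observation) for clf in classifiers)
    return votes.most_common(1)[0][0]
```

Under these assumptions, three classifiers voting "fault", "ok", and "fault" on the same observation produce the ensemble output "fault".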
Gao in view of Zheng in view of Agarwal and further in view of Farooq does not specifically disclose:
selecting one or more base models for each modality of data comprised by the plurality of multimodal data streams;
Pfister teaches:
selecting one or more base models for each modality of data comprised by the plurality of multimodal data streams; ([Pfister, 0065 and 0069] collectively discloses generating base learners (i.e., base models) and creating an ensemble using the base learners. As mentioned with reference to FIG. 2, preferably, at least one of the base learner models BLS 1 to BLS k may be selected based on the at least one property associated with the medical data of the test instance. The property includes a modality of the feature)
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Gao, Zheng, Agarwal, Farooq, and Pfister, to use the method of Pfister of selecting base models for each modality of data to implement the AI system of Gao. The suggestion and/or motivation for doing so is to improve the performance of each AI agent generated by the AI system by creating custom models based on base models specialized for data of different modalities.
Claim 16 is a system claim having limitations similar to those of claim 5 above. Therefore, claim 16 is rejected under the same rationale as claim 5.
Claims 6 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Gao in view of Zheng in view of Agarwal in view of Farooq and further in view of Eker (US 20130262523 A1, hereinafter ‘Eker’).
Regarding claim 6, Gao in view of Zheng teaches:
analyzing incoming requests through a multi-stage evaluation process; ([Zheng, page 5, Fig. 4], [Zheng, page 5, 3.2 Modal Architecture, line 1-8] and [Zheng, 4.1 Experimental Setup, page 6, line 1-6] disclose utilizing a visual tokenizer for visual data and a text tokenizer for text data, and further analyzing the resulting tokens using the LLM backbone)
Gao in view of Zheng in view of Agarwal does not specifically disclose:
classifying actions as one of low-risk or high-risk based on predetermined criteria;
approving low-risk actions automatically;
routing high-risk actions for user approval;
wherein operating the ensemble model in the tethered mode comprises:
logging all model actions and model action outcomes; and
updating a behavior of the ensemble model based on the model action outcomes.
Farooq teaches:
wherein operating the ensemble model in the tethered mode comprises: ([Farooq, 0024 and 0027] discloses running the generated Random Forest Classifier 104 on the observation data 122. The testing phase is interpreted as the tethered mode, as the testing is done before the propagation (i.e., transitioning to untethered mode) and performed on the collected observation data)
logging all model actions and model action outcomes; and ([Farooq, 0037 and Fig. 1] The model prediction actions generated by the Random Forest Classifier 104 and state data including sensor data are stored and sent 118 to the server 102. [Farooq, 0036] further discloses adjusting the prediction model when a prediction does not align with the state data, which is interpreted as recording the model action outcome)
updating a behavior of the ensemble model based on the model action outcomes. ([Farooq, 0037] The model prediction actions generated by the Random Forest Classifier 104 and state data including sensor data are stored and sent 118 to the server 102. [Farooq, 0036] further discloses adjusting the prediction model when a prediction does not align with the state data, which is interpreted as updating a behavior of the model based on the model action outcome)
Gao in view of Zheng in view of Agarwal in view of Farooq does not specifically disclose:
classifying actions as one of low-risk or high-risk based on predetermined criteria;
approving low-risk actions automatically;
routing high-risk actions for user approval;
Eker teaches:
classifying actions as one of low-risk or high-risk based on predetermined criteria; ([Eker, 0049 and Claim 40] The actions are classified based on the threshold level. Users can be queried as to whether to perform operations when the operation exceeds a threshold, e.g., is assigned a risk level of medium or high. For example, responsive to determining that the risk level of the change is above a threshold risk level, the data processing system can generate a prompt inquiring whether to apply the change)
approving low-risk actions automatically; ([Eker, 0049 and Claim 40] The actions are classified based on the threshold level. Users can be queried as to whether to perform operations when the operation exceeds a threshold, e.g., is assigned a risk level of medium or high, which indicates that the action with a risk level of low will not be queried)
routing high-risk actions for user approval; ([Eker, 0049 and Claim 40] The actions are classified based on the threshold level. Users can be queried as to whether to perform operations when the operation exceeds a threshold, e.g., is assigned a risk level of medium or high. For example, responsive to determining that the risk level of the change is above a threshold risk level, the data processing system can generate a prompt inquiring whether to apply the change)
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Gao, Zheng, Agarwal, Farooq, and Eker, to use the method of Eker of routing high-risk actions for user approval to implement the AI system of Gao. The suggestion and/or motivation for doing so is to enhance the security of each AI agent by showing high-risk actions that may contain sensitive information to the user before generating output. This allows the user to prevent leakage of sensitive information.
Claim 17 is a system claim having limitations similar to those of claim 6 above. Therefore, claim 17 is rejected under the same rationale as claim 6.
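For illustration only, the threshold-based routing relied upon above (querying the user when an operation's risk level is medium or high, and not querying for low-risk operations) can be sketched as follows. This is a minimal sketch under stated assumptions: the names `Risk` and `route_action`, the three-level scale, and the default threshold of medium are hypothetical choices for the example and are not quoted from Eker.

```python
from enum import IntEnum


class Risk(IntEnum):
    # Assumed three-level risk scale for the example.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def route_action(risk: Risk, threshold: Risk = Risk.MEDIUM) -> str:
    # Actions at or above the threshold are routed to the user for approval;
    # actions below it are approved automatically without a prompt.
    return "ask_user" if risk >= threshold else "auto_approve"
```

Under these assumptions, a low-risk action is approved automatically, while medium-risk and high-risk actions produce a prompt to the user.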
Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Gao in view of Zheng in view of Agarwal in view of Farooq in view of Chen et al. (US 20180285343 A1 hereinafter ‘Chen’) and further in view of Johnson et al. (US 20190220777 A1 hereinafter ‘Johnson’).
Regarding claim 7, Gao teaches:
generating a digital twin of the user; ([Gao, page 1, Abstract, line 3-10] discloses generating human-like LLM models that emulates a genuine human (a digital twin of the user) within the social network)
operating the digital twin to interact with the ensemble model operating … ([Gao, page 1, Abstract, line 3-10] discloses generating human-like LLM models that emulate a genuine human (a digital twin of the user) within the social network. [Gao, page 5, 3.3 Individual-level Simulation, line 3-5] discloses that individual simulations are performed to simulate emotion, attitude, and interaction behavior. In the paragraphs that follow, [Gao, page 5, 3.3.1 Emotion Simulation] discloses simulating emotion using an LLM, [Gao, page 6, 3.3.2 Attitude Simulation, line 13-15] discloses inputting the user profiles and user history and using a Markov process to simulate the attitude, [Gao, page 7, line 1-5] discloses inputting the user’s profile to the models and generating content using the models, and [Gao, page 7, 3.3.4 Interactive Behavior Simulation] discloses having LLMs that correspond to emotion simulation, attitude simulation, content-generation behavior simulation, and interactive behavior simulation. The LLMs and Markov process are interpreted as the ensemble model)
Gao in view of Zheng in view of Agarwal does not specifically disclose:
wherein transitioning to the autonomous untethered mode comprises:
operating
verifying that the ensemble model operating in the tethered mode meets one or more performance thresholds responsive to at least two queries having different data modalities;
altering a classification of actions for low-risk actions and high-risk actions to result in a greater proportion of automatically approved low-risk actions over a transition period;
identify one or more user approval patterns for high-risk actions by analyzing a plurality of user decisions;
developing one or more risk assessment criteria based on the one or more user approval patterns;
confirming consistent alignment with at least one of one or more user preferences or one or more decision patterns;
implementing one or more additional safety guardrails for autonomous operation prior to enabling autonomous operation;
maintaining comprehensive action logging and monitoring capabilities; and
enabling autonomous operation upon meeting predetermined performance thresholds during the transition period.
Farooq teaches:
wherein transitioning to the autonomous untethered mode comprises: ([Farooq, 0028] discloses that if the validation score and the OOB score, determined based on the methods disclosed in [Farooq, 0027] and [Farooq, 0024], meet their respective thresholds, the Random Forest Classifier is propagated (i.e., untethered mode). [Farooq, 0029] discloses details about how the server propagates the evaluated Random Forest Classifier 104 to vehicle 116)
operating ([Farooq, 0024 and 0027] The training phase operates on all observations from various time periods to determine the Out-of-Bag scores and the validation score (e.g., a percentage of correct answers) during the testing phase (i.e., operation))
verifying that the ensemble model operating in the tethered mode ([Farooq, 0024 and 0027] The training phase operates on all observations from various time periods to determine the Out-of-Bag scores and the validation score (e.g., a percentage of correct answers) during the testing phase, based on whether the Random Forest Classifier (i.e., ensemble model) 104 correctly identifies conditions of the electronic devices from the observations in the second subset of the time periods. The testing phase is interpreted as the tethered mode, because the testing is done before the propagation (untethered mode) and is performed on the collected observation data)
maintaining comprehensive action logging and monitoring capabilities; and ([Farooq, 0037 and Fig. 1] The model prediction actions generated by the Random Forest Classifier 104 and state data including sensor data are stored and sent 118 to the server 102. [Farooq, 0036] further discloses adjusting the prediction model when a prediction does not align with the state data, which is interpreted as recording the model action outcome)
enabling autonomous operation upon meeting predetermined performance thresholds during the transition period. ([Farooq, 0036] further discloses adjusting the prediction model when a prediction does not align with the state data, which is interpreted as recording the model action outcome and adjusting the model so that it can meet the predetermined performance threshold after the transition to the untethered mode. [Farooq, 0028] discloses that if the validation score and the OOB score, determined based on the methods disclosed in [Farooq, 0027] and [Farooq, 0024], meet their respective thresholds, the Random Forest Classifier is propagated (i.e., untethered mode))
Gao in view of Zheng in view of Agarwal and further in view of Farooq does not specifically disclose:
verifying that the ensemble model operating in the tethered mode meets one or more performance thresholds responsive to at least two queries having different data modalities;
altering a classification of actions for low-risk actions and high-risk actions to result in a greater proportion of automatically approved low-risk actions over a transition period;
identify one or more user approval patterns for high-risk actions by analyzing a plurality of user decisions;
developing one or more risk assessment criteria based on the one or more user approval patterns;
confirming consistent alignment with at least one of one or more user preferences or one or more decision patterns;
implementing one or more additional safety guardrails for autonomous operation prior to enabling autonomous operation;
Chen teaches:
verifying that the ensemble model two queries having different data modalities; ([Chen, 0040 and 0058 and Fig. 4A] collectively disclose that the ensemble classifier determines whether the provider is safe or not safe based on input data and a threshold. Based on the information from map data store 205, the ensemble classifier 450 determines the risk level of the action. According to Fig. 4A, Feature Vectors A-D and Metadata are input to the Ensemble Classifier 450 to determine the score; these inputs are interpreted as ‘at least two queries having different data modalities’. [Chen, 0029] ‘Metadata describes context of feedback not necessarily provided by text, but also includes historical actions of a user or provider’)
altering a classification of actions for low-risk actions and high-risk actions to result in a greater proportion of automatically approved low-risk actions over a transition period; ([Chen, 0041 and 0048] collectively disclose training the ensemble classifier to adjust the weighting or other relationship between each type of safety sub-score to generate a sufficient overall safety score, which changes the classification of actions)
confirming consistent alignment with at least one of one or more user preferences ([Chen, 0058] discloses confirming by the ensemble classifier 450 whether the duration of the trip is greater than the average of the reference duration, which is consistent with the sentiment “driving too slow” from the sample textual feedback)
implementing one or more additional safety guardrails for autonomous operation prior to enabling autonomous operation; ([Chen, 0029] The feedback engine 220 receives metadata associated with feedback and describes certain categories of safety risk associated with the incidents. According to [Chen, 0030], the metadata is described by the feedback engine 220 prior to the analysis of textual feedback received from users in [Chen, 0031]. The step ‘The feedback engine 220 analyzes textual feedback received from users’ is performed automatically and after the description process disclosed in [Chen, 0029-0030])
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention and having the teachings of Gao, Zheng, Agarwal, Farooq, and Chen, to use the method of confirming consistent alignment with at least one of one or more user preferences of Chen to implement the AI system of Gao. The suggestion and/or motivation for doing so is to increase the accuracy of each AI agent by checking the output of the AI agents in real time, thereby adjusting the model in real time.
Chen does not specifically disclose:
identify one or more user approval patterns for high-risk actions by analyzing a plurality of user decisions;
developing one or more risk assessment criteria based on the one or more user approval patterns;
Johnson teaches:
altering a classification of actions for low-risk actions and high-risk actions to result in a greater proportion of automatically approved low-risk actions over a transition period; ([Johnson, 0020] and [Johnson, claim 4] collectively disclose that the threshold value is determined by comparing the real-time customer communication to a set of communications over a predetermined period of time. The threshold may be based on various factors and considerations. This indicates that the classification is altered by adjusting the threshold in real time)
identify one or more user approval patterns for high-risk actions by analyzing a plurality of user decisions; ([Johnson, 0020] discloses identifying sentiment/tone using actual client emails, messages, texts, writing samples, excerpts, publications, articles, documents and/or other training data. The training process is interpreted as the analysis process. The paragraph further discloses developing a custom sentiment model (i.e., risk assessment criteria) using internal professional language based on neutral, happy/pleased and angry/disappointed. The sentiment is interpreted as the user approval pattern (i.e., decision pattern))
developing one or more risk assessment criteria based on the one or more user approval patterns; ([Johnson, 0020] discloses developing a custom sentiment model (i.e., risk assessment criteria) using internal professional language based on neutral, happy/pleased and angry/disappointed. The sentiment is interpreted as the user approval pattern (i.e., decision pattern))
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention and having the teachings of Gao, Zheng, Agarwal, Farooq, Chen, and Johnson, to use the method of developing assessment criteria based on user sentiment patterns of Johnson to implement the AI system of Gao. The suggestion and/or motivation for doing so is to increase the efficiency of each AI agent by increasing the percentage of tasks that are automatically approved, thereby reducing processing time.
Claim 18 is a system claim having limitations similar to those of claim 7 above. Therefore, claim 18 is rejected under the same rationale as claim 7.
Response to Arguments
Claim Objections
Amended claims were received on 03/03/2026. Claim Objections have been withdrawn.
Response to Arguments under 35 U.S.C. 103 Rejection
Applicant’s arguments with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Li et al, “Large Language Model-Empowered Agents for Simulating Macroeconomic Activities”, Oct 2023 (This prior art discloses processing multi-modal input data and utilizing autonomous agents to simulate financial activities)
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached Monday – Friday 7:30AM – 4:30PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JUN KWON/Examiner, Art Unit 2127
/JEREMY L STANLEY/Examiner, Art Unit 2127