DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy (INDIA 202341076442) has been filed in parent Application No. 18/759,399, filed on 6/28/2024.
Receipt is also acknowledged of certified copies of papers required by 37 CFR 1.55 (INDIA 202341054463 and INDIA 202441092693).
Claim Objections
Claim 20 is objected to because of the following informalities:
Claim 20 recites, “generating, by the AI model, based on a result” which should be “generating, by the AI model, content based on a result”, as Claim 20 goes on to recite on the last line, “transmitting the content to the user device”, wherein “content” was not previously recited.
Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-2, 5-17 and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Vasylyev (US PGPUB 2024/0412720).
With regard to Claim 1, Vasylyev teaches a system comprising:
a non-transitory memory ([0011] “a non-volatile system memory unit, where the processor is configured to execute instructions to...”); and
one or more hardware processors coupled with the non-transitory memory and configured to execute instructions from the non-transitory memory to cause the system to ([0011] “One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions... an artificial intelligence (AI) assistant system, which may include... a processor.”):
receive a query from a user device ([0209] “As the user provides their next request through speech or text input, assistant system 2 processes the request using the same multi-modal input techniques and transformer-based language model used for prediction and pre-generation.” [0645] “assistant system 2 could interact with... a voice-controlled smart home device to adjust the thermostat or turn on the lights.”);
provide the query to a multi-tiered cache system associated with an artificial intelligence (AI) model, wherein the multi-tiered cache system comprises a plurality of cache modules that store pre-generated responses from the AI model (See Fig. 1 showing “Audio Memory Unit 114”, “Contextual Memory Unit 116”, “System Memory Unit 118” and “RAM 124,” i.e. the “cache modules”. [0209] “The pre-generated responses are then stored in a high-speed cache memory (e.g., RAM 124), along with their corresponding predicted request embeddings and conversation context embeddings.” [0209] “The system generates an embedding of the actual user request and compares it with the predicted request embeddings stored in the cache memory using a similarity metric.” [0564] “the memory structure of assistant system 2 may be composed of multiple layers each designed to store and process information at different time frames, contextual background or levels of abstraction. This multilayer memory structure may be configured as a hierarchical memory structure designed to efficiently manage a vast amount of data, segregating it based on relevance and complexity. The hierarchy may be based on a division between short-term and long-term memory, for example.”);
determine that the query does not result in a first cache hit associated with a first cache module of the plurality of cache modules ([0210] “In cases where the actual user request does not match any of the predicted requests with high similarity, assistant system 2 may default to its standard response generation process,” wherein “RAM 124” shown in Fig. 1 is the “first cache module”.);
generate embeddings based on the query ([0206] “As the conversation progresses, assistant system 2 employs its transformer-based language model to generate real-time embeddings of the conversation context. These embeddings may be high-dimensional vector representations that capture the semantic meaning and relationships between the words and phrases in the context.” [0209] “The system generates an embedding of the actual user request.”);
determine whether the query results in a second cache hit associated with a second cache module of the plurality of cache modules based on the embeddings ([0207] “The system feeds the predicted request embeddings, along with the conversation context embeddings, into a decoder network that generates a natural language response.” [0258] “embeddings can be considered as numerical representations of concepts that assistant system 2 can use to understand relationships between these concepts, as well as personalize, recommend, and search content in contextual memory unit 116,” wherein “Contextual Memory Unit 116” is the “second cache module”. [0563] “When the user command refers to a topic that was discussed 15 minutes ago, assistant system 2 may first search for the relevant context in the 5-minute shorter-term context memory and, if no such context is found there, it may further search for the relevant context in the 30-minute longer-term context memory.”);
obtain a response from one of the second cache module or the AI model ([0210] “The system generates a new response based on the actual request and the conversation context using the transformer-based language model and decoder network, without relying on the pre-generated responses in the cache memory.”); and
update the first cache module based on the response ([0304] “RAM 124 may be further used to store the active context (like the transcript of the current conversation) ... and temporary data for processing inputs and generating outputs.” [0328] “the context memory used by the LLM may be configured to maintain a context representation that captures the relevant information from previous user inputs and system responses, the context representation may be updated dynamically.” [0565] “it is preferred that it is continuously updated with new information as the conversation is ongoing or the environment is changing.”).
With regard to Claim 2, Vasylyev teaches the system of claim 1, wherein executing the instructions further causes the system to:
transmit the response to the user device ([0506] “the swarm of AI assistants may be distributed across multiple physical devices, creating a decentralized network of intelligent agents that collaborate to support the elderly user.” [0508] “The physical devices in the distributed swarm may further include smart home devices or a network of connected devices, such as smart speakers, cameras, thermostats, and appliances, distributed throughout the user's home... The physical devices in the swarm may further include tablets, smartphones, and other similar personal devices used by the user and their family members, e.g., hosting the social interaction agent and providing interfaces for communication.” [0633] “The generated response is then processed by the decoding and detokenization modules within the NLP unit 212 to convert the numerical representation back into human-readable text. The speech synthesis module receives this text and generates the corresponding speech output, which is played through speaker 142.”).
With regard to Claim 5, Vasylyev teaches the system of claim 1, wherein the second cache module comprises a plurality of records, wherein each record of the plurality of records is associated with an expiration time ([0342] “system memory unit 118... configured to store... profiles... contact lists, multimedia files... and any other data that enhances the personalization and functionality of the AI assistant system,” wherein these types of data are equivalent to “records”. [0375] “Information management system 30 includes a data indexing and retrieval module 402 that is designed to organize the stored information in system memory unit 118.” [0376] “Information management system 30 further incorporates a data retention policy engine 404. This engine allows system administrators or users to define rules and policies governing the retention and expiration of stored information. These policies can be based on various factors that include but are not limited to the age of the information... For instance, an administrator might define a policy that automatically deletes any user-provided information that hasn't been accessed or referenced by the system for a period of 90 days.”).
With regard to Claim 6, Vasylyev teaches the system of claim 5, wherein the response is obtained from the second cache module based on the query resulting in the second cache hit associated with the second cache module ([0282] “the generated conversational response may be stored in contextual memory unit 116.” [0563] “When the user command refers to a topic that was discussed 15 minutes ago, assistant system 2 may first search for the relevant context in the 5-minute shorter-term context memory and, if no such context is found there, it may further search for the relevant context in the 30-minute longer-term context memory.”), and
wherein executing the instructions further causes the system to:
update a first expiration time associated with a first record in the second cache module that stores the response based on the second cache hit ([0328] “the context memory used by the LLM may be configured to maintain a context representation that captures the relevant information from previous user inputs and system responses, the context representation may be updated dynamically.” [0376] “an administrator might define a policy that automatically deletes any user-provided information that hasn't been accessed or referenced by the system for a period of 90 days,” wherein the “first expiration time” is necessarily updated when the associated data is accessed, i.e. the 90-day expiration time in Vasylyev is effectively reset.).
With regard to Claim 7, Vasylyev teaches the system of claim 5, wherein executing the instructions further causes the system to:
determine that a second record in the second cache module has expired based on a second expiration time associated with the second record; and remove the second record from the second cache module ([0377] “Data retention policy engine 404 may continuously scan the information stored in system memory unit 118 and apply the defined retention policies. When a piece of information meets the criteria for deletion, the engine securely erases it from the memory unit and updates the associated indexes in the data indexing and retrieval module.”).
With regard to Claim 8, this claim is equivalent in scope to Claim 1 rejected above, merely having a different independent claim type, and as such Claim 8 is rejected under the same grounds and for the same reasons as discussed above with regard to Claim 1.
With further regard to Claim 8, the claim recites additional elements not specifically addressed in the rejection of Claim 1, i.e. wherein the received “query” recited in Claim 1 is instead a received “utterance” in Claim 8. The Vasylyev reference also anticipates these additional elements of Claim 8, for example, Vasylyev teaches:
receiving an utterance from a user device ([0041] “According to one embodiment, the AI assistant system may be configured to learn and adapt to user-specific audio commands, which may not necessarily be legible words but rather unique spoken utterances serving as audio-based shortcuts for conveying instructions. The voice recognition unit of the AI assistant system may be specifically trained on these user-specific utterances, allowing it to accurately interpret and respond to the user's personalized audio commands.” [0130] “Accordingly, when a user communicates with assistant system 2 or when users communicate with one another using vocal utterances, these sounds can be captured by microphone 102 or similar input device.” [0194] “When the user issues a query or command, assistant system 2 retrieves the relevant audio data and contextual information... The system then uses this information to generate an appropriate response.”).
With regard to Claim 9, Vasylyev teaches the method of claim 8, further comprising:
determining a second cache miss for the second cache module based on the embeddings ([0207] “The system feeds the predicted request embeddings, along with the conversation context embeddings, into a decoder network that generates a natural language response.” [0258] “embeddings can be considered as numerical representations of concepts that assistant system 2 can use to understand relationships between these concepts, as well as personalize, recommend, and search content in contextual memory unit 116,” wherein “Contextual Memory Unit 116” is the “second cache module”. [0563] “When the user command refers to a topic that was discussed 15 minutes ago, assistant system 2 may first search for the relevant context in the 5-minute shorter-term context memory and, if no such context is found there, it may further search for the relevant context in the 30-minute longer-term context memory.”);
in response to determining the second cache miss, generating a prompt for the AI model, the prompt including the utterance; and providing the prompt to the AI model, wherein the AI model is configured to generate the response based on the prompt ([0582] “If the requested information is still not found, assistant system 2 may generate a synthesized response based on the interpretative inferencing capabilities of its integrated LLM.” [0210] “The system generates a new response based on the actual request and the conversation context using the transformer-based language model and decoder network, without relying on the pre-generated responses in the cache memory.”).
With regard to Claim 10, Vasylyev teaches the method of claim 8, further comprising:
determining that the utterance comprises data of a particular type ([0620] “When a user initiates a conversation with the AI assistant, assistant system 2 retrieves their privacy settings from the configuration database. These settings are used to configure the data processing and storage components dynamically. For example, if the user has opted for anonymization, the NLP pipeline will strip out any PII from the user's utterances before passing them to the dialog management system,” wherein “type” of data is Personally Identifiable Information (PII) data.); and
modifying the utterance based on removing the data from the utterance, wherein the response is generated based on the modified utterance ([0619] “The privacy settings may further include anonymization and pseudonymization where users can choose to have their conversation data anonymized or pseudonymized before being stored or processed. Anonymization can involve irreversibly removing all personally identifiable information (PII) from the data, while pseudonymization replaces PII with a pseudonym that can be reversed only with additional information stored separately. The user's preference may be applied by the data pre-processing pipeline before the conversation data is ingested into the system's storage.” [0636] “Furthermore, the ASR, NLP, and speech synthesis modules are designed to operate with minimal latency, leveraging hardware acceleration and parallel processing capabilities of processor 122. This enables the AI assistant system to rapidly convert speech to text, analyze the input in context, generate a response, and synthesize the corresponding speech output.”).
With regard to Claim 11, Vasylyev teaches the method of claim 10, further comprising:
modifying the response based on incorporating the data into the response; and providing the modified response to the user device ([0619] “The privacy settings may further include anonymization and pseudonymization where users can choose to have their conversation data anonymized or pseudonymized before being stored or processed. Anonymization can involve irreversibly removing all personally identifiable information (PII) from the data, while pseudonymization replaces PII with a pseudonym that can be reversed only with additional information stored separately. The user's preference may be applied by the data pre-processing pipeline before the conversation data is ingested into the system's storage,” wherein each response is necessarily modified since each response is specifically generated based on the information in the query/utterance, i.e. “incorporating the data into the response”. [0636] “Furthermore, the ASR, NLP, and speech synthesis modules are designed to operate with minimal latency, leveraging hardware acceleration and parallel processing capabilities of processor 122. This enables the AI assistant system to rapidly convert speech to text, analyze the input in context, generate a response, and synthesize the corresponding speech output.”).
With regard to Claim 12, Vasylyev teaches the method of claim 8, wherein the updating the first cache module comprises:
storing the response in the first cache module ([0282] “The tokenized representation of the generated response is then stored in contextual memory unit 116, along with the tokenized representation of the voice input and any other relevant contextual information. The tokenized representations in contextual memory unit 116 can be used by the transformer-based language model for subsequent processing, such as generating follow-up responses or answering user queries that rely on the conversation history.”).
With regard to Claims 13-14, these claims are equivalent in scope to Claims 6-7 rejected above, merely having a different independent claim type, and as such Claims 13-14 are respectively rejected under the same grounds and for the same reasons as discussed above with regard to Claims 6-7.
With regard to Claims 15-16, these claims are equivalent in scope to Claims 1-2 rejected above, merely having a different independent claim type, and as such Claims 15-16 are respectively rejected under the same grounds and for the same reasons as discussed above with regard to Claims 1-2.
With further regard to Claim 15, the claim recites additional elements not specifically addressed in the rejection of Claim 1, i.e. wherein the received “query” recited in Claim 1 is instead a received “utterance” in Claim 15. The Vasylyev reference also anticipates these additional elements of Claim 15, for example, Vasylyev teaches:
receiving an utterance from a user device ([0041] “According to one embodiment, the AI assistant system may be configured to learn and adapt to user-specific audio commands, which may not necessarily be legible words but rather unique spoken utterances serving as audio-based shortcuts for conveying instructions. The voice recognition unit of the AI assistant system may be specifically trained on these user-specific utterances, allowing it to accurately interpret and respond to the user's personalized audio commands.” [0130] “Accordingly, when a user communicates with assistant system 2 or when users communicate with one another using vocal utterances, these sounds can be captured by microphone 102 or similar input device.” [0194] “When the user issues a query or command, assistant system 2 retrieves the relevant audio data and contextual information... The system then uses this information to generate an appropriate response.”).
With regard to Claim 17, Vasylyev teaches the non-transitory machine-readable medium of claim 16, wherein the updating the first cache module comprises:
storing the response in the first cache module ([0304] “RAM 124 may be further used to store the active context (like the transcript of the current conversation) ... and temporary data for processing inputs and generating outputs.” [0328] “the context memory used by the LLM may be configured to maintain a context representation that captures the relevant information from previous user inputs and system responses, the context representation may be updated dynamically,” wherein the “system responses” comprise “the response”. [0565] “it is preferred that it is continuously updated with new information as the conversation is ongoing or the environment is changing.”).
With regard to Claim 20, Vasylyev teaches the non-transitory machine-readable medium of claim 15, wherein the operations further comprise:
selecting, from a plurality of computer modules, a particular computer module for performing a transaction based on the utterance ([0051] “Assistant is designed to seamlessly integrate with a wide range of third-party services and APIs, enabling it to extend its capabilities and provide a more comprehensive and efficient user experience. This integration allows the Assistant to access and leverage external data sources, functionalities, and services to better understand and fulfill user requests,” wherein the “APIs” are the “computer modules”. [0407] “Assistant system 2 continuously monitors the conversation and user activity to identify potential food ordering intent. This can also be triggered by explicit user commands, such as ‘Assistant, I want to order food’.”);
generating, by the AI model, instructions that cause the particular computer module to perform the transaction for a user of the user device ([0410] “In an exemplary scenario, let's consider that the user wants to order food from a popular restaurant chain called ‘FoodOrderingHub’ while driving.” [0412] “It then sends a request to FoodOrderingHub's API endpoint (e.g., https://api.foodorderinghub.com/restaurants) with the user's location to retrieve a list of nearby FoodOrderingHub restaurants... The API request is made using libraries like Python's requests or JavaScript's axios.”);
generating, by the AI model, based on a result from the particular computer module performing the transaction ([0419] “FoodOrderingHub's API processes the payment using the user's selected payment method (e.g., saved credit card or mobile wallet) and returns an order confirmation response, including an order ID and estimated delivery time.”); and
transmitting the content to the user device ([0420] “Assistant system 2 communicates the order confirmation to the user through voice output, saying something like, ‘Your FoodOrderingHub order has been successfully placed. The estimated delivery time is 30 minutes, and your order ID is #ABC123.’”).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3-4 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Vasylyev as applied to Claims 1 and 15 above, and further in view of Larchev et al. (US PGPUB 2020/0019632).
With regard to Claim 3, Vasylyev further teaches wherein executing the instructions further causes the system to:
retrieve a set of documents from a data storage based on the query ([0050] “Useful examples of such data include but are not limited to... various documents stored locally or on a cloud.” [0146] “when given a prompt or a question, step 868 may include an information retrieval system to search for relevant documents or passages from the external knowledge source. This retrieval process may be configured to find the most pertinent information that can help in generating a contextually appropriate response. In this step, the AI Assistant may first analyze the command and the stored conversation context to identify key information needs. Based on this analysis, the AI Assistant may formulate a search query to retrieve relevant information from an external knowledge base. Useful examples of the knowledge base include but are not limited to structured databases, web resources, and document collections.”).
With further regard to claim 3, Vasylyev does not teach the generating of a hash based on documents as described in claim 3. Larchev teaches wherein executing the instructions further causes the system to:
generate a hash value based on the set of documents, wherein determining that the query does not result in the first cache hit associated with the first cache module is further based on the hash value ([0003] “An example method of recommending similar searches in an electronic document search engine may include receiving a current search query from a user, the current search query intended for the search engine, converting the current search query into one or more word vectors, converting the one or more word vectors into a document vector with a machine learning model, the machine learning model trained on a set of pairs, each pair comprising (i) a respective prior search query, each comprising one or more word vectors and (ii) a composite vector describing a respective document that is searchable by the search engine and is responsive to the respective prior search query, applying a locality-sensitive hashing algorithm to the document vector to determine one or more of the composite vectors that are closest to the document vector.” [0010] “executing a search with the search engine on the user-selected prior search query, and returning a set of documents to the user that are responsive to the user-selected prior search query.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have modified the system as disclosed by Vasylyev with the generating of a hash based on documents as taught by Larchev in order “to improve the quality of the search query, and thereby improve the quality of the search results” (Larchev [0027]).
With regard to Claim 4, Vasylyev in view of Larchev teaches all the limitations of Claim 3 as described above. Larchev further teaches wherein determining whether the query results in the second cache hit associated with the second cache module is further based on the hash value ([0011] “recommending to the user, responsive to the current search query, the prior search queries to which the closest composite vectors are responsive.” [0059] “The method 40 may further include a step 52 that includes executing the user's selected search query with a search engine. For example, the system or computing device providing the search interface (e.g., the server 18) may transmit the user's selected search query to a search engine (e.g., the search engine 12), may cause the search engine to perform the search according to the search query, and may receive results back from the search engine.”).
With regard to Claims 18-19, these claims are equivalent in scope to Claims 3-4 rejected above, merely having a different independent claim type, and as such Claims 18-19 are respectively rejected under the same grounds and for the same reasons as discussed above with regard to Claims 3-4.
Conclusion
The prior art made of record and not relied upon, which is considered pertinent to applicant's disclosure, is as follows:
Taylert et al. (US Patent 11,960,514) discloses a method and system for providing an interactive conversation tool that enables a service provider and its customers to generate more accurate and relevant answers to outside inquiries during such conversations, including the use of semantic search and generative Artificial Intelligence (AI).
Adenekan (“Optimizing LLM Latency and Throughput for Interactive Web Interfaces,” February 2023) discusses strategies and techniques for optimizing LLM performance, including distributed computing, model pruning, caching, query preprocessing, hybrid architectures, and pre-generated responses.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS J SIMONETTI whose telephone number is (571)270-7702. The examiner can normally be reached Monday-Thursday 10AM-6PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Arpan Savla can be reached at (571) 272-1077. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NICHOLAS J SIMONETTI/Primary Examiner, Art Unit 2137 January 9, 2026