DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-3, 5-10, 12-17, 19 and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
The Supreme Court has long held that “[l]aws of nature, natural phenomena, and abstract ideas are not patentable.” Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347, 2354 (2014) (quoting Assoc. for Molecular Pathology v. Myriad Genetics, Inc., 133 S. Ct. 2107, 2116 (2013) (internal quotation marks omitted)). The “abstract ideas” category embodies the longstanding rule that an idea, by itself, is not patentable. Alice Corp., 134 S. Ct. at 2355 (quoting Gottschalk v. Benson, 409 U.S. 63, 67 (1972)).
In Alice, the Supreme Court set forth an analytical “framework for distinguishing patents that claim laws of nature, natural phenomena, and abstract ideas [or mental processes] from those that claim patent-eligible applications of those concepts.” Id. at 2355 (citing Mayo Collaborative Servs. v. Prometheus Labs., Inc., 132 S. Ct. 1289, 1296–97 (2012)). The first step in the analysis is to “determine whether the claims at issue are directed to one of those patent-ineligible concepts.” Id. If the claims are directed to a patent-ineligible concept, the second step in the analysis is to consider the elements of the claims “individually and ‘as an ordered combination’” to determine whether there are additional elements that “‘transform the nature of the claim’ into a patent-eligible application.” Id. (quoting Mayo, 132 S. Ct. at 1298, 1297). In other words, the second step is to “search for an ‘inventive concept’—i.e., an element or combination of elements that is ‘sufficient to ensure that the patent in practice amounts to significantly more than a patent upon the [ineligible concept] itself.’” Id. (brackets in original) (quoting Mayo, 132 S. Ct. at 1294). The prohibition against patenting an abstract idea “‘cannot be circumvented by attempting to limit the use of the formula to a particular technological environment’ or adding ‘insignificant post-solution activity.’” Bilski v. Kappos, 561 U.S. 593, 610–11 (2010) (citation omitted).
Step 1: This part of the eligibility analysis evaluates whether the claim falls within any statutory category. See MPEP 2106.03. Independent Claim 1 recites a system, including a processor and a memory, that generates a plurality of abstract tuples based on a set of instructions, classifies the abstract tuples, forms a set of executable interactions including the executable interaction abstract tuple, detects an object location for at least one object that is included in the abstract tuple, and generates an executable plan for an agent based on the set of executable interactions and the object location. A system (apparatus) is a statutory category of invention. Independent Claim 8 recites a method with steps similar to those of Claim 1, and thus is a process (a series of steps or acts). A process is a statutory category of invention. Independent Claim 15 recites a non-transitory computer readable storage medium, and thus is also statutory. Dependent claims 2-7, 9-14 and 16-20 depend from claims 1, 8 and 15, respectively, and therefore recite their respective statutory classes.
Step 2A, Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim. In applying the framework set out in Alice, the Examiner found that Applicant’s claims 1, 8 and 15 are directed to the patent-ineligible abstract concept of generating an executable plan for responding to a user’s request. The steps of Applicant’s claims 1, 8 and 15 constitute an abstract concept that falls under the judicial exception of steps that can be performed in the human mind or by a human with pen and paper. Specifically, the claims recite the step of “generating a plurality of abstract tuples based on a set of instructions, wherein an abstract tuple of the plurality of abstract tuples includes at least one action and at least one object.” The claim recites a set of instructions, which may be from a user input, which may be received in any way, such as by speech or text. The generated abstract tuples are no more than packages of data that are collected and organized in a specific way, based on relatedness to each other, with each tuple having its own unique grouping identifier, and wherein the abstractness comes from the items comprised in the tuple not being fixed, but being subject to change in value, type, size, etc. Thus, the generating of the abstract tuples is directed to no more than a method of organizing data. Additionally, the identification of at least one action and at least one object, which are comprised in the abstract tuple, is a process that can be performed in the human mind by analyzing the received set of instructions. Therefore, this step is directed to a way of organizing data, which falls under a process performed in the human mind.
Furthermore, the step of “classifying each abstract tuple of the plurality of abstract tuples as a navigation abstract tuple, an executable interaction abstract tuple, or a complex interaction abstract tuple based on the at least one action, wherein the executable interaction abstract tuple includes a single executable interaction and the complex interaction abstract tuple includes multiple executable interactions” recites steps that are directed to processes performed in the human mind. This step recites a further method of organizing data by classifying each abstract tuple into one of three classification categories. This method of organizing data is also a process that can be performed in the human mind. Further, the claim recites “discarding the abstract tuples classified as navigation abstract tuples”. Under the broadest reasonable interpretation, “discarding” a piece of data may simply involve classifying the data as unusable, or simply ignoring it altogether. Therefore, the above step is also directed to a process that can be performed in the human mind. Further, the step of “forming a set of executable interactions including the executable interaction abstract tuple” is recited. This step simply involves analyzing the data and further grouping the data that is deemed to be an “executable interaction abstract tuple.” Thus, this step can also be performed in the human mind. Further, the claim recites the step of “detecting an object location for the at least one object based on a knowledge base.” The claim places no limits on how the object location is obtained; therefore, the step may involve a human looking at an object and noting the location of the object in space. Thus, this step also falls under processes performed in the human mind.
Further, the claim recites “decoding the complex interaction abstract tuples into executable interaction abstract tuples.” The recited step does not place any limits on how the decoding is performed, therefore, under broadest reasonable interpretation, the step may simply involve decoding any coded data in the human mind. Further, the step of “adding the decoded executable interaction abstract tuples to the set of executable interactions” is directed to further organizing the decoded data. And finally, the step of “generating an executable plan for an agent based on the set of executable interactions and the object location” recites a step that can be performed in the human mind because a human is able to take all the extracted and organized data and formulate a plan with that data. Therefore, the claimed steps are directed to processes that can be performed in the human mind.
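For purposes of illustration only, the sequence of steps characterized above (generating tuples of actions and objects, classifying them, discarding navigation tuples, decoding complex tuples into single executable interactions, and assembling a plan with object locations) can be sketched as simple data organization. Every name, classification rule, and data value below is a hypothetical assumption introduced for this sketch; none is drawn from the claims or from the cited prior art:

```python
# Illustrative sketch only: hypothetical names, rules, and data.
# An "abstract tuple" is modeled as (actions, obj): a list of at least
# one action paired with an object parsed from a set of instructions.

def classify(tup):
    """Classify a tuple as navigation, executable, or complex."""
    actions, _obj = tup
    if actions[0] in {"go", "navigate"}:  # assumed navigation verbs
        return "navigation"
    return "executable" if len(actions) == 1 else "complex"

def generate_plan(tuples, locations):
    """Discard navigation tuples, decode complex tuples into single
    executable interactions, and pair each with an object location."""
    executables = []
    for tup in tuples:
        kind = classify(tup)
        if kind == "navigation":
            continue  # discard navigation abstract tuples
        if kind == "complex":
            # decode a multi-action tuple into single-action tuples
            executables.extend(([action], tup[1]) for action in tup[0])
        else:
            executables.append(tup)
    # the "executable plan": each interaction with its object location
    return [(actions[0], obj, locations.get(obj))
            for actions, obj in executables]

tuples = [(["go"], "kitchen"),         # navigation -> discarded
          (["pick up"], "cup"),        # single executable interaction
          (["wash", "dry"], "plate")]  # complex -> two interactions
plan = generate_plan(tuples, {"cup": "table", "plate": "sink"})
# plan == [("pick up", "cup", "table"),
#          ("wash", "plate", "sink"),
#          ("dry", "plate", "sink")]
```

Nothing in this sketch requires more than sorting and regrouping data, which is consistent with the characterization of the claimed steps as processes that can be performed in the human mind.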
Step 2A, Prong Two: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. This evaluation is performed by (1) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (2) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. See MPEP 2106.04(d).
As discussed above, the claims recite “generating an executable plan…”. The Examiner has found, however, that the generating step provides no further detail and is recited at such a high level of generality that this limitation is merely a post-solution step. Therefore, this step is insignificant extra-solution activity and does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Independent Claim 1 recites “a processor; and a memory storing instructions that when executed by the processor cause the processor to:” as additional elements beyond the judicial exception. However, these additional elements do not amount to significantly more than the abstract idea because they constitute a generic computer environment. Alice, 134 S. Ct. at 2357. The claims lack meaningful limitations that go beyond generally linking the use of an abstract idea to a particular technological environment. Therefore, the steps are all abstract and the claim as a whole is abstract. “[S]imply appending generic computer functionality to lend speed or efficiency to the performance of an otherwise abstract concept does not meaningfully limit claim scope for purposes of patent eligibility.” CLS Bank, 2013 U.S. App. LEXIS 9493, at *29 (citing Bancorp, 687 F.3d at 1278, and Dealertrack, Inc. v. Huber, 674 F.3d 1315, 1333-34 (Fed. Cir. 2012) (finding that the claimed computer-aided clearinghouse process is a patent-ineligible abstract idea)); SiRF Tech., Inc. v. Int'l Trade Comm'n, 601 F.3d 1319, 1333 (Fed. Cir. 2010) (“In order for the addition of a machine to impose a meaningful limit on the scope of a claim, it must play a significant part in permitting the claimed method to be performed, rather than function solely as an obvious mechanism for permitting a solution to be achieved more quickly, i.e., through the utilization of a computer for performing calculations.”).
Additionally, dependent claims 2, 3, 5-7, 9, 10, 12-14, 16, 17, 19 and 20 do not provide any additional elements that integrate the judicial exception into a practical application. The claims simply further describe how the data is manipulated to detect object locations and to identify the set of instructions. The broadest reasonable interpretation provides that using a semantic map based on a depth prediction model to identify objects may simply be a human using a written map of objects and object names to aid in identifying the objects.
Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. See MPEP 2106.05.
At step 2A, prong two, the additional elements of generating the executable plan and the “processor” and “memory” were found to be insignificant extra-solution activity and a generic computer environment. At Step 2B, the re-evaluation of the insignificant extra-solution activity consideration takes into account whether or not the extra-solution activity is well understood, routine, and conventional in the field. See MPEP 2106.05(g). Here, the step of generating an executable plan is mere data organization that is recited at a high level of generality. Therefore, this limitation remains insignificant extra-solution activity even upon reconsideration and does not amount to significantly more. Even when considered in combination, these additional elements represent mere instructions to apply an exception and insignificant extra-solution activity, and therefore do not provide an inventive concept. Additionally, dependent claims 2, 3, 5-7, 9, 10, 12-14, 16, 17, 19 and 20 do not add an inventive concept.
In conclusion, the Examiner notes that none of the recited steps in Applicant's claims 1-3, 5-10, 12-17, 19 and 20 refer to a specific machine by reciting structural limitations of any apparatus, or to any specific operations that would cause a machine to be the mechanism that performs these steps. Although the claims may be processed by a computing system having a processor, the computing system is merely a general purpose computing system. Therefore, claims 1-3, 5-10, 12-17, 19 and 20 are all abstract.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen (US PG Pub 20220129556) in view of Ng-Thow-Hing (US PG Pub 20140365228; hereinafter “Ng”).
As per claims 1, 8 and 15, Chen discloses:
A system, method and non-transitory computer readable storage medium storing instructions that when executed by a computer, which includes a processor (Chen; Fig. 44, item 4402; p. 0417 - a processor 4402); and a memory storing instructions that when executed by the processor (Chen; Fig 40, item 4404; p. 0418 - processor 4402 includes hardware for executing instructions, such as those making up a computer program. As an example and not by way of limitation, to execute instructions, processor 4402 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 4404, or storage 4406) cause the processor to: generate a plurality of abstract tuples based on a set of instructions (Chen; Fig. 5; p. 0155 – NUANCED Dataset (abstract tuples)), wherein an abstract tuple of the plurality of abstract tuples includes at least one action and at least one object (Chen; Fig. 5; p. 0109 – intent classification and slot tagging; see also p. 0123 - the dialog state tracker 218 may communicate with the action selector 222 about the dialog intents and associated content objects. In particular embodiments, the action selector 222 may rank different dialog hypotheses for different dialog intents; p. 0129-0130 - …the dialog manager 216 may further support requesting missing slots in a nested intent and multi-intent user inputs (e.g., “take this photo and send it to Dad”)… (set of instructions); see also p. 0140 - the entity resolution module 212 may provide one or more of the intents, slots, entities, events, context, or user memory to the dialog state tracker 218. The dialog state tracker 218 may identify a set of state candidates for a task accordingly, conduct interaction with the user to collect necessary information to fill the state, and call the action selector 222 to fulfill the task); classify each abstract tuple of the plurality of abstract tuples as a navigation abstract tuple (Chen; p. 
0128 - As an example and not by way of limitation, the user input may comprise “direct me to my next meeting.” The assistant system 140 may use a calendar agent to retrieve the location of the next meeting. The assistant system 140 may then use a navigation agent to direct the user to the next meeting), an executable interaction abstract tuple, or a complex interaction abstract tuple based on the at least one action, wherein the executable interaction abstract tuple includes a single executable interaction and the complex interaction abstract tuple includes multiple executable interactions (Chen; p. 0129-0130 - In particular embodiments, the dialog manager 216 may support multi-turn compositional resolution of slot mentions. For a compositional parse from the NLU module 210, the resolver may recursively resolve the nested slots. The dialog manager 216 may additionally support disambiguation for the nested slots. As an example and not by way of limitation, the user input may be “remind me to call Alex” (executable interaction abstract tuple). The resolver may need to know which Alex to call before creating an actionable reminder to-do entity. The resolver may halt the resolution and set the resolution state when further user clarification is necessary for a particular slot. The general policy 362 may examine the resolution state and create corresponding dialog action for user clarification. In dialog state tracker 218, based on the user input and the last dialog action, the dialog manager 216 may update the nested slot. This capability may allow the assistant system 140 to interact with the user not only to collect missing slot values but also to reduce ambiguity of more complex/ambiguous utterances to complete the task. In particular embodiments, the dialog manager 216 may further support requesting missing slots in a nested intent and multi-intent user inputs (e.g., “take this photo and send it to Dad”) (complex interaction abstract tuple).
In particular embodiments, the dialog manager 216 may support machine-learning models for more robust dialog experience. As an example and not by way of limitation, the dialog state tracker 218 may use neural network based models (or any other suitable machine-learning models) to model belief over task hypotheses. As another example and not by way of limitation, for action selector 222, highest priority policy units may comprise white-list/black-list overrides, which may have to occur by design; middle priority units may comprise machine-learning models designed for action selection; and lower priority units may comprise rule-based fallbacks when the machine-learning models elect not to handle a situation. In particular embodiments, machine-learning model based general policy unit may help the assistant system 140 reduce redundant disambiguation or confirmation steps, thereby reducing the number of turns to execute the user input); form a set of executable interactions including the executable interaction abstract tuple (Chen; p. 0129 - …The dialog state tracker 218 may identify a set of state candidates for a task accordingly, conduct interaction with the user to collect necessary information to fill the state, and call the action selector 222 to fulfill the task); detect an object location for the at least one object based on a knowledge base (Chen; p. 0128 - As an example and not by way of limitation, the user input may comprise “direct me to my next meeting.” The assistant system 140 may use a calendar agent to retrieve the location (from knowledge base) of the next meeting (object). The assistant system 140 may then use a navigation agent to direct the user to the next meeting; see also p. 0138 - The location updates may be consumed by the dialog manager 216 to support various proactive/reactive scenarios. The visual events may be based on person or object appearing in the user's field of view. 
These events may be consumed by the dialog manager 216 and recorded in transient user state to support visual co-reference (e.g., resolving “that” in “how much is that shirt?” and resolving “him” in “send him my contact”)… Note that the cited paragraphs correspond to different aspects of the prior art with respect to “location” resolution, however, both of these aspects teach on the broad language of the claimed subject matter); decode the complex interaction abstract tuples into executable interaction abstract tuples (Chen; p. 0117 - When multiple high scoring candidates are present, the entity resolution module 212 may perform user-facilitated disambiguation (decoding complex interactions) (e.g., getting real-time user feedback from users on these candidates); see also p. 0129-0130 - In particular embodiments, the dialog manager 216 may support multi-turn compositional resolution of slot mentions. For a compositional parse from the NLU module 210, the resolver may recursively resolve the nested slots. The dialog manager 216 may additionally support disambiguation for the nested slots. As an example and not by way of limitation, the user input may be “remind me to call Alex”. The resolver may need to know which Alex to call before creating an actionable reminder to-do entity. The resolver may halt the resolution and set the resolution state when further user clarification is necessary for a particular slot. The general policy 362 may examine the resolution state and create corresponding dialog action for user clarification. In dialog state tracker 218, based on the user input and the last dialog action, the dialog manager 216 may update the nested slot. This capability may allow the assistant system 140 to interact with the user not only to collect missing slot values but also to reduce ambiguity of more complex/ambiguous utterances to complete the task. 
In particular embodiments, the dialog manager 216 may further support requesting missing slots in a nested intent and multi-intent user inputs (e.g., “take this photo and send it to Dad”). In particular embodiments, the dialog manager 216 may support machine-learning models for more robust dialog experience. As an example and not by way of limitation, the dialog state tracker 218 may use neural network based models (or any other suitable machine-learning models) to model belief over task hypotheses. As another example and not by way of limitation, for action selector 222, highest priority policy units may comprise white-list/black-list overrides, which may have to occur by design; middle priority units may comprise machine-learning models designed for action selection; and lower priority units may comprise rule-based fallbacks when the machine-learning models elect not to handle a situation. In particular embodiments, machine-learning model based general policy unit may help the assistant system 140 reduce redundant disambiguation or confirmation steps, thereby reducing the number of turns to execute the user input); add the decoded executable interaction abstract tuples to the set of executable interactions (Chen; p. 0130 - the determined actions by the action selector 222 may be sent to the delivery system 230. The delivery system 230 may comprise a CU composer 370, a response generation component 380, a dialog state writing component 382, and a text-to-speech (TTS) component 390. Specifically, the output of the action selector 222 may be received at the CU composer 370. 
In particular embodiments, the output from the action selector 222 may be formulated as a <k,c,u,d> tuple, in which k indicates a knowledge source, c indicates a communicative goal, u indicates a user model, and d indicates a discourse model); and generate an executable plan for an agent based on the set of executable interactions and the object location (Chen; p. 0095 - the action selector 222a/b may perform interaction management. The action selector 222a/b may determine and trigger a set of general executable actions. The actions may be executed either on the client system 130 or at the remote server. As an example and not by way of limitation, these actions may include providing information or suggestions to the user. In particular embodiments, the actions may interact with agents 228a/b, users, and/or the assistant system 140 itself. These actions may comprise actions including one or more of a slot request, a confirmation, a disambiguation, or an agent execution. The actions may be independent of the underlying implementation of the action selector 222a/b. For more complicated scenarios such as, for example, multi-turn tasks or tasks with complex business logic, the local action selector 222a may call one or more local agents 228a, and the remote action selector 222b may call one or more remote agents 228b to execute the actions. Agents 228a/b may be invoked via task ID, and any actions may be routed to the correct agent 228a/b using that task ID. In particular embodiments, an agent 228a/b may be configured to serve as a broker across a plurality of content providers for one domain. A content provider may be an entity responsible for carrying out an action associated with an intent or completing a task associated with the intent. In particular embodiments, agents 228a/b may provide several functionalities for the assistant system 140 including, for example, native template generation, task specific business logic, and querying external APIs.
When executing actions for a task, agents 228a/b may use context from the dialog state tracker 218a/b, and may also update the dialog state tracker 218a/b. In particular embodiments, agents 228a/b may also generate partial payloads from a dialog act; see also p. 0129-0130).
Chen, however, fails to disclose discarding the abstract tuples classified as navigation abstract tuples. Ng teaches discarding the abstract tuples classified as navigation abstract tuples (Ng; p. 0100 - Various additional features for incorporation into the command interpreter will be apparent. For example, in some embodiments, the command interpreter may be able to differentiate between a driver and a passenger or between multiple passengers based on, for example, voice identification or locating the source of a voice input or gesture. Such information may be used, for example, to accept commands only from a driver or to only analyze input from a single user at a time (e.g., if passenger A says "turn over there," ignore any gestures or speech from passenger B). In some embodiments, the command interpreter may operate according to a mode where only a driver may issue commands and alternatively according to a mode where passengers may also issue commands. In some embodiments, the command interpreter may ignore specific speech patterns (such as relating to navigation). For example, the command interpreter may recognize some speech and gestures as originating from child passengers and consequently ignore any such commands. As another example, the command interpreter may associate different voices with permissions to operate the vehicle and ignore any commands that are issued by a passenger that is not permitted to issue such a command). Therefore, it would have been obvious to one of ordinary skill in the art to modify the system, method and non-transitory computer readable storage medium of Chen to include discarding the abstract tuples classified as navigation abstract tuples, as taught by Ng, in order to prevent theft, as the command interpreter may ignore all commands, or all commands to the autonomous control subsystem, if the voice or image data does not correlate to any registered or otherwise known passengers (Ng; p. 0100).
As per claims 2, 9 and 16, Chen in view of Ng discloses: The system, method and non-transitory computer readable storage medium of claims 1, 8 and 15, wherein the instructions when executed by the processor further cause the processor to detect the object location by generating candidate locations based on proposing semantically meaningful candidates for masked tokens (Chen; p. 0325-0329 - FIGS. 26A and 26B illustrate example processes for performing product identification with pixel-level segmentation masks. In particular, FIG. 26A illustrates an example object identification process which generates segmentation masks for objects depicted in images having corresponding text referencing an object within the image. Image 2610 may depict multiple objects including a motorcycle rider 2612 and a motorcycle 2614, and image 2610 may have a corresponding caption 2620 (“The person”) which is intended to be a reference to the rider 2612 shown in the image. An image encoder 2650 and language encoder 2630 may be respectively utilized to extract visual features 2655a and text features 2635 from the image 2610 and descriptive text 2620. The resulting visual features 2655a and text features 2635 may then be input to a decoder (not shown) in order to predict a segmentation mask 2660 for the referenced object 2612. This segmentation process is made possible due to the model having access to ground truth segmentation masks for at least a subset of the image dataset, which enables the model to determine a segmentation loss 2670a for the predicted segmentation mask 2660 against the ground truth segmentation mask (semantically meaningful candidates)).
As per claims 3, 10 and 17, Chen in view of Ng discloses:
The system, method and non-transitory computer readable storage medium of claims 2, 9 and 16, wherein the masked tokens represent identified objects in a semantic map based on a depth prediction model generated from image data or depth data captured by the agent (Chen; p. 0325-0329 - In particular embodiments, the assistant system 140 may process an image using an unsupervised region proposal generator to identify proposed segmentation masks, apply an image encoder to the segmentation masks to determine region features associated with each segmentation mask, apply a language encoder to text associated with a referenced object within the image along with any relevant metadata to determine text features associated with the object, and identify the correct segmentation mask associated with the referenced object based at least in part on the determined text features and region features… the problems associated with bounding box techniques as described above may be addressed by utilizing models for generating segmentation masks (depth prediction model generated from image data or depth data) corresponding to objects depicted in the images; see also p. 0109 - An intent may be an element in a pre-defined taxonomy of semantic intentions (semantic map), which may indicate a purpose of a user interaction with the assistant system 140 (intent)).
As per claims 4, 11, and 18, Chen in view of Ng discloses: The system, method and non-transitory computer readable storage medium of claims 3, 10 and 17, wherein the semantic map is based on the depth prediction model and grounded language-image pre-training, wherein the grounded language-image pre-training converts object bounding boxes to natural language phrases (Chen; p. 0325-0329 - In particular embodiments, the assistant system 140 may process an image using an unsupervised region proposal generator to identify proposed segmentation masks, apply an image encoder to the segmentation masks to determine region features associated with each segmentation mask, apply a language encoder to text associated with a referenced object within the image along with any relevant metadata to determine text features associated with the object (semantic map), and identify the correct segmentation mask associated with the referenced object based at least in part on the determined text features and region features… the problems associated with bounding box techniques as described above may be addressed by utilizing models for generating segmentation masks (depth prediction model generated from image data or depth data) corresponding to objects depicted in the images).
As per claims 5, 12 and 19, Chen in view of Ng discloses:
The system, method and non-transitory computer readable storage medium of claims 1, 8 and 15, wherein the instructions when executed by the processor further cause the processor to: receive a task from a biological entity (Chen; p. 0078 - if the user input comprises speech data (biological entity/human speech), the speech data may be received at a local automatic speech recognition (ASR) module 208a on the client system 130. The ASR module 208a may allow a user to dictate and have speech transcribed as written text, have a document synthesized as an audio stream, or issue commands that are recognized as such by the system); and identify the set of instructions based on the task (Chen; p. 0127 - the action selector 222 may take the dialog state update operators as part of the input to select the dialog action (identify the set of instructions based on the task). The execution of the dialog action may generate a set of expectations to instruct the dialog state tracker 218 to handle future turns. In particular embodiments, an expectation may be used to provide context to the dialog state tracker 218 when handling the user input from next turn…; see also p. 0125 – generating task policies; see also p. 0098 – action selector).
As per claims 6, 13 and 20, Chen in view of Ng discloses:
The system, method and non-transitory computer readable storage medium of claims 1, 8 and 15, wherein the executable plan is executed by the agent based on a semantic map (Chen; p. 0109 - An intent may be an element in a pre-defined taxonomy of semantic intentions (semantic map), which may indicate a purpose of a user interaction with the assistant system 140).
As per claims 7 and 14, Chen in view of Ng discloses: The system and method of claims 1 and 8, wherein the abstract tuples are based on semantic parsing (Chen; p. 0139 - the text transcription from the ASR module 208 may be sent to the NLU module 210. The NLU module 210 may process the text transcription and extract the user intention (i.e., intents) and parse the slots or parsing result based on the linguistic ontology. In particular embodiments, the intents and slots from the NLU module 210 and/or the events and contexts from the context engine 220 may be sent to the entity resolution module 212; see also p. 0377 - In particular embodiments, the assistant system 140 may implement methods for real-time automatic speech recognition parsing to identify partial intents as a user is speaking and render partial inputs responsive to the identified partial intents).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Such prior art includes:
Caba Heilbron (US PG Pub 20230325685) discloses a model training system that obtains a training dataset including videos and text labels. The model training system generates a video-text classification model by causing a model having a dual image-text encoder architecture to predict which of the text labels describes each video in the training dataset. Predictions output by the model are compared to the training dataset to determine distillation and contrastive losses, which are used to adjust internal weights of the model during training. The internal weights of the model are then combined with internal weights of a trained image-text classification model to generate the video-text classification model. The video-text classification model is configured to generate a video or text output that classifies a video or text input (Caba Heilbron; Abstract).
Vu (US PG Pub 20230161963) discloses a computing device that may receive, at a data processing system, a set of utterances for training or inferencing with a named entity recognizer to assign a label to each token piece from the set of utterances. The computing device may determine a length of each utterance in the set and, when the length of the utterance exceeds a pre-determined threshold of token pieces: divide the utterance into a plurality of overlapping chunks of token pieces; assign a label together with a confidence score for each token piece in a chunk; determine a final label and an associated confidence score for each chunk of token pieces by merging two confidence scores; determine a final annotated label for the utterance based at least on the merging of the two confidence scores; and store the final annotated label in a memory (Vu; Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139. The examiner can normally be reached Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RODRIGO A CHAVEZ/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658