Last updated: May 29, 2026
Application No. 18/612,174
ANALYZING WEB PAGES TO FACILITATE AUTOMATIC NAVIGATION

Final Rejection §103
Filed: Mar 21, 2024
Priority: Sep 27, 2018 (provisional 62/737,822, +3 more)
Examiner: HICKS, SHIRLEY D.
Art Unit: 2168
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Google LLC
OA Round: 4 (Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 63% grant rate and a +54.2% interview lift. Because an interview has already been tried, the recommendation is a written response with narrowed claims based on precedent claim evolution patterns.
Based on 109 resolved cases, 2023–2026
Examiner Intelligence

HICKS, SHIRLEY D.
Career Allowance Rate: 63% of resolved cases (69 granted / 109 resolved), +8.3% vs TC avg
Strong +54% interview lift
Interview Lift: +54.2% (resolved cases with interview)
Typical timeline: 2y 10m avg prosecution, 25 currently pending
Career history: 145 total applications across all art units
Statute-Specific Performance

§103: 74.3% (+34.3% vs TC avg)
§102: 25.5% (-14.5% vs TC avg)
§112: 0.2% (-39.8% vs TC avg)
Based on career data from 109 resolved cases
Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/19/2025 has been entered.
Accordingly, claims 1-7 and 9-20 are pending in this application. Claims 1, 9, and 17 are currently amended.

Response to Arguments
Applicant’s arguments with respect to amended pending claims filed on 11/19/2025 have been fully considered. In view of the claim amendment filed, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made. 
Further, regarding the new limitations recited in claims 1, 9, and 17, it is submitted that they are properly addressed by the new ground of rejection.
Furthermore, it is also submitted that all limitations in pending claims, including those not specifically argued, are properly addressed. The reason is set forth in the rejections. See claim analysis below for detail.
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-7 and 9-20 are rejected under 35 U.S.C. 103 as being unpatentable over Challa et al. (US Patent Number 10,978,056 B1) in view of Gruber (US 20180350353 A1).

Regarding Claim 1, Challa discloses a method implemented using one or more processors ([Col. 4, lines 7-10]: Embodiments according to the invention are in particular disclosed in the attached claims directed to a method, a storage medium, a system and a computer program product, Fig. 10; [Col. 3, line 14]: computer system 1000 includes a processor 1002), comprising: 
receiving a natural language input from a user at an input component of a computing device ([Col. 39, lines 55-60]: Fig. 6; The method may begin at step 610, where the assistant system 140 may receive, from a client system 130 associated with a user, a user input), 
wherein the natural language input is directed by the user to an automated assistant that operates at least in part on the computing device (Fig. 1, assistant system 140; [Col. 2, lines 4-6]: The assistant system may enable the user to interact with it with multi-modal user input (such as voice, text, image, video, motion)), and identifies an intent of the user ([Col. 12, lines 1-4]: An intent may be an element in a pre-defined taxonomy of semantic intentions, which may indicate a purpose of a user interacting with the assistant system 140); 
retrieving a web page of an interactive website that includes one or more interactive elements, wherein the web page is operable to carry out the identified intent of the user (Fig. 1; [Col. 6, lines 27-28]: The client system 130 may render a web interface (e.g. a webpage); [Col. 11, lines]: An intent may be an element in a pre-defined taxonomy of semantic intentions, which may indicate a purpose of a user interacting with the assistant system 140), the one or more interactive elements are operable to input one or more parameters associated with carrying out the intent of the user, and the one or more interactive elements include one or more of a radio button, a check box, a drop down menu, or a text input field ([Col. 42, lines 19-37]: The third-party web interface or resource may include, among other elements, content, a selectable or other icon, or other inter-actable object (which may be implemented, for example, in JavaScript, AJAX, or PHP codes) representing an action or activity); 
generating, as a current state of the user, one or more embeddings that represent one or more aspects of the natural language input and at least some of the interactive elements of the retrieved web page ([Col. 44, lines 16-67]: FIG. 8 illustrates an example view of an embedding space… an object may be represented in the vector space 800 as a vector referred to as a feature vector or an object embedding; [Col. 37, lines 13-15]: A vanilla LSTM-based decoder model, where the decoder hidden state is initialized using embeddings of the goal and arguments);
processing the one or more embeddings using one or more neural networks to generate output indicative of a plurality of candidate actions for operating one or more of the interactive elements of the web page (Fig. 8; [Col. 44, lines 41-44]: In particular embodiments, an n-gram may be mapped to a vector representation in the vector space 800 by using a machine learning model (e.g., a neural network); Fig. 6; [Col. 40, lines 12-14]: At step 620, the assistant system 140 may generate, by a natural-language generation module, a plurality of candidate responses in response to the user input);
selecting a given candidate action from the plurality of actions based on one or more criteria (Fig. 6; [Col. 40, lines 14-24]: At step 630, the assistant system 140 may determine, by a filtering module, for each candidate response of the plurality of candidate responses, a quality-indication… based on one or more acceptance-criteria. At step 640, the assistant system 140 may select one or more candidate responses from the plurality of candidate responses based on their respective quality-indications); 
However, Challa does not explicitly teach “performing the selected candidate action to operate with one or more of the interactive elements of the web page using one or more of the parameters; based on the performing, obtaining data configured to render a different web page of the interactive website; and providing, by the automated assistant as audio or visual output to the user, output indicating an outcome of performing the selected candidate action, wherein the output indicating the outcome comprises content from the different web page.”
On the other hand, in the same field of endeavor, Gruber teaches 
performing the selected candidate action to operate one or more of the interactive elements of the web page using one or more of the parameters ([Abstract]: The text string can be parsed into multiple candidate substrings based on domain keywords; Fig. 8; [0264]: At block 812… a virtual assistant can determine a user's intent from speech input by matching the user's speech to a particular domain with tasks, processes, and the like that the virtual assistant can perform or execute; [0281]: Referring again to process 800 of FIG. 8, at block 814, a first process associated with the first intent and a second process associated with the second intent can be executed);
based on the performing, obtaining data configured to render a different web page of the interactive website (Fig. 11; [0272]-[0275]: For example, a user might utter “Reply to those emails with my out of office reply, remind me to call mom in twenty minutes, tell Joe sure, snooze those reminders an hour, and accept that meeting request”… a virtual assistant can begin to take action immediately upon interpreting user intent for one command… the content displayed can be used to determine that “those emails” correspond to emails 1142 shown on display 1030… the displayed content can similarly be used to disambiguate a user's request); and
providing, by the automated assistant as audio or visual output to the user, output indicating an outcome of performing the selected candidate action, wherein the output indicating the outcome comprises content from the different web page (Figs. 11-12; [0272]: For example, a virtual assistant might begin sending out of office emails to Jane Doe and Jennifer Smith; [0281]-[0282]: For example, messages can be composed and sent, emails can be deleted, notifications can be dismissed, or the like… Referring again to process 800 of FIG. 8, at block 816, an acknowledgment associated with the first intent and the second intent can be provided to the user).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Challa to incorporate the teachings of Gruber to include performing the selected candidate action to operate with the interactive elements of the web page using one or more of the parameters; based on the performing, obtaining data configured to render a different web page of the interactive website; and providing, by the automated assistant as audio or visual output to the user, output indicating an outcome of performing the selected candidate action, wherein the output indicating the outcome comprises content from the different web page.
The motivation for doing so would be to determine and execute a user intent, and provide an acknowledgment to the user, as recognized by Gruber ([Abstract] of Gruber: the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user). 
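For illustration only, the sequence of steps recited in claim 1 (embed the input and the page elements, score candidate actions, select one by a criterion) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation and not the method of Challa or Gruber: the `Element` class, the character-count `embed` function, and the dot-product scoring are all hypothetical stand-ins, and the dot product takes the place of the claimed neural network.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for an interactive element of a web page."""
    name: str
    kind: str  # e.g. "radio", "checkbox", "dropdown", "text"


def embed(text: str, dim: int = 8) -> list[float]:
    """Toy character-count embedding (assumption; real systems use learned encoders)."""
    vec = [0.0] * dim
    for ch in text.lower():
        if ch.isalnum():
            vec[ord(ch) % dim] += 1.0
    return vec


def candidate_actions(utterance: str, elements: list[Element]) -> list[tuple[str, float]]:
    """Produce one scored candidate action per element.

    The dot product below stands in for the neural network of the claim.
    """
    query = embed(utterance)
    scored = []
    for el in elements:
        score = sum(q * e for q, e in zip(query, embed(el.name)))
        scored.append((f"operate:{el.name}", score))
    return scored


def select_action(candidates: list[tuple[str, float]]) -> tuple[str, float]:
    """Selection criterion assumed here: the highest score wins."""
    return max(candidates, key=lambda c: c[1])


elements = [Element("date", "text"), Element("qqq", "checkbox")]
candidates = candidate_actions("date", elements)
chosen = select_action(candidates)
print(chosen)  # ('operate:date', 6.0)
```

The sketch stops at selection; performing the action against a live page and reporting content from the resulting page are outside what a self-contained example can show.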

Regarding Claim 2, the combined teachings of Challa and Gruber disclose the method of claim 1. 
Challa further teaches wherein the one or more neural networks comprise a policy for automatic navigation of webpages (Figs. 2-3; [Col. 21, lines 35-55]: An agent 340 may select among registered content providers to complete the action… In particular embodiments, the dialog engine 235 may execute a dialog policy 320 to determine the next action to carry out. The dialog policies 320 may comprise generic policy 321 and domain specific policies 322, both of which may guide how to select the next system action based on the dialog state).

Regarding Claim 3, the combined teachings of Challa and Gruber disclose the method of claim 1. 
Challa further teaches wherein the one or more criteria include rankings of the plurality of candidate actions ([Abstract]: ranking the selected candidate responses based on one or more ranking-criteria; [Col. 22, lines 59-63]: Therefore, the assistant system 140 may use a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses, and then ranked to select the best response).
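The "generate, filter, and rank" framework quoted above can be sketched as follows. This is an illustrative sketch only: the function name `filter_then_rank`, the acceptability test (ends with a period), and the ranking score (string length) are hypothetical placeholders, not the classifiers or ranking criteria described in Challa.

```python
from __future__ import annotations

from typing import Callable


def filter_then_rank(
    candidates: list[str],
    is_acceptable: Callable[[str], bool],
    rank_score: Callable[[str], float],
) -> list[str]:
    """Drop unacceptable candidates first, then order the rest best-first."""
    accepted = [c for c in candidates if is_acceptable(c)]
    return sorted(accepted, key=rank_score, reverse=True)


responses = ["Rain.", "Rain is expected on Wednesday.", "rain wednesday expected is"]
ranked = filter_then_rank(
    responses,
    is_acceptable=lambda r: r.endswith("."),  # placeholder acceptance criterion
    rank_score=len,                           # placeholder ranking criterion
)
best = ranked[0] if ranked else None
print(best)  # Rain is expected on Wednesday.
```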

Regarding Claim 4, the combined teachings of Challa and Gruber disclose the method of claim 1. 
Challa further teaches wherein the output indicative of the plurality of candidate actions comprises data indicative of a sequence in which the plurality of candidate actions should be performed (Fig. 2; [Col. 15, lines 50-53]: The surface realization component may determine specific words to use, the sequence of the sentences, and the style of the communication content; [Col. 1, lines 55-60]: The dialog engine 235 may additionally store previous conversations between the user and the assistant xbot 215; [Note that sequences of interactive web pages may be represented in a database of previous user states as “scripts” of previous user states and corresponding actions]).

Regarding Claim 5, the combined teachings of Challa and Gruber disclose the method of claim 1. 
Challa further teaches further comprising determining whether performance of the selected candidate action was successful ([Col. 24, lines 29-39]: Using different types of classification models including grammaticality models… these classification models may automatically determine whether a response satisfies grammaticality, semantic correctness, and naturalness, based on which the assistant system 140 may further determine if it is acceptable… In particular embodiments, the assistant system 140 may include a filtering step that performs acceptability classification in the more widely used generate & rank framework).

Regarding Claim 6, the combined teachings of Challa and Gruber disclose the method of claim 5. 
Challa further teaches wherein the output indicating the outcome of performing the selected candidate action comprises an indication of whether the selected candidate action was successful (Fig. 2; [Col. 17, lines 8-40]: In particular embodiments, the dialog state and history may indicate if the user is engaged in an ongoing conversation with the assistant xbot 215; [Col. 39, lines 63-67]: As an example and not by way of limitation, after receiving the user's question 510 “What's the weather like this week in Oak Hill on Wednesday?” the assistant xbot 215 may respond with an answer 512 “Here's your weather forecast”).

Regarding Claim 7, the combined teachings of Challa and Gruber disclose the method of claim 5. 
Challa further teaches further comprising: determining a reward or penalty based on whether performance of the selected candidate action was successful ([Col. 2, line 47-Col. 3, line 40]: candidate responses are first filtered to eliminate unacceptable responses… these classification models may automatically determine whether a response satisfies grammaticality, semantic correctness, and naturalness, based on which the assistant system may further determine if it is acceptable; [Col. 28, lines 17-22]: each original ungrammatical utterance in the dataset is a negative example, and the final corrected utterance… is a positive example. Additionally, sentences without any corrections are positive examples as well; See also [Col. 31, lines 53-63], Table 5); and
training one or more of the neural networks based on the determined reward or penalty ([Col. 28, lines 23-26]: These positive and negative samples can then be directly used to train the grammaticality filter described in previous sections; [Col. 47, lines 26-30]: a training method may be used (e.g., the conjugate gradient method, the gradient descent method, the stochastic gradient descent) to backpropagate the sum-of-squares error measured as a distances between each vector representing a training object (e.g., using a cost function that minimizes the sum-of-squares error)).
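As a point of reference for the claim language "determining a reward or penalty ... and training ... based on the determined reward or penalty", one common reading is a scalar signal derived from success or failure that adjusts how strongly an action is preferred. The sketch below is an assumption-laden illustration of that reading only: it uses a lookup table with an incremental-average update rather than a neural network, the names `reward_for` and `update_preference` are hypothetical, and it does not characterize what Challa discloses.

```python
from __future__ import annotations


def reward_for(success: bool) -> float:
    """Assumed signal: +1.0 as a reward on success, -1.0 as a penalty on failure."""
    return 1.0 if success else -1.0


def update_preference(
    preferences: dict[str, float],
    action: str,
    success: bool,
    learning_rate: float = 0.5,
) -> float:
    """Move the stored preference for `action` toward the observed reward."""
    old = preferences.get(action, 0.0)
    new = old + learning_rate * (reward_for(success) - old)
    preferences[action] = new
    return new


prefs: dict[str, float] = {}
after_success = update_preference(prefs, "operate:date", success=True)
after_failure = update_preference(prefs, "operate:date", success=False)
print(after_success, after_failure)  # 0.5 -0.25
```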

Regarding Claim 9, Challa discloses a system comprising one or more processors and memory storing instructions that, in response to execution by the one or more processors, cause the one or more processors to ([Col. 4, lines 7-10]: Embodiments according to the invention are in particular disclosed in the attached claims directed to a method, a storage medium, a system and a computer program product, Fig. 10; [Col. 3, line 14]: computer system 1000 includes a processor 1002): 
receive a natural language input from a user at an input component ([Col. 39, lines 55-60]: Fig. 6; The method may begin at step 610, where the assistant system 140 may receive, from a client system 130 associated with a user, a user input), 
wherein the natural language input is directed by the user to an automated assistant that operates at least in part on the system, and identifies an intent of the user ([Col. 39, lines 55-60]: Fig. 6; The method may begin at step 610, where the assistant system 140 may receive, from a client system 130 associated with a user, a user input); 
retrieve a web page of an interactive website that includes one or more interactive elements, wherein the web page is operable to carry out the identified intent of the user (Fig. 1; [Col. 6, lines 27-28]: The client system 130 may render a web interface (e.g. a webpage); [Col. 11, lines]: An intent may be an element in a pre-defined taxonomy of semantic intentions, which may indicate a purpose of a user interacting with the assistant system 140), the one or more interactive elements are operable to input one or more parameters associated with carrying out the intent of the user, and the one or more interactive elements include one or more of a radio button, a check box, a drop down menu, or a text input field ([Col. 42, lines 19-37]: The third-party web interface or resource may include, among other elements, content, a selectable or other icon, or other inter-actable object (which may be implemented, for example, in JavaScript, AJAX, or PHP codes) representing an action or activity); 
generate, as a current state of the user, one or more embeddings that represent one or more aspects of the natural language input and at least some of the interactive elements of the retrieved web page ([Col. 44, lines 16-67]: FIG. 8 illustrates an example view of an embedding space… an object may be represented in the vector space 800 as a vector referred to as a feature vector or an object embedding; [Col. 37, lines 13-15]: A vanilla LSTM-based decoder model, where the decoder hidden state is initialized using embeddings of the goal and arguments);
process the one or more embeddings using one or more neural networks to generate output indicative of a plurality of candidate actions for operating one or more of the interactive elements of the web page (Fig. 8; [Col. 44, lines 41-44]: In particular embodiments, an n-gram may be mapped to a vector representation in the vector space 800 by using a machine learning model (e.g., a neural network); Fig. 6; [Col. 40, lines 12-14]: At step 620, the assistant system 140 may generate, by a natural-language generation module, a plurality of candidate responses in response to the user input);
select a given candidate action from the plurality of actions based on one or more criteria (Fig. 6; [Col. 40, lines 14-24]: At step 630, the assistant system 140 may determine, by a filtering module, for each candidate response of the plurality of candidate responses, a quality-indication… based on one or more acceptance-criteria. At step 640, the assistant system 140 may select one or more candidate responses from the plurality of candidate responses based on their respective quality-indications); 
However, Challa does not explicitly teach “perform the selected candidate action to operate with one or more of the interactive elements of the web page using one or more of the parameters; based on the performing, obtain data configured to render a different web page of the interactive website; and provide, by the automated assistant as audio or visual output to the user, output indicating an outcome of performing the selected candidate action, wherein the output indicating the outcome comprises content from the different web page.”
On the other hand, in the same field of endeavor, Gruber teaches 
perform the selected candidate action to operate one or more of the interactive elements of the web page using one or more of the parameters ([Abstract]: The text string can be parsed into multiple candidate substrings based on domain keywords; Fig. 8; [0264]: At block 812… a virtual assistant can determine a user's intent from speech input by matching the user's speech to a particular domain with tasks, processes, and the like that the virtual assistant can perform or execute; [0281]: Referring again to process 800 of FIG. 8, at block 814, a first process associated with the first intent and a second process associated with the second intent can be executed);
based on the performing, obtain data configured to render a different web page of the interactive website (Fig. 11; [0272]-[0275]: For example, a user might utter “Reply to those emails with my out of office reply, remind me to call mom in twenty minutes, tell Joe sure, snooze those reminders an hour, and accept that meeting request”… a virtual assistant can begin to take action immediately upon interpreting user intent for one command… the content displayed can be used to determine that “those emails” correspond to emails 1142 shown on display 1030… the displayed content can similarly be used to disambiguate a user's request); and
provide, by the automated assistant as audio or visual output to the user, output indicating an outcome of performing the selected candidate action, wherein the output indicating the outcome comprises content from the different web page (Figs. 11-12; [0272]: For example, a virtual assistant might begin sending out of office emails to Jane Doe and Jennifer Smith; [0281]-[0282]: For example, messages can be composed and sent, emails can be deleted, notifications can be dismissed, or the like… Referring again to process 800 of FIG. 8, at block 816, an acknowledgment associated with the first intent and the second intent can be provided to the user).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Challa to incorporate the teachings of Gruber to include performing the selected candidate action to operate with the interactive elements of the web page using one or more of the parameters; based on the performing, obtaining data configured to render a different web page of the interactive website; and providing, by the automated assistant as audio or visual output to the user, output indicating an outcome of performing the selected candidate action, wherein the output indicating the outcome comprises content from the different web page.
The motivation for doing so would be to determine and execute a user intent, and provide an acknowledgment to the user, as recognized by Gruber ([Abstract] of Gruber: the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user).

Regarding Claim 10, the combined teachings of Challa and Gruber disclose the system of claim 9. 
Challa further teaches wherein the one or more neural networks comprise a policy for automatic navigation of webpages (Figs. 2-3; [Col. 21, lines 35-55]: An agent 340 may select among registered content providers to complete the action… In particular embodiments, the dialog engine 235 may execute a dialog policy 320 to determine the next action to carry out. The dialog policies 320 may comprise generic policy 321 and domain specific policies 322, both of which may guide how to select the next system action based on the dialog state).

Regarding Claim 11, the combined teachings of Challa and Gruber disclose the system of claim 9. 
Challa further teaches wherein the one or more criteria include rankings of the plurality of candidate actions ([Abstract]: ranking the selected candidate responses based on one or more ranking-criteria; [Col. 22, lines 59-63]: Therefore, the assistant system 140 may use a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses, and then ranked to select the best response).

Regarding Claim 12, the combined teachings of Challa and Gruber disclose the system of claim 9. 
Challa further teaches wherein the output indicative of the plurality of candidate actions comprises data indicative of a sequence in which the plurality of candidate actions should be performed (Fig. 2; [Col. 15, lines 50-53]: The surface realization component may determine specific words to use, the sequence of the sentences, and the style of the communication content; [Col. 1, lines 55-60]: The dialog engine 235 may additionally store previous conversations between the user and the assistant xbot 215; [Note that sequences of interactive web pages may be represented in a database of previous user states as “scripts” of previous user states and corresponding actions]).

Regarding Claim 13, the combined teachings of Challa and Gruber disclose the system of claim 9. 
Challa further teaches further comprising instructions to determine whether performance of the selected candidate action was successful ([Col. 24, lines 29-39]: Using different types of classification models including grammaticality models… may automatically determine whether a response satisfies grammaticality, semantic correctness, and naturalness, based on which the assistant system 140 may further determine if it is acceptable… In particular embodiments, the assistant system 140 may include a filtering step that performs acceptability classification in the more widely used generate & rank framework; [Cols. 26-27]: Table 1, Mistakes involving grammatical errors and other cases of unacceptability).

Regarding Claim 14, the combined teachings of Challa and Gruber disclose the system of claim 13. 
Challa further teaches wherein the output indicating the outcome of performing the selected candidate action comprises an indication of whether the selected candidate action was successful ([Col. 17, lines 8-40]: In particular embodiments, the dialog state and history may indicate if the user is engaged in an ongoing conversation with the assistant xbot 215; [Col. 39, lines 63-67]: As an example and not by way of limitation, after receiving the user's question 510 “What's the weather like this week in Oak Hill on Wednesday?” the assistant xbot 215 may respond with an answer 512 “Here's your weather forecast”).

Regarding Claim 15, the combined teachings of Challa and Gruber disclose the system of claim 13. 
Challa further teaches further comprising instructions to: 
determine a reward or penalty based on whether performance of the selected candidate action was successful ([Col. 2, line 47-Col. 3, line 40]: candidate responses are first filtered to eliminate unacceptable responses… these classification models may automatically determine whether a response satisfies grammaticality, semantic correctness, and naturalness, based on which the assistant system may further determine if it is acceptable; [Col. 28, lines 17-22]: each original ungrammatical utterance in the dataset is a negative example, and the final corrected utterance… is a positive example. Additionally, sentences without any corrections are positive examples as well; See also [Col. 31, lines 53-63], Table 5); and
train one or more of the neural networks based on the determined reward or penalty ([Col. 28, lines 23-26]: These positive and negative samples can then be directly used to train the grammaticality filter described in previous sections; [Col. 47, lines 26-30]: a training method may be used (e.g., the conjugate gradient method, the gradient descent method, the stochastic gradient descent) to backpropagate the sum-of-squares error measured as a distances between each vector representing a training object (e.g., using a cost function that minimizes the sum-of-squares error)).

Regarding Claim 16, the combined teachings of Challa and Gruber disclose the system of claim 9. 
Challa further teaches wherein the output indicating the outcome of performing the selected candidate action comprises navigation to a new webpage from the webpage (Fig. 3; In particular embodiments, a dialog policy 320 may comprise a data structure that describes an execution plan of an action by an agent 340. An agent 340 may select among registered content providers to complete the action; [Col. 39, line 37-Col. 40, line 7]: FIGS. 5A-5C… this disclosure contemplates any suitable user face associated with any suitable response in any suitable manner; Fig. 6; [Col. 40, lines 26-30]: At step 660, the assistant system 140 may send, to the client system 130… instructions for presenting a top-ranked candidate response to the user; [Col. 41, lines 35-37]: a concept may correspond to… a website).

Regarding Claim 17, Challa discloses at least one non-transitory computer-readable medium comprising instructions that, in response to execution by one or more processors, cause the one or more processors to ([Col. 55, lines 59-60]: Herein, a computer-readable non-transitory storage medium or media): 
receive a natural language input from a user at an input component ([Col. 39, lines 55-60]: Fig. 6; The method may begin at step 610, where the assistant system 140 may receive, from a client system 130 associated with a user, a user input), 
wherein the natural language input is directed by the user to an automated assistant that operates at least in part on the system (Fig. 1, assistant system 140; [Col. 2, lines 4-6]: The assistant system may enable the user to interact with it with multi-modal user input (such as voice, text, image, video, motion)), and identifies an intent of the user ([Col. 12, lines 1-4]: An intent may be an element in a pre-defined taxonomy of semantic intentions, which may indicate a purpose of a user interacting with the assistant system 140);
retrieve a web page of an interactive website that includes one or more interactive elements, wherein the web page is operable to carry out the identified intent of the user (Fig. 1; [Col. 6, lines 27-28]: The client system 130 may render a web interface (e.g. a webpage); [Col. 11, lines]: An intent may be an element in a pre-defined taxonomy of semantic intentions, which may indicate a purpose of a user interacting with the assistant system 140), the one or more interactive elements are operable to input one or more parameters associated with carrying out the intent of the user, and the one or more interactive elements include one or more of a radio button, a check box, a drop down menu, or a text input field ([Col. 42, lines 19-37]: The third-party web interface or resource may include, among other elements, content, a selectable or other icon, or other inter-actable object (which may be implemented, for example, in JavaScript, AJAX, or PHP codes) representing an action or activity); 
generate, as a current state of the user, one or more embeddings that represent one or more aspects of the natural language input and at least some of the interactive elements of the retrieved web page ([Col. 44, lines 16-67]: FIG. 8 illustrates an example view of an embedding space… an object may be represented in the vector space 800 as a vector referred to as a feature vector or an object embedding; [Col. 37, lines 13-15]: A vanilla LSTM-based decoder model, where the decoder hidden state is initialized using embeddings of the goal and arguments);
process the one or more embeddings using one or more neural networks to generate output indicative of a plurality of candidate actions for operating one or more of the interactive elements of the web page (Fig. 8; [Col. 44, lines 41-44]: In particular embodiments, an n-gram may be mapped to a vector representation in the vector space 800 by using a machine leaning model (e.g., a neural network); Fig. 6; [Col. 40, lines 12-14]: At step 620, the assistant system 140 may generate, by a natural-language generation module, a plurality of candidate responses in response to the user input); 
select a given candidate action from the plurality of actions based on one or more criteria (Fig. 6; [Col. 40, lines 14-24]: At step 630, the assistant system 140 may determine, by a filtering module, for each candidate response of the plurality of candidate responses, a quality-indication… based on one or more acceptance-criteria. At step 640, the assistant system 140 may select one or more candidate responses from the plurality of candidate responses based on their respective quality-indications); 
However, Challa does not explicitly teach “perform the selected candidate action to operate with one or more of the interactive elements of the web page using one or more of the parameters; based on the performing, obtain data configured to render a different web page of the interactive website; and provide, by the automated assistant as audio or visual output to the user, output indicating an outcome of performing the selected candidate action, wherein the output indicating the outcome comprises content from the different web page.”
On the other hand, in the same field of endeavor, Gruber teaches 
perform the selected candidate action to operate one or more of the interactive elements of the web page using one or more of the parameters ([Abstract]: The text string can be parsed into multiple candidate substrings based on domain keywords; Fig. 8; [0264]: At block 812… a virtual assistant can determine a user's intent from speech input by matching the user's speech to a particular domain with tasks, processes, and the like that the virtual assistant can perform or execute; [0281]: Referring again to process 800 of FIG. 8, at block 814, a first process associated with the first intent and a second process associated with the second intent can be executed);
based on the performing, obtain data configured to render a different web page of the interactive website (Fig. 11; [0272]-[0275]: For example, a user might utter “Reply to those emails with my out of office reply, remind me to call mom in twenty minutes, tell Joe sure, snooze those reminders an hour, and accept that meeting request”… a virtual assistant can begin to take action immediately upon interpreting user intent for one command… the content displayed can be used to determine that “those emails” correspond to emails 1142 shown on display 1030… the displayed content can similarly be used to disambiguate a user's request); and 
provide, by the automated assistant as audio or visual output to the user, output indicating an outcome of performing the selected candidate action, wherein the output indicating the outcome comprises content from the different web page (Figs. 11-12; [0272]: For example, a virtual assistant might begin sending out of office emails to Jane Doe and Jennifer Smith; [0281]-[0282]: For example, messages can be composed and sent, emails can be deleted, notifications can be dismissed, or the like… Referring again to process 800 of FIG. 8, at block 816, an acknowledgment associated with the first intent and the second intent can be provided to the user).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Challa to incorporate the teachings of Gruber to include performing the selected candidate action to operate with the interactive elements of the web page using one or more of the parameters; based on the performing, obtaining data configured to render a different web page of the interactive website; and providing, by the automated assistant as audio or visual output to the user, output indicating an outcome of performing the selected candidate action, wherein the output indicating the outcome comprises content from the different web page.
The motivation for doing so would be to determine and execute a user intent, and provide an acknowledgment to the user, as recognized by Gruber ([Abstract] of Gruber: the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user).

Regarding Claim 18, the combined teachings of Challa and Gruber disclose the at least one non-transitory computer-readable medium of claim 17. 
Challa further teaches wherein the one or more neural networks comprise a policy for automatic navigation of webpages (Figs. 2-3; [Col. 21, lines 35-55]: An agent 340 may select among registered content providers to complete the action… In particular embodiments, the dialog engine 235 may execute a dialog policy 320 to determine the next action to carry out. The dialog policies 320 may comprise generic policy 321 and domain specific policies 322, both of which may guide how to select the next system action based on the dialog state).

Regarding Claim 19, the combined teachings of Challa and Gruber disclose the at least one non-transitory computer-readable medium of claim 17. 
Challa further teaches wherein the one or more criteria include rankings of the plurality of candidate actions ([Abstract]: ranking the selected candidate responses based on one or more ranking-criteria; [Col. 22, lines 59-63]: Therefore, the assistant system 140 may use a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses, and then ranked to select the best response).
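
For illustration only, the generate, filter, and rank framework quoted above can be sketched as follows. This is a minimal sketch, not code from Challa, Gruber, or the application: the `Candidate` type, the `quality` score, the 0.5 acceptance threshold, and the example action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """A hypothetical candidate with an assumed quality-indication score."""
    text: str
    quality: float


def filter_and_rank(
    candidates: List[Candidate],
    is_acceptable: Callable[[Candidate], bool],
    rank_key: Callable[[Candidate], float],
) -> Optional[Candidate]:
    """Drop unacceptable candidates, then return the highest-ranked survivor."""
    accepted = [c for c in candidates if is_acceptable(c)]
    if not accepted:
        return None  # nothing passed the acceptance criteria
    return max(accepted, key=rank_key)


# Hypothetical candidates; the scores are made up for the example.
candidates = [
    Candidate("click submit", 0.9),
    Candidate("select radio option", 0.4),
    Candidate("fill text field", 0.7),
]

best = filter_and_rank(
    candidates,
    is_acceptable=lambda c: c.quality >= 0.5,  # assumed acceptance criterion
    rank_key=lambda c: c.quality,              # assumed ranking criterion
)
```

Under these assumed scores, the filter drops the 0.4 candidate and the ranking step returns the 0.9 candidate.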

Regarding Claim 20, the combined teachings of Challa and Gruber disclose the at least one non-transitory computer-readable medium of claim 17. 
Challa further teaches wherein the output indicative of the plurality of candidate actions comprises data indicative of a sequence in which the plurality of candidate actions should be performed (Fig. 2; [Col. 15, lines 50-53]: The surface realization component may determine specific words to use, the sequence of the sentences, and the style of the communication content; [Col. 1, lines 55-60]: The dialog engine 235 may additionally store previous conversations between the user and the assistant xbot 215; [Note that sequences of interactive web pages may be represented in a database of previous user states as “scripts” of previous user states and corresponding actions]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIRLEY D. HICKS whose telephone number is (571)272-3304. The examiner can normally be reached Mon - Fri 7:30 - 4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached on (571) 272-4085. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.D.H./Examiner, Art Unit 2168  

/CHARLES RONES/Supervisory Patent Examiner, Art Unit 2168
Prosecution Timeline

Nov 19, 2025: Request for Continued Examination
Nov 28, 2025: Response after Non-Final Action
Dec 09, 2025: Non-Final Rejection mailed (§103)
Feb 17, 2026: Interview Requested
Feb 24, 2026: Applicant Interview (Telephonic)
Feb 24, 2026: Examiner Interview Summary
Feb 24, 2026: Response Filed
May 26, 2026: Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/506,722 (Patent 12639380): WORK INCOME VISUALIZATION AND OPTIMIZATION PLATFORM. 4y 7m to grant; granted May 26, 2026.
18/351,876 (Patent 12596682): SYSTEM AND METHOD FOR OBJECT STORE FEDERATION. 2y 8m to grant; granted Apr 07, 2026.
18/218,986 (Patent 12499102): HIERARCHICAL DELIMITER IDENTIFICATION FOR PARSING OF RAW DATA. 2y 5m to grant; granted Dec 16, 2025.
18/340,771 (Patent 12499146): MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING (NLP)-BASED SYSTEM FOR SYSTEM-ON-CHIP (SoC) TROUBLESHOOTING. 2y 5m to grant; granted Dec 16, 2025.
18/396,455 (Patent 12405818): BATCHING WAVEFORM DATA. 1y 8m to grant; granted Sep 02, 2025.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 63%
With Interview (+54.2%): 99%
Median Time to Grant: 2y 10m (~8m remaining)
PTA Risk: High
Based on 109 resolved cases by this examiner. Grant probability derived from career allowance rate.
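
As a quick check, the 63% grant probability matches the career allowance rate computed from the counts given in the Examiner Intelligence panel (69 granted out of 109 resolved). The sketch below reproduces only that arithmetic. The function name is hypothetical, and the page does not state how the with-interview figure is derived, so it is not modeled here.

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Return granted cases as a percentage of resolved cases."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


# Counts taken from the page: 69 granted / 109 resolved.
rate = allowance_rate(69, 109)
print(f"Career allowance rate: {rate:.1f}%")
```

Rounded to the nearest whole percent, this gives the 63% shown above.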