Prosecution Insights
Last updated: April 19, 2026
Application No. 18/242,300

INTERFACE FOR A VIRTUAL DIGITAL ASSISTANT

Status: Non-Final OA (§103)
Filed: Sep 05, 2023
Examiner: LUU, DAVID V
Art Unit: 2171
Tech Center: 2100 — Computer Architecture & Software
Assignee: Apple Inc.
OA Round: 1 (Non-Final)
Grant Probability: 49% (Moderate)
OA Rounds: 1-2
To Grant: 3y 7m
With Interview: 89%

Examiner Intelligence

Career Allow Rate: 49% of resolved cases (87 granted / 178 resolved; -6.1% vs TC avg)
Interview Lift: +40.2% for resolved cases with interview vs. without (strong)
Typical timeline: 3y 7m avg prosecution (15 applications currently pending)
Career history: 193 total applications across all art units
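
The headline figures tie together as simple ratios. A minimal sketch of the apparent arithmetic (Python; treating the interview lift as additive percentage points is an assumption inferred from the 49% → 89% pairing, not documented dashboard behavior):

# Sketch: reproduce the Examiner Intelligence figures above (assumed arithmetic).
granted, resolved = 87, 178
career_allow_rate = granted / resolved      # 0.489 -> displayed as 49%
tc_average = career_allow_rate + 0.061      # "-6.1% vs TC avg" implies a ~55% TC average
interview_lift = 0.402                      # +40.2 points, assumed additive
with_interview = career_allow_rate + interview_lift
print(f"{career_allow_rate:.1%}  {tc_average:.1%}  {with_interview:.1%}")
# -> 48.9%  55.0%  89.1%, consistent with the displayed 49% and 89%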

Statute-Specific Performance

§101: 8.3% (-31.7% vs TC avg)
§103: 57.1% (+17.1% vs TC avg)
§102: 12.3% (-27.7% vs TC avg)
§112: 15.9% (-24.1% vs TC avg)
TC averages are Tech Center estimates • Based on career data from 178 resolved cases
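
As a sanity check, all four deltas back-solve to the same Tech Center baseline. A small sketch (Python; assumes delta = examiner rate minus TC average, which the page does not state explicitly):

# Sketch: back-solve the implied TC average from each statute's delta.
stats = {"§101": (8.3, -31.7), "§103": (57.1, 17.1), "§102": (12.3, -27.7), "§112": (15.9, -24.1)}
for statute, (rate, delta) in stats.items():
    print(f"{statute}: implied TC avg = {rate - delta:.1f}%")
# Every statute back-solves to 40.0%, so the TC average estimate is self-consistent.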

Office Action

§103
Detailed Action

This action is responsive to the claim set filed on 02/27/2024. The present application is being examined under the pre-AIA first-to-invent provisions. Claims 2-21 are pending.

Priority

Application 18/242,300, filed 09/05/2023, is a Continuation of 16/362,441, filed 03/22/2019, now abandoned and having 2 RCE-type filings therein. Application 16/362,441 is a Continuation of 14/046,871, filed 10/04/2013, now U.S. Patent No. 10,241,752 and having 2 RCE-type filings therein. Application 14/046,871 claims priority from Provisional Application 61/709,766, filed 10/04/2012. Application 14/046,871 is a Continuation-in-Part of 13/250,854, filed 09/30/2011, now U.S. Patent No. 9,858,925 and having 1 RCE-type filing therein.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 09/14/2023, 10/13/2023, 11/08/2023, 05/21/2024, 09/18/2024, 09/18/2024, 10/10/2024, 01/09/2025, 01/31/2025, 07/29/2025, and 10/17/2025 were filed. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103

The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-6, 9-14, 16-17, and 19-21 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Gruenstein et al., US 20150279354 A1 (hereinafter Gruenstein), in view of Blair et al., US 20060293890 A1 (hereinafter Blair).

As to independent claim 2, Gruenstein teaches: An electronic device, comprising: one or more processors (See Fig. 6, processor 604, with [0051-0060]); a memory (See Fig. 6, memory 608, with [0051-0060]); and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors (See Fig. 6 with [0060], program stored in memory), the one or more programs including instructions for: receiving spoken user input via an input device (See Fig. 5A, step 510, with [0050], first audio stream received from user; see also [0020], microphone receives an audio stream); generating a first plurality of candidate interpretations of the received spoken user input (See Fig. 5A, step 530, with [0050]: "At stage 530, a list is generated based on the translation of the first audio stream". See also [0040]: "Embedded speech recognizer 210 generates the list of recognition candidates, e.g., "pizzamerica," "piece of my heart," "pizza my heart" and these candidates are provided to tag comparator 260." Thus it is a list of candidate interpretations); deriving a representation of user intent based on the first plurality of candidate interpretations (See Fig. 5A, step 540, with [0050]: "and at stage 540, a selection from the list is received from the user". It is interpreted that an explicit user selection on the list of candidates is how the device derives a representation of user intent), wherein an association between the user intent and a respective candidate interpretation is added to context information (See Fig. 5A, steps 550 and 570, with [0050]: "At stage 550, a first speech tag based on the first audio stream and the selection is generated, and at stage 570 on FIG. 5B, the first speech tag is stored". Thus, the first speech tag being generated and stored is interpreted to be the claimed context information that contains the association between user intent and a respective candidate); identifying, based on the user intent, at least one task (See [0020]: "As used herein by some embodiments, a voice command can be, for example and without limitation, an indication by a user for an application operating on client device 110 to perform a particular function, e.g., "open email," "increase volume" or other type of command"; this paragraph defines voice commands. Then see Fig. 5A with [0050]: "In this example, a method for performing a personalized voice command on a client device is shown"; in other words, the method illustrated by Fig. 5A identifies a task based on determined user intent. See also Figs. 3A-3B, which show a UI for identifying a task associated with a voice command, and Figs. 4A-4B for the same); executing the at least one task (See Figs. 4A-4B, where a search query task is executed in response to a voice command); and providing an output based on the at least one executed task (See Figs. 4A-4B, where a search output page is provided).

Gruenstein does not teach: generating a second plurality of candidate interpretations, wherein the second plurality of candidate interpretations is a subset of the first plurality of candidate interpretations.

Blair teaches: generating a second plurality of candidate interpretations, wherein the second plurality of candidate interpretations is a subset of the first plurality of candidate interpretations (See Fig. 3 with [0021-0022]: a user input is received, which creates and displays a list of candidates, and then "The user may then choose to narrow the candidate list by providing speech input". In other words, a second plurality of candidates is generated, and these are a narrowed subset of the first list of candidates).

It would have been obvious to one of ordinary skill in the art before the invention was made to modify a voice recognizer that displays a first list of candidates in response to a voice input, as taught by Gruenstein, to include a voice recognizer that further narrows the first list of candidates with a voice input, as taught by Blair. Motivation to do so is found in Blair: "However, because the resulting list can be extremely long, it can be difficult for a user to quickly locate the desired word or character." (See Blair [0003]).

As to dependent claim 3, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein further teaches: prompting the user via a conversational interface (See Fig. 3A with [0039]: "The displayed prompt "speak now" is a prompt to the user to speak into the device"); receiving the spoken user input via the conversational interface (See Fig. 3A with [0039]: "Upon the user speaking, microphone 230 captures an audio stream and relays the stream to embedded speech recognizer 210."); and converting the spoken user input to a text representation (Figs. 3A-3B do not explicitly show the converted spoken user input; however, the embodiment shown in Fig. 4A does. See Fig. 4A, which illustrates the text "pizza my heart" in the search box, understood to be the converted spoken user input, as distinct from the second instance of "pizza my heart," which is a candidate within the list 420).

As to dependent claim 4, Gruenstein as modified teaches all the limitations of claim 3 as cited above. Gruenstein further teaches: wherein converting the spoken user input to the text representation comprises: generating a plurality of candidate text interpretations of the spoken user input (See Fig. 4A, which illustrates the list of generated candidates of the spoken user input); and ranking at least a subset of the generated candidate text interpretations (See [0021]: "…each recognition candidate corresponding to the text of a potential voice command, and having a confidence value associated therewith, such confidence value measuring the estimated likelihood that a particular recognition candidate corresponds to the work that the user intended. For example and without limitation, if the audio stream sound corresponds to "dark-nite" recognition candidates could include "dark knight" and "dark night." The user could have intended either candidate at the time of the steam, and each candidate can, in an embodiment, have an associated confidence value"; in other words, each candidate having a confidence value means the candidates have a ranking. See also [0031], which explicitly mentions the highest-ranked confidence score, "dark night"); and wherein at least one of the generating and ranking steps is performed using the context information (As explained for claim 2 above, the claimed context information is interpreted to be the stored speech tags taught by Gruenstein. See [0045]: "In an embodiment, the selection of any one of the above-described approaches could be determined by a confidence level associated with the speech tag match. For example, if the user said an audio stream corresponding to "pizza my heart" and a high-confidence match was determined with the stored "Pizza My Heart" speech tag, then approach shown on FIG. 4D could be selected and no confirmation would be requested"; in other words, the ranking (i.e., confidence level) of a candidate is performed using speech tags).

As to dependent claim 5, Gruenstein as modified teaches all the limitations of claim 4 as cited above. Gruenstein further teaches: wherein the context information used in at least one of the generating and ranking comprises at least one selected from the group consisting of: acoustic environment data describing an acoustic environment in which the spoken user input is received; data received from at least one sensor; vocabulary obtained from a database associated with the user (See [0024]: "Based on the selected recognition candidate, in an embodiment, client query manager 220 queries client database 240 to generate a query result. In an embodiment, client database 240 contains information that is locally stored in client device 110 such as, for example and without limitation, telephone numbers, address information, and results from previous voice commands, and "speech tags""; in other words, the speech tags are seen as the claimed vocabulary, and the speech tags are stored in and obtained from a user database 240); vocabulary associated with application preferences; vocabulary obtained from usage history; and current dialog state.

As to dependent claim 6, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein further teaches: prompting the user by generating at least one prompt based at least in part on the context information (See Fig. 4B with [0042]: "In FIG. 4B an example is depicted wherein one of the recognition candidates matches a stored speech tag for "Pizza My Heart." In this embodiment, this match is termed a "quick match" and the result is labeled 430 as such for the user. A quick match is signaled to the user, and the user is invited to confirm the accuracy of this determination."; in other words, the user is prompted with a quick match, i.e., an indication that the voice query most likely matches this candidate, and the quick-match prompt is based on the stored speech tag, i.e., the claimed context information).

As to dependent claim 9, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein further teaches: identifying at least one task and at least one parameter for the task by identifying at least one task and at least one parameter for the task based at least in part on the context information (First see Fig. 4A with [0041]: the search query is interpreted to be the claimed task, and the claimed parameter is seen as the detected audio stream "pizza my heart". The claimed identifying of the task and parameter is based at least in part on the stored speech tags mentioned in [0040]).

As to dependent claim 10, Gruenstein as modified teaches all the limitations of claim 9 as cited above. Gruenstein further teaches: wherein the context information used in identifying at least one task and at least one parameter for the task comprises at least one selected from the group consisting of: data describing an event; data from a database associated with the user; data received from at least one sensor; application context; input previously provided by the user; known information about the user (See [0024], client database, and [0040], speech tags in the client database); location; date; environmental conditions; and history.

As to dependent claim 11, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein further teaches: generating a dialog response based at least in part on the context information (See Fig. 4B with [0042]: a quick match is signaled to the user).

As to dependent claim 12, Gruenstein as modified teaches all the limitations of claim 11 as cited above. Gruenstein further teaches: wherein the context information used in generating a dialog response comprises at least one selected from the group consisting of: data from a database associated with the user; application context; input previously provided by the user; known information about the user (See [0024], client database); location; date; environmental conditions; and history.

As to dependent claim 13, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein further teaches: wherein the context information comprises at least one selected from the group consisting of: context information stored at a server; and context information stored at a client (See [0024], client database).

As to dependent claim 14, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein further teaches: receiving the context information from a context source by: requesting the context information from a context source (See [0024]: "Based on the selected recognition candidate, in an embodiment, client query manager 220 queries client database 240 to generate a query result."); and receiving the context information in response to the request (See [0024]: "…to generate a query result").

As to dependent claim 16, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein further teaches: receiving at least a portion of the context information after receiving the spoken user input (See [0024] in conjunction with the previous paragraphs: the query occurs after the received audio stream).

As to dependent claim 17, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein further teaches: receiving static context information as part of an initialization step (See [0050]: "Initially, as shown in stage 510 on FIG. 5A, a first audio stream is received from a user. At stage 520, a speech recognizer is used to create a first translation of the first audio stream. At stage 530, a list is generated based on the translation of the first audio stream, and at stage 540, a selection from the list is received from the user. At stage 550, a first speech tag based on the first audio stream and the selection is generated, and at stage 570 on FIG. 5B, the first speech tag is stored."; in other words, an initial, non-changing (i.e., static) first speech tag is generated based on the first audio stream and user selection); and receiving additional context information after receiving the spoken user input (See [0050]: "…At stage 580, a second audio stream is received from the user, and at stage 585, a determination is made as to whether the second audio stream matches the first speech tag. If, at stage 590, the second audio stream does match the first speech tag, then at stage 595, a second translation of a second audio stream is created using the speech recognizer, based on the speech tag"; in other words, after the first audio stream, the invention can receive a second audio stream, and an additional second translation (i.e., additional context information) of the second audio stream is generated).

As to dependent claim 19, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein further teaches: wherein the electronic device corresponds to at least one of: a telephone; a smartphone; a tablet computer; a laptop computer; a personal digital assistant (See [0018], PDA); a desktop computer; a kiosk; a consumer electronic device; a consumer entertainment device; a music player; a camera; a television; an electronic gaming unit; and a set-top box.

As to independent claim 20, it is rejected under similar rationale as claim 2 as cited above. As to independent claim 21, it is rejected under similar rationale as claim 2 as cited above.

Claims 7-8 and 15 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Gruenstein in view of Blair, further in view of Kosaka et al., US 20020128826 A1 (hereinafter Kosaka).

As to dependent claim 7, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein as modified does not teach: disambiguating the received spoken user input based on acoustic environment data of received context information to derive a representation of user intent by performing natural language processing on the received spoken user input based at least in part on the context information. Kosaka teaches: disambiguating the received spoken user input based on acoustic environment data of received context information to derive a representation of user intent by performing natural language processing on the received spoken user input based at least in part on the context information (See [0041-0042]; essentially, these paragraphs describe that disambiguation (i.e., speech recognition) is based on a particular acoustic environment (e.g., a silent environment)). It would have been obvious to one of ordinary skill in the art before the invention was made to modify the speech recognition system taught by Gruenstein to include speech recognition based on an acoustic state, as taught by Kosaka. Motivation to do so would be that "the recognition rate can be improved" (See Kosaka [0040]).

As to dependent claim 8, Gruenstein as modified teaches all the limitations of claim 7 as cited above. Gruenstein as modified does not teach: wherein the context information comprises at least one selected from the group consisting of: data describing an event; application context; input previously provided by the user; known information about the user; location; date; environmental conditions; and history. Kosaka further teaches: wherein the context information comprises at least one selected from the group consisting of: environmental conditions (See [0041]: "This acoustic data also reflects the influence of the characteristics of a microphone used. If background noise or noise generated inside the device is present, the acoustic data is also influenced by such noise"). It would have been obvious to one of ordinary skill in the art before the invention was made to modify the speech recognition system taught by Gruenstein to include speech recognition based on an acoustic state, as taught by Kosaka. Motivation to do so would be that "the recognition rate can be improved" (See Kosaka [0040]).

As to dependent claim 15, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein as modified does not teach: receiving context information from a context source by: receiving at least a portion of the context information prior to receiving the spoken user input. Kosaka teaches: receiving at least a portion of the context information prior to receiving the spoken user input (See [0040]: "Before the beginning of speech recognition, an initial setup shown in the flow chart of FIG. 2 is executed."). It would have been obvious to one of ordinary skill in the art before the invention was made to modify the speech recognition system taught by Gruenstein to include speech recognition based on an acoustic state, as taught by Kosaka. Motivation to do so would be that "the recognition rate can be improved" (See Kosaka [0040]).

Claim 18 is rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Gruenstein in view of Blair, further in view of Santamaria et al., US 20130244614 A1 (hereinafter Santamaria).

As to dependent claim 18, Gruenstein as modified teaches all the limitations of claim 2 as cited above. Gruenstein as modified does not teach: receiving push notification of a change in context information; and responsive to the push notification, updating locally stored context information. Santamaria teaches: receiving push notification of a change in context information (See [0226], push notification in response to a change, e.g., notifications that "include a change to a friend's online status"); and responsive to the push notification, updating locally stored context information (See [0226], push notifications are used to update data stored in a local cache). It would have been obvious to one of ordinary skill in the art before the invention was made to modify the speech recognition system taught by Gruenstein to include push notifications for notifying a change in status information, as taught by Santamaria. Motivation to do so would be that updating "in this manner may decrease network and service load because, with push updates, periodic polling between the mobile device and the service is not required" (See Santamaria [0226]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID V LUU, whose telephone number is (571) 270-0703. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. The examiner can normally be reached Monday-Tuesday from 11am-7pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kieu Vu, can be reached at (571) 272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from Patent Center and the Private Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from Patent Center or Private PAIR. Status information for unpublished applications is available through Patent Center and Private PAIR for authorized users only. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/DAVID V LUU/
Examiner, Art Unit 2171

/KIEU D VU/
Supervisory Patent Examiner, Art Unit 2171
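
Outside the quoted Office Action, the pivot of the claim 2 rejection is a two-stage candidate flow: Gruenstein supplies the first plurality of interpretations, and Blair supplies narrowing that list to a second plurality that is a subset of the first. A purely illustrative sketch of that limitation (Python; the function names and the substring-narrowing heuristic are hypothetical, not drawn from either reference's actual implementation):

# Hypothetical illustration of the claim 2 limitation mapped to Gruenstein and Blair.
def first_candidates(audio_stream: str) -> list[str]:
    # Stand-in for a recognizer emitting a candidate list (cf. Gruenstein [0040]).
    return ["pizzamerica", "piece of my heart", "pizza my heart"]

def narrow_by_speech(candidates: list[str], followup_speech: str) -> list[str]:
    # Stand-in for narrowing the list via further speech input (cf. Blair [0021-0022]).
    return [c for c in candidates if followup_speech.lower() in c.lower()]

first = first_candidates("<first audio stream>")
second = narrow_by_speech(first, "pizza")   # ["pizzamerica", "pizza my heart"]
assert set(second) < set(first)             # the second plurality is a proper subset of the first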

Prosecution Timeline

Sep 05, 2023: Application Filed
Feb 27, 2024: Response after Non-Final Action
Feb 07, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596520
MEDIA CONTROLS USER INTERFACE
2y 5m to grant • Granted Apr 07, 2026
Patent 12572143
SYSTEMS, METHODS, AND/OR APPARATUS FOR PROVIDING A USER DISPLAY AND INTERFACE FOR USE WITH AN AGRICULTURAL IMPLEMENT
2y 5m to grant • Granted Mar 10, 2026
Patent 12546611
METHOD, APPARATUS, AND SYSTEM FOR PROVIDING DIGITAL STREET HAILING
2y 5m to grant • Granted Feb 10, 2026
Patent 12529543
GENERATION AND APPLICATION OF AUTONOMOUSLY-CREATED THREE-DIMENSIONAL SAFETY OFFSET BOUNDING SURFACES FROM THREE-DIMENSIONAL VIRTUAL MAPS AROUND POINTS OF INTEREST
2y 5m to grant • Granted Jan 20, 2026
Patent 12472441
MODIFYING USER INTERFACE OF APPLICATION DURING RECORDING SESSION
2y 5m to grant • Granted Nov 18, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 49% (89% with interview, +40.2%)
Median Time to Grant: 3y 7m
PTA Risk: Low
Based on 178 resolved cases by this examiner. Grant probability derived from career allow rate.
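
Per the note above, the grant probability is derived from the career allow rate with the interview lift applied on top. A sketch of that apparent derivation (Python; the function, the additive lift, and the 100% cap are assumptions, not documented dashboard behavior):

# Sketch: projected grant probability from career allow rate plus interview lift (assumed model).
def projected_grant_probability(base_rate: float, interview: bool, lift: float = 0.402) -> float:
    # Additive lift in percentage points, capped at 1.0 (assumption).
    return min(base_rate + lift, 1.0) if interview else base_rate

print(projected_grant_probability(0.49, interview=False))  # 0.49  -> the 49% card
print(projected_grant_probability(0.49, interview=True))   # 0.892 -> the 89% card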
