Prosecution Insights
Last updated: April 19, 2026

Application No.: 17/974,677
Title: SENTIMENT AWARE VOICE USER INTERFACE
Status: Final Rejection (§102, §103, §112)
Filed: Oct 27, 2022
Examiner: SERRAGUARD, SEAN ERIN
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: Amazon Technologies, Inc.
OA Round: 6 (Final)

Outlook: Favorable
Grant Probability: 69% (99% with examiner interview)
Projected OA Rounds: 7-8
Projected Time to Grant: 3y 2m

Examiner Intelligence

Career Allowance Rate: 69% (92 granted / 134 resolved), +6.7% vs TC average — above average
Interview Lift: +33.6% — strong (allowance rate among resolved cases with an interview vs without)
Typical Timeline: 3y 2m average prosecution
Currently Pending: 43 applications
Career History: 177 total applications across all art units
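A quick way to audit the headline figures above (the career rate is computed from the numbers shown; the with/without-interview split is not reported on this page, so those inputs are left as named placeholders):

```python
# Sketch of the two headline figures. The career rate is computed from
# the numbers shown above; the interview-lift inputs are NOT reported on
# this page, so they are left as named placeholders.

granted, resolved = 92, 134
career_allow_rate = granted / resolved
print(f"career allowance rate: {career_allow_rate:.1%}")  # 68.7%, shown as 69%

def interview_lift(rate_with: float, rate_without: float) -> float:
    """Lift in percentage points: allowance rate among resolved cases
    with an examiner interview minus the rate among those without
    (inputs as fractions). The page reports +33.6 for this examiner."""
    return 100.0 * (rate_with - rate_without)
```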

Statute-Specific Performance

Statute   Examiner Rate   vs TC Avg
§101      9.4%            -30.6%
§103      49.7%           +9.7%
§102      18.6%           -21.4%
§112      19.2%           -20.8%

Baseline: Tech Center average estimate. Based on career data from 134 resolved cases.
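As a consistency check on the table, subtracting each delta from the examiner's rate should recover the Tech Center baseline each delta was measured against; a short sketch:

```python
# Consistency check on the table above: examiner_rate - delta should
# recover the Tech Center average each delta was measured against.
rows = {"§101": (9.4, -30.6), "§103": (49.7, +9.7),
        "§102": (18.6, -21.4), "§112": (19.2, -20.8)}
for statute, (rate, delta) in rows.items():
    print(f"{statute}: implied TC average = {rate - delta:.1f}%")
# Every row yields 40.0%, i.e., a single TC-average baseline.
```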

Office Action

Grounds of rejection: §102, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments

Applicant’s amendment filed on 05 November 2025 has been entered. In view of the amendment to the claims, the amendment of claims 21 and 31 has been acknowledged and entered. After entry of the amendments, claims 21, 23-25, 27-28, 30-31, 33-34 and 37-41 remain pending.

In view of the amendment to claim 21, the rejection of claims 21, 23-25, 27-30 and 41 under 35 U.S.C. §102 and 35 U.S.C. §103 is withdrawn. In view of the amendment to claim 31, the rejection of claims 31, 33-34 and 37-40 under 35 U.S.C. §102 and 35 U.S.C. §103 is maintained as to the cited art, as modified to address the amended claim language and claim interpretations. In light of the amended claims, new grounds of rejection under 35 U.S.C. §103 and 35 U.S.C. §112(a) are provided in the response below.

Response to Arguments

Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §102/103, see pages 8-11 of the Response to the Non-Final Office Action dated 07 July 2025, which was received on 05 November 2025 (hereinafter Response and Office Action, respectively), have been fully considered.

With respect to the rejection of claim 21 under 35 U.S.C. §102 as being anticipated by Wang (U.S. Pat. App. Pub. No. 2018/0314689, hereinafter Wang), applicant asserts that Wang fails to teach or suggest “determining second NLU data corresponding to the first utterance by reprocessing the first utterance to generate second NLU data as a modified interpretation of the first utterance that differs from the first NLU data,” as recited in amended claim 21. Applicant’s arguments in light of the amended claims are persuasive. Therefore, the rejection of claim 21 is withdrawn. Applicant further argues that the rejection of dependent claims 23-25, 27-28, 30, and 41 should be withdrawn for at least the same reasons as independent claim 21. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 23-25, 27-28, 30, and 41 under 35 U.S.C. §102/35 U.S.C. §103 are withdrawn.

With respect to the rejection of claim 31 under 35 U.S.C. §102 as being anticipated by Wang, applicant asserts that Wang fails to teach or suggest “based on the sentiment data, determining, using data related to past user inputs, an alternative representation of the first utterance, wherein the past user inputs are received prior to the first utterance and the second utterance, wherein the alternative representation results in a desired response,” as recited in amended claim 31. These arguments are not persuasive. Wang discloses the above-recited limitations. As indicated in the Office Action, Wang recites that based on the user reacting “negatively to the audio 2650 or text 2680 output,” the “machine learning systems can analyze [the] user’s response” and “identify whether the output was correct or appropriate,” where the identification of whether the content was correct or appropriate corresponds to the clarifying question. (Wang, ¶ [0405]). The asking of a clarifying question is the determining of an alternate representation of the first utterance.
As is well understood in the art, and as indicated by the mere existence of a confidence score, commonly available ASR and NLU systems select one or more hypotheses, from a plurality of hypotheses, for their predictions. As such, and as relied on in Wang, multiple hypotheses for both the ASR and NLU results already exist. In asking the clarifying question, Wang is trying to select between multiple hypotheses. Wang then explains that, based on the detected negative reaction from the user, the system can “decrease the confidence level” for the automatic speech recognition output, and the “confidence value (or values) can be provided to a first clarification 2764 engine, along with the text string produced by the automatic speech recognition 2712 engine... [to] examine the output from the automatic speech recognition 2712 engine and determine whether the confidence value is high enough to proceed.” The decreased confidence level corresponds to the determination that a particular hypothesis may be incorrect, as “confidence levels can be used preemptively to request clarification from the speaker with, for example, targeted questions that focus on specific parts of the automatic speech recognition 2712 and/or machine translation 2714 output.” Further, decreased confidence levels are directly reflected in the selection of the chosen hypothesis. Based in part on said confidence levels, the system is selecting the “system's textual hypothesis of natural language input” from among a plurality of hypotheses, and those hypotheses correspond to both ASR and NLU hypothetical outputs. (Wang, ¶ [0171], [0405]). As explained with reference to an example, the clarification 2764 engine can ask: “Is Hanna a name, or are you referring to a flower [hana]?” Though not expressly stated as separate hypotheses in the example, one skilled in the art would understand that two possible hypotheses are being explored in the question, where the hypotheses are “Hanna is the name of a person” and “Hana is the Japanese word for flower,” respectively. (Wang, ¶ [0405]).

The clarification systems can “request clarification from the user” regarding the first utterance {determining an alternative representation of the utterance}, and the user's negative reaction (e.g., frustration) can be determined by an interpretation component which determines the user intent, where “the interpretation 418 component” may further “be assisted by a dialog history {using data related to past user inputs}, which may assist the interpretation 418 in formulating a conclusion about a person's [emotional, mental, or cognitive state].” (Wang, ¶ [0113]-[0114], [0405]). Thus, the dialog history {data related to past user inputs} is applied in determining sentiment data, and at least the sentiment data (e.g., user frustration) is used in determining the alternate representation (through the request for clarification). Therefore, the rejection under Wang is maintained in light of the arguments provided.

Applicant further argues that the rejection of dependent claims 33-34 and 37-40 should be withdrawn for at least the same reasons as independent claim 31. Applicant’s arguments in light of the amended claims are not persuasive for at least the reasons cited above with respect to claim 31. As such, the rejections of claims 33-34 and 37-40 under 35 U.S.C. §102/35 U.S.C. §103 are maintained, as modified in light of the amended claims.
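As an editorial aside, the mechanism the Examiner attributes to Wang (confidence-scored hypotheses, a frustration-driven confidence penalty, and a clarifying question when no hypothesis is confident enough) can be sketched as follows; all names and the threshold are illustrative, not Wang's actual elements:

```python
# A minimal sketch of the mechanism the Examiner attributes to Wang:
# competing ASR/NLU hypotheses carry confidence scores; a detected
# negative reaction lowers confidence, and low confidence triggers a
# targeted clarifying question instead of committing to one hypothesis.
# All names and the threshold are illustrative, not Wang's components.
from dataclasses import dataclass

CLARIFY_THRESHOLD = 0.75  # illustrative cutoff

@dataclass
class Hypothesis:
    text: str
    confidence: float

def on_negative_reaction(hyps: list[Hypothesis], penalty: float = 0.3) -> None:
    """User frustration decreases confidence in the current best hypothesis."""
    best = max(hyps, key=lambda h: h.confidence)
    best.confidence = max(0.0, best.confidence - penalty)

def next_step(hyps: list[Hypothesis]) -> str:
    """Proceed if confident enough; otherwise ask a targeted question."""
    ranked = sorted(hyps, key=lambda h: h.confidence, reverse=True)
    if ranked[0].confidence >= CLARIFY_THRESHOLD:
        return f"proceed with: {ranked[0].text}"
    top_two = " or ".join(h.text for h in ranked[:2])
    return f"clarify: did you mean {top_two}?"

hyps = [Hypothesis("Hanna (a person's name)", 0.8),
        Hypothesis("hana (Japanese: flower)", 0.6)]
on_negative_reaction(hyps)   # frustration detected -> confidence drops
print(next_step(hyps))       # -> clarify: did you mean ... ?
```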
Upon further consideration, new grounds of rejection under 35 U.S.C. §103 are made in light of combinations of Wang, Sinha (U.S. Pat. App. Pub. No. 2014/0365226, hereinafter Sinha), and newly cited reference Carter (U.S. Pat. App. Pub. No. 2016/0188292, hereinafter Carter). The Applicant has not provided any further statement and, therefore, the Examiner directs the Applicant to the rationale below.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. §112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. §112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 21, 23-25, 27-28, 30, and 41 are rejected under 35 U.S.C. §112(a) or 35 U.S.C. §112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. §112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Regarding claim 21, the limitation “reprocessing the first utterance to generate second NLU data” is not supported by the specification as filed. Claim 21, as amended, recites “reprocessing the first utterance to generate second NLU data…” at lines 14-15. Applicant asserts that support for the amendment can be found “at least in paragraphs [0024], [0038], [0039], and [0258].” (Response, pg. 8). However, upon review of the cited paragraphs and the specification as a whole, clear support for the amendments could not be found. Paragraphs [0024], [0038], [0039], and [0258], either individually or when read together, fail to provide clear support for reprocessing of the first utterance. As understood by the examiner, paragraph [0024] is directed to processing a “repeated user input”, also referred to as a repeated request. However, in the context of claim 21, the repeated user input corresponds to the “second utterance” and not the “first utterance”. As such, reprocessing of the first utterance was not found in paragraph [0024]. Paragraphs [0038] and [0039] are directed to the repeated user input, as well. However, the repeated user input is discussed in the context of a present “user input” and the determination is “whether the user input is a repeat of a prior user input of the dialogue session” (Instant Application, ¶ [0038]). In relevant part, paragraph [0038] discloses the determination of a repeat can be “based on a comparison of the ASR data corresponding to the two user inputs, a comparison of the NLU data corresponding to the two user inputs and/or a comparison of other data corresponding to the two user inputs.” (Id.)
Respectfully, this is understood as a comparison of existing data, not a reprocessing of either the first utterance or the second utterance. As above, paragraph [0258] is directed to the repeated user input, as discussed in the context of a present “user input”, and the determination is to “determine which dialogue to output in response to the user input” (Instant Application, ¶ [0258]). Upon further review of the specification as a whole, clear support for the amendment could not be found. Of note, the word “reprocess” does not occur in the specification, and known equivalents could not be found. Further, the described embodiments do not appear to contemplate returning to the first utterance itself, as distinguished from previous ASR results (e.g., the processing of “text data corresponding to the user input” at paragraph [0286] or “alternate ASR hypothesis data 708” at paragraph [0292]) or NLU results (e.g., the use of a “second best NLU hypothesis” described in paragraph [0247]), in response to any particular stimulus, including user sentiment and/or ASR confidence scores. Therefore, the amendments provided to claim 21 fail to comply with the written description requirement and are rejected under 35 U.S.C. §112(a).

Regarding claims 23-25, 27-28, 30, and 41: these claims depend from claim 21 and incorporate all limitations therefrom. Therefore, claims 23-25, 27-28, 30, and 41 are rejected under 35 U.S.C. §112(a) for at least the same reasons as described above with relation to claim 21. Appropriate correction is required.
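The §112 distinction drawn above (comparing ASR/NLU data that already exists versus going back to the utterance itself and generating new NLU data) can be illustrated with a hedged sketch; the stubs below are hypothetical and are not drawn from the specification or from Wang:

```python
# Hedged illustration of the §112 distinction: the specification (per the
# Examiner) compares existing data for two inputs; the amended claim
# recites regenerating NLU data from the first utterance itself. run_asr
# and run_nlu are hypothetical stubs.

def run_asr(audio: bytes, alternate: bool = False) -> str:
    """Hypothetical ASR stub; `alternate` selects a different hypothesis."""
    return "play hana" if alternate else "play Hanna"

def run_nlu(text: str) -> dict:
    """Hypothetical NLU stub."""
    return {"intent": "play_music", "query": text.removeprefix("play ")}

def is_repeat(first_nlu: dict, second_nlu: dict) -> bool:
    """What the specification describes: comparing NLU data that already
    exists for the two user inputs; nothing is reprocessed."""
    return first_nlu == second_nlu

def reprocess_first_utterance(first_audio: bytes) -> dict:
    """What amended claim 21 recites (and what the Examiner finds
    unsupported): returning to the first utterance and generating new,
    different NLU data from it."""
    return run_nlu(run_asr(first_audio, alternate=True))

first_nlu = run_nlu(run_asr(b"\x00"))            # {'intent': ..., 'query': 'Hanna'}
second_nlu = reprocess_first_utterance(b"\x00")  # {'intent': ..., 'query': 'hana'}
```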
Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 31, 37, and 39-40 are rejected under 35 U.S.C. §102(a)(1) and 35 U.S.C. §102(a)(2) as being anticipated by Wang.

Regarding claim 31, Wang discloses:

A system comprising (the systems and methods for speech recognition described with reference to the virtual personal assistant; Wang, ¶ [0097]):

at least one processor (“A processor(s), implemented in an integrated circuit, may perform the necessary tasks”; Wang, ¶ [0538]); and

at least one memory comprising instructions that, when executed by the at least one processor, cause the system to (“When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a computer-readable or machine-readable medium.”; Wang, ¶ [0538]):

receive first audio data representing a first utterance (in embodiments describing a request for a prescription, the person provides the input by telling the system, “I’d like to refill a prescription.”; Wang, ¶ [0098]);

determine first natural language understanding (NLU) data corresponding to the first utterance (using the intent interpreter, the system determines information about “the speaker’s emotional or cognitive state,” such as “that the person is speaking slowly and hesitantly… [and] may conclude that the speaker maybe is not quite ready for this conversation, and needs more time.” This information is determined alongside the content of the message itself, as understood by “a natural language recognition system”; Wang, ¶ [0098], [0091]);

cause performance of a first action based on the first NLU data (relying on the NLU data (including the intent of the user and the speaker’s emotional and cognitive state) {...based on the first NLU data}, “the system determines to change its dialog approach by asking direct yes/no questions, and responds, ‘Sure, happy to help you with that. I’ll need to ask you some questions first’” {causing the performance of a first action}; Wang, ¶ [0098]);

receive second audio data representing a second utterance (as described with reference to an example of a continuing dialog in a turn-based fashion, the system asks “OK, on the bottle there should be a label <pause>. On the label there will be a medication number. Can you find it?” and receives the following utterance in response, where the person indicates “I think so, I found something here.. but.. <sigh>”, which is second audio data representing a second utterance that is received by the system; Wang, ¶ [0102]-[0103]);

determine, using a trained machine learning model, sentiment data corresponding to the second audio data (“From this reply, the system may detect audible frustration. The system may further detect, from image data, a visible grimace,” where both the audible frustration and the visible grimace are sentiment data which correspond to the person’s response in the second audio data, and where “the emotion classifier 1712 n can make a conclusion with respect to the current emotional or affective state of the person” and can be a “statistical classifier... implemented and trained to perform classification of characteristics and emotional states,” which is a trained machine learning model; Wang, ¶ [0103], [0234], [0236]), wherein the sentiment data indicates frustration (“From this information,” including the audible frustration and visible grimace {the sentiment data}, “the system may conclude that the person is probably frustrated” {indicates frustration}; Wang, ¶ [0103]);

based on the sentiment data, determining, using data related to past user inputs, an alternative representation of the first utterance (based on the user reacting “negatively to the audio 2650 or text 2680 output,” the “machine learning systems can analyze [the] user’s response” and “identify whether the output was correct or appropriate,” which corresponds to the clarifying question; based on the detected negative reaction from the user, the system can “decrease the confidence level” for the automatic speech recognition output, and the “confidence value (or values) can be provided to a first clarification 2764 engine, along with the text string produced by the automatic speech recognition 2712 engine... [to] examine the output from the automatic speech recognition 2712 engine and determine whether the confidence value is high enough to proceed,” where the system is selecting among a plurality of hypotheses (as explained with reference to an example, the clarification 2764 engine can ask: “Is Hanna a name, or are you referring to a flower?”, where the hypotheses correspond to Hanna, as the name of a person, and hana, as the Japanese word for flower, respectively). The clarification systems can “request clarification from the user” regarding the first utterance {determining an alternative representation of the utterance}, and the user’s negative reaction (e.g., frustration) can be determined by an interpretation component which determines the user intent, where “the interpretation 418 component” may further “be assisted by a dialog history {using data related to past user inputs}, which may assist the interpretation 418 in formulating a conclusion about a person’s [emotional, mental, or cognitive state].” Thus, data related to past user inputs is applied in determining sentiment data, and at least the sentiment data (e.g., user frustration) is used in determining the alternate representation (through the request for clarification); Wang, ¶ [0103], [0113]-[0115], [0400], [0404]-[0405]),

wherein the past user inputs are received prior to the first utterance and the second utterance (the dialog history, which is applied, at least in part, in determining the sentiment data, includes “a history of previous clarifications made [by the user] during the current user-system dialog and/or previous dialog sessions”; Wang, ¶ [0114], [0459]),

wherein the alternative representation results in a desired response (the system may continue with “analyz[ing] a user’s response to an audio 2650 or text 2680 output, and identify[ing] whether the output was correct or appropriate,” and where “when the user proceeds with the conversation, the machine learning system can determine that the output {the alternative representation} was appropriate {results in a desired response}”; Wang, ¶ [0400], [0404]);

determine second NLU data corresponding to the alternative representation (“the natural language processor 3020 may receive and process a response to...other system output 3042, to clarify the user’s intended meaning,” where “the user response analyzer 3026 may extract (e.g., by parsing) an answer
relating to the clarification target from the user’s response” to the system output 3042 “and modify the initial natural language dialog input 3012 {second NLU data} by replacing at least a portion of the clarification target with a machine-readable version of at least a portion of the answer,” where the modified version of the initial natural language dialog input 3012 is processed by the natural language processor 3020, resulting in the second NLU data; Wang, ¶ [0414], [0452]; FIG. 30); and

cause performance of a second action based on the second NLU data (based on the system determining that “a different approach is needed” using both verbal and non-verbal cues, the system, at step 332, “adapts by changing its questions towards more easy to remember information,” and the system then provides the response dialogue “OK, let’s try a different approach. Please tell me your home phone number instead” {causing performance of the second action}, where the adaptation may be performed based on the user response to the system output resulting in a modification to “the initial natural language dialog input”, and the second NLU data being generated by the natural language processor based on the modified initial natural language dialog input; Wang, ¶ [0103]-[0104], [0452]; FIG. 3),

wherein the second action is different from the first action (“the system can adjust not only to what the person says, but also to non-verbal cues that the system detects and determines indicate the person’s emotional state,” where in the exemplary embodiment, the specific type of information requested and the speech at step 332 changed from the information and speech at step 320; thus, the second action is different from the first action; Wang, ¶ [0099]-[0100], [0103]-[0104]).
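A minimal sketch of this claim-31 mapping as the Examiner reads Wang FIG. 30 (the user's answer to a clarifying question replaces the low-confidence span in the original input, which is then re-run through NLU); function and variable names are illustrative stand-ins, not Wang's numbered elements:

```python
# Sketch of the Examiner's reading of Wang FIG. 30: the user's answer to
# a clarifying question replaces the low-confidence span ("clarification
# target") in the original input, and the modified input is run back
# through NLU to yield the second NLU data. Names are illustrative.

def parse_answer(user_response: str, options: dict[str, str]) -> str:
    """Extract the clarified value from the user's reply (toy matcher)."""
    for keyword, value in options.items():
        if keyword in user_response.lower():
            return value
    return user_response  # fall back to the raw reply

def run_nlu(text: str) -> dict:
    """Hypothetical NLU stand-in returning an intent/slot interpretation."""
    return {"intent": "lookup", "query": text}

initial_input = "tell me about Hanna"
clarification_target = "Hanna"          # low-confidence span
reply = "I meant the flower"
answer = parse_answer(reply, {"flower": "hana (flower)",
                              "name": "Hanna (person)"})

modified_input = initial_input.replace(clarification_target, answer)
second_nlu_data = run_nlu(modified_input)  # differs from the first NLU data
print(second_nlu_data)  # {'intent': 'lookup', 'query': 'tell me about hana (flower)'}
```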
Regarding claim 37, the rejection of claim 31 is incorporated. Wang discloses all of the elements of the current invention as stated above. Wang further discloses wherein the second action is based at least in part on user preference data (“The interpretation 418 component may also be aided by preference models.”; Wang, ¶ [0113]).

Regarding claim 39, the rejection of claim 31 is incorporated. Wang discloses all of the elements of the current invention as stated above. Wang further discloses wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to:

determine, using automatic speech recognition (ASR) processing, an ASR confidence score corresponding to the first audio data (the “virtual personal assistant platform 410” uses the “automatic speech recognition 412 component,” which “can identify natural language in audio input,” thus identifying text of the audio input to be used by the interpreter to determine the speaker’s emotional state, and “can also provide a confidence value with the text string,” where the text string is a hypothesis from the ASR engine representing the first utterance and “indicat[ing] an approximate accuracy of the text string.”; Wang, ¶ [0402], [0109]-[0110]);

receive alternative representation data corresponding to the first utterance (the interpretation component and the reasoning component “may be assisted by a dialog history” such as a dynamic ontology {receiving alternative representation data...}, “which may assist the interpretation 418 in formulating a conclusion about a person’s input state {...corresponding to the first utterance}.”; Wang, ¶ [0113]-[0114]); and

determine the second action based at least in part on the sentiment data, the ASR confidence score, and the alternative representation data (the system can use a determination of “the [ASR] result with the highest confidence value,” the dialog history, and the user sentiment (e.g., frustration) as part of the multi-modal cues to determine the second action; Wang, ¶ [0404], [0103], [0097]).

Regarding claim 40, the rejection of claim 31 is incorporated. Wang discloses all of the elements of the current invention as stated above. Wang further discloses wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to:

determine an NLU confidence score associated with the first NLU data (the natural language understanding system “can parse, and semantically analyze, and interpret the verbal content of natural language dialog inputs that have been processed by the automatic speech recognition system” and may include a “hybrid parser” to “arbitrate between the outputs of the rule-based parser and the statistical parser” based on “confidence value.”; Wang, ¶ [0172]);

receive alternative representation data corresponding to the first utterance (the interpretation component and the reasoning component “may be assisted by a dialog history” such as a dynamic ontology {receiving alternative representation data...}, “which may assist the interpretation 418 in formulating a conclusion about a person’s input state {...corresponding to the first utterance}.”; Wang, ¶ [0113]-[0114]); and

determine the second action based at least in part on the sentiment data, the NLU confidence score, and the alternative representation data (the system can use a determination of “which of the outputs has the better [NLU] confidence value,” the dialog history, and the sentiment data as part of the multi-modal cues to determine the second action; Wang, ¶ [0172], [0103], [0097]).
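Claims 39 and 40 recite the same three-signal decision, keyed to an ASR and an NLU confidence score respectively; a compact sketch of that decision (threshold and names are hypothetical, not from Wang or the claims):

```python
# Sketch of the decision recited in claims 39/40: choose the second
# action from sentiment data, a confidence score (ASR in claim 39, NLU
# in claim 40), and alternative-representation data. Threshold and
# strings are hypothetical, not from Wang or the claims.
from typing import Optional

def choose_second_action(sentiment: str,
                         confidence: float,
                         alternative: Optional[str],
                         low_confidence: float = 0.5) -> str:
    if sentiment == "frustration" and alternative is not None:
        if confidence < low_confidence:
            return f"act on alternative: {alternative}"  # low confidence: switch
        return f"confirm alternative: {alternative}?"    # confident: verify first
    return "repeat first action"

print(choose_second_action("frustration", 0.3, "refill by phone number"))
# -> act on alternative: refill by phone number
```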
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. §102 and 103 (or as subject to pre-AIA 35 U.S.C. §102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. §103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21, 25, 27, 30 and 41 are rejected under 35 U.S.C. §103 as being unpatentable over Wang in view of Carter.

Regarding claim 21, Wang discloses:

A computer-implemented method comprising (the systems and methods for speech recognition described with reference to the virtual personal assistant; Wang, ¶ [0097]):

receiving first audio data representing a first utterance (in embodiments describing a request for a prescription, the person provides the input by telling the system, “I’d like to refill a prescription.”; Wang, ¶ [0098]);

determining an automatic speech recognition (ASR) confidence score corresponding to an ASR hypothesis representing the first utterance (the “virtual personal assistant platform 410” uses the “automatic speech recognition 412 component,” which “can identify natural language in audio input,” thus identifying text of the audio input to be used by the interpreter to determine the speaker’s emotional state, and “can also provide a confidence value {determining an ASR confidence score} with the text string,” where the text string is a hypothesis from the ASR engine representing the first utterance {representing the first utterance} and “indicat[ing] an approximate accuracy of the text string {corresponding to an ASR hypothesis}.”; Wang, ¶ [0402], [0109]-[0110]);

determining first natural language understanding (NLU) data corresponding to the first utterance (using the intent interpreter, the system determines information about “the speaker’s emotional or cognitive state,” such as “that the person is speaking slowly and hesitantly… [and] may conclude that the speaker maybe is not quite ready for this conversation, and needs more time.” This information is determined alongside the content of the message itself, as understood by “a natural language recognition system”; Wang, ¶ [0098], [0091]);

causing performance of a first action based on the first NLU data (relying on the NLU data (including the intent of the user and the speaker’s emotional and cognitive state) {...based on the first NLU data}, “the system determines to change its dialog approach by asking direct yes/no questions, and responds, ‘Sure, happy to help you with that. I’ll need to ask you some questions first’” {causing the performance of a first action}; Wang, ¶ [0098]);

receiving second audio data representing a second utterance (as described with reference to an example of a continuing dialog in a turn-based fashion, the system asks “OK, on the bottle there should be a label <pause>. On the label there will be a medication number. Can you find it?” and receives the following utterance in response, where the person indicates “I think so, I found something here.. but.. <sigh>”, which is second audio data representing a second utterance that is received by the system; Wang, ¶ [0102]-[0103]);

determining, using a trained machine learning model, sentiment data corresponding to the second audio data (“From this reply, the system may detect audible frustration. The system may further detect, from image data, a visible grimace,” where both the audible frustration and the visible grimace are sentiment data which correspond to the person’s response in the second audio data, and where “the emotion classifier 1712 n can make a conclusion with respect to the current emotional or affective state of the person” and can be a “statistical classifier... implemented and trained to perform classification of characteristics and emotional states,” which is a trained machine learning model; Wang, ¶ [0103], [0234], [0236]), wherein the sentiment data indicates frustration (“From this information,” including the audible frustration and visible grimace {the sentiment data}, “the system may conclude that the person is probably frustrated” {indicates frustration}; Wang, ¶ [0103]);

based on the sentiment data and the ASR confidence score, determining second NLU data corresponding to the first utterance (based on the system determining “that the person is probably frustrated {based on the sentiment data},” as well as based on the context of the dialogue, which itself is based on the “[speech recognition] result with the highest confidence value” {based on the ASR confidence score}, the system concludes “that perhaps a different approach is needed.” The conclusion that “a different approach is needed,” as part of a continuing dialogue for filling a prescription between the device and user, includes a determination that (1) a first response occurred {first utterance} and a first approach was taken {first NLU data}, and (2) the “audible frustration and the visible grimace” correspond to frustration and, based on that, “a different approach is needed.” In determining the different approach, “the natural language processor 3020 may receive and process a response to...[the] other system output 3042, to clarify the user’s intended meaning,” where “the user response analyzer 3026 may extract (e.g., by parsing) an answer relating to the clarification target from the user’s response” to the system output 3042 “and modify the initial natural language dialog input 3012 by replacing at least a portion of the clarification target with a machine-readable version of at least a portion of the answer,” where clarification targets can be determined based on “using the assigned attributes such as the confidence levels, prosodic features (i.e., the rhythm, stress, and intonation of speech), and/or syntactic features associated with each word and the surrounding words of the dialog input 3012,” and where the modified version of the initial natural language dialog input 3012 is processed by the natural language processor 3020, resulting in the second NLU data; Wang, ¶ [0103], [0404], [0414], [0450], [0452]; FIG. 30); and

in response to the second utterance, causing performance of a second action based on the second NLU data (based on the system determining that “a different approach is needed” using both verbal and non-verbal cues, the system, at step 332, “adapts by changing its questions towards more easy to remember information.” The system then provides the response dialogue “OK, let’s try a different approach. Please tell me your home phone number instead” {causing performance of the second action}, where the adaptation may be performed based on the user response to the system output resulting in a modification to “the initial natural language dialog input”, and the second NLU data being generated by the natural language processor based on the modified initial natural language dialog input; Wang, ¶ [0103]-[0104], [0452]; FIG. 3),

wherein the second action is different from the first action (“the system can adjust not only to what the person says, but also to non-verbal cues that the system detects and determines indicate the person’s emotional state,” where in the exemplary embodiment, the specific type of information requested and the speech at step 332 changed from the information and speech at step 320; thus, the second action is different from the first action; Wang, ¶ [0099]-[0100], [0103]-[0104]).

However, Wang fails to expressly recite reprocessing the first utterance to generate second NLU data as a modified interpretation of the first utterance that differs from the first NLU data.

Carter teaches “systems and methods of interpreting natural language inputs based on storage of the inputs.” (Carter, ¶ [0002]). Regarding claim 21, Carter teaches determining second NLU data corresponding to the first utterance by reprocessing the first utterance to generate second NLU data as a modified interpretation of the first utterance that differs from the first NLU data (in response to “(ii) input provided by a user (e.g., who spoke the utterance) after the initial interpretation process is already underway or completed,” which, in the context of Wang, is the expression of frustration, and using the stored user input, the system “may reprocess the one or more user inputs received from a user to determine one or more reinterpretations of the user inputs,” where the system “may obtain the stored user input data to reprocess the one or more user inputs” and “may reprocess the original user input provided by the user” to generate a “reinterpretation of a user input” using “one or more natural language processing engines (e.g., natural language processing engine(s) 230 of FIG. 2), or other components for processing user inputs to determine user requests related to the user inputs.”; Carter, ¶ [0047]-[0048], [0054]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the virtual personal assistant and speech translation systems of Wang to incorporate the teachings of Carter to include reprocessing the first utterance to generate second NLU data as a modified interpretation of the first utterance that differs from the first NLU data. Wang discloses responding to user frustration as part of a continuing dialog, but fails to expressly recite the reprocessing of a prior speech input. As such, errors which occur during the original ASR processing in Wang may be propagated throughout the dialog chain. Carter recognizes that “a subsequent interpretation of the user input generated from the intermediate results of the initial processing may include inaccuracies of the initial interpretation that was derived from the intermediate results,” and addresses this problem by storing the user inputs such that they “may be reprocessed to determine one or more reinterpretations of the user inputs,” which helps correct for “inaccurate or inadequate” initial interpretations and provides the benefit of more accurate speech recognition in assistant systems, as recognized by Carter. (Carter, ¶ [0003], [0005]-[0007]).
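A hedged sketch of the Wang-plus-Carter combination as articulated above (the raw input is stored, and a later frustration signal triggers reprocessing of the original utterance into a different interpretation); the store and engine stubs are hypothetical:

```python
# Sketch of the reinterpretation flow the Examiner draws from Carter:
# the original user input is stored, and on a later trigger (here, a
# frustration signal per Wang) it is reprocessed to produce a second,
# different interpretation. Stores and engines are hypothetical stubs.

stored_inputs: dict[str, bytes] = {}

def run_asr(audio: bytes, nbest_index: int = 0) -> str:
    """Hypothetical ASR stub: index selects among n-best transcripts."""
    nbest = ["play Hanna", "play hana"]
    return nbest[min(nbest_index, len(nbest) - 1)]

def run_nlu(text: str) -> dict:
    """Hypothetical NLU stub."""
    return {"intent": "play_music", "query": text.removeprefix("play ")}

def handle_utterance(utt_id: str, audio: bytes) -> dict:
    stored_inputs[utt_id] = audio            # store the raw input (Carter)
    return run_nlu(run_asr(audio))           # first NLU data

def reprocess(utt_id: str) -> dict:
    """Triggered, e.g., by detected frustration (Wang): re-run the stored
    original input to get a modified interpretation (Carter)."""
    audio = stored_inputs[utt_id]
    return run_nlu(run_asr(audio, nbest_index=1))  # second NLU data

first = handle_utterance("utt-1", b"\x00")   # {'intent': ..., 'query': 'Hanna'}
second = reprocess("utt-1")                  # {'intent': ..., 'query': 'hana'}
assert second != first
```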
Regarding claim 25, the rejection of claim 21 is incorporated. Wang and Carter disclose all of the elements of the current invention as stated above. Wang further discloses further comprising:

in response to determining the sentiment data indicates frustration, determining first data corresponding to an alternative representation of the first utterance (“the machine learning systems can analyze a user’s response to an audio 2650 or text 2680 output, and identify whether the output was correct or appropriate,” which corresponds to the clarifying question, and “when the user reacts negatively to the audio 2650 or text 2680 output, the machine learning system can decrease the confidence level” for the automatic speech recognition output, and the “confidence value (or values) can be provided to a first clarification 2764 engine, along with the text string produced by the automatic speech recognition 2712 engine... [to] examine the output from the automatic speech recognition 2712 engine and determine whether the confidence value is high enough to proceed,” where the clarification systems can “request clarification from the user” regarding the first utterance {determining an alternative representation of the utterance}; Wang, ¶ [0103], [0400], [0404]-[0405]);

determining the second NLU data corresponding to the first data (the system may continue with “analyz[ing] a user’s response to an audio 2650 or text 2680 output, and identify[ing] whether the output was correct or appropriate,” and where “when the user proceeds with the conversation, the machine learning system can determine that the output {the alternative representation} was appropriate {results in a desired response}”; Wang, ¶ [0400], [0404]); and

determining the second action based at least in part on the second NLU data (the clarification engines can provide the clarified text {alternative representation} “to the natural language processing 2718 engine... to produce an input intent 2740” from the clarified user input; Wang, ¶ [0414]).

Regarding claim 27, the rejection of claim 21 is incorporated. Wang and Carter disclose all of the elements of the current invention as stated above. Wang further discloses wherein the second action is based at least in part on user preference data (“The interpretation 418 component may also be aided by preference models.”; Wang, ¶ [0113]).

Regarding claim 30, the rejection of claim 21 is incorporated. Wang and Carter disclose all of the elements of the current invention as stated above.
Wang further discloses further comprising:

determining an NLU confidence score associated with the first NLU data (the natural language understanding system “can parse, and semantically analyze, and interpret the verbal content of natural language dialog inputs that have been processed by the automatic speech recognition system” and may include a “hybrid parser” to “arbitrate between the outputs of the rule-based parser and the statistical parser” based on “confidence value.”; Wang, ¶ [0172]);

receiving alternative representation data corresponding to the first utterance (the interpretation component and the reasoning component “may be assisted by a dialog history” such as a dynamic ontology {receiving alternative representation data...}, “which may assist the interpretation 418 in formulating a conclusion about a person’s input state {...corresponding to the first utterance}.”; Wang, ¶ [0113]-[0114]); and

determining the second action based at least in part on the sentiment data, the NLU confidence score, and the alternative representation data (the system can use a determination of “which of the outputs has the better [NLU] confidence value,” the dialog history, and the sentiment data as part of the multi-modal cues to determine the second action; Wang, ¶ [0172], [0103], [0097]).

Regarding claim 41, the rejection of claim 21 is incorporated. Wang and Carter disclose all of the elements of the current invention as stated above. However, Wang fails to expressly recite determining second NLU data corresponding to the first utterance by reprocessing the first utterance to generate second NLU data as a modified interpretation of the first utterance that differs from the first NLU data. The relevance of Carter is described above with relation to claim 21.

Regarding claim 41, Carter teaches determining second NLU data corresponding to the first utterance by reprocessing the first utterance to generate second NLU data as a modified interpretation of the first utterance that differs from the first NLU data (in response to “(ii) input provided by a user (e.g., who spoke the utterance) after the initial interpretation process is already underway or completed,” which, in the context of Wang, is the expression of frustration, and using the stored user input, the system “may reprocess the one or more user inputs received from a user to determine one or more reinterpretations of the user inputs,” where the system “may obtain the stored user input data to reprocess the one or more user inputs” and “may reprocess the original user input provided by the user” to generate a “reinterpretation of a user input” using “one or more natural language processing engines (e.g., natural language processing engine(s) 230 of FIG. 2), or other components for processing user inputs to determine user requests related to the user inputs.”; Carter, ¶ [0047]-[0048], [0054]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the virtual personal assistant and speech translation systems of Wang to incorporate the teachings of Carter to include determining second NLU data corresponding to the first utterance by reprocessing the first utterance to generate second NLU data as a modified interpretation of the first utterance that differs from the first NLU data. Wang discloses responding to user frustration as part of a continuing dialog, but fails to expressly recite the reprocessing of a prior speech input. As such, errors which occur during the original ASR processing in Wang may be propagated throughout the dialog chain. Carter recognizes that “a subsequent interpretation of the user input generated from the intermediate results of the initial processing may include inaccuracies of the initial interpretation that was derived from the intermediate results,” and addresses this problem by storing the user inputs such that they “may be reprocessed to determine one or more reinterpretations of the user inputs,” which helps correct for “inaccurate or inadequate” initial interpretations and provides the benefit of more accurate speech recognition in assistant systems, as recognized by Carter. (Carter, ¶ [0003], [0005]-[0007]).

Claims 23-24 and 28 are rejected under 35 U.S.C. §103 as being unpatentable over Wang and Carter, as applied to claim 21 above, and further in view of Sinha.

Regarding claim 23, the rejection of claim 21 is incorporated. Wang and Carter disclose all of the elements of the current invention as stated above. However, Wang fails to expressly recite further comprising: determining the second utterance corresponds to the first utterance based at least in part on the second utterance being semantically similar to the first utterance, wherein determining the second action is based further in part on the second utterance being semantically similar to the first utterance.

Sinha teaches “systems and methods for detecting errors in speech interactions with a digital assistant.” (Sinha, ¶ [0002]). Regarding claim 23, Sinha teaches further comprising: determining the second utterance corresponds to the first utterance based at least in part on the second utterance being semantically similar to the first utterance (“determining whether the user interaction is indicative of a problem comprises determining that the second speech input and the third speech input indicate dissatisfaction with the at least one action” by “determining that the second speech input and the third speech input” include “substantially the same words as the first speech input.” The second utterance being substantially the same as the first utterance, and used in the same context, necessarily means that the second utterance is semantically similar to the first utterance.; Sinha, ¶ [0127]), wherein causing performance of the second action is based further in part on the second utterance being semantically similar to the first utterance (“upon determining that the user interaction is indicative of a problem,” where the problem is determined based on the second speech input being semantically similar to the first speech input, “the digital assistant provides a first prompt {second output data} requesting the user to confirm whether there was a problem in the performing of the at least one action.”; Sinha, ¶ [0135]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the virtual personal assistant and speech translation systems of Wang, as modified by the reinterpretation systems of Carter, to incorporate the teachings of Sinha to include further comprising: determining the second utterance corresponds to the first utterance based at least in part on the second utterance being semantically similar to the first utterance, wherein determining the second action is based further in part on the second utterance being semantically similar to the first utterance.
The systems and methods described in Sinha can “identify particular instances where errors have occurred, so that the source of the errors can be identified and addressed,” which is helpful for “improv[ing] the quality of digital assistants,” as recognized by Sinha. (Sinha, ¶ [0005]).

Regarding claim 24, the rejection of claim 21 is incorporated. Wang and Carter disclose all of the elements of the current invention as stated above. However, Wang fails to expressly recite further comprising: determining the second utterance corresponds to the first utterance based at least in part on the second utterance sounding similar to the first utterance, wherein determining the output data is based further in part on the second utterance being semantically similar to the first utterance. The relevance of Sinha is described above with relation to claim 23.

Regarding claim 24, Sinha teaches further comprising: determining the second utterance corresponds to the first utterance based at least in part on the second utterance sounding similar to the first utterance (“determining whether the user interaction is indicative of a problem comprises determining that the second speech input and the third speech input indicate dissatisfaction with the at least one action” by “determining that the second speech input and the third speech input” include “substantially the same words as the first speech input {the second utterance sounding similar to the first utterance}”; Sinha, ¶ [0127]), wherein determining the output data is based further in part on the second utterance sounding similar to the first utterance (“upon determining that the user interaction is indicative of a problem,” where the problem is determined based on the second speech input sounding similar to the first speech input, “the digital assistant provides a first prompt {second output data} requesting the user to confirm whether there was a problem in the performing of the at least one action.”; Sinha, ¶ [0135]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the virtual personal assistant and speech translation systems of Wang, as modified by the reinterpretation systems of Carter, to incorporate the teachings of Sinha to include further comprising: determining the second utterance corresponds to the first utterance based at least in part on the second utterance sounding similar to the first utterance, wherein determining the output data is based further in part on the second utterance being semantically similar to the first utterance. The systems and methods described in Sinha can “identify particular instances where errors have occurred, so that the source of the errors can be identified and addressed,” which is helpful for “improv[ing] the quality of digital assistants,” as recognized by Sinha. (Sinha, ¶ [0005]).
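Claims 23 and 24 differ only in the similarity signal (semantic versus acoustic); a toy sketch of the "substantially the same words" style of test, with a word-overlap score and a crude phonetic key standing in for real semantic and acoustic matchers:

```python
# Toy sketch of the repeat tests in claims 23/24: claim 23 keys on
# semantic similarity, claim 24 on the utterances sounding alike. The
# word-overlap score and crude phonetic key below are illustrative
# stand-ins for real semantic/phonetic matchers.

def semantically_similar(a: str, b: str, threshold: float = 0.6) -> bool:
    """Word-overlap (Jaccard) as a stand-in for semantic similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) >= threshold

def sounds_similar(a: str, b: str) -> bool:
    """Crude phonetic key (strip vowels) as a stand-in for acoustic match."""
    key = lambda s: "".join(c for c in s.lower() if c.isalpha() and c not in "aeiou")
    return key(a) == key(b)

first = "turn on the kitchen light"
second = "turn on the kitchen lights"
print(semantically_similar(first, second))  # True -> treat as a repeat
print(sounds_similar("gray", "grey"))       # True -> same phonetic key
```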
Regarding claim 28, the rejection of claim 21 is incorporated. Wang and Carter disclose all of the elements of the current invention as stated above. Wang further discloses further comprising:

associating the first audio data with a dialog session identifier (though not expressly indicated as having an identifier, the example in FIG. 3 is a dialogue session, and all dialogue represented in FIG. 3 is understood by the system to be part of the same dialogue session, as indicated by the description of each of the interactions as “dialog sessions” in changing the “dialog approach.” Further, Wang provides that “the virtual personal assistant 2500 may optionally include a speaker identification (ID) subsystem 2502.” Thus, the first audio data, represented at element 310, is associated with a dialogue session identifier.; Wang, ¶ [0098], [0366]);

receiving third audio data representing a third utterance (“At step 326, the person says, ‘Yes, it’s here somewhere, let me.. here it is.’”; Wang, ¶ [0102]);

associating the third audio data with the dialog session identifier (the system, as part of a continuing dialog about ordering a prescription, determines that the person is uncertain about the next steps in the ordering process, where the continuation in the process is responsive to changes in the dialog. Therefore, the system associates the third audio data (which the system understands as indicating uncertainty in the process) with the dialog session, and thus the dialog session identifier. Further evidence can be found in FIG. 3, which displays a continuing dialog between the system and the user.; Wang, ¶ [0102]);

receiving first data representing dialog history data corresponding to the dialog session identifier (the interpretation component “may be assisted by a dialog history, which may assist... in formulating a conclusion about a person’s input state,” where the dialog history can include a dynamic ontology which “grows or shrinks based on input received through the course of a conversation” and “can be used by a virtual personal assistant to track relationships between things said during a conversation” as associated with the “speaker identification (ID) subsystem 2502”; Wang, ¶ [0114], [0068], [0366]).

However, Wang fails to expressly recite determining, using the first data, that the third utterance is a repeat of the first utterance; and determining, based in part on the third utterance being a repeat of the first utterance, second output data representing the second action. The relevance of Sinha is described above with relation to claim 23.

Regarding claim 28, Sinha teaches determining, using the first data, that the third utterance is a repeat of the first utterance (“determining whether the user interaction is indicative of a problem comprises determining that the second speech input and the third speech input indicate dissatisfaction with the at least one action” by “determining that the second speech input and the third speech input” include “substantially the same words as the first speech input,” where comparison between the third input {third utterance} and the first input {first utterance} implicitly discloses the system using the contents of the first input or information derived therefrom {data representing dialog history data} for the comparison.; Sinha, ¶ [0127]); and determining, based in part on the third utterance being a repeat of the first utterance, second output data representing the second action (“upon determining that the user interaction is indicative of a problem… the digital assistant provides a first prompt requesting the user to confirm whether there was a problem in the performing of the at least one action {second output data representing the second action}.”; Sinha, ¶ [0135]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the virtual personal assistant and speech translation systems of Wang, as modified by the reinterpretation systems of Carter, to incorporate the teachings of Sinha to include determining, using the first data, that the third utterance is a repeat of the first utterance; and determining, based in part on the third utterance being a repeat of the first utterance, second output data representing the second action. The systems and methods described in Sinha can “identify particular instances where errors have occurred, so that the source of the errors can be identified and addressed,” which is helpful for “improv[ing] the quality of digital assistants,” as recognized by Sinha. (Sinha, ¶ [0005]).
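Claims 28 and 38 recite the same session-scoped repeat detection in method and system form; a compact sketch of the bookkeeping (session store and identifiers are illustrative):

```python
# Sketch of the dialog-session bookkeeping in claims 28/38: audio inputs
# are tagged with a session identifier, and the session history is used
# to flag a later utterance as a repeat of the first. Illustrative only.
from collections import defaultdict

sessions: dict[str, list[str]] = defaultdict(list)

def add_utterance(session_id: str, transcript: str) -> None:
    sessions[session_id].append(transcript)   # associate with the session ID

def is_repeat_of_first(session_id: str, transcript: str) -> bool:
    history = sessions[session_id]            # dialog history data
    return bool(history) and transcript.lower() == history[0].lower()

add_utterance("dlg-42", "refill my prescription")   # first utterance
add_utterance("dlg-42", "I found something here")   # second utterance
third = "Refill my prescription"
if is_repeat_of_first("dlg-42", third):
    print("repeat detected -> choose the second action")
```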
Claims 33-34 and 38 are rejected under 35 U.S.C. §103 as being unpatentable over Wang, as applied to claim 31 above, and further in view of Sinha.

Regarding claim 33, the rejection of claim 31 is incorporated. Wang discloses all of the elements of the current invention as stated above. However, Wang fails to expressly recite wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine the second utterance corresponds to the first utterance based at least in part on the second utterance being semantically similar to the first utterance, wherein causing performance of the second action is based further in part on the second utterance being semantically similar to the first utterance. The relevance of Sinha is described above with relation to claim 23.

Regarding claim 33, Sinha teaches wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine the second utterance corresponds to the first utterance based at least in part on the second utterance being semantically similar to the first utterance (“determining whether the user interaction is indicative of a problem comprises determining that the second speech input and the third speech input indicate dissatisfaction with the at least one action” by “determining that the second speech input and the third speech input” include “substantially the same words as the first speech input.” The second utterance being substantially the same as the first utterance, and used in the same context, necessarily means that the second utterance is semantically similar to the first utterance.; Sinha, ¶ [0127]), wherein causing performance of the second action is based further in part on the second utterance being semantically similar to the first utterance (“upon determining that the user interaction is indicative of a problem,” where the problem is determined based on the second speech input being semantically similar to the first speech input, “the digital assistant provides a first prompt {second output data} requesting the user to confirm whether there was a problem in the performing of the at least one action.”; Sinha, ¶ [0135]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the virtual personal assistant and speech translation systems of Wang to incorporate the teachings of Sinha to include wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine the second utterance corresponds to the first utterance based at least in part on the second utterance being semantically similar to the first utterance, wherein causing performance of the second action is based further in part on the second utterance being semantically similar to the first utterance. The systems and methods described in Sinha can “identify particular instances where errors have occurred, so that the source of the errors can be identified and addressed,” which is helpful for “improv[ing] the quality of digital assistants,” as recognized by Sinha. (Sinha, ¶ [0005]).

Regarding claim 34, the rejection of claim 31 is incorporated. Wang discloses all of the elements of the current invention as stated above. However, Wang fails to expressly recite wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine the second utterance corresponds to the first utterance based at least in part on the second utterance sounding similar to the first utterance, wherein causing performance of the second action is based further in part on the second utterance sounding similar to the first utterance. The relevance of Sinha is described above with relation to claim 23.

Regarding claim 34, Sinha teaches wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine the second utterance corresponds to the first utterance based at least in part on the second utterance sounding similar to the first utterance (“determining whether the user interaction is indicative of a problem comprises determining that the second speech input and the third speech input indicate dissatisfaction with the at least one action” by “determining that the second speech input and the third speech input” include “substantially the same words as the first speech input {the second utterance sounding similar to the first utterance}”; Sinha, ¶ [0127]), wherein causing performance of the second action is based further in part on the second utterance sounding similar to the first utterance (“upon determining that the user interaction is indicative of a problem,” where the problem is determined based on the second speech input sounding similar to the first speech input, “the digital assistant provides a first prompt {second output data} requesting the user to confirm whether there was a problem in the performing of the at least one action.”; Sinha, ¶ [0135]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the virtual personal assistant and speech translation systems of Wang to incorporate the teachings of Sinha to include wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine the second utterance corresponds to the first utterance based at least in part on the second utterance sounding similar to the first utterance, wherein causing performance of the second action is based further in part on the second utterance sounding similar to the first utterance. The systems and methods described in Sinha can “identify particular instances where errors have occurred, so that the source of the errors can be identified and addressed,” which is helpful for “improv[ing] the quality of digital assistants,” as recognized by Sinha. (Sinha, ¶ [0005]).

Regarding claim 38, the rejection of claim 31 is incorporated. Wang discloses all of the elements of the current invention as stated above. Wang further discloses wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to:

associate the first audio data with a dialog session identifier (Though not expressly indicated as having an identifier, the example in FIG. 3 is a dialog session, and all dialog represented in FIG. 3 is understood by the system to be part of the same dialog session (as indicated by the description of each of the interactions as “dialog sessions” in changing the “dialog approach”). Further, the system includes “the virtual personal assistant 2500 may optionally include a speaker identification (ID) subsystem 2502.” Thus the first audio, represented at element 310, is associated with a dialog session identifier; Wang, ¶ [0098], [0366]);

receive third audio data representing a third utterance (“At step 326, the person says, ‘Yes, it’s here somewhere, let me.. here it is.’”; Wang, ¶ [0102]);

associate the third audio data with the dialog session identifier (The system, as part of a continuing dialog about ordering a prescription, determines that the person is uncertain about the next steps in the ordering process, where the continuation in the process is responsive to changes in the dialog. Therefore, the system associates the third audio data (which the system understands as indicating uncertainty in the process) with the dialog session, and thus the dialog session identifier. Further evidence can be found in FIG. 3, which displays a continuing dialog between the system and the user; Wang, ¶ [0102]);

receive first data representing dialog history data corresponding to the dialog session identifier (the interpretation component “may be assisted by a dialog history, which may assist... in formulating a conclusion about a person’s input state,” where dialog history can include a dynamic ontology which “grows or shrinks based on input received through the course of a conversation” and “can be used by a virtual personal assistant to track relationships between things said during a conversation,” as associated with the “speaker identification (ID) subsystem 2502”; Wang, ¶ [0114], [0068], [0366]).
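The session-identifier and dialog-history limitations amount to tagging every turn of one conversation with a shared identifier and keeping the accumulated turns retrievable. A minimal, hypothetical sketch of that bookkeeping follows; the class and field names are assumptions, not Wang's actual design.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical sketch of dialog-session bookkeeping: every turn in one
# conversation shares a session identifier, and the accumulated turns form
# the dialog history ("first data") available for later comparison.

@dataclass
class DialogSession:
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    turns: list[str] = field(default_factory=list)  # transcribed utterances

    def add_turn(self, transcript: str) -> None:
        """Associate a new utterance with this session's identifier."""
        self.turns.append(transcript)

    def history(self) -> list[str]:
        """Return the dialog history for downstream comparison."""
        return list(self.turns)

session = DialogSession()
session.add_turn("order my prescription")      # first utterance
session.add_turn("yes, it's here somewhere")   # later turn, same session
print(session.session_id, session.history())
```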
However, Wang fails to expressly recite determine, using the first data, that the third utterance is a repeat of the first utterance; and determine, based in part on the third utterance being a repeat of the first utterance, output data representing the second action. The relevance of Sinha is described above with relation to claim 23.

Regarding claim 38, Sinha teaches determine, using the first data, that the third utterance is a repeat of the first utterance (“determining whether the user interaction is indicative of a problem comprises determining that the second speech input and the third speech input indicate dissatisfaction with the at least one action” by “determining that the second speech input and the third speech input” include “substantially the same words as the first speech input,” where comparison between the third input {third utterance} and the first input {first utterance} implicitly discloses the system using the contents of the first input or information derived therefrom {data representing dialog history data} for the comparison; Sinha, ¶ [0127]); and determine, based in part on the third utterance being a repeat of the first utterance, output data representing the second action (“upon determining that the user interaction is indicative of a problem… the digital assistant provides a first prompt requesting the user to confirm whether there was a problem in the performing of the at least one action {second output data representing the second action}.”; Sinha, ¶ [0135]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the virtual personal assistant and speech translation systems of Wang to incorporate the teachings of Sinha to include determine, using the first data, that the third utterance is a repeat of the first utterance; and determine, based in part on the third utterance being a repeat of the first utterance, output data representing the second action. The systems and methods described in Sinha can “identify particular instances where errors have occurred, so that the source of the errors can be identified and addressed,” which is helpful for “improv[ing] the quality of digital assistants,” as recognized by Sinha. (Sinha, ¶ [0005]).
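Taken together, the claim-38 mapping describes a simple flow: compare the new turn against the session's dialog history, and if it repeats an earlier utterance, emit different output (a confirmation prompt) instead of re-running the same action. Below is a self-contained, hypothetical sketch of that flow; all names, the 0.7 threshold, and the prompt text are assumptions rather than anything from the references.

```python
# Hypothetical sketch: use dialog history ("first data") to detect that the
# third utterance repeats the first, then produce second output data (here a
# confirmation prompt) instead of repeating the presumably wrong action.

def word_overlap(a: str, b: str) -> float:
    """Share of distinct words common to both utterances."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa or wb) else 1.0

def is_repeat(history: list[str], utterance: str, threshold: float = 0.7) -> bool:
    """True if any prior turn substantially matches the new utterance."""
    return any(word_overlap(prior, utterance) >= threshold for prior in history)

def respond(history: list[str], utterance: str) -> str:
    if is_repeat(history, utterance):
        # Second action, per the mapped Sinha prompt: confirm the problem
        # rather than re-run the same (apparently unsatisfactory) response.
        reply = "Was there a problem with my last response?"
    else:
        reply = f"Performing action for: {utterance}"
    history.append(utterance)
    return reply

history: list[str] = []
print(respond(history, "order my prescription"))  # first turn: performs action
print(respond(history, "order my prescription"))  # repeat: confirmation prompt
```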
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Ying Wang et al. (U.S. Pat. App. Pub. No. 2018/0357286) discloses systems and methods for determining various emotional states using one or more signals provided by and/or obtained from a user, and then using the determined emotional states to provide answers to user queries that are contextually and emotionally relevant.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard, whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Sean E Serraguard/
Patent Examiner, Art Unit 2657

Prosecution Timeline

Oct 27, 2022: Application Filed
Jul 31, 2023: Non-Final Rejection — §102, §103, §112
Aug 16, 2023: Interview Requested
Aug 22, 2023: Applicant Interview (Telephonic)
Aug 22, 2023: Examiner Interview Summary
Nov 01, 2023: Response Filed
Feb 14, 2024: Final Rejection — §102, §103, §112
Apr 17, 2024: Response after Non-Final Action
Apr 23, 2024: Response after Non-Final Action
Apr 26, 2024: Request for Continued Examination
May 06, 2024: Response after Non-Final Action
Jun 14, 2024: Non-Final Rejection — §102, §103, §112
Oct 08, 2024: Response Filed
Jan 25, 2025: Final Rejection — §102, §103, §112
Apr 01, 2025: Response after Non-Final Action
Apr 15, 2025: Request for Continued Examination
Apr 18, 2025: Response after Non-Final Action
Jul 02, 2025: Non-Final Rejection — §102, §103, §112
Nov 05, 2025: Response Filed
Feb 10, 2026: Final Rejection — §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603095: Stereo Audio Signal Delay Estimation Method and Apparatus
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12598250: SYSTEMS AND METHODS FOR COHERENT AND TIERED VOICE ENROLLMENT
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12597429: PACKET LOSS CONCEALMENT
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12512093: Sensor-Processing Systems Including Neuromorphic Processing Modules and Methods Thereof
Granted Dec 30, 2025 (2y 5m to grant)
Patent 12505835: HOME APPLIANCE AND SERVER
Granted Dec 23, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 7-8
Grant Probability: 69%
With Interview: 99% (+33.6%)
Median Time to Grant: 3y 2m
PTA Risk: High
Based on 134 resolved cases by this examiner. Grant probability derived from career allow rate.
