Prosecution Insights
Last updated: April 19, 2026
Application No. 18/297,066

System and Method for Providing Real-time Speech Recommendations During Verbal Communication

Status: Non-Final OA (§103)
Filed: Apr 07, 2023
Examiner: YOUNG, CAMERON KENNETH
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Microsoft Technology Licensing, LLC
OA Round: 3 (Non-Final)
Grant Probability: 70% (Favorable)
Expected OA Rounds: 3-4
Median Time to Grant: 2y 11m
Grant Probability With Interview: 82%

Examiner Intelligence

Career Allow Rate: 70% (14 granted / 20 resolved; +8.0% vs TC avg, above average)
Interview Lift: +12.5% in resolved cases with interview (moderate lift)
Typical Timeline: 2y 11m average prosecution; 23 applications currently pending
Career History: 43 total applications across all art units
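
The headline figures reduce to simple arithmetic. A hedged reconstruction (the page does not state its exact formula, so treating the +12.5% interview lift as additive percentage points is an assumption):

\[ \text{career allow rate} = \frac{14\ \text{granted}}{20\ \text{resolved}} = 0.70 = 70\% \]
\[ \text{with interview} \approx 70\% + 12.5\ \text{pp} = 82.5\% \approx 82\% \]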

Statute-Specific Performance

§101: 20.1% (-19.9% vs TC avg)
§103: 58.9% (+18.9% vs TC avg)
§102: 11.4% (-28.6% vs TC avg)
§112: 7.7% (-32.3% vs TC avg)
Tech Center averages are estimates; based on career data from 20 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/20/2026 has been entered.

Response to Amendment

Applicant’s amendment, filed 01/20/2026, has been entered. No claims have been added or cancelled. Claims 1–20 remain pending in the application.

Response to Arguments

Applicant's arguments filed 01/20/2026 have been fully considered but they are not persuasive. Applicant argues, on pages 8 and 9 of Applicant’s Response, that the newly amended limitation differentiates from the references because, as Applicant alleges, the Vocal Representations present within Rechlis are “generic ‘correct’ pronunciations and not the user’s own prior utterance.” Examiner respectfully disagrees. Examiner notes that Rechlis (particularly ¶¶ [0046]–[0049]) teaches that a user records utterances during a training session (i.e., a communication session) that are stored within the database as both a word model and a vocal representation (i.e., fields 13a, word model, and 13c, vocal representation, of the database). As such, Examiner disagrees with Applicant’s assertion that Rechlis’ vocal representations are generic only and not the user’s own prior utterance. As laid out in further detail below, it would have been an obvious variation of Rechlis’ invention to merely output the user’s recorded speech as the vocal representation instead of reconstructing the vocal representation to more closely match the voice of the user. In short, the recorded voice of the user will always be closest to the user’s voice, so no better solution for achieving the user’s voice exists than outputting the user’s own recording. Such a modification is thus an obvious variation of Rechlis’ invention that achieves Rechlis’ goal of producing vocal outputs “closer in sound to the user’s voice.” Rechlis at ¶ [0061]. As such, in light of the arguments above, and the details of the rejection laid out below, the 35 U.S.C. § 103 rejections of claims 1–20 are maintained.

Claim Rejections - 35 USC § 103

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claims 1–7 and 15–20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2023/0298581 A1 to Jaemin Moon et al. (hereinafter Moon) in view of U.S. Patent Application Publication No. 2023/0252984 A1 to Sriram Natarajan et al. (hereinafter Natarajan) and in further view of U.S. Patent Application Publication No. 2009/0220926 A1 to Gadi Rechlis (hereinafter Rechlis).

Regarding claim 1, Moon teaches a computer-implemented method, executed on a computing device, comprising: (Moon teaches a system implemented on a computer that executes computer program code (i.e., a computer-implemented method executed on a computing device). Moon at ¶¶ [0213]–[0218].)
processing, using a speech processing system, an input speech signal associated with a user during a verbal communication involving the user; (Moon teaches processing a user's speech (i.e., an input speech signal associated with a user) during an interaction with an automated speech recognition system (i.e., a verbal communication involving the user). Moon at ¶¶ [0142]–[0155] and [0173]–[0182].)

determining context information for the input speech signal from an external reference associated with the verbal communication…; (Moon teaches determining context information from the input speech and storing context information from previous dialogues related to specific contexts. Moon at ¶¶ [0157]–[0172]. Further, the dialogues pertaining to the current dialogue context are searched to generate recommended speech information. Moon at ¶¶ [0162]–[0172].)

monitoring the input speech signal for a predefined intervention pattern…; (Moon teaches processing the speech looking for a speech pattern in response to which recommended speech options are provided (i.e., monitoring the input speech signal for a predefined intervention pattern). Moon at ¶¶ [0157]–[0172].)

and in response to identifying the predefined intervention pattern in the input speech signal, generating one or more speech recommendations for the user based upon, at least in part, the context using the artificial intelligence-based prediction system and the speech processing system… (Moon teaches generating, and providing, speech recommendations for the user based on the input speech signal using example speeches from previously communicated utterances (i.e., context from a prediction system). Moon at ¶¶ [0157]–[0172].)

Moon alone, however, does not teach determining context information … using an artificial intelligence-based prediction system.

In a similar field of endeavor (e.g., providing speech suggestions during a dialog), Natarajan teaches determining context information … using an artificial intelligence-based prediction system. (Natarajan teaches processing contextual data using machine-learning models to predict aspects of the speech (i.e., intent, or context). Natarajan at ¶¶ [0042]–[0048].)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Moon with the teachings of Natarajan to provide determining context information using an artificial intelligence-based prediction system. Doing so would have allowed the system to provide more relevant and helpful suggestions as recognized by Natarajan at ¶ [0044].

Moon in view of Natarajan (hereinafter Moon-Natarajan), however, does not teach wherein the predefined intervention pattern includes a mispronunciation of a particular word by the user; based on the context information, determining a correct version of the particular word despite the user mispronouncing the word; based on determining the correct version of the particular word, accessing, within a user speech profile for the user, a previous recording that includes a correct pronunciation of the particular word by the user; and wherein the one or more speech recommendations include a playback of the user's correct pronunciation of the particular word, where the user's correct pronunciation is obtained from the previous recording included in the user speech profile.
In a similar field of endeavor (e.g., listening to user speech and providing recommendations or suggestions for improving speech), Rechlis teaches wherein the predefined intervention pattern includes a mispronunciation of a particular word by the user; (Rechlis teaches a system for correcting mispronunciations of a user wherein the utterances spoken by a user are used to aid in correcting the user's pronunciation (i.e., the intervention pattern is a mispronunciation by the user). Rechlis at ¶¶ [0014]–[0020].)

based on the context information, determining a correct version of the particular word despite the user mispronouncing the word; (Rechlis teaches using word models for each word to constitute the correct pronunciation for each word and to output the word to the user (i.e., a correct version of the word is determined). Rechlis at ¶¶ [0014]–[0020].)

based on determining the correct version of the particular word, accessing, within a user speech profile for the user, a previous recording that includes a correct pronunciation of the particular word by the user; and (Rechlis teaches accessing a word model and a vocal representation of a word, where the words were uttered by the user, in order to provide the correct pronunciation to the user in a process of aiding speech correction (i.e., the word models and vocal representations of a specific user are essentially a user profile comprising word models and vocal representations). Rechlis at ¶¶ [0046]–[0049].)

…wherein the one or more speech recommendations include a playback of the user's correct pronunciation of the particular word, where the user's correct pronunciation is obtained from the previous recording included in the user speech profile, the correct pronunciation being a prior utterance made by the user, recorded during a previous communication session and stored in the user speech profile. (Rechlis teaches constructing a restoration of a user's utterance which is audibly output to the user (i.e., a playback of the user's correct pronunciation of the particular word is obtained from a collection of user-uttered words amounting to a user profile comprising word models and vocal representations). Rechlis at ¶¶ [0014]–[0020], [0031]–[0032], and [0046]–[0049]. Further, Rechlis teaches that the user records audible pronunciations of words present in the database. These recorded pronunciations can also be used to restore digitized versions of the user’s own pronunciations (i.e., digitized spoken words) and can be stored within the database in fields 13a and 13c (i.e., the Word Model and the Vocal Representation) when such a vocal representation for the word is not present within the database. Rechlis at ¶¶ [0046]–[0049]. Rechlis’ system also adapts to provide vocal outputs closer in sound to the user’s voice. Rechlis at ¶ [0061]. As such, it would have been a predictable variation to skip the step of reconstructing the user’s voice and simply play back the digitized spoken word of the user that was recorded and added to the database as a vocal representation when the database lacked an entry for that word. Doing this would achieve the goal of providing vocal outputs closer in sound to the user’s voice as taught by Rechlis at ¶ [0061]. Furthermore, the training session taught by Rechlis (¶¶ [0042]–[0054]) is, in essence, a previous communication session including the user. As such, the digitized spoken word is previously recorded in a communication session and stored in the user speech profile.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Moon-Natarajan with the teachings of Rechlis to provide determining a correct version of a user-mispronounced word and providing playback of a recording of the user from a user profile. Doing so would have improved recognition of mispronounced words as recognized by Rechlis at ¶¶ [0049]–[0055]. As such, the improved recognition of mispronounced words would also result in more incorrectly pronounced words being corrected, improving user fluency.

Regarding claim 2, Moon-Natarajan-Rechlis teaches all the limitations of claim 1 as laid out above. Further, Moon teaches the computer-implemented method of claim 1, wherein processing the input speech signal includes processing the input speech signal in real-time. (Moon teaches processing the speech of the user in real-time. Moon at ¶ [0017].)

Regarding claim 3, Moon-Natarajan-Rechlis teaches all the limitations of claim 1 as laid out above. Further, Moon teaches the computer-implemented method of claim 1, wherein the predefined intervention pattern includes one or more of: a predefined period of silence, at least a threshold amount of inarticulate utterances within a predefined period of time, at least a threshold stress level, and at least a threshold number of mispronounced words. (Moon teaches the speech pattern to be detected as filler speech, i.e., a pause in meaningful speech with an inserted utterance or phrase without meaning between meaningful speech; in other words, a threshold amount of inarticulate utterances within a predefined period of time. Moon at ¶¶ [0175]–[0182].)

Regarding claim 4, Moon-Natarajan-Rechlis teaches all the limitations of claim 1 as laid out above. Further, Moon teaches the computer-implemented method of claim 1, wherein generating the one or more speech recommendations for the user includes generating one or more speech recommendations for the user based upon, at least in part, the user speech profile. (Moon teaches storing a dialogue history of a user (i.e., a user profile) where the dialogue history is used to generate recommended speech based on the dialogue history. Moon at ¶¶ [0076]–[0083].)

Regarding claim 5, Moon-Natarajan-Rechlis teaches all the limitations of claim 1 as laid out above. Further, Moon teaches the computer-implemented method of claim 1, wherein generating the one or more speech recommendations for the user includes generating one or more synthetic speech signals including the one or more speech recommendations. (Moon teaches the output speech recommendations may be visibly or audibly output to the user with a display or speaker (i.e., outputting the speech recommendations includes generating synthetic speech signals to be audibly output to the user). Moon at ¶¶ [0171]–[0172].)

Regarding claim 6, Moon-Natarajan-Rechlis teaches all the limitations of claim 1 as laid out above. Further, Moon teaches the computer-implemented method of claim 1, wherein generating the one or more speech recommendations for the user includes one or more of: presenting a visual representation of the one or more speech recommendations in a user interface; and presenting an audible representation of the one or more speech recommendations.
(Moon teaches the output speech recommendations may be visibly or audibly output to the user with a display or speaker (i.e., outputting the speech recommendations includes generating synthetic speech signals to be audibly output to the user, or displayed to the user on a user interface). Moon at ¶¶ [0171]–[0172].)

Regarding claim 7, Moon-Natarajan-Rechlis teaches all the limitations of claim 1 as laid out above. Further, Moon teaches the computer-implemented method of claim 1, wherein generating the one or more speech recommendations for the user includes providing a ranked list of speech recommendations for the user to select from. (Moon teaches generating a list of recommendations for the user, ranked from 1–4, that the user may choose from. Moon at ¶¶ [0157]–[0172] and [0183]–[0196] and Fig. 12.)

Regarding claim 15, Moon teaches one or more hardware storage devices that store instructions that are executable by one or more processors to cause the one or more processors to: (Moon teaches a system implemented on a computer that executes computer program code (i.e., a computer-implemented method executed on a computing device). Moon at ¶¶ [0213]–[0218].)

process, using an automated speech recognition system, an input speech signal associated with a user; (Moon teaches processing a user's speech (i.e., an input speech signal associated with a user) during an interaction with an automated speech recognition system (i.e., a verbal communication involving the user). Moon at ¶¶ [0142]–[0155] and [0173]–[0182].)

determine presentation context information from the input speech signal …; (Moon teaches determining context information from the input speech and storing context information from previous dialogues related to specific contexts. Moon at ¶¶ [0157]–[0172]. Further, the dialogues pertaining to the current dialogue context are searched to generate recommended speech information. Moon at ¶¶ [0162]–[0172].)

monitor the input speech signal for a predefined intervention pattern…; (Moon teaches processing the speech looking for a speech pattern in response to which recommended speech options are provided (i.e., monitoring the input speech signal for a predefined intervention pattern). Moon at ¶¶ [0157]–[0172].)

in response to identifying the predefined intervention pattern in the input speech signal, generate one or more speech recommendations for the user based upon, at least in part, the presentation context information using the artificial intelligence-based prediction system, the speech processing system, and a user speech profile, wherein the one or more speech recommendations include … (Moon teaches generating, and providing, speech recommendations for the user based on the input speech signal using example speeches from previously communicated utterances (i.e., context from a prediction system). Moon at ¶¶ [0157]–[0172]. Further, Moon teaches storing a dialogue history of a user (i.e., a user profile) where the dialogue history is used to generate recommended speech based on the dialogue history. Moon at ¶¶ [0076]–[0083]. As such, because the dialogue history is user-uttered speech, the generated recommendations are pronounceable by the user based upon a user speech profile (i.e., dialogue history).)

Moon, however, does not teach determining presentation context information … using an artificial intelligence-based prediction system.
In a similar field of endeavor (e.g., providing speech suggestions during a dialog), Natarajan teaches determining presentation context information … using an artificial intelligence-based prediction system. (Natarajan teaches processing contextual data using machine-learning models to predict aspects of the speech (i.e., intent, or context). Natarajan at ¶¶ [0042]–[0048].)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Moon with the teachings of Natarajan to provide determining context information using an artificial intelligence-based prediction system. Doing so would have allowed the system to provide more relevant and helpful suggestions as recognized by Natarajan at ¶ [0044].

Moon-Natarajan, however, does not teach wherein the predefined intervention pattern includes a mispronunciation of a particular word by the user; based on the context information, determining a correct version of the particular word despite the user mispronouncing the word; based on determining the correct version of the particular word, accessing, within a user speech profile for the user, a previous recording that includes a correct pronunciation of the particular word by the user; and wherein the one or more speech recommendations include a playback of the user's correct pronunciation of the particular word, where the user's correct pronunciation is obtained from the previous recording included in the user speech profile.

In a similar field of endeavor (e.g., listening to user speech and providing recommendations or suggestions for improving speech), Rechlis teaches wherein the predefined intervention pattern includes a mispronunciation of a particular word by the user; (Rechlis teaches a system for correcting mispronunciations of a user wherein the utterances spoken by a user are used to aid in correcting the user's pronunciation (i.e., the intervention pattern is a mispronunciation by the user). Rechlis at ¶¶ [0014]–[0020].)

based on the context information, determining a correct version of the particular word despite the user mispronouncing the word; (Rechlis teaches using word models for each word to constitute the correct pronunciation for each word and to output the word to the user (i.e., a correct version of the word is determined). Rechlis at ¶¶ [0014]–[0020].)

based on determining the correct version of the particular word, accessing, within a user speech profile for the user, a previous recording that includes a correct pronunciation of the particular word by the user; and (Rechlis teaches accessing a word model and a vocal representation of a word, where the words were uttered by the user, in order to provide the correct pronunciation to the user in a process of aiding speech correction (i.e., the word models and vocal representations of a specific user are essentially a user profile comprising word models and vocal representations). Rechlis at ¶¶ [0046]–[0049].)

…wherein the one or more speech recommendations include a playback of the user's correct pronunciation of the particular word, where the user's correct pronunciation is obtained from the previous recording included in the user speech profile, the correct pronunciation being a prior utterance made by the user, recorded during a previous communication session and stored in the user speech profile. (Rechlis teaches constructing a restoration of a user's utterance which is audibly output to the user
(i.e., a playback of the user's correct pronunciation of the particular word is obtained from a collection of user-uttered words amounting to a user profile comprising word models and vocal representations). Rechlis at ¶¶ [0014]–[0020], [0031]–[0032], and [0046]–[0049]. Further, Rechlis teaches that the user records audible pronunciations of words present in the database. These recorded pronunciations can also be used to restore digitized versions of the user’s own pronunciations (i.e., digitized spoken words) and can be stored within the database in fields 13a and 13c (i.e., the Word Model and the Vocal Representation) when such a vocal representation for the word is not present within the database. Rechlis at ¶¶ [0046]–[0049]. Rechlis’ system also adapts to provide vocal outputs closer in sound to the user’s voice. Rechlis at ¶ [0061]. As such, it would have been a predictable variation to skip the step of reconstructing the user’s voice and simply play back the digitized spoken word of the user that was recorded and added to the database as a vocal representation when the database lacked an entry for that word. Doing this would achieve the goal of providing vocal outputs closer in sound to the user’s voice as taught by Rechlis at ¶ [0061]. Furthermore, the training session taught by Rechlis (¶¶ [0042]–[0054]) is, in essence, a previous communication session including the user. As such, the digitized spoken word is previously recorded in a communication session and stored in the user speech profile.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Moon-Natarajan with the teachings of Rechlis to provide determining a correct version of a user-mispronounced word and providing playback of a recording of the user from a user profile. Doing so would have improved recognition of mispronounced words as recognized by Rechlis at ¶¶ [0049]–[0055]. As such, the improved recognition of mispronounced words would also result in more incorrectly pronounced words being corrected, improving user fluency.

Regarding claim 16, Moon-Natarajan-Rechlis teaches all the limitations of claim 15 as laid out above. Further, Moon teaches the one or more hardware storage devices of claim 15, wherein processing the input speech signal includes processing the input speech signal in real-time. (Moon teaches processing the speech of the user in real-time. Moon at ¶ [0017].)

Regarding claim 17, Moon-Natarajan-Rechlis teaches all the limitations of claim 15 as laid out above. Further, Moon teaches the one or more hardware storage devices of claim 15, wherein the predefined intervention pattern includes one or more of: a predefined period of silence, at least a threshold amount of inarticulate utterances within a predefined period of time, at least a threshold stress level, and at least a threshold number of mispronounced words. (Moon teaches the speech pattern to be detected as filler speech, i.e., a pause in meaningful speech with an inserted utterance or phrase without meaning between meaningful speech; in other words, a threshold amount of inarticulate utterances within a predefined period of time. Moon at ¶¶ [0175]–[0182].)

Regarding claim 18, Moon-Natarajan-Rechlis teaches all the limitations of claim 15 as laid out above.
Further, Moon teaches the one or more hardware storage devices of claim 15, wherein generating the one or more speech recommendations for the user includes generating one or more synthetic speech signals including the one or more speech recommendations. (Moon teaches the output speech recommendations may be visibly or audibly output to the user with a display or speaker (i.e., the speech recommendations include generating synthetic speech signals to be output to the user). Moon at ¶¶ [0171]–[0172].)

Regarding claim 19, Moon-Natarajan-Rechlis teaches all the limitations of claim 15 as laid out above. Further, Moon teaches the one or more hardware storage devices of claim 15, wherein generating the one or more speech recommendations for the user includes one or more of: presenting a visual representation of the one or more speech recommendations in a user interface; and presenting an audible representation of the one or more speech recommendations. (Moon teaches the output speech recommendations may be visibly or audibly output to the user with a display or speaker (i.e., outputting the speech recommendations includes generating synthetic speech signals to be audibly output to the user or displayed to the user via a user interface). Moon at ¶¶ [0171]–[0172].)

Regarding claim 20, Moon-Natarajan-Rechlis teaches all the limitations of claim 15 as laid out above. Further, Moon teaches the one or more hardware storage devices of claim 15, wherein generating the one or more speech recommendations for the user includes providing a ranked list of speech recommendations for the user to select from. (Moon teaches generating a list of recommendations for the user, ranked from 1–4, that the user may choose from. Moon at ¶¶ [0157]–[0172] and [0183]–[0196] and Fig. 12.)

Claims 8, 9, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Moon-Natarajan in further view of U.S. Patent Application Publication No. 2020/0265829 A1 to Su Liu et al. (hereinafter Liu) and in further view of Rechlis.

Regarding claim 8, Moon teaches a computing system comprising: one or more processors; and one or more hardware storage devices that store instructions that are executable by the one or more processors to cause the computing system to: (Moon teaches a system implemented on a computer that executes computer program code (i.e., a computer-implemented method executed on a computing device). Moon at ¶¶ [0213]–[0218].)

process, using a speech processing system, (Moon teaches processing a user's speech (i.e., an input speech signal associated with a user) during an interaction with an automated speech recognition system (i.e., a verbal communication involving the user). Moon at ¶¶ [0142]–[0155] and [0173]–[0182].) an input speech signal associated with a user, to monitor the input speech signal for a predefined period of inarticulate speech, … (Moon teaches processing the speech looking for a speech pattern in response to which recommended speech options are provided (i.e., monitoring the input speech signal for a predefined intervention pattern). Moon at ¶¶ [0157]–[0172].)

and, in response to identifying the predefined period of inarticulate speech in the input speech signal, generating one or more synthetic speech recommendations for the user using … the speech processing system…

Moon, however, does not teach using an artificial intelligence-based prediction system.
In a similar field of endeavor (e.g., providing speech suggestions during a dialog), Natarajan teaches using an artificial intelligence-based prediction system. (Natarajan teaches processing contextual data using machine-learning models to predict aspects of the speech (i.e., intent, or context). Natarajan at ¶¶ [0042]–[0048].)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Moon with the teachings of Natarajan to provide using an artificial intelligence-based prediction system. Doing so would have allowed the system to provide more relevant and helpful suggestions as recognized by Natarajan at ¶ [0044].

Moon-Natarajan, however, does not teach synthesizing speech using a text-to-speech system.

In a similar field of endeavor (e.g., synthesizing speech using speech processing and machine learning), Liu teaches generating one or more speech recommendations … using one or more of: a text-to-speech system. (Liu teaches generating synthesized voice signals based on speech text from a speech text generator (i.e., text-to-speech). Liu at ¶¶ [0081]–[0085].)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Moon-Natarajan with the teachings of Liu to provide synthesizing speech using a text-to-speech system. Doing so would have allowed the speech generation system to generate speech similar to the original speaker without the speaker being present as recognized by Liu at ¶¶ [0014]–[0016].

Moon-Natarajan in view of Liu (hereinafter Moon-Natarajan-Liu), however, does not teach wherein the predefined period of inarticulate speech includes a mispronunciation of a particular word by the user; based on the context information, determine a correct version of the particular word despite the user mispronouncing the word; based on determining the correct version of the particular word, access, within a user speech profile for the user, a previous recording that includes a correct pronunciation of the particular word by the user; and wherein the one or more speech recommendations include a playback of the user's correct pronunciation of the particular word, where the user's correct pronunciation is obtained from the previous recording included in the user speech profile.

In a similar field of endeavor (e.g., listening to user speech and providing recommendations or suggestions for improving speech), Rechlis teaches wherein the predefined period of inarticulate speech includes a mispronunciation of a particular word by the user; (Rechlis teaches a system for correcting mispronunciations of a user wherein the utterances spoken by a user are used to aid in correcting the user's pronunciation (i.e., the intervention pattern is a mispronunciation by the user). Rechlis at ¶¶ [0014]–[0020].)

based on the context information, determine a correct version of the particular word despite the user mispronouncing the word; (Rechlis teaches using word models for each word to constitute the correct pronunciation for each word and to output the word to the user (i.e., a correct version of the word is determined). Rechlis at ¶¶ [0014]–[0020].)
based on determining the correct version of the particular word, access, within a user speech profile for the user, a previous recording that includes a correct pronunciation of the particular word by the user; and (Rechlis teaches accessing a word model and a vocal representation of a word, where the words were uttered by the user, in order to provide the correct pronunciation to the user in a process of aiding speech correction (i.e., the word models and vocal representations of a specific user are essentially a user profile comprising word models and vocal representations). Rechlis at ¶¶ [0046]–[0049].)

… wherein the one or more speech recommendations include a playback of the user's correct pronunciation of the particular word, where the user's correct pronunciation is obtained from the previous recording included in the user speech profile, the correct pronunciation being a prior utterance made by the user, recorded during a previous communication session and stored in the user speech profile. (Rechlis teaches constructing a restoration of a user's utterance which is audibly output to the user (i.e., a playback of the user's correct pronunciation of the particular word is obtained from a collection of user-uttered words amounting to a user profile comprising word models and vocal representations). Rechlis at ¶¶ [0014]–[0020], [0031]–[0032], and [0046]–[0049]. Further, Rechlis teaches that the user records audible pronunciations of words present in the database. These recorded pronunciations can also be used to restore digitized versions of the user’s own pronunciations (i.e., digitized spoken words) and can be stored within the database in fields 13a and 13c (i.e., the Word Model and the Vocal Representation) when such a vocal representation for the word is not present within the database. Rechlis at ¶¶ [0046]–[0049]. Rechlis’ system also adapts to provide vocal outputs closer in sound to the user’s voice. Rechlis at ¶ [0061]. As such, it would have been a predictable variation to skip the step of reconstructing the user’s voice and simply play back the digitized spoken word of the user that was recorded and added to the database as a vocal representation when the database lacked an entry for that word. Doing this would achieve the goal of providing vocal outputs closer in sound to the user’s voice as taught by Rechlis at ¶ [0061]. Furthermore, the training session taught by Rechlis (¶¶ [0042]–[0054]) is, in essence, a previous communication session including the user. As such, the digitized spoken word is previously recorded in a communication session and stored in the user speech profile.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Moon-Natarajan-Liu with the teachings of Rechlis to provide determining a correct version of a user-mispronounced word and providing playback of a recording of the user from a user profile. Doing so would have improved recognition of mispronounced words as recognized by Rechlis at ¶¶ [0049]–[0055]. As such, the improved recognition of mispronounced words would also result in more incorrectly pronounced words being corrected, improving user fluency.

Regarding claim 9, Moon-Natarajan-Liu-Rechlis teaches all the limitations of claim 8 as laid out above.
Further, Liu teaches the computing system of claim 8, wherein the one or more speech recommendations include one or more synthetic speech signals generated with the text-to-speech system to imitate a voice of the user. (Liu teaches generating or synthesizing speech signals that imitate specific input voice signals such that the generated signals are indistinguishable from the original voice signals. Liu at ¶ [0018].)

Regarding claim 11, Moon-Natarajan-Liu-Rechlis teaches all the limitations of claim 8 as laid out above. Further, Liu teaches the computing system of claim 8, wherein generating the one or more synthetic speech recommendations for the user includes generating the one or more speech recommendations for the user based upon, at least in part, the user speech profile. (Liu teaches generating or synthesizing speech signals that imitate specific input voice signals such that the generated signals are indistinguishable from the original voice signals. Liu at ¶ [0018]. As such, the original voice signals used to imitate the voice signals, together with Moon’s dialogue history discussed above, constitute a speech profile.)

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Moon-Natarajan-Liu-Rechlis as applied to claim 8 above, and further in view of U.S. Patent Application Publication No. 2022/0036878 A1 to Nicole Cyr et al. (hereinafter Cyr).

Regarding claim 10, Moon-Natarajan-Liu-Rechlis teaches all the limitations of claim 8 as laid out above. Moon-Natarajan-Liu-Rechlis, however, does not teach all the limitations of claim 10.

In a similar field of endeavor (e.g., processing speech using a user profile and speech assessment), Cyr teaches the computing system of claim 8, wherein the predefined period of inarticulate speech includes one or more of: at least a threshold amount of inarticulate utterances within a predefined period of time, and at least a threshold number of mispronounced words. (Cyr teaches assessing user speech in a predefined period of time to evaluate the severity of stuttering. Cyr at ¶ [0130]. Cyr's assessment of user speech in combination with Moon and Natarajan would have made determining an intervention in response to a speech assessment identifying a period of inarticulate speech obvious to one of ordinary skill in the art, as Cyr, Moon, and Natarajan are in analogous fields of art and would benefit from accurate detection of incorrect or inarticulate periods of speech.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Moon-Natarajan-Liu-Rechlis with the teachings of Cyr to provide the limitations of claim 10. Doing so would have aided the user in improving their speech as recognized by Cyr at ¶ [0103].

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Moon-Natarajan-Liu-Rechlis as applied to claim 8 above, and further in view of U.S. Patent Application Publication No. 2023/0072898 A1 to Yoon Cho (hereinafter Cho).

Regarding claim 12, Moon-Natarajan-Liu-Rechlis teaches all the limitations of claim 8 as laid out above. Moon-Natarajan-Liu-Rechlis, however, does not teach all the limitations of claim 12.

In a similar field of endeavor (e.g., suggesting recommended speech to a user), Cho teaches the computing system of claim 8, wherein generating the one or more speech recommendations for the user includes defining one or more conditions for automatically providing a synthetic speech recommendation for the user.
(Cho teaches determining different output speeches depending on the situation at hand (i.e., defining one or more conditions for automatically outputting speech). Cho at ¶¶ [0081]–[0084]. Further, Moon's suggestions may be audibly output to the user. Moon at ¶¶ [0171]–[0172]. As such, Cho's recommendations could be audibly output to the user as Moon's are.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Moon-Natarajan-Liu-Rechlis with the teachings of Cho to provide the limitations of claim 12. Doing so would have prevented the user’s attention from being diverted as recognized by Cho at ¶¶ [0131]–[0135].

Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Moon-Natarajan-Liu-Rechlis as applied to claim 8 above, and further in view of U.S. Patent No. 9,478,234 B1 to Dibyendu Nandy et al. (hereinafter Nandy).

Regarding claim 13, Moon-Natarajan-Liu-Rechlis teaches all the limitations of claim 8 as laid out above. Moon-Natarajan-Liu-Rechlis, however, does not teach the limitations of claim 13.

In a similar field of endeavor (e.g., speech processing and speech recognition), Nandy teaches the computing system of claim 8, further comprising: generating a time window for generating the one or more speech recommendations by delaying transmission of the input speech signal by a predefined amount of time. (Nandy teaches introducing a delay into transmission of the audio signal to allow for better performance of the speech processing (i.e., delaying the transmission of a speech signal by a predefined amount of time to allow for better processing). Nandy at 4:9–4:30.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Moon-Natarajan-Liu-Rechlis with the teachings of Nandy to provide these limitations. Doing so would have provided additional time for speech processing as recognized by Nandy at 4:9–4:30.

Regarding claim 14, Moon-Natarajan-Liu-Rechlis in view of Nandy (hereinafter Moon-Natarajan-Liu-Rechlis-Nandy) teaches all the limitations of claim 13 as laid out above. Further, Nandy teaches the computing system of claim 13, further comprising: dynamically adjusting playback of the input speech signal and a synthetic speech recommendation selected by the user based upon, at least in part, the time window for generating the one or more speech recommendations. (Nandy teaches synchronizing the output streams based on the delay and latency introduced (i.e., dynamically adjusting playback of the input speech signal and the synthetic speech recommendation based on the time window for generating the synthetic speech). Nandy at 4:9–4:30.)

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAMERON KENNETH YOUNG, whose telephone number is (703) 756-1527. The examiner can normally be reached Mon–Fri, 9:00 AM–5:00 PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at 571-272-7516.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CAMERON KENNETH YOUNG/
Examiner, Art Unit 2655

/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655
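
To make the disputed claim language concrete, here is a minimal, purely hypothetical sketch of the flow recited in claim 1: detect a mispronounced word (the claimed intervention pattern), resolve the intended word from context, and recommend playback of the user's own prior correct recording from a user speech profile. Every name, heuristic, and data structure below is invented for illustration; it is not the applicant's implementation, nor anything taught by Moon, Natarajan, or Rechlis.

from typing import Optional

# Hypothetical "user speech profile": word -> path to a prior recording of the
# user pronouncing that word correctly (loose analogue of Rechlis' database
# fields 13a/13c; the words and paths here are invented).
SPEECH_PROFILE: dict[str, str] = {
    "statistics": "recordings/user_statistics.wav",
}

# Toy vocabulary standing in for context information from the conversation.
KNOWN_WORDS: set[str] = {"statistics", "recommendation", "communication"}

def is_mispronounced(token: str) -> bool:
    """Toy intervention pattern: the token is not a recognized word."""
    return token not in KNOWN_WORDS

def resolve_word(token: str, context: set[str]) -> Optional[str]:
    """Guess the intended word: first context word sharing a 3-letter prefix."""
    candidates = [w for w in sorted(context) if w[:3] == token[:3]]
    return candidates[0] if candidates else None

def recommend_playbacks(tokens: list[str], context: set[str]) -> list[str]:
    """Return playback paths for each mispronounced word we can correct."""
    playbacks = []
    for token in tokens:
        if is_mispronounced(token):                        # monitor for the pattern
            correct = resolve_word(token, context)         # correct version from context
            if correct and correct in SPEECH_PROFILE:      # user's own prior utterance
                playbacks.append(SPEECH_PROFILE[correct])  # recommend its playback
    return playbacks

if __name__ == "__main__":
    # "statistiks" is flagged, resolved to "statistics", and the user's own
    # prior recording is recommended for playback.
    print(recommend_playbacks(["statistiks", "and", "communication"], KNOWN_WORDS))
    # -> ['recordings/user_statistics.wav']

The obviousness dispute maps onto the last step: whether returning the user's stored recording directly, rather than a reconstruction of a "generic" pronunciation, was an obvious variation of Rechlis.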

Prosecution Timeline

Apr 07, 2023
Application Filed
Apr 30, 2025
Non-Final Rejection — §103
Jul 24, 2025
Interview Requested
Jul 30, 2025
Applicant Interview (Telephonic)
Jul 30, 2025
Examiner Interview Summary
Aug 06, 2025
Response Filed
Oct 16, 2025
Final Rejection — §103
Nov 09, 2025
Interview Requested
Nov 17, 2025
Applicant Interview (Telephonic)
Nov 17, 2025
Examiner Interview Summary
Jan 20, 2026
Request for Continued Examination
Jan 27, 2026
Response after Non-Final Action
Mar 10, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this examiner in similar technology areas

Patent 12602409: INFORMATION SEARCH SYSTEM
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12592230: RECOGNITION OR SYNTHESIS OF HUMAN-UTTERED HARMONIC SOUNDS
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12567429: VOICE CALL CONTROL METHOD AND APPARATUS, COMPUTER-READABLE MEDIUM, AND ELECTRONIC DEVICE
Granted Mar 03, 2026 (2y 5m to grant)
Patent 12525250: Cascade Architecture for Noise-Robust Keyword Spotting
Granted Jan 13, 2026 (2y 5m to grant)
Patent 12493748: LARGE LANGUAGE MODEL UTTERANCE AUGMENTATION
Granted Dec 09, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 70%
With Interview: 82% (+12.5%)
Median Time to Grant: 2y 11m
PTA Risk: High
Based on 20 resolved cases by this examiner; grant probability derived from career allow rate.

Free tier: 3 strategy analyses per month