DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s amendment, filed 10/01/2025, has been entered. Claims 5 and 22 have been cancelled. Claims 1 – 4, 6 – 21, and 23 – 34 remain pending in the application.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1 – 2, 4, 6 – 7, 15, 18 – 19, 21, 23 – 24, and 32 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by U.S. Patent Application Publication No. 2020/0193971 A1 to Christoph Johann Feinauer (hereinafter Feinauer).
Regarding claim 1, Feinauer teaches a communication system comprising: a processor; a computer-readable medium connected to the processor; and a set of instructions on the computer-readable medium, including: (Feinauer teaches a computer system comprising memory, processors, and computer-readable instructions wherein the computer system is employed to perform accent modification. Feinauer at ¶¶ [0031] - [0032].)
a speech reception unit executable by the processor to receive input speech in the form of an input signal derived from a sound wave generated by a microphone that includes input language; (Feinauer teaches receiving an audio signal through a microphone (i.e., an input signal is derived from the sound wave generated by a microphone). Feinauer at ¶¶ [0042] - [0044]. Further, because the system is directed to performing speech/accent modification (¶¶ [0031] - [0032]), the input signal received by the microphone would contain speech with an accent; otherwise the system would operate on nothing and yield no results.)
a speech processing system connected to the speech reception unit and executable by the processor to modify the input signal to an output signal wherein the input speech in the input signal is modified to output speech in the output signal, (Feinauer teaches modifying the voice of a user within a call (i.e., the speech is received, modified, then output to the user on the other end of the call.) Feinauer at ¶¶ [0044] - [0046].)
wherein the speech processing system includes: a speech modification module executable by the processor to modify the input language in the input signal, (Feinauer teaches modifying the speech of a user to a dialect/accent more familiar to that of the user on the other end of the call wherein the speech of the user is obtained via the microphone. Feinauer at ¶¶ [0031] - [0032] and [0042] - [0044] and Figs. 2A, 2B, 3, and 4.)
wherein the speech modification module includes: an intelligibility improvement engine having: an accent conversion model (Feinauer teaches a machine learning model for modifying and altering speech signals to present an alternative dialect or accent more familiar to a conversation participant (i.e., an accent conversion model). Feinauer at ¶¶ [0047] - [0068].), wherein the accent conversion model has: a first training structure of pairs of utterances and transcripts in the first accent; a second training structure of pairs of utterances and transcripts in the second accent; (Feinauer teaches training data comprising prior data of multiple speakers pronouncing the same utterances. These utterances can further include multi-word sentence fragments in different accents or dialects (i.e., a first and second training structure of pairs of utterances in a first and second accent). Feinauer at ¶¶ [0069] - [0078].)
and an input structure of pairs of utterances and transcripts in the first accent, wherein the offline training model trains on the first training structure and the second training structure based on input from the input structure to develop inferences to generate the conversion relationship. (Feinauer teaches receiving training data (i.e., the prior data of multiple speakers) as input into the training model; further, Feinauer teaches the training model produces an output of a pairing database including dialect pairings or accent/dialect information for specific contexts. Feinauer at ¶¶ [0069] - [0078]. Further, Feinauer teaches a database comprising a machine learning model and training data wherein the database (i.e., the machine learning model and training data) is stored locally on the same machine on which the accent conversion is performed (i.e., the database is an offline database); therefore the machine learning model trained on the training data is trained offline. Feinauer at ¶¶ [0047] - [0052].);
and an accent converter that modifies an accent in the input language based on the accent conversion model; (Feinauer teaches replacing elements of the user's speech with accent/dialect modified replacements (i.e., converting the accent/dialect to another form) using information from the machine learning model. Feinauer at ¶¶ [0066] - [0067].)
and a speech output unit connected to the speech processing system and executable by the processor to provide an output of the output signal. (Feinauer teaches outputting modified audio to a user on the receiving end of a communications session (i.e., outputting modified speech to that user in phone calls/VOIP/etc.) Feinauer at ¶¶ [0047] - [0068].)
Regarding claim 2, Feinauer teaches the system of claim 1, wherein the accent converter retains a voice of the input speech. (Feinauer teaches modifying speech of a user to accommodate an accent more familiar to a speaker on the other end of a communication session. Feinauer at ¶¶ [0079] - [0086].)
Regarding claim 4, Feinauer teaches the system of claim 1, wherein the accent conversion model includes: an offline training model that generates a conversion relationship between a first accent of the language to a second accent of the language; (Feinauer teaches training a machine learning model for generating a pairing database for accent conversion (i.e., the pairing database is a conversion relationship between a first and second accent of a language). Feinauer at ¶¶ [0069] - [0078].)
and a streaming speech-to-speech model that converts from the first accent of the language to the second accent of the language based on the conversion relationship. (Feinauer teaches using the pairing database to convert accents into an accent more familiar to a caller on the other end of a communication session. (i.e., a streaming speech-to-speech model converting accents based on the conversion relationship.) Feinauer at ¶¶ [0079] - [0086].)
Regarding claim 6, Feinauer teaches the system of claim 1, wherein the streaming speech-to-speech model has: at least one neural network model to convert a spectrogram of the first accent to a spectrogram of the second accent. (Feinauer teaches performing accent modification on an audio file wherein features of the audio file are extracted from a spectrogram and machine learning algorithms are applied during the process. Feinauer at ¶¶ [0094] - [0096]. Further, Feinauer teaches machine learning, including neural networks. Feinauer at ¶¶ [0037] - [0039]. As such, a person of ordinary skill in the art would have understood that the machine learning algorithm could be a neural network, and that any modification of an audio file would result in a new spectrogram for the modified audio file, as a spectrogram is merely a representation of the contents of an audio file graphed in a specific manner (i.e., a spectrum of frequencies over time).)
Regarding claim 7, Feinauer teaches the system of claim 6, wherein the streaming speech-to-speech model has: a plurality of neural network models to convert a spectrogram of the first accent to a spectrogram of the second accent. (Feinauer teaches using multiple machine learning models. Feinauer at ¶¶ [0079] - [0082]. As such, because Feinauer contemplates neural networks as a machine learning model, and machine learning models are used in the process of converting the spectrogram from one accent to another, Feinauer’s use of multiple machine learning models anticipates multiple neural networks converting the first accent to the second accent.)
Regarding claim 15, Feinauer teaches the system of claim 1, wherein the intelligibility improvement engine has: at least a first knowledge base; and at least a first routine that modifies the input language based on the first knowledge base. (Feinauer teaches a pairing database that is used to modify the first and second languages in order to alter the accent/dialect using the accent modifier. Feinauer at ¶¶ [0047] - [0068] and Fig. 3.)
Regarding claim 18, Feinauer teaches a method of communicating comprising: executing by a processor a speech reception unit to receive input speech in the form of an input signal derived from a sound wave generated by a microphone that includes input language; (Feinauer teaches a computer system comprising memory, processors, and computer-readable instructions wherein the computer system is employed to perform accent modification. Feinauer at ¶¶ [0031] - [0032]. Feinauer teaches receiving an audio signal through a microphone (i.e., an input signal is derived from the sound wave generated by a microphone). Feinauer at ¶¶ [0042] - [0044]. Further, because the system is directed to performing speech/accent modification (¶¶ [0031] - [0032]), the input signal received by the microphone would contain speech with an accent; otherwise the system would operate on nothing and yield no results.)
executing by the processor a speech processing system connected to the speech reception unit to modify the input signal to an output signal wherein the input speech in the input signal is modified to output speech in the output signal; (Feinauer teaches modifying the voice of a user within a call (i.e., the speech is received, modified, then output to the user on the other end of the call.) Feinauer at ¶¶ [0044] - [0046].)
executing by the processor a speech modification module to modify the input language in the input signal, wherein the speech modification module includes: an intelligibility improvement engine having: an accent conversion model (Feinauer teaches a machine learning model for modifying and altering speech signals to present an alternative dialect or accent more familiar to a conversation participant (i.e., an accent conversion model). Feinauer at ¶¶ [0047] - [0068].), wherein the accent conversion model has: a first training structure of pairs of utterances and transcripts in the first accent; a second training structure of pairs of utterances and transcripts in the second accent; (Feinauer teaches training data comprising prior data of multiple speakers pronouncing the same utterances. These utterances can further include multi-word sentence fragments in different accents or dialects (i.e., a first and second training structure of pairs of utterances in a first and second accent). Feinauer at ¶¶ [0069] - [0078].)
and an input structure of pairs of utterances and transcripts in the first accent, wherein the offline training model trains on the first training structure and the second training structure based on input from the input structure to develop inferences to generate the conversion relationship. (Feinauer teaches receiving training data (i.e., the prior data of multiple speakers) as input into the training model; further, Feinauer teaches the training model produces an output of a pairing database including dialect pairings or accent/dialect information for specific contexts. Feinauer at ¶¶ [0069] - [0078]. Further, Feinauer teaches a database comprising a machine learning model and training data wherein the database (i.e., the machine learning model and training data) is stored locally on the same machine on which the accent conversion is performed (i.e., the database is an offline database); therefore the machine learning model trained on the training data is trained offline. Feinauer at ¶¶ [0047] - [0052].);
and an accent converter that modifies an accent in the input language based on the accent conversion model; (Feinauer teaches replacing elements of the user's speech with accent/dialect modified replacements (i.e., converting the accent/dialect to another form) using information from the machine learning model. Feinauer at ¶¶ [0066] - [0067].)
and executing by the processor a speech output unit connected to the speech processing system to provide an output of the output signal. (Feinauer teaches outputting modified audio to a user on the receiving end of a communications session (i.e., outputting modified speech to that user in phone calls/VOIP/etc.) Feinauer at ¶¶ [0047] - [0068].)
Regarding claim 19, Feinauer teaches the method of claim 18, wherein the accent converter retains a voice of the input speech. (Feinauer teaches modifying speech of a user to accommodate an accent more familiar to a speaker on the other end of a communication session. Feinauer at ¶¶ [0079] - [0086].)
Regarding claim 21, Feinauer teaches the method of claim 18, wherein the accent conversion model includes: an offline training model that generates a conversion relationship between a first accent of the language to a second accent of the language; (Feinauer teaches training a machine learning model for generating a pairing database for accent conversion (i.e., the pairing database is a conversion relationship between a first and second accent of a language). Feinauer at ¶¶ [0069] - [0078].)
and a streaming speech-to-speech model that converts from the first accent of the language to the second accent of the language based on the conversion relationship. (Feinauer teaches using the pairing database to convert accents into an accent more familiar to a caller on the other end of a communication session. (i.e., a streaming speech-to-speech model converting accents based on the conversion relationship.) Feinauer at ¶¶ [0079] - [0086].)
Regarding claim 23, Feinauer teaches the method of claim 18, wherein the streaming speech-to-speech model has: at least one neural network model to convert a spectrogram of the first accent to a spectrogram of the second accent. (Feinauer teaches performing accent modification on an audio file wherein features of the audio file are extracted from a spectrogram and machine learning algorithms are applied during the process. Feinauer at ¶¶ [0094] - [0096]. Further, Feinauer teaches machine learning, including neural networks. Feinauer at ¶¶ [0037] - [0039]. As such, a person of ordinary skill in the art would have understood that the machine learning algorithm could be a neural network, and that any modification of an audio file would result in a new spectrogram for the modified audio file, as a spectrogram is merely a representation of the contents of an audio file graphed in a specific manner (i.e., a spectrum of frequencies over time).)
Regarding claim 24, Feinauer teaches the method of claim 23, wherein the streaming speech-to-speech model has: a plurality of neural network models to convert a spectrogram of the first accent to a spectrogram of the second accent. (Feinauer teaches using multiple machine learning models. Feinauer at ¶¶ [0079] - [0082]. As such, because Feinauer contemplates neural networks as a machine learning model, and machine learning models are used in the process of converting the spectrogram from one accent to another, Feinauer’s use of multiple machine learning models anticipates multiple neural networks converting the first accent to the second accent.)
Regarding claim 32, Feinauer teaches the method of claim 18, wherein the intelligibility improvement engine has: at least a first knowledge base; and at least a first routine that modifies the input language based on the first knowledge base. (Feinauer teaches a pairing database that is used to modify the first and second languages in order to alter the accent/dialect using the accent modifier. Feinauer at ¶¶ [0047] - [0068] and Fig. 3.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Feinauer as applied to claims 1 and 18 above, and further in view of U.S. Patent Application Publication No. 2024/0355346 A1 to Jeffrey Lubin et al. (hereinafter Lubin).
Regarding claim 3, Feinauer teaches all the limitations of claim 1 as laid out above. Feinauer, however, does not explicitly teach the system of claim 1, wherein the accent converter retains a prosody of the input speech.
In a similar field of endeavor (e.g., the modification of input voice waveforms to produce output waveforms), Lubin teaches [retaining] the prosody of input speech. (Lubin teaches accent modification preserving details and prosody of the speech. Lubin at ¶¶ [0014] - [0017].)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Lubin to provide retaining the prosody of the input speech. Doing so would have improved the intelligibility of a speaker, particularly a foreign (i.e., accented) speaker, as recognized by Lubin at ¶¶ [0014] – [0020].
Regarding claim 20, Feinauer teaches all the limitations of claim 18 as laid out above. Feinauer, however, does not explicitly teach the method of claim 18, wherein the accent converter retains a prosody of the input speech.
In a similar field of endeavor (e.g., the modification of input voice waveforms to produce output waveforms), Lubin teaches [retaining] the prosody of input speech. (Lubin teaches accent modification preserving details and prosody of the speech. Lubin at ¶¶ [0014] - [0017].)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Lubin to provide retaining the prosody of the input speech. Doing so would have improved the intelligibility of a speaker, particularly a foreign (i.e., accented) speaker, as recognized by Lubin at ¶¶ [0014] – [0020].
Claims 8 – 10 and 25 – 27 are rejected under 35 U.S.C. 103 as being unpatentable over Feinauer as applied to claims 1 – 2, 4, 6 – 7, 15, 18 – 19, 21, 23 – 24, and 32 above, and further in view of U.S. Patent Application Publication No. 2021/0375289 A1 to Chenguang Zhu et al. (hereinafter Zhu).
Regarding claim 8, Feinauer teaches all the limitations of claim 7 as laid out above. Feinauer, however, does not teach the limitations of claim 8.
In a similar field of endeavor (e.g., speech processing and the observation and management of conversations), Zhu teaches the system of claim 7, wherein the neural network models have different parameters for execution. (Zhu teaches multiple transformers comprising a model (i.e., a neural network) and multiple models operating to process speech in order to generate meeting minutes. Zhu at ¶¶ [0468] - [0482] and Figs. 4 and 6. Further, Zhu teaches that the models achieve different tasks (e.g., a word-level transformer, turn-level transformer, and decoder would operate on different parameters as they perform different tasks). Zhu at ¶¶ [0468] - [0482] and Fig. 4.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Zhu to provide the limitations of claim 8. Doing so would have improved the accuracy of the model and its result as recognized by Zhu at ¶¶ [0031] – [0037].
Regarding claim 9, Feinauer in view of Zhu (hereinafter Feinauer-Zhu) teaches all the limitations of claim 8 as laid out above. Further, Zhu teaches the system of claim 7, wherein the neural network models function in series. (Zhu teaches multiple models operating in series (e.g., the ASR model, then the Post-Process model, then the Summarization model). Zhu at Fig. 6 and ¶¶ [0502] - [0517].)
Regarding claim 10, Feinauer teaches all the limitations of claim 6 as laid out above. Feinauer, however, does not teach all the limitations of claim 10.
In a similar field of endeavor (e.g., speech processing and the observation and management of conversations), Zhu teaches the system of claim 6, wherein the at least one neural network model uses self-attention to model a sequence of input elements and a sequence of output elements by tracking relationships between pairs of the input elements. (Zhu teaches the neural networks using self-attention and cross attention layers. Zhu at ¶¶ [0468] - [0482] and Figs. 4 and 6. As such, Zhu in combination with Feinauer teaches pairing elements by tracking relationships between input elements and output replacements (Feinauer at ¶¶ [0079] - [0086].))
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Zhu to provide the limitations of claim 10. Doing so would have improved the accuracy of the model and its result as recognized by Zhu at ¶¶ [0031] – [0037].
Regarding claim 25, Feinauer teaches all the limitations of claim 24 as laid out above. Feinauer, however, does not teach the limitations of claim 25.
In a similar field of endeavor (e.g., speech processing and the observation and management of conversations), Zhu teaches the method of claim 24, wherein the neural network models have different parameters for execution. (Zhu teaches multiple transformers comprising a model (i.e., a neural network) and multiple models operating to process speech in order to generate meeting minutes. Zhu at ¶¶ [0468] - [0482] and Figs. 4 and 6. Further, Zhu teaches that the models achieve different tasks (e.g., a word-level transformer, turn-level transformer, and decoder would operate on different parameters as they perform different tasks). Zhu at ¶¶ [0468] - [0482] and Fig. 4.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Zhu to provide the limitations of claim 25. Doing so would have improved the accuracy of the model and its result as recognized by Zhu at ¶¶ [0031] – [0037].
Regarding claim 26, Feinauer in view of Zhu (hereinafter Feinauer-Zhu) teaches all the limitations of claim 25 as laid out above. Further, Zhu teaches the method of claim 24, wherein the neural network models function in series. (Zhu teaches multiple models operating in series (e.g., the ASR model, then the Post-Process model, then the Summarization model). Zhu at Fig. 6 and ¶¶ [0502] - [0517].)
Regarding claim 27, Feinauer teaches all the limitations of claim 23 as laid out above. Feinauer, however, does not teach all the limitations of claim 27.
In a similar field of endeavor (e.g., speech processing and the observation and management of conversations), Zhu teaches the method of claim 23, wherein the at least one neural network model uses self-attention to model a sequence of input elements and a sequence of output elements by tracking relationships between pairs of the input elements. (Zhu teaches the neural networks using self-attention and cross attention layers. Zhu at ¶¶ [0468] - [0482] and Figs. 4 and 6. As such, Zhu in combination with Feinauer teaches pairing elements by tracking relationships between input elements and output replacements (Feinauer at ¶¶ [0079] - [0086].))
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Zhu to provide the limitations of claim 27. Doing so would have improved the accuracy of the model and its result as recognized by Zhu at ¶¶ [0031] – [0037].
Claims 11 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Feinauer-Zhu as applied to claims 10 and 27 above, and further in view of U.S. Patent Application Publication No. 2022/0310070 A1 to Niko Moritz et al. (hereinafter Moritz).
Regarding claim 11, Feinauer-Zhu teaches all the limitations of claim 10 as laid out above. Feinauer-Zhu, however, does not teach all the limitations of claim 11.
In a similar field of endeavor (i.e., neural networks processing sequences of input frames using self-attention), Moritz teaches the system of claim 10, wherein, for each output element, the self- attention looks at a subset of past input elements and a subset of future input elements. (Moritz teaches using self-attention layers within a neural network looking at past and future subsets of input context elements. Moritz at ¶¶ [0077] - [0078] and Fig. 6A.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer-Zhu with the teachings of Moritz to provide the limitations of claim 11. Doing so would have provided relevant information from surrounding nodes of the input frame sequence for processing the sequence of input frames, as recognized by Moritz at ¶¶ [0077] – [0080]. Further, as both Feinauer-Zhu and Moritz process input sequences using neural networks and self-attention mechanisms, a person of ordinary skill in the art would have found it obvious to use Moritz’s method of observing past and future frames of an input sequence with Feinauer-Zhu’s teachings of neural networks, as both process input data using such networks in a similar field of endeavor.
Regarding claim 28, Feinauer-Zhu teaches all the limitations of claim 27 as laid out above. Feinauer-Zhu, however, does not teach all the limitations of claim 28.
In a similar field of endeavor (i.e., neural networks processing sequences of input frames using self-attention), Moritz teaches the method of claim 27, wherein, for each output element, the self- attention looks at a subset of past input elements and a subset of future input elements. (Moritz teaches using self-attention layers within a neural network looking at past and future subsets of input context elements. Moritz at ¶¶ [0077] - [0078] and Fig. 6A.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer-Zhu with the teachings of Moritz to provide the limitations of claim 28. Doing so would have provided relevant information from surrounding nodes of the input frame sequence for processing the sequence of input frames, as recognized by Moritz at ¶¶ [0077] – [0080]. Further, as both Feinauer-Zhu and Moritz process input sequences using neural networks and self-attention mechanisms, a person of ordinary skill in the art would have found it obvious to use Moritz’s method of observing past and future frames of an input sequence with Feinauer-Zhu’s teachings of neural networks, as both process input data using such networks in a similar field of endeavor.
Claims 12 – 14 and 29 – 31 are rejected under 35 U.S.C. 103 as being unpatentable over Feinauer as applied to claims 1 and 18 above, and further in view of U.S. Patent Application Publication No. 2015/0100315 A1 to Itay Bianco (hereinafter Bianco).
Regarding claim 12, Feinauer teaches all the limitations of claim 1 as laid out above. Feinauer, however, does not explicitly teach all the limitations of claim 12.
In a similar field of endeavor (e.g., transmission of internet protocol telephone services), Bianco teaches the system of claim 1, further comprising: a relay server device positioned between first and second stacks of relays in a telephone system, the accent conversion model forming part of the relay server device. (Bianco teaches a telephony system comprising relays and codecs wherein there are multiple codecs and relays connected together to transmit IP telephony information. Bianco at ¶¶ [0083] - [0085]. As such, a person of ordinary skill in the art would have understood that server-based accent modification systems such as that of Feinauer would be placed between the relays and codecs in order to operate on the audio data received from the codecs and relays.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Bianco to provide the limitations of claim 12. Doing so would have helped correct losses of data packets being transmitted as recognized by Bianco at ¶¶ [0083] – [0085].
Regarding claim 13, Feinauer in view of Bianco (hereinafter Feinauer-Bianco) teaches all the limitations of claim 12 as laid out above. Further, Bianco teaches the system of claim 12, wherein the relay server device includes first and second codecs that connect the accent conversion device to the first and second stacks of relays respectively. (Bianco teaches a telephony system comprising relays and codecs wherein there are multiple codecs and relays connected together to transmit IP telephony information. Bianco at ¶¶ [0083] - [0085]. As such, a person of ordinary skill in the art would have understood that server-based accent modification systems such as that of Feinauer would be placed between the relays and codecs in order to operate on the audio data received from the codecs and relays.)
Regarding claim 14, Feinauer-Bianco teaches all the limitations of claim 13 as laid out above. Further, Feinauer teaches the system of claim 13, wherein the accent conversion model is a first accent conversion model, further comprising: a second accent conversion model that converts the second accent to the first accent, the second accent conversion model being connected to the first and second stacks of relays by the first and second codecs respectively. (Feinauer teaches the machine learning model used to perform accent modification trained on multiple dialects allowing the model to perform accent modification and replacement on multiple dialects (i.e., the model can translate multiple accents or dialects using the pairing database, effectively creating multiple models, each for a different context.) Feinauer at ¶¶ [0066] - [0075]. Further, Feinauer teaches the accent modification system replacing the accents of multiple speakers with multiple accents and dialects within a call. Feinauer at ¶¶ [0041] - [0045] and [0050] - [0061].)
Regarding claim 29, Feinauer teaches all the limitations of claim 18 as laid out above. Feinauer, however, does not explicitly teach all the limitations of claim 29.
In a similar field of endeavor (e.g., transmission of internet protocol telephone services), Bianco teaches the method of claim 18, further comprising: a relay server device positioned between first and second stacks of relays in a telephone system, the accent conversion model forming part of the relay server device. (Bianco teaches a telephony system comprising relays and codecs wherein there are multiple codecs and relays connected together to transmit IP telephony information. Bianco at ¶¶ [0083] - [0085]. As such, a person of ordinary skill in the art would have understood that server-based accent modification systems such as that of Feinauer would be placed between the relays and codecs in order to operate on the audio data received from the codecs and relays.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Bianco to provide the limitations of claim 29. Doing so would have helped correct the loss of data packets during transmission as recognized by Bianco at ¶¶ [0083] – [0085].
Regarding claim 30, Feinauer-Bianco teaches all the limitations of claim 29. Further, Bianco teaches the system of claim 29, wherein the relay server device includes first and second codecs that connect the accent conversion device to the first and second stacks of relays respectively. (Bianco teaches a telephony system comprising relays and codecs wherein there are multiple codecs and relays connected together to transmit IP telephony information. Bianco at ¶¶ [0083] - [0085]. As such, a person of ordinary skill in the art would have understood that server-based accent modification systems such as that of Feinauer would be placed between the relays and codecs in order to operate on the audio data received from the codecs and relays.)
Regarding claim 31, Feinauer-Bianco teaches all the limitations of claim 30 as laid out above. Further, Feinauer teaches the method of claim 30, wherein the accent conversion model is a first accent conversion model, further comprising: a second accent conversion model that converts the second accent to the first accent, the second accent conversion model being connected to the first and second stacks of relays by the first and second codecs respectively. (Feinauer teaches that the machine learning model used to perform accent modification is trained on multiple dialects, allowing the model to perform accent modification and replacement on multiple dialects (i.e., the model can translate multiple accents or dialects using the pairing database, effectively creating multiple models, each for a different context). Feinauer at ¶¶ [0066] - [0075]. Further, Feinauer teaches the accent modification system replacing the accents of multiple speakers with multiple accents and dialects within a call. Feinauer at ¶¶ [0041] - [0045] and [0050] - [0061].)
Claims 16 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Feinauer as applied to claims 1 and 18 above, and further in view of U.S. Patent Application Publication No. 2022/0392478 A1 to Samer Lutfi Hijazi et al. (hereinafter Hijazi).
Regarding claim 16, Feinauer teaches all the limitations of claim 1. Feinauer, however, does not teach the limitations of claim 16.
In a similar field of endeavor (e.g., speech enhancement and processing within conversations), Hijazi teaches the system of claim 1, wherein the speech processing system includes: a conversation management module having: an overlap trigger to detect an overlap of input speech from first and second input signals; (Hijazi teaches a speech enhancement method wherein overlapping speech signals of multiple speakers cause the system to suppress the speech of secondary speakers instead of the primary (i.e., first) speaker. Hijazi at ¶¶ [0020] - [0025].)
and a speaker suppressor connected to the overlap trigger to suppress the second speech in favor of not suppressing the first speech only when the overlap is detected and not when the overlap is not detected. (Hijazi teaches suppressing one speaker instead of another speaker when overlap between the speakers’ voices is detected. Hijazi at ¶¶ [0020] - [0025].)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Hijazi to provide the limitations of claim 16. Doing so would have prevented interference in the discussion as recognized by Hijazi at ¶¶ [0020] – [0025].
Regarding claim 33, Feinauer teaches all the limitations of claim 18. Feinauer, however, does not teach the limitations of claim 33.
In a similar field of endeavor (e.g., speech enhancement and processing within conversations), Hijazi teaches the method of claim 18, wherein the speech processing system includes: a conversation management module having: an overlap trigger to detect an overlap of input speech from first and second input signals; (Hijazi teaches a speech enhancement method wherein overlapping speech signals of multiple speakers cause the system to suppress the speech of secondary speakers instead of the primary (i.e., first) speaker. Hijazi at ¶¶ [0020] - [0025].)
and a speaker suppressor connected to the overlap trigger to suppress the second speech in favor of not suppressing the first speech only when the overlap is detected and not when the overlap is not detected. (Hijazi teaches suppressing one speaker instead of another speaker when overlap between the speakers’ voices is detected. Hijazi at ¶¶ [0020] - [0025].)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Hijazi to provide the limitations of claim 33. Doing so would have prevented interference in the discussion as recognized by Hijazi at ¶¶ [0020] – [0025].
Claims 17 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Feinauer as applied to claims 1 and 18 above, and further in view of U.S. Patent Application Publication No. 2016/0065711 A1 to Carolina T. De Carney (hereinafter Carney).
Regarding claim 17, Feinauer teaches all the limitations of claim 1 as laid out above. Feinauer, however, does not teach all the limitations of claim 17.
In a similar field of endeavor (e.g., the observation and management of phone calls using speech detection and conversation analysis), Carney teaches the system of claim 1, wherein the speech processing system includes: a conversation management module executable by the processor and having: a delay trigger that determines whether a gap between time segments in the input speech requires an injected utterance; and an utterance injector that merges an utterance with the time segments so that the utterance is between the time segments in the output speech. (Carney teaches a text-to-speech injection module that waits for pauses in the conversation then injects speech into the periods of silence within the conversation. Carney at ¶¶ [0020] - [0048]. As such, a speech injection into a pause within a conversation would be an injection of speech (i.e., an utterance) between time segments such that the injection occurs between the time segments in the output speech.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Carney to provide the injection of utterances within pauses in the conversation of a phone call because Feinauer’s teachings of processing voice communications and Carney’s teachings of injecting utterances into pauses of a conversation show that the injection of utterances into conversations is well-known within their shared field of endeavor. Furthermore, Carney’s teachings allow the transmittal of important information to the second party of the conversation (e.g., the information that one of the speakers has chosen to use a text-to-speech interface for the call) as recognized by Carney at ¶¶ [0008] – [0013].
Regarding claim 34, Feinauer teaches all the limitations of claim 18 as laid out above. Feinauer, however, does not teach all the limitations of claim 34.
In a similar field of endeavor (e.g., the observation and management of phone calls using speech detection and conversation analysis), Carney teaches the method of claim 18, wherein the speech processing system includes: a conversation management module executable by the processor and having: a delay trigger that determines whether a gap between time segments in the input speech requires an injected utterance; and an utterance injector that merges an utterance with the time segments so that the utterance is between the time segments in the output speech. (Carney teaches a text-to-speech injection module that waits for pauses in the conversation then injects speech into the periods of silence within the conversation. Carney at ¶¶ [0020] - [0048]. As such, a speech injection into a pause within a conversation would be an injection of speech (i.e., an utterance) between time segments such that the injection occurs between the time segments in the output speech.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Feinauer with the teachings of Carney to provide the injection of utterances within pauses in the conversation of a phone call because Feinauer’s teachings of processing voice communications and Carney’s teachings of injecting utterances into pauses of a conversation show that the injection of utterances into conversations is well-known within their shared field of endeavor. Furthermore, Carney’s teachings allow the transmittal of important information to the second party of the conversation (e.g., the information that one of the speakers has chosen to use a text-to-speech interface for the call) as recognized by Carney at ¶¶ [0008] – [0013].
Response to Arguments
Applicant's arguments filed 10/01/2025 have been fully considered but they are not persuasive. Particularly, Applicant argues that the disclosure of Feinauer does not lend itself to determining several accent pairs because an alleged requirement to input accent and dialect labels would require a separate machine learning model for different accent pairs. Further, Applicant goes on to argue that Feinauer does not disclose an offline training model for the training structures and that the present invention can use the same accent conversion model for determining pairs of accents without requiring that the input accent be detected. Examiner respectfully disagrees.
In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., the same accent conversion model can be used for different accents, and the training data can be used in a live scenario without requiring the input accent to be detected) are not recited in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). In this case, the claims merely recite a generic “accent” to be modified into another accent. There is no suggestion or recitation within the claims that the “accent” to be modified is a plurality of accents. Further there is no suggestion or recitation within the claims that the accent conversion model, as claimed, would be used on any more than a single accent at any one time, as is similarly performed in Feinauer.
Even if there were a suggestion or recitation of using the model for more than one accent, Feinauer also teaches using the machine learning model for multiple dialects by utilizing training data for different dialects or accents based on different contexts. Feinauer at ¶¶ [0041] – [0045] and [0050] – [0061]. Therefore, the inclusion of such a recitation within the claims would not substantially alter the scope of the claims to exceed the teachings of Feinauer.
Further, it is unclear where in the specification the negative, unrecited limitation of “without requiring that the input accent be detected” is present. Applicant directs attention to Fig. 8 and ¶ [0199] of the instant application. However, that figure and its immediately related portion of the specification, ¶ [0199], merely disclose the generation of a streaming speech-to-speech model using accent pairs. This does not constitute the lack of a requirement of detection of an accent. Particularly, the accent must necessarily be detected in some way (i.e., through live detection or the use of labeled input speech) in a system where a model is trained to convert a specific type of accent (i.e., Indian) to another accent (i.e., American), as is done in Fig. 8 of the instant application.
Further still, Applicant alleges that Feinauer does not teach the limitation of an offline training model. Examiner respectfully disagrees. As laid out above, Feinauer indeed teaches that the database containing the machine learning model and the training data used by the machine learning model (i.e., the training model) can be stored locally on the same machine that performs the accent conversion (i.e., offline training). Feinauer at ¶¶ [0047] and [0052]. Therefore, the machine learning model may be trained offline as the instant application recites. As such, for at least all the reasons laid out above, the 35 U.S.C. § 102 rejections of claims 1 – 2, 4, 6 – 7, 15, 18 – 19, 21, 23 – 24, and 32 are maintained.
Applicant goes on to allege that all the claims rejected under 35 U.S.C. § 103 laid out in the previous Office Action dated 07/01/2025 are allowable over the prior art for the same reasons as claims 1 – 2, 4, 6 – 7, 15, 18 – 19, 21, 23 – 24, and 32. However, as the 35 U.S.C. § 102 rejections are maintained in light of the arguments laid out above, the 35 U.S.C. § 103 rejections of claims 3 and 20, 8 – 10 and 25 – 27, 11 and 28, 12 – 14 and 29 – 31, 16 and 33, and 17 and 34 are maintained because they are not challenged beyond their dependence on the maintained 35 U.S.C. § 102 rejections laid out above.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAMERON KENNETH YOUNG whose telephone number is (703)756-1527. The examiner can normally be reached Mon - Fri, 9:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CAMERON KENNETH YOUNG/Examiner, Art Unit 2655
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656