DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s Amendment, filed 10/13/2025, has been entered. Claims 4, 11, and 17 have been cancelled. Claims 1 – 3, 5 – 10, 12 – 16, and 18 – 20 remain pending in the application. Applicant’s amendments have resolved all of the claim objections previously set forth in the Office Action mailed 08/08/2025. Accordingly, the objections to claims 6, 7, 13, 19, and 20 have been withdrawn in light of the amendments.
Response to Arguments
Applicant’s arguments, see pages 7 – 10 of Applicant’s Response, filed 10/13/2025, with respect to the 35 U.S.C. § 101 rejections previously set forth in the Office Action mailed 08/08/2025 have been fully considered and are persuasive. The 35 U.S.C. § 101 rejections of claims 1 – 3, 5 – 10, 12 – 16, and 18 – 20 have been withdrawn. Particularly, the § 101 rejections have been withdrawn in light of the 2025 Guidance Memo on Subject Matter Eligibility because the claims include elements wherein (a) the co-emitted text samples are predicted, not output, pronunciations, (b) there is a specific model for each word, and (c) each of the co-emitted text samples is encoded into a convex hull, such that the claims as a whole exceed the capabilities of the human mind. That is, the extent of the data processing performed by the claims, and the processing required at each data point, fundamentally exceeds what could practically be performed in the human mind. Therefore, the 35 U.S.C. § 101 rejections have been withdrawn.
Applicant's arguments regarding Fanty’s teachings, on pages 10 and 11 of Applicant’s Response filed 10/13/2025, have been fully considered but are not persuasive. Particularly, it is not Fanty’s teachings alone that are relied upon for these limitations; rather, it is the combination of Beutnagel and Fanty that teaches them.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
For the sake of discussion, it is noted that, as laid out in the Office Action dated 08/08/2025, Beutnagel, not Fanty, is relied upon to teach co-emitted text entities. It is Fanty’s teaching of updating a dictionary of pronunciations, in combination with Beutnagel’s teachings of predicted pronunciations and the N-best results among the multiple pronunciation candidates, that amounts to the claimed co-emitted text samples. As such, the 35 U.S.C. § 103 rejections of claims 1 – 3, 5 – 10, 12 – 16, and 18 – 20 are maintained, as set forth in the rejections below.
Applicant’s further arguments with respect to claims 1, 8, and 14, and their respective dependents, have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument regarding the newly amended limitations of claims 1, 8, and 14. As such, the 35 U.S.C. § 103 rejections of claims 1 – 3, 5 – 10, 12 – 16, and 18 – 20 are maintained for at least this reason.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1 – 2, 5, 8 – 9, 12, 14 – 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 6,078,885 A to Mark C. Beutnagel (hereinafter Beutnagel) in view of U.S. Patent No. 6,389,394 B1 to Mark Fanty (hereinafter Fanty), further in view of U.S. Patent No. 11,386,889 B2 to Petar Aleksic et al. (hereinafter Aleksic), and further in view of the Non-Patent Literature “Reconceptualizing the vowel space in analyzing regional dialect variation and sound change in American English” by Robert Allen Fox et al. (hereinafter Fox).
Regarding claim 1, Beutnagel teaches a method for predicting pronunciation of a text sample, comprising: (Beutnagel teaches predicting plausible pronunciations. Beutnagel at 4:52 - 5:15.)
generating, via processing circuitry, an encoding of allowable pronunciations of the text sample within a phoneme space; (Beutnagel teaches generating candidate pronunciations for a word (i.e., text entity) wherein the candidates are generated in order to ensure the correct pronunciation is selected (i.e., the correct pronunciation is "allowable"). Beutnagel at 5:5 - 6:19. Further, Beutnagel teaches generating a plurality of candidate pronunciations phonetically (i.e., encoding of allowable pronunciations). Beutnagel at 5:16 - 6:41. Further, Beutnagel teaches a computer coupled to memory executing software such as a speech recognition engine. Beutnagel at 2:35 – 2:56. Further still, Beutnagel teaches generating predicted pronunciations for a text sample based upon phonemes (i.e., within a phoneme space). Beutnagel at Fig. 2 and 4:12 - 7:50.)
receiving an audio sample including the text sample, the audio sample comprising speech spoken by a user; (Beutnagel teaches receiving an audio sample corresponding to a text sample and generating predicted pronunciations for the text. Beutnagel at Fig. 2 and 4:12 - 5:56.)
processing, via the processing circuitry, the audio sample to generate predicted text samples corresponding to the audio sample, the predicted text samples including the text sample and one or more co-emitted text samples; (Beutnagel teaches selecting the N-best results of the system to ensure the correct pronunciation is selected. Beutnagel at 6:20 - 6:64. As such, the N-best results concept, in conjunction with the generation of multiple pronunciation candidates, demonstrates that Beutnagel’s candidate pronunciations are associated with an audio sample (e.g., the user specifying the word Peabody verbally at 4:12 - 4:26). Further, Beutnagel teaches processing audio samples using speech recognition as part of a process for generating predicted pronunciations of text samples and audio samples. Beutnagel at Fig. 2 and 4:12 – 7:50.)
outputting, via the processing circuitry, the text sample; (Beutnagel teaches returning the results (i.e., outputting) which are the N-best ranked answers. Beutnagel at 6:20 - 6:41.)
Beutnagel, however, does not teach updating, via the processing circuitry, the encoding of allowable pronunciations of the text sample based on pronunciations of the one or more co-emitted text samples.
In a similar field of endeavor (e.g., automatic speech recognition and modifying pronunciation databases), Fanty teaches updating, via the processing circuitry, the encoding of allowable pronunciations of the text sample …. (Fanty teaches updating a pronunciation dictionary of allowable pronunciations (i.e., an encoding of allowable pronunciations) by comparing the pronunciations of one pattern with a series of replacement or alternative phonemes (i.e., co-emitted text entities). Fanty at 6:63 - 7:65.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel with the teachings of Fanty to provide updating the encoding of allowable pronunciations. Doing so would have improved the accuracy of the pronunciation dictionary as recognized by Fanty at 12:23 – 12:41.
Beutnagel-Fanty, however, does not alone teach for each corresponding co-emitted text sample of the one or more co-emitted text samples, determining, using a corresponding phoneme model associated with the corresponding text sample, one or more corresponding pronunciations of the corresponding co-emitted text sample;
In a similar field of endeavor (e.g., the generation of grammars for text transcriptions in speech processing), Aleksic teaches for each corresponding co-emitted text sample of the one or more co-emitted text samples, determining, using a corresponding phoneme model associated with the corresponding text sample, one or more corresponding pronunciations of the corresponding co-emitted text sample; (Aleksic teaches, for each candidate transcription, generating grammars (i.e., allowable pronunciations) and calculating confidence scores for the grammars for each of the candidate transcriptions. Aleksic at 9:5 - 9:26 and 11:56 - 12:16. Further, Aleksic teaches using a sequence-to-sequence neural network with specific layers that correspond to each sample utterance (i.e., a corresponding phoneme model associated with the corresponding text samples). Aleksic at 3:61 - 4:25.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel-Fanty with the teachings of Aleksic to provide the limitations of claim 1. Doing so would have improved detection of user intent and speech recognition as recognized by Aleksic at 8:11 – 8:16.
Beutnagel-Fanty in view of Aleksic (hereinafter Beutnagel-Fanty-Aleksic), however, does not alone teach encoding the corresponding pronunciations of the one or more co-emitted text samples as a convex hull in the phoneme space; and removing any pronunciations of the text sample that fall outside of the convex hull.
In a similar field of endeavor (e.g., speech processing and pronunciation analysis), Fox teaches encoding the corresponding pronunciations of the one or more co-emitted text samples as a convex hull in the phoneme space; (Fox teaches encoding multiple vowels used in dialects and generations into a convex hull (i.e., encoding corresponding pronunciations of a text sample into a phoneme space). Fox at Section II, subsection D, Vowel space area computations.)
and … removing any pronunciations of the text sample that fall outside of the convex hull. (Fox teaches including boundaries for the convex hull (i.e., excluding results outside the boundaries). Fox at Section II, subsection D, Vowel space area computations.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the speech recognition system of Aleksic, which uses context to bias results, by incorporating the convex hull boundary method of Fox to improve recognition accuracy by ensuring that the predicted pronunciation of a target word is acoustically consistent with the dialect or accent of the surrounding (co-emitted) words. Fox teaches that the convex hull effectively captures the "working space" of a speaker's dialect. Applying this boundary to filter "outlier" pronunciations in Aleksic's lattice would have been a predictable application of known geometric constraints to improve consistency.
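For illustration only (this sketch is not part of the record and does not appear in any cited reference), the convex-hull filtering concept discussed above can be modeled in Python by treating pronunciations as points in a hypothetical two-dimensional phoneme space (e.g., F1/F2 formant values); all data values and function names below are invented for the example:

```python
# Illustrative sketch: compute the convex hull of co-emitted
# pronunciations in a 2D phoneme space, then drop candidate
# pronunciations of the target word that fall outside the hull.

def cross(o, a, b):
    """Z-component of the cross product OA x OB (orientation test)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone-chain convex hull; returns vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, p):
    """True if p lies inside or on the CCW convex polygon `hull`."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))

# Hypothetical co-emitted pronunciations define the speaker's "working space".
co_emitted = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0)]
hull = convex_hull(co_emitted)

# Hypothetical candidate pronunciations of the target word; the outlier
# (9.0, 9.0) falls outside the hull and is removed.
candidates = [(1.0, 1.0), (3.0, 2.0), (9.0, 9.0)]
kept = [c for c in candidates if inside_hull(hull, c)]
```

The geometric point is simply that a convex hull provides a closed boundary, so "inside vs. outside" is a well-defined test that can filter pronunciations inconsistent with the surrounding dialect.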
Regarding claim 2, Beutnagel-Fanty-Aleksic-Fox teaches all the limitations of claim 1 as laid out above. Further, Beutnagel teaches the method of Claim 1, wherein the encoding of allowable pronunciations is generated based on a measure of pronunciation certainty of the text sample. (Beutnagel teaches the pronunciations are evaluated based on the probability of the members of the text entity/word. Beutnagel at 5:5 - 5:56. As such, the generation of the pronunciations is based on a certainty (i.e., a probability).)
Regarding claim 5, Beutnagel-Fanty-Aleksic-Fox teaches all the limitations of claim 1 as laid out above. Further, Fanty teaches the method of Claim 1, wherein the updating the encoding of allowable pronunciations of the text sample includes updating a predicted accuracy of allowable pronunciations of the text sample based on the pronunciations of the one or more co-emitted text samples. (Fanty teaches a general method for improving the accuracy of a pronunciation dictionary that includes updating allowable pronunciations by comparing the pronunciations of one pattern with a series of replacement or alternative phonemes. Fanty at 6:63 - 7:65. Further, Fanty teaches that the general method disclosed is for determining and improving the accuracy of a pronunciation dictionary (i.e., allowable pronunciations). Fanty at 6:63 - 7:65 and 12:22 - 12:39. As such, Fanty teaches updating the accuracy of a pronunciation dictionary based on pronunciations of co-emitted text samples (i.e., alternative or replacement pronunciations).)
Regarding claim 8, Beutnagel teaches a device comprising:
processing circuitry configured to generate an encoding of allowable pronunciations of a text sample within a phoneme space, (Beutnagel teaches generating candidate pronunciations for a word (i.e., text entity) wherein the candidates are generated in order to ensure the correct pronunciation is selected (i.e., the correct pronunciation is "allowable"). Beutnagel at 5:5 - 6:19. Further, Beutnagel teaches generating a plurality of candidate pronunciations phonetically (i.e., encoding of allowable pronunciations). Beutnagel at 5:16 - 6:41. Further, Beutnagel teaches a computer coupled to memory executing software such as a speech recognition engine. Beutnagel at 2:35 – 2:56. Further still, Beutnagel teaches generating predicted pronunciations for a text sample based upon phonemes (i.e., within a phoneme space). Beutnagel at Fig. 2 and 4:12 - 7:50.)
receive an audio sample including the text sample, the audio sample comprising speech spoken by a user; (Beutnagel teaches receiving an audio sample corresponding to a text sample and generating predicted pronunciations for the text. Beutnagel at Fig. 2 and 4:12 - 5:56.)
process the audio sample to generate predicted text samples corresponding to the audio sample, the predicted text samples including the text sample and one or more co-emitted text samples, (Beutnagel teaches selecting the N-best results of the system to ensure the correct pronunciation is selected. Beutnagel at 6:20 - 6:64. As such, the N-best results concept, in conjunction with the generation of multiple pronunciation candidates, demonstrates that Beutnagel’s candidate pronunciations are associated with an audio sample (e.g., the user specifying the word Peabody verbally at 4:12 - 4:26). Further, Beutnagel teaches processing audio samples using speech recognition as part of a process for generating predicted pronunciations of text samples and audio samples. Beutnagel at Fig. 2 and 4:12 – 7:50.)
output the text sample, and (Beutnagel teaches returning the results (i.e., outputting) which are the N-best ranked answers. Beutnagel at 6:20 - 6:41.)
Beutnagel, however, does not teach updating the encoding of allowable pronunciations of the text sample….
In a similar field of endeavor (e.g., automatic speech recognition and modifying pronunciation databases), Fanty teaches updating the encoding of allowable pronunciations of the text sample based on pronunciations of the one or more co-emitted text samples. (Fanty teaches updating a pronunciation dictionary of allowable pronunciations (i.e., an encoding of allowable pronunciations) by comparing the pronunciations of one pattern with a series of replacement or alternative phonemes (i.e., co-emitted text entities). Fanty at 6:63 - 7:65.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel with the teachings of Fanty to provide updating the encoding of allowable pronunciations. Doing so would have improved the accuracy of the pronunciation dictionary as recognized by Fanty at 12:23 – 12:41.
Beutnagel-Fanty, however, does not alone teach for each corresponding co-emitted text sample of the one or more co-emitted text samples, determine, using a corresponding phoneme model associated with the corresponding text sample, one or more corresponding pronunciations of the corresponding co-emitted text sample;
In a similar field of endeavor (e.g., the generation of grammars for text transcriptions in speech processing), Aleksic teaches for each corresponding co-emitted text sample of the one or more co-emitted text samples, determining, using a corresponding phoneme model associated with the corresponding text sample, one or more corresponding pronunciations of the corresponding co-emitted text sample; (Aleksic teaches, for each candidate transcription, generating grammars (i.e., allowable pronunciations) and calculating confidence scores for the grammars for each of the candidate transcriptions. Aleksic at 9:5 - 9:26 and 11:56 - 12:16. Further, Aleksic teaches using a sequence-to-sequence neural network with specific layers that correspond to each sample utterance (i.e., a corresponding phoneme model associated with the corresponding text samples). Aleksic at 3:61 - 4:25.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel-Fanty with the teachings of Aleksic to provide the limitations of claim 8. Doing so would have improved detection of user intent and speech recognition as recognized by Aleksic at 8:11 – 8:16.
Beutnagel-Fanty in view of Aleksic (hereinafter Beutnagel-Fanty-Aleksic), however, does not alone teach encode the corresponding pronunciations of the one or more co-emitted text samples as a convex hull in the phoneme space; and removing any pronunciations of the text sample that fall outside of the convex hull.
In a similar field of endeavor (e.g., speech processing and pronunciation analysis), Fox teaches encoding the corresponding pronunciations of the one or more co-emitted text samples as a convex hull in the phoneme space; (Fox teaches encoding multiple vowels used in dialects and generations into a convex hull (i.e., encoding corresponding pronunciations of a text sample into a phoneme space). Fox at Section II, subsection D, Vowel space area computations.)
and … removing any pronunciations of the text sample that fall outside of the convex hull. (Fox teaches including boundaries for the convex hull (i.e., excluding results outside the boundaries). Fox at Section II, subsection D, Vowel space area computations.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the speech recognition system of Aleksic, which uses context to bias results, by incorporating the convex hull boundary method of Fox to improve recognition accuracy by ensuring that the predicted pronunciation of a target word is acoustically consistent with the dialect or accent of the surrounding (co-emitted) words. Fox teaches that the convex hull effectively captures the "working space" of a speaker's dialect. Applying this boundary to filter "outlier" pronunciations in Aleksic's lattice would have been a predictable application of known geometric constraints to improve consistency.
Regarding claim 9, Beutnagel-Fanty-Aleksic-Fox teaches all the limitations of claim 8 as laid out above. Further, Beutnagel teaches the device of claim 8, wherein the encoding of allowable pronunciations is generated based on a measure of pronunciation certainty of the text sample. (Beutnagel teaches the pronunciations are evaluated based on the probability of the members of the text entity/word. Beutnagel at 5:5 - 5:56. As such, the generation of the pronunciations is based on a certainty (i.e., a probability).)
Regarding claim 12, Beutnagel-Fanty-Aleksic-Fox teaches all the limitations of claim 8 as laid out above. Further, Fanty teaches the device of Claim 8, wherein the processing circuitry is configured to update the encoding of allowable pronunciations of the text sample by updating a predicted accuracy of allowable pronunciations of the text sample based on the pronunciations of the one or more co-emitted text samples. (Fanty teaches a general method for improving the accuracy of a pronunciation dictionary that includes updating allowable pronunciations by comparing the pronunciations of one pattern with a series of replacement or alternative phonemes. Fanty at 6:63 - 7:65. Further, Fanty teaches that the general method disclosed is for determining and improving the accuracy of a pronunciation dictionary (i.e., allowable pronunciations). Fanty at 6:63 - 7:65 and 12:22 - 12:39. As such, Fanty teaches updating the accuracy of a pronunciation dictionary based on pronunciations of co-emitted text samples (i.e., alternative or replacement pronunciations).)
Regarding claim 14, Beutnagel teaches a non-transitory computer-readable storage medium for storing computer-readable instructions that, when executed by a computer, cause the computer to perform a method, the method comprising: (Beutnagel teaches a computer coupled to memory executing software such as a speech recognition engine. Beutnagel at 2:35 – 2:56.)
generating an encoding of allowable pronunciations of a text sample within a phoneme space; (Beutnagel teaches generating candidate pronunciations for a word (i.e., text entity) wherein the candidates are generated in order to ensure the correct pronunciation is selected (i.e., the correct pronunciation is "allowable"). Beutnagel at 5:5 - 6:19. Further, Beutnagel teaches generating a plurality of candidate pronunciations phonetically (i.e., encoding of allowable pronunciations). Beutnagel at 5:16 - 6:41. Further, Beutnagel teaches a computer coupled to memory executing software such as a speech recognition engine. Beutnagel at 2:35 – 2:56. Further still, Beutnagel teaches generating predicted pronunciations for a text sample based upon phonemes (i.e., within a phoneme space). Beutnagel at Fig. 2 and 4:12 - 7:50.)
receiving an audio sample including the text sample, the audio sample comprising speech spoken by a user; (Beutnagel teaches receiving an audio sample corresponding to a text sample and generating predicted pronunciations for the text. Beutnagel at Fig. 2 and 4:12 - 5:56.)
processing the audio sample to generate predicted text samples corresponding to the audio sample, the predicted text samples including the text sample and one or more co-emitted text samples; (Beutnagel teaches selecting the N-best results of the system to ensure the correct pronunciation is selected. Beutnagel at 6:20 - 6:64. As such, the N-best results concept, in conjunction with the generation of multiple pronunciation candidates, demonstrates that Beutnagel’s candidate pronunciations are associated with an audio sample (e.g., the user specifying the word Peabody verbally at 4:12 - 4:26). Further, Beutnagel teaches processing audio samples using speech recognition as part of a process for generating predicted pronunciations of text samples and audio samples. Beutnagel at Fig. 2 and 4:12 – 7:50.)
outputting the text sample; (Beutnagel teaches returning the results (i.e., outputting) which are the N-best ranked answers. Beutnagel at 6:20 - 6:41.)
Beutnagel, however, does not teach updating the encoding of allowable pronunciations of the text sample….
In a similar field of endeavor (e.g., automatic speech recognition and modifying pronunciation databases), Fanty teaches updating the encoding of allowable pronunciations of the text sample based on pronunciations of the one or more co-emitted text samples. (Fanty teaches updating a pronunciation dictionary of allowable pronunciations (i.e., an encoding of allowable pronunciations) by comparing the pronunciations of one pattern with a series of replacement or alternative phonemes (i.e., co-emitted text entities). Fanty at 6:63 - 7:65.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel with the teachings of Fanty to provide updating the encoding of allowable pronunciations. Doing so would have improved the accuracy of the pronunciation dictionary as recognized by Fanty at 12:23 – 12:41.
Beutnagel-Fanty, however, does not alone teach for each corresponding co-emitted text sample of the one or more co-emitted text samples, determining, using a corresponding phoneme model associated with the corresponding text sample, one or more corresponding pronunciations of the corresponding co-emitted text sample;
In a similar field of endeavor (e.g., the generation of grammars for text transcriptions in speech processing), Aleksic teaches for each corresponding co-emitted text sample of the one or more co-emitted text samples, determining, using a corresponding phoneme model associated with the corresponding text sample, one or more corresponding pronunciations of the corresponding co-emitted text sample; (Aleksic teaches, for each candidate transcription, generating grammars (i.e., allowable pronunciations) and calculating confidence scores for the grammars for each of the candidate transcriptions. Aleksic at 9:5 - 9:26 and 11:56 - 12:16. Further, Aleksic teaches using a sequence-to-sequence neural network with specific layers that correspond to each sample utterance (i.e., a corresponding phoneme model associated with the corresponding text samples). Aleksic at 3:61 - 4:25.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel-Fanty with the teachings of Aleksic to provide the limitations of claim 14. Doing so would have improved detection of user intent and speech recognition as recognized by Aleksic at 8:11 – 8:16.
Beutnagel-Fanty in view of Aleksic (hereinafter Beutnagel-Fanty-Aleksic), however, does not alone teach encoding the corresponding pronunciations of the one or more co-emitted text samples as a convex hull in the phoneme space; and removing any pronunciations of the text sample that fall outside of the convex hull.
In a similar field of endeavor (e.g., speech processing and pronunciation analysis), Fox teaches encoding the corresponding pronunciations of the one or more co-emitted text samples as a convex hull in the phoneme space; (Fox teaches encoding multiple vowels used in dialects and generations into a convex hull (i.e., encoding corresponding pronunciations of a text sample into a phoneme space). Fox at Section II, subsection D, Vowel space area computations.)
and … removing any pronunciations of the text sample that fall outside of the convex hull. (Fox teaches including boundaries for the convex hull (i.e., excluding results outside the boundaries). Fox at Section II, subsection D, Vowel space area computations.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the speech recognition system of Aleksic, which uses context to bias results, by incorporating the convex hull boundary method of Fox to improve recognition accuracy by ensuring that the predicted pronunciation of a target word is acoustically consistent with the dialect or accent of the surrounding (co-emitted) words. Fox teaches that the convex hull effectively captures the "working space" of a speaker's dialect. Applying this boundary to filter "outlier" pronunciations in Aleksic's lattice would have been a predictable application of known geometric constraints to improve consistency.
Regarding claim 15, Beutnagel-Fanty-Aleksic-Fox teaches all the limitations of claim 14 as laid out above. Further, Beutnagel teaches the non-transitory computer-readable storage medium of Claim 14, wherein the encoding of allowable pronunciations is generated based on a measure of pronunciation certainty of the text sample. (Beutnagel teaches the pronunciations are evaluated based on the probability of the members of the text entity/word. Beutnagel at 5:5 - 5:56. As such, the generation of the pronunciations is based on a certainty (i.e., a probability).)
Regarding claim 18, Beutnagel-Fanty-Aleksic-Fox teaches all the limitations of claim 14 as laid out above. Further, Fanty teaches the non-transitory computer-readable storage medium of Claim 14, wherein the updating the encoding of allowable pronunciations of the text sample includes updating a predicted accuracy of allowable pronunciations of the text sample based on the pronunciations of the one or more co-emitted text samples. (Fanty teaches a general method for improving the accuracy of a pronunciation dictionary that includes updating allowable pronunciations by comparing the pronunciations of one pattern with a series of replacement or alternative phonemes. Fanty at 6:63 - 7:65. Further, Fanty teaches that the general method disclosed is for determining and improving the accuracy of a pronunciation dictionary (i.e., allowable pronunciations). Fanty at 6:63 - 7:65 and 12:22 - 12:39. As such, Fanty teaches updating the accuracy of a pronunciation dictionary based on pronunciations of co-emitted text samples (i.e., alternative or replacement pronunciations).)
Claims 3, 6 – 7, 10, 13, 16, and 19 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Beutnagel-Fanty-Aleksic-Fox as applied to claims 1, 8, and 14 above, and further in view of U.S. Patent No. 10,339,920 B2 to Jeffrey Penrod Adams et al. (hereinafter Adams).
Regarding claim 3, Beutnagel-Fanty-Aleksic-Fox teaches all the limitations of claim 1 as laid out above. Beutnagel-Fanty-Aleksic-Fox, however, does not teach all the limitations of claim 3.
In a similar field of endeavor (e.g., predicting pronunciations in speech recognition), Adams teaches the method of Claim 1, wherein the text sample is outputted based on syntactic context of the audio sample. (Adams teaches speech storage including data indicating which words may be used together in particular contexts (i.e., syntactic context of speech storage, or audio samples). Adams at 7:29 - 8:26.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel-Fanty with the teachings of Adams to provide the text sample is outputted based on syntactic context of the audio sample. Doing so would have improved the models and system performance as recognized by Adams at 7:29 – 8:26.
Regarding claim 6, Beutnagel-Fanty-Aleksic-Fox teaches all the limitations of claim 1 as laid out above. Beutnagel-Fanty-Aleksic-Fox, however, does not teach all the limitations of claim 6.
In a similar field of endeavor (e.g., predicting pronunciations in speech recognition), Adams teaches the method of Claim 1, wherein the updating the encoding of allowable pronunciations of the text sample includes generating allowable pronunciations using a grapheme-to-phoneme model. (Adams teaches using a grapheme-to-phoneme model as part of a method for determining expected (i.e., allowable) pronunciations of words. Adams at 11:29 - 11:46. Further, Beutnagel contemplates, or is aware of, the state of the art including grapheme-to-phoneme models, as Beutnagel references grapheme-to-phoneme research used in Beutnagel’s process. Beutnagel at 4:52 - 5:4.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel-Fanty with the teachings of Adams to provide updating the encoding of allowable pronunciations including generating allowable pronunciations using a grapheme-to-phoneme model. Doing so would have improved the probability of selecting the correct pronunciation as recognized by Adams at 9:66 – 10:52. As such, a person of ordinary skill in the art would have found it obvious to combine Adams' teachings of a grapheme-to-phoneme model with Beutnagel’s teachings.
Regarding claim 7, Beutnagel-Fanty-Aleksic-Fox in view of Adams (hereinafter Beutnagel-Fanty-Aleksic-Fox-Adams) teaches all the limitations of claim 6 as laid out above. Further, Adams teaches the method of Claim 6, wherein the pronunciations of the one or more co-emitted text samples are inputs to the grapheme-to-phoneme model. (Adams teaches processing speech commands using ASR and ranking the N-best list of results for ASR. Adams then teaches processing the ASR results into a lexicon that contains one or more expected pronunciations of each textual identifier determined by a grapheme-to-phoneme (G2P) process (i.e., a grapheme-to-phoneme model). Adams at 10:53 - 11:46. As such, the speech recognition results of an N-best ranked list (i.e., co-emitted text samples) are processed using a grapheme-to-phoneme model. Therefore, the N-best ranked list (i.e., co-emitted text samples) is an input to the grapheme-to-phoneme model.)
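For illustration only (this sketch is not part of the record, and the rules and words below are invented for the example, not drawn from Adams or any cited reference), the mapping of an N-best list through a grapheme-to-phoneme model can be modeled as follows:

```python
# Illustrative sketch: each entry of an N-best recognition list
# (the text sample plus co-emitted text samples) is fed to a toy
# grapheme-to-phoneme (G2P) mapping, yielding one expected
# pronunciation per candidate.

# Hypothetical single-letter G2P rules, for illustration only.
G2P_RULES = {"p": "P", "e": "IY", "a": "AH", "b": "B", "o": "OW",
             "d": "D", "y": "IY", "t": "T"}

def g2p(word):
    """Map each grapheme to a phoneme; unknown letters pass through uppercased."""
    return [G2P_RULES.get(ch, ch.upper()) for ch in word.lower()]

# Hypothetical N-best recognition results, highest-ranked first.
n_best = ["peabody", "peabodee", "peabotty"]

# The N-best list serves as the input to the G2P model, producing a
# lexicon of expected pronunciations keyed by candidate spelling.
pronunciations = {w: g2p(w) for w in n_best}
```

A production G2P model would of course use trained, context-sensitive rules rather than a per-letter table; the sketch only shows the data flow at issue, namely that the N-best candidates are inputs to the G2P model.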
Regarding claim 10, Beutnagel-Fanty teaches all the limitations of claim 8 as laid out above. Beutnagel-Fanty, however, does not teach all the limitations of claim 10.
In a similar field of endeavor (e.g., predicting pronunciations in speech recognition), Adams teaches the device of Claim 8, wherein the text sample is outputted based on syntactic context of the audio sample. (Adams teaches speech storage including data indicating which words may be used together in particular contexts (i.e., syntactic context of speech storage, or audio samples). Adams at 7:29 - 8:26.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel-Fanty with the teachings of Adams to provide the text sample is outputted based on syntactic context of the audio sample. Doing so would have improved the models and system performance as recognized by Adams at 7:29 – 8:26.
Regarding claim 13, Beutnagel-Fanty teaches all the limitations of claim 8 as laid out above. Beutnagel-Fanty, however, does not teach all the limitations of claim 13.
In a similar field of endeavor (e.g., predicting pronunciations in speech recognition), Adams teaches the device of Claim 8, wherein the processing circuitry is configured to update the encoding of allowable pronunciations of the text sample by generating allowable pronunciations using a grapheme-to-phoneme model. (Adams teaches using a grapheme-to-phoneme model as part of a method for determining expected (i.e., allowable) pronunciations of words. Adams at 11:29 - 11:46. Further, Beutnagel contemplates, or is aware of, the state of the art including grapheme-to-phoneme models, as Beutnagel references grapheme-to-phoneme research used in Beutnagel’s process. Beutnagel at 4:52 - 5:4.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel-Fanty with the teachings of Adams to provide updating the encoding of allowable pronunciations including generating allowable pronunciations using a grapheme-to-phoneme model. Doing so would have improved the probability of selecting the correct pronunciation as recognized by Adams at 9:66 – 10:52. As such, a person of ordinary skill in the art would have found it obvious to combine Adams' teachings of a grapheme-to-phoneme model with Beutnagel’s teachings.
Regarding claim 16, Beutnagel-Fanty teaches all the limitations of claim 14 as laid out above. Beutnagel-Fanty, however, does not teach all the limitations of claim 16.
In a similar field of endeavor (e.g., predicting pronunciations in speech recognition), Adams teaches the non-transitory computer-readable storage medium of Claim 14, wherein the text sample is outputted based on syntactic context of the audio sample. (Adams teaches speech storage including data indicating which words may be used together in particular contexts (i.e., syntactic context of speech storage, or audio samples). Adams at 7:29 - 8:26.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel-Fanty with the teachings of Adams to provide the text sample is outputted based on syntactic context of the audio sample. Doing so would have improved the models and system performance as recognized by Adams at 7:29 – 8:26.
Regarding claim 19, Beutnagel-Fanty teaches all the limitations of claim 14 as laid out above. Beutnagel-Fanty, however, does not teach all the limitations of claim 19.
In a similar field of endeavor (e.g., predicting pronunciations in speech recognition), Adams teaches the non-transitory computer-readable storage medium of Claim 14, wherein the updating the encoding of allowable pronunciations of the text sample includes generating allowable pronunciations using a grapheme-to-phoneme model. (Adams teaches using a grapheme-to-phoneme model as part of a method for determining expected (i.e., allowable) pronunciations of words. Adams at 11:29 - 11:46. Further, Beutnagel contemplates, or is aware of, the state of the art including grapheme-to-phoneme models, as Beutnagel references grapheme-to-phoneme research used in Beutnagel’s process. Beutnagel at 4:52 - 5:4.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Beutnagel-Fanty with the teachings of Adams to provide updating the encoding of allowable pronunciations including generating allowable pronunciations using a grapheme-to-phoneme model. Doing so would have improved the probability of selecting the correct pronunciation as recognized by Adams at 9:66 – 10:52. As such, a person of ordinary skill in the art would have found it obvious to combine Adams' teachings of a grapheme-to-phoneme model with Beutnagel’s teachings.
Regarding claim 20, Beutnagel-Fanty-Adams teaches all the limitations of claim 19 as laid out above. Further, Adams teaches the non-transitory computer-readable storage medium of Claim 19, wherein the pronunciations of the one or more co-emitted text samples are inputs to the grapheme-to-phoneme model. (Further, Adams teaches processing speech commands using ASR and ranking the N-best list of results for ASR. Adams then teaches processing the ASR results into a lexicon that contains one or more expected pronunciations of each textual identifier determined by a grapheme-to-phoneme (G2P) process (i.e., a grapheme-to-phoneme model). Adams at 10:53 - 11:46. As such, the speech recognition results of an N-best ranked list (i.e., co-emitted text samples) are processed using a grapheme-to-phoneme model. Therefore, the N-best ranked list (i.e., co-emitted text samples) is an input to the grapheme-to-phoneme model.)
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAMERON KENNETH YOUNG whose telephone number is (703)756-1527. The examiner can normally be reached Mon - Fri, 9:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CAMERON KENNETH YOUNG/Examiner, Art Unit 2655
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655