Prosecution Insights
Last updated: May 29, 2026
Application No. 18/179,931

Training a Voice Recognition Model Using Simulated Voice Samples

Final Rejection §103
Filed: Mar 07, 2023
Priority: Feb 24, 2023 (provisional 63/448,130)
Examiner: SERROU, ABDELALI
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: Comcast Cable Communications LLC
OA Round: 2 (Final)
Grant Probability: 74% (Favorable)
OA Rounds: 3-4
Est. Remaining: 2m
With Interview: 99%

Examiner Intelligence

Grants 74%, above average
Career Allowance Rate: 74% (437 granted / 589 resolved; +12.2% vs TC avg)
Interview Lift: +30.5% (strong) for resolved cases with interview
Typical timeline: 3y 5m avg prosecution; 17 currently pending
Career history: 610 total applications across all art units
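The career allowance rate is the share of resolved cases that ended in a grant. A minimal sketch of that arithmetic, assuming the page computes it as granted divided by resolved (it does not state its formula); `allowance_rate` is a hypothetical helper name:

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Percentage of resolved cases that were granted (hypothetical helper)."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


rate = allowance_rate(437, 589)
print(f"{rate:.1f}%")  # 74.2%, displayed on the page rounded to 74%
```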

Statute-Specific Performance

§101: 5.0% (-35.0% vs TC avg)
§103: 80.8% (+40.8% vs TC avg)
§102: 8.9% (-31.1% vs TC avg)
§112: 1.2% (-38.8% vs TC avg)
TC avg = Tech Center average estimate. Based on career data from 589 resolved cases.
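Each row pairs the examiner's rate with its difference from the Tech Center estimate, so subtracting the difference recovers the baseline being compared against. A quick check, assuming the differences are expressed in percentage points, shows that all four rows imply the same 40.0% estimate:

```python
# Examiner rate and reported difference vs. the TC average, in percentage points.
statute_rows = {
    "§101": (5.0, -35.0),
    "§103": (80.8, +40.8),
    "§102": (8.9, -31.1),
    "§112": (1.2, -38.8),
}

for statute, (rate, delta) in statute_rows.items():
    implied_tc_avg = round(rate - delta, 1)  # rounding strips floating-point noise
    print(f"{statute}: implied TC avg {implied_tc_avg}%")  # 40.0% for every row
```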

Office Action

§103
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
2. In response to the office action mailed on 09/30/2025, applicant filed an amendment on 02/11/2026, amending claims 1 and 6-9; canceling claims 12-21; and adding new claims 22-43. The pending claims are 1-11 and 22-43.

Response to Arguments
3. Applicant’s arguments with respect to the pending claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
4. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6-8, 11, 22, 27-29, 32-33, 38-40, and 43 are rejected under 35 U.S.C. 103 as being unpatentable over Rosenberg (US 2023/0013587) in view of Ogilvie (WO 03/017229), and further in view of Garman (US 2024/0029710).

As per claim 1, Rosenberg teaches receiving, by a computing device, a request for generating a plurality of simulated spoken phrases corresponding to a voice command ([0032], [0039], employing a text-to-speech (TTS) system 330 that is configured to generate, at each of a plurality of output steps, synthesized speech representations (e.g., synthetic speech) 332 for each of a plurality of unspoken training text utterances); generating, by the computing device and based on the request, the plurality of simulated spoken phrases corresponding to the voice command ([0073], generating, using a text-to-speech (TTS) system, a corresponding synthetic speech representation for each textual utterance 320 of the received training data); and training a voice recognition model based on the plurality of simulated spoken phrases ([0007], [0037], and [0073], wherein the audio encoder of the ASR model is trained based on the synthetic speech representations generated for the unspoken textual utterances; [0008], training data includes transcribed non-synthetic speech utterances and synthetic speech representations). Rosenberg may not explicitly disclose wherein the plurality of simulated spoken phrases is based on: a first percentage value associated with a desired distribution of a first regional accent; and a second percentage value associated with a desired distribution of a second regional accent.

Ogilvie, in the same field of endeavor, teaches a translation system capable of synthesizing translated text and speaking the synthesized speech (page 3, line 30; page 9, lines 9-11; and page 27, lines 12-16) that lets the user iteratively select displayed text for translation from one language to the other (Abstract; page 2, lines 25-29), and that also enables a user to select the percentage of a foreign language to be included within a given lesson (page 4, lines 20-24). Therefore, it would have been obvious at the time the application was filed to add Ogilvie’s feature of selecting the percentage of a language to the system of Rosenberg, in order to select the percentage of a first language and the percentage of a second language for a plurality of simulated spoken phrases.

Rosenberg in view of Ogilvie may not explicitly disclose that the first and second percentages are associated with a first regional accent and a second regional accent. However, Garman, in the same field of endeavor, teaches systems and methods for synthesizing speech in any voice, in any language, and in any accent ([0007]). Therefore, it would have been obvious at the time the application was filed to add Garman’s above feature to the system of Rosenberg in view of Ogilvie, in order to generate a plurality of simulated spoken phrases for a lesson based on a first percentage value associated with a desired distribution of a first regional accent, and a second percentage value associated with a desired distribution of a second regional accent, as claimed. This would create speech that is more natural, localized, and emotionally resonant.

As per claim 6, Rosenberg teaches causing output of a user interface screen ([0032]).
Rosenberg may not explicitly disclose receiving, via the user interface screen, the first percentage value, wherein the first percentage value is further associated with a desired distribution of a first language; and receiving, via the user interface screen, the second percentage value, wherein the second percentage value is further associated with a desired distribution of a second language. Ogilvie, in the same field of endeavor, teaches a translation system capable of synthesizing translated text and speaking the synthesized speech (page 3, line 30; page 9, lines 9-11; and page 27, lines 12-16) that lets the user iteratively select, via a user interface, displayed text for translation from one language to the other (Abstract; page 2, lines 25-29), and that also enables a user to select the percentage of a foreign language to be included within a given lesson (page 4, lines 20-24). Therefore, it would have been obvious at the time the application was filed to add Ogilvie’s feature of selecting the percentage of a language to the system of Rosenberg, in order to receive, via the user interface screen, the first percentage value, wherein the first percentage value is further associated with a desired distribution of a first language; and receive, via the user interface screen, the second percentage value, wherein the second percentage value is further associated with a desired distribution of a second language. This would create speech that is more natural, localized, and emotionally resonant.
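Claims 1, 6, and 7 recite generating the simulated spoken phrases based on a first and a second percentage value tied to desired accent (or language) distributions. The office action does not say how such values would be applied. One plausible reading, sketched here purely for illustration, is to split a requested number of phrases across accents in proportion to the percentages. The function name, the accent labels, and the largest-remainder rounding rule are all assumptions, not taken from the application or from Rosenberg, Ogilvie, or Garman.

```python
from math import floor


def allocate_phrases(total: int, percentages: dict[str, float]) -> dict[str, int]:
    """Split `total` phrases across accents by percentage (hypothetical helper).

    Uses the largest-remainder method so the counts always sum to `total`.
    """
    if abs(sum(percentages.values()) - 100.0) > 1e-9:
        raise ValueError("percentages must sum to 100")
    exact = {k: total * p / 100.0 for k, p in percentages.items()}
    counts = {k: floor(v) for k, v in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining phrases to the accents with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


print(allocate_phrases(7, {"accent_a": 60.0, "accent_b": 40.0}))
# {'accent_a': 4, 'accent_b': 3}
```

Largest-remainder rounding is one simple way to honor the requested proportions while keeping the total fixed; other rounding rules would be equally consistent with the claim wording as quoted.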

As per claim 7, Rosenberg in view of Ogilvie teaches causing output of a user interface screen (Rosenberg, [0032]), and further teaches a translation system capable of synthesizing translated text and speaking the synthesized speech (Ogilvie, page 3, line 30; page 9, lines 9-11; and page 27, lines 12-16) that lets the user iteratively select, via a user interface, displayed text for translation from one language to the other (Ogilvie, Abstract; page 2, lines 25-29), and that also enables a user to select the percentage of a foreign language to be included within a given lesson (Ogilvie, page 4, lines 20-24). Rosenberg in view of Ogilvie may not explicitly disclose receiving, via the user interface screen, the first percentage value indicating the desired distribution of the first regional accent; and receiving, via the user interface screen, the second percentage value indicating the desired distribution of the second regional accent. However, Garman, in the same field of endeavor, teaches systems and methods for synthesizing speech in any voice, in any language, and in any accent ([0007]). Therefore, it would have been obvious at the time the application was filed to add Garman’s above feature to the system of Rosenberg in view of Ogilvie, in order to receive, via the user interface screen, the first percentage value indicating the desired distribution of the first regional accent; and receive, via the user interface screen, the second percentage value indicating the desired distribution of the second regional accent. This would create speech that is more natural, localized, and emotionally resonant.

As per claim 8, Rosenberg teaches sending the plurality of simulated spoken phrases to a voice recognition model of a voice-enabled system; receiving, from the voice recognition model, voice recognition results for the simulated spoken phrases; and revising, based on the voice recognition results, one or more operation parameters of the voice recognition model ([0047]).
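Claim 8 recites sending the simulated phrases to a voice recognition model, receiving recognition results, and revising operation parameters based on those results, and claim 11 adds an expected result associated with the phrases. Neither the claim language quoted here nor the office action specifies how results are scored against expectations. Word error rate (WER), a standard ASR metric, is one common choice; the sketch below computes it with a word-level Levenshtein distance. Treating WER as the scoring step is an assumption made only for illustration.

```python
def word_error_rate(expected: str, recognized: str) -> float:
    """Word-level edit distance divided by the number of expected words."""
    ref = expected.lower().split()
    hyp = recognized.lower().split()
    if not ref:
        raise ValueError("expected text must contain at least one word")
    # prev[j] holds the edit distance between the current ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            # Deletion of a ref word, insertion of a hyp word, or substitution.
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitution)
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("what is the weather in new york city",
                      "what is the whether in new york"))  # 0.25
```

Here one substitution ("weather" heard as "whether") and one deletion ("city") over eight expected words give 2/8.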

As per claim 11, Rosenberg teaches generating, by the computing device and based on the request, an expected result associated with the plurality of simulated spoken phrases ([0040]).

As per claims 22, 27-29, and 32, system claims 22, 27-29, and 32 and method claims 1, 6-8, and 11 are related as apparatus and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claims 22, 27-29, and 32 are similarly rejected under the same rationale as applied above with respect to method claims 1, 6-8, and 11. Furthermore, Rosenberg teaches one or more processors and memory storing instructions thereon, as claimed ([0077]).

As per claims 33, 38-40, and 43, Rosenberg teaches a computer-readable medium ([0078]). The remaining steps are rejected under the same rationale as applied to the method steps of rejected claims 1, 6-8, and 11.

Claims 2-4, 23-25, and 34-36 are rejected under 35 U.S.C. 103 as being unpatentable over Rosenberg in view of Ogilvie and Garman, and further in view of Katsumata (US 2020/0177746).

As per claims 2, 23, and 34, Rosenberg teaches receiving, by the computing device, a first text phrase and generating synthetic speech based on the first text phrase and on one or more linguistic databases ([0032], [0039], [0073]). Rosenberg in view of Ogilvie and Garman may not explicitly disclose wherein the plurality of simulated spoken phrases comprises grammatical variants of the first text phrase. Katsumata, in the same field of endeavor, teaches generating synthetic speech based on textual data and one or more linguistic databases, wherein the plurality of simulated spoken phrases comprises grammatical variants of the first text phrase ([0096]-[0098]). Therefore, it would have been obvious at the time the application was filed to use Katsumata’s above features with the system of Rosenberg in view of Ogilvie and Garman, in order to generate natural, expressive, and human-like speech.
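Claims 2 and 3 concern deriving grammatical variants of a first text phrase from linguistic or synonym databases. As a rough illustration only (the office action cites Katsumata [0096]-[0098] without describing its mechanism), the sketch below expands the example phrase attributed to Rosenberg [0032] using a tiny in-memory synonym table. The table contents and the `expand_phrase` name are hypothetical stand-ins for a real synonym database; the entity slot "in New York City" is deliberately left unchanged.

```python
from itertools import product

# Hypothetical stand-in for a synonym database: each slot lists interchangeable wordings.
SYNONYMS = {
    "what is": ["what is", "what's", "tell me"],
    "the weather": ["the weather", "the forecast"],
}


def expand_phrase(template: list[str]) -> list[str]:
    """Return every variant formed by swapping each slot for one of its synonyms."""
    # Slots missing from the table (e.g. the entity) are kept verbatim.
    options = [SYNONYMS.get(slot, [slot]) for slot in template]
    return [" ".join(choice) for choice in product(*options)]


variants = expand_phrase(["what is", "the weather", "in New York City"])
print(len(variants))  # 6
print(variants[:2])
```

A production system would draw on a much larger lexicon and check the grammaticality of each combination; the Cartesian product here simply shows how a few substitutions multiply into many training phrases.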

As per claims 3, 24, and 35, Rosenberg teaches receiving, by the computing device, a first text phrase and generating synthetic speech based on a syntactic database ([0032], [0034], [0039], [0073]). Rosenberg in view of Ogilvie and Garman may not explicitly disclose automatically generating, by the computing device, based on the first text phrase and based on a synonym database, the plurality of simulated spoken phrases, wherein the plurality of simulated spoken phrases comprises grammatical variants of the first text phrase. Katsumata, in the same field of endeavor, teaches generating synthetic speech, namely the plurality of simulated spoken phrases, based on a synonym database and a syntactic database, wherein the plurality of simulated spoken phrases comprises grammatical variants of the first text phrase ([0096]-[0098]). Therefore, it would have been obvious at the time the application was filed to use Katsumata’s above features with the system of Rosenberg in view of Ogilvie and Garman, in order to enable the model to select from a wider range of appropriate verbal options and to improve the naturalness, contextual accuracy, and diversity of the generated speech.

As per claims 4, 25, and 36, Rosenberg teaches receiving, by the computing device, a first text phrase; sending the first text phrase to a natural language understanding (NLU) process; receiving, from the NLU process, information indicating entity and intent terms based on the first text phrase; and automatically generating, by the computing device, the plurality of simulated spoken phrases based on the first text phrase, the entity and intent terms, and one or more linguistic databases ([0032], wherein a received natural language phrase “What is the weather in New York City?” is processed by a natural language understanding (NLU) module to extract the corresponding intent and entities).
Rosenberg in view of Ogilvie and Garman may not explicitly disclose wherein the plurality of simulated spoken phrases comprises grammatical variants of the first text phrase. Katsumata, in the same field of endeavor, teaches generating synthetic speech based on textual data and one or more linguistic databases, wherein the plurality of simulated spoken phrases comprises grammatical variants of the first text phrase ([0096]-[0098]). Furthermore, Katsumata teaches identifying the claimed entities and intents (Katsumata, [0039], [0061], [0096]). Therefore, it would have been obvious at the time the application was filed to use Katsumata’s above features with the system of Rosenberg in view of Ogilvie and Garman, in order to generate natural, expressive, and human-like speech.

Claims 5, 26, and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Rosenberg in view of Ogilvie and Garman, and further in view of Katae (US 2011/0060590).

As per claims 5, 26, and 37, Rosenberg in view of Ogilvie and Garman may not explicitly disclose causing output of a user interface screen; and receiving, via a first field of the user interface screen, a value indicating a quantity of desired simulated spoken phrases. Katae, in the same field of endeavor, teaches a synthetic speech device that displays on the screen a text box in accordance with the desired number of characters to be entered ([0064]). Therefore, it would have been obvious at the time the application was filed to use the graphical user interface of Katae with the system of Rosenberg in view of Ogilvie and Garman, in order to indicate the quantity of desired simulated spoken phrases. This would provide greater control and consistency over the generated audio and optimize system performance.

Claims 10, 31, and 42 are rejected under 35 U.S.C. 103 as being unpatentable over Rosenberg in view of Ogilvie and Garman, and further in view of Ye (US 2016/0283839).
Rosenberg in view of Ogilvie and Garman may not explicitly disclose receiving updates to a future program schedule, and wherein the generating the plurality of simulated spoken phrases is performed automatically based on the updates to the future program schedule. Ye, in the same field of endeavor, teaches a graphical user interface (GUI) of a text-to-speech (TTS) system, wherein a user is provided a series of buttons which allows them to select which dictionary the instruction should update…. By clicking the “OK To Edit” button (Fig. 6), the selected dictionary is updated so that future instances of the text in the “Wrong text” field 620 can be automatically replaced with the correction ([0099], [0111]). Therefore, it would have been obvious at the time the application was filed to use Ye’s above features with the system of Rosenberg in view of Ogilvie and Garman, in order to provide timely, accurate information while increasing accessibility and flexibility for users.

Allowable Subject Matter
5. Claims 9, 30, and 41 are objected to as being dependent upon a rejected base claim but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The prior art does not teach sending the plurality of simulated spoken phrases to a voice recognition model of a voice-enabled system; receiving, from the voice recognition model, screen images of voice recognition results for the simulated spoken phrases; performing optical character recognition, on the screen images, to generate resulting text; comparing the resulting text with expected text associated with the plurality of simulated spoken phrases; and revising, based on the comparing, one or more operation parameters of the voice recognition model.

Conclusion
6. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDELALI SERROU whose telephone number is (571)272-7638. The examiner can normally be reached M-F 9 AM - 5 PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /ABDELALI SERROU/Primary Examiner, Art Unit 2659

Prosecution Timeline

Mar 07, 2023: Application Filed
Sep 30, 2025: Non-Final Rejection mailed (§103)
Feb 11, 2026: Response Filed
Apr 23, 2026: Final Rejection mailed (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12632665: CONTEXT-BASED NATURAL LANGUAGE PROCESSING (1y 11m to grant; granted May 19, 2026)
Patent 12602544: INFORMATION PROCESSING APPARATUS, OPERATION METHOD, AND RECORDING MEDIUM (2y 2m to grant; granted Apr 14, 2026)
Patent 12596875: TECHNIQUES FOR ADAPTIVE LARGE LANGUAGE MODEL USAGE (2y 6m to grant; granted Apr 07, 2026)
Patent 12597417: EXPORTING MODULAR ENCODER FEATURES FOR STREAMING AND DELIBERATION ASR (2y 5m to grant; granted Apr 07, 2026)
Patent 12596889: GENERATION OF NATURAL LANGUAGE (NL) BASED SUMMARIES USING A LARGE LANGUAGE MODEL (LLM) AND SUBSEQUENT MODIFICATION THEREOF FOR ATTRIBUTION (1y 10m to grant; granted Apr 07, 2026)
Based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 74%
With Interview: 99% (+30.5%)
Median Time to Grant: 3y 5m (~2m remaining)
PTA Risk: Moderate
Based on 589 resolved cases by this examiner. Grant probability derived from career allowance rate.
