DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments with respect to the 35 U.S.C. 101 abstract idea rejection of claims 21-35 have been considered and are persuasive in view of the amendments; the rejection has been withdrawn.
The double patenting rejection is maintained.
Applicant's arguments with respect to 35 U.S.C. 102 with regard to claims 21, 28 and 35 have been considered but are moot in view of the new grounds of rejection necessitated by the amendments. See the detailed rejection below.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 21-35 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-14 of U.S. Patent No. 12,020,687. Although the claims at issue are not identical, they are not patentably distinct from each other because of the following:
Pending US Application No. 18/675,792:
a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme from the text converter and an identity of a speaker and to generate and enhance acoustic features for each of the at least one phoneme, wherein the voice patterns comprise a plurality of production components and a plurality of acoustic components arranged in a matrix, each of the plurality of production components contributing a selected amount to each of the acoustic components, such that the selected amount contributed is present in the matrix for each pair of production component and acoustic component, and wherein the enhanced acoustic features comprise at least one of spectral enhancement or focal enhancement; and
a decoder adapted to receive the generated acoustic features from the machine-learning model and to generate a speech signal simulating a voice of the identified speaker in a language.
US Patent No. 12,020,687:
a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive at least one phoneme and an identity of a speaker and to generate and enhance acoustic features for each phoneme, wherein the voice patterns comprise . . .
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 21, 23, 26-28, 30 and 32-35 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Baumgartner et al. (US 6,463,412) in view of De Leon et al. (US 9,865,253).
Regarding claims 21, 28 and 35:
Baumgartner teaches a speech conversion system comprising: a memory for storing a plurality of phonemes ([Summary of the Invention] [col. 2 lines 6-13] the voice dictionaries consist of an array of symbolic representations for phonemes);
a text converter adapted to convert input text to at least one phoneme selected from the plurality of phonemes stored in the memory ([col. 4 lines 9-17] The voice recognition device 140 breaks down the input voice pattern into symbolic representations of the phonemes that make up the input voice pattern which are then forwarded to the voice dictionary interface 150);
wherein the enhanced acoustic features comprise at least one of spectral enhancement or focal enhancement ([col. 4 lines 42-55] the application of the input voice characteristics extracted from the input voice pattern may be performed using digital filtering techniques); and
a decoder adapted to receive the generated acoustic features from the machine-learning model and to generate a speech signal simulating a voice of the identified speaker in a language ([col. 4 lines 42-55] [col. 5 lines 60-67] speech output generator generates the output speech signals using target speaker segments and user can designate a different output voice).
The difference between the prior art and the claimed invention is that Baumgartner does not explicitly teach a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme from the text converter and an identity of a speaker and to generate and enhance acoustic features for each of the at least one phoneme, wherein the voice patterns comprise a plurality of production components and a plurality of acoustic components arranged in a matrix, each of the plurality of production components contributing a selected amount to each of the acoustic components, such that the selected amount contributed is present in the matrix for each pair of production component and acoustic component.
De Leon teaches a machine-learning model storing voice patterns for a plurality of individuals ([col. 8 lines 15-31] ASR; during the training stage, each speaker’s enrollment speech is processed; the mean of the IQRs for each enrollment speaker is computed and the means are stored) and
adapted to receive the at least one phoneme from the text converter and an identity of a speaker ([col. 4 line 59 to col. 5 line 4] [col. 8 lines 39-50] an automatic speech recognizer can be used to segment the utterance into individual phonemes; a user provides a claim of identity) and
to generate and enhance acoustic features for each of the at least one phoneme ([col. 8 lines 15-31] the pitch pattern feature vector is computed for each phoneme),
wherein the voice patterns comprise a plurality of production components and a plurality of acoustic components arranged in a matrix, each of the plurality of production components contributing a selected amount to each of the acoustic components, such that the selected amount contributed is present in the matrix for each pair of production component and acoustic component ([Figs. 1A-1B & 5] [col. 6 lines 5-19] extract the connected components; processing includes determining a bounding box and area of a connected component which are then used to filter out very small and irregularly-shaped components; the small and irregularly-shaped connected components are artifacts of the speech signal and not useful in feature extraction; the resulting connected components are then analyzed and used to compute the following statistics-based features: mean pitch stability, μs; mean time stability bandwidth, μB; and jitter, J; determines parameters on a per-connected component basis and then computes statistics over the connected components (connected components = production components; computed feature components = acoustic components; the per-component parameters/feature vectors form a matrix of contributions)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Baumgartner with the teachings of De Leon by modifying the high performance voice transformation apparatus and method as taught by Baumgartner to include a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme from the text converter and an identity of a speaker and to generate and enhance acoustic features for each of the at least one phoneme, wherein the voice patterns comprise a plurality of production components and a plurality of acoustic components arranged in a matrix, each of the plurality of production components contributing a selected amount to each of the acoustic components, such that the selected amount contributed is present in the matrix for each pair of production component and acoustic component, as taught by De Leon, for the benefit of classifying the speech signal as human or synthetic based on the extracted features (De Leon [Abstract]).
Regarding claims 23 and 30:
Baumgartner further teaches the system of claim 21, wherein the machine-learning model comprises a neural network model ([col. 4 line 4] neural network).
Regarding claims 25 and 32:
Baumgartner further teaches the system of claim 21, wherein the generated acoustic features include accent acoustic features and the generated speech signal further simulates a voice of the identified speaker in a language and in an accent ([col. 6 line 66 to col. 7-8] [col. 9 line 1] creating a target speaker speak from a sentence based on phonemes in the target speaker language and accent).
Regarding claims 26 and 33:
Baumgartner further teaches the system of claim 25, wherein the accent corresponds to a native accent of the identified speaker ([col. 1 lines 38-43] the thick accent of a player among other players implies user accent and identity).
Regarding claims 27 and 34:
Baumgartner further teaches the system of claim 21, wherein the plurality of production components comprise phones, coarticulation, prosody from linguistic features, and prosody from extra-linguistic features ([col. 2 lines 6-13] phonemes, an array of symbolic representations), and the plurality of acoustic components comprise spectrum power, duration, and pitch ([col. 3 lines 49-52] voice characteristics include speech volume, pitch, pause lengths and the like).
Claims 22 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Baumgartner et al. (US 6,463,412) in view of De Leon et al. (US 9,865,253) and further in view of Gupta (US 2024/0420681).
Regarding claims 22 and 29:
Baumgartner and De Leon teach all the limitations in claim 21. The difference between the prior art and the claimed invention is that neither Baumgartner nor De Leon explicitly teaches wherein the at least one phoneme comprises a phoneme of the International Phonetic Alphabet and silence and breath.
Gupta teaches wherein the at least one phoneme comprises a phoneme of the International Phonetic Alphabet and silence and breath ([0006] [0054] silences/pauses, breaths and international phonetic alphabet (IPA)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Baumgartner with the teachings of Gupta by modifying the high performance voice transformation method and device as taught by Baumgartner to include wherein the at least one phoneme comprises a phoneme of the International Phonetic Alphabet and silence and breath, as taught by Gupta, for the benefit of providing a unified solution that improves the accuracy, naturalness, and emotional consistency of synthesized speech across different languages (Gupta [0013]).
Claims 24 and 31 are rejected under 35 U.S.C. 103 as being unpatentable over Baumgartner et al. (US 6,463,412) in view of De Leon et al. (US 9,865,253) and further in view of Endo et al. (JP 2010286608).
Regarding claims 24 and 31:
Baumgartner and De Leon teach all the limitations in claim 21. The difference between the prior art and the claimed invention is that neither Baumgartner nor De Leon explicitly teaches wherein spectral enhancement comprises increasing a peak of a spectral envelope or decreasing a trough of the spectral envelope and focal enhancement comprises emphasizing the difference between a first frame and a second frame.
Endo teaches wherein spectral enhancement comprises increasing a peak of a spectral envelope or decreasing a trough of the spectral envelope and focal enhancement comprises emphasizing the difference between a first frame and a second frame ([Appendix 1] an envelope amplitude spectrum broadening unit that broadens the envelope amplitude spectrum by extending a frequency band of the envelope amplitude spectrum to a second frequency band different from the first frequency band).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Baumgartner with the teachings of Endo by modifying the high performance voice transformation method and device as taught by Baumgartner to include wherein spectral enhancement comprises increasing a peak of a spectral envelope or decreasing a trough of the spectral envelope and focal enhancement comprises emphasizing the difference between a first frame and a second frame, as taught by Endo, for the benefit of improving the quality of reproduced audio by artificially extending a frequency band including an audio signal (Endo [Background-Art]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571)270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
SHREYANS A. PATEL
Primary Examiner
Art Unit 2653
/SHREYANS A PATEL/Examiner, Art Unit 2659