Prosecution Insights
Last updated: April 19, 2026
Application No. 18/262,242

METHODS AND SYSTEMS FOR MODIFYING SPEECH GENERATED BY A TEXT-TO-SPEECH SYNTHESISER

Status: Final Rejection (§103)
Filed: Jul 20, 2023
Examiner: JACKSON, JAKIEDA R
Art Unit: 2657
Tech Center: 2600 (Communications)
Assignee: Spotify AB
OA Round: 2 (Final)

Grant Probability: 74% (Favorable)
Expected OA Rounds: 3-4
Estimated Time to Grant: 3y 0m
Grant Probability With Interview: 89%

Examiner Intelligence

Career Allow Rate: 74% (669 granted / 905 resolved), +11.9% vs TC average (above average)
Interview Lift: +15.4% among resolved cases with an interview (strong)
Typical Timeline: 3y 0m average prosecution; 35 applications currently pending
Career History: 940 total applications across all art units
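The panel's headline figures follow from simple arithmetic on the raw counts shown above. A minimal sketch of that arithmetic (variable names are illustrative, not from the tool):

```python
# Reproduce the dashboard's examiner statistics from the raw counts
# (669 granted out of 905 resolved cases).
granted = 669
resolved = 905

allow_rate = granted / resolved            # career allow rate as a fraction
tc_delta_points = 11.9                     # stated lift vs Tech Center average, in points
tc_average = allow_rate * 100 - tc_delta_points  # implied TC 2600 average

print(f"Career allow rate: {allow_rate:.1%}")         # 73.9%, displayed as 74%
print(f"Implied TC 2600 average: {tc_average:.1f}%")  # about 62.0%
```

This confirms the rounded 74% shown in the panel and implies a Tech Center average near 62%.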

Statute-Specific Performance

§101: 25.8% (-14.2% vs TC avg)
§103: 42.5% (+2.5% vs TC avg)
§102: 21.8% (-18.2% vs TC avg)
§112: 3.5% (-36.5% vs TC avg)

Tech Center averages are estimates. Based on career data from 905 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

Applicants argue that the prior art cited fails to teach the claims as amended. Applicants' arguments are persuasive, but are moot in view of new grounds of rejection.

Response to Arguments

Applicants argue that the prior art cited fails to teach the claims as amended. Applicants' arguments are persuasive, but are moot in view of new grounds of rejection.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6-7, 9-12, 15, 17 and 48-51 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (PGPUB 2021/0142783), hereinafter referenced as Kim, in view of Audfray et al. (PGPUB 2021/0176588), hereinafter referenced as Audfray.

Regarding claims 1 and 17, Kim discloses a method and system, hereinafter referenced as a method, of modifying a speech signal generated by a text-to-speech synthesiser, the method comprising: receiving a text signal (input text; p. 0124-0126); generating a speech signal from the text signal (text-to-speech; p. 0119-0123); deriving a control feature vector, wherein the control feature vector represents modifications to the speech signal (vector; p. 0129-0138); inputting the control feature vector in the text-to-speech synthesiser, wherein the text-to-speech synthesiser is configured to generate a modified speech signal using the control feature vector (synthesis; p. 0135-0136); and outputting the modified speech signal, wherein: the text-to-speech synthesiser comprises a first model configured to generate the speech signal and a controllable model configured to generate the modified speech signal (change speech style; p. 0102, 0112, 0122); and the controllable model is trained using speech signals generated by the first model (train; p. 0059). Kim does not specifically teach that the controllable model is distinct from the first model.

Audfray discloses a method wherein a first model is generated based on the audio model components and the second model comprises a modified audio signal, which is two distinct models (p. 0011-0014, 0104-0110), to assist with storing, organizing and maintaining acoustic data. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above, to assist with providing synthetic audio.

Regarding claims 2 and 48, Kim discloses a method wherein deriving the control feature vector comprises: analysing the speech signal (analyze data; p. 0143); obtaining a first feature vector from the analysed speech signal (vector; p. 0129-0138); obtaining a user input (user input; p. 0063, 0074-0076); and modifying the first feature vector using the user input to obtain the control feature vector (vector; p. 0129-0138).

Regarding claims 3 and 49, Kim discloses a method wherein the user input comprises a reference speech signal (reference speech; p. 0102).

Regarding claims 6 and 50, Kim discloses a method wherein the controllable model comprises an encoder module, a decoder module, and an attention module linking the encoder module to the decoder module (p. 0126-0128, 0131, 0135-0136).
Regarding claims 7 and 51, Kim discloses a method wherein the first feature vector is inputted at the decoder module (decoder; p. 0126-0128, 0131, 0135-0136).

Regarding claim 9, Kim discloses a method wherein the first feature vector represents one of the properties of pitch or intensity (p. 0133-0136).

Regarding claim 10, Kim discloses the method further comprising deriving a second feature vector, wherein the second feature vector represents features of the generated speech signal that are used to generate the modified speech signal (change speech style; p. 0102, 0112, 0122); and inputting the second feature vector in the text-to-speech synthesiser, wherein the second feature vector is obtained from the analysed speech signal (synthesis; p. 0135-0136).

Regarding claim 11, Kim discloses a method wherein: the controllable model comprises an encoder module, a decoder module, and an attention module linking the encoder module to the decoder module (p. 0126-0128, 0131, 0135-0136), and the second feature vector is inputted at the decoder module of the controllable model (p. 0126-0128, 0131, 0135-0136).

Regarding claim 12, Kim discloses a method wherein a representation of the speech signal is inputted at the encoder module of the controllable model (p. 0126-0128, 0131, 0135-0136).

Regarding claim 15, Kim discloses a method wherein the first model comprises an encoder module, a decoder module, and an attention module linking the encoder module to the decoder module (p. 0126-0128, 0131, 0135-0136).

Claims 8, 13-14, 16 and 52 are rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Audfray, and in further view of Arik et al. (PGPUB 2018/0336880), hereinafter referenced as Arik.

Regarding claims 8 and 52, Kim and Audfray disclose a method as described above, but do not specifically teach wherein the first feature vector is modified by a pre-net before being inputted at the decoder module of the controllable model.
Arik discloses a method wherein the first feature vector is modified by a pre-net before being inputted at the decoder module of the controllable model (p. 0110), to assist with modeling and data analysis. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above, to improve computer performance, features and uses.

Regarding claim 13, it is interpreted and rejected for similar reasons as set forth above. In addition, Kim discloses a method wherein the method further comprises deriving a modified alignment from the user input, wherein the modified alignment indicates modifications to a timing of the speech signal (p. 0052-0065, 0111).

Regarding claim 14, it is interpreted and rejected for similar reasons as set forth above. In addition, Kim discloses a method wherein the modified alignment is inputted at the attention module of the controllable model (p. 0107-0110).

Regarding claim 16, Kim discloses a method, the method further comprising inputting the third feature vector in the encoder module of the controllable model (p. 0126-0128, 0131, 0135-0136). In addition, Arik discloses a method comprising deriving a third feature vector from the attention module of the first model, wherein the third feature vector corresponds to a timing of phonemes of the received text signal (p. 0052-0065, 0111).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. This information has been detailed in the PTO-892 attached (Notice of References Cited). Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAKIEDA R JACKSON, whose telephone number is (571) 272-7619. The examiner can normally be reached Mon-Fri 6:30a-2:30p. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JAKIEDA R JACKSON/
Primary Examiner, Art Unit 2657

Prosecution Timeline

Jul 20, 2023
Application Filed
Jul 09, 2025
Non-Final Rejection — §103
Oct 09, 2025
Interview Requested
Nov 03, 2025
Applicant Interview (Telephonic)
Nov 04, 2025
Response Filed
Nov 06, 2025
Examiner Interview Summary
Feb 12, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603079
PROVIDING A REPOSITORY OF AUDIO FILES HAVING PRONUNCIATIONS FOR TEXT STRINGS TO PROVIDE TO A SPEECH SYNTHESIZER
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12603088
TRAINING A DEVICE SPECIFIC ACOUSTIC MODEL
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12598092
SYSTEMS, METHODS, AND APPARATUS FOR NOTIFYING A TRANSCRIBING AND TRANSLATING SYSTEM OF SWITCHING BETWEEN SPOKEN LANGUAGES
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12597427
CONFIGURABLE NATURAL LANGUAGE OUTPUT
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12597418
AUDIO SIGNAL PROCESSING DEVICE AND METHOD FOR SYNCHRONIZING SPEECH AND TEXT BY USING MACHINE LEARNING MODEL
Granted Apr 07, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 74%
With Interview: 89% (+15.4%)
Median Time to Grant: 3y 0m
PTA Risk: Moderate

Based on 905 resolved cases by this examiner. Grant probability derived from career allow rate.
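The with-interview figure appears to be the base grant probability plus the stated interview lift. A minimal sketch, assuming the dashboard simply adds the lift in percentage points (the combination rule is an assumption, not documented by the tool):

```python
# Combine the base grant probability with the interview lift.
# Both numbers come from the projection panel above.
base_probability = 74.0   # career allow rate, in percent
interview_lift = 15.4     # stated lift among interviewed cases, in points

with_interview = base_probability + interview_lift
print(f"With interview: {with_interview:.0f}%")  # 89%
```

74.0 + 15.4 = 89.4, which rounds to the 89% shown in the panel.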
