Prosecution Insights
Last updated: April 19, 2026
Application No. 18/262,242

METHODS AND SYSTEMS FOR MODIFYING SPEECH GENERATED BY A TEXT-TO-SPEECH SYNTHESISER

Status: Final Rejection (§103)
Filed: Jul 20, 2023
Examiner: JACKSON, JAKIEDA R
Art Unit: 2657
Tech Center: 2600 (Communications)
Assignee: Spotify AB
OA Round: 2 (Final)

Grant Probability: 74% (Favorable)
Expected OA Rounds: 3-4
Estimated Time to Grant: 3y 0m
Grant Probability With Interview: 89%

Examiner Intelligence

Career Allow Rate: 74% (669 granted / 905 resolved), +11.9% vs TC average (above average)
Interview Lift: +15.4% among resolved cases with an interview (strong)
Typical Timeline: 3y 0m average prosecution; 35 applications currently pending
Career History: 940 total applications across all art units
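The panel's headline figures follow from simple arithmetic on the raw counts shown above. A minimal sketch of that arithmetic (variable names are illustrative, not from the tool):

```python
# Reproduce the dashboard's examiner statistics from the raw counts
# (669 granted out of 905 resolved cases).
granted = 669
resolved = 905

allow_rate = granted / resolved            # career allow rate as a fraction
tc_delta_points = 11.9                     # stated lift vs Tech Center average, in points
tc_average = allow_rate * 100 - tc_delta_points  # implied TC 2600 average

print(f"Career allow rate: {allow_rate:.1%}")         # 73.9%, displayed as 74%
print(f"Implied TC 2600 average: {tc_average:.1f}%")  # about 62.0%
```

This confirms the rounded 74% shown in the panel and implies a Tech Center average near 62%.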

Statute-Specific Performance

§101: 25.8% (-14.2% vs TC avg)
§103: 42.5% (+2.5% vs TC avg)
§102: 21.8% (-18.2% vs TC avg)
§112: 3.5% (-36.5% vs TC avg)

Tech Center averages are estimates. Based on career data from 905 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

Applicants argue that the prior art cited fails to teach the claims as amended. Applicants' arguments are persuasive, but are moot in view of new grounds of rejection.

Response to Arguments

Applicants argue that the prior art cited fails to teach the claims as amended. Applicants' arguments are persuasive, but are moot in view of new grounds of rejection.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6-7, 9-12, 15, 17 and 48-51 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (PGPUB 2021/0142783), hereinafter referenced as Kim, in view of Audfray et al. (PGPUB 2021/0176588), hereinafter referenced as Audfray.

Regarding claims 1 and 17, Kim discloses a method and system, hereinafter referenced as a method, of modifying a speech signal generated by a text-to-speech synthesiser, the method comprising: receiving a text signal (input text; p. 0124-0126); generating a speech signal from the text signal (text-to-speech; p. 0119-0123); deriving a control feature vector, wherein the control feature vector represents modifications to the speech signal (vector; p. 0129-0138); inputting the control feature vector in the text-to-speech synthesiser, wherein the text-to-speech synthesiser is configured to generate a modified speech signal using the control feature vector (synthesis; p. 0135-0136); and outputting the modified speech signal, wherein: the text-to-speech synthesiser comprises a first model configured to generate the speech signal and a controllable model configured to generate the modified speech signal (change speech style; p. 0102, 0112, 0122); and the controllable model is trained using speech signals generated by the first model (train; p. 0059). Kim does not specifically teach that the controllable model is distinct from the first model.

Audfray discloses a method wherein a first model is generated based on the audio model components and the second model comprises a modified audio signal, which is two distinct models (p. 0011-0014, 0104-0110), to assist with storing, organizing and maintaining acoustic data. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above, to assist with providing synthetic audio.

Regarding claims 2 and 48, Kim discloses a method wherein deriving the control feature vector comprises: analysing the speech signal (analyze data; p. 0143); obtaining a first feature vector from the analysed speech signal (vector; p. 0129-0138); obtaining a user input (user input; p. 0063, 0074-0076); and modifying the first feature vector using the user input to obtain the control feature vector (vector; p. 0129-0138).

Regarding claims 3 and 49, Kim discloses a method wherein the user input comprises a reference speech signal (reference speech; p. 0102).

Regarding claims 6 and 50, Kim discloses a method wherein the controllable model comprises an encoder module, a decoder module, and an attention module linking the encoder module to the decoder module (p. 0126-0128, 0131, 0135-0136).
Regarding claims 7 and 51, Kim discloses a method wherein the first feature vector is inputted at the decoder module (decoder; p. 0126-0128, 0131, 0135-0136).

Regarding claim 9, Kim discloses a method wherein the first feature vector represents one of the properties of pitch or intensity (p. 0133-0136).

Regarding claim 10, Kim discloses the method further comprising deriving a second feature vector, wherein the second feature vector represents features of the generated speech signal that are used to generate the modified speech signal (change speech style; p. 0102, 0112, 0122); and inputting the second feature vector in the text-to-speech synthesiser, wherein the second feature vector is obtained from the analysed speech signal (synthesis; p. 0135-0136).

Regarding claim 11, Kim discloses a method wherein: the controllable model comprises an encoder module, a decoder module, and an attention module linking the encoder module to the decoder module (p. 0126-0128, 0131, 0135-0136), and the second feature vector is inputted at the decoder module of the controllable model (p. 0126-0128, 0131, 0135-0136).

Regarding claim 12, Kim discloses a method wherein a representation of the speech signal is inputted at the encoder module of the controllable model (p. 0126-0128, 0131, 0135-0136).

Regarding claim 15, Kim discloses a method wherein the first model comprises an encoder module, a decoder module, and an attention module linking the encoder module to the decoder module (p. 0126-0128, 0131, 0135-0136).

Claims 8, 13-14, 16 and 52 are rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Audfray, and in further view of Arik et al. (PGPUB 2018/0336880), hereinafter referenced as Arik.

Regarding claims 8 and 52, Kim and Audfray disclose a method as described above, but do not specifically teach wherein the first feature vector is modified by a pre-net before being inputted at the decoder module of the controllable model.
Arik discloses a method wherein the first feature vector is modified by a pre-net before being inputted at the decoder module of the controllable model (p. 0110), to assist with modeling and data analysis. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above, to improve computer performance, features and uses.

Regarding claim 13, it is interpreted and rejected for similar reasons as set forth above. In addition, Kim discloses a method wherein the method further comprises deriving a modified alignment from the user input, wherein the modified alignment indicates modifications to a timing of the speech signal (p. 0052-0065, 0111).

Regarding claim 14, it is interpreted and rejected for similar reasons as set forth above. In addition, Kim discloses a method wherein the modified alignment is inputted at the attention module of the controllable model (p. 0107-0110).

Regarding claim 16, Kim discloses a method, the method further comprising inputting the third feature vector in the encoder module of the controllable model (p. 0126-0128, 0131, 0135-0136). In addition, Arik discloses a method comprising deriving a third feature vector from the attention module of the first model, wherein the third feature vector corresponds to a timing of phonemes of the received text signal (p. 0052-0065, 0111).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. This information has been detailed in the PTO-892 attached (Notice of References Cited). Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAKIEDA R JACKSON, whose telephone number is (571) 272-7619. The examiner can normally be reached Mon-Fri 6:30a-2:30p. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JAKIEDA R JACKSON/
Primary Examiner, Art Unit 2657

Prosecution Timeline

Jul 20, 2023
Application Filed
Jul 09, 2025
Non-Final Rejection — §103
Oct 09, 2025
Interview Requested
Nov 03, 2025
Applicant Interview (Telephonic)
Nov 04, 2025
Response Filed
Nov 06, 2025
Examiner Interview Summary
Feb 12, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603079
PROVIDING A REPOSITORY OF AUDIO FILES HAVING PRONUNCIATIONS FOR TEXT STRINGS TO PROVIDE TO A SPEECH SYNTHESIZER
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12603088
TRAINING A DEVICE SPECIFIC ACOUSTIC MODEL
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12598092
SYSTEMS, METHODS, AND APPARATUS FOR NOTIFYING A TRANSCRIBING AND TRANSLATING SYSTEM OF SWITCHING BETWEEN SPOKEN LANGUAGES
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12597427
CONFIGURABLE NATURAL LANGUAGE OUTPUT
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12597418
AUDIO SIGNAL PROCESSING DEVICE AND METHOD FOR SYNCHRONIZING SPEECH AND TEXT BY USING MACHINE LEARNING MODEL
Granted Apr 07, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 74%
With Interview: 89% (+15.4%)
Median Time to Grant: 3y 0m
PTA Risk: Moderate

Based on 905 resolved cases by this examiner. Grant probability derived from career allow rate.
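The with-interview figure appears to be the base grant probability plus the stated interview lift. A minimal sketch, assuming the dashboard simply adds the lift in percentage points (the combination rule is an assumption, not documented by the tool):

```python
# Combine the base grant probability with the interview lift.
# Both numbers come from the projection panel above.
base_probability = 74.0   # career allow rate, in percent
interview_lift = 15.4     # stated lift among interviewed cases, in points

with_interview = base_probability + interview_lift
print(f"With interview: {with_interview:.0f}%")  # 89%
```

74.0 + 15.4 = 89.4, which rounds to the 89% shown in the panel.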
