Prosecution Insights
Last updated: April 19, 2026
Application No. 18/817,443

SYNTHETIC SPEECH PROCESSING

Non-Final OA: §101, §102, §103, Double Patenting
Filed: Aug 28, 2024
Examiner: PATEL, SHREYANS A
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Amazon Technologies, Inc.
OA Round: 1 (Non-Final)
Grant Probability: 89% (Favorable)
OA Rounds: 1-2
To Grant: 2y 3m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 89%, above average (359 granted / 403 resolved; +27.1% vs TC avg)
Interview Lift: +7.4%, a moderate lift, measured across resolved cases with interview
Typical Timeline: 2y 3m average prosecution; 46 applications currently pending
Career History: 449 total applications across all art units
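
For readers who want to sanity-check the headline figures, here is a minimal Python sketch that reproduces them from the career counts above. The additive interview adjustment and the rounding are assumptions about the dashboard's formula, not a documented method.

```python
# A minimal sketch of how the dashboard's headline numbers could be
# reproduced from the examiner's career counts. The additive interview
# adjustment is an assumption, not the vendor's published formula.

def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

base = allow_rate(granted=359, resolved=403)       # ~89.1%
interview_lift = 7.4                               # reported lift, in percentage points
with_interview = min(base + interview_lift, 100)   # assumed simple additive model

print(f"Career allow rate: {base:.0f}%")            # -> 89%
print(f"With interview:    {with_interview:.0f}%")  # -> 96%
```

Under these assumptions the arithmetic matches the page: 359/403 rounds to 89%, and adding the 7.4-point lift rounds to 96%.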

Statute-Specific Performance

§101: 21.3% (-18.7% vs TC avg)
§102: 22.6% (-17.4% vs TC avg)
§103: 36.0% (-4.0% vs TC avg)
§112: 8.8% (-31.2% vs TC avg)
Tech Center averages are estimates • Based on career data from 403 resolved cases
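
The deltas above imply a common baseline: subtracting each "vs TC avg" delta from the examiner's rate recovers 40.0% in every row. A quick sketch, assuming the delta is simply the examiner's rate minus the Tech Center average:

```python
# Back-computing the Tech Center baseline from each statute's rate and its
# "vs TC avg" delta. Values are taken from the table above; the relation
# tc_avg = examiner_rate - delta is the only assumption.

rates = {"§101": (21.3, -18.7), "§102": (22.6, -17.4),
         "§103": (36.0, -4.0),  "§112": (8.8, -31.2)}

for statute, (examiner, delta) in rates.items():
    tc_avg = examiner - delta
    print(f"{statute}: examiner {examiner:.1f}% vs TC avg {tc_avg:.1f}%")

# Every row recovers a TC average of exactly 40.0%, which suggests the
# dashboard applies a single flat baseline estimate rather than
# per-statute Tech Center data.
```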

Office Action

Rejections: §101, §102, §103, nonstatutory double patenting
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 21-28, 30-38 and 40 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-10 of U.S. Patent No. 12,080,269. Although the claims at issue are not identical, they are not patentably distinct from each other because of the following mapping:

US Application No. 18/817,443, Claims 21 and 31: presenting, on a first device, a graphical user interface comprising a first element corresponding to a first characteristic of speech of a first synthesized voice and a second element corresponding to a second characteristic of speech of the first synthesized voice; receiving, by the first device, a first user input corresponding to the first characteristic of speech; determining, using the first user input, first data representing the first characteristic; storing an association between the first data and the first device; after storing the association, receiving, by the first device, a second user input; determining the first device is associated with the first data; determining output data responsive to the second user input; and performing speech synthesis processing using the first data and the output data to determine synthesized speech data responsive to the second user input and corresponding to the first synthesized voice having the first characteristic.

US Patent No. 12,080,269, Claims 1 and 9: displaying, using a first device, a graphical user interface, wherein the first user input was received by the first device in response to display of the graphical user interface; receiving a first user input corresponding to a characteristic of speech; determining, using the first user input, first data representing the characteristic; associating the first data with a user profile; after associating the first data with the user profile, receiving a second user input associated with the user profile; determining output data responsive to the second user input; and performing speech synthesis processing using the first data and the output data to determine synthesized speech data responsive to the second user input and corresponding to the characteristic.

Claims 22 and 32 correspond to claim 4; claims 23 and 33 to claim 5; claims 24 and 34 to claim 6; claims 25 and 35 to claim 7; claims 26 and 36 to claim 8; claims 27 and 37 to claim 9; and claims 28 and 38 to claim 10.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 21-40 are rejected under 35 U.S.C. 101. Claims 21 and 31 are directed to the basic abstract concept of storing user preferences and applying them to a later task. Stripped of technical jargon, the claim describes asking a user how they want a computer voice to sound, remembering that choice for their specific device, and then using the saved choice the next time the device needs to speak. These fundamental steps of collecting data, recognizing a user or device, and storing or retrieving information are considered abstract ideas. They represent routine organizational and data-gathering concepts that humans have performed mentally or manually for a very long time. Simply taking an abstract idea and telling a computer to do it does not make it a patentable invention. For an abstract idea to be patentable, it must be integrated into a practical application that actually improves how the technology works. These claims do not describe a new, innovative way to generate synthesized speech or a technically improved computer interface. Instead, they rely on completely standard, off-the-shelf computer functions: a generic GUI to receive the choice, basic computer memory to store the association, and standard "speech synthesis processing" to generate the voice. The claims do not include additional elements sufficient to amount to significantly more than the judicial exception because they are (i) mere instructions to implement the idea on a computer, and/or (ii) recitations of generic computer structure that serves to perform generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry. Viewed as a whole, these additional claim elements do not provide meaningful limitations transforming the abstract idea into a patent-eligible application such that the claims amount to significantly more than the abstract idea itself. There is further no improvement to the computing device. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Dependent claims 22-30 and 32-40 further recite an abstract idea performable by a human and do not amount to significantly more than the abstract idea, as they provide no steps beyond what is conventionally known in speech synthesis processing. Claims 22 and 32: the abstract concept of organizing information. Claims 23 and 33: the abstract mental process of sorting and categorizing data. Claims 24 and 34: the abstract idea of sending and updating data over a generic network. Claims 25 and 35: the abstract idea of translating data from one format to another. Claims 26 and 36: an abstract mathematical algorithm. Claims 27 and 37: the abstract step of gathering data. Claims 28 and 38: a routine data-gathering step. Claims 29 and 39: merely change the informational content of the data being processed, which does not make the underlying idea less abstract. Claims 30 and 40: do not transform the abstract idea into a patentable invention.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 21-23, 25, 27-33, 35 and 37-40 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Buntschuh (EP 0 762 384).

Claims 21 and 31: Buntschuh teaches a computer-implemented method, comprising: presenting, on a first device, a graphical user interface comprising a first element corresponding to a first characteristic of speech of a first synthesized voice and a second element corresponding to a second characteristic of speech of the first synthesized voice ([Fig. 3] [col. 6 lines 18-end]: a computer-based system with display/input devices and a GUI display 20 comprising parameter scales 22, where the parameter scales provide means to modify speech parameters including pitch and rate); receiving, by the first device, a first user input corresponding to the first characteristic of speech ([col. 6 line 32 to col. 7 line 8]: user inputs occur by manipulating GUI controls; the user changes parameter scales by dragging/clicking such that sliders are repositioned); determining, using the first user input, first data representing the first characteristic ([col. 6 line 32-end]: each time sliders are repositioned, the current speech parameter values are updated with the scale values 22d, i.e., data representing the characteristic is determined from the input/slider position); storing an association between the first data and the first device ([col. 6 lines 4-17] [col. 8 lines 45-53]: a default initialization file in a user's home directory holds parameter values, and parameter combinations are saved as named voices stored in the default initialization file with their associated speech parameter values); after storing the association, receiving, by the first device, a second user input ([col. 7 line 49 to col. 8 line 15]: when ready to listen, the user presses the carriage return or clicks on the "say it" button, which triggers an event/script); determining the first device is associated with the first data ([col. 7 line 55 to col. 8 line 44]: named voices are loaded from the user's default initialization file, which includes named voices and associated speech parameter values; selecting a named voice assigns the associated values as the current speech parameter values, evidencing the device's association with stored data); determining output data responsive to the second user input ([col. 7 line 49 to col. 8 line 15]: the user enters test utterances in the input box; after the "say it" input, a script is triggered that includes the test utterances from the input box); and performing speech synthesis processing using the first data and the output data to determine synthesized speech data responsive to the second user input and corresponding to the first synthesized voice having the first characteristic ([Fig. 2] [col. 5 line 40 to col. 6 line 3]: the formed text string includes escape sequences paired with associated speech parameter values; upon receipt, the TTS synthesizer converts the text utterances to speech using a base synthesized voice altered according to the escape sequences, i.e., using parameter data plus utterance text).

Claims 22 and 32: Buntschuh further teaches the computer-implemented method of claim 21, further comprising: determining the first characteristic corresponds to a first application with respect to the first device ([col. 8 line 57 to col. 9 line 1]: once some voices have been created and stored, they can be used to process dialogue scripts or other applications); associating the first data with the first application ([Fig. 9] [col. 9 lines 3-6]: the preprocessor accesses data in step 9a from a voice file, as shown in Fig. 9, which contains a list of named voices and their associated speech parameter values); and determining the second user input corresponds to the first application ([col. 9 lines 6-10]: in steps 9b and 9c, the preprocessor filters out the bracket-enclosed speaker names and then replaces them with escape sequences formed using the speech parameter values associated with the named voices matching the speaker names), wherein performing the speech synthesis processing using the first data is based at least in part on the second user input and the first data corresponding to the first application ([col. 9 lines 10-14]: the escape sequences and the utterances are output in step 9d to the Bell Labs text-to-speech synthesizer to be converted to speech; the result is a spoken colloquy with different voices).

Claims 23 and 33: Buntschuh further teaches the computer-implemented method of claim 21, further comprising: determining the first characteristic corresponds to a first application with respect to the first device ([col. 8 line 57 to col. 9 line 1]: once some voices have been created and stored, they can be used to process dialogue scripts or other applications); associating the first data with the first application ([Fig. 9] [col. 9 lines 3-6]: the preprocessor accesses data in step 9a from a voice file, as shown in Fig. 9, which contains a list of named voices and their associated speech parameter values); receiving a third user input corresponding to the second characteristic of speech to be associated with a second application with respect to the first device ([col. 1 lines 51-55] [col. 8 line 57 to col. 9 line 1]: a virtual continuum of new voices is created via user input (parameter scales) and stored for use in various other applications); determining, using the third user input, second data representing the second characteristic ([col. 1 lines 1-end]: by manipulating the above-mentioned speech parameters, a virtual continuum of new voices can be created via sliders to determine the current speech parameter values); and associating the second data with the second application and with the first device ([col. 8 line 54 to col. 9 line 17]: created voices are stored in a library (in device memory) to be matched and used in multiple dialogue scripts or other applications).

Claims 25 and 35: Buntschuh further teaches the computer-implemented method of claim 21, wherein determining the first data comprises: determining, using the first user input, a first value representing the first characteristic ([col. 6 lines 32-end]: a scale value 22d corresponds to the relative position of the slider 22a within the range of the corresponding parameter scale 22); and determining, using the first value, encoded data representing the first characteristic, wherein the first data comprises the encoded data ([col. 5 lines 43-45]: the escape sequences are ASCII codes comprised of pairs of escape codes and associated speech parameter values).

Claims 27 and 37: Buntschuh further teaches the computer-implemented method of claim 21, wherein the first user input was received by the first device in response to a touch interaction corresponding to the first element ([col. 6 lines 32-end]: a mouse click to drag the sliders via user interaction).

Claims 28 and 38: Buntschuh further teaches the computer-implemented method of claim 27, further comprising: determining the first user input corresponds to manipulation of the first element from a first position to a second position ([col. 6 lines 32-end]: using the mouse to drag the scale value from a first position to a second position, repositioning the slider), wherein determining the first data is based at least in part on the manipulation ([col. 6 lines 32-end]: repositioning the slider).

Claims 29 and 39: Buntschuh further teaches the computer-implemented method of claim 21, wherein the first characteristic of speech corresponds to a speech rate ([col. 6 lines 35-36]: the speech parameters include speech rate).

Claims 30 and 40: Buntschuh further teaches the computer-implemented method of claim 21, wherein the first characteristic of speech corresponds to an emotion ([col. 6 lines 35-36]: the speech parameters include pitchT, pitchR, pitchB and aspiration).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 24 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Buntschuh (EP 0 762 384) in view of Kazan et al. (US 2011/0179149).

Claims 24 and 34: Buntschuh further teaches the computer-implemented method of claim 21 ([Summary of the invention]: the following manipulable parameter scales: three pitches, front and rear head of the vocal tract, rate and aspiration, i.e., three or more different characteristics). Buntschuh does not explicitly teach, after performing the speech synthesis processing: receiving, from a second device; determining the second device is associated with the first device; determining modified first data representing the third characteristic; and storing second data associating the modified first data with the first device. Kazan teaches, after performing the speech synthesis processing: receiving, from a second device ([0004]: a second change to application settings on a second device of the one or more additional computing devices); determining the second device is associated with the first device ([0036-0037]: a record of the computing devices across which application settings are roamed can be maintained at computing device 200, and devices are identified when the user logs into a remote service from a computing device); determining modified first data representing the third characteristic ([Abstract] [0070]: application setting changes received from other computing devices across which application settings are roamed are incorporated into the application settings of the computing device as discussed above); and storing second data associating the modified first data with the first device ([0016]: whenever a change to a roamed application setting is made on one of computing devices 102, 104, and 106, it is automatically communicated to and saved by the other computing devices 102, 104, and 106).

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method and apparatus for modifying voice characteristics of synthesized speech as taught by Buntschuh to include these steps as taught by Kazan, for the benefit of performing the same application customization on each of the multiple devices ([0002] Kazan).

Claims 26 and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Buntschuh (EP 0 762 384) in view of Reber et al. (US 2019/0304435).

Claims 26 and 36: Buntschuh teaches all the limitations of claim 25. Buntschuh does not explicitly teach wherein performing the speech synthesis processing comprises processing the encoded data using a neural network speech synthesis processing component to determine the synthesized speech data. Reber teaches this limitation ([Abstract]: analysis and conversion of text into input vectors, each having at least a base frequency f0, a phoneme duration, and a phoneme sequence that is processed by a signal generation unit of the back-end subsystem; the signal generation unit includes the neural network interacting with a pre-existing knowledgebase of phonemes to generate audible speech from the input vectors). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method and apparatus for modifying voice characteristics of synthesized speech as taught by Buntschuh to process the encoded data using a neural network speech synthesis processing component as taught by Reber, for the benefit of improving the quality of the generated audible speech signals ([Abstract] Reber).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Li et al. (US 7,689,421) describes a voice persona service by which users convert text into speech waveforms, based on user-provided parameters and voice data from a service data store. The service may be remotely accessed, such as via the Internet. The user may provide text tagged with parameters, with the text sent to a text-to-speech engine along with base or custom voice data, and the resulting waveform morphed based on the tags. The user may also provide speech. Once created, a voice persona corresponding to the speech waveform may be persisted, exchanged, made public, shared and so forth. In one example, the voice persona service receives user input and parameters, and retrieves a base or custom voice that may be edited by the user via a morphing algorithm. The service outputs a waveform, such as a .wav file for embedding in a software program, and persists the voice persona corresponding to that waveform.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL, whose telephone number is (571) 270-0689. The examiner can normally be reached Monday-Friday, 8am-5pm PST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHREYANS A. PATEL
Primary Examiner, Art Unit 2659
/SHREYANS A PATEL/
Examiner, Art Unit 2659
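
To make the §102 mapping above easier to follow, here is an illustrative Python sketch of the flow the rejection reads onto Buntschuh: slider-derived parameter values are stored per device, then prepended as escape sequences to utterance text at synthesis time. Every identifier and the escape-sequence format are hypothetical stand-ins; Buntschuh's actual ASCII codes and the claimed steps are only paraphrased here.

```python
# Illustrative sketch of the claim flow the OA maps onto Buntschuh: GUI
# slider values are stored as voice data associated with a device, then
# reused at synthesis time. All names and the escape-sequence format are
# hypothetical, not Buntschuh's actual codes or the claimed API.

voice_store: dict[str, dict[str, float]] = {}  # device_id -> speech parameters

def save_voice(device_id: str, pitch: float, rate: float) -> None:
    """Store the association between slider-derived data and the device."""
    voice_store[device_id] = {"pitch": pitch, "rate": rate}

def synthesize(device_id: str, utterance: str) -> str:
    """Build a TTS input string: parameter escape sequences + utterance text."""
    params = voice_store[device_id]   # determine the device's stored data
    escapes = "".join(f"\\!{name}{value:g}" for name, value in params.items())
    return escapes + utterance        # handed off to the TTS back end

save_voice("kitchen-device", pitch=1.2, rate=0.9)            # first user input
print(synthesize("kitchen-device", "Your timer is done."))   # second user input
```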

Prosecution Timeline

Aug 28, 2024: Application Filed
Feb 27, 2026: Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586597
ENHANCED AUDIO FILE GENERATOR
Granted Mar 24, 2026 • 2y 5m to grant

Patent 12586561
TEXT-TO-SPEECH SYNTHESIS METHOD AND SYSTEM, A METHOD OF TRAINING A TEXT-TO-SPEECH SYNTHESIS SYSTEM, AND A METHOD OF CALCULATING AN EXPRESSIVITY SCORE
Granted Mar 24, 2026 • 2y 5m to grant

Patent 12548549
ON-DEVICE PERSONALIZATION OF SPEECH SYNTHESIS FOR TRAINING OF SPEECH RECOGNITION MODEL(S)
Granted Feb 10, 2026 • 2y 5m to grant

Patent 12548583
ACOUSTIC CONTROL APPARATUS, STORAGE MEDIUM AND ACCOUSTIC CONTROL METHOD
Granted Feb 10, 2026 • 2y 5m to grant

Patent 12536988
SPEECH SYNTHESIS METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM
Granted Jan 27, 2026 • 2y 5m to grant
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 89%
With Interview: 96% (+7.4%)
Median Time to Grant: 2y 3m
PTA Risk: Low

Based on 403 resolved cases by this examiner. Grant probability derived from career allow rate.
