Last updated: May 29, 2026
Application No. 18/599,431
RECIPIENT-SPECIFIC VOICE TONE ADJUSTMENT IN TELEPHONY

Final Rejection §101 §102 §103
Filed: Mar 08, 2024
Examiner: WOZNIAK, JAMES S
Art Unit: 2655
Tech Center: 2600 (Communications)
Assignee: International Business Machines Corporation
OA Round: 2 (Final)
Interview Optional: This examiner has a 60% grant rate with a +39.4% interview lift. Because an interview has already been conducted in this application's prosecution history, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 391 resolved cases, 2023–2026
Examiner Intelligence

WOZNIAK, JAMES S
Career Allowance Rate: grants 60% of resolved cases (233 granted / 391 resolved), -2.4% vs TC avg
Interview Lift: strong, +39.4% for resolved cases with an interview
Typical timeline: 3y 8m average prosecution; 21 currently pending
Career history: 429 total applications across all art units

Statute-Specific Performance (compared with the Tech Center average estimate; based on career data from 391 resolved cases)
§101: 7.2% (-32.8% vs TC avg)
§103: 82.5% (+42.5% vs TC avg)
§102: 5.8% (-34.2% vs TC avg)
§112: 4.2% (-35.8% vs TC avg)
Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

In response to the Non-final Office Action mailed on 9/30/2025, Applicant filed an amendment on 12/22/2025. In this reply, Applicant has amended independent claims 1, 7, and 15 to further recite that the voice samples are of "a user" and that the voice tone data corresponds "to a specified voice tone of the user." Applicant has also argued that the prior art of record fails to teach the limitations added via amendment regarding user-specific voice tone data extracted from voice samples of the user for regenerating the user's own specified voice tone in synthesized speech output (Remarks, Pages 15-17). These arguments have been fully considered but are not found to be persuasive for the reasons noted in the Response to Arguments section below.

Response to Arguments

Patent Subject Matter Eligibility Rejections under 35 U.S.C. 101:

Applicant traverses the rejection of claims 1-20 under 35 U.S.C. 101 relying upon multiple arguments. First, Applicant contends that, under Step 2A prong 1 of the 2019 Patent Subject Matter Eligibility Guidelines, the rejection relies only upon conclusions that are not supported by evidence showing how a human could perform the claimed operations as recited. Applicant contends that extraction of voice tone data "requires computational analysis of digital audio samples to derive machine-readable tone parameters that are later re-used by a text-to-speech model" and argues that the claimed speech synthesis is algorithmic speech synthesis rather than merely reading text aloud (Remarks, Pages 7-8).
In response, per MPEP 2106.04(II) and 2106.04(a), the Step 2A prong 1 framework for abstract ideas involves identifying limitations that fall within at least one of the groupings of abstract ideas under the broadest reasonable interpretation (BRI). Contrary to Applicant's position, this step of the analysis framework is not evidentiary, unlike the Step 2B/Berkheimer analysis. Moreover, the rejection meets the requirements of a prima facie subject matter eligibility rejection under 35 U.S.C. 101 because it identifies at least one category of abstract idea (i.e., mental process), identifies the limitations falling within that category, and explains how, under the BRI, a human could perform each step as actually claimed. It should be noted that Applicant's analysis of the claim is not in line with the BRI because it takes an overly narrow approach that includes features that are not claimed. Also, while the claim does feature a speech synthesis model, the model is recited at a high level of generality, its underlying function can be performed by a human (i.e., reading out text), and such models were set aside to determine whether they would qualify as an inventive concept under the additional Steps 2A prong 2 and 2B of the eligibility framework. As shown by the evidence provided as part of the Step 2B/Berkheimer analysis, those models are not inventive concepts. Note also that the specification expressly admits that the text-to-speech model was not invented by Applicant and does not constitute an inventive concept or an improvement to a particular field of technology (see "presently available" as admitted in Paragraphs 0016 and 0045).
Applicant next contends that the Step 2A prong 2 analysis is deficient because it does not view the claims as a whole and ignores the technical solution they provide (Remarks, Page 8).
In response, it is noted that the rejection under 35 U.S.C. 101 properly set aside the elements in addition to the identified abstract idea and analyzed whether those additional elements constitute a practical application or an improvement in technology. As explained in RecogniCorp, LLC v. Nintendo Co., 855 F.3d 1322, 1327, 122 USPQ2d 1377 (Fed. Cir. 2017), "after determining that a claim is directed to a judicial exception, 'we then ask, "[w]hat else is there in the claims before us?"'" In this case, the abstract mental process and the limitations pertaining to this process were identified in Step 2A prong 1, where the underlying functions of speech transcription (e.g., listening and writing) and speech production (e.g., reading text and speaking aloud) were automated by computer models, namely speech-to-text and text-to-speech models. If Applicant has somehow improved these known models, those improvements are not presented in the claims. Instead, the claims feature generic, high-level, and known models automating an otherwise abstract human process. Thus, the "what else" in the present claims amounts to the automation of an otherwise human process that, particularly in view of Applicant's own admission, involves neither improved models nor a practical application. Accordingly, these arguments directed to Step 2A prong 2 are not found to be persuasive.
In regard to the Step 2B/Berkheimer analysis, Applicant argues that the rejection lacks evidence supporting the assertion that the additional elements are well-known, routine, and conventional (Remarks, Page 8).
In addition to Applicant's admission noted above, the rejection explicitly provides evidence in the form of prior art. Also, in the rejection, only the speech-to-text and text-to-speech models were set aside for further consideration under Steps 2A prong 2 and 2B. The extraction of tone data and its use in speech reproduction were identified as part of the mental process under the BRI. Applicant is reminded that an inventive concept "cannot be furnished by the unpatentable law of nature (or natural phenomenon or abstract idea) itself." Genetic Techs. Ltd. v. Merial LLC, 818 F.3d 1369, 1376, 118 USPQ2d 1541, 1546 (Fed. Cir. 2016). Accordingly, Applicant's Step 2B arguments are not found to be persuasive.
Applicant attempts to draw an analogy between the present claims and the increasingly cited, now precedential machine-learning subject matter of Ex parte Desjardins (Remarks, Pages 9-10).
In response, Desjardins involved a particular approach to training a machine learning model that led to an improvement in the field of artificial intelligence model training, per the Director's opinion. It is unclear how the presently recited claims involve machine learning in any manner. None of the claims feature any type of machine learning model, let alone an approach for training such a model as in Desjardins. Thus, the attempted analogy between Desjardins and the present claims is not successful, and these arguments are not found to be persuasive.
Lastly, Applicant revisits the analysis framework and argues that the recited process is directed towards a specific improvement in computer-implemented voice communication and involves "non-conventional computer processing, including model-based audio analysis and synthesis, and cannot be reasonably performed in the human mind" (Remarks, Pages 10-14).
In response, it is reiterated that the underlying process of extracting tone and reproducing that tone is part of the identified mental process, and the abstract idea itself cannot be used to furnish an inventive concept. Moreover, the use of models was not included in the Step 2A prong 1 analysis as part of the mental process. Instead, these elements were set aside for further consideration in Steps 2A prong 2 and 2B of the analysis framework, where it was shown that these computer models were not invented by Applicant and only serve to automate an otherwise human process. The rejection also provided evidence, in addition to Applicant's admission, that such models are well-known, routine, and conventional and not related to an inventive concept. Accordingly, since Applicant attempts to argue eligibility by relying upon the identified abstract idea and contends that models indicated as "presently available" somehow relate to an inventive concept, these arguments are not found to be persuasive. As such, the 35 U.S.C. 101 rejection, updated to reflect the amended claim language, has been maintained.

Prior Art Rejections:
With respect to independent Claims 1, 7, and 15, Applicant argues that the prior art of record, i.e., Subramanian, et al. (U.S. PG Publication: 2007/0208569 A1), fails to teach the amended limitation regarding "extracting, from a plurality of voice samples of a user, voice tone data corresponding to a specified voice tone of the user." In particular, Applicant contends that Subramanian does not teach "user-specific voice tone data extracted from voice samples of the user for regenerating the user's own specified voice tone in synthesized speech output" because it is alleged that "Subramanian teaches applying emotion-based prosodic adjustments to a synthetic system voice, using emotion-category metadata rather than regenerating the user's tone from their own samples" (Remarks, Page 15).
In response, while Applicant's argued claim language differs from that of the instant claims, the concepts of extracting a user's voice tone data and then generating a speech output with "a voice tone generated using the voice tone data" are generally recited in the instant independent claims. Turning now to the prior art, Subramanian discloses that "speech communication" from a user is "fed to [a] voice analyzer" (Paragraph 0051). This voice analyzer extracts "voice patterns" from the user's speech. While these patterns are associated with an emotion, the voice patterns extracted from the user nevertheless include "specific pitches, tones, cadences, and amplitudes, or combinations thereof, contained in the speech delivery," i.e., the user's delivered speech is analyzed to extract voice patterns comprising tone data.
Next, Subramanian uses the extracted voice patterns, which contain the user's voice tone from their "speech delivery," to populate a database (see Paragraph 0047 describing a speaker profile that is used to provide emotion "speech patterns that the speaker uses"; see maintaining a "speaker profile" that contains the voice patterns, including tone, at Paragraphs 0049 and 0071; see Paragraph 0048 describing a speaker/user as an owner profile used to synthesize a communication). These voice patterns are then accessed based upon emotional markup metadata of text to retrieve the particular speaker profile voice tone/emotion, again noting that emotions are associated with specific pitches, tones, etc. (Paragraphs 0082-0083). In this manner, while Subramanian does rely upon metadata as a prosodic/emotional markup of text, that metadata is used to obtain the voice patterns, including "specific pitches, tones, cadences, and amplitudes, or combinations thereof, contained in the speech delivery," that adjust speech synthesis. Accordingly, it is maintained that Subramanian teaches the extracting step, i.e., extracting voice tone data corresponding to a specified voice tone of the user from "speech patterns that the speaker uses," which is then used in speech synthesis to adjust "the pitch, tone, and amplitude of the voice and changes the frequency, or cadence, of the voice delivery," and that the process does not rely solely upon metadata as alleged by Applicant.
After this summary argument, Applicant contrasts the claimed method with several cited passages of Subramanian. These points are addressed below:
Applicant Argument: Subramanian's extraction relates only to emotion detection/classification, not to extraction of user-specific voice tone data (Remarks, Pages 15-16).
Rebuttal: While Subramanian's process involves emotion recognition, it involves emotion "extraction" that identifies "voice patterns" (Paragraph 0051). These voice patterns are used to create/update a user/speaker voice profile (Paragraphs 0046-0047, 0049, 0071, and 0081; see also Fig. 4 user/speaker profiles). Moreover, this unique emotion-specific tone data is used to adjust a speech synthesis voice for that user (Paragraphs 0083 and 0114). In this manner, Subramanian's extraction, while involving emotion recognition, is used to extract tone data and create a user voice profile that includes tone.

Applicant Argument: The cited passages relate to various rules, dictionaries, and mappings, none of which relate to storing voice samples of a user, extracting voice tone data corresponding to a specified tone of the user, or generating speech using the user's own reconstructed tone (Remarks, Page 16).
Rebuttal: Some of these arguments are attempts to characterize the prior art, and the characterized passages have not individually been relied upon to address the limitations in question. It is maintained that in Subramanian, tone data related to emotions is extracted (per the preceding citations), a user/speaker voice profile unique to a user is created (per the preceding citations), and that voice tone data is used to generate a "synthesized voice" (e.g., Paragraph 0083, noting "emotion to voice pattern definitions are selected using the context profiles for the user"). In this manner, while Subramanian may feature embodiments that relate to translation of emotion particular to a specific language, Subramanian provides teachings relating to the extraction of user voice patterns, including tone, that are used in generating a synthesized voice, which maps to the claimed "generating a speech output.... using a voice tone generated using the voice tone data." Lastly, it should be noted that Applicant repeatedly refers to reconstruction in the arguments even though this language is not present in the instant amended claims.

Applicant Argument: Applicant acknowledges that speech-to-text is present in the cited passages (Remarks, Pages 16-17).
Rebuttal: No argument to rebut is present in this section of Applicant's remarks.

Applicant Argument: Applicant characterizes several of the cited passages regarding generating text-to-speech audio using voice tone data, arguing that Subramanian concerns the use of emotional metadata for cultural translation and modifying a synthetic baseline voice, not reconstruction of the user's tone (Remarks, Page 17).
Rebuttal: As confirmed above, both cultural translation and modifying synthetic speech with respect to pitch, tone, etc. are part of the disclosed invention; however, these arguments overlook the teachings of Subramanian regarding speaker/user profiles having voice patterns that are used to make such modifications. As noted above, as is known in prosody-based text-to-speech, Subramanian also relies upon emotional markup metadata (Paragraph 0082). Importantly, this metadata is used to retrieve the voice patterns indicative of emotion that contain the user's voice tones (pitch, tone, etc.) (Paragraph 0083). Also, Applicant should consider that, while cultural translation is present in Subramanian, there are citations discussing the use of a user's own speech profile, including voice patterns (see the updated voice patterns in a user profile in Paragraph 0071; see that voice synthesis uses the voice patterns of the user and may include their personality, wherein the synthesized speech is modified using emotional information that includes tone, in Paragraph 0083). It is again noted that Applicant continues to discuss "reconstruction" of a user's tone when the claim only reads "generating a speech output corresponding to the text.... comprising a voice tone generated using the voice tone data." The general use of the user's tone, which is present in Subramanian's use of captured user tonal information indicative of emotion in speech communications, is the subject matter present in the claims. The claims say nothing about reconstructing the user's voice tone or how such a process is performed. Thus, Applicant's arguments pertain to features that are not claimed.

Applicant Argument: Applicant argues a number of other citations of Subramanian that were applied to other claims or were not applied to the independent claims.
Rebuttal: These arguments represent Applicant's attempt to characterize the prior art and concern citations that are not required to address the subject matter of the amended independent claims. Accordingly, these arguments are considered moot.


The art rejections of the amended independent and dependent claims were traversed for reasons similar to Claim 1 (Remarks, Page 19).  
In regards to such arguments, see the response directed towards claim 1.
As a closing remark, Applicant should be aware that the use of a user's tone (even reconstructed) in speech synthesis is already known in the prior art. See, for example:
Tischer (U.S. PG Publication: 2004/0111271 A1) - discloses that speech samples recorded from a person's "own voice file" are used for "customizing...text to speech" with parameters including "intonations" in samples and "emphasis" (Paragraphs 0034-0035, 0041, 0053, 0055, 0061 (discussing the use of a person's "own voice file")).
DeSimone (U.S. PG Publication: 2006/0074677 A1) - discloses a "prosody modification subsystem" that analyzes the "pitch and tone of the user's voice, which is subsequently used to modify the speech synthesis subsystem output" (Paragraph 0056) to produce "output from the speech synthesis subsystem 38 [that] is modified by the user's own voice prosody" (Paragraph 0058).
Shin, et al. (U.S. PG Publication:  2019/0019497 A1)- teaches a TTS system that is "configured to enable a user to apply intonation from their own voice to generated TTS" (Paragraph 0023).
Lahr, et al. (U.S. PG Publication:  2023/03856446 A1)- see Paragraph 0063- "Users can then record their own voices saying the same text but with any intonation the users so desire. When a user's recording is complete, the embodiments will then attempt to extract the emotion (e.g., volume, pitch, intonations, etc.) from the user's recording and apply that to the synthesized voice."
Note that any of these references could have been applied individually or in combination with Subramanian to address the subject matter of the independent claims; however, such rejections are not deemed necessary because it is maintained that Subramanian teaches the claimed invention under the BRI. Note also that the unclaimed reconstruction aspect is present in these additional prior art references.
During the interview conducted on 12/11/2025, the Examiner explained that the claimed extraction and use of voice tone were too broad/general in view of the prior art and that specific models or techniques disclosed in the specification, beyond the text-to-speech models that the specification admits are known, should be added to the claims in order to further define over the prior art of record. It was also suggested that adding any specific disclosed machine learning models used in the process could help both in defining over the prior art (published prior to a significant number of such innovations) and in overcoming the patent subject matter eligibility rejection under 35 U.S.C. 101. The present amendment does not reflect these discussions, and Applicant is again encouraged to consider amendments to this effect to advance prosecution.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because under the broadest reasonable interpretation (BRI), the claimed invention is directed to an abstract idea without significantly more. 
Independent Claims 1, 7, and 15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.  The claims regard a process that, as drafted under its BRI, covers performance of the limitations as a mental process, but for the recitation of generic computer components/models.
With regard to the process/functionality of claims 1, 7, and 15, the claimed functionality could be practiced as a mental process in the following manner:
extracting, from a plurality of voice samples of a user, voice tone data corresponding to a specified voice tone of the user (listening to voice samples of another person and mentally evaluating these samples to decide upon specific tone characteristics);
converting a speech input to corresponding text (a human can listen to spoken input and transcribe it into written text); and
generating a speech output corresponding to the text, the speech output comprising audio generated from the text and a voice tone generated using the voice tone data (a human can read out the written text and vocally reproduce a voice tone (e.g., by raising their pitch, volume, cadence, or tone)).
This judicial exception is not integrated into a practical application.  Outside of the identified abstract idea, the claimed invention only recites processors and storage media which amount to no more than mere instructions to implement an otherwise abstract idea using generic computer components and a mention of generic speech-to-text and text-to-speech models that are a mere machine/software automation of human transcription and reading/speaking processes.  
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional generic computer components identified above are no more than mere instructions to apply the exception using generic computer components that are well-known, routine, and conventional, as evidenced by Bancorp Services v. Sun Life (Fed. Cir. 2012) and Alice Corp. v. CLS Bank (2014). As evidence that the speech-to-text and text-to-speech models are well-known, routine, and conventional and therefore do not amount to significantly more than the abstract idea, see the following prior art: Subramanian, et al. (U.S. PG Publication: 2007/0208569 A1, Paragraphs 0051-0052: speech-to-text is "known"; Paragraph 0082: text-to-speech synthesis is "known"), Patel, et al. (U.S. PG Publication: 2016/0379622 A1, Paragraph 0063: text-to-speech has many "known techniques"), and Jaroker (U.S. PG Publication: 2005/0010407 A1, Paragraph 0043: speech recognition is "generally known").
Accordingly, independent claims 1, 7, and 15 under the BRI are not patent eligible under 35 U.S.C. 101.
The remaining dependent claims do not add patent eligible subject matter to their respective parent claims and have also been rejected under 35 U.S.C. 101:
Claims 2-3, 6, 10-11, 14, 16-17, and 20 further limit the data being processed in the independent claims that can be understood and analyzed by a human.
Claims 4, 12, and 18 regard a human mentally deciding on a recipient for their communication.
Claims 5, 13, and 19 regard a human mentally deciding upon a tone that they used in the past for a particular participant.
Claim 8 regards generic computer structures as addressed in claim 1 and a network transfer that does not patentably limit the recited computer-program process (nor would such transfer make the claim eligible if claimed as part of the program instructions).
Claim 9 regards a human deciding how much a conversion service was accessed and mentally calculating a bill at an established rate (e.g., per use, time-based).

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-8 and 10-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Subramanian, et al. (U.S. PG Publication:  2007/0208569 A1).
With respect to Claim 1, Subramanian discloses:
A computer-implemented method comprising: 
extracting, from a plurality of voice samples of a user, voice tone data corresponding to a specified voice tone of the user (speech features are extracted from speech communication samples of the speaker/user to obtain various specified types of voice tone information (e.g., pitch, tone, cadence, amplitude, etc.), Paragraphs 0051-0052; such voice patterns are labeled and used to populate a database for synthesis, Paragraphs 0065, 0071, 0076, 0090, 0095, and 0110);
converting, using a speech to text model, a speech input to corresponding text (speech recognition model that converts a speech communication into text, Paragraphs 0035, 0049, and 0051-0052); and
generating a speech output corresponding to the text, the speech output comprising audio generated from the text using a text to speech model and a voice tone generated using the voice tone data (generation and playback relying on a text to speech model that is modulated using the voice patterns including tone information, Paragraphs 0047, 0076, 0082-0084, 0114, and 0117).
With respect to Claim 2, Subramanian further discloses:
The computer-implemented method of claim 1, wherein the voice tone data comprises data usable to generate the voice tone (data such as different voice patterns indicative of voice tone that can be relied upon to modulate a voice in text-to-speech processing, Paragraphs 0047, 0076, 0082-0084, 0114, and 0117).
With respect to Claim 3, Subramanian further discloses:
The computer-implemented method of claim 1, wherein the voice tone data is maintained in a user-specific voice tone repository (dictionary/database for a specific user (e.g., a profile), Paragraphs 0071, 0081, and 0094-0095).
With respect to Claim 4, Subramanian further discloses:
The computer-implemented method of claim 1, further comprising: selecting, for use in a voice communication with a communication recipient, the voice tone data (the voice tone information is selected based upon particular scenarios such as speaker and "audience," Paragraphs 0042-0045 and 0047-0048).
With respect to Claim 5, Subramanian further discloses:
The computer-implemented method of claim 4, wherein the voice tone data was previously selected for use in a previous voice communication with the communication recipient (user profiles that include voice tone data used/frequently used in past communications and learned, Paragraphs 0044-0045, 0047, 0049, 0063, and 0071; Fig. 4).
With respect to Claim 6, Subramanian further discloses:
The computer-implemented method of claim 4, wherein the voice tone data is default voice tone data ("generic or default profile" including voice tone data, Paragraphs 0049 and 0117; see the last line of the rightmost column in the audience profiles shown in Fig. 4).
Claim 7 is directed towards an embodiment that implements the method of claim 1 as one or more computer readable storage media storing processor-executable program instructions, and thus, is rejected under similar rationale.  Moreover, Subramanian teaches method implementation as a computer-readable medium storing processor-executable instructions (Paragraph 0021).
With respect to Claim 8, Subramanian further discloses:
The computer program product of claim 7, wherein the stored program instructions are stored in a computer readable storage device in a data processing system (processor embodied in a data processing system/computer also having a memory storing program code, Paragraph 0024), and wherein the stored program instructions are transferred over a network from a remote data processing system (this wherein clause describes an implementation environment that does not patentably limit the claimed product (i.e., one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media) because it does not limit or modify the structure of the claimed computer readable medium, and thus, it need not be addressed with prior art to render the added subject matter of claim 8 unpatentable. It should be noted, however, that Paragraphs 0023-0024 do teach communication of program instructions through a network).
Claims 10-14 contain subject matter respectively similar to Claims 2-6, and thus, are rejected under similar rationale.
Claim 15 is directed towards an embodiment that implements the method of claim 1 as a computer system comprising a processor and one or more computer readable storage media storing processor-executable program instructions, and thus, is rejected under similar rationale.  Moreover, Subramanian teaches method implementation as a computer system comprising one or more processors and a computer-readable medium storing program code (Paragraphs 0021 and 0024).
Claims 16-20 contain subject matter respectively similar to Claims 2-6, and thus, are rejected under similar rationale.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Subramanian, et al. (U.S. PG Publication: 2007/0208569 A1) in view of Yu, et al. (U.S. PG Publication: 2021/0256958 A1).
With respect to Claim 9, Subramanian teaches the computer program product of claim 7 comprising one or more computer readable storage media and program instructions stored thereupon. Claim 9 further recites that the stored program instructions are "stored in a computer readable storage device in a server data processing system, and wherein the stored program instructions are downloaded in response to a request over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system." Parent claim 7, however, is not directed towards a network computing system (i.e., such a system is outside the scope of the claimed invention), nor does the wherein clause modify the recited "one or more computer-readable storage media" or add to the instructions stored on that media. As such, the wherein clause is not patentably limiting.
Claim 9 does, however, include further program instructions comprising:  program instructions to meter use of the program instructions associated with the request; and program instructions to generate an invoice based on the metered use.  These program instructions, while not taught by Subramanian, are taught by Yu.  Specifically, Yu discloses metering software resources that track the usage of resources and provide "billing or invoicing" for the consumption of those resources (Paragraph 0064).
Subramanian and Yu are analogous art because they are from a similar field of endeavor in voice conversion.  Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Subramanian to include the metering/billing instructions taught by Yu to provide the predictable result of allowing a developer to profit from, and recover the development costs of, implementing a voice service.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:  see the discussion of the additional prior art provided in the Response to Arguments section.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK whose telephone number is (571)272-7632. The examiner can normally be reached 7-3, off alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


JAMES S. WOZNIAK
Primary Examiner
Art Unit 2655



/JAMES S WOZNIAK/Primary Examiner, Art Unit 2655
Prosecution Timeline

Dec 11, 2025: Examiner Interview Summary
Dec 11, 2025: Applicant Interview (Telephonic)
Dec 22, 2025: Response Filed
Mar 03, 2026: Final Rejection mailed (§101, §102, §103)
Mar 26, 2026: Examiner Interview Summary
Mar 26, 2026: Applicant Interview (Telephonic)
Apr 11, 2026: Request for Continued Examination
Apr 13, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/585,204 (Patent 12640139): METHOD AND APPARATUS FOR IMPROVING PERFORMANCE OF ARTIFICIAL INTELLIGENCE MODEL USING SPEECH RECOGNITION RESULTS AS TEXT INPUT. Granted May 26, 2026 (2y 3m to grant).
18/535,521 (Patent 12609113): NATURAL LANGUAGE PROCESSING SYSTEMS AND METHODS FOR INTENT CLASSIFICATION OF SPEECH TRANSCRIPTION. Granted Apr 21, 2026 (2y 4m to grant).
18/544,354 (Patent 12609106): EMOTIVE TEXT-TO-SPEECH WITH AUTO DETECTION OF EMOTIONS. Granted Apr 21, 2026 (2y 4m to grant).
18/399,876 (Patent 12597422): SPEAKING PRACTICE SYSTEM WITH RELIABLE PRONUNCIATION EVALUATION. Granted Apr 07, 2026 (2y 3m to grant).
18/488,578 (Patent 12586569): Knowledge Distillation with Domain Mismatch For Speech Recognition. Granted Mar 24, 2026 (2y 5m to grant).
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 60%
With Interview (+39.4%): 99%
Median Time to Grant: 3y 8m (~1y 5m remaining)
PTA Risk: Moderate
Based on 391 resolved cases by this examiner. Grant probability is derived from the career allowance rate.