Last updated: May 29, 2026

Application No. 18/792,750

ADJUSTING SPEECH RATE FOR AN AUDIO INPUT

Non-Final OA §101 §102

Filed: Aug 02, 2024

Priority: Jul 01, 2024 (GB 2409467.4)

Examiner: SHARMA, NEERAJ

Art Unit: 2659

Tech Center: 2600 (Communications)

Assignee: International Business Machines Corporation

OA Round: 1 (Non-Final)

Interview Optional

Interview lift (+11.6%) is below the 15.0% threshold, so a written response is recommended.

Based on 463 resolved cases, 2023–2026

Examiner Intelligence

SHARMA, NEERAJ

Grants 85% — above average

Career Allowance Rate

393 granted / 463 resolved

+22.9% vs TC avg

Moderate +12% lift

Interview Lift: +11.6% (resolved cases with interview vs. without)

Typical timeline

2y 8m

Avg Prosecution

19 currently pending

Career history

482

Total Applications

across all art units

Statute-Specific Performance

§101: 6.7% (-33.3% vs TC avg)

§103: 75.0% (+35.0% vs TC avg)

§102: 17.1% (-22.9% vs TC avg)

§112: 1.0% (-39.0% vs TC avg)

Based on career data from 463 resolved cases
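As a rough consistency check on the statute-specific figures above, the sketch below subtracts each stated delta from its rate to recover the implied Tech Center baseline. It assumes the "vs TC avg" values are percentage-point differences, which the page does not state; the name `statute_stats` and its layout are illustrative only.

```python
# (rate %, delta vs TC avg) copied from the statute figures above.
# Assumption: each delta is a percentage-point difference, not a relative change.
statute_stats = {
    "§101": (6.7, -33.3),
    "§103": (75.0, 35.0),
    "§102": (17.1, -22.9),
    "§112": (1.0, -39.0),
}

# Implied baseline = examiner rate minus the stated delta, rounded to one decimal.
implied_baseline = {
    statute: round(rate - delta, 1)
    for statute, (rate, delta) in statute_stats.items()
}

for statute, baseline in implied_baseline.items():
    print(f"{statute}: implied TC average {baseline}%")
```

Under that assumption every row implies the same 40.0% baseline, which suggests a single Tech Center estimate is applied across all four statutes.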

Office Action

§101 §102

DETAILED ACTION

Introduction

1.	This office action is in response to Applicant's submission filed on 08/02/2024. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-25 are currently pending and examined below.

Drawings

2.	The drawings filed on 08/02/2024 have been accepted and considered by the Examiner. 

Information Disclosure Statement

3.	The Information Disclosure Statements (IDSs) filed on 09/18/2024, 01/17/2025, and 01/29/2026 have been accepted and considered and are in compliance with the provisions of 37 CFR 1.97.

Priority

4.	The Applicant's claim of priority to U.K. Patent Application # 2409467.4, filed on July 1, 2024, has been accepted and considered in this office action.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

5.	Claims 1-25 are rejected under 35 U.S.C. 101 as being nothing more than an abstract idea. As an example, analysis of claim 1 reveals that the core of this claim is a sequence of data analysis and transformation operations: detect syllable onsets, compute an average inter-syllable time (median), compute a rate adjustment, apply a smoothing filter, and adjust buffer lengths (and optionally overlap buffers and blend signals). These are mathematical operations, algorithmic manipulations of audio data, and control/decision steps. As such, these individual elements represent algorithms or mental processes that can be expressed as mathematical formulas or procedural steps, which in turn can be accomplished by a human being using their mind and at most pen and paper. Hence, all these steps fall under the category of mental processes. These steps are drafted at a high level of generality without being tied to a specific technological improvement, and the computing device recited herein can be a general-purpose computing device. Accordingly, this claim recites an abstract idea.

This judicial exception is not integrated into a practical application because the recitation of a system, memory, computer readable storage device, computer program product or general-purpose computing devices merely reads on generalized computer components, based upon the claim interpretation wherein the structure is interpreted using the specification. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using generalized computer components to generate, extract, determine, and generate, amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is therefore not patent eligible.

Claims 2-11 merely provide certain details of the calculations outlined above or outline more mathematical manipulations, such as stretching of the buffer period, using the system for real-time or offline audio, use of a particular latency period, use of particular types of filtering algorithms (bandpass or smoothing), inclusion of voices in the input data signal, inclusion of a target speech rate, etc. These are all steps which themselves can also be accomplished by a human being with at most the aid of pen and paper and hence also do not amount to significantly more than the judicial exception.

Claims 12-19 are system claims for the corresponding method claims 1-11 and hence are rejected under 35 U.S.C. 101 for the same reasons as outlined above. Claims 24-25 are computer program product (CPP) claims for the corresponding method claims 1-2 and hence are rejected under 35 U.S.C. 101 for the same reasons as outlined above. Claims 20-22 are also rejected under 35 U.S.C. 101 for the same reasons as outlined above for method claims 1-3 and 9. Claim 23 is also rejected under 35 U.S.C. 101 for the same reasons as outlined above for method claims 1-2.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1)	The claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

6.	Claims 1-6, 9- and 19-25 are rejected under 35 U.S.C. 102 (a) (1) as being anticipated by Boillot (U.S. Patent Application Publication # 2004/0267540 A1).

With regards to claim 1, Boillot teaches a computer-implemented method for adjusting speech rate for an audio input, the method comprising applying syllable onset analysis to speech input in a buffer period (Para 37, teaches a syllabic rate that describes the rate of speech by the number of syllables per unit time as a numeric value. The word rate describes how many words are spoken per unit time. If a listener has a preferred hearing rate of N syllables a minute where N is the number of syllables and the present invention determines the current syllabic rate as X syllables/minute, the present invention employs the time compression/expansion utility to change the speaking rate by a factor of N/X);

determining an average inter-syllable time for the buffer period (Paragraphs 80-81, teach that the rate of expansion and compression cannot exceed the rate at which data is being written into the circular audio buffer. As long as there is sufficient space in the audio buffer, the number of frames and/or windows being processed at any given time can change. This enables speech to be slowed down and sped up as audio is being played out of the buffer in real-time. Also see paragraphs 64-65);

determining a rate adjustment required for the average inter-syllable time of the buffer period to conform to a target speech rate (Para 37, teaches a syllabic rate that describes the rate of speech by the number of syllables per unit time as a numeric value. The word rate describes how many words are spoken per unit time. If a listener has a preferred hearing rate of N syllables a minute where N is the number of syllables and the present invention determines the current syllabic rate as X syllables/minute, the present invention employs the time compression/expansion utility to change the speaking rate by a factor of N/X);

applying a smoothing filter to smooth the rate adjustment across multiple sequential buffer periods (Para 69, teaches the SOLA method allows for both time compression and expansion. Time compression is a process, which blends periodic sections of the speech signal. The blending is a triangular overlap and add technique used to smooth out the shifted frame boundaries. Time expansion is essentially a process, which replicates and inserts sections of periodic speech and performs the same blending to smooth the transition regions); 

and adjusting the buffer period based on the smoothed rate adjustment (Para 70 and figures 4-16, teach a flow diagram corresponding to the overall process of performing real-time SOLA operations in a single outbound circular audio buffer using modulo pointers). 
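The claim 1 steps recited above can be sketched as follows. This is a minimal illustration and not the applicant's or Boillot's implementation: it assumes syllable onset times have already been detected for each buffer period, uses the median as the average inter-syllable time (as the §101 analysis characterizes it), and uses a trailing moving average as one possible smoothing filter. All function names and the window size are hypothetical.

```python
from statistics import median


def inter_syllable_median(onsets_s):
    """Median gap in seconds between consecutive syllable onsets in one buffer."""
    gaps = [later - earlier for earlier, later in zip(onsets_s, onsets_s[1:])]
    return median(gaps)


def stretch_ratio(median_gap_s, target_syllables_per_s):
    """Duration ratio needed for the buffer to reach the target speech rate.

    With current rate X = 1 / median_gap_s and target rate N, the rate factor
    is N / X, so the duration stretch is its reciprocal X / N.
    Values above 1 lengthen the buffer (slower speech).
    """
    current_rate = 1.0 / median_gap_s
    return current_rate / target_syllables_per_s


def moving_average(values, window=3):
    """Trailing moving average; early entries average over what is available."""
    smoothed = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def adjusted_buffer_lengths(onsets_per_buffer, buffer_len_s, target_rate):
    """Smoothed output length in seconds for each sequential buffer period."""
    ratios = [
        stretch_ratio(inter_syllable_median(onsets), target_rate)
        for onsets in onsets_per_buffer
    ]
    return [buffer_len_s * ratio for ratio in moving_average(ratios)]


# Three one-second buffers spoken at roughly 5, 10 and 4 syllables per second.
buffers = [
    [0.0, 0.2, 0.4, 0.6],
    [0.0, 0.1, 0.2, 0.3],
    [0.0, 0.25, 0.5, 0.75],
]
lengths = adjusted_buffer_lengths(buffers, buffer_len_s=1.0, target_rate=5.0)
print([round(length, 3) for length in lengths])
```

The smoothing step keeps the second buffer from jumping straight to a doubled length, which is the role the claim assigns to the filter across sequential buffer periods.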

With regards to claim 2, Boillot teaches the method of claim 1, including overlapping buffer periods and modifying a signal in an overlapping portion for a smooth transition between the smoothed rate adjustments of adjacent buffer periods (Para 69, teaches the SOLA method allows for both time compression and expansion. Time compression is a process, which blends periodic sections of the speech signal. The blending is a triangular overlap and add technique used to smooth out the shifted frame boundaries. Time expansion is essentially a process, which replicates and inserts sections of periodic speech and performs the same blending to smooth the transition regions). 
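The triangular overlap-and-add blending described for Para 69 can be illustrated with a short sketch. It is a generic linear crossfade under stated assumptions, not code from Boillot or from the application: frames are plain Python lists of samples, the overlap length is supplied by the caller, and the names `crossfade` and `overlap_add` are hypothetical.

```python
def crossfade(tail, head):
    """Blend two equal-length segments with complementary linear ramps."""
    if len(tail) != len(head):
        raise ValueError("overlapping segments must have equal length")
    n = len(tail)
    blended = []
    for i in range(n):
        weight_in = (i + 1) / (n + 1)  # incoming segment fades in linearly
        blended.append((1.0 - weight_in) * tail[i] + weight_in * head[i])
    return blended


def overlap_add(old_frame, new_frame, overlap):
    """Join two frames, blending the end of the old with the start of the new."""
    if not 0 < overlap <= min(len(old_frame), len(new_frame)):
        raise ValueError("overlap must be positive and fit inside both frames")
    blended = crossfade(old_frame[-overlap:], new_frame[:overlap])
    return old_frame[:-overlap] + blended + new_frame[overlap:]


# Two 4-sample frames joined with a 3-sample overlap give 5 output samples.
joined = overlap_add([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], overlap=3)
print(joined)
```

Because the joined output is shorter than the two frames laid end to end, this corresponds to the compression case; expansion would instead repeat a section before blending.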

With regards to claim 3, Boillot teaches the method of claim 1, wherein adjusting the buffer period stretches the buffer period by a determined positive or negative ratio based on the smoothed rate adjustment (Para 95, teaches that if sola_enable_cf is positive the range 0 to N/2 is selected by a right shift of 1, and if sola_enable_cf is negative the range 0 to N/4 is selected by a right shift of 2 given the frame length N. Para 84, teaches that if the frame length is N samples 902, the correlation index must fall between 0 and N/2 910). 

With regards to claim 4, Boillot teaches the method of claim 1, wherein the method is carried out for a real-time audio input with a latency equal to the buffer period (Para 70, teaches that the entire operation happens on the "fly". Para 65, teaches that the autocorrelation reveals the degree of correlation in a signal and can be used with an iterative peak picking method to find the number of lags, which correspond to the maximum correlation. Para 85, teaches that x:rundindex1 1008, is the subtraction of the correlation index from the endpoint location of oldwin A2 502 on the outbound audio buffer). 

With regards to claim 5, Boillot teaches the method of claim 1, wherein the method is carried out for an offline audio input and includes adjusting the buffer period based on the speech input to avoid a buffer cut off mid-utterance (Para 64, teaches that the remaining samples of newin 604 i.e., the frame length minus acorrindex 906, will need to be left shifted after SOLA processing to abut with the blended region. Para 88, teaches that in contrast to prior art systems, SOLA routines are all processed not in real-time on the outbound audio buffer but rather processed and buffered separately in additional memory space and not in the outbound audio buffer).

With regards to claim 6, Boillot teaches the method of claim 1, wherein the method further comprises using a temporary buffer of a same length as buffer period for syllable onset analysis (Para 85, teaches that x:rundindex1 1008, is the subtraction of the correlation index from the endpoint location of oldwin A2 502 on the outbound audio buffer. The region between this point and the end of oldwin A2 502 is the SOLA region 1010. Only the first acorrindex 906 samples of newin 504 will be used in the blending since acorrindex 906 describes the number of samples to shift back newin 604 for maximum correlation. Accordingly, only the last acorrindex 906 samples of oldwin 602 will be used in the blending. Thus, the remaining samples of newin 604 (i.e., the frame length minus acorrindex 906) will need to be left shifted after SOLA processing to abut with the blended region);

and applying a bandpass filter to the temporary buffer based on a selected voice range (Para 95, teaches that there is a cross-correlation range to search for the lag, which is 0 to N/2 for the two compression modes and 0 to N/4 for the two expansion modes, where N is the frame length. For compression, a larger search range provides maximal compression, and for expansion a smaller search range provides maximal expansion. The sola_enable_cf data word specifies the type of rate adjustment: (+2) full compression 412, (+1) half compression 404, (-1) half expansion 418, and (-2) full expansion 418. These numeric values are to select the SOLA mode. A (+) value denotes compression and a (-) value denotes expansion. The numeric values of 1 and 2 are only to designate the mode level as half or full. The sola_enable_cf also sets the range for the cross-correlation lag search on every call to the acorr method. Thus, compression and expansion levels can change the playback rate as speech is being played).
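The mode-dependent lag search attributed to Para 95 can be sketched as below. This is an illustration only: the sign convention and the N/2 and N/4 ranges come from the passage above, while the function names, the mean-of-products correlation score, and the handling of a zero mode are assumptions made for the example.

```python
def lag_search_limit(sola_enable_cf, frame_len):
    """Upper bound of the lag search: N/2 for compression, N/4 for expansion."""
    if sola_enable_cf > 0:
        return frame_len >> 1  # right shift by 1 selects 0..N/2
    if sola_enable_cf < 0:
        return frame_len >> 2  # right shift by 2 selects 0..N/4
    raise ValueError("this sketch only covers nonzero modes")


def best_lag(old_win, new_win, max_lag):
    """Lag in 0..max_lag that maximizes the mean product of the shifted windows."""
    if max_lag >= min(len(old_win), len(new_win)):
        raise ValueError("max_lag must be smaller than the window length")

    def score(lag):
        count = min(len(old_win) - lag, len(new_win))
        total = sum(old_win[i + lag] * new_win[i] for i in range(count))
        return total / count

    return max(range(max_lag + 1), key=score)


# new_win is old_win advanced by one sample, so the best lag should be 1.
old_win = [0, 1, 0, -1, 0, 1, 0, -1]
new_win = [1, 0, -1, 0, 1, 0, -1, 0]
compress_lag = best_lag(old_win, new_win, lag_search_limit(+2, len(old_win)))
expand_lag = best_lag(old_win, new_win, lag_search_limit(-1, len(old_win)))
print(compress_lag, expand_lag)
```

Nothing in this search involves a bandpass filter or a voice frequency range; it only bounds how far the correlation lag may move.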

With regards to claim 9, Boillot teaches the method of claim 1, wherein applying the smoothing filter to smooth the rate adjustments across the multiple sequential buffer periods applies the smoothing filter in a moving average window (Paragraphs 78-79, teach use of both Oldwin pointer--pointer to the start of an old window in a frame and the Newin pointer--pointer to the start of a new window in a frame immediately adjacent in time to the old window. The frame for the oldwin pointer and the new window pointer do not have to be the same frame but rather the frames only need to be adjacent in time). 

With regards to claim 10, Boillot teaches the method of claim 1, the method further comprising receiving a target speech rate for a user for the audio input (Para 33, teaches a user adjustable feature to change the voice playback rate to the listeners' preferred listening rate or comfort). 

With regards to claim 11, Boillot teaches the method of claim 1, wherein the audio input includes one or more voices (Para 33, teaches a speaker’s voice or speech). 

With regards to claims 12-17 and 19, these are system claims for the corresponding method claims 1-6 and 9. These two sets of claims are related as method and apparatus of using the same, with each claimed system element's function corresponding to the claimed method step. Accordingly, claims 12-17 and 19 are similarly rejected under the same rationale as applied above with respect to method claims 1-6 and 9.

With regards to claims 24-25, these are computer program product (CPP) claims for the corresponding method claims 1-2. These two sets of claims are related as method and CPP of using the same, with each claimed CPP element's function corresponding to the claimed method step. Accordingly, claims 24-25 are similarly rejected under the same rationale as applied above with respect to method claims 1-2.

With regards to claims 20-22, please see the rejection of method claims 1-3 and 9 above. 

With regards to claim 23, this is a system claim for the corresponding method claims 1-2. These two sets of claims are related as method and apparatus of using the same, with each claimed system element's function corresponding to the claimed method step. Accordingly, claim 23 is similarly rejected under the same rationale as applied above with respect to method claims 1-2.

Allowable Subject Matter

7.	Claims 7-8 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The prior art of record, alone or in combination, does not currently suggest or teach the invention as outlined in these claims. More detailed reasons for allowance will be outlined as and when the Application proceeds to allowability.

Conclusion

8.	The following prior art, made of record but not relied upon, is considered pertinent to applicant's disclosure: Chen (U.S. Patent Application Publication # 2008/0304678 A1), Chong-White (U.S. Patent # 7065485 B1). These references are also included in the PTO-892 form attached with this office action.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. If you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). In case you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NEERAJ SHARMA whose contact information is given below.  The examiner can normally be reached on Monday to Friday 8 am to 5 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Louis-Desir can be reached on 571-272-7799 (Direct Phone).  The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

/NEERAJ SHARMA/
Primary Examiner, Art Unit 2659
571-270-5487 (Direct Phone)
571-270-6487 (Direct Fax)
neeraj.sharma@uspto.gov (Direct Email)


Prosecution Timeline

Aug 02, 2024

Application Filed

Apr 13, 2026

Non-Final Rejection mailed — §101, §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/341,412

Patent 12626692

GENERATIVE LANGUAGE MODELS

2y 10m to grant Granted May 12, 2026

18/524,771

Patent 12620389

PREDICTOR-CORRECTOR METHOD FOR INCLUDING SPEECH HINTS IN AUTOMATIC SPEECH RECOGNITION

2y 5m to grant Granted May 05, 2026

18/582,462

Patent 12597428

DISPLAY DEVICE, CONTROL METHOD OF DISPLAY DEVICE, AND RECORDING MEDIUM

2y 1m to grant Granted Apr 07, 2026

18/670,148

Patent 12591736

FINE-TUNED LARGE LANGUAGE MODELS FOR CAPABILITY CONTROLLER

1y 10m to grant Granted Mar 31, 2026

18/453,338

Patent 12579983

SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM

2y 6m to grant Granted Mar 17, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2

Grant Probability: 85%

With Interview (+11.6%): 96%

Median Time to Grant: 2y 8m (~10m remaining)

PTA Risk: Low

Based on 463 resolved cases by this examiner. Grant probability derived from career allowance rate.
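The projection figures above can be reproduced from the career counts with simple arithmetic. The sketch assumes the interview lift is added as percentage points to the career allowance rate, which matches the displayed 85% and 96% but is not stated on the page; the variable names are illustrative.

```python
granted, resolved = 393, 463
interview_lift_points = 11.6  # assumed to be additive percentage points

allowance_rate = 100 * granted / resolved
with_interview = allowance_rate + interview_lift_points

print(f"Career allowance rate: {allowance_rate:.1f}%")
print(f"With interview: {with_interview:.1f}%")
```

Rounded to whole percentages these give the 85% grant probability and the 96% with-interview figure shown above.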