Last updated: May 29, 2026

Application No. 18/778,230

Use Of Modulation Spectrums In Automatic Speech Recognition Models

Non-Final OA §102

Filed

Jul 19, 2024

Priority

Mar 08, 2024 — provisional 63/563,159

Examiner

HOQUE, NAFIZ E

Art Unit

2693

Tech Center

2600 — Communications

Assignee

Oracle International Corporation

OA Round

1 (Non-Final)

Interview Optional

Examiner has a relatively high allowance rate (75%) and a +23.4% interview lift. A written response may suffice.

Based on 613 resolved cases, 2023–2026

Examiner Intelligence

HOQUE, NAFIZ E

Grants 75% — above average

Career Allowance Rate

461 granted / 613 resolved

+13.2% vs TC avg

Strong +23% interview lift

+23.4%

Interview Lift (resolved cases with vs. without an interview)

Typical timeline

3y 1m

Avg Prosecution

22 currently pending

Career history

632

Total Applications

across all art units

Statute-Specific Performance

§101

4.2%

-35.8% vs TC avg

§103

70.9%

+30.9% vs TC avg

§102

14.8%

-25.2% vs TC avg

§112

5.3%

-34.7% vs TC avg

Tech Center averages are estimates. Based on career data from 613 resolved cases.

Office Action

§102

DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 4 and 13 are objected to because of the following informalities: the acronym “ReLu” needs to be spelled out the first time it is used in each claim group. Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gulati et al. (“Conformer: Convolution-augmented Transformer for Speech Recognition”).
Regarding claim 1, Gulati discloses one or more non-transitory computer readable media comprising instructions which, when executed by one or more hardware processors, cause performance of operations comprising:
accessing encoded time series data generated by an encoder of a speech recognition model (see section 2 on page 2 – audio encoder; also see section 2.4 and section 3.1 “Data”);
applying at least a convolution filter to the encoded time series data to generate a modulation spectrum (see section 2.2 and fig. 2; and table 7 – convolution kernel size); and
inputting the modulation spectrum to a decoder of the speech recognition model (see section 2.2, section 3.2, Table 1 – output of the conformer encoder is input to the LSTM decoder for generating the speech recognition output).
Regarding claim 2, Gulati discloses wherein
the encoded time series data comprises a plurality of time frames each having a dimensionality (Table 1 – encoder dimension 144, 256, 512; section 3.1); and
applying the convolution filter to the encoded time series data comprises computing a plurality of dot products of values of columns of a convolution matrix and values of columns of a normalized matrix of feature values indexed by time frame and dimension (Sections 2.2-2.3 – Layernorm; fig. 2).
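The dot-product language of claim 2 can be pictured with a small sketch: a filter slides along the time axis, and for each feature dimension the output is the dot product of that filter column with the matching window of the frame matrix. This is an illustration only, not the applicant's implementation or code from the cited reference. The function name `temporal_conv`, the unpadded ("valid") windowing, and the toy values are all assumptions.

```python
def temporal_conv(frames, kernel):
    """Slide one filter column per feature dimension along the time axis.

    frames: list of T rows (time frames), each a list of D feature values.
    kernel: list of W rows (filter taps), each a list of D weights.
    Returns T - W + 1 rows of D values (no padding, which is an assumption).
    """
    num_frames, dim, width = len(frames), len(frames[0]), len(kernel)
    output = []
    for start in range(num_frames - width + 1):
        row = []
        for d in range(dim):
            # Dot product of column d of the filter with the matching
            # window of column d of the frame matrix.
            row.append(sum(kernel[k][d] * frames[start + k][d] for k in range(width)))
        output.append(row)
    return output


# Hypothetical toy input: 4 time frames, 2 feature dimensions, filter width 2.
frames = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
kernel = [[1.0, 0.5], [-1.0, 0.5]]
result = temporal_conv(frames, kernel)
print(result)  # [[-1.0, 0.5], [-1.0, 0.5], [-1.0, 0.5]]
```

The first column of the toy filter takes a frame-to-frame difference and the second takes a two-frame average, which is why each output column is constant for this input.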
Regarding claim 3, Gulati discloses wherein the convolution filter uses a filter width between five (5) and twenty-five (25), a number of time frames between fifty (50) and five hundred (500), and an embedding dimensionality matching a dimensionality of an architecture of the speech recognition model (see Table 7 – kernel size of 7-65 is the filter width and Table 1 discloses encoder embedding dimension of 144, 256 and 512).
Regarding claim 4, Gulati discloses wherein the operations further comprise applying a ReLU nonlinearity function to an output of the convolution filter to obtain a ReLU nonlinearity result, and wherein the modulation spectrum is generated based at least in part on the ReLU nonlinearity result (section 2.2 and fig. 2 and see Table 3 using ReLU).
Regarding claim 5, Gulati discloses wherein the operations further comprise, prior to applying the convolution filter to the encoded time series data: applying a normalization function to the encoded time series data (Sections 2.2-2.4 and fig. 2 – Layernorm).
Regarding claim 6, Gulati discloses wherein the operations further comprise:
applying a normalization function to the modulation spectrum (section 2.2 – Batchnorm is applied after depthwise convolution; section 2.4).
Regarding claim 7, Gulati discloses wherein the operations further comprise residually connecting the encoded time series data to the modulation spectrum (section 2.1 and fig. 1 and fig. 2 – shows residual connection).
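For readers less familiar with the terms in claims 4 and 7, the sketch below shows a ReLU nonlinearity applied to a filter output, followed by a residual connection that adds the original input back. It is a hedged illustration rather than the claimed method or the reference's code. The helper names `relu` and `residual_add`, the order of the two steps, and the sample values are assumptions.

```python
def relu(values):
    """Rectified linear unit: negative values become zero."""
    return [max(0.0, v) for v in values]


def residual_add(inputs, transformed):
    """Residual connection: add the untransformed input back element-wise."""
    return [x + y for x, y in zip(inputs, transformed)]


# Hypothetical values for one time frame with three feature dimensions.
encoded = [0.5, -1.0, 2.0]      # encoder output for the frame
filtered = [-0.25, 0.75, 1.5]   # assumed output of a convolution filter
activated = relu(filtered)
combined = residual_add(encoded, activated)
print(activated)  # [0.0, 0.75, 1.5]
print(combined)   # [0.5, -0.25, 3.5]
```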
Regarding claim 8, Gulati discloses wherein
the encoded time series data comprises a plurality of time frames (section 3.1, Table 1); and
the operations comprise applying a normalization function to the encoded time series data (section 2.1-2.4 and Equation 1) by:
generating a matrix for the encoded time series data, the matrix comprising a plurality of rows indexed by time frame and a plurality of columns indexed by dimension (the disclosed Conformer architecture = Time frames X encoder dimension);
for each cell of the matrix for the encoded time series data, performing matrix operations on the cell to determine a normalized value by: subtracting a mean value for the matrix from a cell value for the cell to obtain a corresponding result; dividing the corresponding result by a standard deviation value for the matrix to obtain the normalized value; and storing the normalized value in a corresponding matrix cell (section 2.3-2.4 and fig. 1 – See Layernorm function which is the same as the limitation).
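The normalization steps recited in claim 8 (subtract a mean, divide by a standard deviation, store the result in the corresponding cell) can be sketched as follows. The sketch follows the claim wording literally, using one mean and one standard deviation for the whole matrix; it is not taken from the cited reference. The function name `normalize_matrix`, the use of the population standard deviation, and the sample matrix are assumptions, and the code assumes the matrix is not constant (otherwise the standard deviation is zero).

```python
import math


def normalize_matrix(matrix):
    """Normalize every cell using the matrix-wide mean and standard deviation.

    matrix: rows indexed by time frame, columns indexed by dimension.
    """
    cells = [value for row in matrix for value in row]
    mean = sum(cells) / len(cells)
    # Population variance (divide by N); an assumption, since the claim
    # does not say which estimator is used.
    variance = sum((value - mean) ** 2 for value in cells) / len(cells)
    std = math.sqrt(variance)
    # Subtract the mean, divide by the standard deviation, and store the
    # result in the corresponding cell of a new matrix.
    return [[(value - mean) / std for value in row] for row in matrix]


# Hypothetical 2-frame by 2-dimension matrix.
matrix = [[1.0, 3.0], [5.0, 7.0]]
normalized = normalize_matrix(matrix)
flat = [value for row in normalized for value in row]
```

After normalization the cells of `normalized` have a mean of approximately zero and a variance of approximately one, which is a quick sanity check for any implementation of these steps.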
Regarding claim 9, Gulati discloses wherein the instructions further cause performance of operations comprising:
decoding the modulation spectrum at the decoder (section 2, section 3.2; table 1 – conformer block is provided to the LSTM decoder); and
outputting one or more subword units from the decoder (section 3.2 – wordpiece).
Regarding claims 10 and 19, see rejection of claim 1.
Regarding claims 11 and 20, see rejection of claim 2.
Regarding claim 12, see rejection of claim 3.
Regarding claim 13, see rejection of claim 4.
Regarding claim 14, see rejection of claim 5.
Regarding claim 15, see rejection of claim 6.
Regarding claim 16, see rejection of claim 7.
Regarding claim 17, see rejection of claim 8.
Regarding claim 18, see rejection of claim 9.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAFIZ E HOQUE whose telephone number is (571)270-1811. The examiner can normally be reached M-F 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at (571)272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/NAFIZ E HOQUE/
Primary Examiner, Art Unit 2693

Prosecution Timeline

Jul 19, 2024

Application Filed

Apr 06, 2026

Non-Final Rejection mailed — §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/484,512

Patent 12639866

PIPELINE FOR GENERATING EDITABLE GRAPHIC DESIGNS FROM NATURAL LANGUAGE PROMPTS

2y 7m to grant Granted May 26, 2026

18/480,039

Patent 12620393

TECHNOLOGIES FOR LEVERAGING MACHINE LEARNING TO PREDICT EMPATHY FOR IMPROVED CONTACT CENTER INTERACTIONS

2y 7m to grant Granted May 05, 2026

18/649,354

Patent 12619830

OPTIMIZING PERFORMANCE OF CONVERSATIONAL INTERFACE APPLICATIONS USING EXAMPLE FORGETTING

2y 0m to grant Granted May 05, 2026

18/695,752

Patent 12621386

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM

2y 1m to grant Granted May 05, 2026

18/384,428

Patent 12614041

NONVERBAL MESSAGE EXTRACTION AND GENERATION

2y 6m to grant Granted Apr 28, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

75%

Grant Probability

99%

With Interview (+23.4%)

3y 1m (~1y 3m remaining)

Median Time to Grant

Low

PTA Risk

Based on 613 resolved cases by this examiner. Grant probability derived from career allowance rate.
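The projection figures can be reproduced from the counts reported above. The sketch below derives the career allowance rate from 461 grants out of 613 resolved cases, then adds the reported +23.4% interview lift. Treating the lift as additive percentage points and capping the result at 100% are assumptions about how the page computes the "With Interview" figure, not a documented method.

```python
granted = 461               # grants reported for this examiner
resolved = 613              # resolved cases reported for this examiner
interview_lift_pct = 23.4   # reported interview lift, in percentage points

# Career allowance rate as a percentage.
allowance_rate_pct = granted / resolved * 100

# Assumed additive model, capped at 100%.
with_interview_pct = min(100.0, allowance_rate_pct + interview_lift_pct)

print(round(allowance_rate_pct))   # 75
print(round(with_interview_pct))   # 99
```

Under these assumptions the rounded values match the 75% and 99% shown in the projections.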