Prosecution Insights
Last updated: May 29, 2026
Application No. 18/778,230

Use Of Modulation Spectrums In Automatic Speech Recognition Models

Non-Final OA §102
Filed: Jul 19, 2024
Priority: Mar 08, 2024 (provisional 63/563,159)
Examiner: HOQUE, NAFIZ E
Art Unit: 2693
Tech Center: 2600 (Communications)
Assignee: Oracle International Corporation
OA Round: 1 (Non-Final)
Grant Probability: 75% (Favorable)
OA Rounds: 1-2
Est. Remaining: 1y 3m
With Interview: 99%

Examiner Intelligence

Career Allowance Rate: 75% (461 granted / 613 resolved), above average, +13.2% vs TC avg
Interview Lift: +23.4% (strong; resolved cases with interview)
Avg Prosecution: 3y 1m (typical timeline), 22 currently pending
Total Applications: 632 (career history, across all art units)

Statute-Specific Performance

§101: 4.2% (-35.8% vs TC avg)
§103: 70.9% (+30.9% vs TC avg)
§102: 14.8% (-25.2% vs TC avg)
§112: 5.3% (-34.7% vs TC avg)
Comparisons are against a Tech Center average estimate. Based on career data from 613 resolved cases.
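The "vs TC avg" offsets above can be turned back into the baseline they are measured against. The sketch below assumes each offset is a percentage-point difference (examiner rate minus Tech Center estimate); the page does not state this, and the helper name `implied_baseline` is hypothetical.

```python
# Statute figures copied from the list above: (examiner rate %, offset vs TC avg).
# ASSUMPTION: the offset is a percentage-point difference, not a relative change.
figures = {
    "§101": (4.2, -35.8),
    "§103": (70.9, 30.9),
    "§102": (14.8, -25.2),
    "§112": (5.3, -34.7),
}


def implied_baseline(rate: float, offset: float) -> float:
    """Recover the Tech Center estimate that the offset was measured against."""
    return round(rate - offset, 1)


baselines = {statute: implied_baseline(*pair) for statute, pair in figures.items()}
for statute, baseline in baselines.items():
    print(f"{statute}: implied TC estimate {baseline}%")
```

Under that assumption all four statutes resolve to the same 40.0% baseline, which may indicate that the Tech Center estimate is one flat figure rather than a per-statute average.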

Office Action

§102
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 4 and 13 are objected to because of the following informalities: the acronym “ReLU” needs to be spelled out the first time it is used in each claim group. Appropriate correction is required.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gulati et al. (“Conformer: Convolution-augmented Transformer for Speech Recognition”).

Regarding claim 1, Gulati discloses one or more non-transitory computer readable media comprising instructions which, when executed by one or more hardware processors, cause performance of operations comprising: accessing encoded time series data generated by an encoder of a speech recognition model (see section 2 in page 2 – audio encoder; also see section 2.4 and section 3.1 “Data”); applying at least a convolution filter to the encoded time series data to generate a modulation spectrum (see section 2.2 and fig. 2; and table 7 – convolution kernel size); and inputting the modulation spectrum to a decoder of the speech recognition model (see section 2.2, section 3.2, Table 1 – output of the conformer encoder is input to the LSTM decoder for generating the speech recognition output).

Regarding claim 2, Gulati discloses wherein the encoded time series data comprises a plurality of time frames each having a dimensionality (Table 1 – encoder dimension 144, 256, 512; section 3.1); and applying the convolution filter to the encoded time series data comprises computing a plurality of dot products of values of columns of a convolution matrix and values of columns of a normalized matrix of feature values indexed by time frame and dimension (Section 2.2-2.3 – Layernorm; fig. 2).

Regarding claim 3, Gulati discloses wherein the convolution filter uses a filter width between five (5) and twenty-five (25), a number of time frames between fifty (50) and five hundred (500), and an embedding dimensionality matching a dimensionality of an architecture of the speech recognition model (see Table 7 – kernel size of 7-65 is the filter width and Table 1 discloses encoder embedding dimension of 144, 256 and 512).

Regarding claim 4, Gulati discloses wherein the operations further comprise applying a ReLU nonlinearity function to an output of the convolution filter to obtain a ReLU nonlinearity result, and wherein the modulation spectrum is generated based at least in part on the ReLU nonlinearity result (section 2.2 and fig. 2 and see Table 3 using ReLU).

Regarding claim 5, Gulati discloses wherein the operations further comprise, prior to applying the convolution filter to the encoded time series data: applying a normalization function to the encoded time series data (Section 2.2-2.4 and fig. 2 – Layernorm).

Regarding claim 6, Gulati discloses wherein the operations further comprise: applying a normalization function to the modulation spectrum (section 2.2 – Batchnorm is applied after depthwise convolution; section 2.4).

Regarding claim 7, Gulati discloses wherein the operations further comprise residually connecting the encoded time series data to the modulation spectrum (section 2.1 and fig. 1 and fig. 2 – shows residual connection).

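To make the claimed operations in claims 2, 4, 5 and 7 concrete, here is a minimal sketch of one way they could be chained: normalize the encoded frames, convolve along the time axis, apply ReLU, then add a residual connection. It is an illustration under stated assumptions, not the applicant's implementation and not Gulati's Conformer code. All function names are hypothetical, a single kernel is shared across dimensions for brevity, zero padding keeps the frame count unchanged, the normalization uses whole-matrix mean and standard deviation (as recited in claim 8), and the small `eps` term is an added safeguard against division by zero.

```python
from math import sqrt


def normalize(frames, eps=1e-8):
    """Subtract the matrix mean and divide by the matrix standard deviation."""
    values = [v for row in frames for v in row]
    mean = sum(values) / len(values)
    std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [[(v - mean) / (std + eps) for v in row] for row in frames]


def conv_over_time(frames, kernel):
    """Slide one kernel along the time axis of every dimension ("same" zero padding)."""
    n_frames, n_dims = len(frames), len(frames[0])
    half = len(kernel) // 2
    out = [[0.0] * n_dims for _ in range(n_frames)]
    for t in range(n_frames):
        for d in range(n_dims):
            acc = 0.0
            for k, weight in enumerate(kernel):
                src = t + k - half
                if 0 <= src < n_frames:  # frames outside the sequence count as zero
                    acc += weight * frames[src][d]
            out[t][d] = acc
    return out


def relu(frames):
    """Clamp negative values to zero."""
    return [[max(0.0, v) for v in row] for row in frames]


def modulation_spectrum(encoded, kernel):
    """Normalize, convolve over time, apply ReLU, then add the residual input."""
    filtered = relu(conv_over_time(normalize(encoded), kernel))
    return [
        [e + f for e, f in zip(enc_row, filt_row)]
        for enc_row, filt_row in zip(encoded, filtered)
    ]


# Toy input: 8 time frames x 2 dimensions, filter width 5 (claim 3 recites 5 to 25).
encoded = [[0.1 * t, 1.0 - 0.1 * t] for t in range(8)]
spectrum = modulation_spectrum(encoded, kernel=[0.2] * 5)
print(len(spectrum), len(spectrum[0]))  # 8 2
```

The toy input uses far fewer frames than the 50 to 500 recited in claim 3; the shapes are kept small only so the example is easy to trace by hand.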
Regarding claim 8, Gulati discloses wherein the encoded time series data comprises a plurality of time frames (section 3.1, Table 1); and the operations comprise applying a normalization function to the encoded time series data (section 2.1-2.4 and Equation 1) by: generating a matrix for the encoded time series data, the matrix comprising a plurality of rows indexed by time frame and a plurality of columns indexed by dimension (the disclosed Conformer architecture = Time frames X encoder dimension); for each cell of the matrix for the encoded time series data, performing matrix operations on the cell to determine a normalized value by: subtracting a mean value for the matrix from a cell value for the cell to obtain a corresponding result; dividing the corresponding result by a standard deviation value for the matrix to obtain the normalized value; and storing the normalized value in a corresponding matrix cell (section 2.3-2.4 and fig. 1 – See Layernorm function which is the same as the limitation).

Regarding claim 9, Gulati discloses wherein the instructions further cause performance of operations comprising: decoding the modulation spectrum at the decoder (section 2, section 3.2; table 1 – conformer block is provided to the LSTM decoder); and outputting one or more subword units from the decoder (section 3.2 – wordpiece).

Regarding claims 10 and 19, see rejection of claim 1.
Regarding claims 11 and 20, see rejection of claim 2.
Regarding claim 12, see rejection of claim 3.
Regarding claim 13, see rejection of claim 4.
Regarding claim 14, see rejection of claim 5.
Regarding claim 15, see rejection of claim 6.
Regarding claim 16, see rejection of claim 7.
Regarding claim 17, see rejection of claim 8.
Regarding claim 18, see rejection of claim 9.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAFIZ E HOQUE whose telephone number is (571)270-1811. The examiner can normally be reached M-F 8-5.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at (571)272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NAFIZ E HOQUE/
Primary Examiner, Art Unit 2693

Prosecution Timeline

Jul 19, 2024: Application Filed
Apr 06, 2026: Non-Final Rejection mailed, §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12639866
PIPELINE FOR GENERATING EDITABLE GRAPHIC DESIGNS FROM NATURAL LANGUAGE PROMPTS
2y 7m to grant Granted May 26, 2026
Patent 12620393
TECHNOLOGIES FOR LEVERAGING MACHINE LEARNING TO PREDICT EMPATHY FOR IMPROVED CONTACT CENTER INTERACTIONS
2y 7m to grant Granted May 05, 2026
Patent 12619830
OPTIMIZING PERFORMANCE OF CONVERSATIONAL INTERFACE APPLICATIONS USING EXAMPLE FORGETTING
2y 0m to grant Granted May 05, 2026
Patent 12621386
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
2y 1m to grant Granted May 05, 2026
Patent 12614041
NONVERBAL MESSAGE EXTRACTION AND GENERATION
2y 6m to grant Granted Apr 28, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 75%
With Interview: 99% (+23.4%)
Median Time to Grant: 3y 1m (~1y 3m remaining)
PTA Risk: Low
Based on 613 resolved cases by this examiner. Grant probability derived from career allowance rate.
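The headline figures can be reproduced from the counts reported on this page. The sketch below assumes the interview lift is added to the base rate as percentage points and capped at 100%; the page does not spell out that rule, and the function name `allowance_rate` is hypothetical.

```python
# Counts and lift as reported on this page.
granted, resolved = 461, 613
interview_lift_points = 23.4  # ASSUMPTION: additive percentage points


def allowance_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved


base = allowance_rate(granted, resolved)
with_interview = min(100.0, base + interview_lift_points)

print(f"Grant probability: {base:.1f}%")         # Grant probability: 75.2%
print(f"With interview: {with_interview:.1f}%")  # With interview: 98.6%
```

Rounded to whole percentages these give the 75% and 99% shown above. Because the estimate rests on the examiner's career-wide rate alone, it does not reflect anything specific to this application's claims or the cited reference.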
