DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 4 and 13 are objected to because of the following informalities: the acronym “ReLU” needs to be spelled out the first time it is used in each claim group. Appropriate correction is required.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gulati et al. (“Conformer: Convolution-augmented Transformer for Speech Recognition”).
Regarding claim 1, Gulati discloses one or more non-transitory computer readable media comprising instructions which, when executed by one or more hardware processors, cause performance of operations comprising:
accessing encoded time series data generated by an encoder of a speech recognition model (see section 2 on page 2 – audio encoder; also see section 2.4 and section 3.1 “Data”);
applying at least a convolution filter to the encoded time series data to generate a modulation spectrum (see section 2.2 and fig. 2; and table 7 – convolution kernel size); and
inputting the modulation spectrum to a decoder of the speech recognition model (see section 2.2, section 3.2, Table 1 – output of the conformer encoder is input to the LSTM decoder for generating the speech recognition output).
Regarding claim 2, Gulati discloses wherein
the encoded time series data comprises a plurality of time frames each having a dimensionality (Table 1 – encoder dimension 144, 256, 512; section 3.1); and
applying the convolution filter to the encoded time series data comprises computing a plurality of dot products of values of columns of a convolution matrix and values of columns of a normalized matrix of feature values indexed by time frame and dimension (sections 2.2-2.3 – Layernorm; fig. 2).
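For illustration only, the dot-product computation recited in claim 2 can be sketched as below. This is a minimal sketch under stated assumptions, not code from Gulati or from the application: the function name depthwise_conv1d, the rows-by-time / columns-by-dimension list layout, and the absence of padding are all hypothetical choices, and the kernel is applied without flipping, as is conventional for neural-network convolution.

```python
def depthwise_conv1d(x, kernel):
    """Slide a per-dimension filter along the time axis.

    x:      T rows (time frames) by D columns (dimensions), already normalized.
    kernel: K rows (filter taps) by D columns (one filter per dimension).
    Returns (T - K + 1) rows by D columns; no padding is applied (assumption).
    """
    T, D = len(x), len(x[0])
    K = len(kernel)
    out = []
    for t in range(T - K + 1):
        row = []
        for d in range(D):
            # Dot product of column d of the kernel with a K-frame
            # window of column d of the normalized matrix.
            row.append(sum(kernel[k][d] * x[t + k][d] for k in range(K)))
        out.append(row)
    return out


# Hypothetical toy input: 4 time frames, 2 dimensions, filter width 2.
x = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
kernel = [[1.0, 0.5], [1.0, 0.5]]
y = depthwise_conv1d(x, kernel)
```

Each output cell is one dot product, so a filter of width K over T time frames and D dimensions yields (T - K + 1) × D dot products in this unpadded sketch.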
Regarding claim 3, Gulati discloses wherein the convolution filter uses a filter width between five (5) and twenty-five (25), a number of time frames between fifty (50) and five hundred (500), and an embedding dimensionality matching a dimensionality of an architecture of the speech recognition model (see Table 7 – kernel size of 7-65 is the filter width, and Table 1 discloses encoder embedding dimensions of 144, 256 and 512).
Regarding claim 4, Gulati discloses wherein the operations further comprise applying a ReLU nonlinearity function to an output of the convolution filter to obtain a ReLU nonlinearity result, and wherein the modulation spectrum is generated based at least in part on the ReLU nonlinearity result (section 2.2 and fig. 2 and see Table 3 using ReLU).
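For illustration only, the ReLU (rectified linear unit) nonlinearity recited in claim 4 replaces each negative value with zero and passes non-negative values unchanged. The sketch below assumes the convolution output is held as a list of rows; the names relu and conv_out are hypothetical and the values are made up for the example.

```python
def relu(matrix):
    """Apply ReLU element-wise: max(0, v) for every cell."""
    return [[max(0.0, v) for v in row] for row in matrix]


# Hypothetical convolution output: 2 time frames, 2 dimensions.
conv_out = [[-1.5, 2.0], [0.0, -0.25]]
activated = relu(conv_out)
```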
Regarding claim 5, Gulati discloses wherein the operations further comprise, prior to applying the convolution filter to the encoded time series data: applying a normalization function to the encoded time series data (sections 2.2-2.4 and fig. 2 – Layernorm).
Regarding claim 6, Gulati discloses wherein the operations further comprise:
applying a normalization function to the modulation spectrum (section 2.2 – Batchnorm is applied after depthwise convolution; section 2.4).
Regarding claim 7, Gulati discloses wherein the operations further comprise residually connecting the encoded time series data to the modulation spectrum (section 2.1 and fig. 1 and fig. 2 – shows residual connection).
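For illustration only, a residual connection of the kind referred to in claim 7 adds the input of a processing stage element-wise to that stage's output, which requires both to have the same shape. The sketch below is an assumption-labeled example: the name residual_add and the sample values are hypothetical and are not taken from Gulati or the application.

```python
def residual_add(x, fx):
    """Return x + fx element-wise; x is the stage input, fx the stage output."""
    if len(x) != len(fx) or any(len(a) != len(b) for a, b in zip(x, fx)):
        raise ValueError("input and output must have the same shape")
    return [[a + b for a, b in zip(row_x, row_fx)]
            for row_x, row_fx in zip(x, fx)]


# Hypothetical encoded frames and a same-shaped filtered result.
encoded = [[1.0, 2.0], [3.0, 4.0]]
filtered = [[0.5, -2.0], [0.0, 1.0]]
combined = residual_add(encoded, filtered)
```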
Regarding claim 8, Gulati discloses wherein
the encoded time series data comprises a plurality of time frames (section 3.1, Table 1); and
the operations comprise applying a normalization function to the encoded time series data (section 2.1-2.4 and Equation 1) by:
generating a matrix for the encoded time series data, the matrix comprising a plurality of rows indexed by time frame and a plurality of columns indexed by dimension (the disclosed Conformer architecture = Time frames X encoder dimension);
for each cell of the matrix for the encoded time series data, performing matrix operations on the cell to determine a normalized value by: subtracting a mean value for the matrix from a cell value for the cell to obtain a corresponding result; dividing the corresponding result by a standard deviation value for the matrix to obtain the normalized value; and storing the normalized value in a corresponding matrix cell (section 2.3-2.4 and fig. 1 – See Layernorm function which is the same as the limitation).
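For illustration only, the cell-by-cell normalization recited in claim 8 can be sketched as below. The sketch follows the wording of the claim (a single mean and a single standard deviation computed for the matrix); it is not taken from Gulati. The function name normalize_matrix, the use of the population standard deviation, and the error raised for a constant matrix are assumptions made for the example.

```python
import math


def normalize_matrix(m):
    """Normalize a matrix whose rows are time frames and columns are dimensions.

    Each cell becomes (cell - mean) / std, where mean and std are computed
    over all cells of the matrix (population standard deviation, an assumption).
    """
    values = [v for row in m for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("standard deviation is zero; matrix is constant")
    # Store each normalized value in the corresponding cell of a new matrix.
    return [[(v - mean) / std for v in row] for row in m]


# Hypothetical encoded data: 2 time frames, 2 dimensions.
m = [[1.0, 3.0], [5.0, 7.0]]
n = normalize_matrix(m)
```

After normalization the cells of the matrix have zero mean and unit variance, up to floating-point rounding.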
Regarding claim 9, Gulati discloses wherein the instructions further cause performance of operations comprising:
decoding the modulation spectrum at the decoder (section 2, section 3.2; table 1 – conformer block is provided to the LSTM decoder); and
outputting one or more subword units from the decoder (section 3.2 – wordpiece).
Regarding claims 10 and 19, see rejection of claim 1.
Regarding claims 11 and 20, see rejection of claim 2.
Regarding claim 12, see rejection of claim 3.
Regarding claim 13, see rejection of claim 4.
Regarding claim 14, see rejection of claim 5.
Regarding claim 15, see rejection of claim 6.
Regarding claim 16, see rejection of claim 7.
Regarding claim 17, see rejection of claim 8.
Regarding claim 18, see rejection of claim 9.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAFIZ E HOQUE whose telephone number is (571)270-1811. The examiner can normally be reached M-F 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at (571)272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NAFIZ E HOQUE/ Primary Examiner, Art Unit 2693