DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Comments
The examiner notes that in claim 4, applicant reads a single value as encompassing multiple values. The same convention will potentially be used when mapping the prior art to determine allowability.
The limitation “a correlation between the prediction item information” in claim 8 is read as the correlation between the prediction item information and speech.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-10 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Vaillancourt (US 20240321285 A1).
As per claim 1, Vaillancourt discloses a method of improving recognition accuracy of acoustic data, which is performed by one or more processors of a computing device, the method comprising:
configuring one or more acoustic frames (variable subframes) based on acoustic data (101, fig. 2);
processing each of the one or more acoustic frames as an input of an acoustic recognition model to output predicted values corresponding to each acoustic frame (para. 82: probability);
identifying one or more recognized acoustic frames through threshold analysis based on the predicted values corresponding to each acoustic frame (detecting speech or music per frame per para 82);
identifying a converted acoustic frame through time series analysis based on the one or more recognized acoustic frames (any of 211, 212, 264, 266); and
converting a predicted value corresponding to the converted acoustic frame (207, 257, 113).
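For illustration only, the claim-1 sequence as mapped above — frame-wise prediction, threshold analysis, time-series analysis, and conversion of the predicted value — can be sketched as follows. All function names, threshold values, and data are hypothetical and are not drawn from Vaillancourt or from applicant's disclosure:

```python
# Hypothetical sketch of the claim-1 pipeline; not Vaillancourt's implementation.

def threshold_analysis(predicted, threshold=0.5):
    """Identify recognized frames: indices whose predicted value meets the threshold."""
    return [i for i, p in enumerate(predicted) if p >= threshold]

def time_series_analysis(recognized, min_run=3):
    """Identify 'converted' frames: recognized frames belonging to a run of
    at least min_run consecutive recognized frame indices."""
    converted, run = [], []
    for i in recognized:
        if run and i == run[-1] + 1:
            run.append(i)
        else:
            run = [i]
        if len(run) == min_run:
            converted.extend(run)      # run just reached the minimum length
        elif len(run) > min_run:
            converted.append(i)        # run continues past the minimum
    return converted

def convert_predictions(predicted, converted, value=1.0):
    """Convert the predicted value of each converted frame to a fixed value."""
    out = list(predicted)
    for i in converted:
        out[i] = value
    return out

predicted = [0.2, 0.7, 0.8, 0.9, 0.3, 0.6]
recognized = threshold_analysis(predicted)    # frames 1, 2, 3, 5 exceed 0.5
converted = time_series_analysis(recognized)  # frames 1, 2, 3 form a run of 3
result = convert_predictions(predicted, converted)
```

The sketch treats the threshold analysis and the time-series analysis as separate passes, consistent with the separate steps recited in claim 1.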
As per claim 2, the method of claim 1, wherein the configuring of the one or more acoustic frames based on the acoustic data includes configuring the one or more acoustic frames by dividing the acoustic data to have a size of a preset first time unit (subframes per para 32).
As per claim 3, the method of claim 2, wherein a start time of each of the one or more acoustic frames is determined to have a size difference of a second time unit from a start time of each of adjacent acoustic frames (the subframes per para 32 are relative to the frames and to the adjacent frames).
As per claim 4, the method of claim 1, wherein the predicted value includes one or more pieces of prediction item information and predicted numerical information corresponding to each of the one or more pieces of prediction item information (the associated subframe signaling per para 32), and
the threshold analysis is an analysis that identifies the one or more recognized acoustic frames by determining whether each of the one or more pieces of predicted numerical information corresponding to each of the acoustic frames is greater than or equal to a predetermined threshold value corresponding to each of the prediction item information (the threshold functions per para 82).
As per claim 5, the method of claim 4, wherein the identifying of the converted acoustic frame through the time series analysis includes:
identifying prediction item information corresponding to each of the one or more recognized acoustic frames (the probability,82);
determining whether the identified prediction item information is repeated a predetermined threshold number of times or more for a predetermined reference time (the prediction occurs per frame or subframe and is repeated a predetermined threshold number of times, that number being determined by the relation of the subframes to the frame); and
identifying the converted acoustic frame based on the determination result (the frame is identified as speech or not per stage 205).
As per claim 6, the method of claim 5, further comprising: identifying a correlation between each recognized acoustic frame based on the prediction item information corresponding to each of one or more recognized acoustic frames (the prediction is an indicator of a correlation between the current frame and a recognized speech or non-speech frame); and
determining whether to adjust the threshold values and the threshold number of times corresponding to each of the one or more acoustic frames based on the correlation (the threshold value is adjusted based on the adaptation per para 35: the time-domain excitation contribution is filtered out above the cut-off frequency; the filtering operation permits keeping valuable information coded with the time-domain excitation contribution and removing the non-valuable information above the cut-off frequency, where a band that is set to 0 comprises 0 probability) (the threshold number of times is determined by the relationship between the subframes and the frames, where the subframes are adapted per para 42: the sub-frame length decision is based on the available bitrate and on an analysis of the input sound signal, particularly the spectral dynamics of this input sound signal) (this is based on the detection/correlation between the input audio and identified speech because speech affects the frequency content of the input audio signal).
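As an illustrative aid only, the threshold-value adjustment aspect of claim 6 — raising or lowering a detection threshold according to the correlation between frames — might be sketched as below. The function name, correlation bands, and step size are hypothetical and do not represent Vaillancourt's adaptation mechanism:

```python
# Hypothetical sketch of correlation-driven threshold adjustment (claim 6 style);
# not drawn from Vaillancourt's disclosed adaptation.

def adapt_threshold(base_threshold, correlation, step=0.1):
    """Return an adjusted threshold given a correlation in [0, 1]."""
    if correlation >= 0.8:
        # Strong correlation with recognized frames: relax the threshold.
        return max(0.0, base_threshold - step)
    if correlation <= 0.2:
        # Weak correlation: tighten the threshold.
        return min(1.0, base_threshold + step)
    return base_threshold  # otherwise leave the threshold unchanged
```

A corresponding adjustment of the threshold number of times could follow the same pattern, keyed to the subframe/frame relationship noted above.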
As per claim 7, the method of claim 1, wherein the conversion for the predicted value includes at least one of a noise conversion that converts an output of the acoustic recognition model based on the converted acoustic frame into a non-recognition item (the non speech option in 205 as affecting the processing in 207 and 113), and
an acoustic item conversion (207) that converts prediction item information related to the converted acoustic frame into corrected prediction item information (output of 207).
As per claim 8, the method of claim 7, wherein the corrected prediction item information is determined based on a correlation between the prediction item information (the function of 207 is based on 206).
As per claim 9, an apparatus for performing a method of improving recognition accuracy of acoustic data, comprising:
a memory configured to store one or more instructions; and
a processor configured to execute the one or more instructions stored in the memory, wherein the processor performs the method of claim 1 by executing the one or more instructions (the system and method of claim 1 require a processor, memory, and software in order to be implemented).
As per claim 10, a computer-readable recording medium on which a program for executing a method of improving recognition accuracy of acoustic data with a computing device is recorded, wherein the method comprises:
configuring one or more acoustic frames based on acoustic data;
processing each of the one or more acoustic frames as an input of an acoustic recognition model to output predicted values corresponding to each acoustic frame;
identifying one or more recognized acoustic frames through threshold analysis based on the predicted values corresponding to each acoustic frame;
identifying a converted acoustic frame through time series analysis based on the one or more recognized acoustic frames; and
converting a predicted value corresponding to the converted acoustic frame.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER KRZYSTAN whose telephone number is 571-272-7498, and whose email address is alexander.krzystan@uspto.gov.
The examiner can usually be reached on M-F 7:30-4:00 EST.
If attempts to reach the examiner by telephone or email are unsuccessful, the examiner’s supervisor, Fan Tsang can be reached on (571) 272-7547.
The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications.
/ALEXANDER KRZYSTAN/Primary Examiner, Art Unit 2653
Examiner Alexander Krzystan
March 9, 2026