Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-6, 9-10 and 12-21 are pending.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-6, 9-10 and 12-21 are directed to an abstract idea without significantly more.
Regarding Claims 1, 9 and 10, the claims recite a method, device and non-transitory computer-readable medium for detecting a keyword, comprising: determining, for a target audio clip in a target audio, a first probability that a target audio frame in the target audio clip corresponds to a target character unit, wherein the first probability indicates a probability that the target audio frame is a voice frame of the target character unit, the target character unit is a character unit comprised in a preset keyword, and a position of the target audio frame in the target audio clip corresponds to a position of the target character unit in the preset keyword; determining, based on the first probability, a second probability that the target audio clip corresponds to the preset keyword, the second probability indicating a probability that respective audio frames in the target audio clip are sequentially respective character units in the preset keyword; and determining, based on the second probability, whether the target audio clip is a voice clip of the preset keyword.
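For illustration only, the three determining steps recited above can be sketched in Python as follows. This sketch is not taken from the application or from the cited references. The function names, the dictionary format for per-frame posteriors, the one-frame-per-character-unit alignment, the use of a product to combine the first probabilities, and the 0.5 threshold are all assumptions made for the example.

```python
from math import prod


def first_probabilities(frame_posteriors, keyword):
    """For frame position i, look up the probability of the keyword's i-th unit.

    frame_posteriors: list of dicts mapping character unit -> probability
                      (hypothetical format, e.g. produced by an acoustic model).
    keyword: list of character units of the preset keyword.
    """
    if len(frame_posteriors) != len(keyword):
        # Assumption: one target frame per character unit, aligned by position.
        raise ValueError("clip and keyword must have the same length")
    return [post.get(unit, 0.0) for post, unit in zip(frame_posteriors, keyword)]


def second_probability(first_probs):
    """Combine the first probabilities into one clip-level value.

    Assumption: the frames are treated as independent, so the values multiply.
    """
    return prod(first_probs)


def is_keyword_clip(frame_posteriors, keyword, threshold=0.5):
    """Decide whether the clip is a voice clip of the keyword (assumed threshold)."""
    probs = first_probabilities(frame_posteriors, keyword)
    return second_probability(probs) > threshold


if __name__ == "__main__":
    clip = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
    print(is_keyword_clip(clip, ["a", "b"]))
    print(is_keyword_clip(clip, ["b", "a"]))
```

In this toy input the clip matches the keyword "a", "b" in order but not the reversed keyword, which is the positional correspondence the claim language describes.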
The limitations of the claims, as drafted, recite a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of generic computer components. That is, other than reciting a “processor”, nothing in the claim elements precludes the steps from practically being performed in the mind. Each of the limitations in the claims can be performed in the human mind, including by observation, evaluation and judgement. For example, the limitation for “determining, for a target audio clip in a target audio, a first probability that a target audio frame in the target audio clip corresponds to a target character unit” can be completed by a person listening to an audio clip and calculating a probability that the clip includes a target character. The limitation for determining that “a position of the target audio frame in the target audio clip corresponds to a position of the target character unit in the preset keyword” can be completed by a person listening to the audio clip, writing down a sequence of characters of the words from the clip, and determining which of the characters of the clip is a key character, whether there are consecutive characters in the clip that are key characters, and where the characters appear in the sequence of characters. The limitation for “determining, based on the first probability, a second probability that the target audio clip corresponds to the preset keyword, the second probability indicating a probability that respective audio frames in the target audio clip are sequentially respective character units in the preset keyword” can be completed by observing on paper the sequence of characters determined earlier and determining whether a number of consecutive characters are key characters of a keyword.
The limitation for “determining, based on the second probability, whether the target audio clip is a voice clip of the preset keyword” can be completed by judging from the determined sequence of characters on paper whether a number of key characters are present for the keyword and determining that the keyword is in the audio clip.
This judicial exception is not integrated into a practical application. In particular, the claims only recite one additional element: using a processor to perform the processing steps. The processor is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. Therefore, the claims are directed to an abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a processor to perform the processing steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claims are not patent eligible.
Regarding Claims 2-6 and 12-21, the rationale provided for Claims 1, 9 and 10 is incorporated herein.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 5-6, 9-10, 12, 15-17, 20 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Gao et al. (US Pub. 2020/0357386 A1) in view of Washio et al. (US Pub. 2011/0218805 A1).
Regarding Claims 1, 9 and 10, Gao teaches a keyword detection method (see Fig.1 and paragraph [0025]), comprising:
determining, for a target audio clip in a target audio, a first probability that a target audio frame in the target audio clip corresponds to a target character unit (see Fig.2 (201,202,203) and paragraphs [0043-0045], calculating the posterior probability for each character and frame), wherein the first probability indicates a probability that the target audio frame is a voice frame of the target character unit (see Fig.2 (201,202,203) and paragraphs [0043-0045]), and the target character unit is a character unit comprised in a preset keyword (see Fig.2 (203) and paragraphs [0045-0046]);
determining, based on the first probability, a second probability that the target audio clip corresponds to the preset keyword (see Fig.2 (204), paragraphs [0050-0054] and paragraph [0058], determining confidence of the presence of a number of N key characters), the second probability indicating a probability that respective audio frames in the target audio clip are sequentially respective character units in the preset keyword (see Fig.2 (204), paragraphs [0050-0054] and paragraph [0058], determining confidence of the presence of a number of N consecutive key characters);
and determining, based on the second probability, whether the target audio clip is a voice clip of the preset keyword (see Fig.2 (204,205) and paragraph [0065]).
Gao fails to teach that a position of the target audio frame in the target audio clip corresponds to a position of the target character unit in the preset keyword.
Washio, however, teaches determining the position and score of each audio frame in a speech sample containing a keyword (see Fig.3, Fig.4, paragraph [0036] and paragraph [0046], pointers indicating the positions of the audio frames).
It would have been obvious for one skilled in the art, before the effective filing date of the application, to include in Gao’s method the step of determining that a position of the target audio frame in the target audio clip corresponds to a position of the target character unit in the preset keyword. The motivation would be to identify the portion of the audio clip, or the consecutive frames, that contains at least two key characters of a particular keyword.
Regarding Claims 2, 12 and 17, Gao further teaches determining an audio feature of the target audio frame (see Fig.2 (202) and paragraphs [0034-0035], extracting eigenvector of the character frame); and inputting the audio feature into a trained neural network model to obtain the first probability that the target audio frame corresponds to the target character unit (see Fig.3 and paragraphs [0050-0053], inputting the eigenvectors into the neural network model for the character determination).
Regarding Claims 5, 15 and 20, Gao further teaches, if the second probability is greater than a preset threshold, determining that the target audio clip is a voice segment of the preset keyword (see Fig.2 (205) and paragraphs [0065-0066]).
Regarding Claims 6, 16 and 21, Gao further teaches wherein the character unit comprises a Chinese character (see Fig.3, paragraph [0020] and paragraph [0023]).
Claim Objections
Claims 3-4, 13-14 and 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Regarding Claims 3, 13 and 18, the following is a statement of reasons for the indication of allowable subject matter: The prior art of record does not teach, disclose or suggest the claimed limitation of (in combination with all other limitations in the claim) “wherein, the determining, based on the first probability, a second probability that the target audio clip corresponds to the preset keyword, comprises: determining a first probability that a target audio frame corresponds to a last target character unit in the preset keyword; determining a maximum value of confidences of a second-from-bottom target character unit in the preset keyword appearing in an audio frame before the target audio frame in the target audio clip; and determining the sum of the maximum value and the first probability that the target audio frame corresponds to the last target character unit in the preset keyword as the second probability; wherein the target audio frame is any audio frame in the target audio clip”. Similar features are claimed in Claims 4, 14 and 19.
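For illustration only, the computation recited in the quoted limitation (a maximum over earlier frames, followed by a sum) can be sketched in Python as follows. This is a reading of the claim language under stated assumptions, not the applicant's implementation: the function name, the list-based inputs, and the treatment of the confidences as precomputed values are hypothetical.

```python
def second_probability(last_unit_probs, prev_unit_confidences, t):
    """Second probability for target frame index t, per the quoted limitation.

    last_unit_probs[i]: first probability that frame i corresponds to the
        last character unit of the preset keyword.
    prev_unit_confidences[i]: confidence that the second-to-last character
        unit appears in frame i (assumed to be computed elsewhere).
    t: index of the target audio frame; it needs at least one earlier frame.
    """
    if len(last_unit_probs) != len(prev_unit_confidences):
        raise ValueError("inputs must cover the same frames")
    if not 0 < t < len(last_unit_probs):
        raise ValueError("target frame must have at least one earlier frame")
    # Maximum confidence of the second-to-last unit over frames before t.
    best_prev = max(prev_unit_confidences[:t])
    # Sum of that maximum and the first probability of the last unit at t.
    return best_prev + last_unit_probs[t]


if __name__ == "__main__":
    last = [0.1, 0.2, 0.7]
    prev = [0.3, 0.6, 0.1]
    print(second_probability(last, prev, 2))
```

Because the result is a sum, it can exceed 1, so in this sketch it behaves as a score rather than a normalized probability.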
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VU B HANG whose telephone number is (571)272-0582.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan, can be reached at (571)272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/VU B HANG/Primary Examiner, Art Unit 2654