DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on May 14, 2024.
Claims 1-20 are pending in the application. As such, claims 1-20 have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings were received on May 14, 2024. These drawings have been accepted and considered by the Examiner.
Claim Objections
Claims 17-20 are objected to because of the following informalities:
Claims 17-20 each recite, in line 1, “The method.” The Examiner believes this to be a clerical error and that the claims are intended to read “The operating method” for consistency with the rest of the claims.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-6, 8-13 and 15-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claims 1, 8 and 15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite:
An electronic device comprising:
a memory storing at least one instruction; and
at least one processor operatively connected to the memory and configured to execute the at least one instruction to:
perform speaker verification on a voice input to determine whether the voice input matches a voice of an enrolled speaker,
based on determining that the voice input does not match the voice of the enrolled speaker, perform first speech recognition on the voice input based on a first automatic speech recognition (ASR) model, and
based on determining that the voice input matches the voice of the enrolled speaker, perform second speech recognition on the voice input based on a sequence summarizing neural network (SSN) and a second ASR model.
The claim limitations, under their broadest reasonable interpretation, cover performance of the limitations in the mind. For example,
“perform speaker verification on a voice input to determine whether the voice input matches a voice of an enrolled speaker” in the context of this claim encompasses a person listening to a voice and deciding if it is a known voice,
“perform first speech recognition on the voice input based on a first automatic speech recognition (ASR) model” in the context of this claim encompasses a person listening to a speaking voice and deciding what is being said,
“perform second speech recognition on the voice input” in the context of this claim encompasses a person listening to a speaking voice again and deciding what is being said.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements, which are generic computer components merely used as tools to perform the abstract idea:
a memory
at least one processor
a first automatic speech recognition (ASR) model
a sequence summarizing neural network (SSN)
a second ASR model.
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are generic computer components that are merely used as tools to perform the abstract idea and do not provide an inventive concept. The claim is not patent eligible.
The dependent claims do not add limitations that would either integrate the recited abstract idea into a practical application or help the claim as a whole amount to significantly more than the abstract idea identified for the independent claims.
Claims 2, 9 and 16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite:
wherein the second ASR model is configured by selectively adding an adapter layer configured for personalization of the enrolled speaker to the first ASR model.
The additional limitations of the claim do not preclude the claimed steps from practically being performed in the mind. There are no additional mental steps.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements, which are generic computer components merely used as tools to perform the abstract idea:
the second ASR model
the first ASR model.
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are generic computer components that are merely used as tools to perform the abstract idea and do not provide an inventive concept. The claim is not patent eligible.
Claims 3, 10 and 17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite:
wherein the at least one processor is further configured to execute the at least one instruction to:
obtain a voice feature based on the voice input, and
perform the first speech recognition from the voice feature based on the first ASR model.
The additional limitations of the claim do not preclude the claimed steps from practically being performed in the mind. For example,
“obtain a voice feature based on the voice input” in the context of this claim encompasses a person listening to a voice and identifying a certain characteristic,
“perform the first speech recognition from the voice feature based on the first ASR model” in the context of this claim encompasses a person listening to a speaking voice and deciding what is being said.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements, which are generic computer components merely used as tools to perform the abstract idea:
at least one processor
the first ASR model.
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are generic computer components that are merely used as tools to perform the abstract idea and do not provide an inventive concept. The claim is not patent eligible.
Claims 4, 11 and 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite:
wherein the at least one processor is further configured to execute the at least one instruction to:
obtain a voice feature based on the voice input,
obtain an SSN adaptation feature from the voice feature based on the SSN, and
perform the second speech recognition from the SSN adaptation feature based on the second ASR model.
The additional limitations of the claim do not preclude the claimed steps from practically being performed in the mind. For example,
“obtain a voice feature based on the voice input” in the context of this claim encompasses a person listening to a voice and identifying a certain characteristic,
“obtain an SSN adaptation feature from the voice feature based on the SSN” in the context of this claim encompasses a person listening to a voice and identifying a certain characteristic and adapting it to an SSN,
“perform the second speech recognition from the SSN adaptation feature based on the second ASR model” in the context of this claim encompasses a person listening to a speaking voice again and deciding what is being said.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements, which are generic computer components merely used as tools to perform the abstract idea:
at least one processor
an SSN
the second ASR model.
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are generic computer components that are merely used as tools to perform the abstract idea and do not provide an inventive concept. The claim is not patent eligible.
Claims 5, 12 and 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite:
wherein the first ASR model comprises one of a transformer-based ASR model, a conformer-based ASR model, or a recurrent neural network (RNN)-transducer-based ASR model.
The additional limitations of the claim do not preclude the claimed steps from practically being performed in the mind. There are no additional mental steps.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements, which are generic computer components merely used as tools to perform the abstract idea:
the first ASR model
a transformer-based ASR model
a conformer-based ASR model
a recurrent neural network (RNN)-transducer-based ASR model.
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are generic computer components that are merely used as tools to perform the abstract idea and do not provide an inventive concept. The claim is not patent eligible.
Claims 6 and 13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite:
wherein the at least one processor is further configured to execute the at least one instruction to:
perform partial speech recognition on the voice input that is sequentially input, based on the first ASR model, and
based on the voice input being entirely input, perform the speaker verification.
The additional limitations of the claim do not preclude the claimed steps from practically being performed in the mind. For example,
“perform partial speech recognition on the voice input that is sequentially input” in the context of this claim encompasses a person listening to a speaking voice and deciding what is partially being said,
“perform the speaker verification” in the context of this claim encompasses a person listening to a voice and deciding if it is a known voice.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements, which are generic computer components merely used as tools to perform the abstract idea:
at least one processor
the first ASR model.
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are generic computer components that are merely used as tools to perform the abstract idea and do not provide an inventive concept. The claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 8-11 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zeljkovic et al. (US Patent Pub. No. 20130097682 A1), hereinafter Zeljkovic, in view of Veseley et al. (Veselý, Karel, Shinji Watanabe, Katerina Žmolíková, Martin Karafiát, Lukáš Burget, and Jan Honza Černocký. "Sequence summarizing neural network for speaker adaptation." In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 5315-5319. IEEE, 2016.), hereinafter Veseley.
Regarding claims 1, 8 and 15, Zeljkovic teaches an electronic device, an intelligent server, and an operating method of an intelligent server (Zeljkovic in [0005] teaches a computing device, an authentication server, and a method for operating an authentication server)
comprising:
a memory storing at least one instruction (Zeljkovic in [0094] teaches using a mobile device with a processor for executing instructions stored in a memory);
and
at least one processor operatively connected to the memory and configured to execute the at least one instruction to (Zeljkovic in [0094] teaches using a mobile device with a processor for executing instructions stored in a memory):
perform speaker verification on a voice input to determine whether the voice input matches a voice of an enrolled speaker (Zeljkovic in [0005] teaches speaker verification that includes an indication of whether or not the speech input matches a voice print, and in [0055] teaches that a voice enrollment procedure is performed to enroll the user's voice and create a voice print to be used for authentication purposes),
based on determining that the voice input does not match the voice of the enrolled speaker,
perform first speech recognition on the voice input based on a first automatic speech recognition (ASR) model (Zeljkovic in [0043] teaches performing the speech recognition service to determine whether a speech sample included in the request matches a voice print previously established for a user that is allegedly associated with the speech sample).
Zeljkovic teaches determining that the voice input matches the voice of the enrolled speaker.
Zeljkovic does not teach, however Veseley teaches
[based on determining that the voice input matches the voice of the enrolled speaker],
perform second speech recognition on the voice input based on a sequence summarizing neural network (SSN) and a second ASR model (Veseley in [page 4, column 2, section 4] teaches performing speech recognition, and in [page 5, column 2, section 4.5] teaches using a Sequence Summarizing Neural Network (SSNN)).
Veseley is considered to be analogous to the claimed invention because it is in the same field of Sequence Summarizing Neural Networks (SSNNs). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zeljkovic further in view of Veseley to allow for using a Sequence Summarizing Neural Network (SSNN). The motivation to do so would have been that appending both the i-vector and the summary vector to the FBANK features leads to additional improvement comparable to the performance of an FMLLR-adapted DNN system (Veseley [Abstract]).
Regarding claims 2, 9 and 16, Zeljkovic, as modified above, teaches the electronic device, intelligent server, and operating method of claims 1, 8 and 15.
Zeljkovic, as modified above, teaches the enrolled speaker.
Zeljkovic, as modified above, does not teach, however Veseley teaches
wherein the second ASR model is configured by selectively adding an adapter layer configured for personalization of [the enrolled speaker] to the first ASR model (Veseley in [page 5, column 2, section 4.5] teaches that the SSNN consists of 2 Tanh layers with 512 neurons and a 600-dimensional layer with linear output, and in [Abstract] teaches an SSNN-based speaker adaptation method).
Veseley is considered to be analogous to the claimed invention because it is in the same field of Sequence Summarizing Neural Networks (SSNNs). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zeljkovic, as modified above, further in view of Veseley to allow for using a Sequence Summarizing Neural Network (SSNN). The motivation to do so would have been that appending both the i-vector and the summary vector to the FBANK features leads to additional improvement comparable to the performance of an FMLLR-adapted DNN system (Veseley [Abstract]).
Regarding claims 3, 10 and 17, Zeljkovic, as modified above, teaches the electronic device, intelligent server, and operating method of claims 1, 8 and 15.
Zeljkovic further teaches
wherein the at least one processor is further configured to execute the at least one instruction to:
obtain a voice feature based on the voice input (Zeljkovic in [0003] teaches creating a voice print for the user),
and
perform the first speech recognition from the voice feature based on the first ASR model (Zeljkovic in [0030] teaches performing speech recognition using the voice print).
Regarding claims 4, 11 and 18, Zeljkovic, as modified above, teaches the electronic device, intelligent server, and operating method of claims 1, 8 and 15.
Zeljkovic further teaches
wherein the at least one processor is further configured to execute the at least one instruction to:
[claim 18 only] wherein the performing the second speech recognition comprises:
obtain a voice feature based on the voice input (Zeljkovic in [0003] teaches creating a voice print for the user).
Zeljkovic, as modified above, does not teach, however Veseley teaches
obtain an SSN adaptation feature from the voice feature based on the SSN (Veseley in [page 3, column 2, paragraph 2] teaches using i-vectors as extra input features, replacing the i-vector extractor by a Sequence Summarizing Neural Network (SSNN), and similarly to i-vector extractor, the SSNN produces a “summary vector”, which represents acoustic summary of an utterance, where the “summary vector” is obtained by enclosing a sequence-averaging operation into the last component of the SSNN),
and
perform the second speech recognition from the SSN adaptation feature based on the second ASR model (Veseley in [page 4, column 2, section 4] teaches performing speech recognition, and in [page 5, column 2, section 4.5] teaches using a Sequence Summarizing Neural Network (SSNN)).
Veseley is considered to be analogous to the claimed invention because it is in the same field of Sequence Summarizing Neural Networks (SSNNs). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zeljkovic, as modified above, further in view of Veseley to allow for using a Sequence Summarizing Neural Network (SSNN). The motivation to do so would have been that appending both the i-vector and the summary vector to the FBANK features leads to additional improvement comparable to the performance of an FMLLR-adapted DNN system (Veseley [Abstract]).
Claims 5, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zeljkovic, in view of Veseley, in view of Yu et al. (US Patent Pub. No. 20220122586 A1), hereinafter Yu.
Regarding claims 5, 12 and 19, Zeljkovic, as modified above, teaches the electronic device, intelligent server, and operating method of claims 1, 8 and 15.
Zeljkovic, as modified above, teaches the first ASR model.
Zeljkovic, as modified above, does not teach, however Yu teaches
wherein the first ASR model comprises one of
a transformer-based ASR model,
a conformer-based ASR model,
or
a recurrent neural network (RNN)-transducer-based ASR model (Yu in [0007] teaches using a speech recognition model which includes at least one of a recurrent neural-transducer (RNN-T) model, a Transformer-Transducer model, a Convolutional Network-Transducer (ConvNet-Transducer) model, or a Conformer-Transducer model).
Yu is considered to be analogous to the claimed invention because it is in the same field of speech recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zeljkovic, as modified above, further in view of Yu to allow for using a speech recognition model that includes various ASR sub-models. The motivation to do so would have been to provide a small computational footprint and lower memory requirements than conventional ASR architectures (Yu [0028]).
Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Zeljkovic, in view of Veseley, in view of Itoh et al. (US Patent No. 5799275 A), hereinafter Itoh, in view of Okabe et al. (US Patent Pub. No. 20230282217 A1), hereinafter Okabe.
Regarding claims 6 and 13, Zeljkovic, as modified above, teaches the electronic device and intelligent server of claims 1 and 8.
Zeljkovic, as modified above, does not teach, however Itoh teaches
wherein the at least one processor is further configured to execute the at least one instruction to:
perform partial speech recognition on the voice input that is sequentially input, [based on the first ASR model] (Itoh in [col 1 ln 6-19] teaches performing speech recognition on input of a partial speech sentence where the input is sequentially input).
Itoh is considered to be analogous to the claimed invention because it is in the same field of speech recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zeljkovic, as modified above, further in view of Itoh to allow for performing speech recognition on input of a partial speech sentence. The motivation to do so would have been to provide a speech recognizing device that enables a hearer to recognize a summary of a speaker's speech through voice reproduction of synthesized content (Itoh [col 13 ln 33-36]).
Zeljkovic, as modified above, teaches performing the speaker verification.
Zeljkovic, as modified above, does not teach, however Okabe teaches
based on the voice input being entirely input, [perform the speaker verification] (Okabe in [0056, Fig. 5] teaches determining the end of voice input before proceeding to a next step).
[Image: Okabe, Fig. 5]
Okabe is considered to be analogous to the claimed invention because it is in the same field of speaker verification. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zeljkovic, as modified above, further in view of Okabe to allow for determining the end of voice input before proceeding to a next step. The motivation to do so would have been to improve the accuracy of speaker verification even in a noisy environment (Okabe [0054]).
Claims 7, 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zeljkovic, in view of Veseley, in view of Flaks et al. (US Patent Pub. No. 20180315417 A1), hereinafter Flaks.
Regarding claims 7, 14 and 20, Zeljkovic, as modified above, teaches the electronic device, intelligent server, and operating method of claims 2, 9 and 16.
Zeljkovic, as modified above, does not teach, however Veseley teaches
wherein the SSN and the adapter layer are trained (Veseley in [page 3, column 2, paragraph 2] teaches the “summary vector” is obtained by enclosing a sequence-averaging operation into the last component of the SSNN, and the “summary vector” is then appended to the input of main network, and both networks are trained together, while the gradients for SSNN are calculated by back-propagating through the main network, which is trained to optimize a single loss function).
Veseley is considered to be analogous to the claimed invention because it is in the same field of Sequence Summarizing Neural Networks (SSNNs). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zeljkovic, as modified above, further in view of Veseley to allow for using a Sequence Summarizing Neural Network (SSNN). The motivation to do so would have been that appending both the i-vector and the summary vector to the FBANK features leads to additional improvement comparable to the performance of an FMLLR-adapted DNN system (Veseley [Abstract]).
Zeljkovic, as modified above, teaches performing the speaker verification.
Zeljkovic, as modified above, does not teach, however Flaks teaches
based on transcript data obtained by decoding a user voice that passes speaker verification (Flaks in [0007] teaches selecting a subset of transcriptions based on factors that may include a confidence score [such a factor may be passing speaker verification], and using the selected subset of transcriptions to re-train the ASR model).
Flaks is considered to be analogous to the claimed invention because it is in the same field of speech recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zeljkovic, as modified above, further in view of Flaks to allow for using a selected subset of transcriptions to train an ASR model. The motivation to do so would have been that, by continuously retraining the ASR model, the system is able to provide ever faster and more accurate text transcriptions of detected speech activity (Flaks [Abstract]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL J. MUELLER whose telephone number is (571)272-1875. The examiner can normally be reached M-F 9:00am-5:00pm (Eastern).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
PAUL MUELLER
Examiner
Art Unit 2657
/PAUL J. MUELLER/Examiner, Art Unit 2657