Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
In reference to MPEP § 2106.07, Formulating and Supporting Rejections For Lack Of Subject Matter Eligibility:
“The evaluation of whether the claimed invention qualifies as patent-eligible subject matter should be made on a claim-by-claim basis, because claims do not automatically rise or fall with similar claims in an application. For example, even if an independent claim is determined to be ineligible, the dependent claims may be eligible because they add limitations that integrate the judicial exception into a practical application or amount to significantly more than the judicial exception recited in the independent claim. And conversely, even if an independent claim is determined to be eligible, a dependent claim may be ineligible because it adds a judicial exception without also adding limitations that integrate the judicial exception or provide significantly more”
Therefore, in view of MPEP § 2106.07, as per dependent claim 13, the language “computer program product” does not place the claimed subject matter into statutory form.
In reference to MPEP § 2106.03:
“…examples of claims that are not directed to any of the statutory categories include:
…Products that do not have a physical or tangible form, such as information (often referred to as “data per se”) or a computer program per se (often referred to as “software per se”) when claimed as a product without any structural recitations;”
“As the courts' definitions of machines, manufactures and compositions of matter indicate, a product must have a physical or tangible form in order to fall within one of these statutory categories. Digitech, 758 F.3d at 1348, 111 USPQ2d at 1719.”
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 8-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term “calculator,” while supported in the specification, does not allow one of ordinary skill in the art to definitively interpret the claims. A calculator, whether “local” or “main” as claimed, can pertain to various types; it is unclear what such a calculator pertains to (for example, hardware, a processor, etc.).
Claim 12 recites the limitation "central" in “central calculator”. There is insufficient antecedent basis for this limitation in the claim. For purposes of applying prior art, “central” will be construed as “main”. However, rather than a mere objection for informalities, the existence of multiple “calculators” in the claim creates an indefinite circumstance with respect to antecedent basis, because it is unclear to which previously recited calculator “central” refers.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 5-14 are rejected under 35 U.S.C. 103 as being unpatentable over Jung et al., “Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention,” INTERSPEECH 2020, Oct. 25, 2020, pages 931-935 (hereinafter Jung), in view of US 2019/0273767 A1 (Nelson, Steven et al.; hereinafter Nelson).
Re claim 1, Jung teaches
1. (Currently Amended) A method (100) for analysing a noisy sound signal for the recognition of at least one group of command keywords and a speaker of the noisy sound signal analysed, the noisy sound signal to be analysed being recorded by at least one microphone and the method comprising: (SV [speaker verification] and KWS [keyword spotting] require a way of recording a user or recordings already stored, Abstract)
- constituting a training database comprising the following sub-steps of: (deep neural networks and CNNs automatically learn and have a database of training data, Introduction and sec 2.1)
- for each speaker to be recognised, recording at least one noiseless sound signal spoken by the speaker; (clean speech recorded, section 3.1)
- recording, by the microphone, the environmental noise, the environmental noise being a noise generated by the speaker's sound environment; (noise component can be isolated and removed as in section 2.1, and then injecting or corrupting a signal with recorded or same noise added to the existing speech signal; additionally, the original signal is clean per se but can contain the similar injected noise to simulate challenging environments, section 3.1)
- for each noiseless sound signal recorded, adding the noise recorded to the noiseless sound signal to obtain a noisy sound signal; (noise component can be isolated and removed as in section 2.1, and then injecting or corrupting a signal with recorded or same noise added to the existing speech signal; additionally, the original signal is clean per se but can contain the similar injected noise to simulate challenging environments, section 3.1)
- for each noisy sound signal obtained, calculating a sound signature of the noisy sound signal obtained (using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2)
- for each sound signature calculated, associating the sound signature calculated with the speaker who spoke the corresponding noiseless sound signal and with at least one group of command keywords (clean speech recorded, section 3.1… comparison thereof using KWS and SV as discrimination tasks that make a decision given a score between embeddings of enrollment and test utterance, section 3.3 with introduction)
- supervised training of an artificial neural network on the training database constituted to obtain an artificial neural network trained capable of providing, from a sound signature obtained from a noisy sound signal, a prediction of speaker and at least one prediction of command keyword group; (Since CTC labeling and classification is involved, we realize that supervised learning is taking place in sec 2.2, combined with the use of DNN/CNN, where deep neural networks and CNNs automatically learn and have a database of training data, Introduction and sec 2.1… using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2… comparison thereof using KWS and SV as discrimination tasks that make a decision given a score between embeddings of enrollment and test utterance, section 3.3 with introduction)
- calculating a sound signature of the noisy sound signal analysed; (using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2, calculating verification using noisy speech as in sec 3.3. with Table 2)
- using the artificial neural network trained on the sound signature calculated to obtain a prediction of speaker and at least one prediction of command keyword group. (verifying the speaker, Introduction, and the use of DNN/CNN, where deep neural networks and CNNs automatically learn and have a database of training data, Introduction and sec 2.1… using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2… comparison thereof using KWS and SV as discrimination tasks that make a decision given a score between embeddings of enrollment and test utterance, section 3.3 with introduction)
However, although a processor or hardware in some capacity is required to check the speech of a user for verification, Jung fails to expressly teach, by term, a “microphone” per se; Nelson has been incorporated for clarity to emphasize the otherwise inherent need for a microphone:
a microphone (Nelson 0521 microphone to record speech, and multiple microphones to track a moving speaker 0398 with fig. 17d by isolating noise from a speaker 0506 to identify one or more commands spoken in a single utterance as well as the beginning and end of one or more commands, including silence gaps and end events, 0487 with 0525… in the context of machine learning 0452)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Jung to incorporate the above claim limitations as taught by Nelson. Doing so would amount to a simple substitution of one known element (the manner in which speech is collected in Jung) for another (the express, otherwise inherent, microphone of Nelson) to obtain predictable results. This would improve the system of Jung by expressly using a microphone for the collection of audio, and would further provide Nelson's abilities to reduce noise in real time, track changing audio directions with a multi-microphone approach if needed, and determine the start and end of speech commands, with machine learning analogous to that of Jung.
Re claim 13, this dependent claim is rejected on the same basis as claim 1, as it recites a broader or narrower version of that claim based on the general inclusion or omission of hardware alone (e.g., a processor, memory, instructions), otherwise amounting to a virtually identical scope.
Re claim 14, this dependent claim is rejected on the same basis as claim 1, as it recites a broader or narrower version of that claim based on the general inclusion or omission of hardware alone (e.g., a processor, memory, instructions), otherwise amounting to a virtually identical scope.
Re claim 2, Jung teaches
2. (Currently Amended) The method according to claim 1, wherein the artificial neural network trained is further capable of providing, from a sound signature, a prediction of activation binary relating to the detection or non-detection of at least one group of activation keywords, each sound signature of the training database being further associated with an activation binary, the step of using of the artificial neural network trained making it possible to further obtain a prediction of activation binary. (yes/no: is a wake word detected?, wherein a wake word is an activation command or any command that invokes an action, Introduction… verifying the speaker, Introduction, and the use of DNN/CNN, where deep neural networks and CNNs automatically learn and have a database of training data, Introduction and sec 2.1… using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2… comparison thereof using KWS and SV as discrimination tasks that make a decision given a score between embeddings of enrollment and test utterance, section 3.3 with introduction)
Re claim 3, Jung teaches an artificial neural network such as DNN/CNN as well as wake words, but fails to teach termination per se, and thus fails to teach the concept of termination keywords as a termination binary:
3. (Currently Amended) The method according to claim 1, wherein the artificial neural network trained is further capable of providing, from a sound signature (using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2), a prediction of termination binary relating to the detection or non-detection of at least one group of termination keywords (Jung KWS, Introduction), each sound signature of the training database being further associated with a termination binary, using of the artificial neural network (Jung DNN, Introduction) trained making it possible to further obtain a prediction of termination binary. (Nelson 0521 microphone to record speech, and multiple microphones to track a moving speaker 0398 with fig. 17d by isolating noise from a speaker 0506 to identify one or more commands spoken in a single utterance as well as the beginning and end of one or more commands, including silence gaps and end events, 0487 with 0525… in the context of machine learning 0452)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Jung to incorporate the above claim limitations as taught by Nelson. Doing so would apply a known technique (detecting a start and stop, or termination condition, in commands) to improve similar devices (speech recognition focused on speaker verification) in the same way, with enhanced microphone arrays and algorithms for complex speech detection. This would improve the system of Jung by expressly using a microphone for the collection of audio, and would further provide Nelson's abilities to reduce noise in real time, track changing audio directions with a multi-microphone approach if needed, and determine the start and end of speech commands, with machine learning analogous to that of Jung.
Re claim 5, Jung teaches
5. (Currently Amended) The method (100) according to claim 1, wherein at least one noiseless sound signal recorded during the step (noise component can be isolated and removed as in section 2.1, and then injecting or corrupting a signal with recorded or same noise added to the existing speech signal; additionally, the original signal is clean per se but can contain the similar injected noise to simulate challenging environments, section 3.1)…
However, while a database or network per se, is taught for training to verify a speaker, Jung fails to teach a moving speaker:
… of constituting of the training database is spoken by a moving speaker. (Nelson 0521 microphone to record speech, and multiple microphones to track a moving speaker 0398 with fig. 17d by isolating noise from a speaker 0506 to identify one or more commands spoken in a single utterance as well as the beginning and end of one or more commands, including silence gaps and end events, 0487 with 0525… in the context of machine learning 0452)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Jung to incorporate the above claim limitations as taught by Nelson. Doing so would apply a known technique (multiple microphones tracking a moving user/speaker's sound for speech detection of commands) to improve similar devices (speech recognition focused on speaker verification) in the same way, with enhanced microphone arrays and algorithms for complex speech detection. This would improve the system of Jung by expressly using a microphone for the collection of audio, and would further provide Nelson's abilities to reduce noise in real time, track changing audio directions with a multi-microphone approach if needed, and determine the start and end of speech commands, with machine learning analogous to that of Jung.
Re claim 6, Jung teaches
6. (Currently Amended) The method according to claim 1, wherein the training database is updated on request, at regular intervals, or automatically after detection of a change in the sound environment of the microphone. (DNN/CNN automatically train when new events are detected, including noisy speech input… verifying the speaker, Introduction, and the use of DNN/CNN, where deep neural networks and CNNs automatically learn and have a database of training data, Introduction and sec 2.1… using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2… comparison thereof using KWS and SV as discrimination tasks that make a decision given a score between embeddings of enrollment and test utterance, section 3.3 with introduction)
Re claim 7, Jung teaches
7. (Currently Amended) The method according to claim 6, wherein the step of supervised training of the artificial neural network is carried out as soon as the training database is updated. (Since CTC labeling and classification is involved, we realize that supervised learning is taking place in sec 2.2, combined with the use of DNN/CNN, where deep neural networks and CNNs automatically learn and have a database of training data, Introduction and sec 2.1… using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2… comparison thereof using KWS and SV as discrimination tasks that make a decision given a score between embeddings of enrollment and test utterance, section 3.3 with introduction)
Re claim 8, Jung teaches
8. (Currently Amended) A system for implementing the method according to claim 1, comprising:
— at least one microphone (201) configured to record noisy or noiseless sound signals and the environmental noise; (noise component can be isolated and removed as in section 2.1, and then injecting or corrupting a signal with recorded or same noise added to the existing speech signal; additionally, the original signal is clean per se but can contain the similar injected noise to simulate challenging environments, section 3.1)
- at least one local calculator configured to: (sub-network of figure 2-d with embedding or signature or representation per se)
- calculate sound signatures from noisy sound signals obtained via at least one microphone; (noise component can be isolated and removed as in section 2.1, and then injecting or corrupting a signal with recorded or same noise added to the existing speech signal; additionally, the original signal is clean per se but can contain the similar injected noise to simulate challenging environments, section 3.1)
— use the artificial neural network trained on sound signatures calculated; (the use of DNN/CNN, where deep neural networks and CNNs automatically learn and have a database of training data, Introduction and sec 2.1… using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2… comparison thereof using KWS and SV as discrimination tasks that make a decision given a score between embeddings of enrollment and test utterance, section 3.3 with introduction)
- at least one main calculator (202-2) configured to: (The DNN/CNN model itself, contained in the DNN/CNN overall, or the DNN/CNN itself in figure 1, is the database-driven calculator which contains the sub-network per se of figure 2, Introduction and sec 2.1)
— constitute the training database from sound signatures calculated by the local calculator (202-1); (DNN uses the model and sub-networks of fig. 2a-d, verifying the speaker, Introduction, and the use of DNN/CNN, where deep neural networks and CNNs automatically learn and have a database of training data, Introduction and sec 2.1… using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2)
— train in a supervised manner the artificial neural network on the training database constituted. (Since CTC labeling and classification is involved, we realize that supervised learning is taking place in sec 2.2, combined with the use of DNN/CNN, where deep neural networks and CNNs automatically learn and have a database of training data, Introduction and sec 2.1… using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2… comparison thereof using KWS and SV as discrimination tasks that make a decision given a score between embeddings of enrollment and test utterance, section 3.3 with introduction)
However, although a processor or hardware in some capacity is required to check the speech of a user for verification, Jung fails to expressly teach, by term, a “microphone” per se; Nelson has been incorporated for clarity to emphasize the otherwise inherent need for a microphone:
a microphone (Nelson in a server-capable system 0433 and 0539 using data in a non-local fashion, 0521 microphone to record speech, and multiple microphones to track a moving speaker 0398 with fig. 17d by isolating noise from a speaker 0506 to identify one or more commands spoken in a single utterance as well as the beginning and end of one or more commands, including silence gaps and end events, 0487 with 0525… in the context of machine learning 0452)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Jung to incorporate the above claim limitations as taught by Nelson. Doing so would amount to a simple substitution of one known element (the manner in which speech is collected in Jung) for another (the express, otherwise inherent, microphone of Nelson) to obtain predictable results. This would improve the system of Jung by expressly using a microphone for the collection of audio, including additional memory/data that can be accessed at a server when less common or heavier data processing is needed while a network is used locally, and would further provide Nelson's abilities to reduce noise in real time, track changing audio directions with a multi-microphone approach if needed, and determine the start and end of speech commands, with machine learning analogous to that of Jung.
Re claim 9, Jung teaches
9. (Currently Amended) The system according to claim 8, further comprising at least one storage device configured to store each noiseless sound signal recorded. (clean speech recorded, section 3.1… data is clearly stored in at least Table 2, and verifying the speaker, Introduction, and the use of DNN/CNN, where deep neural networks and CNNs automatically learn and have a database of training data, Introduction and sec 2.1… using embedding vectors as another representation/signature, section 2.4 and also figures 1 and 2… comparison thereof using KWS and SV as discrimination tasks that make a decision given a score between embeddings of enrollment and test utterance, section 3.3 with introduction)
Re claim 10, although a processor or hardware in some capacity is required to check the speech of a user for verification, Jung fails to expressly teach, by term, a “microphone” per se; Nelson has been incorporated for clarity to emphasize the otherwise inherent need for a microphone:
10. (Currently Amended) The system according to claim 8, comprising a plurality of independent or coupled microphones. (Nelson 0521 microphones as an array to record speech, and multiple microphones to track a moving speaker 0398 with fig. 17d by isolating noise from a speaker 0506 to identify one or more commands spoken in a single utterance as well as the beginning and end of one or more commands, including silence gaps and end events, 0487 with 0525… in the context of machine learning 0452)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Jung to incorporate the above claim limitations as taught by Nelson. Doing so would amount to a simple substitution of one known element (the manner in which speech is collected in Jung) for another (the express, otherwise inherent, microphone of Nelson) to obtain predictable results. This would improve the system of Jung by expressly using a microphone for the collection of audio, and would further provide Nelson's abilities to reduce noise in real time, track changing audio directions with a multi-microphone approach if needed, and determine the start and end of speech commands, with machine learning analogous to that of Jung.
Re claim 11, although a processor or hardware in some capacity is required to check the speech of a user for verification, Jung fails to expressly teach, by term, a “microphone” per se; Nelson has been incorporated for clarity to emphasize the otherwise inherent need for a microphone:
11. (Currently Amended) The system according to claim 8, comprising one local calculator (sub-network of figure 2-d with embedding or signature or representation per se) … per microphone (Nelson 0521 microphone to record speech, and multiple microphones to track a moving speaker 0398 with fig. 17d by isolating noise from a speaker 0506 to identify one or more commands spoken in a single utterance as well as the beginning and end of one or more commands, including silence gaps and end events, 0487 with 0525… in the context of machine learning 0452)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Jung to incorporate the above claim limitations as taught by Nelson. Doing so would apply a known technique (multiple microphones providing sensor data for machine learning, with each sensor containing the sub-network of Jung for its context, e.g., an image sensor versus a speech sensor, and not limited to one, where the sub-network performs the operations and delivers the data to the overall DNN of Jung) to track a moving user/speaker's sound for speech detection of commands, improving similar devices (speech recognition focused on speaker verification) in the same way with enhanced microphone arrays and algorithms for complex speech detection. This would improve the system of Jung by expressly using a microphone for the collection of audio, and would further provide Nelson's abilities to reduce noise in real time, track changing audio directions with a multi-microphone approach if needed, and determine the start and end of speech commands, with machine learning analogous to that of Jung.
Re claim 12, Jung teaches
12. (Currently Amended) The system according to claim 8, wherein the local calculator and the central calculator correspond to a single calculator. (The DNN/CNN model itself, contained in the DNN/CNN overall in figure 1, is the database-driven calculator which contains the sub-network per se of figure 2, Introduction and sec 2.1)
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Jung et al., “Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention,” INTERSPEECH 2020, Oct. 25, 2020, pages 931-935 (hereinafter Jung), in view of US 2019/0273767 A1 (Nelson, Steven et al.; hereinafter Nelson), and further in view of US 2020/0152198 A1 (CHUNG, Ji-hye et al.; hereinafter Chung).
Re claim 4, while Jung teaches an AI network such as DNN/CNN, including a database as the model itself for training, for keyword spotting (KWS) and SV operations, it fails to teach compound commands per se; thus the combination fails to teach a link binary and link keywords:
4. (Currently Amended) The method (100) according to claim 1, wherein the artificial neural network trained is further capable of providing, from a sound signature, at least one prediction of link binary relating to the detection or non-detection of at least one group of link keywords, each sound signature of the training database being further associated with at least one link binary and, if the value of the link binary corresponds to the detection of at least one group of link keywords, with at least one second group of command keywords, the step of using of the artificial neural network trained making it possible to further obtain a prediction of link binary and at least one prediction of second group of command keywords. (CHUNG multiple commands linked by the word AND for instance 00632 0089 and 0173)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Jung in view of Nelson to incorporate the above claim limitations as taught by Chung. Doing so would apply a known technique (speech command recognition, but for a compound command in one utterance, whether in-context or distinct commands) to improve similar devices (speech recognition focused on speaker verification) in the same way, but including multiple commands at once. This would improve the system of Jung for the handling of a long and complex input sentence that may be divided and restored into a plurality of short sentences, such that performance of the system may be improved and made faster, otherwise processing speech input analogously to Jung but faster, since the same input can be handled in one microphone session or single utterance.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20220101847 A1 Receveur; Timothy J. et al.
Compound speech commands
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C COLUCCI whose telephone number is (571)270-1847. The examiner can normally be reached on M-F 9 AM - 5 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL COLUCCI/Primary Examiner, Art Unit 2655 (571)-270-1847
Examiner FAX: (571)-270-2847
Michael.Colucci@uspto.gov