DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections – 35 USC § 101
2. 35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claims 1 and 13 recite a method comprising:
analyzing input audio voice signals and generating speech parameters, converting measurements into speech data, presenting one or more interactive speech exercises to users based on the speech data, and storing the speech data;
receiving speech data and providing users with reports, the reports adapted to provide at least one score through which to aid users at improving speaking performance;
the score including at least one variable indicating at least one or more of: pitch, rate of speech, speech intensity, shape of vocal pulsation, voicing, magnitude profile, pitch, pitch strength, phonemes, rhythm of speech, harmonic to noise values, cepstral peak prominence, spectral slope, shimmer, and jitter; and
score assessments including measures from at least one or more linguistic rules from a group of: phonology, phonetics, syntactic, semantics, and morphology.
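For context, several of the recited score variables, such as jitter and shimmer, are standard acoustic perturbation measures computed from consecutive pitch periods and peak amplitudes. A minimal sketch of one conventional formulation, assuming numpy and hypothetical measurement arrays (not values taken from the application), follows:

    import numpy as np

    # Hypothetical consecutive glottal pulse periods (seconds) and peak
    # amplitudes from a sustained vowel; a real system would derive these
    # from pitch-mark detection on the captured audio signal.
    periods = np.array([0.0100, 0.0102, 0.0099, 0.0101, 0.0103])
    amplitudes = np.array([0.81, 0.79, 0.83, 0.80, 0.82])

    # Local jitter: mean absolute difference between consecutive periods,
    # normalized by the mean period (often reported as a percentage).
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    # Local shimmer: the analogous measure over peak amplitudes.
    shimmer = np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

    print(f"jitter:  {jitter:.2%}")
    print(f"shimmer: {shimmer:.2%}")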
The limitations of analyzing voice signals, generating speech parameters, converting measurements, presenting speech exercises, and receiving speech data and providing reports, as drafted, constitute a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of generic computer components, akin to the abstract idea of “collecting information, analyzing it, and providing certain results of the collection and analysis” in Electric Power Group. That is, other than reciting a processor system with a user interface and a memory adapted to capture and process the signals and output the score, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, but for the “processor” language, “analyzing”, “generating”, “converting”, “presenting”, “receiving”, and “providing” in the context of this claim encompass a person manually observing a user’s speech, analyzing it, and providing exercises and a score, for example orally or using pen and paper. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim recites using generic computer components (a processor, a user interface, and a memory) to perform the claimed steps. These components are recited at a high level of generality (i.e., as a generic processor, interface, and memory performing generic computer functions of receiving audio signals, analyzing them, and providing results of the analysis) such that they amount to no more than mere instructions to apply the exception using generic computer components. The claims further recite that a machine learning algorithm is adapted to receive the speech data and provide the reports. This generic machine learning algorithm similarly represents no more than the use of a computer programmed to perform the abstract idea. Alternatively, the use of machine learning only generally links the use of the judicial exception to a particular technological environment or field of use. See MPEP 2106.05(h). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using a generic processor, user interface, and memory to perform the claimed steps amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Furthermore, as noted above, the use of a machine learning algorithm represents use of a computer programmed to perform the abstract idea, or generally links the use of the judicial exception to a particular technological environment or field of use. The claim is not patent eligible.
Dependent claims 2-12 and 14-21 recite the same abstract idea as their respective parent claims, and only recite additional abstract speech signal processing performed by generic computer components (e.g., client/server, edge computing, processing modules). Accordingly, these claims do not recite additional limitations sufficient to amount to significantly more than the judicial exception.
Claim Rejections – 35 USC § 102
3. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
4. Claims 1-3, 5-9, 12-18 and 21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yu (US 2009/0258333 A1).
Regarding claims 1-3, 5-9, 12-18 and 21, Yu discloses a speech recognition technology system for delivering speech therapy (see Par. 5) comprising:
at least one processor system, at least one memory system, and at least one user interface disposed on at least one user computer system, the user computer system adapted to be operationally coupled to at least one server computer system (client/server system including a laptop or mobile computing device – Par. 21);
at least one input system disposed on the user computer system adapted to, substantially in real time, capture, process, and analyze audio voice signals (e.g., using a microphone and a PC with internet access – Par. 51);
a processor system disposed on at least one or more of the user computer system and the server computer system, the processor system operating as at least one or more of a speech processor adapted to analyze input audio voice signals and generate speech parameters and a feedback processor adapted to convert measurements generated by the speech processor into speech data (Par. 46), the processor system adapted to present one or more interactive speech exercises (e.g., instructions) to users based on the speech data, and the memory adapted to store the speech data (speech exercises such as prompts for the user to respond to in a certain language – Pars. 52, 10, 46);
at least one software program disposed on the at least one or more of the user computer system and the server computer system, the software program including at least one machine learning algorithm adapted to receive speech data from the processor, the machine learning algorithm adapted to provide users with reports, the reports adapted to provide at least one score through which to aid users at improving speaking performance (acoustic and linguistic patterns collected and analyzed using machine learning to generate scores – Par. 49);
the score including at least one variable indicating at least one or more of: pitch, rate of speech, speech intensity, shape of vocal pulsation, voicing, magnitude profile, pitch, pitch strength, phonemes, rhythm of speech, harmonic to noise values, cepstral peak prominence, spectral slope, shimmer, and jitter (pitch – see, e.g., Par. 110); and
score assessments including measures from at least one or more linguistic rules from a group of: phonology, phonetics, syntactic, semantics, and morphology (scores calculated in part based on semantic items – Par. 99) (as per claims 1 and 13),
the speech data includes at least one vector having positional, directional, and magnitude measurements (Par. 49) (as per claims 2 and 15),
the speech data includes delta Mel Frequency Cepstral Coefficient (MFCC) vectors (Par. 119) (as per claim 3),
a speech processor arranged to analyze input speech and to output various speech and language parameters, comprising: a processor, the processor arranged with an automatic speech recognition model, the automatic speech recognition model to be loaded with at least one of: a language model; and, an acoustic model (Par. 13); and, a microphone in communication with the processor, wherein the microphone is arranged to collect audio inputs and output the audio inputs to the processor in sequences (Par. 51) (as per claims 5 and 16),
a plurality of processing layers, each of the plurality of processing layers having at least one processing module (Par. 23) (as per claim 6),
one of the plurality of processing layers comprises: a converting layer arranged to convert the output of the microphone into a representation accepted by the processor (Par. 24) (as per claim 7),
one of the plurality of processing layers comprises: a speech enhancement layer including an algorithm arranged to provide at least one of: automatic gain control; noise reduction; and, acoustic echo cancellation (n-gram model to reduce noise – Par. 168) (as per claims 8 and 17),
at least one noise reduction algorithm is adapted to filter speech data (Par. 168) (as per claims 9 and 18),
the user interface adapted to provide feedback by way of text, color, and movable images (Par. 116) (as per claim 12), and
analyzing speech data with at least one machine learning software program (Par. 49) (as per claim 14).
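As context for the mapping of claim 3 above, delta MFCC vectors are conventionally the frame-to-frame slopes of the static MFCCs, capturing how the spectral envelope changes over time. A minimal sketch, assuming the librosa library and a synthesized tone standing in for captured speech, follows:

    import numpy as np
    import librosa

    # Synthesize one second of a 220 Hz tone as a stand-in for captured
    # speech; a real system would use the microphone signal instead.
    sr = 16000
    t = np.linspace(0, 1.0, sr, endpoint=False)
    y = 0.5 * np.sin(2 * np.pi * 220 * t).astype(np.float32)

    # Static MFCCs: shape (n_mfcc, frames).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    # Delta MFCCs: local first-order slope of each coefficient over time.
    delta = librosa.feature.delta(mfcc)

    # Stack static and delta features into one vector per frame.
    features = np.vstack([mfcc, delta])
    print(features.shape)  # (26, frames)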
Claim Rejections – 35 USC § 103
5. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
7. Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Yu (US 2009/0258333 A1) in view of Jung et al. (US 2022/0319507 A1).
Regarding claim 4, to the extent that Yu does not explicitly disclose that the user computer system and the server computer system are adapted to operate as an edge computing system, further having at least one edge node and at least one edge data center, Jung discloses utilizing this type of edge computing system in a speech recognition system (Pars. 56-57). It would have been obvious to one skilled in the art before the effective filing date of the invention to modify the teachings of Yu by utilizing an edge computing system, as suggested by Jung, to obtain the predictable result of reduced latency in the system.
8. Claims 10, 11, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yu (US 2009/0258333 A1) in view of Chen et al. (US 2021/0050001 A1).
Regarding claims 10, 11, 19 and 20, Yu does not appear to explicitly disclose using a neural network adapted to predict which parts of spectrums to attenuate (as per claims 10 and 19), or an automatic speech recognition module adapted to predict a sequence of text items in real time, wherein the text predictions are updated based on results and variance from predictions (as per claims 11 and 20). However, Chen discloses a speech scoring and diagnosis system (see abstract) that utilizes a neural network for spectrum attenuation (see Pars. 34, 46, 48), and wherein text predictions are updated based on variance in previous results (Par. 46). It would have been obvious to one skilled in the art before the effective filing date of the invention to modify the teachings of Yu by utilizing these features of Chen, to obtain the predictable result of enhanced efficiency of the speech recognition and scoring.
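As context for this mapping, a neural network that predicts which parts of a spectrum to attenuate is commonly realized as a per-bin mask estimator applied to magnitude-spectrogram frames. A minimal, untrained PyTorch sketch with illustrative shapes (not the architecture of Chen) follows:

    import torch
    import torch.nn as nn

    # Hypothetical mask estimator: for each spectral frame, predict a
    # per-bin attenuation factor in [0, 1]; 0 suppresses a bin, 1 keeps it.
    N_BINS = 257  # frequency bins of a 512-point STFT

    mask_net = nn.Sequential(
        nn.Linear(N_BINS, 128),
        nn.ReLU(),
        nn.Linear(128, N_BINS),
        nn.Sigmoid(),  # bounded mask over the spectrum
    )

    # A batch of noisy magnitude-spectrogram frames (batch, bins).
    noisy_mag = torch.rand(8, N_BINS)

    mask = mask_net(noisy_mag)       # predicted per-bin attenuation
    enhanced_mag = noisy_mag * mask  # attenuate the predicted bins

    print(enhanced_mag.shape)  # torch.Size([8, 257])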
Conclusion
9. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See attached PTO-892.
10. Any inquiry concerning this communication or earlier communications from the examiner should be directed to PETER EGLOFF whose telephone number is (571)270-3548. The examiner can normally be reached on Monday - Friday 9:00 am - 5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xuan Thai can be reached at (571) 272-7147. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Peter R Egloff/
Primary Examiner, Art Unit 3715