DETAILED ACTION
This Office action is responsive to the claims filed on 12/23/2025. Claims 1-5, 7-11, 14-17, 19-20, and 61-67 are pending. An action on the merits with respect to claims 1-5, 7-11, 14-17, 19-20, and 61-67 follows.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/23/2025 has been entered.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 67 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 67, the claim recites the limitation “one or more machine learning models” in line 2. It is unclear whether this limitation refers to the one or more machine learning models of claim 1 or to different machine learning models. If it refers to the one or more machine learning models of claim 1, the claim should expressly refer back to them; if it refers to different machine learning models, the claim should distinguish them from the machine learning models of claim 1. For purposes of examination, the limitation is interpreted as referring to the one or more machine learning models of claim 1.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 7-11, 14-17, 19-20, and 65-66 are rejected under 35 U.S.C. 103 as being unpatentable over Shrivastav (US 20120265024) in view of Ispahani (US 20210064327).
Regarding independent claim 1, Shrivastav teaches a device for assessing speech changes resulting from respiratory tract function ([0003]: “The present disclosure is directed to screening for neurological and other diseases and medical states using speech behavior as a biomarker, and systems, applications, and methods for accomplishing the same.”), the device comprising:
audio input circuitry configured to provide an input signal that is indicative of speech provided by a user to assess a physiological status ([0046]: “The identification device 200 can be used to determine a health state of a subject by receiving, as input to the interface 201, one or more speech samples from a subject”), the input signal including a text-acoustic alignment dataset derived from transcribed speech of the user ([0018]: “The biomarkers may be determined, for example, from acoustic analyses of the speech signal, by the application of an automatic speech recognition system including large vocabulary systems, phoneme detection, word spotting engines or the like, and the application of syntactical coding or transcription on input speech”).
However, Shrivastav does not teach the transcribed speech of the user aligned with timepoints.
Ispahani discloses a system and method for processing digital audio data. Specifically, Ispahani teaches the transcribed speech of the user aligned with timepoints ([0015]: “the system and method of the present invention accepts as its input a digital audio stream and a set of one or more time intervals in the audio stream for which the speech therein shall be transcribed as text data. The set of one or more time intervals may include the entire audio stream from start to finish. In one or more embodiments, the system and method of the present invention provides as its output a log file containing the transcribed text along with one or more timestamps that link the transcribed text with its corresponding position in the audio stream”). Shrivastav and Ispahani are analogous art, as both are directed to systems for processing speech recordings.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to incorporate the timepoints of Ispahani into the system of Shrivastav, as doing so allows the system to identify where particular speech events occur within the audio recording, which can provide a more accurate and comprehensive analysis of the user’s speech.
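By way of illustration only, and not as a characterization of either reference's actual implementation: a text-acoustic alignment dataset of the kind recited in claim 1 can be represented as transcribed words linked to start and end timepoints in the audio stream, consistent with the timestamp linkage described in Ispahani at [0015]. The structure and field names below are hypothetical.

```python
# Illustrative sketch only; the structure and field names are hypothetical
# and do not reflect the actual data structures of Shrivastav or Ispahani.
from dataclasses import dataclass

@dataclass
class AlignedWord:
    text: str       # transcribed word
    start_s: float  # start timepoint in the audio stream (seconds)
    end_s: float    # end timepoint in the audio stream (seconds)

# A text-acoustic alignment dataset: transcribed speech aligned with timepoints.
alignment = [
    AlignedWord("the", 0.00, 0.18),
    AlignedWord("patient", 0.18, 0.66),
    AlignedWord("speaks", 0.66, 1.10),
]
```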
The Shrivastav/Ispahani combination teaches signal processing circuitry (Shrivastav, [0045]: “an identification device 200 used as an analytical tool can include … a processor”) configured to:
receive the input signal (Shrivastav, [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B)”);
process the input signal to perform computerized signal analysis of the input signal over time and frequency domains (Shrivastav, [0041]: “one or more acoustic measures for Parkinson's disease can include, but are not limited to, fundamental frequency (F.sub.0), voice onset time, pause duration, and/or changes in F.sub.0; voice onset time, and/or pause duration; fricative noise characteristics; stop burst duration; burst spectral characteristics; changes in speaking rate within or across phrases/sentence; changes in formant frequencies; and changes in formant frequency transitions. In addition, one or more acoustic measures for neurological and other diseases can include, but are not limited to, measures of aspiration noise, frequency and intensity perturbation; signal-to-noise (SNR) ratios; changes in pitch over time; changes in loudness over time; and/or other temporal and/or spectral characteristics of a speech sample(s). The one or more acoustic measures also can include a measure of partial loudness. In one embodiment, acoustic measures associated with neurological and other diseases can include a measure of low frequency periodic energy, a measure of high frequency aperiodic energy, and/or a measure of partial loudness of a periodic signal portion of the speech sample. The acoustic measure of the speech sample can further include a measure of noise in the speech sample and a measure of partial loudness of the speech sample. Of course, embodiments are not limited thereto”; [0042]: “the biomarkers for Alzheimer's disease may include the measures described above as well as detailed analyses of the speaker's language characteristics. These may be determined through analyses of the number of words spoken, the types of words (e.g. nouns, verbs, adjectives, articles, etc.), grammatical complexity of the phrases and/or sentence, the number of occurrence of specific words/phrases in conversation, or instances of dysfluencies such as pauses, hesitations or repetitions of words or part-words”) to generate an instantaneous multi-dimensional statistical signature of speech production abilities of the user from the input signal (Shrivastav, [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B)”; [0039]: “According to an embodiment of the invention, acoustic biomarkers can be recorded and a patient can be monitored over a period of time (such as a few days to several years). A comparison with look-up tables or a rapid change in specific biomarkers can indicate a greater likelihood of a disease.”; [0044]: “The biomarkers described above may be suitably weighted and combined using appropriate statistical, pattern-recognition and/or machine learning techniques prior to making a diagnostic decision. 
These include, but are not limited to, discriminant analyses, regression, hidden Markov-models, support-vector machines, and neural networks.”; [0066]: “acoustic analysis can be performed to quantify metrics including, but not limited to fundamental frequency characteristics, intensity, articulatory characteristics, speech/voice quality, prosodic characteristics, and speaking rate.”; [0018]: “These quantifiable measures of acoustic characteristics of a person's speech provide one or more biomarkers indicative of a likelihood of disease onset and/or stage of degeneration. The biomarkers may be determined, for example, from acoustic analyses of the speech signal, by the application of an automatic speech recognition system including large vocabulary systems, phoneme detection, word spotting engines or the like, and the application of syntactical coding or transcription on input speech”; [0046]: “The identification device 200 can be used to determine a health state of a subject by receiving, as input to the interface 201, one or more speech samples from a subject (S210 of FIG. 2B). The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B). The processor 202 can determine a health state of the subject based upon the results of the comparison or by tracking the rate of change in specific baseline acoustic measures (S240 of FIG. 2B). The processor 202 can then output a diagnosis”) that maps a plurality of physiological aspects of speech production defined from the text-acoustic alignment dataset to quantifiable acoustic measurements, wherein to generate the instantaneous multi-dimensional statistical signature, the signal processing circuitry identifies an acoustic manifestation associated with one or more perceptual dimensions, the acoustic manifestation attributable to the physiological status (Shrivastav, [0018]: “These quantifiable measures of acoustic characteristics of a person's speech provide one or more biomarkers indicative of a likelihood of disease onset and/or stage of degeneration. The biomarkers may be determined, for example, from acoustic analyses of the speech signal, by the application of an automatic speech recognition system including large vocabulary systems, phoneme detection, word spotting engines or the like, and the application of syntactical coding or transcription on input speech”; [0046]: “The identification device 200 can be used to determine a health state of a subject by receiving, as input to the interface 201, one or more speech samples from a subject (S210 of FIG. 2B). The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B). The processor 202 can determine a health state of the subject based upon the results of the comparison or by tracking the rate of change in specific baseline acoustic measures (S240 of FIG. 2B). 
The processor 202 can then output a diagnosis”; [0066]: “acoustic analysis can be performed to quantify metrics including, but not limited to fundamental frequency characteristics, intensity, articulatory characteristics, speech/voice quality, prosodic characteristics, and speaking rate.”), and
generates a corresponding interpretable metric (Shrivastav, [0041]: “one or more acoustic measures for Parkinson's disease can include, but are not limited to, fundamental frequency (F.sub.0), voice onset time, pause duration, and/or changes in F.sub.0; voice onset time, and/or pause duration; fricative noise characteristics; stop burst duration; burst spectral characteristics; changes in speaking rate within or across phrases/sentence; changes in formant frequencies; and changes in formant frequency transitions. In addition, one or more acoustic measures for neurological and other diseases can include, but are not limited to, measures of aspiration noise, frequency and intensity perturbation; signal-to-noise (SNR) ratios; changes in pitch over time; changes in loudness over time; and/or other temporal and/or spectral characteristics of a speech sample(s). The one or more acoustic measures also can include a measure of partial loudness. In one embodiment, acoustic measures associated with neurological and other diseases can include a measure of low frequency periodic energy, a measure of high frequency aperiodic energy, and/or a measure of partial loudness of a periodic signal portion of the speech sample. The acoustic measure of the speech sample can further include a measure of noise in the speech sample and a measure of partial loudness of the speech sample. Of course, embodiments are not limited thereto”; [0042]: “the biomarkers for Alzheimer's disease may include the measures described above as well as detailed analyses of the speaker's language characteristics. These may be determined through analyses of the number of words spoken, the types of words (e.g. nouns, verbs, adjectives, articles, etc.), grammatical complexity of the phrases and/or sentence, the number of occurrence of specific words/phrases in conversation, or instances of dysfluencies such as pauses, hesitations or repetitions of words or part-words”; [0044]: “The biomarkers described above may be suitably weighted and combined using appropriate statistical, pattern-recognition and/or machine learning techniques prior to making a diagnostic decision”. The analysis of the acoustic measures and biomarkers constitutes the corresponding metric.), wherein the signal processing circuitry:
extracts acoustic features from the text-acoustic alignment dataset via digital signal processing of the input signal (Shrivastav, [0041]: “one or more acoustic measures for Parkinson's disease can include, but are not limited to, fundamental frequency (F.sub.0), voice onset time, pause duration, and/or changes in F.sub.0; voice onset time, and/or pause duration; fricative noise characteristics; stop burst duration; burst spectral characteristics; changes in speaking rate within or across phrases/sentence; changes in formant frequencies; and changes in formant frequency transitions. In addition, one or more acoustic measures for neurological and other diseases can include, but are not limited to, measures of aspiration noise, frequency and intensity perturbation; signal-to-noise (SNR) ratios; changes in pitch over time; changes in loudness over time; and/or other temporal and/or spectral characteristics of a speech sample(s). The one or more acoustic measures also can include a measure of partial loudness. In one embodiment, acoustic measures associated with neurological and other diseases can include a measure of low frequency periodic energy, a measure of high frequency aperiodic energy, and/or a measure of partial loudness of a periodic signal portion of the speech sample. The acoustic measure of the speech sample can further include a measure of noise in the speech sample and a measure of partial loudness of the speech sample. Of course, embodiments are not limited thereto”; [0042]: “the biomarkers for Alzheimer's disease may include the measures described above as well as detailed analyses of the speaker's language characteristics. These may be determined through analyses of the number of words spoken, the types of words (e.g. nouns, verbs, adjectives, articles, etc.), grammatical complexity of the phrases and/or sentence, the number of occurrence of specific words/phrases in conversation, or instances of dysfluencies such as pauses, hesitations or repetitions of words or part-words”. The biomarkers are the acoustic features).
However, the Shrivastav/Ispahani combination does not teach wherein extracting the acoustic features comprises using the timepoints of the text-acoustic alignment dataset to select corresponding windows of the input signal for the digital signal processing.
Ispahani discloses wherein extracting the acoustic features comprises using the timepoints of the text-acoustic alignment dataset to select corresponding windows of the input signal for the digital signal processing ([0015]: “the system and method of the present invention accepts as its input a digital audio stream and a set of one or more time intervals in the audio stream for which the speech therein shall be transcribed as text data. The set of one or more time intervals may include the entire audio stream from start to finish. In one or more embodiments, the system and method of the present invention provides as its output a log file containing the transcribed text along with one or more timestamps that link the transcribed text with its corresponding position in the audio stream”; [0014]: “select time intervals in the stream for the audio to be transcribed as text, and review and organize the transcribed text”).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to incorporate the selection of time intervals from Ispahani into the Shrivastav/Ispahani combination, as it allows the device to process selected time intervals rather than the entire recording, which ensures that only the relevant time intervals are analyzed.
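Again by way of illustration only: using the timepoints of the text-acoustic alignment dataset to select corresponding windows of the input signal for digital signal processing, as the combination is read to suggest, might be sketched as follows. The sketch assumes a NumPy audio buffer and the hypothetical AlignedWord records shown above; the RMS-energy measurement is merely one simple example of a time-domain acoustic feature.

```python
import numpy as np

def select_windows(signal: np.ndarray, sample_rate: int, alignment):
    """Slice the input signal into per-word windows using alignment timepoints."""
    windows = []
    for word in alignment:
        start = int(word.start_s * sample_rate)
        end = int(word.end_s * sample_rate)
        windows.append((word.text, signal[start:end]))
    return windows

def rms_energy(window: np.ndarray) -> float:
    """A simple time-domain acoustic measurement computed over one window."""
    return float(np.sqrt(np.mean(np.square(window)))) if window.size else 0.0

# Only the aligned time intervals are processed, rather than the entire
# recording, e.g.:
#   features = [(text, rms_energy(w))
#               for text, w in select_windows(audio, 16000, alignment)]
```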
The Shrivastav/Ispahani combination teaches measuring the acoustic features via application of the acoustic features to one or more machine learning models (Shrivastav, [0044]: “The biomarkers described above may be suitably weighted and combined using appropriate statistical, pattern-recognition and/or machine learning techniques prior to making a diagnostic decision”), wherein the corresponding metric is an output of the one or more machine learning models and quantifies the acoustic manifestation and reflects the underlying one or more perceptual dimensions (Shrivastav, [0069]: “After performing the speech and/or language analysis, modeling and coding can be performed by the coding module 511 via statistical approaches, machine learning, pattern recognition, or other algorithms to combine information from various biomarkers before reaching a diagnostic decision”; [0066]: “acoustic analysis can be performed to quantify metrics including, but not limited to fundamental frequency characteristics, intensity, articulatory characteristics, speech/voice quality, prosodic characteristics, and speaking rate.”), and
compares the multi-dimensional statistical signature against one or more baseline statistical signatures of speech production ability derived or obtained from the user (Shrivastav, [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B)”); and
provides a speech change identification signal attributable to the physiological status of the user, based on the multi-dimensional statistical signature comparison (Shrivastav, [0028]: “the speech and language of a speaker may be monitored over different periods, ranging from a few minutes to several days, weeks, months, or even years. During this monitoring, candidate biomarkers can be tracked to determine their presence/absence or the degree to which these change over time. These data can be compared to some normative database or to some specified criteria, and results of the comparison can be used to predict the likelihood of one or more neurological/neurodegenerative or other disease, such as infectious and/or respiratory disease, condition(s)”); and a notification element in operable communication with the signal processing circuitry, the notification element configured to receive the speech change identification signal and provide at least one notification signal to the user (Shrivastav, [0046]: “The diagnosis can be obtained by a user through the interface 201. The results may be provided via phone, email, text messaging, mail, an attached or networked printer, website interface, or directly on a display screen of the device.”).
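Likewise for illustration only: comparing an instantaneous multi-dimensional statistical signature against a baseline signature and providing a speech change identification signal, in the manner the combination is read to teach, might be sketched as follows. The perceptual-dimension names and the z-score threshold are hypothetical and are not drawn from either reference.

```python
# Hypothetical perceptual-dimension metrics; the names and the threshold are
# illustrative only, not taken from Shrivastav or Ispahani.
FEATURES = ["articulation_rate", "prosodic_variability", "phonation", "speaking_rate"]

def speech_change_signal(signature: dict, baseline: dict, z_threshold: float = 2.0):
    """Compare each metric against its baseline mean/std and flag deviations."""
    flags = {}
    for name in FEATURES:
        mean, std = baseline[name]
        z = (signature[name] - mean) / std if std else 0.0
        flags[name] = abs(z) >= z_threshold
    # A notification element would act on this aggregate change signal.
    return any(flags.values()), flags
```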
Regarding claim 2, the Shrivastav/Ispahani combination teaches the device of claim 1, wherein the multi-dimensional statistical signature spans one or more of the following perceptual dimensions: articulation, prosodic variability, phonation changes, rate, and rate variation (Shrivastav, [0066]: “acoustic analysis can be performed to quantify metrics including, but not limited to fundamental frequency characteristics, intensity, articulatory characteristics, speech/voice quality, prosodic characteristics, and speaking rate.”).
Regarding claim 3, the Shrivastav/Ispahani combination teaches the device of claim 1, wherein the acoustic features include one or more of articulation rate, articulation entropy, vowel space area, energy decay slope, phonatory duration, and average pitch (Shrivastav, [0066]: “acoustic analysis can be performed to quantify metrics including, but not limited to fundamental frequency characteristics, intensity, articulatory characteristics, speech/voice quality, prosodic characteristics, and speaking rate.”; [0032]: “Using acoustic measures as a biomarker involves evaluating changes in various aspects (or subsystems of speech) over time. These subsystems include, but are not limited to, aspects such as articulation (i.e. the way in which various consonants and vowels are produced), the prosody or intonation (i.e. the tone of voice), the voice or vocal quality, overall speech intelligibility (i.e. how much of the message or meaning can be conveyed by the speaker under ideal or non-ideal conditions), the rate of speech and changes in the rate of speech across an utterance, etc. The analyses may also include, but is not limited to, analyses of the number of words spoken, the types of words (e.g. nouns, verbs, adjectives, articles, etc.) grammatical complexity of the phrases and/or sentence, the number of occurrence of specific words/phrases in conversation, or instances of dysfluencies such as pauses, hesitations or repetitions of words or part-words. The analysis may also evaluate, as an alternative or in addition, the frequency (i.e. the number of occurrences), the intensity (i.e. the strength), or other characteristics of cough during a conversation.”).
Regarding claim 4, the Shrivastav/Ispahani combination teaches the device of claim 3, wherein the signal processing circuitry is configured to compare the multi-dimensional statistical signature against the one or more baseline statistical signatures of speech production ability by comparing each speech feature to a corresponding baseline speech feature of the one or more baseline statistical signatures of speech production ability (Shrivastav, [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B). The processor 202 can determine a health state of the subject based upon the results of the comparison or by tracking the rate of change in specific baseline acoustic measures (S240 of FIG. 2B). The processor 202 can then output a diagnosis.”; [0083]: “an audio (conversational) stream received via a phone/microphone (e.g., mobile phone, VoIP, internet, etc.) is analyzed by segmenting the audio stream into short windows, computing specific acoustic measures from each window (e.g. mel-frequency cepstral coefficients), comparing the acoustic measures across successive windows, developing and training a machine learning pattern recognition engine to identify acoustic patterns of a cough, and determining the likelihood of a particular window (or set of windows) to contain an instance of cough”; [0036]: “The average of each feature can then be charted against time. For example, the average variability of a fundamental frequency (F.sub.0) can be charted against time over the analysis period and compared against the variability of F.sub.0 from a healthy group.”).
Regarding claim 5, the Shrivastav/Ispahani combination teaches the device of claim 1, wherein the signal processing circuitry is configured to process the input signal utilizing the input signal and additional data comprising one or more of sensor data, a time of day, an ambient light level, a device usage pattern of the user, or a user input (Shrivastav, [0052]: “The user may produce speech samples that correspond to a scheduled time, day, week, or month that repeats at a predetermined frequency. Further analysis of the speech samples can be provided based on potential changes in the speech samples taken at the specified intervals”; [0046]: “baseline acoustic measures can be stored in the memory 203. The identification device 200 can be used to determine a health state of a subject by receiving, as input to the interface 201, one or more speech samples from a subject”); wherein the signal processing circuitry is configured to select or adjust the one or more baseline statistical signatures of speech production ability based on said input signal and additional data (Shrivastav, [0036]: “The average of each feature can then be charted against time. For example, the average variability of a fundamental frequency (F.sub.0) can be charted against time over the analysis period and compared against the variability of F.sub.0 from a healthy group.”; [0038]: “The baseline acoustic measures can be used in diagnostic tools using speech behavior as a biomarker of the onset of the neurological or other disease. In one embodiment, the baseline acoustic measures can be arranged and stored in the form of look-up tables or other organized storage format.”; [0039]: “acoustic biomarkers can be recorded and a patient can be monitored over a period of time (such as a few days to several years). A comparison with look-up tables or a rapid change in specific biomarkers can indicate a greater likelihood of a disease.”; [0046]: “baseline acoustic measures can be stored in the memory 203. The identification device 200 can be used to determine a health state of a subject by receiving, as input to the interface 201, one or more speech samples from a subject”).
Regarding claim 7, the Shrivastav/Ispahani combination teaches the device of claim 1, wherein the device is a mobile computing device operating an application for assessing speech changes resulting from respiratory tract function (Shrivastav, [0049]: “In a specific embodiment utilizing a smartphone, an application (app) on the phone can be accessed and, when selected to run, the app brings up a GUI providing the interface 201 on the screen of the phone. In an embodiment, a speech sample can be recorded by the phone through the phone's microphone. The screening app on the phone may prompt the user to record a sample of their speech and/or request a sample already stored in the phone's memory, which may provide the memory 203 of the identification device 200 when the screening app and baseline acoustic measures are stored entirely on the phone. The screening app can perform the steps to determine the health state of the subject”).
Regarding claim 8, the Shrivastav/Ispahani combination teaches the device of claim 7, wherein the application queries the user periodically to provide a speech sample from which the input signal is derived (Shrivastav, [0049]: “In a specific embodiment utilizing a smartphone, an application (app) on the phone can be accessed and, when selected to run, the app brings up a GUI providing the interface 201 on the screen of the phone. In an embodiment, a speech sample can be recorded by the phone through the phone's microphone. The screening app on the phone may prompt the user to record a sample of their speech and/or request a sample already stored in the phone's memory, which may provide the memory 203 of the identification device 200 when the screening app and baseline acoustic measures are stored entirely on the phone. The screening app can perform the steps to determine the health state of the subject.”; [0057]: “For an internet-based model, 420, speech samples are uploaded regularly by a subject for screening. The subject can be reminded to upload the speech samples in order to test for the biomarkers at regular or periodic intervals”).
Regarding claim 9, the Shrivastav/Ispahani combination teaches the device of claim 7, wherein the application facilitates the user spontaneously providing a speech sample from which the input signal is derived (Shrivastav, [0088]: “The incidence and type of cough behavior and voice quality can be monitored by monitoring mobile phone users' conversations over extended periods of time. According to an embodiment, signal processing algorithms are used to identify cough and voice quality within an audio (speech) stream”; Abstract: “The speech samples can be provided to the device by an intentional action of a user or passively due to the device being in the signal path of the subject's speech.”).
Regarding claim 10, the Shrivastav/Ispahani combination teaches the device of claim 7, wherein the application passively detects changes in speech patterns of the user and initiates generation of the instantaneous multi-dimensional statistical signature of speech production abilities of the user (Shrivastav, [0056]: “For a telephony-based model 410, a subject can provide a speech sample (intentionally or passively) through a telephone service provider to be screened”; [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B). The processor 202 can determine a health state of the subject based upon the results of the comparison or by tracking the rate of change in specific baseline acoustic measures (S240 of FIG. 2B). The processor 202 can then output a diagnosis”).
Regarding claim 11, the Shrivastav/Ispahani combination teaches the device of claim 1, wherein the notification element comprises a display (Shrivastav, [0046]: “The diagnosis can be obtained by a user through the interface 201. The results may be provided via phone, email, text messaging, mail, an attached or networked printer, website interface, or directly on a display screen of the device.”); wherein the signal processing circuitry is configured to show prompts comprising requests for the input signals from the user on the display (Shrivastav, [0049]: “In a specific embodiment utilizing a smartphone, an application (app) on the phone can be accessed and, when selected to run, the app brings up a GUI providing the interface 201 on the screen of the phone. In an embodiment, a speech sample can be recorded by the phone through the phone's microphone. The screening app on the phone may prompt the user to record a sample of their speech and/or request a sample already stored in the phone's memory, which may provide the memory 203 of the identification device 200 when the screening app and baseline acoustic measures are stored entirely on the phone. The screening app can perform the steps to determine the health state of the subject.”; [0057]: “For an internet-based model, 420, speech samples are uploaded regularly by a subject for screening. The subject can be reminded to upload the speech samples in order to test for the biomarkers at regular or periodic intervals”).
Regarding independent claim 14, Shrivastav teaches a method for assessing speech changes resulting from respiratory tract function ([0003]: “The present disclosure is directed to screening for neurological and other diseases and medical states using speech behavior as a biomarker, and systems, applications, and methods for accomplishing the same.”), the method comprising:
receiving an input signal that is indicative of speech provided by a user ([0046]: “The identification device 200 can be used to determine a health state of a subject by receiving, as input to the interface 201, one or more speech samples from a subject. The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B)”), the input signal including a text-acoustic alignment dataset derived from transcribed speech of the user ([0018]: “The biomarkers may be determined, for example, from acoustic analyses of the speech signal, by the application of an automatic speech recognition system including large vocabulary systems, phoneme detection, word spotting engines or the like, and the application of syntactical coding or transcription on input speech”).
However, Shrivastav does not teach the transcribed speech of the user aligned with timepoints.
Ispahani discloses a system and method for processing digital audio data. Specifically, Ispahani teaches the transcribed speech of the user aligned with timepoints ([0015]: “the system and method of the present invention accepts as its input a digital audio stream and a set of one or more time intervals in the audio stream for which the speech therein shall be transcribed as text data. The set of one or more time intervals may include the entire audio stream from start to finish. In one or more embodiments, the system and method of the present invention provides as its output a log file containing the transcribed text along with one or more timestamps that link the transcribed text with its corresponding position in the audio stream”). Shrivastav and Ispahani are analogous art, as both are directed to systems for processing speech recordings.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to incorporate the timepoints of Ispahani into the method of Shrivastav, as doing so allows the system to identify where particular speech events occur within the audio recording, which can provide a more accurate and comprehensive analysis of the user’s speech.
The Shrivastav/Ispahani combination teaches extracting a multi-dimensional statistical signature of speech production abilities of the user from the input signal for assessing a physiological status (Shrivastav, [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B)”; [0039]: “According to an embodiment of the invention, acoustic biomarkers can be recorded and a patient can be monitored over a period of time (such as a few days to several years). A comparison with look-up tables or a rapid change in specific biomarkers can indicate a greater likelihood of a disease.”; [0044]: “The biomarkers described above may be suitably weighted and combined using appropriate statistical, pattern-recognition and/or machine learning techniques prior to making a diagnostic decision. These include, but are not limited to, discriminant analyses, regression, hidden Markov-models, support-vector machines, and neural networks.”; [0066]: “acoustic analysis can be performed to quantify metrics including, but not limited to fundamental frequency characteristics, intensity, articulatory characteristics, speech/voice quality, prosodic characteristics, and speaking rate.”; [0046]: “The identification device 200 can be used to determine a health state of a subject by receiving, as input to the interface 201, one or more speech samples from a subject”), including
generating at least one metric by extracting and measuring acoustic features from the input signal (Shrivastav, [0041]: “one or more acoustic measures for Parkinson's disease can include, but are not limited to, fundamental frequency (F.sub.0), voice onset time, pause duration, and/or changes in F.sub.0; voice onset time, and/or pause duration; fricative noise characteristics; stop burst duration; burst spectral characteristics; changes in speaking rate within or across phrases/sentence; changes in formant frequencies; and changes in formant frequency transitions. In addition, one or more acoustic measures for neurological and other diseases can include, but are not limited to, measures of aspiration noise, frequency and intensity perturbation; signal-to-noise (SNR) ratios; changes in pitch over time; changes in loudness over time; and/or other temporal and/or spectral characteristics of a speech sample(s). The one or more acoustic measures also can include a measure of partial loudness. In one embodiment, acoustic measures associated with neurological and other diseases can include a measure of low frequency periodic energy, a measure of high frequency aperiodic energy, and/or a measure of partial loudness of a periodic signal portion of the speech sample. The acoustic measure of the speech sample can further include a measure of noise in the speech sample and a measure of partial loudness of the speech sample. Of course, embodiments are not limited thereto”; [0042]: “the biomarkers for Alzheimer's disease may include the measures described above as well as detailed analyses of the speaker's language characteristics. These may be determined through analyses of the number of words spoken, the types of words (e.g. nouns, verbs, adjectives, articles, etc.), grammatical complexity of the phrases and/or sentence, the number of occurrence of specific words/phrases in conversation, or instances of dysfluencies such as pauses, hesitations or repetitions of words or part-words”; [0044]: “The biomarkers described above may be suitably weighted and combined using appropriate statistical, pattern-recognition and/or machine learning techniques prior to making a diagnostic decision”. The analysis of the acoustic measures and biomarkers constitutes the metric.), wherein extracting the acoustic features comprises performing computerized signal analysis of the input signal over time and frequency domains and using the timepoints of the text-acoustic alignment dataset (Shrivastav, [0041]: “one or more acoustic measures for Parkinson's disease can include, but are not limited to, fundamental frequency (F.sub.0), voice onset time, pause duration, and/or changes in F.sub.0; voice onset time, and/or pause duration; fricative noise characteristics; stop burst duration; burst spectral characteristics; changes in speaking rate within or across phrases/sentence; changes in formant frequencies; and changes in formant frequency transitions. In addition, one or more acoustic measures for neurological and other diseases can include, but are not limited to, measures of aspiration noise, frequency and intensity perturbation; signal-to-noise (SNR) ratios; changes in pitch over time; changes in loudness over time; and/or other temporal and/or spectral characteristics of a speech sample(s). The one or more acoustic measures also can include a measure of partial loudness. 
In one embodiment, acoustic measures associated with neurological and other diseases can include a measure of low frequency periodic energy, a measure of high frequency aperiodic energy, and/or a measure of partial loudness of a periodic signal portion of the speech sample. The acoustic measure of the speech sample can further include a measure of noise in the speech sample and a measure of partial loudness of the speech sample. Of course, embodiments are not limited thereto”; [0042]: “the biomarkers for Alzheimer's disease may include the measures described above as well as detailed analyses of the speaker's language characteristics. These may be determined through analyses of the number of words spoken, the types of words (e.g. nouns, verbs, adjectives, articles, etc.), grammatical complexity of the phrases and/or sentence, the number of occurrence of specific words/phrases in conversation, or instances of dysfluencies such as pauses, hesitations or repetitions of words or part-words”) to select corresponding windows of the input signal for digital signal processing (Ispahani, [0015]: “the system and method of the present invention accepts as its input a digital audio stream and a set of one or more time intervals in the audio stream for which the speech therein shall be transcribed as text data. The set of one or more time intervals may include the entire audio stream from start to finish. In one or more embodiments, the system and method of the present invention provides as its output a log file containing the transcribed text along with one or more timestamps that link the transcribed text with its corresponding position in the audio stream”; [0014]: “select time intervals in the stream for the audio to be transcribed as text, and review and organize the transcribed text”), and wherein the at least one metric quantifies an acoustic manifestation for predicting the physiological status (Shrivastav, [0046]: “The identification device 200 can be used to determine a health state of a subject by receiving, as input to the interface 201, one or more speech samples from a subject (S210 of FIG. 2B). The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B). The processor 202 can determine a health state of the subject based upon the results of the comparison or by tracking the rate of change in specific baseline acoustic measures (S240 of FIG. 2B). The processor 202 can then output a diagnosis”);
comparing the multi-dimensional statistical signature against one or more baseline statistical signatures of speech production ability (Shrivastav, [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B)”); and
providing a speech change identification signal attributable to respiratory tract function of the user, based on the multi-dimensional statistical signature comparison (Shrivastav, [0028]: “the speech and language of a speaker may be monitored over different periods, ranging from a few minutes to several days, weeks, months, or even years. During this monitoring, candidate biomarkers can be tracked to determine their presence/absence or the degree to which these change over time. These data can be compared to some normative database or to some specified criteria, and results of the comparison can be used to predict the likelihood of one or more neurological/neurodegenerative or other disease, such as infectious and/or respiratory disease, condition(s)”).
Regarding claim 15, the Shrivastav/Ispahani combination teaches the method of claim 14, wherein the one or more baseline statistical signatures of speech production ability are derived or obtained from the user (Shrivastav, [0034]: “the baseline acoustic measures for the diseases can be created using a method including: collecting speech samples from patients at the time of their diagnosis”).
Regarding claim 16, the Shrivastav/Ispahani combination teaches the method of claim 14, wherein the one or more baseline statistical signatures of speech production ability are based on normative acoustic data from a database (Shrivastav, [0028]: “the speech and language of a speaker may be monitored over different periods, ranging from a few minutes to several days, weeks, months, or even years. During this monitoring, candidate biomarkers can be tracked to determine their presence/absence or the degree to which these change over time. These data can be compared to some normative database or to some specified criteria, and results of the comparison can be used to predict the likelihood of one or more neurological/neurodegenerative or other disease, such as infectious and/or respiratory disease, condition(s)”).
Regarding claim 17, the Shrivastav/Ispahani combination teaches the method of claim 14, wherein the comparing the multi- dimensional statistical signature against the one or more baseline statistical signatures of speech production ability comprises applying a machine learning algorithm to the multi-dimensional statistical signature (Shrivastav, [0069]: “After performing the speech and/or language analysis, modeling and coding can be performed by the coding module 511 via statistical approaches, machine learning, pattern recognition, or other algorithms to combine information from various biomarkers before reaching a diagnostic decision.”).
Regarding claim 19, the Shrivastav/Ispahani combination teaches the method of claim 14, wherein: extracting the multi-dimensional statistical signature of speech production abilities of the user from the input signal comprises measuring speech features across one or more of the following perceptual dimensions: articulation, prosodic variability, phonation changes, rate, and rate variation (Shrivastav, [0066]: “acoustic analysis can be performed to quantify metrics including, but not limited to fundamental frequency characteristics, intensity, articulatory characteristics, speech/voice quality, prosodic characteristics, and speaking rate.”); and comparing the multi-dimensional statistical signature against the one or more baseline statistical signatures of speech production ability comprises comparing each of the speech features to a corresponding baseline speech feature of the one or more baseline statistical signatures of speech production ability (Shrivastav, [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B). The processor 202 can determine a health state of the subject based upon the results of the comparison or by tracking the rate of change in specific baseline acoustic measures (S240 of FIG. 2B). The processor 202 can then output a diagnosis.”; [0083]: “an audio (conversational) stream received via a phone/microphone (e.g., mobile phone, VoIP, internet, etc.) is analyzed by segmenting the audio stream into short windows, computing specific acoustic measures from each window (e.g. mel-frequency cepstral coefficients), comparing the acoustic measures across successive windows, developing and training a machine learning pattern recognition engine to identify acoustic patterns of a cough, and determining the likelihood of a particular window (or set of windows) to contain an instance of cough”; [0036]: “The average of each feature can then be charted against time. For example, the average variability of a fundamental frequency (F.sub.0) can be charted against time over the analysis period and compared against the variability of F.sub.0 from a healthy group.”).
Regarding independent claim 20, Shrivastav teaches a non-transitory computer readable storage medium storing instructions which, when executed by a computer ([0103]: “In a distributed-computing environment, program modules can be located in both local and remote computer-storage media including memory storage devices. The computer-useable instructions form an interface to allow a computer to react according to a source of input. The instructions cooperate with other code segments or modules to initiate a variety of tasks in response to data received in conjunction with the source of the received data.”), cause the computer to:
receive an input signal that is indicative of speech provided by a user ([0046]: “The identification device 200 can be used to determine a health state of a subject by receiving, as input to the interface 201, one or more speech samples from a subject. The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B)”),
the input signal including a text-acoustic alignment dataset derived from transcribed speech of the user ([0018]: “The biomarkers may be determined, for example, from acoustic analyses of the speech signal, by the application of an automatic speech recognition system including large vocabulary systems, phoneme detection, word spotting engines or the like, and the application of syntactical coding or transcription on input speech”).
However, Shrivastav does not teach the transcribed speech of the user aligned with timepoints.
Ispahani discloses a system and method for processing digital audio data. Specifically, Ispahani teaches the transcribed speech of the user aligned with timepoints ([0015]: “the system and method of the present invention accepts as its input a digital audio stream and a set of one or more time intervals in the audio stream for which the speech therein shall be transcribed as text data. The set of one or more time intervals may include the entire audio stream from start to finish. In one or more embodiments, the system and method of the present invention provides as its output a log file containing the transcribed text along with one or more timestamps that link the transcribed text with its corresponding position in the audio stream”). Shrivastav and Ispahani are analogous art, as both are directed to systems for processing speech recordings.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to include the timepoints from Ispahani into the system from Shrivastav as it allows the system to state where the speech aspects are in the audio recording, which can provide a more accurate and comprehensive analysis of the user’s speech.
The Shrivastav/Ispahani combination teaches extracting a multi-dimensional statistical signature of speech production abilities of the user from the input signal that maps diverse physiological aspects of speech production defined from the input signal to quantifiable acoustic measurements comprising computerized signal analysis of the input (Shrivastav, [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B)”; [0039]: “According to an embodiment of the invention, acoustic biomarkers can be recorded and a patient can be monitored over a period of time (such as a few days to several years). A comparison with look-up tables or a rapid change in specific biomarkers can indicate a greater likelihood of a disease.”; [0044]: “The biomarkers described above may be suitably weighted and combined using appropriate statistical, pattern-recognition and/or machine learning techniques prior to making a diagnostic decision. These include, but are not limited to, discriminant analyses, regression, hidden Markov-models, support-vector machines, and neural networks.”; [0066]: “acoustic analysis can be performed to quantify metrics including, but not limited to fundamental frequency characteristics, intensity, articulatory characteristics, speech/voice quality, prosodic characteristics, and speaking rate.”; [0018]: “These quantifiable measures of acoustic characteristics of a person's speech provide one or more biomarkers indicative of a likelihood of disease onset and/or stage of degeneration. The biomarkers may be determined, for example, from acoustic analyses of the speech signal, by the application of an automatic speech recognition system including large vocabulary systems, phoneme detection, word spotting engines or the like, and the application of syntactical coding or transcription on input speech”; [0046]: “The identification device 200 can be used to determine a health state of a subject by receiving, as input to the interface 201, one or more speech samples from a subject (S210 of FIG. 2B). The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B). The processor 202 can determine a health state of the subject based upon the results of the comparison or by tracking the rate of change in specific baseline acoustic measures (S240 of FIG. 2B). The processor 202 can then output a diagnosis”);
compare the multi-dimensional statistical signature against one or more baseline statistical signatures of speech production ability (Shrivastav, [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B)”); and
provide a speech change identification signal attributable to respiratory tract function of the user, based on the multi-dimensional statistical signature comparison (Shrivastav, [0046]: “The interface 201 then communicates the one or more speech samples to the processor 202, which identifies the acoustic measures from the speech samples (S220 of FIG. 2B) and compares the acoustic measures of the speech samples with the baseline acoustic measures 225 stored in the memory 203 (S230 of FIG. 2B)”).
Regarding claim 65, the Shrivastav/Ispahani combination teaches the device of claim 1, wherein the multi-dimensional statistical signature includes a composite statistical signature comprising a plurality of feature-specific metrics, the composite statistical signature representing the user's instantaneous speech production characteristics across multiple perceptual dimensions (Shrivastav, [0044]: “The biomarkers described above may be suitably weighted and combined using appropriate statistical, pattern-recognition and/or machine learning techniques prior to making a diagnostic decision. These include, but are not limited to, discriminant analyses, regression, hidden Markov-models, support-vector machines, and neural networks.”; [0066]: “acoustic analysis can be performed to quantify metrics including, but not limited to fundamental frequency characteristics, intensity, articulatory characteristics, speech/voice quality, prosodic characteristics, and speaking rate.”).
Regarding claim 66, the Shrivastav/Ispahani combination teaches the device of claim 1, wherein the multi-dimensional statistical signature includes a synthesized output incorporating a plurality of composites that correspond to a readout associated with the physiological status (Shrivastav, [0044]: “The biomarkers described above may be suitably weighted and combined using appropriate statistical, pattern-recognition and/or machine learning techniques prior to making a diagnostic decision. These include, but are not limited to, discriminant analyses, regression, hidden Markov-models, support-vector machines, and neural networks.”).
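As a further non-authoritative sketch of the “suitably weighted and combined” language of Shrivastav [0044], the feature-specific metrics addressed in claims 65 and 66 could be fused into a composite by a regression-style weighted sum; the weights and bias below are invented for illustration:

    import numpy as np

    def composite_score(metric_values, weights, bias=0.0):
        """Weight and combine feature-specific metrics into a single composite,
        squashed to (0, 1) as a probability-like readout (cf. [0044], regression)."""
        logit = float(np.dot(weights, metric_values)) + bias
        return 1.0 / (1.0 + np.exp(-logit))

    # Hypothetical weights over the six |z|-scores from the previous sketch.
    weights = np.array([0.4, 0.3, 0.2, 0.8, 0.3, 0.7])
    metric_values = np.array([2.5, 2.0, 0.4, 0.25, 0.2, 3.5])
    print(round(composite_score(metric_values, weights, bias=-3.0), 3))

A trained discriminant analysis, support-vector machine, or neural network, as [0044] lists, would replace the hand-set weights.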
Claims 61 and 63 are rejected under 35 U.S.C. 103 as being unpatentable over the Shrivastav/Ispahani combination as applied to claims 1 and 14 above, and further in view of Hanson (US 20200227161).
Regarding claim 61, the Shrivastav/Ispahani combination teaches the device of claim 1.
However, the Shrivastav/Ispahani combination does not teach wherein the speech changes resulting from respiratory tract function are related to or result from a congestion state.
Hanson discloses a health management system. Specifically, Hanson teaches wherein the speech changes resulting from respiratory tract function are related to or result from a congestion state ([0046]: “health management application 201 may detect coughing (including whether the cough is dry or productive), wheezing, shortness of breath, sneezing, congestion, sniffling, or any other non-verbal indications of probable symptoms of potential health conditions”). Shrivastav and Hanson are analogous arts as they both relate to systems that obtain measurements from a user and determine a health state from the information.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the congestion evaluation of Hanson into the device of the Shrivastav/Ispahani combination, as it allows the device to take congestion into account when determining speech changes, thereby providing the user with more information about their health state.
Regarding claim 63, Shrivastav teaches the method of claim 14.
However, Shrivastav does not teach wherein the speech changes resulting from respiratory tract function are related to or result from a congestion state.
Hanson teaches wherein the speech changes resulting from respiratory tract function are related to or result from a congestion state ([0046]: “health management application 201 may detect coughing (including whether the cough is dry or productive), wheezing, shortness of breath, sneezing, congestion, sniffling, or any other non-verbal indications of probable symptoms of potential health conditions”).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the congestion evaluation of Hanson into the method of Shrivastav, as it allows the device to take congestion into account when determining speech changes, thereby providing the user with more information about their health state.
Claims 62 and 64 are rejected under 35 U.S.C. 103 as being unpatentable over the Shrivastav/Ispahani combination as applied to claims 1 and 14 above, and further in view of Lotan (US 20150216448).
Regarding claim 62, the Shrivastav/Ispahani combination teaches the device of claim 1.
However, the Shrivastav/Ispahani combination does not teach wherein the speech changes resulting from respiratory tract function are related to or result of a cessation of smoking.
Lotan discloses a system and method for measuring lung capacity and stamina. Specifically, Lotan teaches wherein the speech changes resulting from respiratory tract function are related to or result of a cessation of smoking ([0103]: “The system's database comprising information and test results of a plurality of patients may be used by the system application to perform various statistical operations for calculating, for example, variance of a patient's test results between different times of testing, variance of test results between patients, variance of test results according to other known parameters (age, gender, known disease, geographic location, smoking and more)”). Shrivastav and Lotan are analogous arts as they both relate to systems that obtain measurements from a user and determine a health state from the information.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the evaluation of the user's smoking from Lotan into the device of the Shrivastav/Ispahani combination, as it allows the device to take smoking into account when determining speech changes, thereby providing the user with more information about their health state.
Regarding claim 64, Shrivastav teaches the method of claim 14.
However, Shrivastav does not teach wherein the speech changes resulting from respiratory tract function are related to or result of a cessation of smoking.
Lotan teaches wherein the speech changes resulting from respiratory tract function are related to or result of a cessation of smoking ([0103]: “The system's database comprising information and test results of a plurality of patients may be used by the system application to perform various statistical operations for calculating, for example, variance of a patient's test results between different times of testing, variance of test results between patients, variance of test results according to other known parameters (age, gender, known disease, geographic location, smoking and more)”).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the evaluation of the user's smoking from Lotan into the method of Shrivastav, as it allows the device to take smoking into account when determining speech changes, thereby providing the user with more information about their health state.
Claim 67 is rejected under 35 U.S.C. 103 as being unpatentable over the Shrivastav/Ispahani combination as applied to claim 1 above, and further in view of Su (CN 110634474). Citations to CN 110634474 will refer to the English Machine Translation that accompanies this Office Action.
Regarding claim 67, the Shrivastav/Ispahani combination teaches the device of claim 1, wherein the signal processing circuitry implements one or more machine learning models (Shrivastav, [0044]: “The biomarkers described above may be suitably weighted and combined using appropriate statistical, pattern-recognition and/or machine learning techniques prior to making a diagnostic decision. These include, but are not limited to, discriminant analyses, regression, hidden Markov-models, support-vector machines, and neural networks.”).
However, the Shrivastav/Ispahani combination does not teach the one or more machine learning models to: generate a first score for a first perceptual dimension attributable to the physiological status, generate a second score for a second perceptual dimension attributable to the physiological status, and synthesize the first score and the second score into a new output indicative of the physiological status and respiratory tract function.
Su discloses a voice recognition method and device based on artificial intelligence. Specifically, Su teaches the machine learning models to: generate a first score for a first perceptual dimension of the one or more perceptual dimensions attributable to the physiological status, generate a second score for a second perceptual dimension of the one or more perceptual dimensions attributable to the physiological status ([0074]: “the agent uses a deep neural network and the acoustic modeling description features calculated from the entire speech signal (i.e., the input features based on the acoustic model posterior score in the figure) as input to predict the decoding parameters”; [0055]: “The acoustic modeling unit score can be obtained by combining the posterior probability and the acoustic statistical prior. At this time, the acoustic score does not follow the distribution of 0 to 1; this score will be used for speech recognition decoding”). Shrivastav, Ispahani, and Su are analogous arts as they all relate to systems used to process speech recordings.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the scores of Su into the Shrivastav/Ispahani combination, as it allows the device to further analyze the measurements and audio data, thereby providing a more accurate and comprehensive analysis.
The Shrivastav/Ispahani/Su combination teaches synthesize the first score and the second score into a new output indicative of the physiological status and respiratory tract function (Shrivastav, [0044]: “The biomarkers described above may be suitably weighted and combined using appropriate statistical, pattern-recognition and/or machine learning techniques prior to making a diagnostic decision. These include, but are not limited to, discriminant analyses, regression, hidden Markov-models, support-vector machines, and neural networks.”; Su, [0074]: “the agent uses a deep neural network and the acoustic modeling description features calculated from the entire speech signal (i.e., the input features based on the acoustic model posterior score in the figure) as input to predict the decoding parameters”; [0055]: “The acoustic modeling unit score can be obtained by combining the posterior probability and the acoustic statistical prior. At this time, the acoustic score does not follow the distribution of 0 to 1; this score will be used for speech recognition decoding”).
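By way of a hedged illustration of the claimed arrangement against which the combination is applied, a first score for one perceptual dimension and a second score for another can be generated and then synthesized into a new output; the two stand-in “models” and the fusion weight below are hypothetical:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Two tiny stand-in models, one per perceptual dimension; a real system
    # would use the trained classifiers Shrivastav [0044] lists.
    def voice_quality_score(features):   # first perceptual dimension
        return sigmoid(features @ np.array([0.9, -0.4]))

    def prosody_score(features):         # second perceptual dimension
        return sigmoid(features @ np.array([0.2, 1.1]))

    def synthesize(score1, score2, alpha=0.6):
        """Fuse the two per-dimension scores into one new output (weighted mean)."""
        return alpha * score1 + (1.0 - alpha) * score2

    x = np.array([1.2, -0.7])            # hypothetical acoustic feature vector
    s1, s2 = voice_quality_score(x), prosody_score(x)
    print(round(float(synthesize(s1, s2)), 3))  # single readout of physiological status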
Response to Arguments
All of applicant’s arguments regarding the rejections and objections previously set forth have been fully considered and are persuasive unless directly addressed below.
Applicant has amended the claims to overcome the previous 112(b) rejections; however, the amendments have introduced a new 112(b) rejection, which is set forth above.
Applicant's arguments with respect to the 103 rejection have been fully considered but are not persuasive. Applicant argues that Shrivastav does not teach a text-acoustic alignment dataset for feature extraction by using time points; however, the combination of Shrivastav and Ispahani, not Shrivastav alone, is relied upon to teach this limitation. Applicant also argues that Ispahani does not teach this limitation, but again it is the combination of the two references that teaches it. Shrivastav teaches using specific acoustic features extracted from the dataset for analysis ([0041]: “one or more acoustic measures for Parkinson's disease can include, but are not limited to, fundamental frequency (F.sub.0), voice onset time, pause duration, and/or changes in F.sub.0; voice onset time, and/or pause duration; fricative noise characteristics; stop burst duration; burst spectral characteristics; changes in speaking rate within or across phrases/sentence; changes in formant frequencies; and changes in formant frequency transitions. In addition, one or more acoustic measures for neurological and other diseases can include, but are not limited to, measures of aspiration noise, frequency and intensity perturbation; signal-to-noise (SNR) ratios; changes in pitch over time; changes in loudness over time; and/or other temporal and/or spectral characteristics of a speech sample(s). The one or more acoustic measures also can include a measure of partial loudness. In one embodiment, acoustic measures associated with neurological and other diseases can include a measure of low frequency periodic energy, a measure of high frequency aperiodic energy, and/or a measure of partial loudness of a periodic signal portion of the speech sample. The acoustic measure of the speech sample can further include a measure of noise in the speech sample and a measure of partial loudness of the speech sample. Of course, embodiments are not limited thereto”; [0042]: “the biomarkers for Alzheimer's disease may include the measures described above as well as detailed analyses of the speaker's language characteristics. These may be determined through analyses of the number of words spoken, the types of words (e.g. nouns, verbs, adjectives, articles, etc.), grammatical complexity of the phrases and/or sentence, the number of occurrence of specific words/phrases in conversation, or instances of dysfluencies such as pauses, hesitations or repetitions of words or part-words”; the biomarkers are the acoustic features), and Ispahani teaches labeling the speech data with specific time points of the important speech characteristics ([0015]: “the system and method of the present invention accepts as its input a digital audio stream and a set of one or more time intervals in the audio stream for which the speech therein shall be transcribed as text data. The set of one or more time intervals may include the entire audio stream from start to finish. In one or more embodiments, the system and method of the present invention provides as its output a log file containing the transcribed text along with one or more timestamps that link the transcribed text with its corresponding position in the audio stream”; [0014]: “select time intervals in the stream for the audio to be transcribed as text, and review and organize the transcribed text”). The combination, and not either reference individually, therefore teaches this limitation.
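For illustration of how the combined teachings operate together, Ispahani's timestamped transcript can index where, in the audio stream, Shrivastav's acoustic features are extracted. The sketch below assumes word-level (start, end) timestamps in seconds and uses RMS energy as a stand-in for the measures of Shrivastav [0041]; the interval format and helper name are hypothetical:

    import numpy as np

    def features_at_timepoints(signal, sr, word_intervals):
        """Slice the audio at each transcribed word's timestamps (cf. Ispahani's
        log file) and compute a per-word acoustic measure (here, RMS energy)."""
        out = {}
        for word, (t0, t1) in word_intervals.items():
            seg = signal[int(t0 * sr):int(t1 * sr)]
            out[word] = float(np.sqrt(np.mean(seg ** 2)))
        return out

    sr = 16000
    t = np.arange(2 * sr) / sr
    signal = 0.1 * np.sin(2 * np.pi * 150.0 * t)        # 2 s of synthetic audio
    intervals = {"hello": (0.10, 0.55), "world": (0.70, 1.30)}  # hypothetical alignment
    print(features_at_timepoints(signal, sr, intervals))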
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
Applicant also argues that the combination does not teach that the corresponding metric is an output of one or more machine learning models that quantifies an acoustic manifestation and reflects one or more perceptual dimensions. However, Shrivastav teaches that the machine learning models are used to combine the acoustic manifestations for use in diagnostic decision making ([0069]: “After performing the speech and/or language analysis, modeling and coding can be performed by the coding module 511 via statistical approaches, machine learning, pattern recognition, or other algorithms to combine information from various biomarkers before reaching a diagnostic decision”) and that the analysis is performed to quantify metrics of the perceptual dimensions ([0066]: “acoustic analysis can be performed to quantify metrics including, but not limited to fundamental frequency characteristics, intensity, articulatory characteristics, speech/voice quality, prosodic characteristics, and speaking rate.”). Shrivastav therefore teaches the machine learning model limitation.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIN K MCCORMACK whose telephone number is (703)756-1886. The examiner can normally be reached Mon-Fri 7:30-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason Sims, can be reached at 571-272-7540. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.K.M./Examiner, Art Unit 3791
/MATTHEW KREMER/Primary Examiner, Art Unit 3791