Last updated: May 29, 2026
Application No. 18/613,692
TOOL FOR ANALYZING SPEECH USING AI TECHNIQUES TO DIAGNOSE DEMENTIA

Non-Final OA §103
Filed: Mar 22, 2024
Priority: Mar 22, 2023 (provisional 63/453,892)
Examiner: ISLAM, MOHAMMAD K
Art Unit: 2653
Tech Center: 2600 (Communications)
Assignee: Drexel University
OA Round: 1 (Non-Final)
Interview Optional

Examiner has a relatively high allowance rate (83%), with a +16.4% interview lift. A written response may suffice.
Based on 1308 resolved cases, 2023–2026
Examiner Intelligence

ISLAM, MOHAMMAD K
Career Allowance Rate: 83% (above average)
1088 granted / 1308 resolved
+21.2% vs TC avg
Interview Lift: +16.4% (strong), comparing resolved cases with and without an interview
Typical timeline
Avg Prosecution: 2y 8m
58 currently pending
Career history
Total Applications: 1384 (across all art units)
Statute-Specific Performance

§101: 11.3% (-28.7% vs TC avg)
§103: 62.4% (+22.4% vs TC avg)
§102: 20.8% (-19.2% vs TC avg)
§112: 2.3% (-37.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 1308 resolved cases.
Office Action

§103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 02/04/2025 is considered by the examiner.
Drawings
The drawing submitted on 03/22/2024 is considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Gosztolya et al. (US 2022/0039741 A1) in view of Chen (US 2024/0274286 A1).

Regarding Claim 1, Gosztolya et al. teach: A method of training a machine-learning model to detect a neurological condition, the method comprising ([0004] Using various parameters, all the above cited test methods measure the divergence between the cognitive capabilities of a patient who is presumably affected by neurocognitive impairment and the cognitive capabilities of healthy persons. [0008] The detection of neurocognitive impairment through automated evaluation of speech samples by a machine is also addressed in a study by the authors Laszlo Tóth, Ildiko Hoffmann, Gabor Gosztolya, Veronika Vincze, Greta Szatloczki, Zoltan Banreti, Magdolna Pákáski, Janos Kálmán entitled “A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech”, Curr. Alzheimer Res. 15:(2) 130-138 (2018). ): providing a neurological condition dataset (acoustic parameter) from subjects (patient) having the neurological condition; providing a control dataset (acoustic parameter) from healthy subjects (healthy subject or control group); and training the machine-learning model using the neurological condition dataset and the control dataset, forming a trained machine-learning model (trained machine learning algorithm); wherein the trained machine-learning model is configured to detect (predict) a neurological condition ([0045] In the speech sample generation step 10 of the method (as shown in FIG. 1), a speech sample 22 of human speech is recorded, preferably applying the sound recording unit 20, followed by passing on the recorded speech sample 22 to the speech recognition unit 24 and/or the database (the database is not shown in the figure). 
[0053] A characteristic feature of the acoustic parameters 30 extracted by the parameter extraction unit 28 is that they can be computed from the length of the particular segments of the labelled segment series 26, wherein a value of at least one of the acoustic parameters 30 being significantly different in the case of healthy subjects and patients presumably exhibiting neurocognitive impairment. The acoustic parameters 30 therefore contain information applicable for distinguishing healthy subjects from patients presumably exhibiting neurocognitive impairment. A comparison of acoustic parameters 30 extracted from speech samples 22 from patients exhibiting neurocognitive impairment and from speech samples from a control group was performed via a statistical method. [0057] The machine learning algorithms have to be trained using speech samples 22 from patients exhibiting neurocognitive impairment and from a healthy control group. By training the machine learning algorithm applying both the acoustic parameters 30 and the additional histogram data set 42, the machine learning algorithm operates more effectively and makes higher-quality decisions. [0059] By generating the decision information 34, the evaluation unit 32 determines whether the acoustic parameters 30 of the speech sample 22 under examination and the additional histogram data set 42 are closer to the corresponding parameters of the speech sample 22 of the group with neurocognitive impairment, or those of the speech sample 22 corresponding to the control group. The trained machine learning algorithm applied by the evaluation unit 32 preferably assigns a respective probability value (a value between 0 and 1) to the events of the subject belonging to one or the other possible decision group; the sum of the probability values is 1.).
Gosztolya et al. do not specifically teach: the neurological condition dataset comprising neurological condition text embeddings, the neurological condition text embeddings including large language model (LLM) text embeddings; the control dataset comprising control text embeddings, the control text embeddings including LLM text embeddings; and training the machine-learning model using the neurological condition dataset and the control dataset, forming a trained machine-learning model.
However, training a machine-learning model using embeddings associated with data fields of an unstructured/prediction encounter dataset or data elements (condition dataset), including large language model (LLM) text embeddings, and a structured/target classification dataset (control dataset), including LLM text embeddings, to determine a predictive classification output based on structured/target classification data is well known in many domains, including banking, healthcare, industrial, manufacturing, education, and retail, to name a few.

For example, Chen teaches: [0004] Here, the clinical tabular multi-head attention model includes a categorical feature encoder, a continuous feature encoder, a concatenator, a multi-head attention network, and a fully-connected feed forward network. The categorical feature encoder is configured to receive, as input, each categorical feature of the one or more categorical features extracted from the clinical data table, and generate, as output, a corresponding categorical embeddings for each categorical feature. The continuous feature encoder is configured to receive, as input, each continuous feature of the one or more categorical features extracted from the clinical data table, and generate, as output, a corresponding continuous feature embedding for each categorical feature. The concatenator is configured to concatenate the one or more categorical feature embeddings and the one or more continuous feature embeddings to form a set of parametric embeddings. The multi-head attention network is configured to receive, as input, each parametric embedding in the set of parametric embeddings formed by the concatenator, and generate, as output, a corresponding contextual embedding for each parametric embedding in the set of parametric embeddings. The fully-connected feedforward network is configured to receive, as input, the contextual embeddings generated as output from the multi-head attention network, and predict, as output, the one or more clinical outcomes for the patient.
[0037] In other implementations, the clinical prediction model 150 includes a large language model (LLM) 400. The LLM 400 provides the ability to provide profound in-context learning capabilities (even when the number of training data tables is sparse) by exploiting knowledge from other resources to downstream tasks with minimal tuning. While the ClinTaT model 200 is suited for tabular data modeling, the LLM 400 is trained to predict clinical outcomes from input text sequences 402 (FIG. 4) serialized from the tabular clinical data 201. Thus, the LLM 400 is configured to process an input text sequence 402 serialized/converted from tabular clinical data 201 to generate the one or more predicted clinical outcomes 182. The LLM 400 may include a domain-specific LLM pre-trained on a vocabulary/syntax associated with the domain such as medical terminology. The use of domain-specific LLMs allows for smaller LLMs to be utilized for few-shot learning, whereby input text sequences 402 serialized from training data tables 40 may be used as context for a query to predict the clinical outcomes 182. Additional implementations are directed leveraging a large language model (LLM) for predicting clinical outcomes for a patient based on an input text sequence serialized from tabular clinical data. Advantageously, LLM's are capable of providing profound in-context learning capabilities when available training samples are limited by exploiting knowledge from other resources to downstream tasks with minimal tuning.
[0038] A training network 50 is trained on a set of training data tables 40, 40a-n each associated with a respective training patient and including prognostic variables related to the respective training patient's clinical features and molecular profile in tabular form. When the training network 50 is training the ClinTaT model 200, an extractor 55 extracts the categorical features 202 and continuous features 204 from each training data table 40 and provides the extracted categorical and continuous features 202, 204 to the training network 50 for training the ClinTaT model.
Therefore, it would have been obvious to one of ordinary skill in the art at the time the invention was made for Gosztolya et al. to include the teaching of Chen above in order to predict, as output, one or more clinical outcomes for a patient, based on input categorical features extracted from the clinical data table and a corresponding contextual embedding for each parametric embedding.
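For readers mapping the claim 1 limitations to something concrete, the following is a minimal sketch of training a two-class model on a "condition" set and a "control" set of embedding vectors and then emitting two class probabilities that sum to 1, as Gosztolya [0059] describes. It is an illustration only: the two-dimensional vectors are made-up stand-ins rather than LLM text embeddings, logistic regression is a swapped-in classifier chosen for brevity, and the names `train_logistic` and `predict_proba` are hypothetical and come from neither reference nor the application.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim = len(features[0])
    n = len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the score
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict_proba(weights, bias, x):
    """Return one probability per decision group; the two values sum to 1."""
    p_condition = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return {"condition": p_condition, "control": 1.0 - p_condition}


# Made-up 2-D vectors standing in for embeddings (assumption, not real data).
condition_embeddings = [[0.9, 0.2], [0.8, 0.1], [0.7, 0.3]]
control_embeddings = [[0.1, 0.8], [0.2, 0.9], [0.3, 0.7]]

features = condition_embeddings + control_embeddings
labels = [1] * len(condition_embeddings) + [0] * len(control_embeddings)

weights, bias = train_logistic(features, labels)
probs = predict_proba(weights, bias, [0.85, 0.15])
print(probs)
```

The same structure applies whatever produces the vectors; only the feature-extraction step differs between acoustic parameters and text embeddings.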

Regarding Claim 2: The method of claim 1, wherein: providing the neurological condition dataset comprises generating the neurological condition dataset, the generating of the neurological condition dataset including: providing neurological condition audio recordings of speech from the subjects having the neurological condition; and providing the control dataset comprises generating the control dataset, the generating of the control dataset including: providing control audio recordings of speech from the healthy subjects (See rejection of claim 1 specifically Gosztolya et al. teaching and  [0057] The machine learning algorithms have to be trained using speech samples 22 from patients exhibiting neurocognitive impairment and from a healthy control group. By training the machine learning algorithm applying both the acoustic parameters 30 and the additional histogram data set 42, the machine learning algorithm operates more effectively and makes higher-quality decisions. [0059] By generating the decision information 34, the evaluation unit 32 determines whether the acoustic parameters 30 of the speech sample 22 under examination and the additional histogram data set 42 are closer to the corresponding parameters of the speech sample 22 of the group with neurocognitive impairment, or those of the speech sample 22 corresponding to the control group.); converting the neurological condition audio recordings to neurological condition text transcripts; converting the control audio recordings to control text transcripts ([0014] In U.S. Pat. No. 5,333,275 a method for making a time aligned transcription of a sound recording of spontaneous speech is disclosed. [0037] In a segmentation and labelling step 11, the speech sample is processed, generating a labelled segment series from the speech sample by using a speech recognition unit. Any type of speech recognition system (even a prior art one) can be preferably applied as a speech recognition unit. 
[0043] For performing the steps of the method, the data processing system comprises a speech recognition unit 24 adapted for generating a labelled segment series 26 from a speech sample 22, and a parameter extraction unit 28 that is connected to the output of the speech recognition unit 24 and is adapted for extracting acoustic parameters 30 from the labelled segment series 26.  [0053] A characteristic feature of the acoustic parameters 30 extracted by the parameter extraction unit 28 is that they can be computed from the length of the particular segments of the labelled segment series 26, wherein a value of at least one of the acoustic parameters 30 being significantly different in the case of healthy subjects and patients presumably exhibiting neurocognitive impairment.); and deriving the neurological condition text embeddings from the neurological condition text transcripts using the LLM; and deriving the control text embeddings from the control text transcripts using the LLM (See Chen teaching in rejection of claim 1).
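The data flow recited in claim 2 (audio recording, then text transcript, then text embedding) can be sketched as a small pipeline. This is a hedged illustration of the ordering of steps only: `build_embedding_dataset`, `toy_transcribe`, and `toy_embed` are hypothetical names, the "transcriber" merely decodes bytes, and the "embedding" is two hand-computed numbers standing in for an LLM embedding. A real system would plug an automatic speech recognition model and an LLM encoder into the two callables.

```python
from typing import Callable, List, Sequence


def build_embedding_dataset(
    recordings: Sequence[bytes],
    transcribe: Callable[[bytes], str],
    embed: Callable[[str], List[float]],
) -> List[List[float]]:
    """Audio recording -> text transcript -> text embedding, one row per recording."""
    return [embed(transcribe(audio)) for audio in recordings]


def toy_transcribe(audio: bytes) -> str:
    """Stand-in for a speech recognizer: the 'audio' is already UTF-8 text."""
    return audio.decode("utf-8")


def toy_embed(transcript: str) -> List[float]:
    """Stand-in for an LLM embedding: [word count, mean word length]."""
    words = transcript.split()
    mean_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return [float(len(words)), mean_len]


# Made-up inputs; the same pipeline is run separately for each group.
condition_dataset = build_embedding_dataset(
    [b"um the the thing", b"i went uh there"], toy_transcribe, toy_embed
)
control_dataset = build_embedding_dataset(
    [b"we walked to the market"], toy_transcribe, toy_embed
)
print(condition_dataset, control_dataset)
```

Passing the two stages in as callables keeps the sketch neutral about which recognizer or language model is used.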

Regarding Claim 3: The method of claim 2, wherein: the converting of the neurological condition audio recordings to the neurological condition text transcripts includes inputting the neurological condition audio recordings to an automatic speech recognition model; and the converting of the control audio recordings to the control text transcripts includes inputting the control audio recordings to an automatic speech recognition model (See Gosztolya et al. teaching in claim 2).

Regarding Claim 4: The method of claim 1, wherein the text embeddings from subjects having the neurological condition and the text embeddings from healthy subjects capture at least one of lexical, syntactic, and semantic properties (See Gosztolya et al. teaching: [0011] Automated speech recognition algorithms and software known in the prior art, for example HTK (http://htk.eng.cam.ac.uk/) or PRAAT (http://www.praat.org/) usually apply a probabilistic model, for example, hidden Markov model for recognizing particular phonemes or words of the speech sample, i.e. they determine the word or phoneme that can be heard with the greatest probability during a given temporal segment. [0014] In U.S. Pat. No. 5,333,275 a method for making a time aligned transcription of a sound recording of spontaneous speech is disclosed. In the method, speech sounds and certain extralinguistic features, namely pauses, the sounds of inhalation and exhalation and the sounds of lip smacking are identified applying separate hidden Markov models; other extralinguistic features, such as pauses of hesitation and filled pauses, are not modelled by the method. [0046] The recorded speech sample 22 is processed by the speech recognition unit 24, wherein the speech sample 22 is either retrieved from the sound recording unit 20 or from the database. The speech recognition unit 24 can be implemented by applying a speech recognition system that is known from the prior art and includes commercially available hardware, software, or a combination thereof.).

Regarding Claim 5: The method of claim 1, wherein the LLM includes a natural language processing (NLP) LLM (See rejection of claim 1 and Chen teaching: [0029] More specifically, these additional implementations are directed toward leveraging LLMs that have been pre-trained on natural language text in the medical domain, and then fine-tuned through few-shot learning by conditioning the domain-specific pre-trained LLMs on available input text sequences serialized from tabular clinical data for predicting specific clinical outcomes. [0053] The training network 50 applies serialization 410 to serialize the tabular clinical data stored in each training data table 40 into a corresponding input text sequence 402. Here, the input text sequence 402 serialized from the features in the column of each training data table 40 includes a sequence of natural language tokens (e.g., words/wordpieces) that the LLM is able to comprehend and encode.).

Regarding Claim 6: The method of claim 1, wherein the LLM is fine-tuned with speech transcripts (See rejection of claim 5).

Regarding Claim 7: The method of claim 1, wherein the neurological condition dataset and the control dataset further comprise at least one acoustic feature (acoustic parameter) (See rejection of claim 1).

Regarding Claim 8: The method of claim 7, wherein the at least one acoustic feature includes features related to temporal analysis (Chen teaching parametric embeddings), frequency analysis, different aspects of speech production, and combinations thereof (See rejection of claim 1, specifically Gosztolya et al. teaching [0037] In a segmentation and labelling step 11, the speech sample is processed, generating a labelled segment series from the speech sample by using a speech recognition unit. [0038] The labelled segment series includes phoneme labels, silent pause labels, and filled pause labels as different label types, and also contains the initial and final time instances for each label. [0039] In an acoustic parameter calculation step 12, acoustic parameters characteristic of the speech sample are generated from the labelled segment series, which acoustic parameters are, in an evaluation step 13, fed into an evaluation unit that applies a machine learning algorithm for producing decision information. Based on the decision information, it can be decided if the person providing the speech sample is potentially affected by neurocognitive impairment. The evaluation unit adapted for providing the decision information is also described in more detail in relation to FIG. 2. [0040] In the course of processing the speech sample, in a probability analysis step 14, respective probability values are determined corresponding to silent pauses, filled pauses and any types of pauses (i.e. either silent or filled pauses) for respective temporal intervals of a particular temporal division of the speech sample.).

Regarding Claim 9: The method of claim 7, wherein the at least one acoustic feature is combined using concatenation (See rejection of claim 7 and Chen teaching [0004] The concatenator is configured to concatenate the one or more categorical feature embeddings and the one or more continuous feature embeddings to form a set of parametric embeddings.).
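Concatenation in the sense of claim 9 simply appends one feature vector to another so that a single model sees both. A minimal sketch follows, assuming a per-subject text embedding and a per-subject acoustic feature vector already exist. The function name `concatenate_features` and all numeric values are hypothetical.

```python
from typing import List


def concatenate_features(
    text_embedding: List[float], acoustic_features: List[float]
) -> List[float]:
    """Append acoustic features to a text embedding to form one input vector."""
    return list(text_embedding) + list(acoustic_features)


text_embedding = [0.12, -0.40, 0.33]  # made-up embedding values
acoustic_features = [1.8, 0.25]       # made-up acoustic parameter values
combined = concatenate_features(text_embedding, acoustic_features)
print(combined)
```

The combined vector has length equal to the sum of the two input lengths, and the order of the parts must be the same for every subject.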

Regarding Claim 10: The method of claim 1, wherein detecting the neurological condition includes distinguishing between subjects having the neurological condition and healthy subjects without the neurological condition (See rejection of claim 1).

Regarding Claim 11: The method of claim 10, wherein the machine-learning model includes support vector classifier (SVC), logistic regression (LR), or random forest (RF) (See rejection of claim 10 and Gosztolya et al. teaching: [0057] The evaluation unit 32 used for generating the decision information 34 preferably applies a trained machine learning algorithm, and more preferably applies a “Naive Bayes” (NB), linear “Support Vector Machine” (SVM) or “Random Forest”-type machine learning algorithm.).

Regarding Claim 12: The method of claim 1, wherein detecting the neurological condition includes providing a severity prediction for the neurological condition (See rejection of claim 1).

Regarding Claim 13: The method of claim 12, wherein the machine-learning model includes a regression model (“Naive Bayes” (NB), linear “Support Vector Machine” (SVM) or “Random Forest”-type machine learning algorithm) (See rejection of claim 12 and Gosztolya et al. teaching: [0057] The evaluation unit 32 used for generating the decision information 34 preferably applies a trained machine learning algorithm, and more preferably applies a “Naive Bayes” (NB), linear “Support Vector Machine” (SVM) or “Random Forest”-type machine learning algorithm.).

Regarding Claim 14: The method of claim 13, wherein the regression model includes support vector regressor (SVR), ridge regression (Ridge), or random forest regressor (RFR) (See rejection of claim 13 and Gosztolya et al. teaching: [0057] The evaluation unit 32 used for generating the decision information 34 preferably applies a trained machine learning algorithm, and more preferably applies a “Naive Bayes” (NB), linear “Support Vector Machine” (SVM) or “Random Forest”-type machine learning algorithm.).
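Claims 12 to 14 recite a severity prediction produced by a regression model, which outputs a continuous score rather than a class label. As a rough illustration of one of the recited options, the sketch below fits a ridge regression with a single input feature in closed form. The data are invented, the severity scale is arbitrary, and `fit_ridge_1d` is a hypothetical helper; a practical model would use many features and a library implementation.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for one feature with an unpenalized intercept.

    slope = Sxy / (Sxx + alpha), computed on mean-centered data;
    alpha = 0 reduces to ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: one feature value and one severity score per subject.
feature = [0.0, 1.0, 2.0, 3.0]
severity = [30.0, 26.0, 22.0, 18.0]

slope_ols, intercept_ols = fit_ridge_1d(feature, severity, alpha=0.0)
slope_ridge, intercept_ridge = fit_ridge_1d(feature, severity, alpha=5.0)

predicted = slope_ridge * 1.5 + intercept_ridge  # continuous severity estimate
print(slope_ols, slope_ridge, predicted)
```

The penalty term `alpha` shrinks the slope toward zero, which is the defining difference between ridge regression and an unregularized least-squares fit.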

Regarding Claim 15:  The method of claim 1, wherein the neurological condition is dementia or Alzheimer's disease (AD) ( See rejection of claim 1 and Gosztolya et al. teaching: [0009] A publication cited in the above mentioned article, by the authors K. López-de-Ipina et al., entitled “On the Selection of Non-Invasive Methods Based on Speech Analysis Oriented to Automatic Alzheimer Disease Diagnosis”, Sensors 13, 6730-6745 (2013), also deals with the automated diagnosis of Alzheimer's disease based on speech samples. [0074] In the following, an exemplary application of the method according to the invention for monitoring the therapy of Alzheimer's disease is described. The currently feasible aim of the therapy of Alzheimer's is to slow the progress of the disease. One of the most frequently applied commercially available drugs is donepezil, an acetylcholinesterase inhibitor. Therefore, the method according to the invention was applied for examining the efficacy of donepezil therapy of patients with early-stage Alzheimer's disease.).

Regarding Claim 16: A method of detecting a neurological condition in a subject (patient), the method comprising: providing a test dataset from the subject (acoustic parameter), the test dataset including LLM text embeddings from the subject (patient); and inputting the test dataset to the trained machine-learning model according to claim 1; wherein the trained machine-learning model outputs a detection of the neurological condition based upon the test dataset (See rejection of claim 1).

Regarding Claim 17: The method of claim 16, wherein providing the test dataset comprises generating the test dataset, the generating of the test dataset including: providing audio recordings of speech from the subject; converting the audio recordings to text transcripts; and deriving the text embeddings from the text transcripts using the LLM (See rejection of claim 2).

Regarding Claim 18: The method of claim 17, wherein the converting of the audio recordings to text transcripts includes inputting the audio recordings to an automatic speech recognition model (See rejection of claim 3).

Regarding Claim 19:  A system for detecting a neurological condition in a subject, the system including: a processor; a memory unit (Gosztolya et al. teaching: [0002] The subject of the invention is primarily a method related to the automatic detection of neurocognitive impairment based on a speech sample, without any medical intervention, as well as a data processing system, a computer program product and computer-readable storage medium that are adapted for carrying out the method.); and a communication interface; wherein the processor is connected to the memory unit and the communication interface (Gosztolya et al. teaching: [0044] The data processing system preferably comprises a sound recording unit 20 connected to the input of the speech recognition unit 24, and/or a display unit 36 connected to the output of the evaluation unit 32, and/or a database, the database being interconnected with the sound recording unit 20, with the speech recognition unit 24, with the parameter extraction unit 28, with the additional parameter extraction unit 40, with the evaluation unit 32 and/or with the display unit 36. [0045] In the speech sample generation step 10 of the method (as shown in FIG. 1), a speech sample 22 of human speech is recorded, preferably applying the sound recording unit 20, followed by passing on the recorded speech sample 22 to the speech recognition unit 24 and/or the database (the database is not shown in the figure).); and wherein the processor and memory are configured to implement the method of claim 16 (See rejection of claim 16).

Regarding Claim 20: The system of claim 19, further comprising a web application configured and adapted to provide a user interface for the subject (See rejection of claim 19 and Gosztolya et al. teaching: [0045] In the speech sample generation step 10 of the method (as shown in FIG. 1), a speech sample 22 of human speech is recorded, preferably applying the sound recording unit 20, followed by passing on the recorded speech sample 22 to the speech recognition unit 24 and/or the database (the database is not shown in the figure). The sound recording unit 20 is preferably a telephone or a mobile phone, more preferably a smartphone or a tablet, while the sound recording unit 20 can also be implemented as a microphone or a voice recorder. The sound recording unit 20 is preferably also adapted for conditioning and/or amplifying the speech sample 22. [0047] Preferably, the speech recognition system that is known in the field as HTK and is publicly available free of charge (http://htk.eng.cam.ac.uk/) can be applied as the speech recognition unit 24. [0056] Based on the decision information 34 it can be decided whether the person producing the speech sample 22 under examination is healthy or presumably suffers from neurocognitive impairment. The decision information 34 preferably also contains a decision limit and the corresponding error margins, which allows for more sophisticated decision-making. The decision information 34 is preferably displayed by a display unit 36 that is preferably implemented as a device having a screen, i.e. a smartphone or tablet.).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record Krishnan et al. (WO 2023/014495 A1) teach: HIGH-QUALITY EMBEDDINGS FOR MEDICAL IMAGING AND SMALL, EASY-TO-TRAIN NETWORKS FOR LOW-DATA TASKS.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras Shah can be reached at 571-270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2653
Prosecution Timeline

Mar 22, 2024: Application Filed
Dec 23, 2025: Non-Final Rejection mailed, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/189,823; Patent 12638606; SYSTEMS AND METHODS FOR SEISMIC IMAGE GENERATION USING DEEP LEARNING NETWORKS; 3y 2m to grant; Granted May 26, 2026
18/370,079; Patent 12631609; MEASUREMENT SYSTEM, MEASUREMENT METHOD, AND COMPUTER PROGRAM; 2y 8m to grant; Granted May 19, 2026
19/306,727; Patent 12632514; AI-GENERATED MUSIC DERIVATIVE WORKS; 9m to grant; Granted May 19, 2026
18/874,645; Patent 12625493; Monitoring Methods, Computer Program Product, Monitoring Unit, Gas Analysis Device, and use of Artificial Intelligence; 1y 5m to grant; Granted May 12, 2026
17/747,479; Patent 12619225; MEMORY AND COMPUTE-EFFICIENT UNSUPERVISED ANOMALY DETECTION FOR INTELLIGENT EDGE PROCESSING; 3y 11m to grant; Granted May 05, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview (+16.4%): 99%
Median Time to Grant: 2y 8m (~6m remaining)
PTA Risk: Low
Based on 1308 resolved cases by this examiner. Grant probability derived from career allowance rate.
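The projection figures can be checked with simple arithmetic. The sketch below reproduces them under two assumptions that the page does not state: that the interview lift is added in percentage points to the rounded allowance rate, and that "remaining" time is the median prosecution time minus the whole months elapsed between the filing date and the page's "Last updated" date.

```python
# Career allowance rate from the counts shown on the page.
granted, resolved = 1088, 1308
allowance_rate_pct = 100 * granted / resolved
displayed_allowance = round(allowance_rate_pct)

# Assumption: the lift is added in percentage points to the rounded rate.
interview_lift_pts = 16.4
with_interview = round(displayed_allowance + interview_lift_pts)

# Assumption: remaining time = median time to grant minus months elapsed
# from filing (Mar 2024) to the "Last updated" date (May 2026).
median_months = 2 * 12 + 8
elapsed_months = (2026 - 2024) * 12 + (5 - 3)
remaining_months = median_months - elapsed_months

print(displayed_allowance, with_interview, remaining_months)
```

Under these assumptions the results match the displayed 83%, 99%, and roughly 6 months remaining; other derivations could give the same rounded figures.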