Prosecution Insights
Last updated: April 19, 2026
Application No. 18/495,588

TECHNIQUES FOR SPEECH LANGUAGE MODEL TRAINING AND APPLICATION

Status: Non-Final OA (§103)
Filed: Oct 26, 2023
Examiner: ALBERTALLI, BRIAN LOUIS
Art Unit: 2656
Tech Center: 2600 — Communications
Assignee: Smk Corporation
OA Round: 2 (Non-Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 2-3
Time to Grant: 2y 11m
Grant Probability with Interview: 98%

Examiner Intelligence

Career Allow Rate: 82% (above average; +19.8% vs TC avg; 697 granted / 852 resolved)
Interview Lift: strong, +16.5% across resolved cases with interview
Typical Timeline: 2y 11m average prosecution; 19 applications currently pending
Career History: 871 total applications across all art units
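The headline figures above follow from the raw counts. As a small illustrative script (the additive interview-adjustment formula is an assumption about how the tool combines the numbers, not its documented method):

```python
granted = 697
resolved = 852

# Career allow rate from resolved cases.
allow_rate = granted / resolved * 100
print(f"Career allow rate: {allow_rate:.0f}%")        # 82%

# Assumed model: the +16.5 point interview lift is added to the base rate,
# capped at 100%.
interview_lift = 16.5
with_interview = min(allow_rate + interview_lift, 100)
print(f"With interview: {with_interview:.0f}%")       # 98%
```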

Statute-Specific Performance

§101: 13.8% (-26.2% vs TC avg)
§103: 34.9% (-5.1% vs TC avg)
§102: 27.7% (-12.3% vs TC avg)
§112: 16.6% (-23.4% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 852 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant’s arguments with respect to the rejection(s) of claim(s) 1-20 under 35 U.S.C. 112, 35 U.S.C. 102 and 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Vasquez-Correa et al. and Themistocleous et al.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-5, 11, and 19-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez-Correa et al. (Convolutional Neural Networks and a Transfer Learning Strategy to Classify Parkinson’s Disease from Speech in Three Different Languages, hereinafter “Vasquez-Correa”), in view of Themistocleous et al. (Identification of Mild Cognitive Impairment from Speech in Swedish Using Deep Sequential Neural Networks, hereinafter “Themistocleous”).
In regard to claim 1, Vasquez-Correa discloses an apparatus (statistical classifications are computed, which would require a computer, section 2.3), comprising: a processor (a computer inherently includes a processor); and a memory coupled with the processor (a computer inherently requires a memory to store instructions for the processor), the memory storing code that is executable by the processor to cause the apparatus to: train a first speech model in a first language, the first speech model used to determine one or more characteristics of speech data that is indicative of a medical condition (a convolutional neural network (CNN) model is trained with speech in a first language, section 2.5); train a second speech model for use in a second language using at least a portion of the first speech model trained in the first language (transfer learning is used to transfer parameters from the first speech model to a second speech model that is further trained with speech of a different language, section 2.5 and Fig. 1); apply the second speech model trained for use in the second language to speech data for a user captured in the second language (the CNN trained in the first language and fine-tuned in the second language is used to predict a medical condition for a speaker, section 3.2); and determine, based on output from the second speech model trained for use in the second language, an assessment of the medical condition for the user (the model performs a binary classification to identify whether the speaker has the medical condition or not, sections 2.3 and 3.2).

Vasquez-Correa discloses the medical condition determined using binary speech classification models is Parkinson’s, rather than mild cognitive impairment. Themistocleous discloses machine learning models to identify whether a user has mild cognitive impairment based on their speech (see Abstract).
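The train/transfer/fine-tune pattern the examiner maps to the claim can be sketched with a toy stand-in model. This is a minimal illustration only: a logistic regression rather than the references' CNN, with synthetic feature vectors standing in for acoustic features; every name and number here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, w=None, lr=0.1, steps=200):
    """Toy 'speech model': logistic regression by gradient descent.
    Passing w seeds training with weights transferred from another model."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))          # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)      # gradient step
    return w

# Stand-ins for labeled acoustic feature vectors in two languages.
X_lang1 = rng.normal(size=(200, 8)); y_lang1 = (X_lang1[:, 0] > 0).astype(float)
X_lang2 = rng.normal(size=(40, 8));  y_lang2 = (X_lang2[:, 0] > 0).astype(float)

# First speech model trained in language 1.
w1 = train_logreg(X_lang1, y_lang1)
# Second model: parameters transferred from w1, then fine-tuned on language 2.
w2 = train_logreg(X_lang2, y_lang2, w=w1.copy())

# Binary assessment for a new language-2 speaker.
prob = 1 / (1 + np.exp(-X_lang2[0] @ w2))
print("condition likely" if prob > 0.5 else "condition unlikely")
```

The transfer step is simply reusing the first model's learned weights as the starting point for the second model's training, which is the structure the rejection attributes to Vasquez-Correa's Fig. 1.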
At the level of detail required by the claim, the only difference between Vasquez-Correa and the claim is the substitution of detecting MCI for detecting Parkinson’s. Both represent a binary classification problem, where the result of the classification indicates whether the user is likely to have the detected condition or not. It is widely recognized in the art that classification models can be applied to many different problems by training the classification models with appropriate data. For a binary classification task, those of ordinary skill in the art would recognize that simply substituting new labeled training data (i.e., speech samples labeled as indicating MCI or not) for the original training data (i.e., speech samples labeled as indicating Parkinson’s or not) would predictably result in the claimed first and second speech models that would determine one or more characteristics of speech data that is indicative of MCI. Additionally, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the binary classifier of Vasquez-Correa to the binary classification problem of identifying a user with MCI, because detecting MCI using speech would enable interventions to delay or prevent the development of Alzheimer’s disease and other types of dementia, as taught by Themistocleous (section 4).

In regard to claim 2, Vasquez-Correa discloses the first speech model trained in the first language is configured to analyze sublanguage characteristics of the speech data (features include speech acoustic features, section 2.3).

In regard to claim 3, Vasquez-Correa discloses the sublanguage characteristics comprise at least one of a speech tone, a speech rate, a speech pattern, acoustic speech characteristics, prosodic speech characteristics, and linguistic speech features (MFCCs, etc., section 2.3).
In regard to claim 4, Vasquez-Correa does not disclose the first speech model trained in the first language is further trained to analyze non-speech data, the user's non-speech data further provided to the first speech model trained in the first language to determine the assessment of MCI for the user. Themistocleous discloses the first speech model trained in the first language is further trained to analyze non-speech data, the user's non-speech data further provided to the first speech model trained in the first language to determine the assessment of MCI for the user (in addition to acoustic features, non-speech features such as age and gender are used, section 2.3.3). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further train the speech model to analyze non-speech data, because such features are useful for predicting whether a user has MCI, as suggested by Themistocleous (section 2.3.3 and section 4).

In regard to claim 5, Vasquez-Correa does not disclose the non-speech data comprises demographic information for the user. Themistocleous discloses the non-speech data comprises demographic information for the user (gender and age, section 2.3.3). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use demographic information for the user, because such features are useful for predicting whether a user has MCI, as suggested by Themistocleous (section 2.3.3 and section 4).

In regard to claim 11, Vasquez-Correa discloses the speech data comprises a recorded audio clip of verbal responses by the user in response to at least one of at least one query and unprompted dialog (users were asked to pronounce sentences, section 2.1).
In regard to claim 19, Vasquez-Correa discloses a method, comprising: applying a second speech model to speech data for a user captured in a second language, wherein the second speech model was trained for use in the second language to determine one or more characteristics of speech data that is indicative of a medical condition and wherein the second speech model was trained using at least a portion of a first speech model that was trained for use in determining one or more characteristics of speech data, in a first language, that is indicative of a medical condition (a convolutional neural network (CNN) model is trained with speech in a first language, section 2.5; transfer learning is used to transfer parameters from the first speech model to a second speech model that is further trained with speech of a different language, section 2.5 and Fig. 1; the CNN trained in the first language and fine-tuned in the second language is used to predict a medical condition for a speaker, section 3.2); and determining, based on output from the second speech model trained for use in the second language, an assessment of the medical condition for the user (the model performs a binary classification to identify whether the speaker has the medical condition or not, sections 2.3 and 3.2).

Vasquez-Correa discloses the medical condition determined using binary speech classification models is Parkinson’s, rather than mild cognitive impairment. Themistocleous discloses machine learning models to identify whether a user has mild cognitive impairment based on their speech (see Abstract). At the level of detail required by the claim, the only difference between Vasquez-Correa and the claim is the substitution of detecting MCI for detecting Parkinson’s. Both represent a binary classification problem, where the result of the classification indicates whether the user is likely to have the detected condition or not.
It is widely recognized in the art that classification models can be applied to many different problems by training the classification models with appropriate data. For a binary classification task, those of ordinary skill in the art would recognize that simply substituting new labeled training data (i.e., speech samples labeled as indicating MCI or not) for the original training data (i.e., speech samples labeled as indicating Parkinson’s or not) would predictably result in the claimed first and second speech models that would determine one or more characteristics of speech data that is indicative of MCI. Additionally, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the binary classifier of Vasquez-Correa to the binary classification problem of identifying a user with MCI, because detecting MCI using speech would enable interventions to delay or prevent the development of Alzheimer’s disease and other types of dementia, as taught by Themistocleous (section 4).

In regard to claim 20, Vasquez-Correa discloses an apparatus (statistical classifications are computed, which would require a computer, section 2.3), comprising: means for training a first speech model in a first language, the first speech model used to determine one or more characteristics of speech data that is indicative of a medical condition (a convolutional neural network (CNN) model is trained with speech in a first language, section 2.5); means for training a second speech model for use in a second language using at least a portion of the first speech model trained in the first language (transfer learning is used to transfer parameters from the first speech model to a second speech model that is further trained with speech of a different language, section 2.5 and Fig. 1); means for applying the second speech model trained for use in the second language to speech data for a user captured in the second language (the CNN trained in the first language and fine-tuned in the second language is used to predict a medical condition for a speaker, section 3.2); and means for determining, based on output from the second speech model trained for use in the second language, an assessment of the medical condition for the user (the model performs a binary classification to identify whether the speaker has the medical condition or not, sections 2.3 and 3.2).

Vasquez-Correa discloses the medical condition determined using binary speech classification models is Parkinson’s, rather than mild cognitive impairment. Themistocleous discloses machine learning models to identify whether a user has mild cognitive impairment based on their speech (see Abstract). At the level of detail required by the claim, the only difference between Vasquez-Correa and the claim is the substitution of detecting MCI for detecting Parkinson’s. Both represent a binary classification problem, where the result of the classification indicates whether the user is likely to have the detected condition or not. It is widely recognized in the art that classification models can be applied to many different problems by training the classification models with appropriate data. For a binary classification task, those of ordinary skill in the art would recognize that simply substituting new labeled training data (i.e., speech samples labeled as indicating MCI or not) for the original training data (i.e., speech samples labeled as indicating Parkinson’s or not) would predictably result in the claimed first and second speech models that would determine one or more characteristics of speech data that is indicative of MCI.
Additionally, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the binary classifier of Vasquez-Correa to the binary classification problem of identifying a user with MCI, because detecting MCI using speech would enable interventions to delay or prevent the development of Alzheimer’s disease and other types of dementia, as taught by Themistocleous (section 4).

Claim(s) 6-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez-Correa, in view of Themistocleous, and further in view of Dagum (U.S. Patent Application Pub. No. 2020/0245949).

In regard to claim 6, Vasquez-Correa and Themistocleous do not disclose the non-speech data comprises gait data, the gait data describing the user’s manner of walking. Dagum discloses the non-speech data comprises gait data, the gait data describing the user’s manner of walking (paragraph [0065]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize non-speech data comprising gait data, the gait data describing the user’s manner of walking, because these additional biomarkers provide objective measures that can be passively collected, among other advantages, as taught by Dagum (paragraph [0021]).

In regard to claim 7, Vasquez-Correa and Themistocleous do not disclose the non-speech data comprises activity data, the activity data captured from one or more sensors associated with the user. Dagum discloses the non-speech data comprises activity data, the activity data captured from one or more sensors associated with the user (sensor data including information related to user activity, paragraph [0055]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize non-speech data comprising activity data, the activity data captured from one or more sensors associated with the user, because these additional biomarkers provide objective measures that can be passively collected, among other advantages, as taught by Dagum (paragraph [0021]).

In regard to claim 8, Vasquez-Correa and Themistocleous do not disclose the non-speech data comprises driving-related data associated with the user’s driving history. Dagum discloses the non-speech data comprises driving-related data associated with the user’s driving history (motion artifacts from driving, paragraph [0068]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize non-speech data comprising driving-related data associated with the user’s driving history, because these additional biomarkers provide objective measures that can be passively collected, among other advantages, as taught by Dagum (paragraph [0021]).

In regard to claim 9, Vasquez-Correa and Themistocleous do not disclose the non-speech data comprises medication information for the user. Dagum discloses the non-speech data comprises medication information for the user (medications, paragraph [0067]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize non-speech data comprising medication information for the user, because these additional biomarkers provide objective measures that can be passively collected, among other advantages, as taught by Dagum (paragraph [0021]).

In regard to claim 10, Vasquez-Correa and Themistocleous do not disclose the non-speech data comprises data describing one or more motor functions for the user.
Dagum discloses the non-speech data comprises data describing one or more motor functions for the user (locomotion, paragraph [0065]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize non-speech data comprising data describing one or more motor functions for the user, because these additional biomarkers provide objective measures that can be passively collected, among other advantages, as taught by Dagum (paragraph [0021]).

Claim(s) 12-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez-Correa, in view of Themistocleous, and further in view of Shriberg et al. (U.S. Patent Application Pub. No. 2022/0328064, hereinafter “Shriberg”).

In regard to claim 12, Vasquez-Correa and Themistocleous are silent regarding querying the user. Shriberg discloses an apparatus for soliciting speech samples from a user for assessment, wherein the apparatus is configured to record an audio clip of verbal responses by the user in response to at least one query (input speech is obtained using one or more queries, paragraph [0084]); and to create the recorded audio clip from a plurality of shorter audio clips of the user’s verbal responses to the at least one query by combining the plurality of shorter audio clips into a single audio clip having a length that satisfies a threshold length (the speech is collected as a plurality of audio snippets, paragraph [0169]; where the system elicits longer responses from the user if the responses are not long enough, paragraph [0205]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to create the recorded audio clip from a plurality of shorter audio clips of the user’s verbal responses to the at least one query by combining the plurality of shorter audio clips into a single audio clip having a length that satisfies a threshold length, because it would allow the system to determine if the speech provided by the user was unacceptable for analysis, as suggested by Shriberg (paragraph [0205]).

In regard to claim 13, Vasquez-Correa and Themistocleous are silent regarding querying the user. Shriberg discloses the code is further executable by the processor to present multiple prompts to the user to elicit the plurality of shorter audio clips (a plurality of queries are presented to the user to collect a plurality of audio snippets, paragraphs [0085] and [0169]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to present multiple prompts to the user to elicit the plurality of shorter audio clips, because it would elicit more responses from the user when the user had not provided enough speech samples, as taught by Shriberg (paragraph [0205]).

In regard to claim 14, Vasquez-Correa and Themistocleous are silent regarding querying the user. Shriberg discloses the plurality of audio clips comprise snippets taken of a longer conversation that demonstrate various predefined language characteristics (audio snippets recorded during a conversation, paragraphs [0084] and [0169]; comprising a threshold number of words, paragraph [0205]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize snippets taken of a longer conversation that demonstrate various predefined language characteristics, because it would ensure the collected audio clips were acceptable for analysis, as taught by Shriberg (paragraph [0205]).

In regard to claim 15, Vasquez-Correa and Themistocleous are silent regarding querying the user. Shriberg discloses the code is further executable by the processor to remove audio clips of the plurality of shorter audio clips that are shorter than a threshold length (when input speech is not long enough, the user is prompted to elicit new responses, paragraph [0205]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to remove audio clips of the plurality of shorter audio clips that are shorter than a threshold length, because it would ensure the collected audio clips were acceptable for analysis, as taught by Shriberg (paragraph [0205]).

In regard to claim 16, Vasquez-Correa and Themistocleous are silent regarding querying the user. Shriberg discloses the code is further executable by the processor to remove audio clips of the plurality of shorter audio clips that do not demonstrate predefined speech characteristics (when input speech does not contain enough words or has a volume that is too low relative to background noise, the user is prompted to elicit new responses, paragraph [0205]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to remove audio clips of the plurality of shorter audio clips that do not demonstrate predefined speech characteristics, because it would ensure the collected audio clips were acceptable for analysis, as taught by Shriberg (paragraph [0205]).

In regard to claim 17, Vasquez-Correa and Themistocleous are silent regarding querying the user.
Shriberg discloses the predefined speech characteristics comprise at least one of a number of words, a number of syllables, a pronunciation, a tone, a signal-to-noise ratio, and a length of speech (number of words, volume, etc., paragraph [0205]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to remove audio clips of the plurality of shorter audio clips that do not demonstrate predefined speech characteristics, because it would ensure the collected audio clips were acceptable for analysis, as taught by Shriberg (paragraph [0205]).

Claim(s) 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez-Correa, in view of Themistocleous, and further in view of Rutowski et al. (U.S. Patent Application Pub. No. 2024/0087752, hereinafter “Rutowski”).

In regard to claim 18, Vasquez-Correa and Themistocleous do not disclose an x-vector embedding speech model. Rutowski discloses an apparatus for assessing speech in multiple languages, using x-vector embedding speech models (paragraph [0079]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize an x-vector embedding speech model for the first speech model, the second speech model, or a combination thereof, because x-vector embedding speech models significantly outperform similar alternatives such as i-vectors, and do not require transcribed data during training.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Vasquez-Correa et al. discloses an additional study of transfer learning to classify patients with different speech disorders in different languages. Li et al. disclose a method for detecting AD using transfer learning from a first language to a second language.
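The snippet-handling behavior attributed to Shriberg in the claim 12-16 rejections (discard snippets that fail a quality check, then combine the rest into a single clip meeting a threshold length, re-prompting the user otherwise) can be sketched as follows. All thresholds and names here are hypothetical, chosen only for illustration:

```python
MIN_SNIPPET_SEC = 2.0    # discard snippets shorter than this (assumed value)
TARGET_CLIP_SEC = 30.0   # combined clip must meet this length (assumed value)

def combine_snippets(snippets):
    """snippets: list of (duration_sec, audio_bytes) pairs.
    Returns (total_duration, combined_audio) if the retained snippets
    satisfy the threshold, else None (caller should re-prompt the user
    to elicit more speech)."""
    kept = [s for s in snippets if s[0] >= MIN_SNIPPET_SEC]
    total = sum(d for d, _ in kept)
    if total < TARGET_CLIP_SEC:
        return None  # not enough acceptable speech collected
    audio = b"".join(a for _, a in kept)
    return total, audio

clips = [(1.0, b"x"), (12.0, b"a"), (10.5, b"b"), (9.0, b"c")]
print(combine_snippets(clips))  # (31.5, b'abc') -- the 1.0 s snippet is dropped
```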
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571) 272-7616. The examiner can normally be reached M-F 8AM-3PM, 4PM-5PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

BLA 2/24/26
/BRIAN L ALBERTALLI/
Primary Examiner, Art Unit 2656

Prosecution Timeline

Oct 26, 2023
Application Filed
Sep 24, 2025
Non-Final Rejection — §103
Dec 29, 2025
Response Filed
Feb 24, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592247
INFERRING EMOTION FROM SPEECH IN AUDIO DATA USING DEEP LEARNING
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12573407
QUICK AUDIO PROFILE USING VOICE ASSISTANT
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12574386
DISTRIBUTED IDENTIFICATION IN NETWORKED SYSTEM
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12572327
CONDITIONALLY ASSIGNING VARIOUS AUTOMATED ASSISTANT FUNCTION(S) TO INTERACTION WITH A PERIPHERAL ASSISTANT CONTROL DEVICE
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12573382
ADVERSARIAL LANGUAGE IMITATION WITH CONSTRAINED EXEMPLARS
Granted Mar 10, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 82%
With Interview: 98% (+16.5%)
Median Time to Grant: 2y 11m
PTA Risk: Moderate
Based on 852 resolved cases by this examiner. Grant probability derived from career allow rate.
