Office Action Analysis: 18771290 — ELECTRONIC APPARATUS AND METHOD FOR CLASSIFYING COGNITIVE IMPAIRMENT BASED ON LARGE LANGUAGE MODEL

Office Action

§101 §103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant claims the benefit of Korean Patent Application No. 10-2024-0008329, filed on January 18, 2024. Claims 1-20 have been afforded the benefit of the January 18, 2024 filing date.

Information Disclosure Statement
The IDSs dated July 12, 2024 and December 12, 2025 have been considered and placed in the application file.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes claim limitations that do not use the word “means” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because each uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. These claim limitations are “interface module”, “first extraction unit”, “second extraction unit”, and “classification module” in claim 1.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-15 and 17-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (a mental process) without significantly more.
Claims 1, 8, and 15 recite a system, a method, and a software process for generating a first prompt related to a fluency evaluation request based on transcribed text corresponding to utterance voice of a user [a person can generate a prompt]; acquiring evaluation feedback related to fluency in response to the first prompt through a large language model [a person can evaluate fluency and give feedback]; extracting acoustic features and linguistic features based on the utterance voice, the transcribed text, and the evaluation feedback [a person can extract these features from the user's voice, the text, or the feedback]; and classifying a cognitive impairment group to which the user belongs based on the acoustic features and the linguistic features [a person can classify whether the user has a cognitive impairment].
As described above, these limitations can be carried out as a series of mental steps.
This judicial exception is not integrated into a practical application because the only additional elements recited are a large language model, an interface module, a first extraction unit, a second extraction unit, a classification module, a memory, a processor, and a computer program executing on a device, and these additional elements are nothing more than instructions to apply the mental process using a general-purpose software model and general-purpose hardware.
	These claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as described above, the only additional elements recited are a large language model, an interface module, a first extraction unit, a second extraction unit, a classification module, a memory, a processor, and a computer program executing on a device, and these additional elements are nothing more than instructions to apply the mental process using general-purpose software and hardware.
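For illustration only, the four limitations recited in claims 1, 8, and 15 (prompt generation, LLM feedback, feature extraction, and classification) can be sketched as a minimal Python pipeline. The function names, toy features, and thresholds are hypothetical assumptions and are not taken from the application or the cited art. The large language model is replaced by a stub callable so that the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Features:
    acoustic: List[float]    # toy acoustic features (assumed)
    linguistic: List[float]  # toy linguistic features (assumed)


def build_fluency_prompt(transcript: str) -> str:
    # Step 1: a first prompt related to a fluency evaluation request.
    return ("Rate the fluency of the following transcribed speech from 1 to 5. "
            "Begin your reply with the number.\n\n" + transcript)


def extract_features(duration_s: float, pause_s: float,
                     transcript: str, fluency_score: float) -> Features:
    # Step 3: acoustic features from the voice (here only its timing) and
    # linguistic features from the transcript and the LLM feedback.
    words = transcript.split()
    speech_rate = len(words) / duration_s          # words per second
    pause_ratio = pause_s / duration_s             # share of silence
    type_token = len({w.lower() for w in words}) / len(words)
    return Features([speech_rate, pause_ratio], [type_token, fluency_score])


def classify(f: Features) -> str:
    # Step 4: a made-up threshold rule standing in for a trained classifier.
    risk = f.acoustic[1] + (1 - f.linguistic[0]) + (5 - f.linguistic[1]) / 5
    if risk > 1.5:
        return "dementia"
    if risk > 0.9:
        return "mild cognitive impairment"
    return "normal"


def run(transcript: str, duration_s: float, pause_s: float,
        llm: Callable[[str], str]) -> str:
    feedback = llm(build_fluency_prompt(transcript))  # Step 2: LLM feedback
    fluency_score = float(feedback.split()[0])        # assumes reply starts with the score
    features = extract_features(duration_s, pause_s, transcript, fluency_score)
    return classify(features)


if __name__ == "__main__":
    stub_llm = lambda prompt: "4 mostly fluent"  # hypothetical stand-in for an LLM
    print(run("the boy is taking a cookie from the jar", 6.0, 1.0, stub_llm))
```

With the stub shown, the short picture description is classified as "normal". The hand-set rule in `classify` is only a placeholder for a trained classifier.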
	Claims 2 and 9 recite a system and a method wherein the utterance voice includes a voice of the user that describes a painting or photo [a person can describe a painting or photo aloud without any system].
	These additional limitations do not prevent the process from being carried out as a mental process.
	Claims 3, 10, and 18 recite an apparatus, a method, and a software process wherein the generating of the first prompt includes: generating the first prompt including the transcribed text and at least one evaluation criterion text among an explanatory phrase related to the transcribed text [a person can generate a prompt with criteria], a score range and a score unit of the fluency evaluation [a person can provide a score based on the fluency], and example transcribed text rated with a plurality of scores within the score range [a person can assign scores to example transcribed text].
As described above, these limitations can be carried out as a series of mental steps.
This judicial exception is not integrated into a practical application because the only additional element recited is an electronic apparatus, and this element is nothing more than instructions to apply the mental process using hardware.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as described above, the only additional element recited is an electronic apparatus, and this element is nothing more than instructions to apply the mental process using hardware.
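The prompt recited in claims 3, 10, and 18 can be illustrated with a short, hypothetical Python helper. It places an evaluation criterion, a score range and unit, and rated example transcripts ahead of the transcript to be scored. The prompt wording and the function name are assumptions. The claims require at least one of these elements; the sketch simply includes all of them. Requiring examples at the lowest and highest scores reflects the refinement recited in claims 4 and 11.

```python
from typing import Dict


def build_scored_prompt(transcript: str, criterion: str,
                        score_min: int, score_max: int, score_unit: int,
                        examples: Dict[int, str]) -> str:
    """Assemble a fluency-evaluation prompt from the claim 3 elements.

    `examples` maps a score to an example transcript rated with that score.
    Requiring examples at both ends of the range mirrors claims 4 and 11.
    """
    if not score_min < score_max or score_unit <= 0:
        raise ValueError("invalid score range or unit")
    if score_min not in examples or score_max not in examples:
        raise ValueError("need examples at the lowest and highest scores")
    if any(not score_min <= s <= score_max for s in examples):
        raise ValueError("example score outside the score range")
    lines = [
        f"Evaluation criterion: {criterion}",
        f"Score range: {score_min} to {score_max}, in units of {score_unit}.",
    ]
    for score in sorted(examples):  # lowest-rated example first
        lines.append(f"Example rated {score}: {examples[score]}")
    lines.append(f"Transcript to rate: {transcript}")
    lines.append("Reply with one score and a brief justification.")
    return "\n".join(lines)
```

Listing the examples from lowest to highest score is a design choice of this sketch, not a requirement of the claims.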
	Claims 4 and 11 further recite that the plurality of scores include a lowest score and a highest score within the score range [a person can determine whether a score should be rated as high or low].
These limitations can be carried out as a series of mental steps.
This judicial exception is not integrated into a practical application because the only additional element recited is an electronic device, and this element is nothing more than instructions to apply the mental process using hardware.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as described above, the only additional element recited is an electronic device, and this element is nothing more than instructions to apply the mental process using hardware.
	Claims 5, 12, and 19 recite an apparatus, a method, and a software process in which the first prompt is generated to allow acquisition of fluency evaluation feedback related to at least one of inclusion of key elements, consistency, term repetition, and wording accuracy in the transcribed text [a person can prompt based on any of these elements, e.g., by listening for whether the user repeats a word].
These limitations can be carried out as a series of mental steps.
This judicial exception is not integrated into a practical application because the only additional element recited is an interface module of an electronic device, and this element is nothing more than instructions to apply the mental process using hardware.
These claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as described above, the only additional element recited is an interface module of an electronic device, and this element is nothing more than instructions to apply the mental process using general-purpose software and hardware.
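As one concrete reading of the "term repetition" criterion in claims 5, 12, and 19, the Python sketch below counts repeated words with a deterministic rule. This substitutes a rule-based count for the LLM feedback that the claims actually recite. The stopword list and the default threshold of three uses are assumptions chosen for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Function words ignored when counting overall repetition (assumed list).
STOPWORDS = {"the", "a", "an", "and", "is", "of", "to", "in"}


def repetition_report(transcript: str, min_count: int = 3
                      ) -> Tuple[List[str], Dict[str, int]]:
    """Return (words repeated back-to-back, content words used min_count+ times)."""
    words = re.findall(r"[a-z']+", transcript.lower())
    back_to_back = [w for prev, w in zip(words, words[1:]) if w == prev]
    counts = Counter(w for w in words if w not in STOPWORDS)
    frequent = {w: c for w, c in counts.items() if c >= min_count}
    return back_to_back, frequent
```

For the sample transcript "The the boy is is reaching for the cookie, the cookie, the cookie jar.", the function returns `["the", "is"]` as back-to-back repetitions and `{"cookie": 3}` as frequently used content words.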
	Claims 6 and 13 recite a system and a method wherein the classification module is configured to classify the user into one of a dementia group, a mild cognitive impairment group, or a normal group based on the linguistic features and the acoustic features [a person can classify whether someone has dementia or mild cognitive impairment or is cognitively normal].
As described above, these limitations can be carried out as a series of mental steps.
These claims do not recite any additional elements. Thus, they do not integrate the mental process into a practical application or amount to significantly more than the mental process.
Claims 7, 14, and 20 recite a system, a method, and a software process wherein the interface module is configured to generate a test opinion that summarizes a result of the classification on the cognitive impairment and the evaluation feedback through the large language model [a person can write such a summary of the results].
As described above, these limitations can be carried out as a series of mental steps.
This judicial exception is not integrated into a practical application because the only additional elements recited are an interface module and a large language model, and these additional elements are nothing more than instructions to apply the mental process using hardware and software.
These claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as described above, the only additional elements recited are an interface module and a large language model, and these additional elements are nothing more than instructions to apply the mental process using general-purpose software and hardware.
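The test-opinion step of claims 7, 14, and 20 amounts to a second prompt that pairs the classification result with the earlier fluency feedback. A minimal, hypothetical Python sketch of that prompt follows. The group labels come from claims 6 and 13; the prompt wording and the function name are assumptions.

```python
# Group labels as recited in claims 6 and 13.
GROUPS = ("dementia", "mild cognitive impairment", "normal")


def build_opinion_prompt(group: str, fluency_feedback: str) -> str:
    """Second prompt asking an LLM for a test opinion (claims 7, 14, 20).

    Combines the classification result with the earlier fluency feedback.
    """
    if group not in GROUPS:
        raise ValueError(f"unknown group: {group!r}")
    return "\n".join([
        "Write a short test opinion for a clinician.",
        f"Classification result: {group}",
        f"Fluency evaluation feedback: {fluency_feedback}",
        "Summarize both in two or three sentences.",
    ])
```

Validating the group label before building the prompt keeps the summary tied to one of the three recited groups.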

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 5, 6, 8, 12, 13, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Agbavor (Agbavor, F. and Liang, H., 2022. Predicting dementia from spontaneous speech using large language models. PLOS Digital Health, 1(12), p.e0000168) in view of Yao (US 2024/0321131 A1).
Regarding claim 1, Agbavor teaches an electronic apparatus comprising: […] [Page 3, lines 1-10 “We report the results from two tasks which include AD vs non-AD classification and AD severity prediction using a subject’s MMSE score. For the classification task, either the acoustic features or GPT-3 embeddings (Ada and Babbage) or both are fed into a machine-learning model such as support vector classifier (SVC), logistic regression (LR) or random forest (RF). As a comparison, we further perform finetuning on the GPT-3 model to see if there is any advantage over the GPT-3 embedding”]; [Page 3, lines 12-17 “In this section we present the AD classification results between AD and non-AD (or healthy control) subjects based on different features: our proposed GPT-3 based text embeddings, the acoustic features, and their combination. We also benchmark the GPT-3 based text embeddings against the mainstream fine-tuning approach. We show that the GPT-3 based text embeddings considerably outperform both the acoustic feature-based approach and the fine-tuned model”]; [Page 5, lines 1-4 “To evaluate whether the acoustic features and the text embeddings can provide complementary information to augment the AD classification, we combine the acoustic features from speech audio data and the GPT-3 based text embeddings by simply concatenating them”].
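The fusion step in the quoted Page 5 passage can be sketched in a few lines of standard-library Python. That step concatenates acoustic features with GPT-3 text embeddings before classification. The toy vectors are invented. A nearest-centroid classifier is substituted for the SVC, LR, and RF models that Agbavor names, only so that the example needs no third-party packages.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def fuse(acoustic: Sequence[float], embedding: Sequence[float]) -> List[float]:
    # Early fusion: concatenate the acoustic vector and the text embedding.
    return list(acoustic) + list(embedding)


def fit_centroids(X: List[List[float]], y: List[str]) -> Dict[str, List[float]]:
    # Per-class mean of the fused vectors (nearest-centroid "training").
    centroids: Dict[str, List[float]] = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [fmean(column) for column in zip(*rows)]
    return centroids


def predict(centroids: Dict[str, List[float]], x: Sequence[float]) -> str:
    # Assign the class whose centroid is closest in squared Euclidean distance.
    def sq_dist(c: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    # Toy data: 2 acoustic values + 3 embedding values per subject (made up).
    X = [fuse([0.9, 0.1], [0.2, 0.1, 0.0]), fuse([0.8, 0.2], [0.1, 0.2, 0.1]),
         fuse([0.3, 0.7], [0.8, 0.9, 0.7]), fuse([0.2, 0.8], [0.9, 0.8, 0.8])]
    y = ["non-AD", "non-AD", "AD", "AD"]
    model = fit_centroids(X, y)
    print(predict(model, fuse([0.25, 0.75], [0.85, 0.85, 0.75])))
```

Real GPT-3 embeddings are far longer than the three-value toy vectors here, but concatenation works the same way regardless of length.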
However, Agbavor does not explicitly teach an electronic apparatus comprising an interface module configured to generate a first prompt related to a fluency evaluation request based on utterance voice of a user, and acquire evaluation feedback related to fluency based on the first prompt through a large language model.
But Yao teaches an electronic apparatus comprising an interface module configured to generate a first prompt related to a fluency evaluation request based on utterance voice of a user, and acquire evaluation feedback related to fluency based on the first prompt through a large language model: [0030 “In one embodiment, generative AI engine 102 can include an LLM. This LLM can generate coherent and contextually relevant text, which can form the basis of the conversation content for chatbot system 100, and answer questions based on a given context or passage of text. In general, generic LLMs can answer or attempt to answer any question or request (which are referred to as "prompt"). LLMs are typically not entirely intuitive about the exact nuances or specifics the user might be interested in, such as the dialogue carried out by a trained language learning partner”]; [0031 “In order to induce the LLM in generative AI engine 102 to provide the desired response, or to initiate a conversation with the appropriate messages, one aspect of the present disclosure uses a prompt engine 104 to generate the appropriate prompts to the LLM, such that the LLM can provide a desired output. Specifically, prompt engine 104 can generate prompts to cause generative AI engine 102 to initiate a dialogue session or to respond to a user-provided message”]; [0037 “In some aspects, the prompt engine is responsible for creating, based on the user audio messages, prompts to the generative AI engine to induce the AI engine to provide desired responses”]; [0070 “Note that if the user pronounces any of those words incorrectly, the audio-text conversion engine can recognize and convert such mispronounced word to an incorrectly spelled word, which is then included in the prompt sent to the generative AI engine. For example, if the user pronounces "Gracias" as "Graziaz," the prompt engine can include this mispronounced (which results in mis-spelling) word in the prompt to the generative AI engine. As a result, the generative AI engine can provide a message to help the user correct their mistake”]; [0144 “The methods and processes described herein can be executed by and/or included in hardware modules or apparatus. These modules or apparatus may include, but are not limited to, an application-specific integrated circuit (ASIC) chip, a field-programmable gate array (FPGA), a dedicated or shared processor that executes a particular software module or a piece of code at a particular time, and/or other programmable-logic devices now known or later developed. When the hardware modules or apparatus are activated, they perform the methods and processes included within them”]; [0141 “FIG. 8 presents an exemplary computing system that facilitates an AI-based language learning partner chatbot, in accordance with an aspect of the present disclosure. In this example, a computing system 800 can include a processor 802, a memory device 804, and a storage device 808. Computing system 800 can also include a touch screen 812 which can display information and receive user input via touches, and an audio device 814 which can receive and transmit audio signals”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor with the teachings of Yao because incorporating the prompt-based language model processing of Yao would enable a complete end-to-end system in which speech is processed through prompt-based model feedback and feature extraction. This would improve the interpretability of results, enhance diagnostic insight, and provide more accurate assessments.
	Regarding claim 5, Agbavor does not explicitly teach the electronic apparatus of claim 1, wherein the interface module is configured to generate the first prompt allowing acquisition of fluency evaluation feedback related to at least one of inclusion of key elements, consistency, term repetition, and wording accuracy in the transcribed text.
	However, Yao teaches the electronic apparatus of claim 1, wherein the interface module is configured to generate the first prompt allowing acquisition of fluency evaluation feedback related to at least one of inclusion of key elements, consistency, term repetition, and wording accuracy in the transcribed text: [0013 “Subsequently, the prompt engine can receive a user response, which in this case is expected to be a repetition of the above sentence, "Me gusta escuchar musica cuando estudio." Note that the system can use low-tolerance voice-to-text conversation to convert the user's audio response into a text message, and then generate the following prompt to the AI engine.”]; [0070 “Note that if the user pronounces any of those words incorrectly, the audio-text conversion engine can recognize and convert such mispronounced word to an incorrectly spelled word, which is then included in the prompt sent to the generative AI engine. For example, if the user pronounces "Gracias" as "Graziaz," the prompt engine can include this mispronounced (which results in mis-spelling) word in the prompt to the generative AI engine. As a result, the generative AI engine can provide a message to help the user correct their mistake”]; [0120 “One of the key features of the present system is that the chatbot can evaluate the user's response and provide specific feedback to help the user correct any potential mistakes. To do so, the prompt engine can generate optimized prompts based on the user's response, which can cause the generative AI engine to provide specific feedback for the user”].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor with the teachings of Yao because incorporating fluency evaluation feedback based on wording accuracy or term repetition would enhance the prompt generation. This in turn would improve the feedback and thereby the classification results.
	Regarding claim 6, Agbavor teaches the electronic apparatus of claim 1, wherein the classification module is configured to classify the user into one of a dementia group, a mild cognitive impairment group, or a normal group based on the linguistic features and the acoustic features: [Page 11, lines 4-11 “AD vs non-AD classification. The AD classification task consists of creating a binary classification model to distinguish between AD and non-AD speech. The model may use acoustic features from speech, linguistic features (embeddings) from transcribed speech, or both. As such, we use (1) the acoustic features extracted from speech audio data, (2) the text embeddings from each GPT-3 base model (Babbage or Ada), and (3) the combination of both as inputs for three different kinds of commonly used machine learning models, including Support Vector Classifier (SVC), Random Forest (RF), and Logistic Regression (LR). We use the scikit-learn library for the implementation of these models [42]. The hyperparameters for each […]”];
[Page 11, lines 18-23 “MMSE is perhaps the most common measure for assessing the severity of AD. We perform regression analysis using both the acoustic features and text embeddings from GPT-3 (Ada and Babbage) to predict the MMSE score. The scores normally range from 0 to 30, with scores of 26 or higher being considered normal [3]. A score of 20 to 24 suggests mild dementia, 13 to 20 suggests moderate dementia, and less than 12 indicates severe dementia. As such, the prediction is clipped to a range between 0 and 30”].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor with the teachings of Yao because incorporating prompt generation would enable the language model to provide targeted feedback, thereby improving the interpretability of the classification results and the diagnosis.
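The MMSE cut-offs in the quoted Page 11 passage can be transcribed directly into Python to make their coverage explicit. The sketch applies the bands exactly as quoted and in the quoted order, together with the stated clipping to 0-30. The quoted ranges overlap at 20 and leave some scores unassigned (e.g., 25 or 12). For those unassigned scores, the function returns None rather than guessing a band.

```python
from typing import Optional


def clip_mmse(prediction: float) -> float:
    # The quoted passage clips regression output to the 0-30 MMSE scale.
    return min(max(prediction, 0.0), 30.0)


def mmse_band(score: float) -> Optional[str]:
    """Map an MMSE score to the bands quoted from Agbavor, in quoted order.

    The quoted ranges overlap at 20 (resolved here in favor of the first
    matching band) and leave scores above 24 but below 26, and from 12 up
    to but not including 13, unassigned; those return None.
    """
    if score >= 26:
        return "normal"
    if 20 <= score <= 24:
        return "mild dementia"
    if 13 <= score <= 20:
        return "moderate dementia"
    if score < 12:
        return "severe dementia"
    return None
```

For example, `mmse_band(clip_mmse(31.2))` yields "normal", while `mmse_band(25)` yields None under the quoted cut-offs.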
	Regarding claim 8, Agbavor teaches a method of classifying cognitive impairment, the method comprising: […] [Page 10, lines 21-24 “In this work, acoustic features are extracted directly from speech using OpenSMILE (open-source Speech and Music Interpretation by Large-space Extraction), a widely used open-source toolkit for audio feature extraction and classification of speech and music signals [39]”]; [Page 11, lines 5-9 “The model may use acoustic features from speech, linguistic features (embeddings) from transcribed speech, or both. As such, we use (1) the acoustic features extracted from speech audio data, (2) the text embeddings from each GPT-3 base model (Babbage or Ada), and (3) the combination of both as inputs for three different kinds of commonly used machine learning models, including Support Vector Classifier (SVC), Random Forest (RF), and Logistic Regression (LR)”]; [Page 3, lines 1-10 “We report the results from two tasks which include AD vs non-AD classification and AD severity prediction using a subject’s MMSE score. For the classification task, either the acoustic features or GPT-3 embeddings (Ada and Babbage) or both are fed into a machine-learning model such as support vector classifier (SVC), logistic regression (LR) or random forest (RF). As a comparison, we further perform finetuning on the GPT-3 model to see if there is any advantage over the GPT-3 embedding”]; [Page 3, lines 12-17 “In this section we present the AD classification results between AD and non-AD (or healthy control) subjects based on different features: our proposed GPT-3 based text embeddings, the acoustic features, and their combination. We also benchmark the GPT-3 based text embeddings against the mainstream fine-tuning approach. We show that the GPT-3 based text embeddings considerably outperform both the acoustic feature-based approach and the fine-tuned model”]; [Page 5, lines 1-4 “To evaluate whether the acoustic features and the text embeddings can provide complementary information to augment the AD classification, we combine the acoustic features from speech audio data and the GPT-3 based text embeddings by simply concatenating them”].
However, Agbavor does not explicitly teach generating a first prompt related to a fluency evaluation request based on transcribed text corresponding to utterance voice of a user, and acquiring evaluation feedback related to fluency in response to the first prompt through a large language model.
But Yao teaches generating a first prompt related to a fluency evaluation request based on transcribed text corresponding to utterance voice of a user, and acquiring evaluation feedback related to fluency in response to the first prompt through a large language model: [0030 “In one embodiment, generative AI engine 102 can include an LLM. This LLM can generate coherent and contextually relevant text, which can form the basis of the conversation content for chatbot system 100, and answer questions based on a given context or passage of text. In general, generic LLMs can answer or attempt to answer any question or request (which are referred to as "prompt"). LLMs are typically not entirely intuitive about the exact nuances or specifics the user might be interested in, such as the dialogue carried out by a trained language learning partner”]; [0031 “In order to induce the LLM in generative AI engine 102 to provide the desired response, or to initiate a conversation with the appropriate messages, one aspect of the present disclosure uses a prompt engine 104 to generate the appropriate prompts to the LLM, such that the LLM can provide a desired output. Specifically, prompt engine 104 can generate prompts to cause generative AI engine 102 to initiate a dialogue session or to respond to a user-provided message”]; [0037 “In some aspects, the prompt engine is responsible for creating, based on the user audio messages, prompts to the generative AI engine to induce the AI engine to provide desired responses”]; [0070 “Note that if the user pronounces any of those words incorrectly, the audio-text conversion engine can recognize and convert such mispronounced word to an incorrectly spelled word, which is then included in the prompt sent to the generative AI engine. For example, if the user pronounces "Gracias" as "Graziaz," the prompt engine can include this mispronounced (which results in mis-spelling) word in the prompt to the generative AI engine. As a result, the generative AI engine can provide a message to help the user correct their mistake”]; [0144 “The methods and processes described herein can be executed by and/or included in hardware modules or apparatus. These modules or apparatus may include, but are not limited to, an application-specific integrated circuit (ASIC) chip, a field-programmable gate array (FPGA), a dedicated or shared processor that executes a particular software module or a piece of code at a particular time, and/or other programmable-logic devices now known or later developed. When the hardware modules or apparatus are activated, they perform the methods and processes included within them”]; [0141 “FIG. 8 presents an exemplary computing system that facilitates an AI-based language learning partner chatbot, in accordance with an aspect of the present disclosure. In this example, a computing system 800 can include a processor 802, a memory device 804, and a storage device 808. Computing system 800 can also include a touch screen 812 which can display information and receive user input via touches, and an audio device 814 which can receive and transmit audio signals”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor with the teachings of Yao because incorporating the prompt-based language model processing of Yao would enable a complete end-to-end system in which speech is processed through prompt-based model feedback and feature extraction. This would improve the interpretability of results, enhance diagnostic insight, and provide more accurate assessments.
	Claim 12 recites the method of claim 8, wherein the generating of the first prompt includes generating the first prompt that allows acquisition of fluency evaluation feedback related to at least one of inclusion of key elements, consistency, term repetition, and wording accuracy in the transcribed text. Claim 12 is rejected for the same reasons as claim 5.
	Claim 13 recites the method of claim 8, wherein the classifying of the cognitive impairment group includes classifying the user into one of a dementia group, a mild cognitive impairment group, or a normal group based on the linguistic features and the acoustic features according to the transcribed text and the evaluation feedback. Claim 13 is rejected for the same reasons as claim 6.
	Regarding claim 15, Agbavor teaches an electronic apparatus comprising: a memory in which at least one instruction related to an artificial intelligence model is stored; and a processor functionally connected to the memory, the processor executing the at least one instruction to: […] [“… embedding from GPT-3 [26], which can be readily accessed via OpenAI Application Programming Interface (API). The OpenAI API, powered by a family of models with different capabilities and price points, can be applied to virtually any task that involves understanding or generating natural language or code. We use the GPT-3 for text embedding, which is powerful representation of the semantic meaning of a piece of text. We benchmark our GPT-3 embedding approach against both the conventional acoustic feature-based approach (Fig 1A) and the prevailing fine-tuned model”]; [Page 10, lines 21-24 “In this work, acoustic features are extracted directly from speech using OpenSMILE (open-source Speech and Music Interpretation by Large-space Extraction), a widely used open-source toolkit for audio feature extraction and classification of speech and music signals [39]”]; [Page 11, lines 5-9 “The model may use acoustic features from speech, linguistic features (embeddings) from transcribed speech, or both. As such, we use (1) the acoustic features extracted from speech audio data, (2) the text embeddings from each GPT-3 base model (Babbage or Ada), and (3) the combination of both as inputs for three different kinds of commonly used machine learning models, including Support Vector Classifier (SVC), Random Forest (RF), and Logistic Regression (LR)”]; [Page 3, lines 1-10 “We report the results from two tasks which include AD vs non-AD classification and AD severity prediction using a subject’s MMSE score. For the classification task, either the acoustic features or GPT-3 embeddings (Ada and Babbage) or both are fed into a machine-learning model such as support vector classifier (SVC), logistic regression (LR) or random forest (RF). As a comparison, we further perform finetuning on the GPT-3 model to see if there is any advantage over the GPT-3 embedding”]; [Page 3, lines 12-17 “In this section we present the AD classification results between AD and non-AD (or healthy control) subjects based on different features: our proposed GPT-3 based text embeddings, the acoustic features, and their combination. We also benchmark the GPT-3 based text embeddings against the mainstream fine-tuning approach. We show that the GPT-3 based text embeddings considerably outperform both the acoustic feature-based approach and the fine-tuned model”]; [Page 5, lines 1-4 “To evaluate whether the acoustic features and the text embeddings can provide complementary information to augment the AD classification, we combine the acoustic features from speech audio data and the GPT-3 based text embeddings by simply concatenating them”].
However, Agbavor does not explicitly teach an electronic apparatus comprising a memory in which at least one instruction related to an artificial intelligence model is stored; and a processor functionally connected to the memory, the processor executing the at least one instruction to: generate a first prompt related to a fluency evaluation request based on utterance voice of a user, and acquire evaluation feedback related to fluency based on the first prompt through a large language model.
But Yao teaches an electronic apparatus comprising a memory in which at least one instruction related to an artificial intelligence model is stored; and a processor functionally connected to the memory, the processor executing the at least one instruction to: generate a first prompt related to a fluency evaluation request based on utterance voice of a user, and acquire evaluation feedback related to fluency based on the first prompt through a large language model – [0141 “FIG. 8 presents an exemplary computing system that facilitates an AI-based language learning partner chatbot, in accordance with an aspect of the present disclosure. In this example, a computing system 800 can include a processor 802, a memory device 804, and a storage device 808. Computing system 800 can also include a touch screen 812 which can display information and receive user input via touches, and an audio device 814 which can receive and transmit audio signals”]; [0142 “Storage device 808 can store data 230 as well as computer-executable instructions which when executed by processor 802 can cause processor 802 to implement a number of functions and features”]; [0030 “In one embodiment, generative AI engine 102 can include an LLM. This LLM can generate coherent and contextually relevant text, which can form the basis of the conversation content for chatbot system 100, and answer questions based on a given context or passage of text. In general, generic LLMs can answer or attempt to answer any question or request (which are referred to as "prompt"). LLMs are typically not entirely intuitive about the exact nuances or specifics the user might be interested in, such as the dialogue carried out by a trained language learning partner”]; [0031 “In order to induce the LLM in generative AI engine 102 to provide the desired response, or to initiate a conversation with the appropriate messages, one aspect of the present disclosure uses a prompt engine 104 to generate the appropriate prompts to the LLM, such that the LLM can provide a desired output. Specifically, prompt engine 104 can generate prompts to cause generative AI engine 102 to initiate a dialogue session or to respond to a user-provided message”]; [0037 “In some aspects, the prompt engine is responsible for creating, based on the user audio messages, prompts to the generative AI engine to induce the AI engine to provide desired responses”]; [0070 “Note that if the user pronounces any of those words incorrectly, the audio-text conversion engine can recognize and convert such mispronounced word to an incorrectly spelled word, which is then included in the prompt sent to the generative AI engine. For example, if the user pronounces "Gracias" as "Graziaz," the prompt engine can include this mispronounced (which results in mis-spelling) word in the prompt to the generative AI engine. As a result, the generative AI engine can provide a message to help the user correct their mistake”]; [0144 “The methods and processes described herein can be executed by and/or included in hardware modules or apparatus. These modules or apparatus may include, but are not limited to, an application-specific integrated circuit (ASIC) chip, a field-programmable gate array (FPGA), a dedicated or shared processor that executes a particular software module or a piece of code at a particular time, and/or other programmable-logic devices now known or later developed. When the hardware modules or apparatus are activated, they perform the methods and processes included within them”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor with the teachings of Yao because incorporating the prompt-based language model processing of Yao would enable a complete end-to-end system in which speech is processed through both prompt-based model feedback and feature extraction. This would improve the interpretability of the results, enhance diagnostic insight, and provide more accurate assessments.
Regarding claim 19, the electronic apparatus of claim 15, wherein the processor executes the at least one instruction to: generate the first prompt allowing acquisition of fluency evaluation feedback related to at least one of inclusion of key elements, consistency, term repetition, and wording accuracy in the transcribed text.  Claim 19 is rejected for the same reasons as claim 5.
Claims [2, 7, 9, 14, 17, 20] are rejected under 35 U.S.C. 103 as being unpatentable over Agbavor (Agbavor, F. and Liang, H., 2022. Predicting dementia from spontaneous speech using large language models. PLOS Digital Health, 1(12), p. e0000168) in view of Yao (US 2024/0321131 A) and in further view of Rentoumi (U.S. Patent No. 11,114,113 B2).
Regarding claim 2, Agbavor in view of Yao does not teach the electronic apparatus of claim 1, wherein the utterance voice includes a voice of the user that describes a painting or photo.
However, Rentoumi teaches the electronic apparatus of claim 1, wherein the utterance voice includes a voice of the user that describes a painting or photo – [Column 3, lines 31-44 “According to the present disclosure, a system for early detection of Alzheimer's disease is provided. The system generates a prediction of whether a patient has Alzheimer's disease, another form of dementia, or other neurodegenerative disease based on an occurrence of speech, such as the patient's speech in response to a task or free form speech. For example, the patient may be shown a picture and asked to describe the picture, be asked to retell a popular short story or fairy tale, or be asked to describe how to perform a specific task. The patient's speech is recorded and a transcript is generated of the speech for analysis. The transcript may be generated using available speech to text applications or, in some implementations, may be generated by hand”].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor in view of Yao with the teachings of Rentoumi because doing so would allow the system to automatically analyze a user’s spoken description and generate contextual prompts or feedback based on that description. The visual stimulus would help generate intelligent prompts or feedback, thereby improving classification accuracy.
Regarding Claim 7, Agbavor in view of Yao teaches the electronic apparatus of claim 1, text embeddings by simply concatenating them. Table 4 shows the results for both the 10-fold CV and evaluation on the test set for different machine learning models”]; [Page 11, lines 40-42 “We also report the averaged AUC scores, along with the corresponding standard deviations over the 10-fold CV when comparing the different models using acoustic features, GPT-3 embeddings (both Ada and Babbage) for AD classification”]; [Page 3, lines 1-11 “We report the results from two tasks which include AD vs non-AD classification and AD severity prediction using a subject’s MMSE score. For the classification task, either the acoustic features or GPT-3 embeddings (Ada and Babbage) or both are fed into a machine-learning model such as support vector classifier (SVC), logistic regression (LR) or random forest (RF). As a comparison, we further perform finetuning on the GPT-3 model to see if there is any advantage over the GPT-3 embedding. For the AD severity prediction, we perform the regression analysis based on both the acoustic features and GPT-3 embeddings to estimate a subject’s MMSE score using three regression models, i.e., support vector regressor (SVR), ridge regression (Ridge) and random forest regressor (RFR)”].
	However, Agbavor in view of Yao does not teach that the interface module is configured to generate a test opinion.
	But Rentoumi teaches that the interface module is configured to generate a test opinion – [Page 2, lines 29-37 “The example system includes a prediction module including a trained classification model, wherein the trained classification model is trained to generate a prediction of the disease state for a patient based on the speech using the plurality of lingual features extracted from the speech. A communication interface is configured to return the prediction of the disease state and one or more analytics regarding the speech and the lingual features to a user device for display to a user”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor in view of Yao with the teachings of Rentoumi because generating a prediction or opinion from the summarized classification results obtained from the transcribed text would allow the language model to present the opinion/prediction to the user. This combination represents predictable use of known techniques to transform classification outputs into user-understandable feedback or opinion, thereby improving the interpretability, usability, and practical value of the system by enabling meaningful opinion/prediction summaries of the cognitive assessment results.
Regarding claim 9, the method of claim 8, wherein the utterance voice includes voice of the user that describes a painting or photo. Claim 9 is rejected for the same reasons as claim 2. 
Regarding claim 14, the method of claim 8, further comprising: summarizing a result of the classification on the cognitive impairment and the evaluation feedback through the large language model to generate a test opinion. Claim 14 is rejected for the same reasons as claim 7.
Regarding claim 17, Agbavor in view of Yao does not teach the electronic apparatus of claim 15, further comprising a communication module, wherein the processor executes the at least one instruction to: acquire the utterance voice from an external electronic apparatus through the communication module and provide the test opinion to the external electronic apparatus through the communication module. 
However, Rentoumi teaches a communication interface configured to provide the test opinion (prediction) to an external user device – [Page 2, lines 29-37 “The example system includes a prediction module including a trained classification model, wherein the trained classification model is trained to generate a prediction of the disease state for a patient based on the speech using the plurality of lingual features extracted from the speech. A communication interface is configured to return the prediction of the disease state and one or more analytics regarding the speech and the lingual features to a user device for display to a user”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor in view of Yao with the teachings of Rentoumi because this would allow a prediction or opinion to be generated and the results to be displayed to a user via a communication module. This combination represents predictable use of known techniques to transform outputs into user-understandable feedback or opinion, thereby improving the interpretability, usability, and practical value of the system by enabling meaningful opinion/prediction results.
Regarding claim 20, the electronic apparatus of claim 15, wherein the processor executes the at least one instruction to: summarize a result of the classification on the cognitive impairment and the evaluation feedback through the large language model to generate a test opinion. Claim 20 is rejected for the same reasons as claim 7.
Claims [3, 10, 18] are rejected under 35 U.S.C. 103 as being unpatentable over Agbavor (Agbavor, F. and Liang, H., 2022. Predicting dementia from spontaneous speech using large language models. PLOS Digital Health, 1(12), p. e0000168) in view of Yao (US 2024/0321131 A) and in further view of Weston (US 2025/0132036 A1).
 	Regarding claim 3,  Agbavor in view of Yao does not teach the electronic apparatus of claim 1, wherein the interface module is configured to generate the first prompt including the transcribed text and at least one evaluation criterion text among an explanatory phrase related to the transcribed text, a score range and a score unit of the fluency evaluation, and example transcribed text associated with a plurality of scores within the score range. 
However, Weston teaches the interface module is configured to generate the first prompt including the transcribed text and at least one evaluation criterion text among an explanatory phrase related to the transcribed text, a score range and a score unit of the fluency evaluation, and example transcribed text associated with a plurality of scores within the score range – [0133 “In step S506, the SOP encoding 506 and the rating SOP encoding 507 are provided as inputs to a generative ML model 508 along with a transcript 510”]; [0136 “As before, these inputs can be provided as inputs to the model in a prompt optionally with instructions to format the output in a structured form, such as JSON or XML. The generative ML model 508 may be prompted to provide the output in the form of a "rating sheet", although the output could be in various forms of natural language or structured text”]; [0111 “"Segmentation" in this context refers to the grouping or mapping of different parts of a transcription to corresponding questions and answers in template (e.g., SOP) for performing the clinical assessment”]; [0113 “The method 400 uses an SOP 402, which is a template for administering a clinical assessment comprising a plurality of sections corresponding to different clinical questions. A first section comprises the "count to five" test described previously. A second section comprises a story recall test, wherein the interviewer tells a patient a story then asks the patient to recall some of the story, or as many aspects of the story as possible”]; [0128 “The method 500 uses a similar approach to the methods 200, 300 and 400, wherein an SOP 502 is optionally fed into the SOP encoder 204 to provide an SOP encoding 506 in steps S502 and S504, respectively, as described previously. The SOP 502 is identical to the SOP 402 and provides instructions for administering the story recall test”]; [0129 “In the method 500, a rating SOP 503 is also optionally encoded by the SOP encoder 204 into a rating SOP encoding 507 in steps S502 and S504, respectively”]; [0130 “As shown, the rating SOP encoding 507 comprises an instructions sub-section and a text string entry providing instructions for scoring the task. The instructions can include guidance such as "allow paraphrases". A "story elements" subsection is provided that lists each aspect of the story. The example above has been limited to a single story element ("Allison") for simplicity, though it would be appreciated that further elements may be provided in practice. Guidance for scoring the particular story element is also provided in the form of a "scoring guidance" subsection. An "output_schema" subsection is included to provide a format for the story element. The output schema conditions the model to provide its output in a particular format. In this case, 'rating' the story recall means producing a report that includes the element index and whether or not that element was recalled. A schema, or template, has children (a type and description) that tells you (a) what kind of data should be populated in this field, and (b) a description of what the field means. A "recalled" section provides a description for the evaluation ("whether this element was recalled") and a type of the evaluation ("bool"). In other examples, the evaluation could be measured in other ways, such using a scale of 1 to 10 which may be reflected in the "type" subsection”]; [0142 “The rating SOP encoding 507 can include instructions on how to interpret or adjust an evaluation based on disfluencies of the patient”].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor in view of Yao with the teachings of Weston because including at least one evaluation criterion with the transcribed text to generate the prompt would guide the language model to generate outputs consistent with the evaluation criterion, thereby improving accuracy and interpretability. This would result in improved classification results.
	Regarding claim 10, the method of claim 8, wherein the generating of the first prompt includes: generating the first prompt including the transcribed text and at least one evaluation criterion text among explanatory phrase related to the transcribed text, a score range and a score unit of the fluency evaluation, and example transcribed text rated with a plurality of scores within the score range. Claim 10 is rejected for the same reasons as claim 3. 
Regarding claim 18, the electronic apparatus of claim 15, wherein the processor executes the at least one instruction to: generate the first prompt including the transcribed text and at least one evaluation criterion text among an explanatory phrase related to the transcribed text, a score range and a score unit of the fluency evaluation, and example transcribed text rated with a plurality of scores within the score range. Claim 18 is rejected for the same reasons as claim 3. 
Claims [4, 11] are rejected under 35 U.S.C. 103 as being unpatentable over Agbavor (Agbavor, F. and Liang, H., 2022. Predicting dementia from spontaneous speech using large language models. PLOS Digital Health, 1(12), p. e0000168) in view of Yao (US 2024/0321131 A), in further view of Weston (US 2025/0132036 A1), and in further view of Kurlowicz (“The Mini Mental State Examination (MMSE),” https://cgatoolkit.ca/Uploads/ContentDocuments/MMSE.pdf).
Regarding claim 4, Agbavor in view of Yao does not teach the electronic apparatus of claim 3, wherein the plurality of scores include a lowest score and a highest score within the score range.
But Weston teaches scores – [0111 “"Segmentation" in this context refers to the grouping or mapping of different parts of a transcription to corresponding questions and answers in template (e.g., SOP) for performing the clinical assessment”]; [0130 “As shown, the rating SOP encoding 507 comprises an instructions sub-section and a text string entry providing instructions for scoring the task. The instructions can include guidance such as "allow paraphrases". A "story elements" subsection is provided that lists each aspect of the story. The example above has been limited to a single story element ("Allison") for simplicity, though it would be appreciated that further elements may be provided in practice. Guidance for scoring the particular story element is also provided in the form of a "scoring guidance" subsection. An "output_schema" subsection is included to provide a format for the story element. The output schema conditions the model to provide its output in a particular format. In this case, 'rating' the story recall means producing a report that includes the element index and whether or not that element was recalled. A schema, or template, has children (a type and description) that tells you (a) what kind of data should be populated in this field, and (b) a description of what the field means. A "recalled" section provides a description for the evaluation ("whether this element was recalled") and a type of the evaluation ("bool"). In other examples, the evaluation could be measured in other ways, such using a scale of 1 to 10 which may be reflected in the "type" subsection”]; [0142 “The rating SOP encoding 507 can include instructions on how to interpret or adjust an evaluation based on disfluencies of the patient”]; [0004 “Once administered, clinical assessments also need to be evaluated or scored, which is known as rating”].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor in view of Yao with the teachings of Weston because scoring the transcribed text would enhance the quality and reliability of its evaluation, which in turn would help the language model generate more consistent outputs and thereby improve the accuracy of the classification results.
However, Weston does not explicitly teach that the scores include a lowest score and a highest score within the score range.
But Kurlowicz teaches scores that include a lowest score and a highest score – [Page 1, lines 8-10 “The Mini Mental State Examination (MMSE) is a tool that can be used to systematically and thoroughly assess mental status. It is an 11-question measure that tests five areas of cognitive function: orientation, registration, attention and calculation, recall, and language. The maximum score is 30. A score of 23 or lower is indicative of cognitive impairment. The MMSE takes only 5-10 minutes to administer and is therefore practical to use repeatedly and routinely”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor in view of Yao and in view of Weston with the teachings of Kurlowicz because providing a plurality of scores ranging from a lowest score to a highest score along with the transcribed text would enhance the quality and reliability of its evaluation, which in turn would help the language model generate more consistent outputs and thereby improve the accuracy of the classification results.
Regarding claim 11, the method of claim 10, wherein the plurality of scores include a lowest score and a highest score within the score range. Claim 11 is rejected for the same reasons as claim 4.
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Agbavor (Agbavor, F. and Liang, H., 2022. Predicting dementia from spontaneous speech using large language models. PLOS Digital Health, 1(12), p. e0000168) in view of Yao (US 2024/0321131 A) and in further view of Luz (“Detecting cognitive decline using speech only: The ADReSSo Challenge,” arxiv.org/pdf/2104.09356).
Regarding claim 16, Agbavor teaches the electronic apparatus of claim 15, wherein the processor executes the at least one instruction to: request training data of the artificial intelligence model in relation to an image, utterance voice, are totally 237 speech recordings, with 70/30 split balanced for demographics, resulting in 166 and 71 in the training set and the test set, respectively. In the training set, there are 87 samples from AD subjects and 79 from non-AD (or healthy control) subjects. The datasets were matched so as to avoid potential biases often overlooked in assessment of AD detection methods, including incidences of repetitive speech from the same individual, variations in speech quality, and imbalanced distribution of gender and age. The detailed procedures to match the data demographically according to propensity scores were described in Luz et al. [21]”]; [Page 10, lines 43-46; Page 11, lines 1-2 “To fine tune our own custom GPT-3 models, we use the OpenAI command-line interface, which is released to the public. We simply follow the instructions about fine-tuning, provided by OpenAI, to prepare the training data that consists of 166 paragraphs, totaling 19,123 words that are used to fine tune one of the base models (Babbage and Ada in our case) with speech transcripts. Tokens used to train a model are relatively cheaper, as billed at 50% of the base prices”].
However, Agbavor does not explicitly teach requesting training data of the artificial intelligence model in relation to a transcribed text of a cognitively impaired patient from the large language model.
But Luz teaches that the dataset derived from the ADReSSo Challenge also includes transcribed text – [Page 1, column 2, lines 5-11 “The ADReSSo Challenge provides a forum for researchers working on approaches to cognitive decline detection based on speech data to test their existing methods or develop novel approaches on a new shared standardized dataset. The approaches that performed best on last year’s dataset [4] employed features extracted from manual transcripts which were provided along with the audio data [6, 7]”].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Agbavor in view of Yao with the teachings of Luz because incorporating training data feedback in relation to images, voice, and transcribed text of cognitively impaired patients and using it to train the language model enables data-driven answer generation, in which the language model leverages patterns learned from the training data to produce more accurate responses, thereby improving diagnostic outputs.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHEZA ABDUL AZIZ whose telephone number is (571)272-9610. The examiner can normally be reached Monday-Friday, 7:30 am-5 pm, alternate Fridays off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DANIEL C WASHBURN/               Supervisory Patent Examiner, Art Unit 2657