DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is in reply to the present application filed on 08/20/2024.
In the preliminary amendment dated August 20, 2024, the following has occurred: Claims 3, 5-9, 11, 14 and 16-19 have been amended.
Claims 1-20 are currently pending and have been examined.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 12/22/2023 and 04/04/2024 were filed before the mailing date of the first Office action on the merits. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Priority
Acknowledgment is made of Applicant’s claim for priority under 35 U.S.C. § 371 of International Application No. PCT/US2022/035019 filed on 06/24/2022 which claims the benefit of Provisional Application No. 63/214,733 filed in the US on 06/24/2021.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 analysis:
Claims 1, 12 and 20 are directed to a system, a method, and an article of manufacture, respectively, and therefore all fall into one of the four statutory categories. (Step 1: Yes, the claims fall into one of the four statutory categories).
Step 2A analysis - Prong one:
The substantially similar independent system, method and computer readable media claims, taking claim 12 as exemplary, recite the following limitations: A method for training a model for real-time patient diagnosis, comprising: receiving, by a processor, audio data and video data of a clinical encounter, the audio data comprising spoken words by an entity and the video data depicting the entity; retrieving, by the processor, clinical data regarding the entity; executing, by the processor, a model using the words of the audio data and the retrieved clinical data regarding the entity as input to output a plurality of clinical diagnoses for the entity; concurrently rendering, by the processor, the corresponding video data and audio data and the plurality of clinical diagnoses via a computing device associated with a user; and storing, by the processor, an indication of a selected clinical diagnosis from the plurality of clinical diagnoses responsive to receiving a selection of the clinical diagnosis at the computing device.
The examiner is interpreting the above bolded limitations as additional elements, as further discussed below. The remaining un-bolded limitations above, as drafted, describe a process that, under the broadest reasonable interpretation, covers certain methods of organizing human activity (i.e., managing personal behavior including following rules or instructions) but for the recitation of generic computer components. That is, other than reciting a method implemented by a processor (computer), the claimed invention amounts to managing personal behavior or interactions between people. For example, but for the additional elements identified/bolded above, this claim encompasses a physician meeting with a patient in a clinic setting, retrieving data about that patient based on the encounter, determining multiple diagnoses for the patient, and storing one diagnosis selected by the physician in the manner described in the identified abstract idea, supra. The Examiner notes that certain “method[s] of organizing human activity” include a person’s interaction with a computer (see MPEP 2106.04(a)(2)(II)). If a claim limitation, under its broadest reasonable interpretation, covers managing personal behavior or interactions between people but for the recitation of generic computer components, then it falls within the “certain methods of organizing human activity” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. (Step 2A – Prong 1: Yes, the claims are abstract).
Step 2A analysis - Prong two:
Claims 1, 12 and 20 recite additional elements beyond the abstract idea. Claim 1 recites a model, a computer, a processor, a memory, a network interface, audio data, video data and a computing device associated with a user. Claim 12 recites a processor, audio data, video data, a model, and a computing device associated with a user. Claim 20 recites a non-transitory computer readable medium, instructions, a processor, a computer, audio data, video data, a model, and a computing device associated with a user. The Examiner notes that the recited instructions appear to be purely software.
This judicial exception is not integrated into a practical application. In particular, the claims recite a model, a computer, a processor, a memory, a network interface, a computing device associated with a user, a non-transitory computer readable medium, and instructions, which are recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using a generic computer component. For example, Applicant’s specification explains that the computing device reads and executes software applications, receives and stores data, displays data, etc. (see Applicant’s specification paras 92-93, 96-97).
Further, the additional elements of (1) audio data and (2) video data are each being interpreted as insignificant extra-solution activity. These limitations are recited at a high level of generality such that they amount to no more than mere data gathering, which is a form of extra-solution activity; they therefore do not impose any meaningful limits on the claimed invention. MPEP 2106.04(d)(I) indicates that extra-solution data gathering activity cannot provide a practical application. Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application.
Accordingly, these additional elements, when considered individually and as an ordered combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, Claims 1, 12 and 20 are directed to an abstract idea without a practical application. (Step 2A – Prong 2: No, the additional claimed elements do not integrate the judicial exception into a practical application).
Step 2B analysis:
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using a model, a computer, a processor, a memory, a network interface, a computing device associated with a user, a non-transitory computer readable medium, and instructions to perform the noted steps amount to no more than mere instructions to apply the exception using a generic computer component. Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. The collective functions appear to be implemented using conventional computer systemization. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (“significantly more”).
Also, as discussed above with respect to integration of the abstract idea into a practical application, the additional elements of (1) audio data and (2) video data were considered extra-solution activity. These elements have been re-evaluated under the “significantly more” analysis and determined to be well-understood, routine, conventional activity in the field.
The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: i) receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network); but see DDR Holdings, LLC v. Hotels.com, L.P., 773 F.3d 1245, 1258, 113 USPQ2d 1097, 1106 (Fed. Cir. 2014) ("Unlike the claims in Ultramercial, the claims at issue here specify how interactions with the Internet are manipulated to yield a desired result‐‐a result that overrides the routine and conventional sequence of events ordinarily triggered by the click of a hyperlink." (emphasis added)); iv) storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93. See MPEP §2106.05(d)(II).
This listing is not meant to imply that all computer functions are well‐understood, routine, conventional activities, or that a claim reciting a generic computer component performing a generic computer function is necessarily ineligible. Courts have held computer‐implemented processes not to be significantly more than an abstract idea (and thus ineligible) where the claim as a whole amounts to nothing more than generic computer functions merely used to implement an abstract idea, such as an idea that could be done by a human analog (i.e., by hand or by merely thinking). On the other hand, courts have held computer-implemented processes to be significantly more than an abstract idea (and thus eligible), where generic computer components are able in combination to perform functions that are not merely generic. See MPEP §2106.05(d)(II) – emphasis added.
Here, the steps are receiving or transmitting data over a network; and storing and retrieving information in memory – all of which have been recognized by the courts as well-understood, routine and conventional functions. See MPEP 2106.05(d)(II).
The claims are directed to an abstract idea with additional generic computer elements that do not add meaningful limitations to the abstract idea because they require no more than a generic computer to perform generic computer functions that are well-understood, routine, and conventional activities previously known in the industry.
For the next step of the analysis, it must be determined whether the limitations present in the claims represent a patent-eligible application of the abstract idea. A claim directed to a judicial exception must be analyzed to determine whether the elements of the claim, considered both individually and as an ordered combination, are sufficient to ensure that the claim as a whole amounts to significantly more than the exception itself.
For the role of a computer in a computer implemented invention to be deemed meaningful in the context of this analysis, it must involve more than performance of well-understood, routine, and conventional activities previously known to the industry. Further, the mere recitation of a generic computer cannot transform a patent ineligible abstract idea into a patent-eligible invention. See MPEP 2106.05(d).
Applicant’s specification discloses the following:
Applicant describes embodiments of the disclosure at a very high level to include the use of a wide variety of processors/computing devices, networks, servers, storage mediums, machine learning models, etc. (see Applicant’s spec paras 88, 91-92, 99-101, 155). For example, Applicant’s specification states that “The analytics server 410a may be any computing device comprising a processor and non-transitory machine-readable storage capable of executing the various tasks and processes described herein.” (see para 92).
Generic computer components recited as performing generic computer functions that are well-understood, routine and conventional activities amount to no more than implementing the abstract idea with a computerized system.
In summary, these additional elements, when considered individually and as an ordered combination, do not amount to significantly more than the abstract idea because 1) mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (“significantly more”) and 2) well-understood, routine, conventional activity cannot provide an inventive concept (“significantly more”). Accordingly, the claims do not provide an inventive concept significantly more than the abstract idea. (Step 2B: No, the claims do not provide significantly more).
Dependent Claims 2-11 and 13-19 further define the abstract idea that is presented in independent Claims 1 and 12, respectively, and are further grouped as certain methods of organizing human activity and are abstract for the same reasons and basis as presented above. Further, Claims 2-8, 11 and 13-19 recite additional elements beyond the abstract idea. Claims 2 and 13 recite training the model. Claims 3, 6-7, 11, 14 and 17-18 recite a display of the computing device. Claims 4 and 15 recite executing a translation service and a plurality of translation services (Examiner interprets these services to be purely software. See also Applicant’s specification paras 109-110). Claims 8 and 19 recite a plurality of models. These additional elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. For example, as noted above, Applicant’s specification indicates the use of known models and computing devices.
Further, Claims 5 and 16 recite transmitting a file. These transmitting steps are recited at a high level of generality (i.e., as a general means of transmitting data) and amount to the mere transmission of data, which is a form of extra-solution activity and thus cannot provide a practical application or significantly more. See MPEP 2106.04(d)(I).
Accordingly, these additional elements, when considered both individually and as an ordered combination, do not integrate the judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they provide significantly more. Therefore, the dependent claims are also directed to an abstract idea.
Thus, Claims 1-20 are rejected under 35 U.S.C. 101 as being directed to abstract ideas without significantly more.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 5, 7-9, 11-13, 16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shriberg et al. (US 20210110895) in view of Gray (US 6149585).
Regarding Claim 1, Shriberg discloses the following limitations:
A system for training a model for real-time patient diagnosis, comprising: a computer comprising a processor, memory, and a network interface, the processor configured to: receive audio data and video data of a clinical encounter, the audio data comprising spoken words by an entity and the video data depicting the entity; (Shriberg discloses a system comprising one or more computer processors and memory (a computer comprising a processor, memory) and the use of a network infrastructure (a network interface). Speech data and video data of the subject (video data depicting the entity) are received (receive audio data and video data of a clinical encounter) and processing the data using a model to generate one or more assessments of the mental state of the subject. Analysis of spoken language from patient (the audio data comprising spoken words by an entity) responses to assessment questions or captured conversations. The composite model may analyze, in real time, the audiovisual signal of the patient to estimate the patient's health (real-time patient diagnosis). – paras 5-6, 17, 31, 69, 150, 152, 159)
retrieve clinical data regarding the entity; (Shriberg discloses that the metadata model can be configured to use demographic information and/or a medical history of the subject to generate the one or more assessments of the mental state associated with the subject. – para 6)
execute a model using the words of the audio data and the retrieved clinical data regarding the entity as input, (Shriberg discloses that the metadata model can be configured to use demographic information and/or a medical history of the subject to generate the one or more assessments of the mental state associated with the subject (using the retrieved clinical data regarding the entity as input). Further, the speech data is processed using one or more models (using the words of the audio data regarding the entity as input). – paras 6, 26)
the execution causing the model to output a plurality of clinical diagnoses for the entity; (Shriberg discloses generating one or more assessments of the mental state associated with the subject (for the entity). The mental state can comprise one or more medical, psychological, or psychiatric conditions or symptoms (output a plurality of clinical diagnoses). This may be done by providing assessment data as inputs to machine learning algorithms (the execution causing the model to output). – abstract; paras 9, 26, 154)
concurrently render the corresponding video data and audio data and the plurality of clinical diagnoses via a computing device associated with a user; (Shriberg discloses that the assessment/report (the plurality of clinical diagnoses) can be configured to be displayed on a graphical user interface of an electronic device of the user (render…via a computing device associated with a user). The system may provide the clinician with the dialogue between itself and the patient. This dialogue may be a recording of the screening or monitoring process (concurrently render the corresponding video data and audio data). – paras 69-70, 168, 170)
Shriberg does not disclose the following limitations met by Gray:
and store an indication of a selected clinical diagnosis from the plurality of clinical diagnoses responsive to receiving a selection of the clinical diagnosis at the computing device. (Gray teaches a plurality of possible diagnoses are presented for selection by a user (an indication of a selected clinical diagnosis from the plurality of clinical diagnoses). The user is then provided with a recommended diagnostic task based on the selected possible diagnosis. The system is a repository (store) of the results of the diagnostic task or it can continue to process the patient data for other suspected diagnoses. Examiner notes that the stored diagnostic task is based on the selected diagnosis by the user. – abstract; col 3, lines 53-59)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified generating and displaying mental health assessments as disclosed by Shriberg to incorporate receiving a selection of a diagnosis by a user as taught by Gray in order to manage and improve patient diagnoses (see Gray col 1, lines 5-8).
Regarding Claim 2, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The system of claim 1, wherein the processor is further configured to: label a feature vector comprising the words of the audio data and the retrieved clinical data with the indication of the selected clinical diagnosis; (Shriberg discloses that high-level feature representations (a feature vector) may include, for example, convolutional neural networks (CNNs), autoencoders, variational autoencoders, deep neural networks, and support vector machines of the acoustic model. The model can be trained on speech data from a plurality of other test subjects who have a clinical determination of the mental condition (the selected clinical diagnosis). The clinical determinations may serve as labels (label…with the indication of the selected clinical diagnosis) for the speech data (label a feature vector comprising the words of the audio data). The user data labeled for training may further include clinical records (label…the retrieved clinical data). – paras 77, 287, 302, 370-371)
and train the model with the labeled feature vector. (Shriberg discloses that labeling data (confirmed or imputed diagnoses of depression) (with the labeled feature vector) are fed to a series of trainers that train individual models, and subsequently fuse them into a combined model (train the model). – paras 77, 165, 287, 302, 370-371)
Regarding Claim 5, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The system of claim 1, wherein the processor is further configured to: select a clinical treatment plan based on the selected clinical diagnosis; (Gray teaches recommending a diagnostic task (select a clinical treatment plan) based on the selected possible diagnosis (based on the selected clinical diagnosis). – abstract)
and transmit a file comprising the selected clinical treatment plan to the computing device. (Gray teaches that the recommended diagnostic task is "posted" into the system. – col 4, lines 10-13)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified generating and displaying mental health assessments as disclosed by Shriberg to incorporate recommending diagnostic tasks based on a selection of a diagnosis by a user as taught by Gray in order to manage and improve patient diagnoses (see Gray col 1, lines 5-8).
Regarding Claim 7, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The system of claim 1, wherein the processor executing the model causes the model to output a confidence score for each of the plurality of clinical diagnoses, (Shriberg discloses that the method can further comprise using at least the output to generate a score (output a confidence score) where the score can comprise an estimate that the subject has the mental health disorder (for each of the plurality of clinical diagnoses). – paras 14, 23, 32)
and wherein the processor is configured to concurrently render the plurality of clinical diagnoses by rendering the confidence score for each of the plurality of clinical diagnoses on a display of the computing device. (Shriberg discloses that the electronic report (the plurality of clinical diagnoses) may include the score (the confidence score) and can be configured to be displayed on a graphical user interface of a user's electronic device (a display of the computing device). – para 170)
Regarding Claim 8, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The system of claim 1, wherein the processor is further configured to: identify one or more characteristics of the patient from the clinical data, the video data, or the audio data; (Shriberg discloses identifying a person's age, gender, ethnicity, educational background, accent/region they grew up in, etc. (identify one or more characteristics of the patient) from the demographic, clinical and social metadata (the clinical data). – paras 305-306, 312)
and select the model from a plurality of models based on the one or more characteristics. (Shriberg discloses selecting the most appropriate/tailored model (select the model from a plurality of models) for each person based on the demographic, clinical and social data about the person (based on the one or more characteristics). For example, a Caucasian person may require different video models compared to an individual of African descent. Likewise, men and women often have divergent acoustic characteristics that necessitate the leveraging of different acoustic models to accurately classify them. – paras 305-306, 310-313, 376-377; FIG. 20B)
Regarding Claim 9, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The system of claim 1, wherein the processor is configured to receive the audio data and video data of the clinical encounter by receiving the audio data and video data in real-time during the clinical encounter, (Shriberg discloses that the speech data can be received substantially in real-time as the subject is speaking and that the audiovisual signal of the patient is received and analyzed in real time (receiving the audio data and video data in real-time during the clinical encounter). – paras 64, 162, 315)
and wherein the processor is configured to concurrently render the corresponding video data and audio data and the plurality of clinical diagnoses via the computing device associated with the user by concurrently rendering the corresponding video data and audio data and the plurality of clinical diagnoses in real time during the clinical encounter. (Shriberg discloses transmitting the assessment (rendering the corresponding video data and audio data and the plurality of clinical diagnoses) to a healthcare provider to be used in evaluating the mental state of the subject, where the transmitting can be performed in real-time during the assessment (in real time during the clinical encounter). – para 12)
Regarding Claim 11, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The system of claim 1, wherein the processor is further configured to: extract a term from the audio data comprising spoken words of the entity; (Shriberg discloses the method can further comprise extracting significant elements (extract a term) from the speech data. For example, from the patient’s response, “trouble sleeping” may be identified as a significant element. – paras 44, 250; FIG. 17)
Select a decision tree comprising a set of questions based on the extracted term; (Shriberg discloses identifying significant elements (the extracted term) in the patient's speech. In particular, assessment test administrator 2202 uses language portions of composite model 2204 to identify distinct assertions in the portion of the audiovisual signal received after the last question asked and identifies related follow-up questions (Select a set of questions based on the extracted term). – paras 249-251) (Examiner interprets, using broadest reasonable interpretation, the decision tree comprising a set of questions to be a list or group of questions that are to be asked in a series or order. E.g., see Applicant’s specification paras 136-138)
and sequentially render the set of questions on a display of the computing device during the clinical encounter based on second audio data comprising one or more answers to the set of questions, the answers spoken words by the entity. (Shriberg discloses collecting a series of speech data (the answers spoken words by the entity) and follow-up questions to conduct a conversation with the patient (based on second audio data comprising one or more answers to the set of questions). For example, in conversation 1700 (FIG. 17), assessment test administrator 2202 identifies three (3) topics for follow-up questions (sequentially render the set of questions) for the element of insomnia. The assessment test administrator may dequeue and enqueue any identified follow-up questions. The real-time system 302 (FIG. 22) assesses the current mental state of the patient using an interactive spoken conversation with the patient through the patient device (a display of the computing device during the clinical encounter). – paras 44, 249-256, 273; FIG. 17)
Regarding Claim 12, Shriberg discloses the following limitations:
A method for training a model for real-time patient diagnosis, comprising: receiving, by a processor, audio data and video data of a clinical encounter, the audio data comprising spoken words by an entity and the video data depicting the entity; (Shriberg discloses a system comprising one or more computer processors (a processor) and memory and the use of a network infrastructure. Speech data and video data of the subject (video data depicting the entity) are received (receive audio data and video data of a clinical encounter) and processing the data using a model to generate one or more assessments of the mental state of the subject. Analysis of spoken language from patient (the audio data comprising spoken words by an entity) responses to assessment questions or captured conversations. The composite model may analyze, in real time, the audiovisual signal of the patient to estimate the patient's health (real-time patient diagnosis). – paras 5-6, 17, 31, 69, 150, 152, 159)
retrieving, by the processor, clinical data regarding the entity; (Shriberg discloses that the metadata model can be configured to use demographic information and/or a medical history of the subject to generate the one or more assessments of the mental state associated with the subject. – para 6)
executing, by the processor, a model using the words of the audio data and the retrieved clinical data regarding the entity as input (Shriberg discloses that the metadata model can be configured to use demographic information and/or a medical history of the subject to generate the one or more assessments of the mental state associated with the subject (using the retrieved clinical data regarding the entity as input). Further, the speech data is processed using one or more models (using the words of the audio data regarding the entity as input). – paras 6, 26)
to output a plurality of clinical diagnoses for the entity; (Shriberg discloses generating one or more assessments of the mental state associated with the subject (for the entity). The mental state can comprise one or more medical, psychological, or psychiatric conditions or symptoms (output a plurality of clinical diagnoses). This may be done by providing assessment data as inputs to machine learning algorithms. – abstract; paras 9, 26, 154)
concurrently rendering, by the processor, the corresponding video data and audio data and the plurality of clinical diagnoses via a computing device associated with a user; (Shriberg discloses that the assessment/report (the plurality of clinical diagnoses) can be configured to be displayed on a graphical user interface of an electronic device of the user (render…via a computing device associated with a user). The system may provide the clinician with the dialogue between itself and the patient. This dialogue may be a recording of the screening or monitoring process (concurrently render the corresponding video data and audio data).– paras 69-70, 168, 170)
Shriberg does not disclose the following limitations met by Gray:
and storing, by the processor, an indication of a selected clinical diagnosis from the plurality of clinical diagnoses responsive to receiving a selection of the clinical diagnosis at the computing device. (Gray teaches that a plurality of possible diagnoses are presented for selection by a user (an indication of a selected clinical diagnosis from the plurality of clinical diagnoses). The user is then provided with a recommended diagnostic task based on the selected possible diagnosis. The system serves as a repository (store) for the results of the diagnostic task, or it can continue to process the patient data for other suspected diagnoses. Examiner notes that the stored diagnostic task is based on the diagnosis selected by the user. – abstract; col 3, lines 53-59)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified generating and displaying mental health assessments as disclosed by Shriberg to incorporate receiving a selection of a diagnosis by a user as taught by Gray in order to manage and improve patient diagnoses (see Gray col 1, lines 5-8).
Regarding Claim 13, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The method of claim 12, further comprising: selecting, by the processor, a clinical treatment plan based on the selected clinical diagnosis; (Gray teaches recommending a diagnostic task (select a clinical treatment plan) based on the selected possible diagnosis (based on the selected clinical diagnosis). – abstract)
and transmitting, by the processor, a file comprising the selected clinical treatment plan to the computing device. (Gray teaches that the recommended diagnostic task is "posted" into the system. – col 4, lines 10-13)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified generating and displaying mental health assessments as disclosed by Shriberg to incorporate recommending diagnostic tasks based on a selection of a diagnosis by a user as taught by Gray in order to manage and improve patient diagnoses (see Gray col 1, lines 5-8).
Regarding Claim 16, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The method of claim 12, further comprising: selecting, by the processor, a clinical treatment plan based on the selected clinical diagnosis; (Gray teaches recommending a diagnostic task (select a clinical treatment plan) based on the selected possible diagnosis (based on the selected clinical diagnosis). – abstract)
and transmitting, by the processor, a file comprising the selected clinical treatment plan to the computing device. (Gray teaches that the recommended diagnostic task is "posted" into the system. – col 4, lines 10-13)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified generating and displaying mental health assessments as disclosed by Shriberg to incorporate recommending diagnostic tasks based on a selection of a diagnosis by a user as taught by Gray in order to manage and improve patient diagnoses (see Gray col 1, lines 5-8).
Regarding Claim 18, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The method of claim 12, wherein executing the model causes the model to output a confidence score for each of the plurality of clinical diagnoses, (Shriberg discloses that the method can further comprise using at least the output to generate a score (output a confidence score) where the score can comprise an estimate that the subject has the mental health disorder (for each of the plurality of clinical diagnoses). – paras 14, 23, 32)
and wherein concurrently rendering the plurality of clinical diagnoses comprises rendering, by the processor, the confidence score for each of the plurality of clinical diagnoses on a display of the computing device. (Shriberg discloses that the electronic report (the plurality of clinical diagnoses) may include the score (the confidence score) and can be configured to be displayed on a graphical user interface of a user's electronic device (a display of the computing device). – para 170)
Regarding Claim 19, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The method of claim 12, further comprising: identifying, by the processor, one or more characteristics of the patient from the clinical data, the video data, or the audio data; (Shriberg discloses identifying a person's age, gender, ethnicity, educational background, accent/region they grew up in, etc. (identify one or more characteristics of the patient) from the demographic, clinical and social metadata (the clinical data). – paras 305-306, 312)
and selecting, by the processor, the model from a plurality of models based on the one or more characteristics. (Shriberg discloses selecting the most appropriate/tailored model (select the model from a plurality of models) for each person based on the demographic, clinical and social data about the person (based on the one or more characteristics). For example, a Caucasian person may require different video models compared to an individual of African descent. Likewise, men and women often have divergent acoustic characteristics that necessitate the leveraging of different acoustic models to accurately classify them. – paras 305-306, 310-313, 376-377; FIG. 20B)
Regarding Claim 20, Shriberg discloses the following limitations:
A non-transitory computer readable medium including encoded instructions that, when executed by a processor of a computer, cause the computer to: (Shriberg discloses a non-transitory computer readable-medium comprising machine-executable instructions that, upon execution by one or more computer processors, implements any of the foregoing methods. – para 30)
receive audio data and video data of a clinical encounter, the audio data comprising spoken words by an entity and the video data depicting the entity; (Shriberg discloses a system comprising one or more computer processors (a processor of a computer) and memory and the use of a network infrastructure. Speech data and video data of the subject (video data depicting the entity) are received (receive audio data and video data of a clinical encounter) and processed using a model to generate one or more assessments of the mental state of the subject. Shriberg further discloses analysis of spoken language from patient responses (the audio data comprising spoken words by an entity) to assessment questions or captured conversations. The composite model may analyze, in real time, the audiovisual signal of the patient to estimate the patient's health. – paras 5-6, 17, 31, 69, 150, 152, 159)
retrieve clinical data regarding the entity; (Shriberg discloses that the metadata model can be configured to use demographic information and/or a medical history of the subject to generate the one or more assessments of the mental state associated with the subject.– para 6)
execute a model using the words of the audio data and the retrieved clinical data regarding the entity as input to output a plurality of clinical diagnoses for the entity; (Shriberg discloses that the metadata model can be configured to use demographic information and/or a medical history of the subject to generate the one or more assessments of the mental state associated with the subject (execute a model using the retrieved clinical data regarding the entity as input). Further, the speech data is processed using one or more models (execute a model using the words of the audio data as input). One or more assessments of the mental state associated with the subject (for the entity) are generated. The mental state can comprise one or more medical, psychological, or psychiatric conditions or symptoms (output a plurality of clinical diagnoses). This may be done by providing assessment data as inputs to machine learning algorithms (execute a model). – abstract; paras 6, 26, 154)
concurrently render the corresponding video data and audio data and the plurality of clinical diagnoses via a computing device associated with a user; (Shriberg discloses that the assessment/report (the plurality of clinical diagnoses) can be configured to be displayed on a graphical user interface of an electronic device of the user (render…via a computing device associated with a user). The system may provide the clinician with the dialogue between itself and the patient. This dialogue may be a recording of the screening or monitoring process (concurrently render the corresponding video data and audio data).– paras 69-70, 168, 170)
Shriberg does not disclose the following limitations met by Gray:
and store an indication of a selected clinical diagnosis from the plurality of clinical diagnoses responsive to receiving a selection of the clinical diagnosis at the computing device. (Gray teaches that a plurality of possible diagnoses are presented for selection by a user (an indication of a selected clinical diagnosis from the plurality of clinical diagnoses). The user is then provided with a recommended diagnostic task based on the selected possible diagnosis. The system serves as a repository (store) for the results of the diagnostic task, or it can continue to process the patient data for other suspected diagnoses. Examiner notes that the stored diagnostic task is based on the diagnosis selected by the user. – abstract; col 3, lines 53-59)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified generating and displaying mental health assessments as disclosed by Shriberg to incorporate receiving a selection of a diagnosis by a user as taught by Gray in order to manage and improve patient diagnoses (see Gray col 1, lines 5-8).
Claims 3, 10 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Shriberg and Gray, and further in view of Crosley et al. (US 20160350287).
Regarding Claim 3, Shriberg, Gray and Crosley disclose all the limitations above and further disclose the following limitations:
The system of claim 1, wherein the processor is further configured to: transcribe the words from the audio data into a text file; (Shriberg discloses language-specific speech recognition 3702 (FIG. 37) which produces text (into a text file) in the language spoken by the patient, i.e., the patient's language, from the audio signal received from the patient (transcribe the words from the audio). – para 401)
and convert the words of the audio data from the text file into a second language from a first language, (Shriberg discloses that speech recognition is specific to the particular language of the speech. Language-specific speech recognition 3702 (FIG. 37) produces text (from the text file) in the language spoken by the patient, i.e., the patient's language, from the audio signal received from the patient (convert the words of the audio data). To enable application of language models 2214, which cannot process text in the patient's language in this illustrative example, translation engine 3704 translates the text (convert the words of the audio data from the text file) from the patient's language (from a first language) to a language (into a second language) that may be processed by language models 2214, e.g., English. – paras 400-401)
wherein concurrently rendering the video data and the audio data comprises rendering the words in the second language as text on a display of the computing device. (Crosley teaches various embodiments for translation of speech in a video messaging application. A segment of streaming video is decoded to separate the visual component from the audio component. The audio component is then converted to text, which may then be translated and converted to a translation output comprising a new language. The translation output 137 may comprise text in a second language (rendering the words in the second language as text on a display of the computing device). The encoder 146 may combine the video residing in the video holding buffer 134 with the translated text data as subtitles by synchronizing the text translation output 137 with the previously separated visual component of the video data (wherein concurrently rendering the video data and the audio data). – paras 22-24, 29; FIG. 3, item 137a)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the translation engine as disclosed by Shriberg to incorporate displaying the translated audio onto the video stream as taught by Crosley in order to avoid any language barriers (see Crosley para 2).
Regarding Claim 10, Shriberg, Gray and Crosley disclose all the limitations above and further disclose the following limitations:
The system of claim 9, wherein the video data further depicts the user. (Crosley teaches rendering audiovisual signals on user interfaces. Figure 3, for example, shows two people communicating via video conference. – abstract; para 31; FIG. 3)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have further modified processing and generating assessments in real time as disclosed by Shriberg to incorporate displaying video of all participants along with the translated text as taught by Crosley in order to avoid any language barriers (see Crosley para 2).
Regarding Claim 14, Shriberg, Gray and Crosley disclose all the limitations above and further disclose the following limitations:
The method of claim 12, further comprising: transcribing, by the processor, the words from the audio data into a text file; (Shriberg discloses language-specific speech recognition 3702 (FIG. 37) which produces text (into a text file) in the language spoken by the patient, i.e., the patient's language, from the audio signal received from the patient (transcribe the words from the audio). – para 401)
and converting, by the processor, the words of the audio data from the text file into a second language from a first language; (Shriberg discloses that speech recognition is specific to the particular language of the speech. Language-specific speech recognition 3702 (FIG. 37) produces text (from the text file) in the language spoken by the patient, i.e., the patient's language, from the audio signal received from the patient (convert the words of the audio data). To enable application of language models 2214, which cannot process text in the patient's language in this illustrative example, translation engine 3704 translates the text (convert the words of the audio data from the text file) from the patient's language (from a first language) to a language (into a second language) that may be processed by language models 2214, e.g., English. – paras 400-401)
wherein concurrently rendering the video data and the audio data via the computing device comprises rendering, by the processor, the words in the second language as text on a display of the computing device. (Crosley teaches various embodiments for translation of speech in a video messaging application. A segment of streaming video is decoded to separate the visual component from the audio component. The audio component is then converted to text, which may then be translated and converted to a translation output comprising a new language. The translation output 137 may comprise text in a second language (rendering the words in the second language as text on a display of the computing device). The encoder 146 may combine the video residing in the video holding buffer 134 with the translated text data as subtitles by synchronizing the text translation output 137 with the previously separated visual component of the video data (wherein concurrently rendering the video data and the audio data). – paras 22-24, 29; FIG. 3, item 137a)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the translation engine as disclosed by Shriberg to incorporate displaying the translated audio onto the video stream as taught by Crosley in order to avoid any language barriers (see Crosley para 2).
Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Shriberg, Gray and Crosley, further in view of Huang et al. (US 20170124071).
Regarding Claim 4, Shriberg, Gray and Crosley disclose all the limitations above; however, they do not disclose the following limitations met by Huang:
The system of claim 3, wherein converting the words of the audio data from the text file into the second language comprises converting the words of the audio data into the second language by executing a first translation service, (Huang teaches generating a plurality of candidate translations for the text (from the text file into the second language) and selecting a predetermined number of candidate translations (executing a first translation service) with highest translation quality scores as translations of the text to be translated. – abstract; paras 6, 9)
the first translation service selected by the processor from a plurality of translation services by: inserting a first text file into each of the plurality of translation services, obtaining a plurality of translated text files each individually associated with a different translation service of the plurality of translation services; (Huang teaches obtaining at least one text (a first text file) to be translated and generating a plurality of candidate translations for the text (obtaining a plurality of translated text files each individually associated with a different translation service) and selecting a predetermined number of candidate translations with highest translation quality scores as translations of the text to be translated. – abstract; paras 6, 9)
receiving one or more indications of errors for each of the plurality of translated text files; (Huang teaches extracting features from the candidate translations and calculating a translation quality score (one or more indications of errors) for the plurality of candidate translations (for each of the plurality of translated text files). – abstract; paras 6, 9)
calculating an error rate for each of the plurality of translation services based on the one or more indications of errors; (Huang teaches applying a translation quality prediction model to calculate translation quality scores (calculating an error rate) for the plurality of candidate translations (for each of the plurality of translation services). – abstract; paras 6, 9)
and selecting the first translation service responsive to a lowest calculated error rate having an association with the first translation service. (Huang teaches selecting a predetermined number of candidate translations (selecting the first translation service) with highest translation quality scores (a lowest calculated error rate). – para 9)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the translation engine as disclosed by Shriberg to incorporate selecting the highest quality translation service as taught by Huang in order to improve the linguistic accuracy of the translation result (see Huang para 6).
Regarding Claim 15, Shriberg, Gray, Crosley and Huang disclose all the limitations above and further disclose the following limitations:
The method of claim 14, wherein converting the words of the audio data from the text file into the second language comprises converting, by the processor, the words of the audio data into the second language by executing, by the processor, a first translation service, (Huang teaches generating a plurality of candidate translations for the text (from the text file into the second language) and selecting a predetermined number of candidate translations (executing a first translation service) with highest translation quality scores as translations of the text to be translated. – abstract; paras 6, 9)
the first translation service selected by the processor from a plurality of translation services by: inserting, by the processor, a first text file into each of the plurality of translation services, obtaining a plurality of translated text files each individually associated with a different translation service of the plurality of translation services; (Huang teaches obtaining at least one text (a first text file) to be translated and generating a plurality of candidate translations for the text (obtaining a plurality of translated text files each individually associated with a different translation service) and selecting a predetermined number of candidate translations with highest translation quality scores as translations of the text to be translated. – abstract; paras 6, 9)
receiving, by the processor, one or more indications of errors for each of the plurality of translated text files; (Huang teaches extracting features from the candidate translations and calculating a translation quality score (one or more indications of errors) for the plurality of candidate translations (for each of the plurality of translated text files). – abstract; paras 6, 9)
calculating, by the processor, an error rate for each of the plurality of translation services based on the one or more indications of errors; (Huang teaches applying a translation quality prediction model to calculate translation quality scores (calculating an error rate) for the plurality of candidate translations (for each of the plurality of translation services). – abstract; paras 6, 9)
and selecting, by the processor, the first translation service responsive to a lowest calculated error rate having an association with the first translation service. (Huang teaches selecting a predetermined number of candidate translations (selecting the first translation service) with highest translation quality scores (a lowest calculated error rate). – para 9)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the translation engine as disclosed by Shriberg to incorporate selecting the highest quality translation service as taught by Huang in order to improve the linguistic accuracy of the translation result (see Huang para 6).
Claims 6 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Shriberg and Gray, further in view of Feder et al. (US 20080171916).
Regarding Claim 6, Shriberg and Gray disclose all the limitations above and further disclose the following limitations:
The system of claim 1, wherein the processor executing the model causes the model to output a confidence score for each of the plurality of clinical diagnoses, (Shriberg discloses that the method can further comprise using at least the output to generate a score (output a confidence score) where the score can comprise an estimate that the subject has the mental health disorder (for each of the plurality of clinical diagnoses). – paras 14, 23, 32)
Shriberg and Gray do not disclose the following limitations met by Feder:
the processor further configured to: generate a sequential order of the plurality of clinical diagnoses based on the confidence score for each of the plurality of clinical diagnoses, (Feder teaches a novel computer algorithm that diagnoses diseases in actual patients by creating a differential diagnosis list and calculating a total score, or probability (P), of each diagnosis (confidence score for each of the plurality of clinical diagnoses). Then, all the diagnoses in the differential diagnosis list are sorted according to decreasing total probability (P) values (generate a sequential order of the plurality of clinical diagnoses based on the confidence score for each of the plurality of clinical diagnoses). – abstract; paras 5, 25, 65, 118, 216; Table 17 on page 19)
wherein concurrently rendering the plurality of clinical diagnoses comprises rendering text identifying the plurality of clinical diagnoses in the sequential order on a display of the computing device. (Feder teaches creating a sorted list of all diagnoses in a differential diagnosis list according to their respective calculated probabilities. The algorithm and corresponding computer program emulate the diagnostic reasoning of a clinician and display the differential diagnosis list according to decreasing total probability (P) values (comprises rendering text identifying the plurality of clinical diagnoses in the sequential order on a display of the computing device). – abstract; paras 216, 302, 310; Table 17 on page 19)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have further modified generating and displaying mental health assessments and incorporating the produced score into a summary report as disclosed by Shriberg to incorporate displaying a sorted differential diagnosis list as taught by Feder in order to enable a quicker and more economical achievement of final diagnoses (see Feder para 34).
Regarding Claim 17, Shriberg, Gray and Feder disclose all the limitations above and further disclose the following limitations:
The method of claim 12, wherein executing the model causes the model to output a confidence score for each of the plurality of clinical diagnoses, (Shriberg discloses that the method can further comprise using at least the output to generate a score (output a confidence score) where the score can comprise an estimate that the subject has the mental health disorder (for each of the plurality of clinical diagnoses). – paras 14, 23, 32)
and further comprising: generating, by the processor, a sequential order of the plurality of clinical diagnoses based on the confidence score for each of the plurality of clinical diagnoses, (Feder teaches a novel computer algorithm that diagnoses diseases in actual patients by creating a differential diagnosis list and calculating a total score, or probability (P), of each diagnosis (confidence score for each of the plurality of clinical diagnoses). Then, all the diagnoses in the differential diagnosis list are sorted according to decreasing total probability (P) values (generating a sequential order of the plurality of clinical diagnoses based on the confidence score for each of the plurality of clinical diagnoses). – abstract; paras 5, 25, 65, 118, 216; Table 17 on page 19)
wherein concurrently rendering the plurality of clinical diagnoses comprises rendering, by the processor, strings identifying the plurality of clinical diagnoses in the sequential order on a display of the computing device. (Feder teaches creating a sorted list of all diagnoses in a differential diagnosis list according to their respective calculated probabilities. The algorithm and corresponding computer program emulate the diagnostic reasoning of a clinician and display the differential diagnosis list according to decreasing total probability (P) values (comprises rendering, by the processor, strings identifying the plurality of clinical diagnoses in the sequential order on a display of the computing device). – abstract; paras 216, 302, 310; Table 17 on page 19)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have further modified generating and displaying mental health assessments and incorporating the produced score into a summary report as disclosed by Shriberg to incorporate displaying a sorted differential diagnosis list as taught by Feder in order to enable a quicker and more economical achievement of final diagnoses (see Feder para 34).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KIMBERLY VANDER WOUDE whose telephone number is (703)756-4684. The examiner can normally be reached M-F 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, PETER H CHOI can be reached at (469) 295-9171. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.E.V./Examiner, Art Unit 3681
/PETER H CHOI/Supervisory Patent Examiner, Art Unit 3681