DETAILED ACTION
This action is in reference to the communication filed on 19 MAR 2025.
Claims 1-20 are present and have been examined.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. As explained below, the claims are directed to an abstract idea without significantly more.
Step One: Is the claim directed to a process, machine, manufacture, or composition of matter? YES.
With respect to claims 1-20, independent claims 1, 8, and 15 recite a method, a non-transitory computer readable medium, and a system, respectively, each of which is a statutory category of invention.
Step 2A – Prong One: Is the claim directed to a law of nature, a natural phenomenon (product of nature) or an abstract idea? YES
With respect to claims 1-20, the independent claims (claims 1, 8, and 15) are directed, in part, to:
Claim 1: A method, comprising:
receiving a stream of audiovisual data by a machine learning (ML) model, wherein the audiovisual data includes multiple command-action pairs;
determining, by the ML model, one or more diagnostic impressions based on an analysis of the multiple command-action pairs within the stream of audiovisual data; and
generating a report based on the determined one or more diagnostic impressions.
Claim 15: A system for diagnosing autism spectrum disorder (ASD) in a patient, the system comprising: an audio capture device that captures audio cues spoken to the patient and audible patient responses during an assessment session; a video capture device that captures a video feed of the patient during the assessment session; and a report generation component that automatically generates a report for the patient based on an analysis of the captured audio cues and the captured video feed.
These claim elements are considered to be abstract ideas because they are directed to concepts performed in the human mind, including evaluation, observation, judgment, and opinion. Receiving or “capturing” audiovisual data (i.e., through the eyes and ears), determining an impression based on that data, and using any of this information to generate a report are all examples of such concepts. If a claim limitation, under its broadest reasonable interpretation, covers concepts performed in the human mind, then it falls within the “mental processes” grouping of abstract ideas. At least claims 1 and 8 are further directed to mathematical concepts, i.e., mathematical relationships, formulas, equations, or calculations, in that they recite the use of a learning model, which is an example of such a concept. If a claim limitation, under its broadest reasonable interpretation, covers mathematical relationships, formulas, equations, or calculations, then it falls within the “mathematical concepts” grouping of abstract ideas.
Accordingly, the claim recites an abstract idea.
Step 2A – Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application? NO.
This judicial exception is not integrated into a practical application. In particular, claims 1 and 8 at least imply the computer application of a learning model; claim 8 recites a non-transitory computer readable medium and a computing system; and claim 15 recites the use of an audio capture device, a video capture device, and a report generation component (again, with at least the implied application of a computing element). The computer application of the learning model is recited at a high level of generality and as such amounts to no more than adding the words “apply it” to the judicial exception, mere instructions to implement the abstract idea on a computer, mere use of the computer as a tool to perform the abstract idea (see MPEP 2106.05(f)), or a general linking of the judicial exception to a particular technological field of use/computing environment (see MPEP 2106.05(h)). Examiner finds that the computing elements in claims 8 and 15, as well as the “capture” devices in claim 15, are similarly analogous to adding the words “apply it,” using the computer as a tool, or generally linking the abstract idea to the technical field of computing. Examiner finds no improvement to the functioning of a computer or to any other technology or technical field in the above identified elements as claimed (i.e., the capture devices do not recite any improvements to the fields of audio or video capture) (see MPEP 2106.05(a)), nor any other application or use of the judicial exception in some meaningful way beyond a general link between the judicial exception and a particular technological environment (see MPEP 2106.05(e)). Examiner further notes that sending/receiving data in a computing system is generally found to be analogous to adding insignificant extra-solution activity to the identified judicial exception(s) (see MPEP 2106.05(g)).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? NO.
The independent claims are additionally directed to the following claim elements: claims 1 and 8 at least imply the computer application of a learning model; claim 8 recites a non-transitory computer readable medium and a computing system; and claim 15 recites the use of an audio capture device, a video capture device, and a report generation component (again, with at least the implied application of a computing element). When considered individually, the above identified claim elements contribute only generic recitations of technical elements to the claims. It is readily apparent, for example, that the claims are not directed to any specific improvements of these elements. Examiner looks to Applicant’s specification, including the following passages:
[028] … Although not required, aspects of the various components or systems are described in the general context of computer-executable instructions, such as routines executed by a general-purpose computer, e.g., mobile device, a server computer, or personal computer. The system can be practiced with other communications, data processing, or computer system configurations, including: Internet appliances, hand-held devices, wearable devices, or mobile devices (e.g., smart phones, tablets, laptops, smart watches), all manner of cellular or mobile phones, multi-processor systems, microprocessor-based or programmable consumer electronics, set-top boxes, network PCs, mini-computers, mainframe computers, AR/VR devices, gaming devices, and the like. Indeed, the terms “computer,” “host,” and “host computer,” and “mobile device” and “handset” are generally used interchangeably herein and refer to any of the above devices and systems, as well as any data processor.
[015] A camera 120 or other video capture component may capture a stream of video (e.g., one or more video clips or images) during the assessment. In some cases, the camera 120 may be a camera of a mobile device held by a parent or caregiver of the patient 115.
[017] … Further, a microphone 124 or other audio capture device may capture audible responses uttered or spoken by the patient 115 and/or the audio cues or voice commands.
These passages, as well as others, make it clear that the invention is not directed to a technical improvement. They describe the additional elements in terms of functional capabilities only, i.e., any computing device is appropriate, as is any video or audio capture device. When the claims are considered individually and as a whole, the additional elements noted above merely apply the abstract concept to a technical environment in a very general sense. The most significant elements of the claims, i.e., the elements that outline the invention, are those identified above as an abstract idea. The fact that generic computing devices facilitate the abstract concept is not enough to confer statutory subject matter eligibility.
As per dependent claims 2-7, 9-14, 16-20:
Dependent claims 3, 5, 7, 10, 12, 14, and 20 are not directed to any additional abstract ideas, but do at least nominally recite the use of a “computer vision module,” i.e., the use of a computer in analyzing the received data. Examiner notes this in the interest of compact prosecution and refers to the analysis of the computer application of the modeling set forth above with reference to claims 1 and 8. For similar reasons, Examiner concludes that these elements do not support a finding of a practical application, nor do they amount to significantly more than the abstract idea(s) identified above.
Dependent claims 2, 4, 6, 9, 11, 13, and 16-19 are not directed to any additional abstract ideas and are also not directed to any additional non-abstract claim elements. Rather, these claims offer further descriptive limitations of elements found in the independent claims and addressed above, such as command-action pair descriptions, extracted visual feature descriptions, and report generation parameters. While these descriptive elements may provide helpful context for the claimed invention, they do not confer subject matter eligibility because their individual and combined significance does not outweigh the abstract concepts at the core of the claimed invention.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 2, 6, 8, 9, 13, and 15-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sapiro et al. (US 11158403 B1, hereinafter Sapiro).
In reference to claims 1 and 8:
Sapiro teaches: A method, comprising:
receiving a stream of audiovisual data by a machine learning (ML) model, wherein the audiovisual data includes multiple command-action pairs (at least [fig 1 and related text including col 4] “Computing platform 100 may be any suitable entity (e.g., a mobile device or a server) configurable for performing automated behavioral assessments via monitoring (e.g., video and/or audio recording) users for responses to one or more stimuli and automatically analyzing or coding the responses for determining a behavioral assessment. For example, computer platform 100 may include a memory and a processor for executing a module (e.g., an app or other software) for automated behavioral assessment. In this example, computer platform 100 may also include a user interface for providing stimuli (e.g., video, audio and/or text) designed to illicit certain responses from a user (e.g., a child or toddler) and a camera for recording or obtaining responses to the provided stimuli.” “BAM 104 and/or another module may generate, determine, and/or utilize stimuli (e.g., text, instructions, video, audio, etc.) for eliciting specific responses from a user. For example, BAM 104 and/or another module may instruct a user or a participant (e.g., the user's parent) to perform a task, e.g., to roll a ball, to speak a user's name at a particular time during the behavior assessment, or to stack a set of rings, where the instructed activity is intended to elicit a particular response (e.g., an emotion, a particular facial expression, a particular eye movement, or other response) from the user.” See also col 5, 6, and BAM 104 is the model, which instructs specific prompts in the recording for a particular response – i.e. “command action pairs” as shown in table 1);
determining, by the ML model, one or more diagnostic impressions based on an analysis of the multiple command-action pairs within the stream of audiovisual data (at least [fig 1 and related text including col 5] “In some embodiments, BAM 104 may generate and/or utilize stimuli designed to exhibit various behaviors that are known or believed to be indicative of autism or other behavioral disorders. For example, the following symptoms have been shown to reliably detect risk for autism based on both retrospective and prospective (high risk infants) studies: overall sustained attention to complex stimuli, reduced range of affective expression (e.g., less smiling and/or more neutral responses), failure to orient to when a user's name is called, and lack of social referencing (e.g., turning head toward a parent to share interest in a surprising or engaging event). In this example, BAM 104 may generate and/or utilize stimuli such that a user's actions can be analyzed with regard to whether any of these symptoms were expressed.”); and
generating a report based on the determined one or more diagnostic impressions (at least [fig 5 and related text, including col 19] “In step 506, a behavioral assessment associated with the user may be determined using the at least one response. In some embodiments, a behavioral assessment may include a behavioral coding, a mental health screening or diagnosis, an autism diagnosis, an attention deficient hyperactivity disorder (ADHD), an anxiety disorder diagnosis, an aggressiveness disorder diagnosis, an indicator indicating a likelihood of a behavioral disorder, a score or weight associated with the at least one stimulus or the at least one response, a recommendation, a referral to a service provider, a mental health related report, or any combination thereof. For example, after analyzing one or more responses from a user, BAM 104 may generate and provide a behavioral assessment including information about potentially relevant behavioral disorders and/or suggestions for alleviating and/or improving any symptoms associated with potentially relevant behavioral disorders. In another example, a behavioral assessment may indicate the likelihood of a user being affected by one or more behavioral disorders.”)
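For purposes of illustration only, the claim 1 steps as characterized above reduce to the following generic pipeline. This is a minimal sketch in Python; every name in it is hypothetical and is drawn from neither the claims, the specification, nor Sapiro.

```python
# Hypothetical sketch of the claimed pipeline at the level of
# generality recited: receive data, evaluate it, report the result.
from dataclasses import dataclass
from typing import List


@dataclass
class CommandActionPair:
    command_text: str      # e.g., a spoken prompt such as "roll the ball"
    response_frames: List  # video frames of the subject's response


def determine_impressions(pairs: List[CommandActionPair]) -> List[str]:
    """Stand-in for the claimed ML model; a real model would score
    each response rather than merely flag missing ones."""
    impressions = []
    for pair in pairs:
        if not pair.response_frames:  # placeholder criterion
            impressions.append(f"no observable response to: {pair.command_text}")
    return impressions


def generate_report(impressions: List[str]) -> str:
    """Reduces the 'generating a report' step to string assembly."""
    return "Diagnostic impressions:\n" + "\n".join(f"- {i}" for i in impressions)
```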
In reference to claims 2 and 9:
Sapiro further teaches: wherein each command-action pair includes: a text-based transcript of an audio command spoken to a subject; and one or more images of the subject responding to the audio command (at least [col 7 lines 10-20] “In some embodiments, computing platform 100 and/or BAM 104 may be communicatively coupled to a user interface 110 and a camera and/or a sensor (camera/sensor) 112. User interface 110 may be any interface for providing information (e.g., output) to a user and/or for receiving information (e.g., input) from a user. In some embodiments, user interface 110 may include a graphical user interface (GUI) for providing a questionnaire and/or for receiving input from a user and/or a display screen for displaying various stimuli to a user.” See also table 1 and related text for discussion of stimuli, and see [col 8 lines 18-36] for discussion of question prompts and recordings).
In reference to claims 6 and 13:
Sapiro further teaches: wherein generating a report based on the determined one or more diagnostic impressions includes generating a report that includes:
information identifying quantitative measures utilized during an assessment of the subject; information identifying qualitative observations associated with the determined one or more diagnostic impressions; and information identifying one or more recommendations based on the identified qualitative observations (at least [fig 5 and related text, including col 19] “In step 506, a behavioral assessment associated with the user may be determined using the at least one response. In some embodiments, a behavioral assessment may include a behavioral coding, a mental health screening or diagnosis, an autism diagnosis, an attention deficient hyperactivity disorder (ADHD), an anxiety disorder diagnosis, an aggressiveness disorder diagnosis, an indicator indicating a likelihood of a behavioral disorder, a score or weight associated with the at least one stimulus or the at least one response, a recommendation, a referral to a service provider, a mental health related report, or any combination thereof. For example, after analyzing one or more responses from a user, BAM 104 may generate and provide a behavioral assessment including information about potentially relevant behavioral disorders and/or suggestions for alleviating and/or improving any symptoms associated with potentially relevant behavioral disorders. In another example, a behavioral assessment may indicate the likelihood of a user being affected by one or more behavioral disorders.”).
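For context, the three recited report components (quantitative measures, qualitative observations, and recommendations) could be represented by a data structure as simple as the following. This is a hypothetical sketch; the field names are assumptions, not claim language.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AssessmentReport:
    # Quantitative measures utilized during the assessment,
    # e.g., a count or score per stimulus.
    quantitative_measures: Dict[str, float] = field(default_factory=dict)
    # Qualitative observations tied to the diagnostic impressions.
    qualitative_observations: List[str] = field(default_factory=list)
    # Recommendations derived from the qualitative observations.
    recommendations: List[str] = field(default_factory=list)
```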
In reference to claim 15:
Sapiro teaches: A system for diagnosing autism spectrum disorder (ASD) in a patient, the system comprising:
an audio capture device that captures audio cues spoken to the patient and audible patient responses during an assessment session (at least [fig 1 and related text including col 4] “Computing platform 100 may be any suitable entity (e.g., a mobile device or a server) configurable for performing automated behavioral assessments via monitoring (e.g., video and/or audio recording) users for responses to one or more stimuli and automatically analyzing or coding the responses for determining a behavioral assessment. For example, computer platform 100 may include a memory and a processor for executing a module (e.g., an app or other software) for automated behavioral assessment. In this example, computer platform 100 may also include a user interface for providing stimuli (e.g., video, audio and/or text) designed to illicit certain responses from a user (e.g., a child or toddler) and a camera for recording or obtaining responses to the provided stimuli.” “BAM 104 and/or another module may generate, determine, and/or utilize stimuli (e.g., text, instructions, video, audio, etc.) for eliciting specific responses from a user. For example, BAM 104 and/or another module may instruct a user or a participant (e.g., the user's parent) to perform a task, e.g., to roll a ball, to speak a user's name at a particular time during the behavior assessment, or to stack a set of rings, where the instructed activity is intended to elicit a particular response (e.g., an emotion, a particular facial expression, a particular eye movement, or other response) from the user.”);
a video capture device that captures a video feed of the patient during the assessment session (at least [fig 1 and related text including col 4] see discussion above; see also [col 7 and related text] “Camera/sensor 112 may represent any suitable entity (e.g., a camera sensor or camera chip in a smartphone) for recording visual images, audio, and/or other user input (e.g., motion). For example, camera/sensor 112 may include a two dimensional camera, a three dimensional camera, a heat-sensor camera, a motion sensor, a gyroscope sensor, or any combination thereof. In some embodiments, camera/sensor 112 may be usable for recording a user during a behavioral assessment.”); and
a report generation component that automatically generates a report for the patient based on an analysis of the captured audio cues and the captured video feed (at least [fig 5 and related text, including col 19] “In step 506, a behavioral assessment associated with the user may be determined using the at least one response. In some embodiments, a behavioral assessment may include a behavioral coding, a mental health screening or diagnosis, an autism diagnosis, an attention deficient hyperactivity disorder (ADHD), an anxiety disorder diagnosis, an aggressiveness disorder diagnosis, an indicator indicating a likelihood of a behavioral disorder, a score or weight associated with the at least one stimulus or the at least one response, a recommendation, a referral to a service provider, a mental health related report, or any combination thereof. For example, after analyzing one or more responses from a user, BAM 104 may generate and provide a behavioral assessment including information about potentially relevant behavioral disorders and/or suggestions for alleviating and/or improving any symptoms associated with potentially relevant behavioral disorders. In another example, a behavioral assessment may indicate the likelihood of a user being affected by one or more behavioral disorders.”).
In reference to claim 16:
Sapiro further teaches: The system of claim 15, wherein the report generation component includes a machine learning (ML) model configured to generate the report by:
receiving a stream of audiovisual data that synchronizes the captured audio cues to the captured video feed (at least [fig 1 and related text including cols 4, 5] “BAM 104 and/or another module may generate, determine, and/or utilize stimuli (e.g., text, instructions, video, audio, etc.) for eliciting specific responses from a user. For example, BAM 104 and/or another module may instruct a user or a participant (e.g., the user's parent) to perform a task, e.g., to roll a ball, to speak a user's name at a particular time during the behavior assessment, or to stack a set of rings, where the instructed activity is intended to elicit a particular response (e.g., an emotion, a particular facial expression, a particular eye movement, or other response) from the user.” See also col 5, 6, and BAM 104 is the model, which instructs specific prompts in the recording for a particular response – i.e. “command action pairs” as shown in table 1);
determining one or more diagnostic impressions based on an analysis of the stream of audiovisual data (at least [fig 1 and related text including col 5] “In some embodiments, BAM 104 may generate and/or utilize stimuli designed to exhibit various behaviors that are known or believed to be indicative of autism or other behavioral disorders. For example, the following symptoms have been shown to reliably detect risk for autism based on both retrospective and prospective (high risk infants) studies: overall sustained attention to complex stimuli, reduced range of affective expression (e.g., less smiling and/or more neutral responses), failure to orient to when a user's name is called, and lack of social referencing (e.g., turning head toward a parent to share interest in a surprising or engaging event). In this example, BAM 104 may generate and/or utilize stimuli such that a user's actions can be analyzed with regard to whether any of these symptoms were expressed.”); and
generating the report based on the determined one or more diagnostic impressions (at least [fig 5 and related text, including col 19] “In step 506, a behavioral assessment associated with the user may be determined using the at least one response. In some embodiments, a behavioral assessment may include a behavioral coding, a mental health screening or diagnosis, an autism diagnosis, an attention deficient hyperactivity disorder (ADHD), an anxiety disorder diagnosis, an aggressiveness disorder diagnosis, an indicator indicating a likelihood of a behavioral disorder, a score or weight associated with the at least one stimulus or the at least one response, a recommendation, a referral to a service provider, a mental health related report, or any combination thereof. For example, after analyzing one or more responses from a user, BAM 104 may generate and provide a behavioral assessment including information about potentially relevant behavioral disorders and/or suggestions for alleviating and/or improving any symptoms associated with potentially relevant behavioral disorders. In another example, a behavioral assessment may indicate the likelihood of a user being affected by one or more behavioral disorders.”).
In reference to claim 17:
Sapiro further teaches: wherein the stream of audiovisual data includes multiple command-action pairs (at least [fig 1 and related text including cols 4, 5] “BAM 104 and/or another module may generate, determine, and/or utilize stimuli (e.g., text, instructions, video, audio, etc.) for eliciting specific responses from a user. For example, BAM 104 and/or another module may instruct a user or a participant (e.g., the user's parent) to perform a task, e.g., to roll a ball, to speak a user's name at a particular time during the behavior assessment, or to stack a set of rings, where the instructed activity is intended to elicit a particular response (e.g., an emotion, a particular facial expression, a particular eye movement, or other response) from the user.” See also col 5, 6, and BAM 104 is the model, which instructs specific prompts in the recording for a particular response – i.e. “command action pairs” as shown in table 1); and
wherein the one or more diagnostic impressions are determined based on an analysis of the multiple command-action pairs (at least [fig 1 and related text including col 5] “In some embodiments, BAM 104 may generate and/or utilize stimuli designed to exhibit various behaviors that are known or believed to be indicative of autism or other behavioral disorders. For example, the following symptoms have been shown to reliably detect risk for autism based on both retrospective and prospective (high risk infants) studies: overall sustained attention to complex stimuli, reduced range of affective expression (e.g., less smiling and/or more neutral responses), failure to orient to when a user's name is called, and lack of social referencing (e.g., turning head toward a parent to share interest in a surprising or engaging event). In this example, BAM 104 may generate and/or utilize stimuli such that a user's actions can be analyzed with regard to whether any of these symptoms were expressed.”).
In reference to claim 18:
Sapiro further teaches: wherein a command-action pair is an audio cue mapped to an action performed by the patient during the assessment session in response to the audio cue (at least [cols 4-5] “In some embodiments, BAM 104 and/or another module may generate, determine, and/or utilize stimuli (e.g., text, instructions, video, audio, etc.) for eliciting specific responses from a user. For example, BAM 104 and/or another module may instruct a user or a participant (e.g., the user's parent) to perform a task, e.g., to roll a ball, to speak a user's name at a particular time during the behavior assessment, or to stack a set of rings, where the instructed activity is intended to elicit a particular response (e.g., an emotion, a particular facial expression, a particular eye movement, or other response) from the user…For example, the following symptoms have been shown to reliably detect risk for autism based on both retrospective and prospective (high risk infants) studies: overall sustained attention to complex stimuli, reduced range of affective expression (e.g., less smiling and/or more neutral responses), failure to orient to when a user's name is called, and lack of social referencing (e.g., turning head toward a parent to share interest in a surprising or engaging event). In this example, BAM 104 may generate and/or utilize stimuli such that a user's actions can be analyzed with regard to whether any of these symptoms were expressed.”).
In reference to claim 19:
Sapiro further teaches: wherein the report includes:
information identifying quantitative measures utilized during the assessment session; and
information identifying qualitative observations based on the analysis of the captured audio cues and the captured video feed (at least [fig 5 and related text, including col 19] “In step 506, a behavioral assessment associated with the user may be determined using the at least one response. In some embodiments, a behavioral assessment may include a behavioral coding, a mental health screening or diagnosis, an autism diagnosis, an attention deficient hyperactivity disorder (ADHD), an anxiety disorder diagnosis, an aggressiveness disorder diagnosis, an indicator indicating a likelihood of a behavioral disorder, a score or weight associated with the at least one stimulus or the at least one response, a recommendation, a referral to a service provider, a mental health related report, or any combination thereof. For example, after analyzing one or more responses from a user, BAM 104 may generate and provide a behavioral assessment including information about potentially relevant behavioral disorders and/or suggestions for alleviating and/or improving any symptoms associated with potentially relevant behavioral disorders. In another example, a behavioral assessment may indicate the likelihood of a user being affected by one or more behavioral disorders.”).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 3-5, 10-12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sapiro in view of Shriberg et al. (US 20210110895 A1, hereinafter Shriberg).
In reference to claims 3, 10, and 20:
Sapiro teaches: wherein the ML model determines the one or more diagnostic impressions by:
receiving the stream of audiovisual data via a multi-modal input processing module of the ML model [that provides both] audio data and video data to align voice commands within the audio data to actions performed by a subject and captured within the video data (at least [fig 1 and related text including col 4] “Computing platform 100 may be any suitable entity (e.g., a mobile device or a server) configurable for performing automated behavioral assessments via monitoring (e.g., video and/or audio recording) users for responses to one or more stimuli and automatically analyzing or coding the responses for determining a behavioral assessment. For example, computer platform 100 may include a memory and a processor for executing a module (e.g., an app or other software) for automated behavioral assessment. In this example, computer platform 100 may also include a user interface for providing stimuli (e.g., video, audio and/or text) designed to illicit certain responses from a user (e.g., a child or toddler) and a camera for recording or obtaining responses to the provided stimuli.” “BAM 104 and/or another module may generate, determine, and/or utilize stimuli (e.g., text, instructions, video, audio, etc.) for eliciting specific responses from a user. For example, BAM 104 and/or another module may instruct a user or a participant (e.g., the user's parent) to perform a task, e.g., to roll a ball, to speak a user's name at a particular time during the behavior assessment, or to stack a set of rings, where the instructed activity is intended to elicit a particular response (e.g., an emotion, a particular facial expression, a particular eye movement, or other response) from the user.” See also col 5, 6, and BAM 104 is the model, which instructs specific prompts in the recording for a particular response – i.e. “command action pairs” as shown in table 1);
identifying predicted actions or responses by the subject (at least [col 4] “BAM 104 and/or another module may generate, determine, and/or utilize stimuli (e.g., text, instructions, video, audio, etc.) for eliciting specific responses from a user. For example, BAM 104 and/or another module may instruct a user or a participant (e.g., the user's parent) to perform a task, e.g., to roll a ball, to speak a user's name at a particular time during the behavior assessment, or to stack a set of rings, where the instructed activity is intended to elicit a particular response (e.g., an emotion, a particular facial expression, a particular eye movement, or other response) from the user.”);
extracting visual features via a computer vision (CV) module that analyzes the video data of the subject (at least [fig 1 and related text including col 7] “In some embodiments, camera/sensor 112 and/or BAM 104 may include functionality for identifying user responses. For example, camera/sensor 112 may be a three dimensional camera, such as a low-cost, three dimensional camera, configured to identify facial expressions or other responses associated with a moving or active subject (e.g., a face of a hyperactive young child). In this example, camera/sensor 112 and/or BAM 104 may include or utilize one or more algorithms for identifying a facial area using known or identifiable facial regions or landmarks (e.g., a nose, eyebrows, eyes, mouth, etc.) at various angles and/or head positions. Continuing with this example, camera/sensor 112 and/or BAM 104 may also include or utilize one or more algorithms for determining changes to the identified facial area and for determining whether such changes are indicative of one or more particular facial expressions or other responses. Representative algorithms are disclosed herein below.…For example, BAM 104 may track the number of emotions or facial expressions expressed by a user during a stimuli or test session.” See also [table 1 and related text] for discussion of constraints measured/collected, and [figs 4a-4d and related text] for discussion of expression recognition.); and
detecting one or more behavior patterns of the subject via a generative artificial intelligence (AI) module that analyzes the extracted visual features synchronized to the identified predicted actions or responses by the subject (at least [col 5] “In some embodiments, BAM 104 may generate and/or utilize stimuli designed to exhibit various behaviors that are known or believed to be indicative of autism or other behavioral disorders. For example, the following symptoms have been shown to reliably detect risk for autism based on both retrospective and prospective (high risk infants) studies: overall sustained attention to complex stimuli, reduced range of affective expression (e.g., less smiling and/or more neutral responses), failure to orient to when a user's name is called, and lack of social referencing (e.g., turning head toward a parent to share interest in a surprising or engaging event). In this example, BAM 104 may generate and/or utilize stimuli such that a user's actions can be analyzed with regard to whether any of these symptoms were expressed.” At [col 8] “In some embodiments, analyzing videos and/or other responses may include behavioral coding. Behavioral coding may include identifying and/or scoring user responses into quantitative metrics and may be automated (e.g., performed without user (e.g., human) assistance). For example, coding may involve one or more algorithms or techniques for identifying (e.g., from a recording of a user) various user responses (e.g., “smiling”, “turning head”, “crying”, etc.). In this example, coding may also include one or more algorithms or techniques for scoring identified user responses, e.g., into quantitative metrics that can be compared and/or processed. In another example, behavioral analysis and/or coding may include detecting a user's failure to disengage their attention (e.g., in response to having their name called) and/or to socially reference (e.g., by turning their head to) a parent in response to surprising or engaging stimuli. In yet another example, behavioral analysis and/or coding may include detecting a user's lack of emotional expressiveness and/or fewer smiles (e.g., happy expressions) as compared to a baseline.” At [col 7] “In some embodiments, BAM 104 may include functionality for analyzing video of a user for determining whether one or more responses are indicative of a behavioral disorder. For example, during an automated behavioral assessment, BAM 104 may analyze video of a user for responses to one or more provided stimuli. In this example, BAM 104 may compare the user responses and predetermined base (e.g., “normal” or appropriate) responses for determining whether the user responses are indicative of a behavioral disorder.”). Sapiro as cited teaches all the limitations above, and while Sapiro teaches applicability to audio and text prompting, it does not specifically disclose natural language processing of audio data, nor does it explicitly disclose synchronizing. Shriberg, however, does teach:
synchronizing audio data to video data to align voice commands within the audio data to actions performed by a subject and captured within the video data (at least [0034]-[0037] the NLP, acoustic, and visual outputs are fused into a single stream of data); and
identifying [ ] responses by the subject via a voice command recognition module that applies natural language processing (NLP) to the audio data (at least [0367] “Additionally, a navigation module 2834 receives NLP outputs and semantically analyzes the NLP results for command language in near real time. Such commands may include statements such as “Can you repeat that?”, “Please speak up”, “I don't want to talk about that”, etc. These types of ‘command’ phrases indicate to the system that an immediate action is being requested by the user.” At [0034]-[0037] the NLP, acoustic, and visual outputs are fused into a single stream of data). Sapiro and Shriberg are analogous references as both disclose various means of using audio/visual data to effectively diagnose a mental health condition. One of ordinary skill in the art would have found the inclusion of NLP processing of the received audio data in Shriberg, as well as synchronizing the collected sources, to be an obvious improvement to the audio data collected in Sapiro, as Sapiro also teaches applicability to text-based prompts to a user, and therefore one could reasonably infer that adding NLP capability would allow for a larger range of analysis of the collected data streams common to both references. Further, Shriberg teaches that models that have multiple sources “fused” allow for a greater confidence level in each segment of the collected data (see [0034]).
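By way of illustration of the “fusing”/synchronizing concept relied upon above, the following is a minimal sketch assuming timestamped command and action segments; the function and all names are hypothetical and are not drawn from Shriberg or Sapiro.

```python
from typing import List, Tuple

# Each segment: (start_seconds, end_seconds, label)
TimedSegment = Tuple[float, float, str]


def fuse_streams(commands: List[TimedSegment],
                 actions: List[TimedSegment],
                 window: float = 3.0) -> List[Tuple[str, str]]:
    """Aligns each recognized voice command to the first captured
    action beginning within `window` seconds of the command's end,
    yielding synchronized command-action pairs."""
    pairs = []
    for _, cmd_end, cmd_text in commands:
        for act_start, _, act_label in actions:
            if 0.0 <= act_start - cmd_end <= window:
                pairs.append((cmd_text, act_label))
                break
    return pairs
```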
In reference to claims 4 and 11:
Sapiro teaches: wherein the extracted visual features include facial expressions exhibited by the subject, gestures performed by the subject, or movements performed by the subject (at least [fig 1 and related text including col 7] “In some embodiments, camera/sensor 112 and/or BAM 104 may include functionality for identifying user responses. For example, camera/sensor 112 may be a three dimensional camera, such as a low-cost, three dimensional camera, configured to identify facial expressions or other responses associated with a moving or active subject (e.g., a face of a hyperactive young child). In this example, camera/sensor 112 and/or BAM 104 may include or utilize one or more algorithms for identifying a facial area using known or identifiable facial regions or landmarks (e.g., a nose, eyebrows, eyes, mouth, etc.) at various angles and/or head positions. Continuing with this example, camera/sensor 112 and/or BAM 104 may also include or utilize one or more algorithms for determining changes to the identified facial area and for determining whether such changes are indicative of one or more particular facial expressions or other responses. Representative algorithms are disclosed herein below.…For example, BAM 104 may track the number of emotions or facial expressions expressed by a user during a stimuli or test session.” See also [table 1 and related text] for discussion of constraints measured/collected, and [figs 4a-4d and related text] for discussion of expression recognition.).
In reference to claims 5 and 12:
Sapiro further teaches: wherein the CV module analyzes the video data of the subject to extract the visual features by applying an object detection technique, a pose estimation technique, or an activity recognition technique (at least [figs 4a-d and related text, including col 12-13] “The analysis of facial expression is studied in computer vision, psychology, psychiatry, and marketing, all of which require a facial expression recognition (FER) system to be robust to changes in pose. In particular for the psychology and psychiatry fields, risk signs of anxiety and autism can be depicted from facial expressions as the participant is looking at various stimuli [1, 2]. Robustness to pose is especially important since the experts need to analyze participants in their natural states, in other words being observed in an unconstrained manner (see [3] and [4] for examples). Many state of the art facial expression approaches focus on frontal or nearly frontal images of the face [5, 6]. Changes in head pose or facial expression cause nonlinear transformations of the face in a 2D image, making it a non-trivial task to classify expressions under varying poses [7]. Approaches to handle facial expression across multiple poses fall within two main categories. The first category corresponds to approaches based on learning expression models on a discrete set of poses [8, 9]. For example, [8] employ a 2 stage approach where they first train a classifier to distinguish pose, and then train pose-dependent classifiers across expressions. The second category involves approaches that learn the mappings of the expressions as a function of pose [10, 11, 12]. Notably, [10] presents an accurate geometric based approach to first learn the transformation of facial points at any given pose to a frontal pose, then FER is performed on facial points from the projected frontal pose, thus requiring only one posed classifier. The work [12] adopts a Partial Least Squares approach, that has been explored in facial recognition, to model the relations between pairs of images of the same person at different poses and expressions.”).
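To illustrate how generic an “object detection technique” can be in this context, the following is a minimal sketch using an off-the-shelf OpenCV face detector; the choice of a Haar cascade detector here is an assumption for illustration only and is not a technique disclosed by Sapiro or recited in the claims.

```python
import cv2  # OpenCV, assumed installed (opencv-python)

# Pre-trained frontal-face detector shipped with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def extract_face_regions(frame):
    """Returns bounding boxes (x, y, w, h) for faces in one video
    frame -- the kind of extracted visual feature the claims recite."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```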
Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Sapiro in view of Adler et al. (US 20220188601 A1, hereinafter Adler).
In reference to claims 7 and 14:
Sapiro further teaches: wherein determining the one or more diagnostic impressions based on the analysis of the multiple command-action pairs within the stream of audiovisual data includes:
performing context analysis of the stream of audiovisual data via a transformer model (at least [col 7] “In some embodiments, BAM 104 may include functionality for analyzing video of a user for determining whether one or more responses are indicative of a behavioral disorder. For example, during an automated behavioral assessment, BAM 104 may analyze video of a user for responses to one or more provided stimuli. In this example, BAM 104 may compare the user responses and predetermined base (e.g., “normal” or appropriate) responses for determining whether the user responses are indicative of a behavioral disorder.” At [col 8] the responses or metrics are “normalized”); and
performing diagnostic inference of the stream of audiovisual data (at least [fig 1 and related text including col 5] “In some embodiments, BAM 104 may generate and/or utilize stimuli designed to exhibit various behaviors that are known or believed to be indicative of autism or other behavioral disorders. For example, the following symptoms have been shown to reliably detect risk for autism based on both retrospective and prospective (high risk infants) studies: overall sustained attention to complex stimuli, reduced range of affective expression (e.g., less smiling and/or more neutral responses), failure to orient to when a user's name is called, and lack of social referencing (e.g., turning head toward a parent to share interest in a surprising or engaging event). In this example, BAM 104 may generate and/or utilize stimuli such that a user's actions can be analyzed with regard to whether any of these symptoms were expressed.”). Sapiro as cited teaches all the limitations above, but does not teach encoding/decoding, nor the use of a DNN to evaluate the data. Adler, however, does teach:
learning compact representations of the stream of audiovisual data via an autoencoder-decoder of the ML model (at least [fig 2 and related text] “Referring initially to FIG. 2, a portion of an EDNN is shown, implemented utilizing a fully-connected neural network autoencoder architecture 200, also referred to herein as an FNN AD model. The example architecture 200 in this embodiment comprises an input layer 202, a hidden layer encoder 204, a compressed layer 206, a hidden layer decoder 208, and an output layer 210. The hidden layer encoder 204 receives from the input layer 202 an input data subsequence having a relatively high dimension and generates a first intermediate data subsequence having a relatively low dimension for delivery to the compressed layer 206.”);
[diagnosing] the data via a deep neural network (DNN) (at least [0021] “In operation, the processing platform 102 is illustratively configured to obtain, from one or more of the data sources 105, data characterizing a given subject over time, to apply at least a portion of the obtained data to at least one EDNN implemented in the EDNN-based algorithms 110 to generate a prediction of at least one change in at least one of behavior and physiology of the given subject from the obtained data, and to execute at least one automated remedial action relating to the given subject based at least in part on the generated prediction, illustratively via the component controller 112.” At [0044], a “remedial” action is a diagnosis or reporting the information to a medical professional.). Sapiro and Adler are analogous references as both disclose a diagnosis process for a mental illness. One of ordinary skill in the art would have found the inclusion of autoencoder/decoder machine learning in conjunction with a DNN, as taught by Adler, to be obvious to include in the diagnosis process of Sapiro, as Adler teaches: “Behavioral anomalies are often an early warning sign of mental health deterioration across a variety of conditions, including depression and psychosis. Accordingly, some embodiments disclosed herein predict early warning signs of psychotic relapse from passive sensing data. Other embodiments are applied in a wide variety of other use cases… One or more such embodiments illustratively further provide various types of automated remediation responsive to predictions generated by the one or more EDNNs. For example, some embodiments implement EDNN-based prediction and remediation algorithms to at least partially automate various aspects of patient care in healthcare applications such as telemedicine.” (see [0004]-[0005]). As such, one of ordinary skill in the art would have found the improved speed with which a DNN can detect a behavioral change, as taught by Adler, to be an obvious improvement to the mental health diagnosis model of Sapiro in order to provide the earliest possible diagnosis, particularly as both references stress that early diagnosis is imperative to providing appropriate intervention.
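For illustration of the layer roles Adler names (input layer, hidden layer encoder, compressed layer, hidden layer decoder, output layer) feeding a separate DNN, the following is a minimal PyTorch sketch; all dimensions and the two-class output are assumptions for illustration, not values from Adler.

```python
import torch
import torch.nn as nn

# Illustrative dimensions only.
INPUT_DIM, HIDDEN_DIM, COMPRESSED_DIM, NUM_CLASSES = 256, 64, 16, 2


class FNNAutoencoder(nn.Module):
    """Fully-connected autoencoder mirroring the named layer roles."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(INPUT_DIM, HIDDEN_DIM), nn.ReLU(),  # hidden layer encoder
            nn.Linear(HIDDEN_DIM, COMPRESSED_DIM))        # compressed layer
        self.decoder = nn.Sequential(
            nn.Linear(COMPRESSED_DIM, HIDDEN_DIM), nn.ReLU(),  # hidden layer decoder
            nn.Linear(HIDDEN_DIM, INPUT_DIM))                  # output layer

    def forward(self, x):
        z = self.encoder(x)   # compact representation of the input
        return self.decoder(z), z


# A separate DNN that evaluates the compact representation.
classifier = nn.Sequential(
    nn.Linear(COMPRESSED_DIM, 32), nn.ReLU(),
    nn.Linear(32, NUM_CLASSES))

x = torch.randn(8, INPUT_DIM)            # batch of feature vectors
reconstruction, z = FNNAutoencoder()(x)  # learn/apply compact representation
logits = classifier(z)                   # per-class scores for diagnosis
```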
Relevant Prior Art
The following references are made a part of the record:
US 20170251985 A1 to Howard discloses a means of using facial information and expressions to determine a diagnosis.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATHERINE KOLOSOWSKI-GAGER whose telephone number is (571)270-5920. The examiner can normally be reached Monday - Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mamon Obeid can be reached at 571-270-1813. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KATHERINE KOLOSOWSKI-GAGER/Primary Examiner, Art Unit 3687