Prosecution Insights
Last updated: April 19, 2026
Application No. 18/434,201

SYSTEM AND METHOD FOR ADJUSTING DIGITAL CONTENT BASED ON USER ENGAGEMENT ASSESSMENT

Final Rejection — §101, §103

Filed: Feb 06, 2024
Examiner: BOLEN, NICHOLAS D
Art Unit: 3624
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Constructor Education And Research Genossenschaft
OA Round: 2 (Final)

Grant Probability: 10% (At Risk)
OA Rounds: 3-4
To Grant: 4y 3m
With Interview: 20%

Examiner Intelligence

Career Allow Rate: 10% (12 granted / 122 resolved; -42.2% vs TC avg) — grants only 10% of cases
Interview Lift: +10.5% — moderate +10% lift, with vs. without interview, among resolved cases with interview
Avg Prosecution: 4y 3m typical timeline; 29 applications currently pending
Total Applications: 151 across all art units (career history)

Statute-Specific Performance

§101: 36.5% (-3.5% vs TC avg)
§103: 48.6% (+8.6% vs TC avg)
§102: 7.6% (-32.4% vs TC avg)
§112: 7.1% (-32.9% vs TC avg)

Deltas are relative to an estimated Tech Center average • Based on career data from 122 resolved cases
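As a quick cross-check of the arithmetic behind these headline figures, the short sketch below recomputes them from the raw counts shown above. It is illustrative only: it assumes the dashboard derives its percentages as simple ratios and treats the interview lift as additive, and the implied Tech Center average is back-calculated from the -42.2% delta rather than taken from any source data.

```python
# Illustrative only: assumes the dashboard's percentages are simple ratios
# and that the interview lift is applied additively to the baseline rate.

granted = 12          # from "12 granted / 122 resolved"
resolved = 122        # career resolved cases
tc_avg = 0.52         # assumed Tech Center average implied by the -42.2% delta

allow_rate = granted / resolved                 # ~0.098 -> shown as 10%
delta_vs_tc = allow_rate - tc_avg               # ~-0.422 -> shown as -42.2% vs TC avg

interview_lift = 0.105                          # "+10.5% Interview Lift"
with_interview = allow_rate + interview_lift    # ~0.203 -> shown as 20% "With Interview"

print(f"Career allow rate:   {allow_rate:.1%}")
print(f"Delta vs TC average: {delta_vs_tc:+.1%}")
print(f"With interview:      {with_interview:.1%}")
```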

Office Action

§101 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Notice to Applicant

Claims 1, 3, 12-13 and 17 are presently amended. Claims 1-20 are pending.

Response to Amendment

Applicant's amendments are acknowledged.

Response to Arguments

Applicant's arguments filed 9/4/2025 have been fully considered in view of further consideration of statutory law, Office policy, precedential common law, and the cited prior art as necessitated by the amendments to the claims, and are persuasive in part for the reasons set forth below.

35 USC § 112(f) Interpretation

First, Applicant argues that "Without waiver or disclaimer, and solely in an effort to further prosecution, the at-issue limitations have been clarified with respective sufficient structure to perform the respective claimed functions. (See Published Application, [0033].) Applicant respectfully requests withdrawal of any interpretation under 35 U.S.C. § 112(f)" [Arguments, page 10]. In response, Applicant's arguments are considered and are persuasive. Examiner observes that the previously identified 'unit' and 'module' elements are presently amended with sufficient structure such that the claims no longer require interpretation under 35 U.S.C. § 112(f).

35 USC § 101 Rejections

First, Applicant argues that "measuring user engagement while interacting with digital content and adjusting digital content specific to a user is beyond the sub-groupings of social activities, teaching, or following rules or instructions. In particular, the limited 'social activities, teaching, or following rules or instructions' are decidedly unique to human activity, which is inapposite to the claimed adjusting digital content specific to a user based on the user's interaction with the system. Claim 1 therefore does not recite a mental process under Step 2A Prong One" [Arguments, pages 11-12]. In response, Applicant's arguments are considered but are not persuasive. Examiner respectfully disagrees and maintains that the present invention recites an abstract idea. First, with respect to the assertion that Claim 1 "does not recite a mental process under Step 2A Prong One", Examiner respectfully notes that the rejection does not allege that the present claims recite a mental process. Instead, Examiner maintains that the present claims recite certain methods of organizing human activity. In particular, Examiner maintains that adjusting digital content based on analyzing user engagement data is considered to describe steps for managing personal behavior. Further, the present limitations describe steps for commercial or legal interactions, which include agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; and business relations. Specifically, adjusting digital content based on analyzing user engagement data is considered to describe steps for advertising, marketing or sales activities or behaviors. Thus, claims 1 and 12 recite concepts identified as abstract ideas. As such, Examiner remains unpersuaded.

Second, Applicant argues that "The specific adjustment of digital content based on at least one calculated engagement score, the focus time, and the attention recovery time necessarily limits the claim in a meaningful way such that the claim is not designed to monopolize any purported exception. Claim 1 therefore integrates any purported abstract idea into a practical application under Step 2A Prong Two.
Nevertheless, without waiver or disclaimer, and solely in an effort to further prosecution, claim 1 has been amended to recite, in combination with the other limitations of the claim, "adjusting the digital content…" Such dynamic modification plainly limits the claim in a meaningful way and the claim as a whole integrates any purported judicial exception into a practical application by the specific digital content adjustment and providing to a user device. (See Published Application, [0089]-[0090].) Amended claim 1 therefore integrates any purported abstract idea into a practical application under Step 2A Prong Two" [Arguments, pages 12-13]. In response, Applicant's arguments are considered but are not persuasive. Examiner respectfully disagrees and maintains that the present invention recites a judicial exception without significantly more. In response to the assertion that "Such dynamic modification plainly limits the claim in a meaningful way and the claim as a whole integrates any purported judicial exception into a practical application by the specific digital content adjustment and providing to a user device" and "therefore integrates any purported abstract idea into a practical application", Examiner respectfully disagrees and observes that the modification of digital content is not claimed at a level of specificity that could be considered to demonstrate a practical application. In particular, the claims do not describe how the at least one of the engagement score, the focus time, and the attention recovery time is utilized to modify the digital content, other than to claim that at least one of those parameters is relied upon. Further, the claimed sequence and organization modifications of the digital content are not described in any meaningful way as to how they relate to or rely upon the aforementioned engagement score, focus time, and attention recovery time. Further still, the additional elements including, for example, the claimed machine-learning models are broadly recited and are considered to amount to merely including instructions to implement an abstract idea in a computer environment (i.e. "apply it"), as discussed in MPEP § 2106.05(f). Thus, Examiner respectfully maintains that the present invention recites a judicial exception without significantly more. As such, Examiner remains unpersuaded.

35 USC § 103 Rejections

First, Applicant argues that "Claim 1 requires, in combination with the other limitations of the claim, "collecting user data including video images from a camera, input data from an input/output device, application activity data, and system events data" (emphasis added). The Office Action asserts that paragraph [0074] of Zhou discloses the required input data. (Office Action, p. 11.) However, [0074] merely discloses that "the processor 202 may communicate with an input/output (I/O) interfaces 222, which may enable interfacing the one or more input devices 220" which is not that data from such I/O is collected as user data for subsequent analysis. Further, the Office Action asserts that paragraph [0124] of Zhou discloses the required system events data. (Office Action, p. 11.) However, [0124] merely discloses that "another device in communication with the attendee device via the network is configured to send at least some of the presenter data to the attendee device", which is not system events data collected as user data for subsequent analysis" [Arguments, pages 13-14]. In response, Applicant's arguments are considered but are not persuasive.
With respect to the argument that Zhou, at "[0074] merely discloses that "the processor 202 may communicate with an input/output (I/O) interfaces 222, which may enable interfacing the one or more input devices 220" which is not that data from such I/O is collected as user data for subsequent analysis", Examiner respectfully disagrees and maintains that Zhou renders the above-argued elements of the limitation obvious. In particular, Examiner directs the Applicant to (Zhou, ¶ 122, In some examples, the presenter 110 presents a lecture showing education material (ex. textbook) to attendees. The presenter's device includes a camera and microphone for capturing video of the presenter's face and/or instructional objects (e.g. a textbook and whiteboard) and audio input of the presenter's voice. The presenter data includes this video data (as visual presentation data) and audio data, as well as presentation interaction data indicating interaction with the education material (e.g. pointer movement, drawing, and text input by various input devices of the presenter device such as mouse, stylus, finger touch and keyboard)). Here, Zhou explicitly discloses collecting input data from various I/O devices in accordance with the claims of the present invention. Similarly, with regard to the assertion that Zhou, at "[0124] merely discloses that "another device in communication with the attendee device via the network is configured to send at least some of the presenter data to the attendee device", which is not system events data collected as user data for subsequent analysis", Examiner respectfully disagrees and maintains that sending presentation data through the networked system to attendee devices and collecting attention level data as recited in ¶ 124 of Zhou amounts to the collection of the broadly claimed and undefined 'system data' of the present system. Thus, Examiner respectfully maintains that the art of record renders the above-argued limitation obvious. As such, Examiner remains unpersuaded.

Second, Applicant argues that "Claim 1 requires "applying a plurality of machine-learning models, wherein the machine-learning models are configured to analyze footage and a dataset for face training and to analyze the collected user data, wherein the plurality of machine learning models includes models for face detection, emotion detection, focus detection, and user activity detection" (emphasis added). The Office Action asserts that the boilerplate disclosure in Zhou [0167] of "an algorithm trained using machine learning and training data" is evidence of the required machine learning models. Quite simply, "a mathematical function that may be selected based on a rule-based or machine learning-based algorithm, e.g., a rule-based expert system designed based on collected data, or an algorithm trained using machine learning and training data" as disclosed in Zhou [0167] is not evidence of each of models for face detection, emotion detection, focus detection, and user activity detection, as required" [Arguments, pages 14-15]. In response, Applicant's arguments are considered but are not persuasive.
With respect to the argument that "a mathematical function that may be selected based on a rule-based or machine learning-based algorithm, e.g., a rule-based expert system designed based on collected data, or an algorithm trained using machine learning and training data" as disclosed in Zhou [0167] is not evidence of each of models for face detection, emotion detection, focus detection, and user activity detection, as required, Examiner respectfully disagrees and maintains that the machine learning algorithm elements of Zhou, in combination with the face detection, emotion detection, focus detection, and user activity detection elements of Zhou, render the above-argued limitation obvious. In particular, Examiner directs the Applicant to (Zhou, ¶ 98, The useful attendee video information may include body movement for example, gesture to get attention (raising hands, pointing or the like) (discloses user activity detection), gesture to express agreement or disagreement (e.g. by nodding heads), facial expressions indicative of the attention of the attendees, such as, eye gaze, posture, unintentional body movement, or the like. All such useful attendee video information may be in the series of images received by the video conferencing server 250 which are processed by the video conferencing server 250 to determine various attributes of the attendees 120. The video conferencing server 250 may use the attributes to determine an indication about the attendees 120. An indication may include but are not limited to if one or more attendees 120 want to ask questions, if one or more attendees 120 are attentive and understanding the presenter information 132, if one or more attendees are getting sentimental or emotional about the presenter information 132, if one or more attendees 120 are laughing and enjoying the presenter information 132, if one or more attendees 120 are not attentive or lost interest in the presenter information 132 or the like), (Id., ¶ 164, At 718, the attention monitoring system processes the video frame of attendee data to determine a current raw attention level (discloses analyzing footage), a, for the attendee. After determining a, the attention monitoring system sets a value A.sub.k/n=a, indicating a raw attention level for the current (k/n).sup.th sample. Thus, if k=120 at the current frame and n=10, the current sample is the 12th sample, and A.sub.12=a.), and to (Id., ¶ 100, In order to determine various attributes (e.g. body movement including raising hands, waving hands, pointing hands applauding, facial expression, or the like) associated with the attendees 120 in the attendee information 134, in certain embodiments, the video conferencing server 250 may process the attendee video information 134 (i.e. perform face detection and body detection on a series of images included in the attendee information 134) using any suitable computer-vision technique as described below. In certain embodiments, the video conferencing server 250 may be configured to process attendee audio/sound information included in the attendee information 134). Here, Zhou discloses face detection, attention level detection (i.e. focus detection), emotion detection and gesture detection (i.e. user activity detection) in accordance with the present invention. Examiner combines these elements with the machine learning modeling elements of Zhou to render the above-argued limitation obvious. As such, Examiner remains unpersuaded.
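For readers less familiar with the claim language at issue, the sketch below illustrates the kind of multi-model analysis pipeline that the limitation "applying a plurality of machine-learning models… includes models for face detection, emotion detection, focus detection, and user activity detection" describes. It is a minimal sketch only: the classes, the event schema, and the analyze_session helper are hypothetical placeholders, not the applicant's implementation and not anything disclosed in Zhou or Tunick.

```python
# Minimal, hypothetical sketch of applying several per-task models to collected
# user data and emitting per-session events with parameters. Illustrative only.

from dataclasses import dataclass
from typing import Any

@dataclass
class Event:
    session_id: str
    kind: str            # e.g. "face", "emotion", "focus", "activity"
    timestamp: float     # seconds into the session
    parameters: dict[str, Any]

class DummyModel:
    """Stand-in for a trained detector (face / emotion / focus / activity)."""
    def __init__(self, kind: str):
        self.kind = kind
    def predict(self, frame: Any) -> dict[str, Any]:
        # A real model would run inference on the video frame or input data.
        return {"score": 0.5}

MODELS = {kind: DummyModel(kind) for kind in ("face", "emotion", "focus", "activity")}

def analyze_session(session_id: str, frames: list) -> list[Event]:
    """Apply each model to the collected user data and emit marked events."""
    events = []
    for t, frame in enumerate(frames):
        for kind, model in MODELS.items():
            events.append(Event(session_id, kind, float(t), model.predict(frame)))
    return events
```

Under these assumptions, each user session ends up associated with events and event parameters keyed to the model that produced them, which is the structure the focus-time and engagement-score steps of claim 1 then operate on.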
Third, Applicant argues that "Claim 1 further requires "estimating an attention recovery time" (emphasis added). Quite simply, Zhou fails to disclose an attention recovery time. In particular, Zhou discloses an "attention level", not an attention recovery time. (Office Action, p. 18; citing Zhou FIG. 8, [0182]; see [0182], "FIG. 8 is a first user interface screen 800 showing the current attention levels of multiple attendees, as well as an overall current attention level." (emphasis added).)" [Arguments, page 15]. In response, Applicant's arguments are considered but are not persuasive. With respect to the argument that Zhou fails to disclose an attention recovery time, Examiner respectfully disagrees and maintains that the graph depicted in Fig. 8 of Zhou, in combination with other previously cited elements of Zhou, renders the above-argued and presently amended limitation obvious. In particular, Examiner directs the Applicant to (Zhou, ¶ 182, Attention states as shown in the example screen 800 may be identified by categorizing ranges of the attention level of an attendee, e.g., the B value generated by method 700. In this example screen 800, attention level is shown on a scale of 0 to 100, with a top range (e.g., 76-100) being categorized as "attentive", a medium-high range (e.g., 51-75) being categorized as "fair", a medium-low range (e.g., 26-50) being categorized as "distracted", and a bottom range (e.g., 0-25) being categorized as "sleepy". It will be appreciated that various embodiments may categorize or characterize attention levels differently). Here, and in combination with Fig. 8, which scores the attention level of an attendee over time and depicts a graphed array of data used to determine the focus time and the attention recovery time between peaks of focus time, Zhou renders the presently amended limitation obvious. As such, Examiner remains unpersuaded.

Fourth, Applicant argues that "Claim 1 further requires "adjusting the digital content provided to a particular user based on at least one calculated engagement score, the focus time, and the attention recovery time." The Office Action concedes that Zhou fails to disclose the required adjusting and instead refers to Tunick. (Office Action, pp. 18-20.) In particular, the Office Action asserts that the report generating application 304 of Tunick "may enhance, enlarge, or change colors of all or some advertisements or reports not being viewed by the viewer" based on a gaze shifting determination. (See Tunick, [0079], [0083].) However, such purported adjustment is not based on at least one calculated engagement score, a focus time, and an attention recovery time as required" [Arguments, page 15]. In response, Applicant's arguments are considered but are not persuasive. With respect to the argument that Tunick's gaze shifting determination is not based on at least one calculated engagement score, a focus time, and an attention recovery time as required, Examiner respectfully disagrees. In particular, Examiner respectfully maintains that the gaze shifting determination amounts to a determination based on focus time, wherein the focus time of Tunick could be considered to be an instantaneous focus time (i.e., as soon as the user's gaze is shifted, a focus time is calculated and the digital content is adjusted). Thus, Examiner respectfully maintains that the art of record renders the above-argued limitation obvious. As such, Examiner remains unpersuaded.

Claim Rejections - 35 USC § 101

35 U.S.C.
101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title. Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Step 1: Claims 1-20 are directed to statutory categories, namely a process (claims 1-11), and a machine (claims 12-20). Step 2A, Prong 1: Claims 1 and 12 in part, recite the following abstract idea: …A … method for measuring user engagement while interacting with digital content, the method comprising: collecting user data including…, input data from…, application activity data, and system events data; applying … are configured to analyze footage and a dataset for face training and to analyze the collected user data, wherein the plurality of … includes models for face detection, emotion detection, focus detection, and user activity detection; associating each user session with corresponding events and event parameters output from… ; calculating the total time of user focus based on marked events and event parameters for each user session as a focus time, wherein the marked events reflect the total time of user focus; assessing changes in user engagement scores throughout each user session using the marked events, the event parameters, and the focus time; populating a focused time array comprising a plurality of times of focused engagement, each of the plurality of times corresponding to at least one corresponding event; populating an interruption time array comprising a plurality of times of interruption in a user's interaction with at least one corresponding event; estimating an attention recovery time using the focused time array and the interruption time array and at least two consecutive focused events from the plurality of times of focused engagement in the focused time array and a maximum of the plurality of times of interruption, wherein the attention recovery time quantifies the time taken by the user to re-engage with content after an interruption; and adjusting the digital content provided to a particular user based on at least one calculated engagement score, the focus time, and the attention recovery time including by dynamically modifying an organization and a sequence of the digital content for the particular user and providing the adjusted digital content… of the particular user, without modifying the digital content for at least one other user [Claim 1], A system for measuring user engagement while interacting with digital content, comprising: …to capture user interactions, including… device inputs, application activities, and system events as collected data; an engagement analysis service configured to process the collected data and generate raw engagement events by correlating and filtering collected data; …to aggregate the raw engagement events, and generate an individual client metric and a group metric as aggregated data, populate a focused time array comprising a plurality of times of focused engagement, each of the plurality of times corresponding to at least one corresponding event; populate an interruption time array comprising a plurality of times of interruption in a user's interaction with at least one corresponding event;estimate an attention recovery time using the focused time array and the interruption 
time array and at least two consecutive focused events from the plurality of times of focused engagement in the focused time array and a maximum of the plurality of times of interruption, wherein the attention recovery time quantifies the time taken by the user to re-engage with content after an interruption; … to receive the aggregated data and provide a visual representation of engagement data over time using …; …to collaborate with … to dynamically modify the digital content based on the engagement data including by dynamically modifying an organization and a sequence of the digital content for a particular user and providing the adjusted digital content to …of the particular user, without modifying the digital content for at least one other user. [Claim 12]. These concepts are not meaningfully different than the following concepts identified by the MPEP: Concepts relating to certain methods of organizing human activity. The aforementioned limitations describe steps for managing personal behavior or relationships or interactions between people, including social activities, teaching, and following rules or instructions. Specifically, adjusting digital content based on analyzing user engagement data is considered to describe steps for managing personal behavior. Further, the aforementioned limitations describe steps for commercial or legal interactions which includes agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; and business relations. Specifically, adjusting digital content based on analyzing user engagement data is considered to describe steps for advertising marketing or sales activities or behaviors. As such, claims 1 and 12 recite concepts identified as abstract ideas. The dependent claims recite limitations relative to the independent claims, including, for example: …further comprising generating personalized content recommendations based on the engagement scores and the focus time [Claim 2], …wherein adjusting the digital content comprises altering the pacing of content delivery to match a certain engagement score [Claim 3], …wherein adjusting the digital content comprises modifying the format of content presentation to align with preferences [Claim 4], …wherein adjusting the digital content comprises incorporating interactive elements, quizzes, or gamification to increase engagement scores [Claim 5], …wherein adjusting the digital content includes transitioning from a text-based format to a graphic-based format, based on user engagement patterns [Claim 6], The limitations of these dependent claims are merely narrowing the abstract idea identified in the independent claims, and thus, the dependent claims also recite abstract ideas. Step 2A, Prong 2: This judicial exception is not integrated into a practical application. 
In particular, claims 1 and 12 only recite the following additional elements – …computer-implemented… video images from a camera… an input/output device…; …a plurality of machine-learning models, wherein the machine-learning models… machine-learning models…; …at least one of the machine-learning models…; …to memory accessible by a device… [Claim 1], …at least one processor and memory operably coupled to the at least one processor…; …a data collection module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor… screen images, webcam video…; … a data aggregation module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor…; a reporting module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor…; a graphical monitoring interface comprising instructions stored in the memory that,when executed by the at least one processor, cause the at least one processor…; and a content adjustment module… a Learning Management System (LMS)…; …to memory accessible by a device… [Claim 12]. The machine learning models, modules and executable instructions are recited at a high-level of generality (see MPEP § 2106.05(a)), like the following MPEP example: iii. Gathering and analyzing information using conventional techniques and displaying the result, TLI Communications, 823 F.3d at 612-13, 118 USPQ2d at 1747-48; Furthermore, the computer implemented element is considered to amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)), like the following MPEP example: i. A commonplace business method or mathematical algorithm being applied on a general purpose computer, Alice Corp. Pty. Ltd. V. CLS Bank Int’l, 573 U.S. 208, 223, 110 USPQ2d 1976, 1983 (2014); Gottschalk v. Benson, 409 U.S. 63, 64, 175 USPQ 673, 674 (1972); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); Accordingly, these additional elements do not integrate the abstract idea into a practical application. The remaining dependent claims do not recite any new additional elements, and thus do not integrate the abstract idea into a practical application. 
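To make the claim limitations quoted above under Step 2A Prong 1 easier to follow, the sketch below shows one way the recited focused time array, interruption time array, and attention recovery time could fit together. The claim does not fix a formula, so the function name, its inputs, and the bracketing interpretation (measuring re-engagement after the longest interruption) are assumptions for illustration only, not the applicant's disclosed method.

```python
# A toy reading of the claimed attention-recovery-time estimation: take the
# longest interruption, find two consecutive focused events that bracket it,
# and measure how long the user took to re-engage afterwards. Hypothetical.

def estimate_attention_recovery_time(focused_times, interruptions):
    """focused_times: sorted timestamps (s) of focused engagement events.
    interruptions: list of (start, end) timestamps (s) of interruptions."""
    if len(focused_times) < 2 or not interruptions:
        return None
    # "Maximum of the plurality of times of interruption" read as the longest one.
    longest = max(interruptions, key=lambda iv: iv[1] - iv[0])
    # "At least two consecutive focused events" that bracket that interruption.
    for before, after in zip(focused_times, focused_times[1:]):
        if before <= longest[0] and after >= longest[1]:
            # Time taken by the user to re-engage with content after the interruption.
            return after - longest[1]
    return None

# Example: focus at t=10s and t=130s, longest interruption from 40s to 100s
# -> estimated recovery time of 30s before the user re-engaged.
print(estimate_attention_recovery_time([10.0, 130.0], [(40.0, 100.0), (15.0, 20.0)]))
```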
Step 2B: Claims 1 and 12 and their underlying limitations, steps, features and terms, considered both individually and as a whole, do not include additional elements that are sufficient to amount to significantly more than the judicial exception for the following reasons: Independent claims 1 and 12 only recite the following additional elements – …computer-implemented… video images from a camera… an input/output device…; …a plurality of machine-learning models, wherein the machine-learning models… machine-learning models…; …at least one of the machine-learning models…; …to memory accessible by a device… [Claim 1], …at least one processor and memory operably coupled to the at least one processor…; …a data collection module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor… screen images, webcam video…; … a data aggregation module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor…; a reporting module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor…; a graphical monitoring interface comprising instructions stored in the memory that,when executed by the at least one processor, cause the at least one processor…; and a content adjustment module… a Learning Management System (LMS)…; …to memory accessible by a device… [Claim 12]. These elements do not amount to significantly more than the abstract idea for the reasons discussed in 2A prong 2 with regard to MPEP 2106.05(a) and MPEP 2106.05(f). By the failure of the elements to integrate the abstract idea into a practical application there, the additional elements likewise fail to amount to an inventive concept that is significantly more than an abstract idea here, in Step 2B. As such, both individually or in combination, these limitations do not add significantly more to the judicial exception. The remaining dependent claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the dependent claims do not recite any new additional elements other than those mentioned in the independent claims, which amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)). As such, these claims are not patent eligible. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-5 and 8-19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al., U.S. Publication No. 2023/0222932 [hererinafter Zhou] in view of Tunick et al., U.S. Publication No. 2008/0147488 [hereinafter Tunick]. Regarding Claim 1, Zhou discloses …A computer-implemented method for measuring user engagement while interacting with digital content, the method comprising: collecting user data including video images from a camera, input data from an input/output device, application activity data, and system events data (Zhou, ¶ 6, In various embodiments described herein, methods, systems, computing devices, and processor-readable media are disclosed that provide context-aware estimation of student attention in online learning. (discloses measuring user engagement while interacting with digital content) In contrast to existing approaches, which monitor student attention levels throughout an entire class session or presentation, the present disclosure describes embodiments that filter or restrict the time periods in which student attention is monitored or assessed to those time periods in which student attention is important. These time periods of high attention importance may be determined by processing data from the teacher, such as audio data representing the teacher's voice and/or visual presentation data representing slides or other visual material being presented to the students. By limiting attention monitoring to periods in which attention is important, embodiments described herein may achieve a more accurate and relevant measure of student engagement with educational content, which may be a more useful and relevant metric for measuring progress toward the desired objectives (e.g., student success and learning gains) than overall student attention levels during an entire class session), (Id., ¶ 122, In some examples, the presenter 110 presents a lecture showing education material (ex. textbook) to attendees. The presenter's device includes a camera and microphone for capturing video of the presenter's face and/or instructional objects (e.g. a textbook and whiteboard) and audio input of the presenter's voice. The presenter data includes this video data (as visual presentation data) and audio data, as well as presentation interaction data indicating interaction with the education material (e.g. 
pointer movement, drawing, and text input by various input devices of the presenter device such as mouse, stylus, finger touch and keyboard) (discloses input data)), (Id., ¶ 74, the processor 202 may communicate with an input/output (I/O) interfaces 222, which may enable interfacing the one or more input devices 220 (e.g., a keyboard, a mouse, a joystick, trackball, fingerprint detector and the like) and/or output devices 222 (e.g., a printer, peripheral display device, and the like)), (Id., ¶ 121, Presenter interaction data indicates an interaction of the speaker with the visual presentation, and may be derived from one or more other types of presenter data. For example, presenter interaction data may include indications that the presenter is moving a pointer or laser pointer (in captured video), (discloses application activity data) is moving a cursor or mouse pointer (on a computer desktop, e.g., the presenter device), is touching a touch-sensitive user input device of the presenter device with a finger or stylus, is moving a pen or piece of chalk on a writing surface (in captured video), or is inputting text through a text input device of the presenter device. In some examples, the presenter data may include a pre-recorded presentation (including audio and/or video content) having a duration spanning the presentation period and having one or more annotations indicating time periods of the presentation period during which it is important for attendees to pay attention. In some examples, these annotations may indicate an attention importance level for one or more periods of the presentation period, such as high/medium/low attention importance levels, or a continuous scalar value indicating an attention importance level. The attention importance level indicates a degree to which it is important for an attendee 120 to pay attention to the content being presented by the presenter 110, as described in greater detail below with reference to step 306), (Id., ¶ 124, the attention monitoring system, the presenter device, or another device in communication with the attendee device via the network is configured to send at least some of the presenter data to the attendee device (discloses system data). The attendee device is configured to display the visual presentation data of the presenter data on a display, and to play the audio data of the presenter data on a speaker. The attendee device includes a camera for capturing images of the student's head as a sequence of video frames. These video frames are sent to the attention monitoring system via the network as attendee data (discloses video image data)), (Id., ¶ 125, the attendee data may be processed to determine attendee attention levels (according to step 308 below) locally on the attendee device before sending the resulting attention level data to the attention monitoring system, instead of sending the video data to the attention monitoring system and performing step 308 on the attention monitoring system, as described below. In such embodiments, the attendee data may include attention level data as determined at step 308 below. 
In some embodiments, attention level data may be generated by the attendee device at all times, but may only be requested by the attention monitoring system during periods of high attention importance (as determined below at step 306), in response to which request the attendee device would send the attention level data as part of the attendee data); applying a plurality of machine-learning models, wherein the machine-learning models are configured to analyze footage and a dataset for face training and to analyze the collected user data, wherein the plurality of machine-learning models includes models for face detection, emotion detection, focus detection, and user activity detection (Id., ¶ 164, At 718, the attention monitoring system processes the video frame of attendee data to determine a current raw attention level (discloses analyzing footage), a, for the attendee. After determining a, the attention monitoring system sets a value A.sub.k/n=a, indicating a raw attention level for the current (k/n).sup.th sample. Thus, if k=120 at the current frame and n=10, the current sample is the 12th sample, and A.sub.12=a.), (Id., ¶ 167, At 720, the attention monitoring system processes the previous L raw attention levels A.sub.(k/n)−L+1 through A.sub.(k/n) to calculate a smoothed attention level b, and a value B.sub.(k/n) is set to B.sub.(k/n)=b. The processing of the previous L raw attention levels to generate the smoothed attention value b can be based on a mathematical function that may be selected based on a rule-based or machine learning-based algorithm (discloses machine learning model for focus detection), e.g., a rule-based expert system designed based on collected data, or an algorithm trained using machine learning and training data. In some embodiments, the raw attention level are processed by removing the lowest 10% and highest 10% of raw attention levels from the past L raw attention levels, and then calculating a mean of the remaining samples. It will be appreciated that other smoothing functions may be applied to one or more raw attention levels A to calculate the values of b and B.sub.(k/n) in different embodiments), (Id., ¶ 98, The useful attendee video information may include body movement for example, gesture to get attention (raising hands, pointing or the like) (discloses user activity detection), gesture to express agreement or disagreement (e.g. by nodding heads), facial expressions indicative of the attention of the attendees, such as, eye gaze, posture, unintentional body movement, or the like. All such useful attendee video information may be in the series of images received by the video conferencing server 250 which are processed by the video conferencing server 250 to determine various attributes of the attendees 120. The video conferencing server 250 may use the attributes to determine an indication about the attendees 120. An indication may include but are not limited to if one or more attendees 120 want to ask questions, if one or more attendees 120 are attentive and understanding the presenter information 132, if one or more attendees are getting sentimental or emotional (discloses emotion detection) about the presenter information 132, if one or more attendees 120 are laughing and enjoying the presenter information 132, if one or more attendees 120 are not attentive or lost interest in the presenter information 132 or the like), (Id., ¶ 100, In order to determine various attributes (e.g. 
body movement including raising hands, waving hands, pointing hands applauding, facial expression, or the like) associated with the attendees 120 in the attendee information 134, in certain embodiments, the video conferencing server 250 may process the attendee video information 134 (i.e. perform face detection (discloses face detection) and body detection on a series of images included in the attendee information 134) using any suitable computer-vision technique as described below. In certain embodiments, the video conferencing server 250 may be configured to process attendee audio/sound information included in the attendee information 134); associating each user session with corresponding events and event parameters output from at least one of the machine-learning models (Id., ¶ 102, In certain non-limiting embodiments, the video conferencing server 250 may be configured to perform face detection on the attendee video information 134 to detect one or more faces in the attendee video information 134, where each detected face corresponds to one attendee 120 in the attendee video information 134. Based on each face detected in the attendee video information 134, the video conferencing server 250 may generate a bounding box for each respective detected face. Further, the video conferencing server 250 may be configured to perform face recognition on each respective detected face in the attendee video information 134. Performing face recognition for each respective detected face includes monitoring changes in the bounding box generated for a respective face in the attendee video information 134 to determine facial attributes for the respective detected face and analyzing the facial attributes for the respective detected face to infer (i.e. predict) a facial expression, emotion, or attention for the respective detected face. Examples of facial attributes include head pose, face landmark (e.g., forehead, lips, eyes) (discloses event parameters), and eye gaze. Examples of facial expressions inferred (i.e. predicted) for a detected face (i.e. a attendee 120 of the video conference) include laughing, smiling, nodding, (discloses events) examples of attention inferred (i.e. predicted) for a detected face include looking at the attendee display, and examples of emotion inferred (i.e. predicted) for a detected face are having a serious expression), (Id., ¶ 167, At 720, the attention monitoring system processes the previous L raw attention levels A.sub.(k/n)−L+1 through A.sub.(k/n) to calculate a smoothed attention level b, and a value B.sub.(k/n) is set to B.sub.(k/n)=b. The processing of the previous L raw attention levels to generate the smoothed attention value b can be based on a mathematical function that may be selected based on a rule-based or machine learning-based algorithm (discloses machine learning model for focus detection), e.g., a rule-based expert system designed based on collected data, or an algorithm trained using machine learning and training data. In some embodiments, the raw attention level are processed by removing the lowest 10% and highest 10% of raw attention levels from the past L raw attention levels, and then calculating a mean of the remaining samples. 
It will be appreciated that other smoothing functions may be applied to one or more raw attention levels A to calculate the values of b and B.sub.(k/n) in different embodiments); calculating the total time of user focus based on marked events and event parameters for each user session as a focus time, wherein the marked events reflect the total time of user focus (Id., ¶ 178, FIG. 8 is a first user interface screen 800 showing the current attention levels of multiple attendees, as well as an overall current attention level), (Id., ¶ 179, The screen 800 shows a current class attention level indicator 802 indicating an aggregate attention level for a group of multiple attendees, e.g., a class consisting of multiple students. The current class attention level indicator 802 is shown as a circle containing a textual representation 804 of the current class attention level (shown here as "25%", indicating that 25 percent of attendees of the presentation are in an "attentive" state as defined below), as well as a graphical representation 806 of the current class attention level (shown here as a coloured arc along 25% of the circumference of a circle, indicating that 25 percent of attendees of the presentation are in an attentive state). Supplementary text 808 shows additional indicia of current class attention (shown here are the text "2 sleepy", indicating that 2 of the students in the class are in a "sleepy" state as defined below), (Id., ¶ 182, Attention states as shown in the example screen 800 may be identified by categorizing ranges of the attention level of an attendee, e.g., the B value generated by method 700. In this example screen 800, attention level is shown on a scale of 0 to 100, with a top range (e.g., 76-100) being categorized as "attentive", a medium-high range (e.g., 51-75) being categorized as "fair", a medium-low range (e.g., 26-50) being categorized as "distracted", and a bottom range (e.g., 0-25) being categorized as "sleepy". It will be appreciated that various embodiments may categorize or characterize attention levels differently), (Id., Fig. 8; the figure depicts a graph with the total amount of focus time (i.e. "attentive")); assessing changes in user engagement scores throughout each user session using the marked events, the event parameters, and the focus time (Id., ¶ 180, Each attendee is represented by an avatar 810, such as a photo or icon identifying the attendee. The avatar 810 is surrounded by a graphical representation 812 of the attendee's current attention level (e.g., the last B value generated by method 700 for that attendee)), (Id., ¶ 102, Performing face recognition for each respective detected face includes monitoring changes in the bounding box generated for a respective face in the attendee video information 134 to determine facial attributes for the respective detected face and analyzing the facial attributes for the respective detected face to infer (i.e. predict) a facial expression, emotion, or attention for the respective detected face. Examples of facial attributes include head pose, face landmark (e.g., forehead, lips, eyes) (discloses event parameters), and eye gaze. Examples of facial expressions inferred (i.e. predicted) for a detected face (i.e. a attendee 120 of the video conference) include laughing, smiling, nodding, (discloses events) examples of attention inferred (i.e. predicted) for a detected face include looking at the attendee display, and examples of emotion inferred (i.e.
predicted) for a detected face are having a serious expression); populating a focused time array comprising a plurality of times of focused engagement, each of the plurality of times corresponding to at least one corresponding event (Id., ¶ 186, The overall class attention over time shown in graph 908 may be calculated in some embodiments by calculating a mean or other aggregation or averaging function of the attention level (e.g., B) of each attendee. (discloses a focused/distracted time array comprised of a plurality of times corresponding to events/modules) An overall attendee attention level for the presentation, or for an interval of the presentation, may be similarly calculated by calculating a mean or other aggregation or averaging function of the attention levels (e.g., B values) of the attendee over the entire presentation or interval), (Id., ¶ 187, The screen 900 shows a top students area 914 including a list of attendees having a high overall attention level for the presentation relative to the other attendees. Each top student (i.e. attendee having a high overall attention level for the presentation) is shown with his or her avatar 916, name 918, and an indicator 920 of how many times the attendee has been a top student over a period of time such as a semester or over all time to date), (Id., ¶ 188, The screen 900 shows an attention performance by module area 922 including a bar graph showing attention metrics for each of a plurality of modules of the presentation. In this example, each module corresponds to an interval of the presentation as defined above with reference to step 310 of method 300. In this screen 900, the X axis 926 shows five modules (discloses events (i.e. education modules)), each module having a textual identification of overall class attention for the module (e.g., “Good”, “Fair”, or “Poor”). The overall class attention for a module may be calculated based on some combination of the metrics for the module), (Id., ¶ 189, The metrics shown for each module are shown as bars of the bar graph, with height indicating a higher level of that metric as indicated by the Y axis 924 (discloses attentive and distracted/interruption levels at the time of each presentation module), which shows a number of students categorized by that metric. For example, module 1 is shown having “good” performance based on a first metric 928 showing how many students' overall attention performance was “attentive” during module 1, a second metric 930 showing how many students' overall attention performance was “distracted” during module 1, and a third metric 932 showing how many students' overall attention performance was “sleepy” during module 1); populating an interruption time array comprising a plurality of times of interruption in a user's interaction with at least one corresponding event (Id., ¶ 186, The overall class attention over time shown in graph 908 may be calculated in some embodiments by calculating a mean or other aggregation or averaging function of the attention level (e.g., B) of each attendee. 
(discloses a focused/distracted time array comprised of a plurality of times corresponding to events/modules) An overall attendee attention level for the presentation, or for an interval of the presentation, may be similarly calculated by calculating a mean or other aggregation or averaging function of the attention levels (e.g., B values) of the attendee over the entire presentation or interval), (Id., ¶ 187, The screen 900 shows a top students area 914 including a list of attendees having a high overall attention level for the presentation relative to the other attendees. Each top student (i.e. attendee having a high overall attention level for the presentation) is shown with his or her avatar 916, name 918, and an indicator 920 of how many times the attendee has been a top student over a period of time such as a semester or over all time to date), (Id., ¶ 188, The screen 900 shows an attention performance by module area 922 including a bar graph showing attention metrics for each of a plurality of modules of the presentation. In this example, each module corresponds to an interval of the presentation as defined above with reference to step 310 of method 300. In this screen 900, the X axis 926 shows five modules (discloses events (i.e. education modules)), each module having a textual identification of overall class attention for the module (e.g., “Good”, “Fair”, or “Poor”). The overall class attention for a module may be calculated based on some combination of the metrics for the module), (Id., ¶ 189, The metrics shown for each module are shown as bars of the bar graph, with height indicating a higher level of that metric as indicated by the Y axis 924 (discloses attentive and distracted/interruption levels at the time of each presentation module), which shows a number of students categorized by that metric. For example, module 1 is shown having “good” performance based on a first metric 928 showing how many students' overall attention performance was “attentive” during module 1, a second metric 930 showing how many students' overall attention performance was “distracted” during module 1, and a third metric 932 showing how many students' overall attention performance was “sleepy” during module 1); estimating an attention recovery time using the focused time array and the interruption time array and at least two consecutive focused events from the plurality of times of focused engagement in the focused time array and a maximum of the plurality of times of interruption, wherein the attention recovery time quantifies the time taken by the user to re-engage with content after an interruption (Id., Fig. 8. figure depicts an array used to determine the focus time and the attention recovery time between peaks of focus time (i.e. “attentive”)), (Id., ¶ 182, Attention states as shown in the example screen 800 may be identified by categorizing ranges of the attention level of an attendee, e.g., the B value generated by method 700. In this example screen 800, attention level is shown on a scale of 0 to 100, with a top range (e.g., 76-100) being categorized as “attentive”, a medium-high range (e.g., 51-75) being categorized as “fair”, a medium-low range (e.g., 26-50) being categorized as “distracted”, and a bottom range (e.g., 0-25) being categorized as “sleepy”. 
It will be appreciated that various embodiments may categorize or characterize attention levels differently), (Id., ¶ 189, The metrics shown for each module are shown as bars of the bar graph, with height indicating a higher level of that metric as indicated by the Y axis 924 (discloses attentive and distracted/interruption levels at the time of each presentation module), which shows a number of students categorized by that metric. For example, module 1 is shown having “good” performance based on a first metric 928 showing how many students' overall attention performance was “attentive” during module 1, a second metric 930 showing how many students' overall attention performance was “distracted” during module 1, and a third metric 932 showing how many students' overall attention performance was “sleepy” during module 1); While suggested in at least Fig. 1 and related text, Zhou does not explicitly disclose …and adjusting the digital content provided to a particular user based on at least one calculated engagement score, the focus time, and the attention recovery time including by dynamically modifying an organization and a sequence of the digital content for the particular user and providing the adjusted digital content to memory accessible by a device of the particular user, without modifying the digital content for at least one other user. However, Tunick discloses … and adjusting the digital content provided to a particular user based on at least one calculated engagement score, the focus time, and the attention recovery time including by dynamically modifying an organization and a sequence of the digital content for the particular user and providing the adjusted digital content to memory accessible by a device of the particular user, without modifying the digital content for at least one other user (Tunick, ¶ 79, Referring to FIG. 4A, at step 402, the gaze sensing application 302 monitors a viewer's gaze with respect to a selected display. The gaze sensing application 302 may use inputs that are provided by the gaze tracking unit 204 to determine and display coordinates of the viewer's gaze in relation to at least one display. In a preferred embodiment, the gaze sensing application 302 uses the gaze coordinates to determine the exact angle of the viewer's gaze in relation to one of the displays. At step 404, the gaze sensing application 302 detects the viewer's eyes shifting toward at least one selected display. The gaze sensing application 302 may be configured to detect a viewer's eyes shifting toward a selected advertisement, or a portion thereof, shown on the display, for example. Alternatively, the gaze sensing application 302 may be configured to detect the viewer's eyes shifting away from one or more displays. Also, events other than the viewer's gaze shifting toward from the screen or a portion thereof may be detected, and could trigger the steps of the method described below), (Id., ¶ 83, In an alternative example, the report generating application 304 may initiate a process of alerting a client, an operator or an administrator upon detecting that the viewer's gaze has shifted toward the display or to one or more advertisements being displayed on the display. Alternatively, the report generating application 304 may enhance, enlarge, or change colors of all or some advertisements or reports not being viewed by the viewer. 
(discloses adjusting the organization and sequence digital content based on user engagement) Further, the report generating application 304 may reorganize the ads and other content being displayed on the display, or may cover some or all ads not being viewed by a viewer with some other content (discloses reorganizing digital content for some ads and users). Also, the process of alerting an administrator could include providing email alerts, mobile device alerts, and other types of alerts. The message content or the type of the alert used may depend on data not being viewed by a viewer at the display or portions of the display. Also, it should be understood that the process of alerting an administrator may be initiated at the time when the viewer shifts his attention toward the display or the ad, or at some other time, such as upon detecting an alert triggering condition along with the viewer's attention being toward a display or an advertisement. For example, an administrator may be alerted at specific times in a video sequence), (Id., ¶ 104, In one example, a data object associated with a face in a video image comprises fields or components corresponding to one or more of the following features, without limitation: (1) the center of the face in image coordinates; (2) a unique sequential identifier of the face; (3) an indicator of the time (or video frame) in which the face first appeared; (4) a number of (video) frames in which the face has been found (referred to as the "Foundframes" parameter); (5) a number of frames in which the face is not found since the face first appeared (referred to as the "Unfoundframes" parameter, for example)); (6) coordinates defining a rectangle containing the face; (7) a flag indicating whether or not the face has appeared in a previous frame; and/or (8) a flag indicating whether or not the face is considered a person, or an "impression" (referred to as the "Person" parameter, for example), (Id., ¶ 50, The gaze tracking unit 204 and/or the client terminal 221 may also be capable of detecting, determining and/or monitoring other data relating to an impression. In some instances the client terminal 221 analyzes images detected by the gaze tracking unit 204 and obtains or determines additional information relating to a viewer or the viewer's actions. Thus, impression data may comprise various types of information relating to an advertisement and the viewers thereof. For example, impression data may comprise demographic data--a viewer's age, race or ethnicity, and/or gender, and/or information concerning the average age of all viewers, and information showing the distribution of viewers by race, ethnicity or gender. Impression data may also comprise information relating to a viewer's facial expressions and/or emotions, information relating to a viewer's voice (including an analysis of words spoken by the viewer), and/or information concerning a viewer's gestures, including how the viewer moves and interacts with the display. Impression data may further include information indicating repeat viewer tracking (if the same person or people have appeared multiple times before a single display or before different, selected displays in a store). Impression data may additionally include information about a viewer's use of cellphones or other mobile devices. 
For example, the impression data may include data indicating whether and how often a viewer made any phone calls, used Bluetooth, used text messaging, etc.); It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou to include the content adjustment elements of Tunick in the analogous art of monitoring viewer attention. The motivation for doing so would have been to provide “improved systems and methods enabling advertisers to obtain and analyze information concerning the effectiveness of actual viewing…” (Tunick, ¶ 16), wherein such improvements would benefit Zhou’s method which seeks to “not only improve the quality and effectiveness of attention estimation as described above, but may also save machine power used to process the data” [Tunick, ¶ 16; Zhou, ¶ 7]. Regarding Claim 2, the combination of Zhou and Tunick discloses …the method of claim 1… While suggested in at least Fig. 1 and related text, Zhou does not explicitly disclose …further comprising generating personalized content recommendations based on the engagement scores and the focus time. However, Tunick discloses …further comprising generating personalized content recommendations based on the engagement scores and the focus time (Tunick, ¶ 89, Other information and analyses may be included in a report, as well. Analyses may be automatically generated, or generated manually by human operators. For example, various analyses and graphs may be provided in a report showing how advertisers and/or venue owners may act upon the data to improve sales, product awareness, etc. A report may also include information showing which advertisements among a group of advertisements are most successful. A report may indicate the age, ethnicity and/or gender distributions of the viewers over a selected time period and changes in the distribution over time. Information showing correlations between impression data to purchase data, to customer loyalty data, or to any other desired data set, may also be included in a report. Any of such information may be expressed within a report in the form of textual description or in the form of multi-dimensional graphs, charts and other illustrations. A report may additionally indicate or suggest strategies to capitalize on the impression data--for example, if the impression data indicates a large number of viewers of a very young age are proximate to the advertising location, the display should display or play an advertisement for a toy (discloses content recommendations based on engagement and focus)). It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou to include the content adjustment elements of Tunick in the analogous art of monitoring viewer attention for the same reasons as stated for claim 1. Regarding Claim 3, the combination of Zhou and Tunick discloses …the method of claim 1… While suggested in at least Fig. 1 and related text, Zhou does not explicitly disclose …wherein adjusting the digital content further comprises altering the pacing of content delivery to match a certain engagement score.
However, Tunick discloses …wherein adjusting the digital content further comprises altering the pacing of content delivery to match a certain engagement score (Tunick, ¶ 88, The reporting mechanism may be tailored to user requirements in specific systems and implementations to retrieve whatever data is relevant for their analysis. In one example, this may include reporting any or all of the impression data discussed above, including the number of unique "looks" or impressions, the duration of these impressions, their start and stop times for coordination with content exhibition, demographic data, and/or any other data or metadata retrieved through processing the relevant data structures or the addition of structures to capture available information that might also be useful. This data may be recorded in a report generated in a format selected based on user requests. In one example, a report may be generated in HTML, and a report may be made accessible through any number of mechanisms, on-line or off-line, such as permanent or dial-up internet or modem connection, writing files to removable media such as CD-ROM, or displaying on-screen at any time a user requests or examining them remotely using a standard web-browser or mobile device. (discloses content pacing and delivery based on engagement and focus)). It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou to include the content adjustment elements of Tunick in the analogous art of monitoring viewer attention for the same reasons as stated for claim 1. Regarding Claim 4, the combination of Zhou and Tunick discloses …the method of claim 1… While suggested in at least Fig. 1 and related text, Zhou does not explicitly disclose …wherein adjusting the digital content comprises modifying the format of content presentation to align with preferences. However, Tunick discloses …wherein adjusting the digital content comprises modifying the format of content presentation to align with preferences (Tunick, ¶ 4, In the realm of web-based advertising, metrics such as click tracking use electronic media to track user interest in certain ads, allowing website owners to sell ad space based on the "pay-per-click" business model. Software programs record exactly which ads people click on, gathering information about viewer preferences. Once given the data, advertisers can choose to continue or modify their advertisements), (Id., ¶ 83, In an alternative example, the report generating application 304 may initiate a process of alerting a client, an operator or an administrator upon detecting that the viewer's gaze has shifted toward the display or to one or more advertisements being displayed on the display. Alternatively, the report generating application 304 may enhance, enlarge, or change colors of all or some advertisements or reports not being viewed by the viewer. (discloses adjusting digital content format) Further, the report generating application 304 may reorganize the ads and other content being displayed on the display, or may cover some or all ads not being viewed by a viewer with some other content. Also, the process of alerting an administrator could include providing email alerts, mobile device alerts, and other types of alerts. The message content or the type of the alert used may depend on data not being viewed by a viewer at the display or portions of the display. 
Also, it should be understood that the process of alerting an administrator may be initiated at the time when the viewer shifts his attention toward the display or the ad, or at some other time, such as upon detecting an alert triggering condition along with the viewer's attention being toward a display or an advertisement. For example, an administrator may be alerted at specific times in a video sequence). It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou to include the content adjustment elements of Tunick in the analogous art of monitoring viewer attention for the same reasons as stated for claim 1. Regarding Claim 5, the combination of Zhou and Tunick discloses …the method of claim 1… While suggested in at least Fig. 1 and related text, Zhou does not explicitly disclose …wherein adjusting the digital content comprises incorporating interactive elements, quizzes, or gamification to increase engagement scores. However, Tunick discloses …wherein adjusting the digital content comprises incorporating interactive elements, quizzes, or gamification to increase engagement scores (Tunick, ¶ 4, In the realm of web-based advertising, metrics such as click tracking use electronic media to track user interest in certain ads, allowing website owners to sell ad space based on the "pay-per-click" business model. Software programs record exactly which ads people click on, gathering information about viewer preferences. Once given the data, advertisers can choose to continue or modify their advertisements), (Id., ¶ 83, In an alternative example, the report generating application 304 may initiate a process of alerting a client, an operator or an administrator upon detecting that the viewer's gaze has shifted toward the display or to one or more advertisements being displayed on the display. Alternatively, the report generating application 304 may enhance, enlarge, or change colors of all or some advertisements or reports not being viewed by the viewer. (discloses adjusting digital content format) Further, the report generating application 304 may reorganize the ads and other content being displayed on the display, or may cover some or all ads not being viewed by a viewer with some other content. Also, the process of alerting an administrator could include providing email alerts, mobile device alerts, and other types of alerts. The message content or the type of the alert used may depend on data not being viewed by a viewer at the display or portions of the display. Also, it should be understood that the process of alerting an administrator may be initiated at the time when the viewer shifts his attention toward the display or the ad, or at some other time, such as upon detecting an alert triggering condition along with the viewer's attention being toward a display or an advertisement. For example, an administrator may be alerted at specific times in a video sequence), (Id., ¶ 40, In an example of one embodiment of the invention, a method to monitor a viewer's attention with respect to a display, such as an advertisement, and to charge a party, such as an advertiser, an amount based on the viewer's activity, is provided. Thus, in one example, at least one impression by a person with respect to a display is detected, where the at least one impression includes at least one instance when the person's gaze is directed toward the display. 
Information concerning the at least one impression is recorded, and a party associated with the display is charged an amount determined based at least in part on the recorded information. In other examples, an impression may include one or more viewer actions, such as talking, smiling, laughing, gestures, interaction with the display, etc. (discloses interaction-based digital content elements)). It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou to include the content adjustment elements of Tunick in the analogous art of monitoring viewer attention for the same reasons as stated for claim 1. Regarding Claim 8, the combination of Zhou and Tunick discloses …the method of claim 1… Zhou further discloses …further comprising generating at least one report summarizing engagement metrics, wherein the engagement metrics includes a set of quantifiable measurements related to user interaction and involvement with digital content, including user focus duration, emotional responses, or activity patterns (Zhou, ¶ 43, FIG. 9 is a second user interface screen of a presenter device showing the attention levels of multiple attendees over the course of an entire presentation, broken down by interval, as well as a list of top attendees for the presentation based on each attendee's overall attention levels during the presentation, in accordance with various embodiments of the present disclosure), (Id., ¶ 184, The screen 900 shows an overall class performance area 902 including a textual and graphical representation 904 of overall class performance (shown here as the text “48% attentive” and a coloured arc around 48% around the circumference of a circle, both indicating that for the entire presentation period 48% of the attendees had an overall attention level of “attentive”). Supplementary text 906 shows additional indicia of current class attention or performance during the presentation (shown here are the text “14 know/6 don't”, indicating that 14 students knew the answers to a problem posed during the presentation and 6 did not know the answer), (Id., ¶ 185, The screen 900 shows an attention performance by time area 907 including a graph 908 of overall class attention over the course of the presentation period. The X axis 912 of the graph 908 is time, spanning the presentation period (shown here are approximately 50 minutes). The Y axis 910 of the graph 908 is an overall class attention level ranging from 0 to 100). [Embedded image omitted: media_image2.png, greyscale] Regarding Claim 9, the combination of Zhou and Tunick discloses …the method of claim 1… While suggested in at least Fig. 1 and related text, Zhou does not explicitly disclose …wherein the machine-learning models are continuously updated and improved based on newly collected user data to enhance detection accuracy. However, Tunick discloses …wherein the machine-learning models are continuously updated and improved based on newly collected user data to enhance detection accuracy (Tunick, ¶ 6, In the field of machine learning there exist a number of mathematical techniques which, when applied on a dataset of images, can yield object or feature detection and recognition in varying time-frames and with different degrees of reliability), (Id., ¶ 7, A linear classifier is the simplest technique for feature detection, or classification.
It is a computed mathematical model of a decision (yes/no in the simplest case, although more complex decisions are common) created using techniques of linear algebra and statistical analysis…), (Id., ¶ 8 (source: Wikipedia, "Linear Classifier") where f is a linear discriminant function that converts the dot-product of the real vectors w and x into the correct output y. The vector w in this case is a vector of "weights" which can be "learned" by the classifier through an update procedure, and the vector x is a vector of features for which classification is required. There is also often the addition of a "bias" which is typically represented by w₀ that can also be learned, so that y=w₀ when the dot-product itself is equal to 0. The weights determine how the features are divided into the yes/no categories in a "linearly separable" space (a space which can be divided into regions representing yes and no)), (Id., ¶ 9, If it is presumed that all the possible differences in a dataset (an image in this case) can be represented by a number of N-dimensional x vectors then it is possible to create a "decision-surface" or N-1 dimensional hyperplane that divides the N-dimensional space into the yes/no categories. Linear classifiers learn this decision-making ability through a "training" procedure in which a number of correct and incorrect examples are introduced and labeled accordingly. A mathematical mean regression (linear in this example, although logarithmic is also used) or some other learning procedure is applied to the variable in question, typically the weights and bias, until a reasonable amount of error is obtained (discloses continuously updating and improving a machine learning model to enhance accuracy)). It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou to include the machine learning elements of Tunick in the analogous art of monitoring viewer attention for the same reasons as stated for claim 1. Regarding Claim 10, the combination of Zhou and Tunick discloses …the method of claim 1… While suggested in at least Fig. 1 and related text, Zhou does not explicitly disclose …wherein adjusting the digital content is performed in real-time based on a current engagement level during a session, where the current engagement level during a session is a dynamically assessed degree of user involvement and interaction with the digital content. However, Tunick discloses …wherein adjusting the digital content is performed in real-time based on a current engagement level during a session, where the current engagement level during a session is a dynamically assessed degree of user involvement and interaction with the digital content (Tunick, ¶ 83, In an alternative example, the report generating application 304 may initiate a process of alerting a client, an operator or an administrator upon detecting that the viewer's gaze has shifted toward the display or to one or more advertisements being displayed on the display. Alternatively, the report generating application 304 may enhance, enlarge, or change colors of all or some advertisements or reports not being viewed by the viewer. (discloses adjusting digital content format) Further, the report generating application 304 may reorganize the ads and other content being displayed on the display, or may cover some or all ads not being viewed by a viewer with some other content.
Also, the process of alerting an administrator could include providing email alerts, mobile device alerts, and other types of alerts. The message content or the type of the alert used may depend on data not being viewed by a viewer at the display or portions of the display. Also, it should be understood that the process of alerting an administrator may be initiated at the time when the viewer shifts his attention toward the display or the ad, or at some other time, such as upon detecting an alert triggering condition along with the viewer's attention being toward a display or an advertisement. For example, an administrator may be alerted at specific times in a video sequence), (Id., ¶ 40, In an example of one embodiment of the invention, a method to monitor a viewer's attention with respect to a display, such as an advertisement, and to charge a party, such as an advertiser, an amount based on the viewer's activity, is provided. Thus, in one example, at least one impression by a person with respect to a display is detected, where the at least one impression includes at least one instance when the person's gaze is directed toward the display. Information concerning the at least one impression is recorded, and a party associated with the display is charged an amount determined based at least in part on the recorded information. In other examples, an impression may include one or more viewer actions, such as talking, smiling, laughing, gestures, interaction with the display, etc. (discloses interaction-based digital content elements)). It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou to include the content adjustment elements of Tunick in the analogous art of monitoring viewer attention for the same reasons as stated for claim 1. Regarding Claim 11, the combination of Zhou and Tunick discloses …the method of claim 1… Zhou further discloses …further wherein the user data includes physiological data, and the machine-learning models for emotion detection utilize physiological signals to assess emotional states (Zhou, ¶ 5, A number of systems and methods of estimating students' attention or engagement have been proposed. For example, US Patent Application Publication No. 2015/0099255A1, entitled “ADAPTIVE LEARNING ENVIRONMENT DRIVEN BY REAL-TIME IDENTIFICATION OF ENGAGEMENT LEVEL”, proposes an online education system which estimates students' engagement based on facial motion capture, eye tracking, speech recognition and gesture or posture, and reports the summarized estimation results. A similar approach is described in Zaletelj, J., & Koŝir, A. (2017), Predicting students' attention in the classroom from Kinect facial and body features, EURASIP journal on image and video processing, 2017(1), 80, estimating students' attention in the classroom using a Microsoft™ Kinect™ camera system to detect facial and body features of students. A third similar approach is described in Monkaresi, H., Bosch, N., Calvo, R. A., & D'Mello, S. K. (2016), Automated detection of engagement using video-based estimation of facial expressions and heart rate, IEEE Transactions on Affective Computing, 8(1), 15-28. 
In each of these existing approaches, student attention is monitored throughout a session (e.g., a class session or presentation by a teacher) to determine a student's attention or engagement level), (Id., ¶ 102, Performing face recognition for each respective detected face includes monitoring changes in the bounding box generated for a respective face in the attendee video information 134 to determine facial attributes for the respective detected face and analyzing the facial attributes for the respective detected face to infer (i.e. predict) a facial expression, emotion, or attention for the respective detected face. Examples of facial attributes include head pose, face landmark (e.g., forehead, lips, eyes), and eye gaze. Examples of facial expressions inferred (i.e. predicted) for a detected face (i.e. a attendee 120 of the video conference) include laughing, smiling, nodding, examples of attention inferred (i.e. predicted) for a detected face include looking at the attendee display, and examples of emotion inferred (i.e. predicted) for a detected face are having a serious expression). Regarding Claim 12, Zhou discloses …A system for measuring user engagement while interacting with digital content, comprising: at least one processor and memory operably coupled to the at least one processor (Zhou, ¶ 69, The processor 202 is configured to communicate with the storage unit 204, which may include a mass storage unit such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive. The processor 202 is also configured to communicate with the memory(ies) 206, which may include volatile memory (e.g. random access memory (RAM)) and non-volatile or non-transitory memory (e.g., a flash memory, magnetic storage, and/or a read-only memory (ROM)). The non-transitory memory(ies) store applications or programs that include software instructions for execution by the processor 202, such as to carry out examples described in the present disclosure. The non-transitory memory store a video conferencing application as described in further detail below. Examples of non-transitory computer readable media include a…), (Id., ¶ 70, RAM, a ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, a CD-ROM, or other portable memory storage); a data collection module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor to capture user interactions, including screen images, webcam video, device inputs, application activities, and system events as collected data (Zhou, ¶ 6, In various embodiments described herein, methods, systems, computing devices, and processor-readable media are disclosed that provide context-aware estimation of student attention in online learning. (discloses measuring user engagement while interacting with digital content) In contrast to existing approaches, which monitor student attention levels throughout an entire class session or presentation, the present disclosure describes embodiments that filter or restrict the time periods in which student attention is monitored or assessed to those time periods in which student attention is important. These time periods of high attention importance may be determined by processing data from the teacher, such as audio data representing the teacher's voice and/or visual presentation data representing slides or other visual material being presented to the students. 
By limiting attention monitoring to periods in which attention is important, embodiments described herein may achieve a more accurate and relevant measure of student engagement with educational content, which may be a more useful and relevant metric for measuring progress toward the desired objectives (e.g., student success and learning gains) than overall student attention levels during an entire class session), (Id., ¶ 122, In some examples, the presenter 110 presents a lecture showing education material (ex. textbook) to attendees. The presenter's device includes a camera and microphone for capturing video of the presenter's face and/or instructional objects (e.g. a textbook and whiteboard) and audio input of the presenter's voice. The presenter data includes this video data (as visual presentation data) and audio data, as well as presentation interaction data indicating interaction with the education material (e.g. pointer movement, drawing, and text input by various input devices of the presenter device such as mouse, stylus, finger touch and keyboard) (discloses input data)), (Id., ¶ 74, the processor 202 may communicate with an input/output (I/O) interfaces 222, which may enable interfacing the one or more input devices 220 (e.g., a keyboard, a mouse, a joystick, trackball, fingerprint detector and the like) and/or output devices 222 (e.g., a printer, peripheral display device, and the like)), (Id., ¶ 121, Presenter interaction data indicates an interaction of the speaker with the visual presentation, and may be derived from one or more other types of presenter data. For example, presenter interaction data may include indications that the presenter is moving a pointer or laser pointer (in captured video), (discloses application activity data) is moving a cursor or mouse pointer (on a computer desktop, e.g., the presenter device), is touching a touch-sensitive user input device of the presenter device with a finger or stylus, is moving a pen or piece of chalk on a writing surface (in captured video), or is inputting text through a text input device of the presenter device. In some examples, the presenter data may include a pre-recorded presentation (including audio and/or video content) having a duration spanning the presentation period and having one or more annotations indicating time periods of the presentation period during which it is important for attendees to pay attention. In some examples, these annotations may indicate an attention importance level for one or more periods of the presentation period, such as high/medium/low attention importance levels, or a continuous scalar value indicating an attention importance level. The attention importance level indicates a degree to which it is important for an attendee 120 to pay attention to the content being presented by the presenter 110, as described in greater detail below with reference to step 306), (Id., ¶ 124, the attention monitoring system, (discloses data collection module) the presenter device, or another device in communication with the attendee device via the network is configured to send at least some of the presenter data to the attendee device (discloses system data). The attendee device is configured to display the visual presentation data of the presenter data on a display, and to play the audio data of the presenter data on a speaker. The attendee device includes a camera for capturing images of the student's head as a sequence of video frames. 
These video frames are sent to the attention monitoring system via the network as attendee data (discloses video image data)), (Id., ¶ 69, The processor 202 is configured to communicate with the storage unit 204, which may include a mass storage unit such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive. The processor 202 is also configured to communicate with the memory(ies) 206, which may include volatile memory (e.g. random access memory (RAM)) and non-volatile or non-transitory memory (e.g., a flash memory, magnetic storage, and/or a read-only memory (ROM)). The non-transitory memory(ies) store applications or programs that include software instructions for execution by the processor 202, such as to carry out examples described in the present disclosure. The non-transitory memory store a video conferencing application as described in further detail below. Examples of non-transitory computer readable media include a…), (Id., ¶ 70, RAM, a ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, a CD-ROM, or other portable memory storage), (Id., ¶ 125, the attendee data may be processed to determine attendee attention levels (according to step 308 below) locally on the attendee device before sending the resulting attention level data to the attention monitoring system, instead of sending the video data to the attention monitoring system and performing step 308 on the attention monitoring system, as described below. In such embodiments, the attendee data may include attention level data as determined at step 308 below. In some embodiments, attention level data may be generated by the attendee device at all times, but may only be requested by the attention monitoring system during periods of high attention importance (as determined below at step 306), in response to which request the attendee device would send the attention level data as part of the attendee data), (Id., ¶ 87, presenter information 132 may include but not limited to a live video of the presenter 110 captured by the camera 216 associated with the presenter client device 112 (hereinafter referred to as presenter camera 216), audio/sound at the presenter 110 side captured by the microphone 212 associated with the presenter client device 112 (hereinafter referred to as presenter microphone 212), content (e.g. a slide of an MS PowerPoint™ presentation, a page of MS Word™ document, videos, images (discloses screen images), pictures or the like) displayed in graphic user interface (GUI) 130 associated with the video conferencing system software on the display device 208 associated with the presenter client device 112 (hereinafter referred to as presenter display 208); an engagement analysis service configured to process the collected data and generate raw engagement events by correlating and filtering collected data (Id., ¶ 111, In certain non-limiting embodiments, the video conferencing server 250 may be configured to filter out the attendee video information 134 acting as noise (discloses filtering collected data) in the attendee video information 134. 
By way of example, if one or more attendees 120 are eating and/or drinking, someone is moving around or someone crossed behind one or more attendees 120, one or more attendees are traveling and moving background is captured by one or more attendees cameras 216, such portion of the attendee video information may not provide any useful information that may be directly or indirectly related to the ongoing video conference. The video conferencing server 250 may be configured to remove such portion of the attendee video information 134), (Id., ¶ 112, In certain non-limiting embodiments, the video conferencing server 250 may be configured to process attendee audio/sound information present in the attendee information 134. In some of the non-exhaustive examples, the video conferencing server 250 may analyze the attendee audio/sound information to determine if the attendees 120 are applauding or one or more of the attendees 120 are asking questions. In certain non-limiting embodiments, the video conferencing server 250 may be configured to filter out some of attendee audio/sound information that is acting as noise in the attendee information 134. For example, the video conferencing server 250 may filter out a part of the of attendee audio/sound information including coughing, sneezing, baby crying, dog barking, traffic sound, playing music/TV in the background, table knock, phone ringing, talking to someone else or any such sound associated with one or more attendees 120 or generated in the surrounding environment of one or more attendees 120 which may not be directly related to the ongoing video conference), (Id., ¶ 102, In certain non-limiting embodiments, the video conferencing server 250 may be configured to perform face detection on the attendee video information 134 to detect one or more faces in the attendee video information 134, where each detected face corresponds to one attendee 120 in the attendee video information 134. Based on each face detected in the attendee video information 134, the video conferencing server 250 may generate a bounding box for each respective detected face. Further, the video conferencing server 250 may be configured to perform face recognition on each respective detected face in the attendee video information 134. Performing face recognition for each respective detected face includes monitoring changes in the bounding box generated for a respective face in the attendee video information 134 to determine facial attributes for the respective detected face and analyzing the facial attributes for the respective detected face to infer (i.e. predict) a facial expression, emotion, or attention for the respective detected face. Examples of facial attributes include head pose, face landmark (e.g., forehead, lips, eyes) (discloses event parameters), and eye gaze. Examples of facial expressions inferred (i.e. predicted) for a detected face (i.e. a attendee 120 of the video conference) include laughing, smiling, nodding, (discloses engagement events) examples of attention inferred (i.e. predicted) for a detected face include looking at the attendee display, and examples of emotion inferred (i.e. 
predicted) for a detected face are having a serious expression); a data aggregation module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor to aggregate the raw engagement events, and generate an individual client metric and a group metric as aggregated data (Id., ¶ 114, Once, the attendee information 134 is processed, the video conferencing server 250 (discloses data aggregation module) may be configured to aggregate the processed attendee information 134. By way of non-exhaustive examples, during the ongoing video conference, in response to the presenter 110 presenting the presenter information 132, the attendees 120 may applaud. In another example, in response to the presenter 110 presenting the presenter information 132, one or more of the attendees 120 may raise their hands or wave their hands to ask questions. In process of aggregating the processed attendee information 134, the video conferencing server 250 may keep a record of a type facial expressions or body movements of the attendees 120. Such record may include but not limited to a number of attendees 120 applauded, a number of attendees 120 raised their hands along with which particular attendees 120 have raised their hands and the like), (Id., ¶ 69, The processor 202 is configured to communicate with the storage unit 204, which may include a mass storage unit such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive. The processor 202 is also configured to communicate with the memory(ies) 206, which may include volatile memory (e.g. random access memory (RAM)) and non-volatile or non-transitory memory (e.g., a flash memory, magnetic storage, and/or a read-only memory (ROM)). The non-transitory memory(ies) store applications or programs that include software instructions for execution by the processor 202, such as to carry out examples described in the present disclosure. The non-transitory memory store a video conferencing application as described in further detail below. Examples of non-transitory computer readable media include a…), (Id., ¶ 70, RAM, a ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, a CD-ROM, or other portable memory storage); populate a focused time array comprising a plurality of times of focused engagement, each of the plurality of times corresponding to at least one corresponding event (Id., ¶ 186, The overall class attention over time shown in graph 908 may be calculated in some embodiments by calculating a mean or other aggregation or averaging function of the attention level (e.g., B) of each attendee. (discloses a focused/distracted time array comprised of a plurality of times corresponding to events/modules) An overall attendee attention level for the presentation, or for an interval of the presentation, may be similarly calculated by calculating a mean or other aggregation or averaging function of the attention levels (e.g., B values) of the attendee over the entire presentation or interval), (Id., ¶ 187, The screen 900 shows a top students area 914 including a list of attendees having a high overall attention level for the presentation relative to the other attendees. Each top student (i.e. 
attendee having a high overall attention level for the presentation) is shown with his or her avatar 916, name 918, and an indicator 920 of how many times the attendee has been a top student over a period of time such as a semester or over all time to date), (Id., ¶ 188, The screen 900 shows an attention performance by module area 922 including a bar graph showing attention metrics for each of a plurality of modules of the presentation. In this example, each module corresponds to an interval of the presentation as defined above with reference to step 310 of method 300. In this screen 900, the X axis 926 shows five modules (discloses events (i.e. education modules)), each module having a textual identification of overall class attention for the module (e.g., “Good”, “Fair”, or “Poor”). The overall class attention for a module may be calculated based on some combination of the metrics for the module), (Id., ¶ 189, The metrics shown for each module are shown as bars of the bar graph, with height indicating a higher level of that metric as indicated by the Y axis 924 (discloses attentive and distracted/interruption levels at the time of each presentation module), which shows a number of students categorized by that metric. For example, module 1 is shown having “good” performance based on a first metric 928 showing how many students' overall attention performance was “attentive” during module 1, a second metric 930 showing how many students' overall attention performance was “distracted” during module 1, and a third metric 932 showing how many students' overall attention performance was “sleepy” during module 1); populate an interruption time array comprising a plurality of times of interruption in a user's interaction with at least one corresponding event (Id., ¶ 186, The overall class attention over time shown in graph 908 may be calculated in some embodiments by calculating a mean or other aggregation or averaging function of the attention level (e.g., B) of each attendee. (discloses a focused/distracted time array comprised of a plurality of times corresponding to events/modules) An overall attendee attention level for the presentation, or for an interval of the presentation, may be similarly calculated by calculating a mean or other aggregation or averaging function of the attention levels (e.g., B values) of the attendee over the entire presentation or interval), (Id., ¶ 187, The screen 900 shows a top students area 914 including a list of attendees having a high overall attention level for the presentation relative to the other attendees. Each top student (i.e. attendee having a high overall attention level for the presentation) is shown with his or her avatar 916, name 918, and an indicator 920 of how many times the attendee has been a top student over a period of time such as a semester or over all time to date), (Id., ¶ 188, The screen 900 shows an attention performance by module area 922 including a bar graph showing attention metrics for each of a plurality of modules of the presentation. In this example, each module corresponds to an interval of the presentation as defined above with reference to step 310 of method 300. In this screen 900, the X axis 926 shows five modules (discloses events (i.e. education modules)), each module having a textual identification of overall class attention for the module (e.g., “Good”, “Fair”, or “Poor”). 
The overall class attention for a module may be calculated based on some combination of the metrics for the module), (Id., ¶ 189, The metrics shown for each module are shown as bars of the bar graph, with height indicating a higher level of that metric as indicated by the Y axis 924 (discloses attentive and distracted/interruption levels at the time of each presentation module), which shows a number of students categorized by that metric. For example, module 1 is shown having “good” performance based on a first metric 928 showing how many students' overall attention performance was “attentive” during module 1, a second metric 930 showing how many students' overall attention performance was “distracted” during module 1, and a third metric 932 showing how many students' overall attention performance was “sleepy” during module 1); estimate an attention recovery time using the focused time array and the interruption time array and at least two consecutive focused events from the plurality of times of focused engagement in the focused time array and a maximum of the plurality of times of interruption, wherein the attention recovery time quantifies the time taken by the user to re-engage with content after an interruption (Id., Fig. 8; the figure depicts an array used to determine the focus time and the attention recovery time between peaks of focus time (i.e. “attentive”)), (Id., ¶ 182, Attention states as shown in the example screen 800 may be identified by categorizing ranges of the attention level of an attendee, e.g., the B value generated by method 700. In this example screen 800, attention level is shown on a scale of 0 to 100, with a top range (e.g., 76-100) being categorized as “attentive”, a medium-high range (e.g., 51-75) being categorized as “fair”, a medium-low range (e.g., 26-50) being categorized as “distracted”, and a bottom range (e.g., 0-25) being categorized as “sleepy”. It will be appreciated that various embodiments may categorize or characterize attention levels differently), (Id., ¶ 189, The metrics shown for each module are shown as bars of the bar graph, with height indicating a higher level of that metric as indicated by the Y axis 924 (discloses attentive and distracted/interruption levels at the time of each presentation module), which shows a number of students categorized by that metric. For example, module 1 is shown having “good” performance based on a first metric 928 showing how many students' overall attention performance was “attentive” during module 1, a second metric 930 showing how many students' overall attention performance was “distracted” during module 1, and a third metric 932 showing how many students' overall attention performance was “sleepy” during module 1); a reporting module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor to receive the aggregated data and provide a visual representation of engagement data over time using a graphical monitoring interface (Id., ¶ 178, FIG. 8 is a first user interface screen 800 showing the current attention levels of multiple attendees, as well as an overall current attention level), (Id., ¶ 179, The screen 800 shows a current class attention level indicator 802 indicating an aggregate attention level for a group of multiple attendees, e.g., a class consisting of multiple students.
The current class attention level indicator 802 is shown as a circle containing a textual representation 804 of the current class attention level (shown here as “25%”, indicating that 25 percent of attendees of the presentation are in an “attentive” state as defined below), as well as a graphical representation 806 of the current class attention level (shown here as a coloured arc along 25% of the circumference of a circle, indicating that 25 percent of attendees of the presentation are in an attentive state). Supplementary text 808 shows additional indicia of current class attention (shown here are the text “2 sleepy”, indicating that 2 of the students in the class are in a “sleepy” state as defined below), (Id., ¶ 182, Attention states as shown in the example screen 800 may be identified by categorizing ranges of the attention level of an attendee, e.g., the B value generated by method 700. In this example screen 800, attention level is shown on a scale of 0 to 100, with a top range (e.g., 76-100) being categorized as “attentive”, a medium-high range (e.g., 51-75) being categorized as “fair”, a medium-low range (e.g., 26-50) being categorized as “distracted”, and a bottom range (e.g., 0-25) being categorized as “sleepy”. It will be appreciated that various embodiments may categorize or characterize attention levels differently), (Id., Fig. 8; the figure depicts a graph with the total amount of focus time (i.e. “attentive”)), (Id., ¶ 69, The processor 202 is configured to communicate with the storage unit 204, which may include a mass storage unit such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive. The processor 202 is also configured to communicate with the memory(ies) 206, which may include volatile memory (e.g. random access memory (RAM)) and non-volatile or non-transitory memory (e.g., a flash memory, magnetic storage, and/or a read-only memory (ROM)). The non-transitory memory(ies) store applications or programs that include software instructions for execution by the processor 202, such as to carry out examples described in the present disclosure. The non-transitory memory store a video conferencing application as described in further detail below. Examples of non-transitory computer readable media include a…), (Id., ¶ 70, RAM, a ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, a CD-ROM, or other portable memory storage); [Embedded image omitted: media_image1.png, greyscale] … a Learning Management System (LMS)… (Zhou, ¶ 34, FIG. 1 is a schematic diagram of a video conferencing system for online learning (discloses learning management system), suitable for implementing various embodiments of the present disclosure). While suggested in at least Fig. 1 and related text, Zhou does not explicitly disclose …and a content adjustment module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor to collaborate with … to dynamically modify the digital content based on the engagement data including by dynamically modifying an organization and a sequence of the digital content for the particular user and providing the adjusted digital content to memory accessible by a device of the particular user, without modifying the digital content for at least one other user.
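For orientation on the focused time array, interruption time array, and attention recovery time recited in claim 12 above, the following is a minimal sketch of one plausible reading of that limitation. The function name, the representation of the arrays as lists of timestamps in seconds, and the recovery heuristic are illustrative assumptions only; they are not drawn from the application's specification, Zhou, or Tunick, and they do not reflect the examiner's or the applicant's claim construction.

    # Minimal sketch (assumptions): focused_times and interruption_times hold
    # timestamps (in seconds) of observed focused-engagement events and
    # interruption events for a single user.
    def estimate_attention_recovery_time(focused_times, interruption_times):
        if len(focused_times) < 2 or not interruption_times:
            return None  # not enough data to estimate a recovery time
        # "a maximum of the plurality of times of interruption"
        last_interruption = max(interruption_times)
        # "at least two consecutive focused events" occurring after that interruption
        later_focus = sorted(t for t in focused_times if t > last_interruption)
        if len(later_focus) < 2:
            return None
        # Recovery time: delay until the user re-engages with the content
        return later_focus[0] - last_interruption

    # Hypothetical usage
    focused = [10.0, 12.0, 30.0, 31.5, 33.0]
    interruptions = [15.0, 27.0]
    print(estimate_attention_recovery_time(focused, interruptions))  # 3.0

Other readings (for example, averaging the recovery gap over every interruption rather than using only the latest one) are equally consistent with the claim language as quoted; the sketch is intended only to make the recited data structures concrete.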
However, Tunick discloses … and a content adjustment module comprising instructions stored in the memory that, when executed by the at least one processor, cause the at least one processor to collaborate with … to dynamically modify the digital content based on the engagement data including by dynamically modifying an organization and a sequence of the digital content for the particular user and providing the adjusted digital content to memory accessible by a device of the particular user, without modifying the digital content for at least one other user (Tunick, ¶ 79, Referring to FIG. 4A, at step 402, the gaze sensing application 302 monitors a viewer's gaze with respect to a selected display. The gaze sensing application 302 may use inputs that are provided by the gaze tracking unit 204 to determine and display coordinates of the viewer's gaze in relation to at least one display. In a preferred embodiment, the gaze sensing application 302 uses the gaze coordinates to determine the exact angle of the viewer's gaze in relation to one of the displays. At step 404, the gaze sensing application 302 detects the viewer's eyes shifting toward at least one selected display. The gaze sensing application 302 may be configured to detect a viewer's eyes shifting toward a selected advertisement, or a portion thereof, shown on the display, for example. Alternatively, the gaze sensing application 302 may be configured to detect the viewer's eyes shifting away from one or more displays. Also, events other than the viewer's gaze shifting toward from the screen or a portion thereof may be detected, and could trigger the steps of the method described below), (Id., ¶ 83, In an alternative example, the report generating application 304 may initiate a process of alerting a client, an operator or an administrator upon detecting that the viewer's gaze has shifted toward the display or to one or more advertisements being displayed on the display. Alternatively, the report generating application 304 may enhance, enlarge, or change colors of all or some advertisements or reports not being viewed by the viewer. (discloses adjusting the organization and sequence digital content based on user engagement) Further, the report generating application 304 may reorganize the ads and other content being displayed on the display, or may cover some or all ads not being viewed by a viewer with some other content (discloses reorganizing digital content for some ads and users). Also, the process of alerting an administrator could include providing email alerts, mobile device alerts, and other types of alerts. The message content or the type of the alert used may depend on data not being viewed by a viewer at the display or portions of the display. Also, it should be understood that the process of alerting an administrator may be initiated at the time when the viewer shifts his attention toward the display or the ad, or at some other time, such as upon detecting an alert triggering condition along with the viewer's attention being toward a display or an advertisement. 
For example, an administrator may be alerted at specific times in a video sequence), (Id., ¶ 104, In one example, a data object associated with a face in a video image comprises fields or components corresponding to one or more of the following features, without limitation: (1) the center of the face in image coordinates; (2) a unique sequential identifier of the face; (3) an indicator of the time (or video frame) in which the face first appeared; (4) a number of (video) frames in which the face has been found (referred to as the "Foundframes" parameter); (5) a number of frames in which the face is not found since the face first appeared (referred to as the "Unfoundframes" parameter, for example)); (6) coordinates defining a rectangle containing the face; (7) a flag indicating whether or not the face has appeared in a previous frame; and/or (8) a flag indicating whether or not the face is considered a person, or an "impression" (referred to as the "Person" parameter, for example), (Id., ¶ 50, The gaze tracking unit 204 and/or the client terminal 221 may also be capable of detecting, determining and/or monitoring other data relating to an impression. In some instances the client terminal 221 analyzes images detected by the gaze tracking unit 204 and obtains or determines additional information relating to a viewer or the viewer's actions. Thus, impression data may comprise various types of information relating to an advertisement and the viewers thereof. For example, impression data may comprise demographic data--a viewer's age, race or ethnicity, and/or gender, and/or information concerning the average age of all viewers, and information showing the distribution of viewers by race, ethnicity or gender. Impression data may also comprise information relating to a viewer's facial expressions and/or emotions, information relating to a viewer's voice (including an analysis of words spoken by the viewer), and/or information concerning a viewer's gestures, including how the viewer moves and interacts with the display. Impression data may further include information indicating repeat viewer tracking (if the same person or people have appeared multiple times before a single display or before different, selected displays in a store). Impression data may additionally include information about a viewer's use of cellphones or other mobile devices. For example, the impression data may include data indicating whether and how often a viewer made any phone calls, used Bluetooth, used text messaging, etc.), (Id., ¶ 27, In another example of an embodiment of the invention, a system to acquire information concerning actions by individuals with respect to a display is provided. The system comprises a memory configured to store data. The system further comprises at least one processor configured to examine an image comprising a representation of at least one first person, identify a first face of the first person, and compare the first face to one or more second faces of one or more respective second persons, the second faces being represented by data stored in the at least one memory. If the first face matches a second face, the processor updates second data representing the matching second face based at least in part on the first face. If the first face does not match any second face stored in the at least one memory, the processor stores third data representing the first face in the at least one memory. 
The processor is also configured to generate a report based at least in part on information relating to the first and second faces stored in the at least one memory, and provide the report to a party in response to a request for desired information relating to first person and second persons), (Id., ¶ 69, The processor 202 is configured to communicate with the storage unit 204, which may include a mass storage unit such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive. The processor 202 is also configured to communicate with the memory(ies) 206, which may include volatile memory (e.g. random access memory (RAM)) and non-volatile or non-transitory memory (e.g., a flash memory, magnetic storage, and/or a read-only memory (ROM)). The non-transitory memory(ies) store applications or programs that include software instructions for execution by the processor 202, such as to carry out examples described in the present disclosure. The non-transitory memory store a video conferencing application as described in further detail below. Examples of non-transitory computer readable media include a…), (Id., ¶ 70, RAM, a ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, a CD-ROM, or other portable memory storage). It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou to include the content adjustment elements of Tunick in the analogous art of monitoring viewer attention for the same reasons as stated for claim 1. Regarding Claim 13, the combination of Zhou and Tunick discloses …the system of claim 12… Zhou further discloses …wherein the data collection module is at least one of a desktop screening unit comprising instructions stored in the memory for execution by at least one processor, a web-camera control unit comprising instructions stored in the memory for execution by at least one processor, a system events control unit comprising instructions stored in the memory for execution by at least one processor, or an application activity control unit comprising instructions stored in the memory for execution by at least one processor (Zhou, ¶ 124, the attention monitoring system, (discloses data collection module comprising a system events control unit) the presenter device, or another device in communication with the attendee device via the network is configured to send at least some of the presenter data to the attendee device (discloses system data). The attendee device is configured to display the visual presentation data of the presenter data on a display, and to play the audio data of the presenter data on a speaker. The attendee device includes a camera for capturing images of the student's head as a sequence of video frames. These video frames are sent to the attention monitoring system via the network as attendee data (discloses video image data)), (Id., ¶ 125, the attendee data may be processed to determine attendee attention levels (according to step 308 below) locally on the attendee device before sending the resulting attention level data to the attention monitoring system, instead of sending the video data to the attention monitoring system and performing step 308 on the attention monitoring system, as described below. In such embodiments, the attendee data may include attention level data as determined at step 308 below. 
In some embodiments, attention level data may be generated by the attendee device at all times, but may only be requested by the attention monitoring system during periods of high attention importance (as determined below at step 306), in response to which request the attendee device would send the attention level data as part of the attendee data), (Id., ¶ 87, presenter information 132 may include but not limited to a live video of the presenter 110 captured by the camera 216 associated with the presenter client device 112 (hereinafter referred to as presenter camera 216), audio/sound at the presenter 110 side captured by the microphone 212 associated with the presenter client device 112 (hereinafter referred to as presenter microphone 212), content (e.g. a slide of an MS PowerPoint™ presentation, a page of MS Word™ document, videos, images, pictures or the like) displayed in graphic user interface (GUI) 130 associated with the video conferencing system software on the display device 208 associated with the presenter client device 112 (hereinafter referred to as presenter display 208)). Regarding Claim 14, the combination of Zhou and Tunick discloses …the system of claim 12… Zhou further discloses …wherein the data collection module captures screen interactions, device inputs, application activities, system events, or webcam video (Zhou, ¶ 124, the attention monitoring system, (discloses data collection module capturing webcam video data) the presenter device, or another device in communication with the attendee device via the network is configured to send at least some of the presenter data to the attendee device (discloses system data). The attendee device is configured to display the visual presentation data of the presenter data on a display, and to play the audio data of the presenter data on a speaker. The attendee device includes a camera for capturing images of the student's head as a sequence of video frames. These video frames are sent to the attention monitoring system via the network as attendee data (discloses video image data)). Regarding Claim 15, the combination of Zhou and Tunick discloses …the system of claim 12… Zhou further discloses … wherein the data aggregation module aggregates raw engagement events.(Zhou, ¶ 114, Once, the attendee information 134 is processed, the video conferencing server 250 (discloses data aggregation module) may be configured to aggregate the processed attendee information 134. By way of non-exhaustive examples, during the ongoing video conference, in response to the presenter 110 presenting the presenter information 132, the attendees 120 may applaud. In another example, in response to the presenter 110 presenting the presenter information 132, one or more of the attendees 120 may raise their hands or wave their hands to ask questions. In process of aggregating the processed attendee information 134, the video conferencing server 250 may keep a record of a type facial expressions or body movements (discloses raw engagement event data) of the attendees 120. Such record may include but not limited to a number of attendees 120 applauded, a number of attendees 120 raised their hands along with which particular attendees 120 have raised their hands and the like). Regarding Claim 16, the combination of Zhou and Tunick discloses …the system of claim 12… While suggested in at least Fig. 
1 and related text, Zhou does not explicitly disclose …wherein the digital content is dynamically modified to adapt to an individual user based on an educational principle or a user-provided preference. However, Tunick discloses …wherein the digital content is dynamically modified to adapt to an individual user based on an educational principle or a user-provided preference (Tunick, ¶ 4, In the realm of web-based advertising, metrics such as click tracking use electronic media to track user interest in certain ads, allowing website owners to sell ad space based on the "pay-per-click" business model. Software programs record exactly which ads people click on, gathering information about viewer preferences. Once given the data, advertisers can choose to continue or modify their advertisements), (Id., ¶ 83, In an alternative example, the report generating application 304 may initiate a process of alerting a client, an operator or an administrator upon detecting that the viewer's gaze has shifted toward the display or to one or more advertisements being displayed on the display. Alternatively, the report generating application 304 may enhance, enlarge, or change colors of all or some advertisements or reports not being viewed by the viewer. (discloses adjusting digital content format) Further, the report generating application 304 may reorganize the ads and other content being displayed on the display, or may cover some or all ads not being viewed by a viewer with some other content. Also, the process of alerting an administrator could include providing email alerts, mobile device alerts, and other types of alerts. The message content or the type of the alert used may depend on data not being viewed by a viewer at the display or portions of the display. Also, it should be understood that the process of alerting an administrator may be initiated at the time when the viewer shifts his attention toward the display or the ad, or at some other time, such as upon detecting an alert triggering condition along with the viewer's attention being toward a display or an advertisement. For example, an administrator may be alerted at specific times in a video sequence). It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou to include the content adjustment elements of Tunick in the analogous art of monitoring viewer attention for the same reasons as stated for claim 1. Regarding Claims 17-19, these claims recite limitations substantially similar to claims 3-5, respectively, and are rejected for the same reasons as stated above. Claims 6-7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou in view of Tunick and further in view of Alailima et al., U.S. Publication No. 2020/0174557 [hereinafter Alailima]. Regarding Claim 6, the combination of Zhou and Tunick discloses …the method of claim 1… While suggested in at least Fig. 1 and related text of Zhou, the combination of Zhou and Tunick does not explicitly disclose …wherein adjusting the digital content includes transitioning from a text-based format to a graphic-based format, based on user engagement patterns. 
However, Alailima discloses …wherein adjusting the digital content includes transitioning from a text-based format to a graphic-based format, based on user engagement patterns (Alailima, ¶ 132, In any example herein, the exemplary apparatus can be configured to communicate with one or more of a cognitive monitoring component, a disease monitoring component, and a physiological measurement component, to provide for biofeedback and/or neurofeedback of data to the computing device, for adjusting a type or a difficulty level of one or more of the task, the interference, and the computerized adjustable element, to achieve the desired performance level of the individual. As a non-limiting example, the biofeedback can be based on physiological measurements of the individual as they interact with the apparatus, to modify the type or a difficulty level of one or more of the task, the interference, and the computerized adjustable element based on the measurement data indicating, e.g., the individual's attention, mood, (discloses adjusting content based on engagement) or emotional state. As a non-limiting example, the neurofeedback can be based on measurement and monitoring of the individual using a cognitive and/or a disease monitoring component as the individual interacts with the apparatus, to modify the type or a difficulty level of one or more of the task, the interference, and the computerized adjustable element based on the measurement data indicating, e.g., the individual's cognitive state, disease state (including based on data from monitoring systems or behaviors related to the disease state)), (Id., ¶ 9, The one or more processors may configure the at least one computerized adjustable element as at least one of a sound, an image, or a word). It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou and the content adjustment elements of Tunick to include the further content adjustment elements of Alailima in the analogous art of a cognitive platform including computerized elements. The motivation for doing so would have been to provide an improved method “ to render computerized adjustable element[s] for the purpose of assessing or adjusting emotional biases in attention, interpretation, or memory, and to collected data indicative of the user interaction with the platform product” [Alailima, ¶ 110], wherein such improvements would benefit Tunick’s method which seeks to provide “improved systems and methods enabling advertisers to obtain and analyze information concerning the effectiveness of actual viewing…” (Tunick, ¶ 16), and wherein such improvements would further benefit Zhou’s method which seeks to “not only improve the quality and effectiveness of attention estimation as described above, but may also save machine power used to process the data” [Alailima, ¶ 110; Tunick, ¶ 16; Zhou, ¶ 7]. Regarding Claim 7, the combination of Zhou and Tunick discloses …the method of claim 1… While suggested in at least Fig. 
1 and related text of Zhou, the combination of Zhou and Tunick does not explicitly disclose …wherein adjusting the digital content comprises modifying the difficulty level of the content, either increasing complexity for high user engagement scores or simplifying content for lower engagement scores. However, Alailima discloses …wherein adjusting the digital content comprises modifying the difficulty level of the content, either increasing complexity for high user engagement scores or simplifying content for lower engagement scores (Alailima, ¶ 132, In any example herein, the exemplary apparatus can be configured to communicate with one or more of a cognitive monitoring component, a disease monitoring component, and a physiological measurement component, to provide for biofeedback and/or neurofeedback of data to the computing device, for adjusting a type or a difficulty level of one or more of the task, the interference, and the computerized adjustable element, to achieve the desired performance level of the individual. As a non-limiting example, the biofeedback can be based on physiological measurements of the individual as they interact with the apparatus, to modify the type or a difficulty level of one or more of the task, the interference, and the computerized adjustable element based on the measurement data indicating, e.g., the individual's attention, mood, (discloses adjusting content based on engagement) or emotional state. As a non-limiting example, the neurofeedback can be based on measurement and monitoring of the individual using a cognitive and/or a disease monitoring component as the individual interacts with the apparatus, to modify the type or a difficulty level of one or more of the task, the interference, and the computerized adjustable element based on the measurement data indicating, e.g., the individual's cognitive state, disease state (including based on data from monitoring systems or behaviors related to the disease state)), (Id., ¶ 96, In any example herein, the cognitive platform can be configured to collect data indicative of a reaction time of a user's response relative to the time of presentation of the tasks (including an interference with a task). For example, the computing device can be configured to cause the platform product or cognitive platform to provide smaller or larger reaction time window for a user to provide a response to the tasks as an example way of adjusting the difficulty level). It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the user engagement elements of Zhou and the content adjustment elements of Tunick to include the further content adjustment elements of Alailima in the analogous art of a cognitive platform including computerized elements for the same reasons as stated for claim 1. Regarding Claim 20, this claim recites limitations substantially similar to those in claim 6, and is rejected for the same reasons as stated above. Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. 
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 2. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Karmarkar, U.S. Publication No. 2025/0166020 discloses a gamified digital commerce marketplace. Srivastava et al., U.S. Publication No. 2023/0261894 discloses a meeting session control based on attention determination. Chappell, III et al., U.S. Publication No. 2023/0047787 discloses controlling progress of audio-video content based on sensor data of multiple users, composite neuro-physiological state and/or content engagement power. 3. Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS D BOLEN whose telephone number is (408)918-7631. The examiner can normally be reached Monday - Friday 8:00 AM - 5:00 PM PST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Patty Munson can be reached at (571) 270-5396. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /NICHOLAS D BOLEN/Examiner, Art Unit 3624 /HAMZEH OBAID/Primary Examiner, Art Unit 3624
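For orientation, the Tunick passages cited above (¶¶ 27 and 104) describe a per-face "impression" data object (center, sequential identifier, Foundframes, Unfoundframes, bounding rectangle, Person flag) and a compare-then-update-or-store loop over detected faces. The sketch below is illustrative only, with hypothetical class and field names; it is not code from Tunick or from the application under examination.

```python
# Illustrative sketch only -- hypothetical names, not the cited reference's code.
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaceRecord:
    face_id: int                                     # unique sequential identifier of the face
    center: tuple[float, float]                      # center of the face in image coordinates
    first_seen_frame: int                            # frame in which the face first appeared
    found_frames: int = 1                            # "Foundframes": frames in which the face was found
    unfound_frames: int = 0                          # "Unfoundframes": frames missed since first appearance
    bbox: tuple[int, int, int, int] = (0, 0, 0, 0)   # rectangle containing the face
    seen_before: bool = False                        # whether the face appeared in a previous frame
    is_person: bool = False                          # "Person" flag: counted as a person vs. an impression


class ImpressionStore:
    """Hypothetical in-memory store mirroring the compare/update-or-insert logic."""

    def __init__(self) -> None:
        self.records: dict[int, FaceRecord] = {}
        self._next_id = 0

    def upsert(self, detected: FaceRecord, matched_id: Optional[int] = None) -> int:
        # If the detected face matches a stored face, update that record;
        # otherwise store the detected face as a new record.
        if matched_id is not None and matched_id in self.records:
            rec = self.records[matched_id]
            rec.center, rec.bbox = detected.center, detected.bbox
            rec.found_frames += 1
            rec.seen_before = True
            return matched_id
        new_id = self._next_id
        detected.face_id = new_id
        self.records[new_id] = detected
        self._next_id += 1
        return new_id
```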
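The Zhou passages cited for claims 12-14 (¶¶ 124-125) describe an attendee device that analyzes its own webcam frames locally to derive an attention level and reports that level to the attention monitoring system only during periods of high attention importance. A minimal sketch of that request-driven reporting pattern, again with hypothetical class and method names:

```python
# Illustrative sketch only -- stand-in logic, not Zhou's implementation.
import random


class AttendeeDevice:
    def __init__(self, attendee_id: str) -> None:
        self.attendee_id = attendee_id

    def attention_level(self) -> float:
        # Stand-in for local analysis of captured video frames (e.g., head pose).
        return random.random()


class AttentionMonitoringSystem:
    def __init__(self) -> None:
        self.reports: dict[str, list[float]] = {}

    def collect(self, devices: list[AttendeeDevice], high_importance: bool) -> None:
        # Attention-level data is requested only during high-importance periods.
        if not high_importance:
            return
        for device in devices:
            self.reports.setdefault(device.attendee_id, []).append(device.attention_level())


monitor = AttentionMonitoringSystem()
monitor.collect([AttendeeDevice("a1"), AttendeeDevice("a2")], high_importance=True)
print(monitor.reports)
```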
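The theme running through the Tunick (¶ 83) and Alailima (¶ 132) citations and claims 6-7 is adjusting content difficulty or format from a measured engagement level. The toy sketch below uses hypothetical scoring weights and thresholds to show that general pattern; it is not the claimed method or any reference's implementation.

```python
# Illustrative sketch only -- arbitrary weights and thresholds.
from statistics import mean


def engagement_score(focus_seconds: float, session_seconds: float,
                     interaction_events: int) -> float:
    """Toy score in [0, 1] combining focus-time share and interaction rate."""
    if session_seconds <= 0:
        return 0.0
    focus_share = min(focus_seconds / session_seconds, 1.0)
    interaction_rate = min(interaction_events / 10.0, 1.0)  # arbitrary cap
    return mean([focus_share, interaction_rate])


def adjust_content(score: float, content: dict) -> dict:
    """Increase complexity for high scores; simplify and switch format for low scores."""
    adjusted = dict(content)
    if score >= 0.7:                       # hypothetical "high engagement" threshold
        adjusted["difficulty"] = content.get("difficulty", 1) + 1
    elif score <= 0.3:                     # hypothetical "low engagement" threshold
        adjusted["difficulty"] = max(content.get("difficulty", 1) - 1, 1)
        if content.get("format") == "text":
            adjusted["format"] = "graphic"
    return adjusted


# Low engagement (10 s of focus in a 60 s session, no interactions) simplifies
# the content and switches it from a text-based to a graphic-based format.
print(adjust_content(engagement_score(10, 60, 0), {"format": "text", "difficulty": 2}))
```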

Prosecution Timeline

Feb 06, 2024
Application Filed
Jun 13, 2025
Non-Final Rejection — §101, §103
Aug 14, 2025
Interview Requested
Aug 28, 2025
Applicant Interview (Telephonic)
Aug 28, 2025
Examiner Interview Summary
Sep 04, 2025
Response Filed
Feb 09, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12205077
SMART REMINDERS FOR RESPONDING TO EMAILS
2y 5m to grant Granted Jan 21, 2025
Patent 12198105
SMART REMINDERS FOR RESPONDING TO EMAILS
2y 5m to grant Granted Jan 14, 2025
Patent 12093873
USER PERFORMANCE ANALYSIS AND CORRECTION FOR S/W
2y 5m to grant Granted Sep 17, 2024
Patent 11935077
OPERATIONAL PREDICTIVE SCORING OF COMPONENTS AND SERVICES OF AN INFORMATION TECHNOLOGY SYSTEM
2y 5m to grant Granted Mar 19, 2024
Patent 11635224
OPERATION SUPPORT SYSTEM, OPERATION SUPPORT METHOD, AND NON-TRANSITORY RECORDING MEDIUM
2y 5m to grant Granted Apr 25, 2023
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
10%
Grant Probability
20%
With Interview (+10.5%)
4y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 122 resolved cases by this examiner. Grant probability derived from career allow rate.
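The headline projections are consistent with simple arithmetic on the examiner statistics reported on this page. A rough reconstruction follows, assuming the grant probability is just the career allow rate (12 granted of 122 resolved) and the interview lift is additive in percentage points; this is a sketch, not the dashboard's actual model.

```python
# Rough reconstruction of the projection figures under the assumptions above.
granted, resolved = 12, 122
base_probability = granted / resolved                 # ~0.098 -> displayed as 10%
interview_lift = 0.105                                # reported +10.5 percentage points
with_interview = base_probability + interview_lift    # ~0.203 -> displayed as 20%

print(f"Grant probability: {base_probability:.0%}")
print(f"With interview:    {with_interview:.0%}")
```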
