Last updated: May 29, 2026

Application No. 18/604,503

MULTIMODAL DATA-BASED METHOD AND SYSTEM FOR RECOGNIZING COGNITIVE ENGAGEMENT IN CLASSROOM

Non-Final OA §101 §103 §112

Filed

Mar 14, 2024

Priority

Jul 12, 2023 — CN 2023108565023

Examiner

DULANEY, KATHLEEN YUAN

Art Unit

2666

Tech Center

2600 — Communications

Assignee

Central China Normal University

OA Round

1 (Non-Final)

Interview Optional

Examiner has a relatively high allowance rate (77%) and a +23.7% interview lift. A written response may suffice.

Based on 659 resolved cases, 2023–2026

Examiner Intelligence

DULANEY, KATHLEEN YUAN

Grants 77% — above average

Career Allowance Rate

508 granted / 659 resolved

+15.1% vs TC avg

Strong interview lift: +23.7% (resolved cases with an interview versus without)

Typical timeline

3y 1m

Avg Prosecution

24 currently pending

Career history

693

Total Applications

across all art units

Statute-Specific Performance

§101: 1.4% (-38.6% vs TC avg)
§103: 78.7% (+38.7% vs TC avg)
§102: 6.3% (-33.7% vs TC avg)
§112: 13.0% (-27.0% vs TC avg)

Based on career data from 659 resolved cases

Office Action

§101 §103 §112

DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

The USPTO “Interim Guidelines for Examination of Patent Applications for Patent Subject Matter Eligibility” (Official Gazette notice of 22 November 2005), Annex IV, reads as follows:

Descriptive material can be characterized as either "functional descriptive material" or "nonfunctional descriptive material." In this context, "functional descriptive material" consists of data structures and computer programs which impart functionality when employed as a computer component. (The definition of "data structure" is "a physical or logical relationship among data elements, designed to support specific data manipulation functions." The New IEEE Standard Dictionary of Electrical and Electronics Terms 308 (5th ed. 1993).) "Nonfunctional descriptive material" includes but is not limited to music, literary works and a compilation or mere arrangement of data.

When functional descriptive material is recorded on some computer-readable medium it becomes structurally and functionally interrelated to the medium and will be statutory in most cases since use of technology permits the function of the descriptive material to be realized. Compare In re Lowry, 32 F.3d 1579, 1583-84, 32 USPQ2d 1031, 1035 (Fed. Cir. 1994) (claim to data structure stored on a computer readable medium that increases computer efficiency held statutory) and Warmerdam, 33 F.3d at 1360-61, 31 USPQ2d at 1759 (claim to computer having a specific data structure stored in memory held statutory product-by-process claim) with Warmerdam, 33 F.3d at 1361, 31 USPQ2d at 1760 (claim to a data structure per se held nonstatutory).

In contrast, a claimed computer-readable medium encoded with a computer program is a computer element which defines structural and functional interrelationships between the computer program and the rest of the computer which permit the computer program's functionality to be realized, and is thus statutory. See Lowry, 32 F.3d at 1583-84, 32 USPQ2d at 1035.


Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter as follows.  Claim 9 defines a “system”.  However, while the preamble defines a “system”, which would typically be indicative of an “apparatus”, the body of the claim lacks definite structure indicative of a physical apparatus. Therefore, the claim as a whole appears to be nothing more than a “system” of software elements, thus defining functional descriptive material per se.  
Functional descriptive material may be statutory if it resides on a “non-transitory computer-readable medium or computer-readable memory”.  The claim(s) indicated above lack structure, and do not define a computer readable medium and are thus non-statutory for that reason (i.e., “When functional descriptive material is recorded on some computer-readable medium it becomes structurally and functionally interrelated to the medium and will be statutory in most cases since use of technology permits the function of the descriptive material to be realized” – Guidelines Annex IV).  The scope of the presently claimed invention encompasses products that are not necessarily computer readable, and thus NOT able to impart any functionality of the recited program.  The examiner suggests:
1.	Amending the claim(s) to embody the program on “non-transitory computer-readable medium” or equivalent; assuming the specification does NOT define the computer readable medium as a “signal”, “carrier wave”, or “transmission medium” which are deemed non-statutory; or
2.	Adding structure to the body of the claim that would clearly define a statutory apparatus.
Any amendment to the claim should be commensurate with its corresponding disclosure.
It is noted that claims 1-8 are considered eligible subject matter.  Even if claim 1 could be interpreted as an abstract idea, the claims contain limitations that provide a practical application, i.e. student cognitive engagement measuring.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “a dataset construction module”, “a multidimensional representation module”, “a multimodal recognition module”, and “a result fusion module” in claim 9.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Objections
Claims 4, 7 and 8 are objected to because of the following informalities: Claim 4 claims “a convolution calculation are performed” in line 17 where the examiner believes the applicant intends to claim “a convolution calculation is performed”.   In claim 7, line 20, the applicant ends the sentence and starts a new one on line 21.  Claims should consist of only one sentence.   Claim 8 recites the word “recognization” in line 2.  “Recognization” is not a known word, but the applicant may intend to be claiming “recognition”.  Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim limitations “a dataset construction module”, “a multidimensional representation module”, “a multimodal recognition module”, and “a result fusion module”  invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Support cannot be found in the specification that connects the modules to any structure.  Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claims 3-8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding claims 3, 5 and 7, the word "etc." renders the claims indefinite because the claim(s) include(s) elements not actually disclosed (those encompassed by "etc."), thereby rendering the scope of the claims unascertainable.  See MPEP § 2173.05(d).
Claim 3 recites the limitation "the foundation" in line 13.  There is insufficient antecedent basis for this limitation in the claim.
Claim 4 recites the limitation “this modal” in line 4.  It is unclear as to which modal the applicant is referring to.
Claim 4 recites the limitation "the number of convolution channels" in line 11.  There is insufficient antecedent basis for this limitation in the claim.
Claim 4 recites the limitation “this modal features” in line 39.  It is unclear as to which modal features the applicant is referring to.
Claim 5 recites the limitation "the number of convolution kernels" in line 9.  There is insufficient antecedent basis for this limitation in the claim.
Claim 6 recites the limitations “w”, “R” and “kk” in line 10 and does not define the variables.
Claim 6 recites the limitation “it” in line 14.  It is unclear as to what “it” is referring to.
Claim 7 recites the limitation “we can” in lines 10-11.  It is unclear as to who “we” is, and further if any of the limitations following “can” are limiting because it is unclear if the recording data is exported or not.
Claim 7 refers to “FIG. 2” in line 17.  It is unclear as to what part or if all of figure 2 is being claimed.  The applicant should not refer to figures of the specification in the claims.
Claim 7 recites the limitation “We” in line 21.  It is unclear as to who “We” is referring to.
Claim 7 recites “this” in line 24.  It is unclear as to what “this” is referring to.
Claim 8 recites the variable “R” in line 6, but does not define the variable.
Claim 8 recites the limitation "the surveys" in line 14.  There is insufficient antecedent basis for this limitation in the claim.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2 and 9 are rejected under 35 U.S.C. 103(a) as being unpatentable over U.S. Patent Application Publication No. 20150099255 (Aslan et al) in view of U.S. Patent Application Publication No. 20230252224 (Tran) and U.S. Patent Application Publication No. 20200097850 (Bae et al).
Regarding claim 1, Aslan et al discloses a multimodal data-based method for recognizing cognitive engagement in classroom, comprising: step 1, constructing a dataset of student cognitive engagement recognition based on multimodal data in a classroom, the collected data (page 4, paragraph 33) together considered a dataset of fig. 1, item 110; step 2, constructing a multidimensional representation summary model of cognitive engagement concept based on multimodal data in a classroom, i.e. the dataset collected in fig. 1, item 124 with the multidimensional data of fig. 1, item 126, 110, 120, 122 including 2D and 3D images (page 2, paragraph 21); step 3, employing three methods to recognize a cognitive behavior (page 4, table 1, “consistent focus” or “positive body language”), a cognitive emotion (page 4, table 1, “fun and excitement”), and a cognitive speech (page 4, table 1, “verbal participation”) from multimodal data (page 4, paragraph 33), and, obtaining three recognition results of different modal data, i.e. the methods used to recognize the engagement levels of table 1 with recognition results of page 4, paragraph 36- page 5, paragraph 41; and step 4, fusing three single-modal recognition results obtained in step 3 (page 4, paragraph 34), and, obtaining a final cognitive engagement level of each student (page 4, paragraph 34, page 3, paragraph 30, fig. 5).
Aslan et al does not disclose expressly using deep learning methods to recognize behavior, emotion and speech, and training a model to fuse results.
Tran et al discloses using deep learning methods to recognize behavior, emotion and speech (Page 13, paragraph 235).
Aslan et al and Tran et al are combinable because they are from the same field of endeavor, i.e. recognizing human behavior.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to use deep learning methods.
The suggestion/motivation for doing so would have been to provide a more accurate method by using adaptable recognition.
Aslan et al (as modified by Tran et al) does not disclose expressly training a model to fuse results.
Bae et al discloses training a model’s weights to fuse results (page 6, paragraph 65).
Aslan et al (as modified by Tran et al)  and Bae et al are combinable because they are from the same field of endeavor, i.e. weighing results.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to train the weights.
The suggestion/motivation for doing so would have been to provide a more accurate, robust method by finding the optimal weights from data.
Therefore, it would have been obvious to combine the method of Aslan et al with the deep learning methods for recognition of Tran et al and the training of weights of Bae et al to obtain the invention as specified in claim 1.
Regarding claim 9, Aslan et al discloses a multimodal data-based system for recognizing cognitive engagement in classroom (Fig. 1), comprising: a dataset construction module (fig. 1, item 110) configured to construct a dataset of student cognitive engagement recognition based on multimodal data in classroom, the constructing of the dataset (fig. 1, item 110) of combined multimodal data (fig. 1, items 112, 114, 116, 118); a multidimensional representation module (fig. 1, item 124) configured to obtain three dimensional representation of cognitive engagement concept in classroom, i.e. the dataset collected in fig. 1, item 124 with the multidimensional data of fig. 1, item 126, 110, 120, 122 including obtaining 3D images (page 2, paragraph 21); a multimodal recognition module (fig. 2, item 124, 128) configured to recognize cognitive behavior (page 4, table 1, “consistent focus” or “positive body language”), cognitive emotion (page 4, table 1, “fun and excitement”), and cognitive speech (page 4, table 1, “verbal participation”) through recognition models based on multimodal data respectively (page 4, paragraph 33), then, output three engagement recognition results, i.e. the methods used to recognize the engagement levels of table 1 with recognition results of page 4, paragraph 36 to page 5, paragraph 41; and a result fusion module configured to fuse three results of different modalities (page 4, paragraph 34), weights of different modalities are adjusted (page 4, paragraph 35), and then a decision-making method with weights of cognitive engagement guided by the surveys is utilized to output an overall level of cognitive engagement (page 4, paragraph 34, page 3, paragraph 30). Tran et al discloses using deep learning methods to recognize behavior, emotion and speech (Page 13, paragraph 235).  Bae et al discloses training a model’s weights to fuse results (page 6, paragraph 65).
Regarding claim 2, Aslan et al discloses in step 2, multimodal data comprises body posture (table 1, “Body posture”), head posture (table 1, “head pose”), eye movement (table 1, “eye tracking”), facial expression (table 1, “facial expression”), class audio (Table 1, “speech recognition”), and speech text (Table 1, “Text…recognition”).

Allowable Subject Matter
Claims 3-8 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim 3 contains allowable subject matter regarding in step 2, a multidimensional representation summary model of cognitive engagement concept in classroom is constructed from three dimensions of cognitive behavior, cognitive emotion, and cognitive speech, specific construction steps are of the claimed 3 parts: (1) representing a cognitive behavior of cognitive engagement in a classroom by visual- behavioral-modal data encompassing student’s body postures, as claimed, for a video frame during class at time f, vectorizing an image corresponding to a moment, then, representing each pixel point of a whole image with a value of [0,9] as a representation result A of visual-modal encompassing body posture; (2) representing a cognitive emotion of cognitive engagement in a classroom by visual-emotional-modal encompassing student’s facial expressions, for a class video frame at time f, automatically extracting face images using an Open source Computer Vision (OpenCV) library, using extracted face images as the foundation for cognitive emotion at time f, then, representing each pixel point of a face image with a value of [0,9] to form a representation result B of visual-modal encompassing facial expression; and (3) representing a cognitive speech of cognitive engagement in a classroom by audio-verbal-modal encompassing student’s class audio, then, jointly representing cognitive speech by two ways of a pre-trained word vector and a word vector with parameters, a representation result is C.  
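The pixel representation described for results A and B can be pictured with a small sketch. The action does not say how a pixel is mapped to a value in [0,9], so the code below assumes 8-bit grayscale input and uniform binning into ten integer levels; the function name `quantize_to_digits` is illustrative and does not come from the application.

```python
def quantize_to_digits(image):
    """Flatten a 2D grayscale image and map each pixel to an integer in 0..9.

    Assumption (not stated in the office action): pixels are 8-bit
    values in 0..255 and are binned uniformly into ten levels.
    """
    vector = []
    for row in image:
        for pixel in row:
            if not 0 <= pixel <= 255:
                raise ValueError("expected 8-bit pixel values")
            # Integer binning: 0..25 -> 0, ..., 231..255 -> 9
            vector.append(pixel * 10 // 256)
    return vector


if __name__ == "__main__":
    frame = [[0, 255], [128, 25]]
    print(quantize_to_digits(frame))
```

Under these assumptions a full video frame and a cropped face image would go through the same mapping, differing only in which pixels are supplied.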
Claim 7 contains allowable subject matter regarding in step 1, (1) in a classroom environment, led by a teacher who imparts instruction naturally, there are multiple students participating in activities and knowledge construction, a teacher is allowed to fuse advanced technology tools and teaching modes to carry out different class activities; (2) recording students learning state in a non-invasive and non-perceptive manner, as claimed, by first, mounting a high-definition camera in front of a classroom, then, opening a camera before a class and closing the camera after a class to record a class learning situation in real-time, so recording data is exported from a terminal system as a foundation of a cognitive engagement recognition; (3) developing a data annotation system to guide manual annotation, as claimed, during multimodal data annotation, cognitive behavior is annotated using visual-modal data with body posture, cognitive emotion is annotated using visual-modal data with facial expressions, and cognitive speech is annotated using class audio-modal data with class audio, a data annotation system is detailed using the representation and observation indices for the listed data and classification of fig. 2; (4) simultaneously annotating part of the recording data by multiple annotators, carrying out a consultation on inconsistent places, and, annotating the recording data on a large scale; (5) employing an after-class questionnaire to acquire a genuine cognitive engagement using a Likert five-point scoring method as a guidance of multimodal fusion training; and (6) extracting many video frames to obtain students cognitive engagement state at different granularities, a frame extraction rate is every 25, 50, …, or 25*f (f is an integer) frames/time (in other words cognitive engagement state is extracted every 25 frames), this condition aligns with a video frame rate of 25 fps, a frame extraction rate is configured to train deep learning models for cognitive engagement.
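The frame extraction in item (6) can be pictured with a short sketch. It assumes that sampling starts at frame 0 and that f is a positive integer multiplier of the 25 fps frame rate; the function name `sampled_frame_indices` and its parameters are illustrative and do not come from the application.

```python
def sampled_frame_indices(total_frames, f=1, fps=25):
    """Return the indices of frames kept when sampling every fps * f frames.

    Assumptions (not stated in the office action): sampling starts at
    frame 0, and f is a positive integer.
    """
    if f < 1:
        raise ValueError("f must be a positive integer")
    step = fps * f  # 25, 50, ..., 25 * f frames between samples
    return list(range(0, total_frames, step))


if __name__ == "__main__":
    # A 4-second clip at 25 fps, sampled once per second (f = 1)
    indices = sampled_frame_indices(100, f=1)
    print(indices)
    # Matching timestamps in seconds
    print([i / 25 for i in indices])
```

With f = 1 the sketch keeps one frame per second of video; larger values of f give coarser granularity.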
Claim 8 contains allowable subject matter regarding in step 4, a recognition of a final cognitive engagement level encompassing a cognitive behavior, a cognitive emotion and a cognitive speech is achieved by calculating the overall level of engagement using the claimed equation, wherein the training networks of the three cognitive engagement states are trained separately and then calculating the claimed engagements by the claimed three deep learning models, wherein the three beta parameters are to be learned.  It is noted that Aslan et al also calculates Engagement with weighted equation (page 4, paragraph 34) but does not only use 3 states weighted by a variable as claimed.  
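The fusion described for claim 8 can be pictured with a minimal sketch. The claimed equation is not reproduced in this action, so the code assumes a plain weighted sum of three per-modality scores, with the three beta weights fitted by gradient descent on squared error against survey-derived target scores. The function names, the loss, and the learning-rate settings are all illustrative assumptions.

```python
def fuse(scores, betas):
    """Weighted sum of (behavior, emotion, speech) scores.

    Assumption: the real claimed equation may differ from a linear sum.
    """
    return sum(b * s for b, s in zip(betas, scores))


def fit_betas(samples, targets, lr=0.05, epochs=2000):
    """Fit three beta weights by batch gradient descent on mean squared error."""
    betas = [1 / 3, 1 / 3, 1 / 3]  # start from equal weights
    n = len(samples)
    for _ in range(epochs):
        grads = [0.0, 0.0, 0.0]
        for scores, target in zip(samples, targets):
            err = fuse(scores, betas) - target
            for i in range(3):
                grads[i] += 2 * err * scores[i] / n
        betas = [b - lr * g for b, g in zip(betas, grads)]
    return betas


if __name__ == "__main__":
    # Toy data: per-student scores from three separately trained models
    samples = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
    # Toy targets standing in for questionnaire-derived engagement
    targets = [0.5, 0.3, 0.2, 1.0]
    print(fit_betas(samples, targets))
```

In this toy setup the three recognition models are treated as fixed and only the weights are learned, which mirrors the examiner's description that the networks are trained separately before fusion.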

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATHLEEN YUAN DULANEY whose telephone number is (571) 272-2902. The examiner can normally be reached on the following two-week schedule: week 1, Monday 9am-5pm, Thursday 9am-1pm, Friday 9am-3pm; week 2, Monday 9am-5pm, Tuesday 9am-5pm, Thursday 9am-5pm, Friday 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell, can be reached at 571-270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KATHLEEN Y DULANEY/ Primary Examiner, Art Unit 2666, 1/20/2026


Prosecution Timeline

Mar 14, 2024

Application Filed

Apr 22, 2026

Non-Final Rejection mailed — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/172,497

Patent 12638668

Method and Astrophotographic Apparatus for Acquiring Images of Targets in Sky Area

3y 3m to grant Granted May 26, 2026

18/116,658

Patent 12631569

METHOD, DEVICE, SYSTEM AND COMPUTER READABLE MEDIUM FOR RAPIDLY DETECTING PEST EGG IN GRAIN BASED ON PEST EGG AND PEST HOLE STRUCTURE FEATURES

3y 2m to grant Granted May 19, 2026

18/007,299

Patent 12620110

IMAGE PROCESSING DEVICE, STEREO CAMERA DEVICE, MOBILE OBJECT, DISPARITY CALCULATING METHOD, AND IMAGE PROCESSING METHOD

3y 3m to grant Granted May 05, 2026

18/013,514

Patent 12605131

A SYSTEM AND METHOD FOR THE QUANTIFICATION OF CONTRAST AGENT

3y 3m to grant Granted Apr 21, 2026

18/025,643

Patent 12602801

IMAGE PROCESSING CIRCUITRY AND IMAGE PROCESSING METHOD FOR DEPTH ESTIMATION IN A TIME-OF-FLIGHT SYSTEM

3y 1m to grant Granted Apr 14, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2

Expected OA Rounds

77%

Grant Probability

99%

With Interview (+23.7%)

3y 1m (~11m remaining)

Median Time to Grant

Low

PTA Risk

Based on 659 resolved cases by this examiner. Grant probability derived from career allowance rate.
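The projection figures above can be checked with simple arithmetic. The sketch below derives the 77% figure from 508 grants out of 659 resolved cases. The 99% ceiling on the with-interview number is an assumption made here to reconcile 77% plus 23.7% with the displayed 99%; the page does not state how that value is computed.

```python
def allowance_rate(granted, resolved):
    """Career allowance rate as a percentage."""
    return 100.0 * granted / resolved


def with_interview(rate_pct, lift_pct, cap_pct=99.0):
    """Add the interview lift, capped at cap_pct.

    Assumption: the cap is a guess at how the page keeps the
    projection below 100%; it is not documented in the source.
    """
    return min(rate_pct + lift_pct, cap_pct)


if __name__ == "__main__":
    rate = allowance_rate(508, 659)
    print(round(rate))                       # career allowance rate, rounded
    print(with_interview(round(rate), 23.7)) # projection with interview lift
```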