Prosecution Insights
Last updated: April 19, 2026
Application No. 18/025,619

PERSON EVALUATION INFORMATION GENERATION METHOD

Final Rejection (§101, §102, §103)
Filed: Mar 09, 2023
Examiner: CHEN, BILL
Art Unit: 3626
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: NEC Corporation
OA Round: 2 (Final)
Grant Probability: 0% (At Risk)
OA Rounds: 3-4
To Grant: 3y 0m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (grants only 0% of cases; 0 granted / 9 resolved; -52.0% vs TC avg)
Interview Lift: +0.0% (minimal; based on resolved cases with interview)
Avg Prosecution: 3y 0m (typical timeline)
Total Applications: 24 across all art units (15 currently pending)

Statute-Specific Performance

§101: 35.9% (-4.1% vs TC avg)
§103: 32.3% (-7.7% vs TC avg)
§102: 24.0% (-16.0% vs TC avg)
§112: 6.6% (-33.4% vs TC avg)
Tech Center averages are estimates. Based on career data from 9 resolved cases.

Office Action

§101 §102 §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

This Office Action is in response to the amendment and remarks filed by the applicant on September 22, 2025. Claims 1 – 5, 9 – 13, and 15 have been amended and are hereby entered. Claim 18 has been added and is hereby entered. Claims 1 – 15 and 17 – 18 are currently pending and under examination. This action is made FINAL.

Response to Arguments

Applicant’s arguments filed on September 22, 2025 have been fully considered but they are not persuasive. Regarding Applicant’s arguments against the §101 rejections, the rejections are maintained for the following reasons.

Step 2A, Prong One: The Applicant argues that amended claim 1 is not directed to a mental process because certain recited limitations allegedly “cannot practically be performed in the human mind,” citing MPEP 2106.04(a)(2), subsection III.A. Applicant specifically relies on the recitation of: acquiring… video and audio data related to a plurality of persons…; generating… a time-stamped sequence of speech and motion information… identifying a specific body motion of the at least one person to be evaluated; and based on a time stamp… searching the time-stamped sequence of the person… within a predetermined time period following the trigger event.

However, these arguments are not persuasive. As set forth in the prior Office Action, the focus of claim 1, when viewed as a whole, is directed to observing human speech and body motion, comparing or associating one person’s behavior with another’s behavior, and generating an evaluation based on that comparison. Such activities constitute mental processes, including observation, evaluation, comparison, and judgment, which are expressly identified as abstract ideas under MPEP 2106.04(a)(2). The mere fact that the claim recites performance of these steps “by at least one processor,” or using a particular class of model, does not change the underlying character of the claimed subject matter.

Further, Applicant’s reliance on the recitation of a “video transformer-based model” is unavailing. The claim does not recite any specific architecture, training process, data representation, or algorithmic improvement of the transformer model, nor does it describe how the model improves video processing, accuracy, or computational efficiency. The claim merely uses the model as a tool to extract descriptive information (text identifying body motions). Thus, the model is invoked only to automate the extraction of information that a human could conceptually identify by watching a video, such as identifying a body motion and associating it with a response. As explained in MPEP § 2106.04(a)(2), claims that merely use a computer to perform steps that reflect human observation and evaluation remain mental processes even if they are not performed “entirely in the human mind.”

Applicant further asserts that searching a time-based sequence to identify a responsive motion cannot be performed mentally. However, time-based comparison and correlation are classic forms of abstract data analysis. A human reviewer can readily (a) note when a trigger event occurs, (b) observe subsequent behavior within a defined time window, and (c) assess responsiveness based on timing.
Implementing this evaluation using timestamps and automated searching is nothing more than applying generic computer functionality to an abstract evaluative task, which does not remove the claim from the mental process grouping. Accordingly, claim 1 still recites a judicial exception under Step 2A, Prong One.

Regarding Applicant’s arguments against the rejections of the amended/pending claims under 35 U.S.C. § 102: Applicant’s arguments submitted in response to the rejections under 35 U.S.C. §§ 102 and 103 have been fully considered but are not persuasive for the reasons set forth below. Accordingly, the rejections are maintained and made FINAL.

Applicant argues that independent claim 1 is not anticipated by Bellamy because Bellamy allegedly fails to disclose (i) identifying a trigger event corresponding to a preset pattern of speech or motion, (ii) searching time-stamped sequences to identify a responsive motion within a predetermined time window, and (iii) associating text information of a trigger event with text information of a responsive motion. The Examiner respectfully disagrees. As cited in the prior Office Action, Bellamy discloses acquiring video and audio data of multiple participants in an interaction and analyzing speech, gesture, reaction, gaze, and motion using machine-learning techniques to evaluate interpersonal dynamics (see, e.g., Bellamy [¶0054] and FIGS. 2, 4, and 6).

Applicant asserts that Bellamy “merely displays recognized interplay” between participants. However, this argument improperly narrows Bellamy. Bellamy explicitly teaches detecting and classifying multiple forms of human interactions [¶0030, 0033, 0054], analyzing relationships and reactions between participants [¶0031 – 0032, 0047 – 0048], and generating evaluative information based on those interactions [¶0047 – 0050]. Such analysis necessarily involves identifying interaction events and correlating participant responses, which meets the claimed limitation of generating evaluation information based on speech and motion information of multiple persons.

Further, the Examiner respectfully disagrees with Applicant’s argument that the prior art fails to teach the “trigger event(s)” and “responsive motion.” Bellamy’s detection of a participant’s speech, gesture, gaze, or action, and another participant’s reaction or response thereto, reasonably corresponds to a trigger event and a responsive motion. Bellamy’s analysis of interaction dynamics inherently requires associating initiating actions with responses, which satisfies the claimed identification and association limitations when given their broadest reasonable interpretation.

Additionally, time-stamping is inherent in the Bellamy reference. Bellamy’s processing of video and audio streams necessarily involves temporal alignment of events. The use of video frames, audio segments, and interaction sequences inherently provides time-based ordering, which reasonably corresponds to the claimed “time-stamped sequence” and identification of events occurring within a time window. Explicit recitation of the word “time-stamp” is not required for anticipation where the reference necessarily performs the claimed function (see MPEP § 2163.07(a); In re Robertson).

Finally, Applicant further argues that Bellamy does not disclose “text information.” However, Bellamy’s classification and labeling of detected behaviors using machine-learning outputs constitute informational representations of motions and speech.
The claims do not require any specific encoding format, and Bellamy’s labeled interaction data reasonably meets this limitation. Accordingly, Bellamy discloses each limitation of claim 1, either expressly or inherently, and the rejections of claims 1 – 15 and 17 – 18 under §§ 102 and 103 are maintained.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1 – 15 and 17 – 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more, and therefore does not recite patent-eligible subject matter. As an initial matter, claim 1 is treated as representative of independent claims 1, 9, and 15.

Step 2A, Prong One: The claims are directed to acquiring video and audio data, generating speech and motion information and evaluation information, and outputting the evaluation information, which falls into the abstract categories of mental processes and certain methods of organizing human activity, specifically monitoring and managing human behavior as well as data recognition and analysis, as identified in the 2019 PEG, which are judicial exceptions. For example, claim 1 recites:

acquiring, by at least one processor, video data and audio data related to a plurality of persons participating in an interaction;

generating, for each of the plurality of persons, a time-stamped sequence of speech and motion information, wherein generating the speech and motion information for at least one person to be evaluated comprises applying a video transformer-based model to the video data to extract text information identifying a specific body motion of the at least one person to be evaluated;

generating, by the at least one processor, evaluation information by: identifying, from the time-stamped sequence of speech and motion information of at least one other person, a trigger event corresponding to a preset pattern of speech or motion; based on a time stamp of the identified trigger event, searching the time-stamped sequence of the person to be evaluated to identify a responsive motion that occurs within a predetermined time period following the trigger event; and generating the evaluation information by associating the text information of the identified trigger event with the text information of the identified responsive motion; and

outputting the evaluation information.

These steps recite collecting, analyzing, comparing, and associating information about human behavior and interactions (i.e., interpreting speech/motion, identifying patterns, correlating response timing, and producing an evaluative association), which are activities that constitute judgments and evaluations analogous to what a human reviewer could perform when watching an interaction and assessing a person’s behavior relative to another’s behavior. Therefore, the claim is directed to evaluating a person’s performance/behavior during an interaction, which is in turn directed to managing and evaluating interpersonal interactions, a form of certain methods of organizing human activity. See MPEP 2106.04(a)(2). Accordingly, the claim recites a mental process within the meaning of MPEP § 2106.04(a)(2), as the limitations are directed to concepts such as observation, recognition of patterns, comparison, correlation, and evaluation.
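Editorial aside (not part of the Office Action or the application): for readers tracing the recited steps, the sketch below restates, in generic Python, the trigger-event identification, time-window search, and association flow that claim 1 recites. The event labels, trigger patterns, and the three-second response window are illustrative assumptions only, not drawn from the application or the cited art.

```python
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float  # seconds from the start of the interaction
    person: str       # who produced the speech or motion
    text: str         # text label, e.g. "asks a question", "nods"

# Hypothetical preset trigger patterns and response window (illustrative only).
TRIGGER_PATTERNS = {"asks a question", "extends hand"}
RESPONSE_WINDOW_S = 3.0

def evaluate(sequences: dict, evaluator: str, evaluated: str) -> list:
    """Associate each trigger event in the evaluator's time-ordered sequence with
    the first motion of the evaluated person that follows it within the window."""
    associations = []
    for trigger in sequences[evaluator]:
        if trigger.text not in TRIGGER_PATTERNS:
            continue
        # Search the evaluated person's time-stamped sequence for a responsive motion.
        for motion in sequences[evaluated]:
            if trigger.timestamp < motion.timestamp <= trigger.timestamp + RESPONSE_WINDOW_S:
                associations.append((trigger.text, motion.text))
                break
    return associations

if __name__ == "__main__":
    seqs = {
        "interviewer": [Event(10.0, "interviewer", "asks a question")],
        "candidate": [Event(11.2, "candidate", "nods")],
    }
    print(evaluate(seqs, "interviewer", "candidate"))  # [('asks a question', 'nods')]
```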
More importantly, the recitation that the steps are performed “by at least one processor,” and that a “video transformer-based model” is applied, does not remove the claim from the mental processes grouping where the claim as a whole is directed to automating the same type of evaluative reasoning and correlation. Therefore, claim 1 recites an abstract idea (mental processes and certain methods of organizing human activity). Claims 2 – 15 and 17 – 18 similarly recite the abstract idea because they depend from or parallel claim 1 and add further limitations that continue to describe organizing, associating, and presenting speech/motion information relating to the evaluation.

Step 2A, Prong Two: For independent claims 1, 9, and 15, the judicial exception is not integrated into a practical application because the additional elements of a device (claim 9), at least one processor (claim 9), and at least one memory (claim 9), individually and in combination, are merely used as tools to perform the abstract idea (refer to MPEP 2106.05(f)). These elements, including the computer used, are recited at a high level of generality and serve only to generally apply the abstract idea, without placing any limits on how the steps are performed beyond the use of generic computer components. See MPEP 2106.05(f). Thus, the additional elements do not apply, rely on, or use the abstract idea in a meaningful way beyond linking it to a generic computing environment. Limitations that merely indicate a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself and cannot integrate the exception into a practical application (refer to MPEP 2106.05(h)). Therefore, the claims have not integrated the abstract idea into a practical application and remain directed to the abstract idea identified by the examiner.

Step 2B: For independent claims 1, 9, and 15, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional elements of at least one memory, a device, and at least one processor (claim 9) are not sufficient to amount to significantly more than the judicial exception because these elements merely further recite the abstract idea. As indicated in the Step 2A, Prong Two analysis, the additional elements merely use a generic computer device, computing technologies, and/or other machinery as a tool, amounting to mere instructions to practice the invention. Thus, these elements do not render the claims eligible (refer to MPEP 2106.05(f) and 2106.05(h)). For a claimed invention to improve upon the conventional functioning of a computer, or upon conventional technology or technological processes, a technical explanation as to how to implement the invention should be present in the specification. That is, the disclosure must provide sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art (see MPEP 2106.05(a)).
The rationale set forth for Prong Two of the eligibility analysis above is also applicable to, and is re-evaluated in, the Step 2B analysis. Therefore, this rationale is a sufficient basis for rejection, as the claims are not patent eligible, consistent with MPEP 2106.

Dependent claims 2 – 8, 10 – 14, and 17 – 18 fall under the same abstract idea of certain methods of organizing human activity and mental processes. Their additional limitations further describe the abstract idea of a method for evaluating an individual’s behavior/performance by analyzing both their speech and body movements, generating evaluation information based on the acquired audio and video data, and comparing or associating behavior between two individuals. Thus, they are directed to the mental processes grouping, as these functions encompass observation, evaluation, judgment, and opinion and can be performed mentally or with pen and paper.

Step 2A, Prong Two and Step 2B: The dependent claims do not include additional elements, but further instruct one to practice the abstract idea using general computer components that are merely used as tools. These claim limitations amount to no more than mere instructions to apply the exception using generic computer components and/or computing technologies (e.g., components deployed merely as tools; see MPEP 2106.05(f)). Additionally, these elements and their limitations merely indicate a field of use or technological environment in which to apply a judicial exception, which does not amount to significantly more than the exception itself and cannot integrate the exception into a practical application (MPEP 2106.05(h)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.

Therefore, claims 1 – 15 and 17 – 18 are rejected under 35 U.S.C. § 101 as being directed to an abstract idea without sufficient integration into a practical application, where the additional elements do not add significantly more than the judicial exception.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1 – 7 and 9 – 15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bellamy (U.S. Pub No. 20190147367A1).
Regarding claims 1 and 15: Bellamy discloses:

acquiring, by at least one processor, video data and audio data related to a plurality of persons participating in an interaction; (In ¶0022; Fig. 1: teaches “Meeting software 114 enables server 102 to mine streams of video and audio, using machine learning algorithms, to identify, isolate and tag individuals, within a room or set of rooms and use those simultaneous video and audio streams.”)

generating, for each of the plurality of persons, a time-stamped sequence of speech and motion information, wherein generating the speech and motion information for at least one person to be evaluated comprises applying a video transformer-based model to the video data to extract text information identifying a specific body motion of the at least one person to be evaluated; (In ¶0022; Fig. 1: teaches “to identify, isolate and tag individuals, within a room or set of rooms and use those simultaneous video and audio streams, again using machine learning algorithms, to estimate state defined as: body or body component movement, gesture, eye movement/gaze and pupil dilation, pulse rate, and via infra-red (IR), body heat, as well as individual and group dynamics/interaction. In addition, the algorithms are used to infer such things as individual and group emotion, meaning of gestures, eye movement/gaze/pupil dilation and use body heat and pulse rate as additional information sources and confirmation sources.” Bellamy’s processing of video and audio streams necessarily involves temporal alignment of events, which reasonably corresponds to the claimed “time-stamped sequence” and identification of events occurring within a time window.)

generating, by the at least one processor, evaluation information by: identifying, from the time-stamped sequence of speech and motion information of at least one other person, a trigger event corresponding to a preset pattern of speech or motion; (In ¶0054; Figs. 2, 4, and 6: teaches “meeting software 114 displays the recognized interplay, exchange, between participants using a collection of machine learning algorithms, wherein the collection of machine learning algorithms captures the different levels of human interaction such as touch, gaze, speech, reference, gesture, reaction, etc.”)

based on a time stamp of the identified trigger event, searching the time-stamped sequence of the person to be evaluated to identify a responsive motion that occurs within a predetermined time period following the trigger event; and (In ¶0056: a desired type of meeting may be selected, allowing values to be asserted by the ML model to create “various interventions.”)

generating the evaluation information by associating the text information of the identified trigger event with the text information of the identified responsive motion; and (In ¶0056: a desired type of meeting may be selected, allowing values to be asserted by the ML model to create “various interventions,” such as “signaling privately to individuals to ‘calm down’ or suggesting a side meeting between two participants.”)

outputting the evaluation information. (In ¶0028; Fig. 1: teaches “In one embodiment, user interface 130 is a graphical user interface. A graphical user interface (GUI) is a type of interface that allows users to interact with peripheral devices (i.e., external computer hardware that provides input and output for a computing device, such as a keyboard and mouse) through graphical icons and visual indicators as opposed to text-based interfaces, typed command labels, or text navigation.”)
Regarding claim 2: Bellamy discloses: generating the evaluation information in which speech and motion information related to a motion of the person to be evaluated, generated from video data in which the person to be evaluated is captured, and the speech and motion information of the at least one other person are associated with each other. (In ¶0022; Fig. 1: teaches “Over a short period, the room(s) use machine learning algorithms to learn the behaviors of specific individuals and of a group and provide a time synchronous stream of outputs that can be used as input or feedback to the machine learning algorithms. This includes the option of tagging to attribute semantics to video and audio, and, in some cases, may be crowd-sourced.”)

Regarding claim 3: Bellamy discloses: generating the evaluation information in which speech and motion information represented by text related to the motion of the person to be evaluated, acquired from the video data, and the speech and motion information of the at least one other person are associated with each other. (In ¶0028; Fig. 1: teaches “A user interface, such as user interface 130, refers to the information (e.g., graphic, text, sound) that a program presents to a user and the control sequences the user employs to control the program. A graphical user interface (GUI) is a type of interface that allows users to interact with peripheral devices (i.e., external computer hardware that provides input and output for a computing device, such as a keyboard and mouse) through graphical icons and visual indicators as opposed to text-based interfaces, typed command labels, or text navigation. The actions in GUIs are often performed through direct manipulation of the graphical elements. User interface 130 sends and receives information through meeting software 134 to server 102.”)

Regarding claim 4: Bellamy discloses: generating the evaluation information in which speech and motion information based on a preset motion of the at least one other person and speech and motion information specifying a motion of the person to be evaluated corresponding to the motion represented by the speech and motion information of the at least one other person are associated with each other. (In ¶0022; Fig. 1: teaches “to estimate state defined as: body or body component movement, gesture, eye movement/gaze and pupil dilation, pulse rate, and via infra-red (IR), body heat, as well as individual and group dynamics/interaction. In addition, the algorithms are used to infer such things as individual and group emotion, meaning of gestures, eye movement/gaze/pupil dilation and use body heat and pulse rate as additional information sources and confirmation sources. Over a short period, the room(s) use machine learning algorithms to learn the behaviors of specific individuals and of a group and provide a time synchronous stream of outputs that can be used as input or feedback to the machine learning algorithms.” [Examiner’s Note: In view of BRI, the examiner interprets the claim language as using a predefined motion (the preset) and matching it with a corresponding motion from the person being evaluated.])
Regarding claim 5: Bellamy discloses: generating the evaluation information in which the speech and motion information of the person to be evaluated, associated with the speech and motion information of the at least one other person, and the video data at a time of the motion of the person to be evaluated, specified by the speech and motion information of the person to be evaluated, are associated with each other. (In ¶0031; Figs. 1 – 2: teaches “In step 204, meeting software 114 analyzes and categorizes the interactions/state and tags the state of each person over a period of time. For example, by analyzing the video stream of the meeting, meeting software 114 identifies “Rob”, “Jon” and “Danny” as currently attending the meeting, wherein “Rob”, “Jon” and “Danny” are tagged as part of the video stream. With additional analysis of the video stream, meeting software 114 can observe that Danny and Rob entered the room at about the same time, but Jon has been in the room for much of the day. From analysis of the audio portion of the video stream, or a separate audio stream of the meeting, meeting software 114 can discover that Rob is doing most of the talking with the occasional contribution by Danny, and Jon is saying nothing.”)

Regarding claim 6: Bellamy discloses: outputting the speech and motion information of the person to be evaluated and the video data associated with each other, in association with each other. (In ¶0054; Figs. 1 and 6C: teaches “Referring again to FIG. 6C, meeting software 114 displays the recognized interplay, exchange, between participants using a collection of machine learning algorithms, wherein the collection of machine learning algorithms captures the different levels of human interaction such as touch, gaze, speech, reference, gesture, reaction, etc.”)

Regarding claim 7: Bellamy discloses: wherein the evaluation information includes an evaluation result of the person to be evaluated. (In ¶0017: teaches “Embodiments in accordance with the present invention provide methods of detecting and tracking participation in a meeting and generating interventions to improve the participation by utilizing one or more sensors in a physical and remote environment to detect emotional, attentional, and dispositional states. A unified meeting representation is maintained, and interventions are delivered using avatars.”)

Regarding claim 9: Bellamy discloses:

at least one memory configured to store instructions; and (In ¶0066; Fig. 10: teaches “Server computer 1000 includes communications fabric 1002, which provides communications between computer processor(s) 1004, memory 1006, persistent storage 1008, communications unit 1010, and input/output (I/O) interface(s) 1012.”)

at least one processor configured to execute instructions to: (In ¶0066; Fig. 10: teaches “Server computer 1000 includes communications fabric 1002, which provides communications between computer processor(s) 1004, memory 1006, persistent storage 1008, communications unit 1010, and input/output (I/O) interface(s) 1012.”)

acquire video data and audio data related to a plurality of persons participating in an interaction;
(In ¶0022; Fig. 1: teaches “Meeting software 114 enables server 102 to mine streams of video and audio, using machine learning algorithms, to identify, isolate and tag individuals, within a room or set of rooms and use those simultaneous video and audio streams.”)

generate, for each of the plurality of persons, a time-stamped sequence of speech and motion information, wherein to generate the speech and motion information for at least one person to be evaluated, the at least one processor is configured to apply a video transformer-based model to the video data to extract text information identifying a specific body motion of the person to be evaluated; and (In ¶0022; Fig. 1: teaches “to identify, isolate and tag individuals, within a room or set of rooms and use those simultaneous video and audio streams, again using machine learning algorithms, to estimate state defined as: body or body component movement, gesture, eye movement/gaze and pupil dilation, pulse rate, and via infra-red (IR), body heat, as well as individual and group dynamics/interaction. In addition, the algorithms are used to infer such things as individual and group emotion, meaning of gestures, eye movement/gaze/pupil dilation and use body heat and pulse rate as additional information sources and confirmation sources.”)

generate evaluation information by: identifying, from the time-stamped sequence of speech and motion information of at least one other person, a trigger event corresponding to a preset pattern of speech or motion; (In ¶0022; Fig. 1: teaches “Over a short period, the room(s) use machine learning algorithms to learn the behaviors of specific individuals and of a group and provide a time synchronous stream of outputs that can be used as input or feedback to the machine learning algorithms. This includes the option of tagging to attribute semantics to video and audio, and, in some cases, may be crowd-sourced.”)

based on a time stamp of the identified trigger event, searching the time-stamped sequence of the person to be evaluated to identify a responsive motion that occurs within a predetermined time period following the trigger event; and (In ¶0056: a desired type of meeting may be selected, allowing values to be asserted by the ML model to create “various interventions.”)

generating the evaluation information by associating the text information of the identified trigger event with the text information of the identified responsive motion; and (In ¶0056: a desired type of meeting may be selected, allowing values to be asserted by the ML model to create “various interventions,” such as “signaling privately to individuals to ‘calm down’ or suggesting a side meeting between two participants.”)

output the evaluation information. (In ¶0028; Fig. 1: teaches “In one embodiment, user interface 130 is a graphical user interface. A graphical user interface (GUI) is a type of interface that allows users to interact with peripheral devices (i.e., external computer hardware that provides input and output for a computing device, such as a keyboard and mouse) through graphical icons and visual indicators as opposed to text-based interfaces, typed command labels, or text navigation.”)
Regarding claim 10: Bellamy discloses: wherein the at least one processor is configured to execute the instructions to generate the evaluation information in which speech and motion information related to a motion of the person to be evaluated, acquired from video data in which the person to be evaluated is captured, and the speech and motion information of the at least one other person are associated with each other. (In ¶0022; Fig. 1: teaches “Over a short period, the room(s) use machine learning algorithms to learn the behaviors of specific individuals and of a group and provide a time synchronous stream of outputs that can be used as input or feedback to the machine learning algorithms. Over a long period, meeting software 114 also uses machine learning algorithms to continuously train and learn about short-term activities, new settings, meeting types, and interactions. In other example embodiments, meeting software 114 may be one or more components of operating system 112.”)

Regarding claim 11: Bellamy discloses: wherein the at least one processor is configured to execute the instructions to generate the evaluation information in which speech and motion information represented by text related to the motion of the person to be evaluated, acquired from the video data, and the speech and motion information of the at least one other person are associated with each other. (In ¶0028; Figs. 1, 7 and 10: teaches “A user interface, such as user interface 130, refers to the information (e.g., graphic, text, sound) that a program presents to a user and the control sequences the user employs to control the program. User interface 130 sends and receives information through meeting software 134 to server 102.”)

Regarding claim 12: Bellamy discloses: wherein the at least one processor is configured to execute the instructions to generate the evaluation information in which speech and motion information based on a preset motion of the at least one other person and speech and motion information specifying a motion of the person to be evaluated corresponding to the motion represented by the speech and motion information of the at least one other person are associated with each other. (In ¶0033; Fig. 1: teaches “Meeting software 114 is able to mine audio and video streams and identify, isolate, and tag individuals, within a room or set of rooms, and use those simultaneous audio and video streams to determine one or more states for each person such as, a body or body component movement, a gesture, eye movement/gaze and pupil dilation, pulse rate, and via IR, body heat, as well as individual and group dynamics/interaction and tag those to individuals or groups.
In addition, the algorithms will infer such things as individual and group emotion, meaning of gestures, eye movement/gaze/pupil dilation and use body heat and pulse rate as additional information sources and confirmation sources.” Alternatively, ¶0033 also teaches “Over a long period, meeting software 114 also uses machine learning algorithms to continuously train and learn about short term activities, new settings, meeting types and interactions.”)

Regarding claim 13: Bellamy discloses: wherein the at least one processor is configured to execute the instructions to generate the evaluation information in which the speech and motion information of the person to be evaluated, associated with the speech and motion information of the at least one other person, and the video data at a time of the motion of the person to be evaluated, specified by the speech and motion information of the person to be evaluated, are associated with each other. (In ¶0033; Fig. 1: teaches “Meeting software 114 is able to mine audio and video streams and identify, isolate, and tag individuals, within a room or set of rooms, and use those simultaneous audio and video streams to determine one or more states for each person such as, a body or body component movement, a gesture, eye movement/gaze and pupil dilation, pulse rate, and via IR, body heat, as well as individual and group dynamics/interaction and tag those to individuals or groups. In addition, the algorithms will infer such things as individual and group emotion, meaning of gestures, eye movement/gaze/pupil dilation and use body heat and pulse rate as additional information sources and confirmation sources.”)

Regarding claim 14: Bellamy discloses: wherein the at least one processor is configured to execute the instructions to output the speech and motion information of the person to be evaluated and the video data associated with each other, in association with each other. (In ¶0034; Figs. 1 – 2: teaches “Meeting software 114 takes an action based on the analysis of the analyzed and categorized interactions/state of the tagged individuals as depicted in step 206. For example, meeting software 114 can display the analyzed and categorized interactions/state of the tagged individuals on a screen presenting the unified meeting room with avatars that represent all of the meeting participants. In another example embodiment, meeting software 114 can present the meeting in a virtual reality environment, where participants could observe the scenes of the unified meeting room, or by using VR goggles.”)

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 8, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bellamy (U.S. Pub No. 20190147367A1) in view of Bhaskaran (U.S. Pub No. 20160364692A1).

Regarding claim 8: Though Bellamy discloses detecting interactions among users and participants during a meeting held over a network, Bellamy does not explicitly disclose wherein the person to be evaluated is an interviewee. However, Bhaskaran teaches: wherein the person to be evaluated is an interviewee. (In ¶0013; Figs. 1 – 2: teaches “FIG. 2 illustrates an exemplary block diagram of a virtual interviewing system of FIG. 1 in accordance with some embodiments of the present disclosure;”) It would have been obvious to one of ordinary skill in the art before the earliest effective filing date of the claimed invention to combine Bellamy’s disclosed methods of detecting interactions among users during meetings with Bhaskaran’s teaching that the person to be evaluated is an interviewee, as “there is an increased focus on reduction in manual processes and automation to increase productivity, finding the right talent on an ongoing basis is a major pre-requisite for retaining competitive advantage.”

Regarding claim 17: Though Bellamy teaches machine learning models, Bellamy does not disclose the evaluation score element of the limitation. Thus, Bhaskaran teaches: wherein the evaluation information includes an evaluation score for the person to be evaluated, the evaluation score being generated by a machine learning model. (In ¶0031: a candidate score, ranking, and other hierarchical data are generated for the candidate.) It would have been obvious to one of ordinary skill in the art before the earliest effective filing date of the claimed invention to combine Bellamy’s disclosed methods of detecting interactions among users during meetings with that of generating an evaluation score, as “there is an increased focus on reduction in manual processes and automation to increase productivity, finding the right talent on an ongoing basis is a major pre-requisite for retaining competitive advantage.”

Regarding claim 18: Bellamy does not disclose the limitation below. Thus, Bhaskaran teaches: wherein the output evaluation information is configured to support a human evaluator's decision making regarding the person to be evaluated. (In ¶0050 – 0051: the assessment and scoring module (ASM) is used to evaluate a candidate’s responses by implementing a plethora of semantic techniques to evaluate validity and correctness, which is then saved to a knowledgebase.)
It would have been obvious to one of ordinary skill in the art before the earliest effective filing date of the claimed invention to combine Bellamy’s disclosed methods of detecting interactions among users during meetings with Bhaskaran’s output of evaluation information that supports an evaluator’s decision making, as “there is an increased focus on reduction in manual processes and automation to increase productivity, finding the right talent on an ongoing basis is a major pre-requisite for retaining competitive advantage.”

Pertinent Art

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.

Choi (K.R. Patent No. 101898648B1) is pertinent because it is directed to “image analysis, and more specifically, it relates to a method and apparatus for detecting an interaction group between image characters, which can more accurately determine an interactive group considering not only the positional information of an individual in the image but also the emotional relationship between individuals.”

Hasdell (E.P. Patent No. 2450877B1) is pertinent because it is directly related to “a system and method of speech evaluation.”

Li (U.S. Pub No. US20030154084A1) is pertinent because it is related to “the field of object identification in video data. More particularly, the invention relates to a method and system for identifying a speaking person within video data.”

Wu (U.S. Patent No. US10796217B2) is pertinent because it is directed to “Systems and methods for automatically interviewing a technical candidate are provided. The systems and method determine emotional states of the candidate and relevance scores for one or more provided answers from the candidates. The systems and methods utilized the emotional states and relevance scores to determine the next type of question and the appropriate difficulty level for the next question to ask the candidate during an automated interview.”

Shaburov (U.S. Patent No. US9576190B2) is pertinent because it is directly related to “video conferencing and, more particularly, to systems and methods for recognizing emotions of participants in video conferencing.”

Cao (U.S. Patent No. US10586368B2) is pertinent because it is related to “a joint automatic audio visual driven facial animation system that in some example embodiments includes a full scale state of the art Large Vocabulary Continuous Speech Recognition (LVCSR) with a strong language model for speech recognition and obtained phoneme alignment from the word lattice.”

Cunico (U.S. Patent No. US10878226B2) is pertinent because it is directed to “the field of video conferencing, and more particularly to analyzing sentiment of attendees in a video conference.”

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Bill Chen, whose telephone number is (571) 270-0660. The examiner can normally be reached Monday – Friday, 8:30 am – 5:00 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nathan Uber, can be reached at (571) 270-3923. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BILL CHEN/
Examiner, Art Unit 3626

/NATHAN C UBER/
Supervisory Patent Examiner, Art Unit 3626

Prosecution Timeline

Mar 09, 2023
Application Filed
May 17, 2025
Non-Final Rejection — §101, §102, §103
Sep 22, 2025
Response Filed
Jan 08, 2026
Final Rejection — §101, §102, §103 (current)


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 0%
With Interview: 0% (+0.0%)
Median Time to Grant: 3y 0m
PTA Risk: Moderate
Based on 9 resolved cases by this examiner; grant probability is derived from the career allow rate.
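The arithmetic implied by these cards can be reproduced directly. The sketch below assumes grant probability simply equals the career allow rate (0 granted / 9 resolved) and that the "vs TC avg" figures are plain differences against the tech-center average (about 52% here, implied by the 0% allow rate and the -52.0% delta shown above); the function names and that reading of the metrics are assumptions, not a documented formula.

```python
# Illustrative reconstruction of the dashboard arithmetic (assumed, not documented).

def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved if resolved else 0.0

def delta_vs_tc(examiner_rate: float, tc_average: float) -> float:
    """Signed gap between the examiner's rate and the tech-center average."""
    return examiner_rate - tc_average

if __name__ == "__main__":
    career = allow_rate(granted=0, resolved=9)               # 0.0 -> Grant Probability: 0%
    print(f"Career allow rate: {career:.1f}%")                # 0.0%
    print(f"vs TC avg: {delta_vs_tc(career, 52.0):+.1f}%")    # -52.0%, matching the card above
```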
