Prosecution Insights
Last updated: April 19, 2026
Application No. 18/697,413

METHOD FOR RECOGNIZING EMOTION, TRAINING METHOD, APPARATUSES, DEVICE, STORAGE MEDIUM AND PRODUCT

Non-Final OA: §101, §103
Filed
Mar 29, 2024
Examiner
SHIMELES, BEZAWIT NOLAWI
Art Unit
2673
Tech Center
2600 — Communications
Assignee
Suzhou MetaBrain Intelligent Technology Co., Ltd.
OA Round
1 (Non-Final)
Grant Probability: 100% (Favorable)
OA Rounds: 1-2
To Grant: 2y 9m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 100% (above average; +38.0% vs TC avg; 1 granted / 1 resolved)
Interview Lift: -100.0% (minimal lift, from resolved cases with interview)
Typical Timeline: 2y 9m avg prosecution
Career History: 14 total applications across all art units; 13 currently pending

Statute-Specific Performance

§101: 17.4% (-22.6% vs TC avg)
§103: 47.8% (+7.8% vs TC avg)
§102: 13.0% (-27.0% vs TC avg)
§112: 19.6% (-20.4% vs TC avg)
Tech Center averages are estimates. Based on career data from 1 resolved case.

Office Action

Grounds: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of certified copies of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 03/29/2024 and 08/09/2024 have been considered by the examiner.

Claim Objections

The claims are objected to for the following minor informalities: In claim 7, line 9, “when a updated i is less than N” should read “when an updated i is less than N.” In claim 24, line 6, “when the current training number of times does not reaches the preset number of times” should read “when the current training number of times does not reach the preset number of times.” Appropriate corrections are required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-3, 19, 20, 25 and 26 are rejected under 35 U.S.C. 101.

Regarding Independent Claim 1 and its dependent claims 2-13 and 22-24:

Step 1 Analysis: Claim 1 is directed to a process, which falls within one of the four statutory categories.

Step 2A Prong 1 Analysis: Claim 1 recites, in part: “acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences…, so as to obtain a corresponding emotion category.” These limitations, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind, which falls within the “Mental Processes” grouping of abstract ideas.
The limitations of “acquiring to-be-recognized… sequences corresponding to video information; and recognizing the to-be-recognized… sequences…, so as to obtain a corresponding emotion category” are steps that, under BRI, a human can also perform through mental processes such as observation and evaluation: the human mind can observe a set of given data corresponding to video information and match it to a corresponding category of emotion. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements: “acquiring to-be-recognized spiking sequences…” and “by using a spiking neural network emotion recognition model.” The element “acquiring to-be-recognized spiking sequences…” recites an insignificant extra-solution activity of data gathering. The element “by using a spiking neural network emotion recognition model” recites an intended use for a generic, well-known neural network model recited at a high level of generality, without further limiting, in detail, how the neural network/machine learning model functions to arrive at such an outcome. These additional elements are recited as a mere attempt to implement the abstract idea using generic neural network/machine learning models. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim as a whole is directed to an abstract idea. Please see MPEP § 2106.04(d)(III)(C).

Step 2B Analysis: There are no additional elements, including those indicated above, that amount to significantly more than the judicial exception. Please see MPEP § 2106.05. The claim is directed to an abstract idea.
For all of the foregoing reasons, claim 1 does not comply with the requirements of 35 U.S.C. 101. Dependent claims 2 and 3 do not provide elements that overcome the deficiencies of independent claim 1. Claim 2 recites, in part, wherein clauses that merely further specify the element from which it depends; this is neither an indication of integration of the abstract idea into a practical application nor considered significantly more. Claim 3 recites in part “wherein training… comprises: acquiring test sets of a plurality of emotion categories; and performing test training… by using the test sets,” which is a recitation of an insignificant extra-solution activity of data gathering; under BRI, this is a step that a human can also perform through mental processes of observation and evaluation, such as observing a set of given categories for emotion and performing training using the given test sets. The limitation “on the pre-established spiking neural network emotion recognition model… to obtain a trained spiking neural network emotion recognition model” recites a generic, well-known neural network model at a high level of generality; it is recited as a mere attempt to implement the abstract idea using a generic neural network model without further limiting, in detail, how the model works to arrive at such an outcome.

Claim 4 is directed to a specific technological implementation for processing dynamic visual data using spiking representations, which differs from conventional processing. The abstract idea is thus integrated into a practical application, and the claim is eligible under 35 U.S.C. 101. Accordingly, dependent claims 5-13 and 22-24 are also determined to be patent eligible under 35 U.S.C. 101.
Regarding Independent Claim 19: Independent claim 19 recites limitations analogous to those of independent claim 1; hence, these limitations are not patent eligible under 35 U.S.C. 101 for the reasons given in the claim 1 analysis above. Furthermore, claim 19 recites additional features such as “an electronic device, comprising: a memory, for storing a computer program; and a processor, for executing the computer program to cause the processor to:”, which are generic computer components recited at a high level of generality to perform generic, well-known functions, such as a processor executing instructions stored in a memory. For all the foregoing reasons, claim 19 does not comply with the requirements of 35 U.S.C. 101.

Regarding Independent Claim 20 and its dependent claims 25 and 26: Independent claim 20 recites limitations analogous to those of independent claim 1; hence, these limitations are not patent eligible under 35 U.S.C. 101 for the reasons given in the claim 1 analysis. Furthermore, claim 20 recites additional features such as “a computer non-transitory readable storage medium, wherein the computer non-transitory readable storage medium stores a computer program, which when executed by a processor, cause the processor to:”, which are generic computer components recited at a high level of generality to perform generic, well-known functions. For all the foregoing reasons, claim 20 does not comply with the requirements of 35 U.S.C. 101. Dependent claims 25 and 26 each recite limitations analogous to those of dependent claims 2 and 3; hence, these limitations are not patent eligible under 35 U.S.C. 101 for the reasons given above.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-12, 19, 20, and 22-26 are rejected under 35 U.S.C. 103 as being unpatentable over NUMAOKA (US 20220366723 A1), hereinafter NUMAOKA, in view of SRINIVASA (US 20200218959 A1), hereinafter SRINIVASA.

Regarding claim 1, NUMAOKA teaches a method for recognizing emotion (Fig. 10, Paragraph [0142] - NUMAOKA discloses FIG. 10 illustrates a procedure of the emotion recognition processing by the emotion inference processing logic 901 including a plurality of artificial intelligence functions in FIG. 9 in the form of a flowchart), comprising: acquiring to-be-recognized spiking sequences (Fig. 5, Paragraph [0111] - NUMAOKA discloses a recognition data preprocessing logic 501 performs conversion processing before input of converting a data format of output data from each of the sensors into a data format that can be input to the artificial intelligence that performs the emotion recognition processing.
Paragraph [0192] - NUMAOKA discloses in order to reduce the amount of data to be stored, it is possible to store only a few frames [wherein frames is spiking sequences] in the preceding and subsequent sections in which the emotion change is particularly large. Please see FIG. 5, Paragraph [0094, 0193] wherein the neural network is a spiking neural network.) corresponding to information (Fig. 5, Paragraph [0088] - NUMAOKA further discloses the computer device 210 can be equipped with a position sensor (including a global positioning system (GPS) or the like) 311, an image sensor 312, a sound sensor (including a microphone or the like) 313, an odor sensor 314, a taste sensor 315, a tactile sensor 316, or other sensors as a group of sensors for learning an emotion and recognizing an emotion by the artificial intelligence function.); and recognizing the to-be-recognized spiking sequences (Fig. 5, Paragraph [0192] - NUMAOKA discloses in order to reduce the amount of data to be stored, it is possible to store only a few frames [wherein frames is sequences] in the preceding and subsequent sections in which the emotion change is particularly large. See also Paragraph [0193].) by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category (Fig. 5, Paragraph [0115] - NUMAOKA further discloses the emotion inference processing logic 502 includes, for example, artificial intelligence such as CNN, DNN, RNN, reinforcement learning neural network, autoencoder, SNN, or SVM. The artificial intelligence function of the emotion inference processing logic 502 is applied with the learned emotion recognition model read from the database 306 and infers a human emotion from the recognition data input via the recognition data preprocessing logic 501.). Although NUMAOKA explicitly teaches acquiring to-be-recognized spiking sequences corresponding to information (Fig. 5, Paragraph [0094, 0111, 0193]), NUMAOKA is silent on video information. 
However, SRINIVASA explicitly teaches video information (Fig. 2, Paragraph [0034] - SRINIVASA discloses spiking neural network 230 is configured to process video information for at least one type of action detection. Nodes (or “neurons”) of spiking neural network 230 may be variously coupled, each via a respective one of synapses 222, to receive a respective one of one or more input spike trains 220—some or all of which may represent, or otherwise be based on, differential video data.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA of having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having video information. The combination yields NUMAOKA’s method for recognizing emotion wherein the acquired to-be-recognized spiking sequences correspond to video information.
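As background for the claim 1 mapping above (recognizing spiking sequences with a spiking neural network model), the general technique can be illustrated with a toy leaky integrate-and-fire (LIF) readout. This is a minimal sketch, not the applicant's or NUMAOKA's model; the function name, weight shapes, decay constant, and threshold are all invented for illustration.

```python
import numpy as np

def lif_classify(spike_seq, w_in, w_out, decay=0.9, v_th=1.0):
    """Classify a binary spike sequence with one layer of leaky
    integrate-and-fire (LIF) neurons and a spike-count readout.

    spike_seq: (T, D) binary input spikes over T time steps.
    w_in: (D, H) input weights; w_out: (H, C) readout weights.
    Returns the index of the class with the largest accumulated drive.
    """
    v = np.zeros(w_in.shape[1])           # hidden membrane potentials
    counts = np.zeros(w_out.shape[1])     # per-class accumulated drive
    for t in range(spike_seq.shape[0]):
        v = decay * v + spike_seq[t] @ w_in    # leaky integration of input
        fired = v >= v_th                      # threshold crossing -> spike
        v[fired] = 0.0                         # reset neurons that fired
        counts += fired.astype(float) @ w_out  # route hidden spikes to classes
    return int(np.argmax(counts))
```

With identity weights, a sequence that spikes only on input channel 0 drives hidden neuron 0 above threshold at every step, so class 0 wins the spike count; the time-dependent accumulation is what distinguishes this from a stateless classifier.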
The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network, where the time-based relationship between signal spikes allows for a more power-efficient model. Both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions: NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve recognition accuracy and provide an emotion recognition utilization service with a high user satisfaction level, while SRINIVASA relates to determining the classification of a video sequence with a spiking neural network, where the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and SRINIVASA (US 20200218959 A1), Paragraph [0002, 0019].

Regarding claim 2, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 1. NUMAOKA further teaches wherein before recognizing the to-be-recognized spiking sequences by using the spiking neural network emotion recognition model (Fig. 5, Paragraph [0115, 0192]), the method further comprises: training a pre-established spiking neural network emotion recognition model (Fig.
3, Paragraph [0094] – NUMAOKA discloses the emotion learning processing logic 304 includes an artificial intelligence using a learning model such as a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a reinforcement learning (reinforcement learning) neural network, an autoencoder, a spiking neural network (SNN), or a support vector machine (SVM).), to obtain a trained spiking neural network emotion recognition model (Fig. 5, Paragraph [0095] – NUMAOKA discloses the emotion learning processing logic 304 performs learning of the artificial intelligence for emotion recognition through training (for example, deep learning) by inputting new learning data to the artificial intelligence to manufacture new learning model for emotion recognition different from the model before training.).

Regarding claim 3, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 2. NUMAOKA further teaches wherein training the pre-established spiking neural network emotion recognition model (Fig. 3, Paragraph [0094] – NUMAOKA discloses the emotion learning processing logic 304 includes an artificial intelligence using a learning model such as a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a reinforcement learning (reinforcement learning) neural network, an autoencoder, a spiking neural network (SNN), or a support vector machine (SVM).), to obtain the trained spiking neural network emotion recognition model (Fig. 5, Paragraph [0095] – NUMAOKA discloses the emotion learning processing logic 304 performs learning of the artificial intelligence for emotion recognition through training (for example, deep learning) by inputting new learning data to the artificial intelligence to manufacture new learning model for emotion recognition different from the model before training.), comprises: acquiring test sets of a plurality of emotion categories (Fig.
6, Paragraph [0120] - NUMAOKA discloses in a case where it is determined that emotion learning of artificial intelligence may be performed or emotion recognition by artificial intelligence may be performed based on the determination criterion data 307 based on the guidelines (Yes in step S602), the learning data preprocessing logic 301 or the recognition data preprocessing logic 501 acquires learning data or recognition data from various sensors mounted on the computer device 210, the local database 303 in the computer device 210, or the cloud infrastructure 120 (step S603). See also Paragraph [0088].); and performing test training on the pre-established spiking neural network emotion recognition model by using the test sets (Fig. 6, Paragraph [0015] – NUMAOKA discloses emotion learning unit performs training to input learning data to the artificial intelligence function and perform the emotion recognition when the preprocessing unit determines to permit the learning.), to obtain a trained spiking neural network emotion recognition model (Fig. 6, Paragraph [0004] – NUMAOKA discloses the face image pattern is used as an input of the neural network, the output of the neural network is associated with labels of emotions such as “anger”, “disgust”, “fear”, . . . , each face image pattern input to the neural network is compared with a label of an output considered to be appropriate, and thereby the neural network is learned or trained. Paragraph [0094] – NUMAOKA further discloses emotion learning processing logic 304 includes an artificial intelligence using a learning model such as… a spiking neural network (SNN).).

Regarding claim 4, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 2. NUMAOKA further teaches wherein training the pre-established spiking neural network emotion recognition model (Fig.
6, Paragraph [0015] – NUMAOKA discloses emotion learning unit performs training to input learning data to the artificial intelligence function and perform the emotion recognition when the preprocessing unit determines to permit the learning.), to obtain the trained spiking neural network emotion recognition model (Fig. 6, Paragraph [0004] – NUMAOKA discloses the face image pattern is used as an input of the neural network, the output of the neural network is associated with labels of emotions such as “anger”, “disgust”, “fear”, . . . , each face image pattern input to the neural network is compared with a label of an output considered to be appropriate, and thereby the neural network is learned or trained. See also Paragraph [0094].), comprises: pre-establishing an emotion recognition-based dynamic visual data set (Fig. 3, Paragraph [0088] – NUMAOKA discloses the computer device 210 can be equipped with a position sensor (including a global positioning system (GPS) or the like) 311, an image sensor 312, a sound sensor (including a microphone or the like) 313, an odor sensor 314, a taste sensor 315, a tactile sensor 316, or other sensors as a group of sensors for learning an emotion and recognizing an emotion by the artificial intelligence function. Paragraph [0090] – NUMAOKA further discloses a learning data preprocessing logic 301 performs conversion processing before input of converting a data format of output data from each of the sensors 311 to 316 . . . into a data format that can be input to the artificial intelligence that performs the emotion learning processing. Paragraph [0105] – NUMAOKA further discloses the other sensors may be any combination of sensors such as… a Dynamic Vision Sensor (DVS). Note that the DVS includes a SNN.); and training the pre-established spiking neural network emotion recognition model by using the dynamic visual data set (Fig. 
6, Paragraph [0015] – NUMAOKA discloses emotion learning unit performs training to input learning data to the artificial intelligence function and perform the emotion recognition when the preprocessing unit determines to permit the learning. See also Paragraph [0105].), to obtain a trained spiking neural network emotion recognition model (Fig. 6, Paragraph [0004] – NUMAOKA discloses the face image pattern is used as an input of the neural network, the output of the neural network is associated with labels of emotions such as “anger”, “disgust”, “fear”, . . . , each face image pattern input to the neural network is compared with a label of an output considered to be appropriate, and thereby the neural network is learned or trained. Paragraph [0094].).

Regarding claim 5, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 4. NUMAOKA further teaches wherein the process of pre-establishing the emotion recognition-based dynamic visual data set (Fig. 3, Paragraph [0088, 0090, 0105]) comprises: acquiring emotion recognition-based raw visual data (Fig. 3, Paragraph [0089] – NUMAOKA discloses in a case of learning an artificial intelligence function that recognizes an emotion from a facial expression of a human, the computer device 210 needs to be equipped with at least the image sensor 312 and perform training of the artificial intelligence function by inputting image data acquired by the image sensor 312 to the artificial intelligence. See also Paragraph [0091].); or directly acquiring a plurality of spiking sequences corresponding to the raw visual data by using a dynamic visual camera (Fig. 4, Paragraph [0105] – NUMAOKA discloses the other sensors may be any combination of sensors such as… a Dynamic Vision Sensor (DVS). Note that the DVS includes a SNN.
Furthermore, inputs from other sensors may be directly input to the full coupling layer 403 without passing through the context recognition processing logic 305.); and NUMAOKA explicitly teaches establishing an emotion recognition-based dynamic visual data set on the basis of the plurality of spiking sequences (Fig. 6, Paragraph [0004] – NUMAOKA discloses the face image pattern is used as an input of the neural network, the output of the neural network is associated with labels of emotions such as “anger”, “disgust”, “fear”, . . . , each face image pattern input to the neural network is compared with a label of an output considered to be appropriate, and thereby the neural network is learned or trained. See also Paragraph [0107].). NUMAOKA fails to explicitly teach performing simulation processing on the raw visual data by using a dynamic visual sensor simulation method, to obtain a plurality of spiking sequences corresponding to the raw visual data. However, SRINIVASA explicitly teaches performing simulation processing on the raw visual data by using a dynamic visual sensor simulation method (Fig. 3, Paragraph [0041] – SRINIVASA discloses (at 320) generating frames of differential video data based on raw video data. The generating may include calculating differences each between a respective two frames of video data.), to obtain a plurality of spiking sequences corresponding to the raw visual data (Fig. 3, Paragraph [0042] – SRINIVASA discloses (at 340) applying a spiking neural network to the one or more input signals, wherein the applying includes communicating one or more spike trains each between respective nodes of the spiking neural network.
The one or more spike trains may each be based on the frames of differential video data—e.g., wherein a sequence of signal spiking by a given spike train is based on a sequence of the frames of differential video data.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA in view of SRINIVASA of having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having performing simulation processing on the raw visual data by using a dynamic visual sensor simulation method, to obtain a plurality of spiking sequences corresponding to the raw visual data. The combination yields NUMAOKA’s method for recognizing emotion wherein simulation processing is performed on the raw visual data by using a dynamic visual sensor simulation method, to obtain a plurality of spiking sequences corresponding to the raw visual data.
The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network, where the time-based relationship between signal spikes allows for a more power-efficient model. Both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions: NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve recognition accuracy and provide an emotion recognition utilization service with a high user satisfaction level, while SRINIVASA relates to determining the classification of a video sequence with a spiking neural network, where the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and SRINIVASA (US 20200218959 A1), Paragraph [0002, 0019].
Regarding claim 6, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 5, NUMAOKA fails to explicitly teach wherein the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain the plurality of spiking sequences corresponding to the raw visual data, comprises: sequentially traversing N frames of video frame images in the raw dynamic video data, wherein N represents a total number of video frame images contained in the raw visual data; when traversing to a current ith frame, converting a video frame image of the current ith frame from an RGB color space to a grayscale space, and taking the converted video frame data as current video frame data, wherein the numerical range of i is from 1 to N; and when the value of i is equal to 1, assigning all floating-point data of the current video frame data to a first output channel of a first time step of simulation data, to obtain a spiking sequence composed of the first output channel, and taking the current video frame data as a previous video frame. However, SRINIVASA explicitly teaches wherein the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method (Fig. 3, Paragraph [0041] – SRINIVASA discloses (at 320) generating frames of differential video data based on raw video data. The generating may include calculating differences each between a respective two frames of video data.), to obtain the plurality of spiking sequences corresponding to the raw visual data (Fig. 3, Paragraph [0042] – SRINIVASA discloses (at 340) applying a spiking neural network to the one or more input signals, wherein the applying includes communicating one or more spike trains each between respective nodes of the spiking neural network. 
The one or more spike trains may each be based on the frames of differential video data—e.g., wherein a sequence of signal spiking by a given spike train is based on a sequence of the frames of differential video data.), comprises: sequentially traversing N frames of video frame images in the raw dynamic video data, wherein N represents a total number of video frame images contained in the raw visual data (Figs. 4A-B, Paragraph [0047] – SRINIVASA discloses video information such as that variously represented by one or more of frames 400-405 may be communicated in system 200, for example, and/or may be processed with a spiking neural network according to method 300.); when traversing to a current ith frame, converting a video frame image of the current ith frame from an RGB color space to a grayscale space, and taking the converted video frame data as current video frame data (Fig. 2, Paragraph [0106-0107] – SRINIVASA discloses in Example 6, the subject matter of any one or more of Examples 1, 2, 4 and 5 optionally includes the computer device further comprising circuitry to encode raw video data of the video sequence into the frames of differential video data. In Example 7, the subject matter of Example 6 optionally includes the computer device further comprising circuitry to convert the raw video data from a polychromatic color space format to a monochromatic color space format.), wherein the numerical range of i is from 1 to N (Fig. 7, Paragraph [0077] – SRINIVASA discloses graphs 710, 720, 730 represent concurrent outputs by nodes N1, N2, N3 during processing which detects that some first differential video data represents an activity of a first action type which, for example, is associated with node N1. 
Detection of the first action type may be based on a peak 740 of the output in graph 710—e.g., where peak 740 occurs while graphs 720, 730 are relatively quiescent.); and when the value of i is equal to 1, assigning all floating-point data of the current video frame data to a first output channel of a first time step of simulation data (Fig. 8, Paragraph [0088] – SRINIVASA discloses hardware and/or executing software (such as that of selector logic 240, for example) may select among from the multiple output signals a first output signal which is provided by the first spiking neural network.), to obtain a spiking sequence composed of the first output channel, and taking the current video frame data as a previous video frame (Fig. 8, Paragraph [0088] – SRINIVASA discloses based on the selecting of the first output signal, signaling may be generated to communicate that the test video sequence includes a representation of an instance of the first activity type. In such an embodiment, comparing the multiple output signals may comprise comparing a measure of output signal spiking from each of the plurality of trained spiking neural networks. See also Fig. 6, Paragraph [0072].). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA in view of SRINIVASA of having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having wherein the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain the plurality of spiking sequences corresponding to the raw visual data, comprises: sequentially traversing N frames of video frame images in the raw dynamic video data, wherein N represents a total number of video frame images contained in the raw visual data; when traversing to a current ith frame, converting a video frame image of the current ith frame from an RGB color space to a grayscale space, and taking the converted video frame data as current video frame data, wherein the numerical range of i is from 1 to N; and when the value of i is equal to 1, assigning all floating-point data of the current video frame data to a first output channel of a first time step of simulation data, to obtain a spiking sequence composed of the first output channel, and taking the current video frame data as a previous video frame.
That is, NUMAOKA’s method for recognizing emotion would be modified so that the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain the plurality of spiking sequences corresponding to the raw visual data, comprises: sequentially traversing N frames of video frame images in the raw dynamic video data, wherein N represents a total number of video frame images contained in the raw visual data; when traversing to a current ith frame, converting a video frame image of the current ith frame from an RGB color space to a grayscale space, and taking the converted video frame data as current video frame data, wherein the numerical range of i is from 1 to N; and when the value of i is equal to 1, assigning all floating-point data of the current video frame data to a first output channel of a first time step of simulation data, to obtain a spiking sequence composed of the first output channel, and taking the current video frame data as a previous video frame. 
The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network, where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network, wherein the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraphs [0024] and [0141], and SRINIVASA (US 20200218959 A1), Paragraphs [0002] and [0019]. 
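For reference, the initialization step recited against claim 6 above (converting the first frame from RGB to grayscale and assigning all of its floating-point data to the first output channel of the first time step) can be sketched in Python/NumPy. The array layout, the luminance weighting, and the function names are illustrative assumptions, not drawn from the application or the cited references:

```python
import numpy as np

def rgb_to_gray(frame_rgb: np.ndarray) -> np.ndarray:
    # Standard luminance weighting (an assumption; the claim only requires
    # conversion from an RGB color space to a grayscale space).
    return frame_rgb @ np.array([0.299, 0.587, 0.114])

def init_simulation(first_frame_rgb: np.ndarray, n_steps: int) -> tuple[np.ndarray, np.ndarray]:
    """For i == 1: assign all floating-point grayscale data of the current
    frame to the first output channel of the first time step, and keep the
    frame as the 'previous video frame'."""
    gray = rgb_to_gray(first_frame_rgb).astype(np.float32)
    h, w = gray.shape
    sim = np.zeros((n_steps, 2, h, w), dtype=np.float32)  # (time, channel, H, W)
    sim[0, 0] = gray          # first output channel, first time step
    return sim, gray          # gray becomes the previous frame
```

The two-channel layout anticipates the ON/OFF channels that claims 7-9 assign on later frames.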
Regarding claim 7, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 6. NUMAOKA fails to explicitly teach wherein the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain corresponding spiking sequences, further comprises: when i is not equal to 1, respectively assigning the first output channel and a second output channel according to a preset threshold and a grayscale difference value between the current video frame and the previous video frame, and taking the current video frame data as the previous video frame; updating the value of i by adding 1; and when an updated i is less than N, executing the step of converting the video frame image of the current ith frame from the RGB color space to the grayscale space, and taking the converted video frame data as the current video frame data. However, SRINIVASA explicitly teaches wherein the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain corresponding spiking sequences (Fig. 3, Paragraph [0041, 0042]), further comprises: when i is not equal to 1, respectively assigning the first output channel and a second output channel according to a preset threshold and a grayscale difference value between the current video frame and the previous video frame, and taking the current video frame data as the previous video frame (Figs. 4A-B, Paragraph [0052] – SRINIVASA discloses pixels, representing differential motion information across different times, may be tracked so that a basic action signature is captured. 
For example, this may be done by monitoring a difference of pixel intensity values (P.sub.diff) between consecutive frames of raw video data, and comparing the difference against some threshold (ε, that is user-defined) to detect the moving edges (for each location (x, y) within a frame)—e.g., according to the following: [see equations 2, 3, 4].); updating the value of i by adding 1 (Fig. 8, Paragraph [0088] – SRINIVASA discloses a comparison may be made of multiple output signals which are each from a different respective one of the plurality of spiking neural networks—e.g., where the output signals are each based on the same given test video sequence.); and when an updated i is less than N, executing the step of converting the video frame image of the current ith frame from the RGB color space to the grayscale space, and taking the converted video frame data as the current video frame data (Fig. 2, Paragraph [0106-0107] – SRINIVASA discloses in Example 6, the subject matter of any one or more of Examples 1, 2, 4 and 5 optionally includes the computer device further comprising circuitry to encode raw video data of the video sequence into the frames of differential video data. In Example 7, the subject matter of Example 6 optionally includes the computer device further comprising circuitry to convert the raw video data from a polychromatic color space format to a monochromatic color space format. See also Fig. 6, Paragraph [0072]). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA, having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA, wherein the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain corresponding spiking sequences, further comprises: when i is not equal to 1, respectively assigning the first output channel and a second output channel according to a preset threshold and a grayscale difference value between the current video frame and the previous video frame, and taking the current video frame data as the previous video frame; updating the value of i by adding 1; and when an updated i is less than N, executing the step of converting the video frame image of the current ith frame from the RGB color space to the grayscale space, and taking the converted video frame data as the current video frame data. 
That is, NUMAOKA’s method for recognizing emotion would be modified so that the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain corresponding spiking sequences, further comprises: when i is not equal to 1, respectively assigning the first output channel and a second output channel according to a preset threshold and a grayscale difference value between the current video frame and the previous video frame, and taking the current video frame data as the previous video frame; updating the value of i by adding 1; and when an updated i is less than N, executing the step of converting the video frame image of the current ith frame from the RGB color space to the grayscale space, and taking the converted video frame data as the current video frame data. The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network, where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network, wherein the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. 
Please see NUMAOKA (US 20220366723 A1), Paragraphs [0024] and [0141], and SRINIVASA (US 20200218959 A1), Paragraphs [0002] and [0019]. Regarding claim 8, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 7. NUMAOKA fails to explicitly teach wherein the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain corresponding spiking sequences, further comprises: when the updated i is not less than N, completing traversing of the N frames of video frame images in the raw dynamic video data, to obtain spiking sequences composed of the first output channel and the second output channel. However, SRINIVASA explicitly teaches wherein the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain corresponding spiking sequences (Fig. 3, Paragraph [0041, 0042]), further comprises: when the updated i is not less than N, completing traversing of the N frames of video frame images in the raw dynamic video data (Fig. 1, Paragraph [0024] – SRINIVASA discloses the determination of whether a particular neuron “fires” to provide data to a further connected neuron is dependent on the activation function applied by the neuron and the weight of the synaptic connection (e.g., w.sub.ij) from neuron i (e.g., located in a layer of the first set of nodes 110) to neuron j (e.g., located in a layer of the second set of nodes 130).), to obtain spiking sequences composed of the first output channel and the second output channel (Fig. 2, Paragraph [0035] – SRINIVASA discloses an already-trained spiking neural network 230 may communicate (for example, via the illustrative synapses 232 shown) one or more output spike trains which are based on one or more input spike trains 220. Such one or more output spike trains may include an instance of the type of output signaling which corresponds to the action type.). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA, having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA, wherein the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain corresponding spiking sequences, further comprises: when the updated i is not less than N, completing traversing of the N frames of video frame images in the raw dynamic video data, to obtain spiking sequences composed of the first output channel and the second output channel. That is, NUMAOKA’s method for recognizing emotion would be modified so that the process of performing simulation processing on the raw visual data by using the dynamic visual sensor simulation method, to obtain corresponding spiking sequences, further comprises: when the updated i is not less than N, completing traversing of the N frames of video frame images in the raw dynamic video data, to obtain spiking sequences composed of the first output channel and the second output channel. 
The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network, where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network, wherein the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraphs [0024] and [0141], and SRINIVASA (US 20200218959 A1), Paragraphs [0002] and [0019]. 
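The traversal recited in claims 7 and 8 (grayscale conversion, per-frame difference against a preset threshold, assignment to the two output channels, and completion once the updated i is no longer less than N) can be sketched as follows. The "less than the preset threshold" branch of claim 9 is read here as the signed difference falling below −θ, the usual OFF-event convention in DVS-style encodings; that reading, like the shapes, weights, and names, is an illustrative assumption:

```python
import numpy as np

def dvs_simulate(frames_rgb: list[np.ndarray], theta: float) -> np.ndarray:
    """Sketch of the claimed traversal: for the first frame, copy the
    grayscale data into channel 0 of time step 0; for each later frame,
    emit ON/OFF spikes from the grayscale difference against theta."""
    n = len(frames_rgb)
    weights = np.array([0.299, 0.587, 0.114])   # assumed RGB-to-gray weighting
    gray0 = frames_rgb[0] @ weights
    h, w = gray0.shape
    spikes = np.zeros((n, 2, h, w), dtype=np.float32)
    spikes[0, 0] = gray0                        # i == 1 case
    prev = gray0
    for i in range(1, n):                       # 0-based here; the claim counts 1..N
        gray = frames_rgb[i] @ weights
        diff = gray - prev                      # per-pixel grayscale difference
        spikes[i, 0][diff > theta] = 1.0        # first output channel (ON events)
        spikes[i, 1][diff < -theta] = 1.0       # second output channel (OFF events,
                                                # one reading of "less than the
                                                # preset threshold")
        prev = gray                             # current frame becomes previous
    return spikes
```

Traversal ends once the frame index reaches N, yielding the spiking sequences composed of both output channels, as claim 8 recites.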
Regarding claim 9, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 7. NUMAOKA fails to explicitly teach wherein respectively assigning the first output channel and the second output channel according to the grayscale difference value between the current video frame and the previous video frame and the preset threshold, comprises: calculating, for each pixel, a grayscale difference value between the current video frame and the previous video frame at the pixel; assigning 1 to a position corresponding to the first output channel when the grayscale difference value is greater than the preset threshold; or assigning 1 to a position corresponding to the second output channel when the grayscale difference value is less than the preset threshold. However, SRINIVASA explicitly teaches wherein respectively assigning the first output channel and the second output channel according to the grayscale difference value between the current video frame and the previous video frame and the preset threshold (Figs. 4A-B, Paragraph [0052], Equations 2-4), comprises: calculating, for each pixel, a grayscale difference value between the current video frame and the previous video frame at the pixel (Fig. 4A, Paragraph [0049] – SRINIVASA discloses frame 404 is a frame of differential video data which is calculated based on a difference between frames 401, 402. For a given pixel of frame 404, a value of that pixel may be based on a difference between two corresponding pixels of frames 401, 402.); assigning 1 to a position corresponding to the first output channel when the grayscale difference value is greater than the preset threshold (Figs. 4A-B, Paragraph [0052] – SRINIVASA discloses pixels, representing differential motion information across different times, may be tracked so that a basic action signature is captured. 
For example, this may be done by monitoring a difference of pixel intensity values (P.sub.diff) between consecutive frames of raw video data, and comparing the difference against some threshold (ε, that is user-defined) to detect the moving edges (for each location (x, y) within a frame)—e.g., according to the following: [see equations 2, 3, 4]. Paragraph [0052] – SRINIVASA further discloses a resulting weighted spike difference WSpike.sub.diff (x, y) may then be converted into a binary signal to yield the respective spike patterns for each frame of differential video data.); or assigning 1 to a position corresponding to the second output channel when the grayscale difference value is less than the preset threshold (Figs. 4A-B, Paragraph [0052] – SRINIVASA discloses pixels, representing differential motion information across different times, may be tracked so that a basic action signature is captured. For example, this may be done by monitoring a difference of pixel intensity values (P.sub.diff) between consecutive frames of raw video data, and comparing the difference against some threshold (ε, that is user-defined) to detect the moving edges (for each location (x, y) within a frame)—e.g., according to the following: [see equations 2, 3, 4]. Paragraph [0052] – SRINIVASA further discloses a resulting weighted spike difference WSpike.sub.diff (x, y) may then be converted into a binary signal to yield the respective spike patterns for each frame of differential video data.). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA, having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA, wherein respectively assigning the first output channel and the second output channel according to the grayscale difference value between the current video frame and the previous video frame and the preset threshold, comprises: calculating, for each pixel, a grayscale difference value between the current video frame and the previous video frame at the pixel; assigning 1 to a position corresponding to the first output channel when the grayscale difference value is greater than the preset threshold; or assigning 1 to a position corresponding to the second output channel when the grayscale difference value is less than the preset threshold. That is, NUMAOKA’s method for recognizing emotion would be modified so that respectively assigning the first output channel and the second output channel according to the grayscale difference value between the current video frame and the previous video frame and the preset threshold, comprises: calculating, for each pixel, a grayscale difference value between the current video frame and the previous video frame at the pixel; assigning 1 to a position corresponding to the first output channel when the grayscale difference value is greater than the preset threshold; or assigning 1 to a position corresponding to the second output channel when the grayscale difference value is less than the preset threshold. 
The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network, where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network, wherein the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraphs [0024] and [0141], and SRINIVASA (US 20200218959 A1), Paragraphs [0002] and [0019]. Regarding claim 10, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 4. NUMAOKA further teaches the process of training the pre-established spiking neural network emotion recognition model by using the dynamic visual data set (Fig. 6, Paragraph [0015] – NUMAOKA discloses emotion learning unit performs training to input learning data to the artificial intelligence function and perform the emotion recognition when the preprocessing unit determines to permit the learning. See also Paragraph [0105].), to obtain the trained spiking neural network emotion recognition model (Fig. 
6, Paragraph [0004] – NUMAOKA discloses the face image pattern is used as an input of the neural network, the output of the neural network is associated with labels of emotions such as “anger”, “disgust”, “fear”, . . . , each face image pattern input to the neural network is compared with a label of an output considered to be appropriate, and thereby the neural network is learned or trained. See also Paragraph [0094].), comprises: initializing a parameter weight of the pre-established spiking neural network emotion recognition model (Fig. 3, Paragraph [0095] – NUMAOKA discloses the emotion learning processing logic 304 performs learning of the artificial intelligence for emotion recognition through training (for example, deep learning) by inputting new learning data to the artificial intelligence to manufacture new learning model for emotion recognition different from the model before training. In a case where the artificial intelligence is configured by a neural network, learning progresses so as to estimate an optimum output for an input while changing a coupling weighting coefficient between neurons by repeating learning using learning data, and a structured learning model for emotion recognition including the coupling weighting coefficient between neurons is manufactured.); using the dynamic visual data set as an input to a current spiking neural network in the spiking neural network emotion recognition model (Fig. 6, Paragraph [0015] – NUMAOKA discloses emotion learning unit performs training to input learning data to the artificial intelligence function and perform the emotion recognition when the preprocessing unit determines to permit the learning. See also Paragraph [0105].), to obtain a trained spiking neural network emotion recognition model (Fig. 
5, Paragraph [0095] – NUMAOKA discloses the emotion learning processing logic 304 performs learning of the artificial intelligence for emotion recognition through training (for example, deep learning) by inputting new learning data to the artificial intelligence to manufacture new learning model for emotion recognition different from the model before training.). NUMAOKA fails to explicitly teach wherein the spiking neural network comprises a voting neuronal population component; and obtaining an output frequency of a voting neuronal population of each emotion category via forward propagation of the current spiking neural network; calculating, regarding each emotion category, an error between the output frequency of the voting neuronal population of the emotion category and a real label of a corresponding emotion category; calculating a gradient corresponding to the parameter weight according to the error, and updating the parameter weight of the current spiking neural network by using the gradient; judging whether the current spiking neural network after updating the parameter weight converges; and when it is judged that the current spiking neural network after updating the parameter weight has converged, stopping training. However, SRINIVASA explicitly teaches wherein the spiking neural network comprises a voting neuronal population component (Fig. 8, Paragraph [0086] – SRINIVASA discloses one such assignment may be done for each auto model. Subsequently, a majority vote may be taken across all the assignments (or all auto models) for each test input to predict its class.); and obtaining an output frequency of a voting neuronal population of each emotion category via forward propagation of the current spiking neural network (Fig. 8, Paragraph [0086] – SRINIVASA discloses a given test instance may be assigned a particular class label based on the output neuron class that generated the highest spiking response. One such assignment may be done for each auto model. 
Subsequently, a majority vote may be taken across all the assignments (or all auto models) for each test input to predict its class. For instance, if a majority of the auto models (i.e., ≥3) predict a particular action class (say Class 1), then an activity class output from the model may be Class 1. Now, a single auto model may not have sufficient information to recognize a particular test input correctly. One or more others of the remaining scans or auto models, in that case, may compensate for the insufficient or inaccurate output classification from one individual models.); calculating, regarding each emotion category, an error between the output frequency of the voting neuronal population of the emotion category and a real label of a corresponding emotion category (Fig. 6, paragraph [0074] – SRINIVASA discloses for a given target pattern, the synaptic weights of the spiked neural network may be selectively potentiated or depressed according to the following: Δw=X.sub.trace(T.sub.target−T.sub.actual) where T.sub.actual and T.sub.Target denote the time of occurrence of actual [wherein actual is the output frequency of the voting neuronal population] and desired [wherein desired is a real label] (respectively) spiking activity at the post-synaptic neuron during the simulation, and where presynaptic trace X.sub.trace models the pre-neuronal spiking history.); calculating a gradient corresponding to the parameter weight according to the error, and updating the parameter weight of the current spiking neural network by using the gradient (Fig. 6, Paragraph [0074] – SRINIVASA discloses Equation 12 may variously depress (or potentiate) weights at times when actual (or target) spike activity occurs. This may promote convergence of the actual spiking toward a desired network activity as training progresses.); judging whether the current spiking neural network after updating the parameter weight converges (Fig. 
6, Paragraph [0074]); and when it is judged that the current spiking neural network after updating the parameter weight has converged, stopping training (Fig. 6, Paragraph [0074] – SRINIVASA discloses Equation 12 may variously depress (or potentiate) weights at times when actual (or target) spike activity occurs. This may promote convergence of the actual spiking toward a desired network activity as training progresses. Once the desired/actual activity become similar, the learning may stop, since at time instants where both desired and actual spike occur nearly simultaneously, the weight update value as per Equation 12 may become 0 or otherwise negligibly small.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA, having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA, wherein the spiking neural network comprises a voting neuronal population component; and obtaining an output frequency of a voting neuronal population of each emotion category via forward propagation of the current spiking neural network; calculating, regarding each emotion category, an error between the output frequency of the voting neuronal population of the emotion category and a real label of a corresponding emotion category; calculating a gradient corresponding to the parameter weight according to the error, and updating the parameter weight of the current spiking neural network by using the gradient; judging whether the current spiking neural network after updating the parameter weight converges; and when it is judged that the current spiking neural network after updating the parameter weight has converged, stopping training. That is, NUMAOKA’s method for recognizing emotion would be modified so that the spiking neural network comprises a voting neuronal population component; and obtaining an output frequency of a voting neuronal population of each emotion category via forward propagation of the current spiking neural network; calculating, regarding each emotion category, an error between the output frequency of the voting neuronal population of the emotion category and a real label of a corresponding emotion category; calculating a gradient corresponding to the parameter weight according to the error, and updating the parameter weight of the current spiking neural network by using the gradient; judging whether the current spiking neural network after updating the parameter weight converges; and when it is judged that the current spiking neural network after updating the parameter weight has converged, stopping training. The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network, where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network, wherein the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraphs [0024] and [0141], and SRINIVASA (US 20200218959 A1), Paragraphs [0002] and [0019]. Regarding claim 11, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 10. NUMAOKA further teaches wherein the process of training the pre-established spiking neural network emotion recognition model by using the dynamic visual data set (Fig. 6, Paragraph [0015] – NUMAOKA discloses emotion learning unit performs training to input learning data to the artificial intelligence function and perform the emotion recognition when the preprocessing unit determines to permit the learning. See also Paragraph [0105].), to obtain the trained spiking neural network emotion recognition model (Fig. 6, Paragraph [0004] – NUMAOKA discloses the face image pattern is used as an input of the neural network, the output of the neural network is associated with labels of emotions such as “anger”, “disgust”, “fear”, . . . , each face image pattern input to the neural network is compared with a label of an output considered to be appropriate, and thereby the neural network is learned or trained. See also Paragraph [0094].), further comprises: returning to execute the step of using the dynamic visual data set as the input to the current spiking neural network in the spiking neural network emotion recognition model (Fig. 6, Paragraph [0015] – NUMAOKA discloses emotion learning unit performs training to input learning data to the artificial intelligence function and perform the emotion recognition when the preprocessing unit determines to permit the learning. See also Fig. 
11, Paragraph [0149, 0105].), NUMAOKA fails to explicitly teach when it is judged that the current spiking neural network after updating the parameter weight has not converged, and obtaining the output frequency of the voting neuronal population of each emotion category via forward propagation of the current spiking neural network. However, SRINIVASA explicitly teaches when it is judged that the current spiking neural network after updating the parameter weight has not converged (Fig. 6, Paragraph [0074] – SRINIVASA discloses Equation 12 may variously depress (or potentiate) weights at times when actual (or target) spike activity occurs. This may promote convergence of the actual spiking toward a desired network activity as training progresses. Once the desired/actual activity become similar, the learning may stop, since at time instants where both desired and actual spike occur nearly simultaneously, the weight update value as per Equation 12 may become 0 or otherwise negligibly small.), and obtaining the output frequency of the voting neuronal population of each emotion category via forward propagation of the current spiking neural network (Fig. 8, Paragraph [0086] – SRINIVASA discloses a given test instance may be assigned a particular class label based on the output neuron class that generated the highest spiking response. One such assignment may be done for each auto model. Subsequently, a majority vote may be taken across all the assignments (or all auto models) for each test input to predict its class. For instance, if a majority of the auto models (i.e., ≥3) predict a particular action class (say Class 1), then an activity class output from the model may be Class 1. Now, a single auto model may not have sufficient information to recognize a particular test input correctly. One or more others of the remaining scans or auto models, in that case, may compensate for the insufficient or inaccurate output classification from one individual models.). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA in view of SRINIVASA of having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having when it is judged that the current spiking neural network after updating the parameter weight has not converged, and obtaining the output frequency of the voting neuronal population of each emotion category via forward propagation of the current spiking neural network. The combination results in NUMAOKA’s method for recognizing emotion wherein, when it is judged that the current spiking neural network after updating the parameter weight has not converged, the output frequency of the voting neuronal population of each emotion category is obtained via forward propagation of the current spiking neural network.
The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network; the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and SRINIVASA (US 20200218959 A1), Paragraph [0002, 0019].
Regarding claim 12, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 10, NUMAOKA fails to explicitly teach wherein judging whether the current spiking neural network after updating the parameter weight converges according to the following manners: judging whether the current spiking neural network converges by judging whether current training number of times of the current spiking neural network after updating the parameter weight reaches a preset number of times; or judging whether the current spiking neural network converges by judging whether an error reduction degree of the current spiking neural network after updating the parameter weight is stabilized within a preset range; or judging whether the current spiking neural network converges by judging whether an error of the current spiking neural network after updating the parameter weight is less than an error threshold; or judging whether the current spiking neural network after updating the parameter weight converges by a verification set in the dynamic visual data set. However, SRINIVASA explicitly teaches wherein judging whether the current spiking neural network after updating the parameter weight converges (Fig. 6, Paragraph [0074]) according to the following manners: judging whether the current spiking neural network converges by judging whether current training number of times of the current spiking neural network after updating the parameter weight reaches a preset number of times (Fig. 8, paragraph [0084] – SRINIVASA discloses in each time period, a particular spike frame for a given scan may be processed by the reservoir. For patterns where total number of frames is less than 300, the network may still continue to train (even in absence of input activity) on the inherent reservoir activity generated due to the recurrent dynamics. 
This may enable the reservoir to generalize its dynamical activity over similar input patterns belonging to the same class while discriminating patterns from other classes.); or judging whether the current spiking neural network converges by judging whether an error reduction degree of the current spiking neural network after updating the parameter weight is stabilized within a preset range (Fig. 4A, Paragraph [0054] – SRINIVASA discloses frame 404 (or alternatively, frame 405) may be a frame-based representation of spike train inputs which are based on a detection of movement that is represented by differential pixel values of frames 400, 401. Frames 404, 405 of differential video data may be generated, for example, based on a fixed threshold level for a spike input. In some embodiments, generation of some or all such frames may be additionally or alternatively based on a varying threshold level which, for example, selectively weights signal input for various frames (or sub-frame portions thereof). Such a weighted input may improve the detection of edge artifacts and/or other subtler movements which are indicia of a particular action.); or judging whether the current spiking neural network converges by judging whether an error of the current spiking neural network after updating the parameter weight is less than an error threshold (Fig. 4A, Paragraph [0054] – SRINIVASA discloses frame 404 (or alternatively, frame 405) may be a frame-based representation of spike train inputs which are based on a detection of movement that is represented by differential pixel values of frames 400, 401. Frames 404, 405 of differential video data may be generated, for example, based on a fixed threshold level for a spike input. In some embodiments, generation of some or all such frames may be additionally or alternatively based on a varying threshold level which, for example, selectively weights signal input for various frames (or sub-frame portions thereof). 
Such a weighted input may improve the detection of edge artifacts and/or other subtler movements which are indicia of a particular action.); or judging whether the current spiking neural network after updating the parameter weight converges by a verification set in the dynamic visual data set (Fig. 5, Paragraph [0062] – SRINIVASA discloses driven model 520 may be providing targets (f_out) for auto model 520′ to be implemented by a spiking network configured for a given task. To achieve this, driven model 520 may be trained on an input (f_D), provided via input synapses 512, that is a high pass filtered version of the desired output (f_out). Paragraph [0069] – SRINIVASA further discloses the spiking output neurons of auto model 520′ (with neuronal parameters similar to those of Equation 9) may integrate the net current w′*x(t) to fire action potentials that converge toward the desired spiking activity with supervised STDP learning.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA in view of SRINIVASA of having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having wherein judging whether the current spiking neural network after updating the parameter weight converges according to the following manners: judging whether the current spiking neural network converges by judging whether current training number of times of the current spiking neural network after updating the parameter weight reaches a preset number of times; or judging whether the current spiking neural network converges by judging whether an error reduction degree of the current spiking neural
network after updating the parameter weight is stabilized within a preset range; or judging whether the current spiking neural network converges by judging whether an error of the current spiking neural network after updating the parameter weight is less than an error threshold; or judging whether the current spiking neural network after updating the parameter weight converges by a verification set in the dynamic visual data set. The combination results in NUMAOKA’s method for recognizing emotion wherein judging whether the current spiking neural network after updating the parameter weight converges proceeds according to the following manners: judging whether the current spiking neural network converges by judging whether current training number of times of the current spiking neural network after updating the parameter weight reaches a preset number of times; or judging whether the current spiking neural network converges by judging whether an error reduction degree of the current spiking neural network after updating the parameter weight is stabilized within a preset range; or judging whether the current spiking neural network converges by judging whether an error of the current spiking neural network after updating the parameter weight is less than an error threshold; or judging whether the current spiking neural network after updating the parameter weight converges by a verification set in the dynamic visual data set.
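The alternative convergence manners recited in claim 12 can be illustrated with a minimal sketch. The function name, thresholds, and error histories below are hypothetical assumptions; the fourth manner (evaluation on a verification set drawn from the dynamic visual data set) would be an additional held-out check and is noted only in a comment:

```python
# Illustrative sketch (all names and values hypothetical) of the alternative
# convergence tests recited in claim 12. A fourth manner, judging convergence
# on a verification set from the dynamic visual data set, is omitted here.

def has_converged(iteration, errors, *, max_iters=100,
                  stability_window=5, stability_range=1e-3,
                  error_threshold=0.05):
    # Manner 1: the current training number of times reaches a preset number.
    if iteration >= max_iters:
        return True
    # Manner 2: the error reduction degree is stabilized within a preset range.
    if len(errors) >= stability_window:
        recent = errors[-stability_window:]
        if max(recent) - min(recent) <= stability_range:
            return True
    # Manner 3: the current error is less than an error threshold.
    if errors and errors[-1] < error_threshold:
        return True
    return False

# Error still falling quickly and above threshold: keep training.
print(has_converged(10, [0.9, 0.6, 0.4]))  # False
# Error plateaued within the preset range over the last five epochs: converged.
print(has_converged(10, [0.3, 0.2, 0.1004, 0.1002, 0.1001, 0.1003, 0.1002]))  # True
```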
The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network; the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and SRINIVASA (US 20200218959 A1), Paragraph [0002, 0019]. Regarding claim 19, NUMAOKA teaches an electronic device (Fig. 1, Paragraph [0054] – NUMAOKA discloses a hardware configuration example of an information processing apparatus 100), comprising: a memory, for storing a computer program (Fig. 1, #102 called a storage device, Paragraph [0062] – NUMAOKA discloses a plurality of computer programs including an OS, artificial intelligence function verification manufacturing software, application software with artificial intelligence function, and a graphical user interface (GUI) is installed in the storage device 102.); and a processor, for executing the computer program (Fig.
1, #101 called central processing unit (CPU), Paragraph [0062] – NUMAOKA discloses CPU 101 can execute these computer programs under an execution environment provided by the OS.) to cause the processor to: acquire to-be-recognized spiking sequences (Fig. 5, Paragraph [0111] - NUMAOKA discloses a recognition data preprocessing logic 501 performs conversion processing before input of converting a data format of output data from each of the sensors into a data format that can be input to the artificial intelligence that performs the emotion recognition processing. Paragraph [0192] - NUMAOKA discloses in order to reduce the amount of data to be stored, it is possible to store only a few frames [wherein the frames are the spiking sequences] in the preceding and subsequent sections in which the emotion change is particularly large. Please see FIG. 5, Paragraph [0094, 0193] wherein the neural network is a spiking neural network.) corresponding to information (Fig. 5, Paragraph [0088] - NUMAOKA further discloses the computer device 210 can be equipped with a position sensor (including a global positioning system (GPS) or the like) 311, an image sensor 312, a sound sensor (including a microphone or the like) 313, an odor sensor 314, a taste sensor 315, a tactile sensor 316, or other sensors as a group of sensors for learning an emotion and recognizing an emotion by the artificial intelligence function.); and recognize the to-be-recognized spiking sequences (Fig. 5, Paragraph [0192] - NUMAOKA discloses in order to reduce the amount of data to be stored, it is possible to store only a few frames [wherein the frames are the sequences] in the preceding and subsequent sections in which the emotion change is particularly large. See also Paragraph [0193].) by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category (Fig.
5, Paragraph [0115] - NUMAOKA further discloses the emotion inference processing logic 502 includes, for example, artificial intelligence such as CNN, DNN, RNN, reinforcement learning neural network, autoencoder, SNN, or SVM. The artificial intelligence function of the emotion inference processing logic 502 is applied with the learned emotion recognition model read from the database 306 and infers a human emotion from the recognition data input via the recognition data preprocessing logic 501.). Although NUMAOKA explicitly teaches acquire to-be-recognized spiking sequences corresponding to information (Fig. 5, Paragraph [0094, 0111, 0193]), NUMAOKA is silent on video information. However, SRINIVASA explicitly teaches video information (Fig. 2, Paragraph [0034] - SRINIVASA discloses spiking neural network 230 is configured to process video information for at least one type of action detection. Nodes (or “neurons”) of spiking neural network 230 may be variously coupled, each via a respective one of synapses 222, to receive a respective one of one or more input spike trains 220—some or all of which may represent, or otherwise be based on, differential video data.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA of having an electronic device, comprising: a memory, for storing a computer program; and a processor, for executing the computer program to cause the processor to: acquire to-be-recognized spiking sequences corresponding to information; and recognize the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having video information.
The combination results in NUMAOKA’s electronic device, comprising a processor for executing the computer program to cause the processor to acquire to-be-recognized spiking sequences corresponding to video information. The motivation behind the modification would have been to obtain a more accurate device for recognizing emotion from video data using a spiking neural network where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network; the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and SRINIVASA (US 20200218959 A1), Paragraph [0002, 0019]. Regarding claim 20, NUMAOKA teaches a computer non-transitory readable storage medium (Fig. 1, #102 called a storage device, Paragraph [0057] – NUMAOKA discloses the storage device 102 may include a mass external storage device such as a hard disk drive (HDD)), wherein the computer non-transitory readable storage medium stores a computer program (Fig.
1, Paragraph [0062] – NUMAOKA discloses a plurality of computer programs including an OS, artificial intelligence function verification manufacturing software, application software with artificial intelligence function, and a graphical user interface (GUI) is installed in the storage device 102.), which when executed by a processor (Fig. 1, #101 called central processing unit (CPU), Paragraph [0062] – NUMAOKA discloses CPU 101 can execute these computer programs under an execution environment provided by the OS.), cause the processor to: acquire to-be-recognized spiking sequences (Fig. 5, Paragraph [0111] - NUMAOKA discloses a recognition data preprocessing logic 501 performs conversion processing before input of converting a data format of output data from each of the sensors into a data format that can be input to the artificial intelligence that performs the emotion recognition processing. Paragraph [0192] - NUMAOKA discloses in order to reduce the amount of data to be stored, it is possible to store only a few frames [wherein the frames are the spiking sequences] in the preceding and subsequent sections in which the emotion change is particularly large. Please see FIG. 5, Paragraph [0094, 0193] wherein the neural network is a spiking neural network.) corresponding to information (Fig. 5, Paragraph [0088] - NUMAOKA further discloses the computer device 210 can be equipped with a position sensor (including a global positioning system (GPS) or the like) 311, an image sensor 312, a sound sensor (including a microphone or the like) 313, an odor sensor 314, a taste sensor 315, a tactile sensor 316, or other sensors as a group of sensors for learning an emotion and recognizing an emotion by the artificial intelligence function.); and recognize the to-be-recognized spiking sequences (Fig.
5, Paragraph [0192] - NUMAOKA discloses in order to reduce the amount of data to be stored, it is possible to store only a few frames [wherein the frames are the sequences] in the preceding and subsequent sections in which the emotion change is particularly large. See also Paragraph [0193].) by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category (Fig. 5, Paragraph [0115] - NUMAOKA further discloses the emotion inference processing logic 502 includes, for example, artificial intelligence such as CNN, DNN, RNN, reinforcement learning neural network, autoencoder, SNN, or SVM. The artificial intelligence function of the emotion inference processing logic 502 is applied with the learned emotion recognition model read from the database 306 and infers a human emotion from the recognition data input via the recognition data preprocessing logic 501.). Although NUMAOKA explicitly teaches acquire to-be-recognized spiking sequences corresponding to information (Fig. 5, Paragraph [0094, 0111, 0193]), NUMAOKA is silent on video information. However, SRINIVASA explicitly teaches video information (Fig. 2, Paragraph [0034] - SRINIVASA discloses spiking neural network 230 is configured to process video information for at least one type of action detection. Nodes (or “neurons”) of spiking neural network 230 may be variously coupled, each via a respective one of synapses 222, to receive a respective one of one or more input spike trains 220—some or all of which may represent, or otherwise be based on, differential video data.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA of having a computer non-transitory readable storage medium, wherein the computer non-transitory readable storage medium stores a computer program, which when executed by a processor, cause the processor to: acquire to-be-recognized spiking sequences corresponding to information; and recognize the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having video information. The combination results in NUMAOKA’s computer non-transitory readable storage medium storing a computer program which, when executed by a processor, causes the processor to acquire to-be-recognized spiking sequences corresponding to video information. The motivation behind the modification would have been to obtain a computer non-transitory readable storage medium for storing a computer program for a more accurate method of recognizing emotion from video data using a spiking neural network where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network; the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural
nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and SRINIVASA (US 20200218959 A1), Paragraph [0002, 0019]. Regarding claim 22, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 5, NUMAOKA fails to explicitly teach wherein the spiking sequences corresponding to one piece of the raw visual data are a spiking sequence array constituted by spiking sequences at each pixel positions of each video pictures in the whole raw visual data. However, SRINIVASA explicitly teaches wherein the spiking sequences corresponding to one piece of the raw visual data are a spiking sequence array constituted by spiking sequences at each pixel positions of each video pictures in the whole raw visual data (Fig. 2, Paragraph [0038] – SRINIVASA discloses one or more input spike trains 220 may be generated based on frames 212—e.g., wherein pixels of frames 212 are variously encoded as respective signal spikes of one or more input spike trains 220. Although some embodiments are not limited in this regard, system 200 may include (or alternatively, be coupled to) an encoder 204 to generate the one or more input spike trains 220 based on frames 212.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA in view of SRINIVASA of having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having wherein the spiking sequences corresponding to one piece of the raw visual data are a spiking sequence array constituted by spiking sequences at each pixel positions of each video pictures in the whole raw visual data. The combination results in NUMAOKA’s method for recognizing emotion wherein the spiking sequences corresponding to one piece of the raw visual data are a spiking sequence array constituted by spiking sequences at each pixel positions of each video pictures in the whole raw visual data.
The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network; the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and SRINIVASA (US 20200218959 A1), Paragraph [0002, 0019]. Regarding claim 23, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 10, NUMAOKA fails to explicitly teach wherein the step of calculating the gradient corresponding to the parameter weight according to the error comprises: obtaining a final average error according to errors corresponding to voting neuronal populations; calculating the gradient corresponding to the parameter weight according to the average error. However, SRINIVASA explicitly teaches wherein the step of calculating the gradient corresponding to the parameter weight according to the error (Fig. 6, Paragraph [0074] – SRINIVASA discloses Equation 12 may variously depress (or potentiate) weights at times when actual (or target) spike activity occurs.)
comprises: obtaining a final average error according to errors corresponding to voting neuronal populations (Fig. 6, Paragraph [0074] – SRINIVASA discloses this may promote convergence of the actual spiking toward a desired network activity as training progresses. Once the desired/actual activity become similar, the learning may stop, since at time instants where both desired and actual spike occur nearly simultaneously, the weight update value as per Equation 12 may become 0 or otherwise negligibly small.); calculating the gradient corresponding to the parameter weight according to the average error (Fig. 6, Paragraph [0074] – SRINIVASA discloses for a given target pattern, the synaptic weights of the spiked neural network may be selectively potentiated or depressed according to the following: Δw = X_trace(T_target − T_actual)). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA in view of SRINIVASA of having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having wherein the step of calculating the gradient corresponding to the parameter weight according to the error comprises: obtaining a final average error according to errors corresponding to voting neuronal populations; calculating the gradient corresponding to the parameter weight according to the average error.
The combination results in NUMAOKA’s method for recognizing emotion wherein the step of calculating the gradient corresponding to the parameter weight according to the error comprises: obtaining a final average error according to errors corresponding to voting neuronal populations; calculating the gradient corresponding to the parameter weight according to the average error. The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network; the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and SRINIVASA (US 20200218959 A1), Paragraph [0002, 0019].
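SRINIVASA's Equation 12 update cited for claim 23, together with the claim's averaging of errors across voting neuronal populations, can be illustrated minimally. All numeric values below are hypothetical, and the averaging function is an assumption labeled as such, not SRINIVASA's disclosed implementation:

```python
# Minimal sketch of SRINIVASA's Equation 12, dw = x_trace * (T_target - T_actual):
# the weight is potentiated when a target spike occurs without an actual spike,
# depressed in the reverse case, and the update vanishes as actual and target
# activity coincide (which is why learning can stop at convergence).

def weight_update(x_trace, t_target, t_actual):
    return x_trace * (t_target - t_actual)

# Hypothetical helper for the claim-23 step: a final average error taken over
# the per-population errors of the voting neuronal populations.
def average_error(population_errors):
    return sum(population_errors) / len(population_errors)

print(weight_update(0.8, 1, 0))   # 0.8  (target spike only: potentiate)
print(weight_update(0.8, 0, 1))   # -0.8 (actual spike only: depress)
print(weight_update(0.8, 1, 1))   # 0.0  (both fire: update vanishes)
print(average_error([0.25, 0.5, 0.75]))  # 0.5
```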
Regarding claim 24, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 10, NUMAOKA fails to explicitly teach the methods for judging whether the current spiking neural network converges comprises: judging whether the current training number of times reaches a preset number of times; when the current training number of times reaches a preset number of times, determining that the current spiking neural network has converged, and when the current training number of times does not reach the preset number of times, determining that the current spiking neural network has not converged; or judging whether an error reduction degree of the current spiking neural network is stabilized within a preset range; when the error reduction degree of the current spiking neural network is stabilized within the preset range, determining that the current spiking neural network has converged; and when the error reduction degree of the current spiking neural network is not stabilized within the preset range, determining that the current spiking neural network has not converged; or judging whether the current spiking neural network converges by judging whether an error based on the current spiking neural network is less than an error threshold; when the error based on the current spiking neural network is less than the error threshold, determining that the current spiking neural network has converged, and when the error based on the current spiking neural network is not less than the error threshold, determining that the current spiking neural network has not converged. However, SRINIVASA explicitly teaches the methods for judging whether the current spiking neural network converges (Fig. 6, Paragraph [0074]) comprises: judging whether the current training number of times reaches a preset number of times (Fig. 8, Paragraph [0084] – SRINIVASA discloses in each time period, a particular spike frame for a given scan may be processed by the reservoir.
For patterns where total number of frames is less than 300, the network may still continue to train (even in absence of input activity) on the inherent reservoir activity generated due to the recurrent dynamics.); when the current training number of times reaches a preset number of times, determining that the current spiking neural network has converged, and when the current training number of times does not reach the preset number of times, determining that the current spiking neural network has not converged (Fig. 8, Paragraph [0086] – SRINIVASA discloses after training is done, test instances f_in(center), f_in(right), . . . f_in(bottom) of CRLTB scans may be passed each to a respective one of the auto models Auto1, Auto2, . . . , Auto5 simultaneously, and the respective output activity f_out_c, f_out_r, . . . f_out_b, observed. A given test instance may be assigned a particular class label based on the output neuron class that generated the highest spiking response. One such assignment may be done for each auto model. Subsequently, a majority vote may be taken across all the assignments (or all auto models) for each test input to predict its class. For instance, if a majority of the auto models (i.e., ≥3) predict a particular action class (say Class 1), then an activity class output from the model may be Class 1. Now, a single auto model may not have sufficient information to recognize a particular test input correctly. One or more others of the remaining scans or auto models, in that case, may compensate for the insufficient or inaccurate output classification from one individual models. See also Paragraph [0084].); or judging whether an error reduction degree of the current spiking neural network is stabilized within a preset range (Fig.
4A, Paragraph [0054] – SRINIVASA discloses frame 404 (or alternatively, frame 405) may be a frame-based representation of spike train inputs which are based on a detection of movement that is represented by differential pixel values of frames 400, 401. Frames 404, 405 of differential video data may be generated, for example, based on a fixed threshold level for a spike input. In some embodiments, generation of some or all such frames may be additionally or alternatively based on a varying threshold level which, for example, selectively weights signal input for various frames (or sub-frame portions thereof). Such a weighted input may improve the detection of edge artifacts and/or other subtler movements which are indicia of a particular action.); when the error reduction degree of the current spiking neural network is stabilized within the preset range, determining that the current spiking neural network has converged; and when the error reduction degree of the current spiking neural network is not stabilized within the preset range, determining that the current spiking neural network has not converged (Fig. 8, Paragraph [0086] – SRINIVASA discloses after training is done, test instances f.sub.in(center), f.sub.in(right), . . . f.sub.in(bottom) of CRLTB scans may be passed each to a respective one of the auto models Auto1, Auto2, . . . , Auto5 simultaneously, and the respective output activity f.sub.out_c, f.sub.out_r, . . . f.sub.out_b, observed. A given test instance may be assigned a particular class label based on the output neuron class that generated the highest spiking response. One such assignment may be done for each auto model. Subsequently, a majority vote may be taken across all the assignments (or all auto models) for each test input to predict its class. For instance, if a majority of the auto models (i.e., ≥3) predict a particular action class (say Class 1), then an activity class output from the model may be Class 1. 
Now, a single auto model may not have sufficient information to recognize a particular test input correctly. One or more others of the remaining scans or auto models, in that case, may compensate for the insufficient or inaccurate output classification from one individual models. See also Fig. 4A, Paragraph [0054].); or judging whether the current spiking neural network converges by judging whether an error based on the current spiking neural network is less than an error threshold (Fig. 4A, Paragraph [0054] – SRINIVASA discloses frame 404 (or alternatively, frame 405) may be a frame-based representation of spike train inputs which are based on a detection of movement that is represented by differential pixel values of frames 400, 401. Frames 404, 405 of differential video data may be generated, for example, based on a fixed threshold level for a spike input. In some embodiments, generation of some or all such frames may be additionally or alternatively based on a varying threshold level which, for example, selectively weights signal input for various frames (or sub-frame portions thereof). Such a weighted input may improve the detection of edge artifacts and/or other subtler movements which are indicia of a particular action.); when the error based on the current spiking neural network is less than the error threshold, determining that the current spiking neural network has converged, and when the error based on the current spiking neural network is not less than the error threshold, determining that the current spiking neural network has not converged (Fig. 8, Paragraph [0086] – SRINIVASA discloses after training is done, test instances f.sub.in(center), f.sub.in(right), . . . f.sub.in(bottom) of CRLTB scans may be passed each to a respective one of the auto models Auto1, Auto2, . . . , Auto5 simultaneously, and the respective output activity f.sub.out_c, f.sub.out_r, . . . f.sub.out_b, observed. 
A given test instance may be assigned a particular class label based on the output neuron class that generated the highest spiking response. One such assignment may be done for each auto model. Subsequently, a majority vote may be taken across all the assignments (or all auto models) for each test input to predict its class. For instance, if a majority of the auto models (i.e., ≥3) predict a particular action class (say Class 1), then an activity class output from the model may be Class 1. Now, a single auto model may not have sufficient information to recognize a particular test input correctly. One or more others of the remaining scans or auto models, in that case, may compensate for the insufficient or inaccurate output classification from one individual models. See also Paragraph [0054].). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA in view of SRINIVASA of having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having the methods for judging whether the current spiking neural network converges comprises: judging whether the current training number of times reaches a preset number of times; when the current training number of times reaches a preset number of times, determining that the current spiking neural network has converged, and when the current training number of times does not reaches the preset number of times, determining that the current spiking neural network has not converged; or judging whether an error reduction degree of the current spiking neural network is stabilized within a preset range; when the error reduction degree of the current spiking 
neural network is stabilized within the preset range, determining that the current spiking neural network has converged; and when the error reduction degree of the current spiking neural network is not stabilized within the preset range, determining that the current spiking neural network has not converged; or judging whether the current spiking neural network converges by judging whether an error based on the current spiking neural network is less than an error threshold; when the error based on the current spiking neural network is less than the error threshold, determining that the current spiking neural network has converged, and when the error based on the current spiking neural network is not less than the error threshold, determining that the current spiking neural network has not converged. Wherein having NUMAOKA’s method for recognizing emotion wherein the methods for judging whether the current spiking neural network converges comprises: judging whether the current training number of times reaches a preset number of times; when the current training number of times reaches a preset number of times, determining that the current spiking neural network has converged, and when the current training number of times does not reaches the preset number of times, determining that the current spiking neural network has not converged; or judging whether an error reduction degree of the current spiking neural network is stabilized within a preset range; when the error reduction degree of the current spiking neural network is stabilized within the preset range, determining that the current spiking neural network has converged; and when the error reduction degree of the current spiking neural network is not stabilized within the preset range, determining that the current spiking neural network has not converged; or judging whether the current spiking neural network converges by judging whether an error based on the current spiking neural network is less than an error 
threshold; when the error based on the current spiking neural network is less than the error threshold, determining that the current spiking neural network has converged, and when the error based on the current spiking neural network is not less than the error threshold, determining that the current spiking neural network has not converged. The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network; the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and SRINIVASA (US 20200218959 A1), Paragraph [0002, 0019]. Regarding claim 25, NUMAOKA in view of SRINIVASA teach the computer non-transitory readable storage medium as claimed in claim 20. NUMAOKA further teaches wherein the computer program further causes the processor (Fig. 1, Paragraph [0062]) to: train a pre-established spiking neural network emotion recognition model (Fig. 
3, Paragraph [0094] – NUMAOKA discloses the emotion learning processing logic 304 includes an artificial intelligence using a learning model such as a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a reinforcement learning (reinforcement learning) neural network, an autoencoder, a spiking neural network (SNN), or a support vector machine (SVM).), to obtain a trained spiking neural network emotion recognition model (Fig. 5, Paragraph [0095] – NUMAOKA discloses the emotion learning processing logic 304 performs learning of the artificial intelligence for emotion recognition through training (for example, deep learning) by inputting new learning data to the artificial intelligence to manufacture new learning model for emotion recognition different from the model before training.). Regarding claim 26, NUMAOKA in view of SRINIVASA teach the computer non-transitory readable storage medium as claimed in claim 25. NUMAOKA further teaches wherein the computer program further causes the processor (Fig. 1, Paragraph [0062]) to: acquire test sets of a plurality of emotion categories (Fig. 6, Paragraph [0120] - NUMAOKA discloses in a case where it is determined that emotion learning of artificial intelligence may be performed or emotion recognition by artificial intelligence may be performed based on the determination criterion data 307 based on the guidelines (Yes in step S602), the learning data preprocessing logic 301 or the recognition data preprocessing logic 501 acquires learning data or recognition data from various sensors mounted on the computer device 210, the local database 303 in the computer device 210, or the cloud infrastructure 120 (step S603). See also Paragraph [0088].); and perform test training on the pre-established spiking neural network emotion recognition model by using the test sets (Fig. 
6, Paragraph [0015] – NUMAOKA discloses emotion learning unit performs training to input learning data to the artificial intelligence function and perform the emotion recognition when the preprocessing unit determines to permit the learning.), to obtain a trained spiking neural network emotion recognition model (Fig. 6, Paragraph [0004] – NUMAOKA discloses the face image pattern is used as an input of the neural network, the output of the neural network is associated with labels of emotions such as “anger”, “disgust”, “fear”, . . . , each face image pattern input to the neural network is compared with a label of an output considered to be appropriate, and thereby the neural network is learned or trained. Paragraph [0094] – NUMAOKA further discloses emotion learning processing logic 304 includes an artificial intelligence using a learning model such as… a spiking neural network (SNN).). Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over NUMAOKA (US 20220366723 A1), hereinafter referenced as NUMAOKA in view of SRINIVASA (US 20200218959 A1), hereinafter referenced as SRINIVASA, further in view of WU (US 20230042187 A1), hereinafter referenced as WU. Regarding claim 13, NUMAOKA in view of SRINIVASA teach the method for recognizing emotion as claimed in claim 10. NUMAOKA fails to explicitly teach wherein the spiking neural network further comprises: a feature extraction component and an emotion mapping component, and the emotion mapping component is configured to map spiking sequences outputted by the voting neuronal population to a final emotion category. However, SRINIVASA explicitly teaches wherein the spiking neural network further comprises: a feature extraction component (Fig. 
2, Paragraph [0036] – SRINIVASA discloses selector logic 240 may include any of various processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) and/or other circuitry configured to identify an action type as corresponding to the one or more output spike trains—e.g., wherein such identifying includes selecting one action type from a plurality of actions types which system 200 is trained to recognize.) and an emotion mapping component (Fig. 3, Paragraph [0044] – SRINIVASA discloses based on the output signal, method 300 may (at 360) perform one of training the spiked neural network to recognize an action type [wherein recognize an action type is emotion mapping], or classifying a video sequence as including a representation of an instance of the action type.), and the emotion mapping component is configured to map spiking sequences outputted by the voting neuronal population to a final emotion category (Fig. 3, Paragraph [0044] – SRINIVASA discloses based on the output signal, method 300 may (at 360) perform one of training the spiked neural network to recognize an action type, or classifying a video sequence as including a representation of an instance of the action type.). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA in view of SRINIVASA of having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of SRINIVASA having wherein the spiking neural network further comprises: a feature extraction component and an emotion mapping component, and the emotion mapping component is configured to map spiking sequences outputted by the voting neuronal population to a final emotion category. Wherein having NUMAOKA’s method for recognizing emotion wherein the spiking neural network further comprises: a feature extraction component and an emotion mapping component, and the emotion mapping component is configured to map spiking sequences outputted by the voting neuronal population to a final emotion category. 
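The emotion mapping scheme discussed above follows SRINIVASA's Paragraph [0086]: label each test instance by the output-neuron class with the highest spiking response, then take a majority vote across the auto models. A minimal sketch follows; the function names, the dictionary representation of per-class spike counts, and the example emotion labels are illustrative assumptions, not taken from either reference.

```python
from collections import Counter

def assign_label(output_spike_counts):
    """Pick the class whose output neuron produced the highest spiking response.

    output_spike_counts: dict mapping class label -> spike count
    (an assumed representation; the references do not specify one).
    """
    return max(output_spike_counts, key=output_spike_counts.get)

def majority_vote(per_model_outputs):
    """Majority vote across models (e.g., the Auto1..Auto5 scans in SRINIVASA)."""
    labels = [assign_label(counts) for counts in per_model_outputs]
    winner, _ = Counter(labels).most_common(1)[0]
    return winner

# Three of five models spike most strongly for "happy", so the vote is "happy".
outputs = [
    {"happy": 12, "sad": 3},
    {"happy": 9, "sad": 5},
    {"happy": 2, "sad": 8},
    {"happy": 11, "sad": 4},
    {"happy": 1, "sad": 7},
]
print(majority_vote(outputs))  # happy
```

A tie-breaking policy would be needed in practice; `Counter.most_common` simply reports whichever tied label it encountered first, and the quoted passages do not address ties.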
The motivation behind the modification would have been to obtain a more accurate method for recognizing emotion from video data using a spiking neural network where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and SRINIVASA relate to mechanisms for processing input data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and SRINIVASA relates to determining a classification of a video sequence with a spiking neural network; the time-dependent interaction via spikes turns these networks into rich dynamical systems, making them more powerful than the current generation of neural nets and orders of magnitude more power-efficient. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and SRINIVASA (US 20200218959 A1), Paragraph [0002, 0019]. SRINIVASA fails to explicitly teach wherein the feature extraction component comprises a single forward extraction unit composed of convolution, normalization, Parametric Leaky-Integrate and Fire (PLIF) model and average pooling, and a network unit composed of two fully-connected layers and two PLIF, which are arranged at intervals. However, WU explicitly teaches wherein the feature extraction component (Fig. 5, Paragraph [0091] – WU discloses the behavior recognition system includes a data preprocessing module 510, a feature extraction module 520, a network identification module 530, a network fusion module 540, and a two-stream fusion module 550.) comprises a single forward extraction unit composed of convolution (Fig. 
1, Paragraph [0055] – WU discloses in operation S3, spatio-temporal convolution processing is performed respectively on the feature maps of the frame images of each video clip and the feature maps of the optical flow images of each video clip, and a spatial prediction result (i.e., category probability distribution of spatial streams) and a temporal prediction result), normalization, Parametric Leaky-Integrate and Fire (PLIF) model and average pooling (Fig. 4, Paragraph [0086] – WU discloses the neural network described in the present disclosure adopts a network with a convergence of ANN and Spiking Neural Network (SNN), that is, the ConvLIF layer and the LIF layer are converged with the normalization layer and the pooling layer.), and a network unit composed of two fully-connected layers (Fig. 3, Paragraph [0068] – WU discloses the neural network includes: n Blocks (the Net Block in FIG. 3), a Reshape layer (the Reshape Layer in FIG. 3), an LIF layer (the LIF Layer in FIG. 3), a fully connected layer (the FC Layer in FIG. 3), and a Softmax layer (the Softmax Layer in FIG. 3). Paragraph [0086] – WU further discloses the LIF layer is a fully connected layer having a time series, can process information having a time series) and two PLIF, which are arranged at intervals (Fig. 3, Paragraph [0110] – WU discloses as shown in FIG. 
3, the neural network includes: n Blocks, a Reshape layer, an LIF layer, a fully connected layer and a Softmax layer; and each Block includes: a ConvLIF layer and a pooling layer, which are cascaded.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of NUMAOKA in view of SRINIVASA of having a method for recognizing emotion, comprising: acquiring to-be-recognized spiking sequences corresponding to video information; and recognizing the to-be-recognized spiking sequences by using a spiking neural network emotion recognition model, so as to obtain a corresponding emotion category, with the teachings of WU having wherein the feature extraction component comprises a single forward extraction unit composed of convolution, normalization, Parametric Leaky-Integrate and Fire (PLIF) model and average pooling, and a network unit composed of two fully-connected layers and two PLIF, which are arranged at intervals. Wherein having NUMAOKA’s method for recognizing emotion wherein the feature extraction component comprises a single forward extraction unit composed of convolution, normalization, Parametric Leaky-Integrate and Fire (PLIF) model and average pooling, and a network unit composed of two fully-connected layers and two PLIF, which are arranged at intervals. 
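The claimed Parametric Leaky-Integrate-and-Fire (PLIF) model discussed above differs from WU's plain LIF layers mainly in that a PLIF neuron learns its membrane time constant. As background, the following is a minimal sketch of PLIF dynamics under the common parameterization 1/tau = sigmoid(w); the threshold, reset, and input values are illustrative assumptions, not taken from the claims or the cited references.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class PLIFNeuron:
    """Parametric Leaky Integrate-and-Fire neuron (illustrative sketch).

    Unlike a plain LIF neuron, the membrane decay is governed by a
    learnable scalar w, with 1/tau = sigmoid(w), so the effective time
    constant is trained along with the synaptic weights.
    """

    def __init__(self, w=0.0, v_threshold=1.0, v_reset=0.0):
        self.w = w                    # learnable in a real network
        self.v_threshold = v_threshold
        self.v_reset = v_reset
        self.v = v_reset              # membrane potential

    def step(self, x):
        # Leaky integration toward the input, with decay rate sigmoid(w)
        self.v += sigmoid(self.w) * (x - (self.v - self.v_reset))
        if self.v >= self.v_threshold:
            self.v = self.v_reset     # hard reset after emitting a spike
            return 1.0
        return 0.0

# With sigmoid(0) = 0.5 (tau = 2) and a constant input of 1.5, the neuron
# charges for one step, then spikes on every other step.
neuron = PLIFNeuron(w=0.0)
spikes = [neuron.step(1.5) for _ in range(5)]
print(spikes)  # [0.0, 1.0, 0.0, 1.0, 0.0]
```

In the claim 13 arrangement, such a neuron would sit between the normalization and average pooling stages of the single forward extraction unit, with two further PLIF layers interleaved between the two fully-connected layers of the network unit.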
The motivation behind the modification would have been to obtain an enhanced method for recognizing emotion from video data using a spiking neural network where the time-based relationship between signal spikes allows for a more power-efficient model, since both NUMAOKA and WU relate to mechanisms for processing video data using artificial intelligence functions, wherein NUMAOKA discloses an information processing apparatus that performs human emotion recognition at a necessary level on the basis of predetermined guidelines, an information processing method, and an artificial intelligence model manufacturing method that can improve the recognition accuracy and provide the emotion recognition utilization service with a high user satisfaction level, and WU relates to a behavior recognition method, a behavior recognition system, an electronic device, and a computer-readable storage medium, which not only can produce a convolution effect in an Artificial Neural Network (ANN) and reduce the calculation amount and the weight, but also can associate a plurality of pictures for processing of time series information thereamong, thereby improving recognition accuracy. Please see NUMAOKA (US 20220366723 A1), Paragraph [0024, 0141], and WU (US 20230042187 A1), Paragraph [0004, 0057]. Conclusion Listed below is the prior art made of record and not relied upon but considered pertinent to applicant’s disclosure. Athreya et al. (US 20210357751 A1) - Examples for event-based processing using the output of a deep neural network are described herein. In some examples, event format data may be provided to a spiking neural network (SNN). The SNN may perform processing on the event format data. The SNN may be trained for processing the event format data based on an output of a deep neural network (DNN) trained for processing of sensing data.… Fig. 1, Abstract. Martin et al. 
(US 20190370598 A1) - Described is a system for detecting change of context in a video stream on an autonomous platform. The system extracts salient patches from image frames in the video stream. Each salient patch is translated to a concept vector. A recurrent neural network is innervated with the concept vector, resulting in activations of the recurrent neural network. The activations are classified, and the classified activations are mapped onto context classes. A change in context class is detected in the image frames, and the system causes the autonomous platform to perform an automatic operation to adapt to the change of context class.… Figs. 3, 6, Abstract. Tran et al. (US 20180253840 A1) - A mirror system includes a visual display disposed to convey information and images during an active period; and the visual display disposed to provide a reflected image during an inactive period; a learning machine receiving data from one or more cameras; and a processor coupled to the visual display, the learning machine, and the camera.… Fig. 1, Abstract. Cao et al. (US 20170300788 A1) - Described is a system for object detection in images or videos using spiking neural networks. An intensity saliency map is generated from an intensity of an input image having color components using a spiking neural network. Additionally, a color saliency map is generated from a plurality of colors in the input image using a spiking neural network. An object detection model is generated by combining the intensity saliency map and multiple color saliency maps. The object detection model is used to detect multiple objects of interest in the input image.… Fig. 1, Abstract. Piekniewski et al. (US 20130297542 A1) - Apparatus and methods for feedback in a spiking neural network. In one approach, spiking neurons receive sensory stimulus and context signal that correspond to the same context. When the stimulus provides sufficient excitation, neurons generate response. 
Context connections are adjusted according to inverse spike-timing dependent plasticity. When the context signal precedes the post synaptic spike, context synaptic connections are depressed. Conversely, whenever the context signal follows the post synaptic spike, the connections are potentiated… Fig. 4, Abstract. Bekolay et al. (US 20180197529 A1) - A system extracting features from a time-varying signal comprising a computer processor and a computer readable medium having computer executable instructions for providing: a bank of bandpass filters; a module approximating the output of those filters with nonlinear components; a module representing a decorrelated projection of the output of the filters with nonlinear components; and a module representing the temporal derivative of the decorrelated information with nonlinear components… Fig. 1, Abstract. Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEZAWIT N SHIMELES whose telephone number is (571)272-7663. The examiner can normally be reached M-F 7:30am-5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chineyere Wills-Burns can be reached at (571) 272-9752. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. 
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /BEZAWIT NOLAWI SHIMELES/Examiner, Art Unit 2673 /CHINEYERE WILLS-BURNS/Supervisory Patent Examiner, Art Unit 2673

Prosecution Timeline

Mar 29, 2024
Application Filed
Mar 19, 2026
Non-Final Rejection — §101, §103 (current)


Prosecution Projections

1-2
Expected OA Rounds
100%
Grant Probability
0%
With Interview (-100.0%)
2y 9m
Median Time to Grant
Low
PTA Risk
Based on 1 resolved case by this examiner. Grant probability derived from career allow rate.
