Prosecution Insights
Last updated: April 19, 2026
Application No. 18/024,242

SYSTEM AND METHOD FOR AVOIDING CATASTROPHIC FORGETTING IN AN ARTIFICIAL NEURAL NETWORK

Status: Non-Final OA (§103)
Filed: Mar 01, 2023
Examiner: LAHAM BAUZO, ALVARO SALIM
Art Unit: 2146
Tech Center: 2100 — Computer Architecture & Software
Assignee: Université De Chambéry - Université Savoie Mont Blanc
OA Round: 1 (Non-Final)
Grant Probability: 33% (At Risk)
OA Rounds: 1-2
To Grant: 3y 4m
With Interview: 99%
Examiner Intelligence

Career Allow Rate: 33% (1 granted / 3 resolved; -21.7% vs TC avg)
Interview Lift: +100.0% (resolved cases with vs without interview)
Avg Prosecution: 3y 4m
Currently Pending: 27
Total Applications: 30 (across all art units)

Statute-Specific Performance

§101: 32.4% (-7.6% vs TC avg)
§103: 44.3% (+4.3% vs TC avg)
§102: 7.3% (-32.7% vs TC avg)
§112: 16.0% (-24.0% vs TC avg)

Tech Center averages are estimates; based on career data from 3 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. FR2009220, filed on September 11, 2020.

Information Disclosure Statement

The information disclosure statement(s) (IDS) submitted on March 1, 2023 is/are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement(s) is/are being considered by the examiner.

Claim Objections

Claims 1 and 13 are objected to because the claims recite "training the first artificial neural network, or another artificial neural network" and "using the trained first or further artificial neural network." It is unclear whether this further artificial neural network refers to the previously mentioned another artificial neural network or to a distinct artificial neural network. For purposes of examination, the further artificial neural network will be construed as another artificial neural network.

Claims 8 and 19 are objected to because the claims recite the first sample. However, claims 8 and 19 describe injecting a second sample, but not a first sample, and their respective parent claims (1 and 13) also do not introduce the term first sample. For purposes of examination, examiner will construe the first sample as the second sample.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4 and 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over HASSAN (US 20210303886 A1) in view of ANS ("Neural networks with a self-refreshing memory: knowledge transfer in sequential learning tasks without catastrophic forgetting"), AGUIRRE (US 20190135300 A1), and NAKATA (US 20200090036 A1), hereafter HASSAN, ANS, AGUIRRE, and NAKATA, respectively.

Regarding Claim 1:

HASSAN teaches: A method comprising: training an artificial neural network by: initially training a first artificial neural network with first input data and first pseudo data, […]

(HASSAN [0031] teaches: "At block 320, a pre-trained deep neural network model is trained (i.e., training) using the plurality of real roadway scene images (i.e., with first input data) and the plurality of augmented roadway scene images (i.e., and first pseudo data) to generate a newly-trained deep neural network model. For example, the deep neural network model 116 stored in the memory 106 is trained to detect traffic lights in image data using the plurality of real roadway scene images and the plurality of augmented roadway scene images. In some implementations, the pre-trained deep neural network model is an untrained deep neural network model (i.e., a first artificial neural network).")

HASSAN is not relied upon for teaching: wherein the first pseudo data is or was generated by a second artificial neural network in a virgin state, or by the first artificial neural network while in a virgin state; generating second pseudo data using the first artificial neural network, or using the second artificial neural network following at least partially transferring knowledge from the first artificial neural network to the second artificial neural network; and training the first artificial neural network, or another artificial neural network, with the second pseudo data and second input data; and using the trained first or further artificial neural network in a hardware system to control one or more actuators.

However, ANS teaches: generating second pseudo data using the first artificial neural network, or using the second artificial neural network following at least partially transferring knowledge from the first artificial neural network to the second artificial neural network;

(ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "This first resulting input layer activity is then reinjected in the hidden layer, which creates a new output (i.e., generating second pseudo data) and input activity. This second input activity is reinjected in the hidden layer, hence recreating a third input–output activity, and so on. This back and forth flow of activity between the hidden and input layers is termed a 'reverberating' process." ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 (i.e., using the first artificial neural network) auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. [...] For each of the successive random input seeds, the corresponding pseudoitems generated by reverberation within NET 1 are trained in NET 2.")

training the first artificial neural network, or another artificial neural network, with the second pseudo data and second input data;

(ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training (i.e., training [...] another artificial neural network). [...] A new population of external items (i.e., second input data) must be trained concurrently with pseudoitems originating continuously (i.e., second pseudo data) from NET 2, these pseudoitems being generated exactly as in NET 1 during stage (I). Subsequently, stages (I) and (II) are supposed to work alternatively for each new occurring population of 'actual' items.")

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN and ANS before them, to include ANS' reverberating architecture with self-refreshing memory in HASSAN's detection method. One would have been motivated to make such a combination in order to overcome catastrophic forgetting in sequential learning tasks (ANS [pg. 4, section 2.1 A dual-network architecture]).

HASSAN in view of ANS is not relied upon for teaching: wherein the first pseudo data is or was generated by a second artificial neural network in a virgin state, or by the first artificial neural network while in a virgin state; using the trained first or further artificial neural network in a hardware system to control one or more actuators.

However, AGUIRRE teaches: using the trained first or further artificial neural network in a hardware system to control one or more actuators.

(AGUIRRE [0026] teaches: "[…] collect sensor data, process the sensor data, and analyze the sensor data as disclosed herein to detect anomalies in the sensor data and control operations of the autonomous vehicle 100." AGUIRRE [0087] teaches: "FIG. 12 is a block diagram of an example processing platform 1200 structured to execute the instructions of FIG. 11 to implement the example autonomous driving apparatus 300 of FIG. 3 and/or the example anomaly detection apparatus 306 of FIGS. 3 and 4 to perform unsupervised multimodal anomaly detection for the example autonomous vehicle 100 (FIGS. 1-5). The processor platform 1200 can be, for example, a server, a computer, a self-learning machine (e.g., a neural network) (i.e., trained neural network), or any other type of computing device.")

Examiner's note: under BRI, "one […] actuator" can be interpreted as the autonomous vehicle, which is controlled by using the outputs of the collected and processed sensor data. A person having ordinary skill in the art could configure ANS' NET 1 neural network to control AGUIRRE's autonomous vehicle (i.e., actuator).

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, ANS, and AGUIRRE before them, to include AGUIRRE's sensors that collect data to train auto-encoders and control the operations of an autonomous vehicle in HASSAN and ANS' detection method. One would have been motivated to make such a combination in order to address sensor malfunction over time in autonomous vehicles and to make the autonomous vehicle computer sufficiently unaffected by changes in sensor operation, thereby increasing the flexibility and robustness of the autonomous vehicle computer (AGUIRRE [0096]).
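For orientation, the "reverberating" pseudo-item generation ANS describes (a random seed injected into the auto-associative network, activity reinjected a fixed number of times R, and the final input/output pair handed to the second network as a pseudo-item) can be loosely sketched as follows. This is a minimal illustration only: the toy network, layer sizes, R, and all variable names are assumptions, not taken from ANS or the instant application.

```python
# Loose sketch of ANS-style "reverberating" pseudo-item generation.
# The two-weight toy network and all sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def forward(W_in_hid, W_hid_out, x):
    """One pass through a toy auto-associative net: input -> hidden -> output."""
    h = np.tanh(x @ W_in_hid)
    return np.tanh(h @ W_hid_out)

def reverberate(W_in_hid, W_hid_out, seed_input, R):
    """Reinject activity R times (the 'reverberating' process); the final
    input/output pair is one pseudo-item for training the second network."""
    x = seed_input
    for _ in range(R):
        x = forward(W_in_hid, W_hid_out, x)  # reinjection step
    return x, forward(W_in_hid, W_hid_out, x)  # (pseudo input, pseudo target)

n = 8  # toy input/output width
W1 = rng.standard_normal((n, 4))
W2 = rng.standard_normal((4, n))
seed = rng.integers(0, 2, size=n).astype(float)  # random binary seed, as in ANS
pseudo_in, pseudo_out = reverberate(W1, W2, seed, R=5)
```

In ANS' scheme, pairs like (pseudo_in, pseudo_out) generated by NET 1 would then be interleaved with new external items when training NET 2, and vice versa in alternating stages.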
HASSAN in view of ANS and AGUIRRE is not relied upon for teaching, but NAKATA teaches: wherein the first pseudo data is or was generated by a second artificial neural network in a virgin state, or by the first artificial neural network while in a virgin state;

(NAKATA [0030] teaches: "The generation unit 101 initializes a generation model that generates pseudo data (Step S101). For example, the generation model is a neural network model with three layers. [...] In the initialization, the weight of each node of the generation model is set randomly, for example." Examiner's note: Applicant defines "virgin state" as a neural network with its parameters and weights set to random values, per paragraph [0072] of the instant application. Under broadest reasonable interpretation, a second artificial neural network in a virgin state can be interpreted as the generation model neural network being initialized so that the weight of each node is set randomly.)

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, ANS, AGUIRRE and NAKATA before them, to include NAKATA's generation model that generates pseudo data in HASSAN, ANS, and AGUIRRE's detection method. One would have been motivated to make such a combination in order to generate pseudo data to increase the precision of the classification model (NAKATA [0051]).

Regarding Claim 4:

HASSAN in view of ANS, AGUIRRE, and NAKATA teaches the elements of claim 1 as outlined above. ANS further teaches: prior to generating the second pseudo data, at least partially transferring knowledge held by the first artificial neural network to the second artificial neural network, […]

(ANS [pg. 3, section 2. Reverberating networks with a self-refreshing memory: sequential learning without catastrophic forgetting] teaches: "A simple explanation of the basic functioning of the architecture, shown in figure 1, would be to consider an initial state in which NET 1 has already learned completely a given set of external associative items and NET 2 is still "empty" (i.e. with random connection weights). [...] Stage (I) is in fact intended to 'transport' the previously learned information from NET 1 to NET 2.")

[...] wherein the second pseudo data is generated using the second artificial neural network, wherein the training of the first artificial neural network with the second pseudo data and second input data is performed at least partially in parallel with the generation of pseudo data by the second artificial neural network.

(ANS [pg. 1, Abstract] teaches: "We show that transfer is more efficient when two related tasks are sequentially learned than when they are learned concurrently." ANS [pg. 8, section 2.2. Simulations] teaches: "In stage (I), NET 2 is trained (i.e., training of the first artificial neural network) on pseudoitems (i.e., with the second pseudo data) generated by NET 1 from random input seeds (i.e., wherein the second pseudo data is generated using the second artificial neural network)." ANS [pg. 2, section 1. Introduction] teaches: "Once the learning of the first set A of associative pairs is completed and before the learning of the second set B starts, the network is stimulated by random input patterns, each generating a corresponding output pattern. These input–output pairs are successively stored in a pseudopopulation which is then considered as having captured something reflecting the set A structure. During the learning of the second set B (i.e., second input data), the network is concurrently trained (i.e., performed at least partially in parallel) on the input–output pairs previously stored in the pseudopopulation (i.e., with generation of pseudo data by the second artificial neural network).")

Regarding Claim 7:

HASSAN in view of ANS, AGUIRRE, and NAKATA teaches the elements of claim 1 as outlined above. ANS further teaches: wherein generating the first pseudo data comprises: a) injecting a first random sample into the first or second artificial neural network,

(ANS [pg. 7, section 2.2 Simulations] teaches: "The noise generator produces binary inputs, though random real-valued inputs should be suitable as well." ANS [pg. 5, Figure 1] teaches Stage (I) with NET 1 receiving the binary inputs or random real-valued inputs (i.e., injecting a first random sample into the first or second artificial neural network) from the Noise Generator. Figure 1 also shows Stage (II), where NET 2 receives inputs from the Noise Generator for generating input activity exactly as for NET 1 during Stage (I).)

wherein the first or second artificial neural network is configured to implement at least an auto-associative function for replicating input samples at one or more of its outputs, at least some of the replicated input samples present at the outputs forming the first pseudo data.

(ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. [...] For each of the successive random input seeds, the corresponding pseudoitems generated by reverberation within NET 1 are trained in NET 2." ANS [pg. 6-7, section 2.2 Simulations] teaches: "For each of the two simulation conditions, we start from an initial state of the system in which NET 1 has already completely learn the first set of 916 items (Dec-Add or Max-Op population) and NET 2 is still empty."

Examiner's note: ANS' Figure 1 explicitly teaches NET 1 and NET 2 having an auto-associative part (i.e., to implement at least an auto-associative function) which generates pseudoitems by the reinjecting process. Therefore, under broadest reasonable interpretation, for replicating input samples at one or more of its outputs can be interpreted as the reinjecting process that generates pseudoitems (i.e., at least some of the replicated input samples present at the outputs forming the first pseudo data) based on the Noise Generator inputs to NET 1 or NET 2, as described in ANS' Figure 1.)

Regarding Claim 8:

HASSAN in view of ANS, AGUIRRE, and NAKATA teaches the elements of claim 1 as outlined above. ANS further teaches: wherein generating the second pseudo data comprises: a) injecting a second sample into the first or second artificial neural network,

(ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "This first resulting input layer activity is then reinjected in the hidden layer, which creates a new output and input activity. This second input activity is reinjected in the hidden layer, hence recreating a third input–output activity, and so on. This back and forth flow of activity between the hidden and input layers is termed a 'reverberating' process.")

wherein the first or second artificial neural network is configured to implement at least an auto-associative function for replicating input samples at one or more of its outputs, at least some of the replicated input samples present at the outputs forming the second pseudo data,

(ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. [...] For each of the successive random input seeds, the corresponding pseudoitems generated by reverberation within NET 1 are trained in NET 2." ANS [pg. 6-7, section 2.2 Simulations] teaches: "For each of the two simulation conditions, we start from an initial state of the system in which NET 1 has already completely learn the first set of 916 items (Dec-Add or Max-Op population) and NET 2 is still empty."

Examiner's note: ANS' Figure 1 explicitly teaches NET 1 and NET 2 having an auto-associative part (i.e., to implement at least an auto-associative function) which generates pseudoitems by the reinjecting process. Therefore, under broadest reasonable interpretation, for replicating input samples at one or more of its outputs can be interpreted as the reinjecting process that generates pseudoitems (i.e., at least some of the replicated input samples present at the outputs forming the second pseudo data) based on the Noise Generator inputs to NET 1 or NET 2, as described in ANS' Figure 1.)

wherein the first sample is a random sample or a real sample.

(ANS [pg. 7, section 2.2 Simulations] teaches: "The noise generator produces binary inputs, though random real-valued inputs should be suitable as well." Examiner's note: Examiner construes the first sample as the second sample, as noted in the objections for claims 8 and 19.)

Regarding Claim 9:

HASSAN in view of ANS, AGUIRRE, and NAKATA teaches the elements of claim 7 as outlined above.
ANS further teaches: wherein generating the first and/or second pseudo data further comprises: b) reinjecting a pseudo sample, generated based on the replicated sample present at the one or more outputs of the first or second artificial neural network, into the first or second artificial neural network in order to generate a new replicated sample at the one or more outputs; and c) repeating b) one or more times to generate a plurality of reinjected pseudo samples;

(ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "This first resulting input layer activity is then reinjected in the hidden layer, which creates a new output and input activity. This second input activity is reinjected in the hidden layer, hence recreating a third input–output activity, and so on. This back and forth flow of activity between the hidden and input layers is termed a 'reverberating' process.")

wherein the first and/or second pseudo data comprises at least two of said reinjected pseudo samples originating from the same first or second sample and corresponding output values generated by the first or second artificial neural network.

(ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "This first resulting input layer activity is then reinjected in the hidden layer, which creates a new output and input activity. This second input activity is reinjected in the hidden layer, hence recreating a third input–output activity, and so on. This back and forth flow of activity between the hidden and input layers is termed a 'reverberating' process. After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. The current NET 1 output plays the role of a pseudo heteroassociative target for NET 2 and the current NET 1 input plays the role of both a pseudo input and a pseudo auto-associative target for NET 2." Furthermore, ANS [pg. 4, Figure 1] teaches the back and forth flow of activity for both NET 1 and NET 2, where both networks can reinject the auto-associative target inputs (i.e., reinjected pseudo samples originating from the same first or second sample) and output hetero-associative targets (i.e., corresponding output values generated by the first or second artificial neural network).)

Regarding Claim 10:

HASSAN in view of ANS, AGUIRRE, and NAKATA teaches the elements of claim 9 as outlined above. ANS further teaches: wherein the first and/or second artificial neural network implements a learning function, […],

(ANS [pg. 3-4, section 2.1 A dual-network architecture] teaches: "When NET 1 (i.e., the first artificial neural network) is presented with a given external input–target pair to learn (i.e., implements a learning function), the error function to be minimized in the learning algorithm is based not only on the error between the computed output and the output target, but also on the error between the computed activation from hidden units to input units and the external input pattern. [...] The current NET 1 output plays the role of a pseudo heteroassociative target for NET 2 […]." ANS [pg. 4, Figure 1] teaches the back and forth flow of activity for both NET 1 and NET 2, where both networks can output hetero-associative targets.)

and wherein the corresponding output values of the first pseudo data comprise pseudo […] generated by the learning function based on the reinjected pseudo samples.

(ANS [pg. 4, Figure 1] teaches NET 1 learning pseudoitems generated by NET 2 and outputting a pseudo heteroassociative target (i.e., pseudo targets generated by the learning function). This process is continuous for learning new populations, and therefore the pseudoitem outputs of NET 2 are reinjected into NET 1 for producing new output and input activity, based on the reinjected pseudoitems from NET 2.)

NAKATA further teaches: which is for example a classification function

(NAKATA [0015] teaches: "(S3) A classification model is learned to accurately classify the training data and the pseudo data into an existing class and a pseudo class. (S4) The classification model is learned to accurately classify the training data into a class of each supervised label.")

[…] pseudo labels […]

(NAKATA [0015] teaches: "(S3) A classification model is learned to accurately classify the training data and the pseudo data into an existing class and a pseudo class. (S4) The classification model is learned to accurately classify the training data into a class of each supervised label.")

Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over HASSAN in view of ANS, AGUIRRE, and NAKATA as applied above to claim 1, and further in view of ROBINS ("Catastrophic Forgetting, Rehearsal and Pseudorehearsal") and MAYER (US 20210241074 A1), hereafter ROBINS and MAYER.

Regarding Claim 2:

HASSAN in view of ANS, AGUIRRE, and NAKATA teaches the elements of claim 1 as outlined above. NAKATA further teaches: prior to initially training the […] artificial neural network: generating the first pseudo data using the […] artificial neural network while in the virgin state;

(NAKATA [0030] teaches: "The generation unit 101 initializes a generation model that generates pseudo data (Step S101). For example, the generation model is a neural network model with three layers. [...] In the initialization, the weight of each node of the generation model is set randomly, for example." Examiner's note: Applicant defines "virgin state" as a neural network with its parameters and weights set to random values, per paragraph [0072] of the instant application. Under broadest reasonable interpretation, the […] artificial neural network in a virgin state can be interpreted as the generation model neural network being initialized so that the weight of each node is set randomly.)

However, HASSAN in view of ANS, AGUIRRE, and NAKATA are not relied upon for teaching, but ROBINS teaches: generating the first pseudo data using the first artificial neural network while in […] state;

(ROBINS [pg. 12, section 4. Pseudorehearsal] teaches: "This approach, which we have called 'pseudorehearsal', therefore provides a method for integrating new information into a network (i.e., the first artificial neural network) without requiring any access to the population on which the network was originally trained. [...] Pseudorehearsal is based on the use in the rehearsal process of artificially constructed populations of 'pseudo-items' instead of the 'actual' previously learned items. A pseudo-item is constructed by generating a new input vector (setting at random 50% of input elements to 0 and 50% to 1 as usual), and passing it forwards through the network in the standard way. Whatever output vector this input generates becomes the associated target output. (Note that using standard backpropagation these output vectors will contain real values instead of the binary values used in the actual items.)" Examiner's note: ROBINS teaches a single network generating pseudo-items for a pseudorehearsal process in order to integrate new information into a single network.)

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, ANS, AGUIRRE, NAKATA, and ROBINS before them, to include ROBINS' pseudorehearsal in HASSAN, ANS, AGUIRRE, and NAKATA's detection method. One would have been motivated to make such a combination in order to provide the benefits of rehearsal without requiring access to the original information on which the network was trained, by retraining the model with previously learned information as new information is introduced (ROBINS [pg. 4-5, section 1. Introduction]).

However, HASSAN in view of ANS, AGUIRRE, NAKATA, and ROBINS are not relied upon for teaching, but MAYER teaches: and storing the first pseudo data to a memory.

(MAYER [0026] teaches: "In an embodiment, the system comprises a data storage, wherein the data storage is configured to store the digital data and the generated synthetic digital data, or wherein the data storage is configured to only store the generated synthetic digital data. This achieves the advantage that either a large data set, which includes both real world and synthetic data, can be provided, or a data set, which only includes synthetic data, can be provided. The latter can be used when the real world data is classified or should not be used for other reasons.")

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN in view of ANS, AGUIRRE, NAKATA, ROBINS, and MAYER before them, to include MAYER's data storage configured to store generated synthetic data in HASSAN, ANS, AGUIRRE, NAKATA, and ROBINS' detection method. One would have been motivated to make such a combination for the advantage of providing a large dataset, which includes real world and synthetic data, or a data set which only includes synthetic data in case the real data should not be used (MAYER [0026]).

Regarding Claim 3:

HASSAN in view of ANS, AGUIRRE, NAKATA, and ROBINS teaches the elements of claim 2 as outlined above.
ROBINS further teaches: wherein the second pseudo data is generated by the first artificial neural network and stored […] prior to training the first artificial neural network with the second pseudo data and the second input data. (ROBINS [pg. 12, section 4. Pseudorehearsal] teaches: "This approach, which we have called `pseudorehearsal’, therefore provides a method for integrating new information into a network (i.e., the first artificial neural network) without requiring any access to the population on which the network was originally trained. [...] Pseudorehearsal is based on the use in the rehearsal process of artificially constructed populations of `pseudo-items’ instead of the `actual’ previously learned items. A pseudo-item is constructed by generating a new input vector (setting at random 50% of input elements to 0 and 50% to 1 as usual), and passing it forwards through the network in the standard way. Whatever output vector this input generates becomes the associated target output. (Note that using standard backpropagation these output vectors will contain real values instead of the binary values used in the actual items.” ROBINS [pg. 8, 2.3 Recency Rehearsal] teaches: “In our simulations, the base population (1, 4 or 20 items) was trained using backpropagation in the usual way as described in Section 2.1. Additional items were then added one at a time and trained in a buffer of the four most recent items […].” ROBINS [pg. 10, section 3.3. Sweep Rehearsal] teaches: “Intervening items are introduced one at a time and trained in a buffer with three previously learned items as usual; however, the three previously learned items are chosen at random for each epoch. Training progresses over a number of epochs until the new item, which is always in the buffer, is trained to criterion.” Examiner’s note: ROBINS teaches a single network generating pseudo-items for a pseudorehearsal process in order to integrate new information into a single network. 
Additionally, ROBINS teaches a buffer that includes (i.e., stored) previously learned items.) MAYER further teaches: wherein the […] pseudo data is […] stored to the memory (MAYER [0026] teaches: "In an embodiment, the system comprises a data storage, wherein the data storage is configured to store the digital data and the generated synthetic digital data, or wherein the data storage is configured to only store the generated synthetic digital data. This achieves the advantage that either a large data set, which includes both real world and synthetic data, can be provided, or a data set, which only includes synthetic data, can be provided. The latter can be used when the real world data is classified or should not be used for other reasons.") Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over HASSAN in view of ANS, AGUIRRE, and NAKATA as applied above to claim 1, and further in view of MAJUMDAR (US 20170039469 A1), hereafter MAJUMDAR. Regarding Claim 5: HASSAN in view of ANS, AGUIRRE, and NAKATA teaches the elements of claim 1 as outlined above. ANS further teaches: if the one or more third input data samples do not correspond to a […] that is already known, generating third pseudo data using the first artificial neural network, or using the second artificial neural network following at least partially transferring knowledge again from the first artificial neural network to the second artificial neural network, [...] (ANS [pg. 2, section 1. Introduction] teaches: "Catastrophic interference can be eliminated in sequential learning by using a rehearsal mechanism: the old information previously learned by a network is continually refreshed (i.e. retrained) during the learning of new information." ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "This first resulting input layer activity is then reinjected in the hidden layer, which creates a new output (i.e., generating third pseudo data) and input activity. 
This second input activity is reinjected in the hidden layer, hence recreating a third input–output activity, and so on. This back and forth flow of activity between the hidden and input layers is termed a ‘reverberating’ process." ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 (i.e., using the first artificial neural network) auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. [...] For each of the successive random input seeds, the corresponding pseudoitems generated by reverberation within NET 1 are trained in NET 2." ANS [pg. 4, Figure 1] teaches: “Stage (I): the NET 2 network is learning pseudoitems generated by the reverberating process in NET 1 (transport of NET 1 memory towards NET 2) (i.e., following at least partially transferring knowledge […] from the first artificial neural network to the second artificial neural network). Stage (II): the NET 1 network is learning external items along with pseudoitems generated by the reverberating process in NET 2 (learning with self-refreshing of old information).” ANS [pg. 4, section 2.1 A dual-network architecture] teaches: “Subsequently, stages (I) and (II) are supposed to work alternatively for each new occurring population of ‘actual’ items.” ANS [pg. 1, Abstract] teaches: “With a self-refreshing memory network knowledge can be saved for a long time and therefore reused in subsequent acquisitions.” Examiner’s note: ANS teaches a method for acquiring new knowledge in which, each time a new population of items that do not belong to an already-learnt population is to be learned, the learning is done with a self-refreshing memory that transfers knowledge from NET 1 to NET 2 (i.e., transferring knowledge again). 
Furthermore, ANS’ Figure 1 Stage (II) teaches NET2 generating auto-associative targets (i.e., pseudo data) after Stage (I), where the transport of NET 1 memory towards NET 2 occurs.) […] and training the first artificial neural network with the third pseudo data and third input data samples. (ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. [...] A new population of external items must be trained concurrently with pseudoitems originating continuously from NET 2, these pseudoitems being generated exactly as in NET 1 during stage (I). Subsequently, stages (I) and (II) are supposed to work alternatively for each new occurring population of ‘actual’ items (i.e., third input data samples).") HASSAN in view of ANS, AGUIRRE, and NAKATA is not relied upon for teaching, but MAJUMDAR teaches: detecting, using a novelty detector, whether one or more third input data samples correspond to a class that is already known to the first artificial neural network; (MAJUMDAR [0003] teaches: "Certain aspects of the present disclosure generally relate to machine learning and, more particularly, to improving systems and methods for detecting unknown classes and initializing classifiers for unknown classes." MAJUMDAR [0038] teaches: "Furthermore, a second classifier (i.e., using a novelty detector) may be specified to categorize a sample as known or unknown based on the classification scores." MAJUMDAR [0069] teaches: "Still, in some cases, such as during the test of a network (i.e., the first artificial neural network), a sample (i.e., third input data samples) may not belong to any of the classes that were used to train a multi-class classifier (i.e., first artificial neural network). 
Thus, it may be desirable to generate a framework to initialize the multi-class classifier to label an object as belonging to a known class or belonging to an unknown class. The multi-class classifier may be referred to as the classifier." MAJUMDAR [0071] teaches: "Additionally, in some cases there may only be one class for a classifier. For example, the classifier may be specified to determine whether an object is a dog. In this example, when an image is presented to the classifier, the object is classified either as a dog or not a dog. Still, it may be desirable to create a framework to initialize a classifier to recognize an input as belonging to the one known class or to assign an unknown label to the input (i.e., correspond to a class that is already known)." Examiner's note: Paragraphs [0136]-[0138] of the specification describe using a novelty detector for implementing incremental learning after training and during the inference phase (e.g., testing phase) of the ANN. Under broadest reasonable interpretation, the one or more third input data samples are input data fed to the artificial neural network during the inference phase. MAJUMDAR [0069] explicitly teaches initializing a classifier (i.e., novelty detector) for labeling the input data as known or unknown during the test of a network.) […] class that is already known […] (MAJUMDAR [0069] teaches: "Still, in some cases, such as during the test of a network, a sample may not belong to any of the classes that were used to train a multi-class classifier. 
Thus, it may be desirable to generate a framework to initialize the multi-class classifier to label an object as belonging to a known class or belonging to an unknown class.”) Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, ANS, AGUIRRE, NAKATA, and MAJUMDAR before them, to include MAJUMDAR’s method of detecting unknown classes using a classifier to label an object as belonging to a known or unknown class in HASSAN, ANS, AGUIRRE, and NAKATA’s detection method. One would have been motivated to make such a combination so that, when a machine learning model receives a sample during a test phase, it can be detected whether the sample belongs to a known or unknown class (MAJUMDAR [0069]). Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over HASSAN in view of ANS, AGUIRRE, and NAKATA as applied above to claim 1, and further in view of MAJUMDAR and DERR (US 20200007249 A1), hereafter DERR. Regarding Claim 6: HASSAN in view of ANS, AGUIRRE, and NAKATA teaches the elements of claim 1 as outlined above. HASSAN in view of ANS, AGUIRRE, and NAKATA is not relied upon for teaching, but MAJUMDAR teaches: detecting, by a controller, whether one or more third input data samples correspond to a new distribution not already learnt by the first artificial neural network; (MAJUMDAR [0003] teaches: "Certain aspects of the present disclosure generally relate to machine learning and, more particularly, to improving systems and methods for detecting unknown classes and initializing classifiers for unknown classes." MAJUMDAR [0041] teaches: “In an aspect of the present disclosure, the instructions loaded into the general-purpose processor 102 (i.e., controller) may comprise code for generating a first classifier for first classes. In one configuration, the output has a dimension of at least two. 
The instructions loaded into the general-purpose processor 102 may also comprise code for designing a second classifier to receive the output of the first classifier to decide whether the input data belongs to the first classes or at least one second class.” MAJUMDAR [0038] teaches: "Furthermore, a second classifier may be specified to categorize a sample as known or unknown based on the classification scores." MAJUMDAR [0069] teaches: "Still, in some cases, such as during the test of a network, a sample (i.e., third input data samples) may not belong to any of the classes (i.e., correspond to a new distribution) that were used to train a multi-class classifier (i.e., first artificial neural network). Thus, it may be desirable to generate a framework to initialize the multi-class classifier to label an object as belonging to a known class or belonging to an unknown class. The multi-class classifier may be referred to as the classifier." MAJUMDAR [0071] teaches: "Additionally, in some cases there may only be one class for a classifier. For example, the classifier may be specified to determine whether an object is a dog. In this example, when an image is presented to the classifier, the object is classified either as a dog or not a dog. Still, it may be desirable to create a framework to initialize a classifier to recognize an input as belonging to the one known class or to assign an unknown label to the input.") Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, ANS, AGUIRRE, NAKATA, and MAJUMDAR before them, to include MAJUMDAR’s processor for detecting unknown classes using a classifier to label an object as belonging to a known or unknown class in HASSAN, ANS, AGUIRRE, and NAKATA’s detection method. 
One would have been motivated to make such a combination so that, when a machine learning model receives a sample during a test phase, it can be detected whether the sample belongs to a known or unknown class (MAJUMDAR [0069]). However, HASSAN in view of ANS, AGUIRRE, NAKATA, and MAJUMDAR is not relied upon for teaching, but DERR teaches: if the one or more third input data samples do correspond to the new distribution, creating a new system for learning the one or more third input data samples, the new system comprising at least a further first artificial neural network. (DERR [0186] teaches: "At block 1306, known signals are classified and unknown signals detected using RF measurements 1304 from RF measurement source 1302. Results 1308 including known signal classes and unknown signal classes are reported to users 1310. At block 1312, unknown signal characterization is performed on the detected unknown signals from block 1306 using incremental learning techniques to train new classifiers to classify unknown signals." Examiner's note: Paragraph [0136] of the specification states: "Furthermore, after training and during the inference phase of the ANN, it may be desirable to permit incremental learning, such that new classes of data samples can continue to be learn and handled by the system." DERR triggers the incremental learning for training new classifiers to classify unknown signals. A person having ordinary skill in the art would recognize that incremental learning is conceptually equivalent to ANS' sequential learning.) Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, ANS, AGUIRRE, NAKATA, MAJUMDAR, and DERR before them, to include DERR’s training of new classifiers when unknown signals (i.e., input data) are received in HASSAN, ANS, AGUIRRE, NAKATA, and MAJUMDAR’s detection method. 
One would have been motivated to make such a combination in order to incrementally build learned models and deploy classifiers, improving the accuracy of a deployed monitoring system over time (DERR [0183]). Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over HASSAN in view of ANS, AGUIRRE, and NAKATA as applied above to claim 1, and further in view of MAJUMDAR and DELGADO (US 20210089895 A1), hereafter DELGADO. Regarding Claim 11: HASSAN in view of ANS, AGUIRRE, and NAKATA teaches the elements of claim 1 as outlined above. AGUIRRE further teaches: providing, by one or more sensors, one or more third input data samples to a novelty detector; (AGUIRRE [0041] teaches: "The anomaly detection apparatus 306 is provided with the example anomaly detector 412 (i.e., novelty detector) to detect anomalies represented in collected sensor data (i.e., one or more third input data samples). For example, the anomaly detector 412 can detect an anomaly in the probabilistic deviation estimation generated by the extractive deviation distribution analyzer 410." AGUIRRE [0026] teaches: “The example autonomous driving apparatus 300 represents operational components that execute as part of operating the autonomous vehicle 100 to collect sensor data, process the sensor data, and analyze the sensor data as disclosed herein to detect anomalies in the sensor data and control operations of the autonomous vehicle 100.”) However, HASSAN in view of ANS, AGUIRRE, and NAKATA is not relied upon for teaching, but MAJUMDAR teaches: detecting whether the one or more third input data samples correspond to a class that is already known to the first artificial neural network, and if so, […] (MAJUMDAR [0003] teaches: "Certain aspects of the present disclosure generally relate to machine learning and, more particularly, to improving systems and methods for detecting unknown classes and initializing classifiers for unknown classes." 
MAJUMDAR [0038] teaches: "Furthermore, a second classifier may be specified to categorize a sample as known or unknown based on the classification scores." MAJUMDAR [0069] teaches: "Still, in some cases, such as during the test of a network (i.e., the first artificial neural network), a sample (i.e., third input data samples) may not belong to any of the classes that were used to train a multi-class classifier (i.e., first artificial neural network). Thus, it may be desirable to generate a framework to initialize the multi-class classifier to label an object as belonging to a known class or belonging to an unknown class. The multi-class classifier may be referred to as the classifier." MAJUMDAR [0071] teaches: "Additionally, in some cases there may only be one class for a classifier. For example, the classifier may be specified to determine whether an object is a dog. In this example, when an image is presented to the classifier, the object is classified either as a dog or not a dog. Still, it may be desirable to create a framework to initialize a classifier to recognize an input as belonging to the one known class or to assign an unknown label to the input (i.e., correspond to a class that is already known)." Examiner's note: Paragraphs [0136]-[0138] of the specification describe using a novelty detector for implementing incremental learning after training and during the inference phase (e.g., testing phase) of the ANN. Under broadest reasonable interpretation, the one or more third input data samples are input data fed to the artificial neural network during the inference phase. MAJUMDAR [0069] explicitly teaches initializing a classifier (i.e., novelty detector) for labeling the input data as known or unknown during the test of a network.) 
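The two-classifier arrangement cited from MAJUMDAR (a first classifier produces per-class scores; a second classifier labels the sample known or unknown from those scores) can be roughly illustrated as below. This is a minimal sketch using a hypothetical max-score threshold rule; the weights, threshold value, and all names are assumptions, not MAJUMDAR's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def first_classifier(x, W):
    """Multi-class classifier: returns per-class scores (here, a linear model)."""
    return softmax(W @ x)

def novelty_detector(scores, threshold=0.6):
    """Second classifier: labels a sample 'known' or 'unknown' based on the
    first classifier's classification scores (a simple max-score rule)."""
    return "known" if scores.max() >= threshold else "unknown"

# Toy weights for 3 known classes over 4 input features (illustrative only).
W = np.array([[4.0, 0.0, 0.0, 0.0],
              [0.0, 4.0, 0.0, 0.0],
              [0.0, 0.0, 4.0, 0.0]])

in_dist = np.array([1.0, 0.0, 0.0, 0.0])   # resembles trained class 0
out_dist = np.array([0.0, 0.0, 0.0, 1.0])  # matches no trained class

print(novelty_detector(first_classifier(in_dist, W)))   # -> known
print(novelty_detector(first_classifier(out_dist, W)))  # -> unknown
```

A sample outside the trained classes yields flat, low-confidence scores, so the second classifier assigns the unknown label.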
However, HASSAN in view of ANS, AGUIRRE, NAKATA, and MAJUMDAR is not relied upon for teaching, but DELGADO teaches: […] providing the third input data samples to an inference module comprising the trained first or further artificial neural network in order to generate a predicted label for controlling the one or more actuators. (DELGADO [0009] teaches: “Furthermore, an indication of confidence in the reliability of the models results (predictions) is desirable.” DELGADO [0010] teaches: “For a counterfactual generation device, the obtained explanations and measures of reliability of the class predictions from a neural network (i.e., inference module) may further be used for the control of devices, such as the vehicle control for automated driving (i.e., for controlling the one or more actuators).” DELGADO [0100] teaches: “In a neural network designed for classification such as neural network 300, the output layer 303 receives values from at least one of the preceding hidden layers, e.g., from hidden layer 302b.” DELGADO [0101] teaches: “In the following, class predictions may also be referred to as predictions, predicted class labels or predicted classification labels (i.e., predicted label).” DELGADO [0104] teaches: “In order for the neural network 300 to be able to classify (i.e., in order to generate a predicted label) input sensor data (i.e., providing the third input data samples to an inference module), in particular image data, it is first trained (i.e., comprising the trained first or further artificial neural network) accordingly based on input training (image) data.”) Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, ANS, AGUIRRE, NAKATA, MAJUMDAR, and DELGADO before them, to include DELGADO’s generating prediction labels for controlling an autonomous vehicle in HASSAN, ANS, AGUIRRE, NAKATA, and MAJUMDAR’s detection method. 
One would have been motivated to make such a combination in order to find meaningful (understandable and/or plausible for a human being) explanations of the class prediction and/or to identify the root causes of the classification and/or to show to a user of the method which significant and meaningful changes are necessary for the neural network to change the classification score of a given input sensor data sample (DELGADO [0014]). Regarding Claim 12: HASSAN in view of ANS, AGUIRRE, NAKATA, MAJUMDAR, and DELGADO teaches the elements of claim 11 as outlined above. MAJUMDAR further teaches: if the one or more third input data samples do not correspond to a class that is already known to the first artificial neural network: (MAJUMDAR [0069] teaches: "Still, in some cases, such as during the test of a network (i.e., the first artificial neural network), a sample (i.e., third input data samples) may not belong to any of the classes that were used to train a multi-class classifier (i.e., first artificial neural network). Thus, it may be desirable to generate a framework to initialize the multi-class classifier to label an object as belonging to a known class or belonging to an unknown class (i.e., do not correspond to a class that is already known). ANS further teaches: generating third pseudo data using the first artificial neural network, or using the second artificial neural network following at least partially transferring knowledge again from the first artificial neural network to the second artificial neural network; (ANS [pg. 2, section 1. Introduction] teaches: "Catastrophic interference can be eliminated in sequential learning by using a rehearsal mechanism: the old information previously learned by a network is continually refreshed (i.e. retrained) during the learning of new information." ANS [pg. 
4, section 2.1 A dual-network architecture] teaches: "This first resulting input layer activity is then reinjected in the hidden layer, which creates a new output (i.e., generating third pseudo data) and input activity. This second input activity is reinjected in the hidden layer, hence recreating a third input–output activity, and so on. This back and forth flow of activity between the hidden and input layers is termed a ‘reverberating’ process." ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 (i.e., using the first artificial neural network) auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. [...] For each of the successive random input seeds, the corresponding pseudoitems generated by reverberation within NET 1 are trained in NET 2." ANS [pg. 4, Figure 1] teaches: “Stage (I): the NET 2 network is learning pseudoitems generated by the reverberating process in NET 1 (transport of NET 1 memory towards NET 2) (i.e., following at least partially transferring knowledge […] from the first artificial neural network to the second artificial neural network). Stage (II): the NET 1 network is learning external items along with pseudoitems generated by the reverberating process in NET 2 (learning with self-refreshing of old information).” ANS [pg. 
4, section 2.1 A dual-network architecture] teaches: “Subsequently, stages (I) and (II) are supposed to work alternatively for each new occurring population of ‘actual’ items.” ANS [pg. 1, Abstract] teaches: “With a self-refreshing memory network knowledge can be saved for a long time and therefore reused in subsequent acquisitions.” Examiner’s note: ANS teaches a method for acquiring new knowledge in which, each time a new population of items that do not belong to an already-learnt population is to be learned, the learning is done with a self-refreshing memory that transfers knowledge from NET 1 to NET 2 (i.e., transferring knowledge again). Furthermore, ANS’ Figure 1 Stage (II) teaches NET 2 generating auto-associative targets (i.e., pseudo data) after Stage (I), where the transport of NET 1 memory towards NET 2 occurs.) training the first artificial neural network with the third pseudo data and third input data samples. (ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. [...] A new population of external items must be trained (i.e., training) concurrently with pseudoitems originating continuously from NET 2, these pseudoitems being generated exactly as in NET 1 during stage (I). Subsequently, stages (I) and (II) are supposed to work alternatively for each new occurring population of ‘actual’ items (i.e., third input data samples).") Claims 13 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN (US 20200302339 A1), hereafter RAGHAVAN, and ANS. 
Regarding Claim 13: HASSAN teaches: A system comprising: a first artificial neural network; (HASSAN [0031] teaches: “In some implementations, the pre-trained deep neural network model is an untrained deep neural network model (i.e., a first artificial neural network).") initially training the first artificial neural network with first input data and with the first pseudo data; (HASSAN [0031] teaches: "At block 320, a pre-trained deep neural network model is trained (i.e., training) using the plurality of real roadway scene images (i.e., with first input data) and the plurality of augmented roadway scene images (i.e., and first pseudo data) to generate a newly-trained deep neural network model. For example, the deep neural network model 116 stored in the memory 106 is trained to detect traffic lights in image data using the plurality of real roadway scene images and the plurality of augmented roadway scene images. In some implementations, the pre-trained deep neural network model is an untrained deep neural network model (i.e., a first artificial neural network).") HASSAN is not relied upon for teaching: either a second artificial neural network in a virgin state and configured to generate first pseudo data, or a memory storing the first pseudo data generated by the first artificial neural network while in a virgin state; one or more actuators; and one or more circuits or processors configured to generate old memories for use in training the first artificial neural network by: generating second pseudo data using the first artificial neural network, or using the second artificial neural network following at least partially transferring knowledge from the first artificial neural network to the second artificial neural network; training the first artificial neural network, or another artificial neural network, with the second pseudo data and second input data; and controlling the one or more actuators using the trained first or further artificial neural network. 
However, NAKATA teaches: either a second artificial neural network in a virgin state and configured to generate first pseudo data, or a memory storing the first pseudo data generated by the first artificial neural network while in a virgin state; (NAKATA [0030] teaches: "The generation unit 101 initializes a generation model that generates pseudo data (i.e., and configured to generate first pseudo data) (Step S101). For example, the generation model is a neural network model with three layers. [...] In the initialization, the weight of each node of the generation model is set randomly, for example." Examiner's note: Applicant defines "virgin state" as a neural network with its parameters and weights set to random values, per paragraph [0072] of the instant application. Under broadest reasonable interpretation, a second artificial neural network in a virgin state can be interpreted as the generation model neural network being initialized so that the weight of each node is set randomly.) Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN and NAKATA before them, to include NAKATA's generation model that generates pseudo data in HASSAN’s detection method. One would have been motivated to make such a combination in order to generate pseudo data to increase the precision of the classification model (NAKATA [0051]). 
HASSAN in view of NAKATA is not relied upon for teaching: one or more actuators; and one or more circuits or processors configured to generate old memories for use in training the first artificial neural network by: generating second pseudo data using the first artificial neural network, or using the second artificial neural network following at least partially transferring knowledge from the first artificial neural network to the second artificial neural network; training the first artificial neural network, or another artificial neural network, with the second pseudo data and second input data; and controlling the one or more actuators using the trained first or further artificial neural network. However, AGUIRRE teaches: one or more actuators; (AGUIRRE [0026] teaches: “The example autonomous driving apparatus 300 represents operational components that execute as part of operating the autonomous vehicle 100 to collect sensor data, process the sensor data, and analyze the sensor data as disclosed herein to detect anomalies in the sensor data and control operations of the autonomous vehicle 100 (i.e., one or more actuators).”) controlling the one or more actuators using the trained first or further artificial neural network. (AGUIRRE [0026] teaches: "[…] collect sensor data, process the sensor data, and analyze the sensor data as disclosed herein to detect anomalies in the sensor data and control operations of the autonomous vehicle 100." AGUIRRE [0087] teaches: "FIG. 12 is a block diagram of an example processing platform 1200 structured to execute the instructions of FIG. 11 to implement the example autonomous driving apparatus 300 of FIG. 3 and/or the example anomaly detection apparatus 306 of FIGS. 3 and 4 to perform unsupervised multimodal anomaly detection for the example autonomous vehicle 100 (FIGS. 1-5). 
The processor platform 1200 can be, for example, a server, a computer, a self-learning machine (e.g., a neural network) (i.e., trained neural network), or any other type of computing device." Examiner's note: under BRI, "one […] actuator" can be interpreted as the autonomous vehicle, which is controlled by using the outputs of the collected and processed sensor data. A person having ordinary skill in the art could configure ANS’ NET 1 neural network to control AGUIRRE’s autonomous vehicle (i.e., actuator).) Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, NAKATA, and AGUIRRE before them, to include AGUIRRE’s sensors that collect data to train auto-encoders and control the operations of an autonomous vehicle in HASSAN and NAKATA’s detection method. One would have been motivated to make such a combination in order to address sensor malfunction over time in autonomous vehicles and to make the autonomous vehicle computer sufficiently unaffected by changes in sensor operation, thereby increasing the flexibility and robustness of the autonomous vehicle computer (AGUIRRE [0096]). 
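The claimed "controlling the one or more actuators using the trained first or further artificial neural network" can be pictured as a simple inference-to-command dispatch. The sketch below is purely illustrative: the labels, commands, and the stand-in inference function are hypothetical and drawn from neither AGUIRRE nor the claims.

```python
from typing import Callable, List

# Hypothetical mapping from a predicted class label to an actuator command;
# labels and commands are illustrative only.
ACTUATOR_COMMANDS = {
    "red_light": "apply_brakes",
    "green_light": "maintain_speed",
    "obstacle": "steer_avoid",
}

def control_actuators(predict: Callable[[List[float]], str],
                      sensor_sample: List[float]) -> str:
    """Run inference on a sensor sample and dispatch the matching actuator
    command, falling back to a safe default for unrecognized labels."""
    label = predict(sensor_sample)
    return ACTUATOR_COMMANDS.get(label, "apply_brakes")  # safe default

def dummy_predict(sample: List[float]) -> str:
    """Stand-in for the trained network's inference function."""
    return "red_light" if sample[0] > 0.5 else "green_light"

print(control_actuators(dummy_predict, [0.9]))  # -> apply_brakes
print(control_actuators(dummy_predict, [0.1]))  # -> maintain_speed
```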
HASSAN in view of NAKATA and AGUIRRE is not relied upon for teaching: one or more circuits or processors configured to generate old memories for use in training the first artificial neural network by: generating second pseudo data using the first artificial neural network, or using the second artificial neural network following at least partially transferring knowledge from the first artificial neural network to the second artificial neural network; training the first artificial neural network, or another artificial neural network, with the second pseudo data and second input data; and However, RAGHAVAN teaches: one or more circuits or processors configured to generate old memories for use in training the first artificial neural network (RAGHAVAN [0009] teaches: “Furthermore, the techniques disclosed herein may enable a machine learning system to perform scalable, lifelong learning of solutions for new tasks the machine learning system has not previously been trained to solve, while reducing the occurrence of catastrophic forgetting (e.g., forgetting solutions to old tasks as a result of learning solutions to new tasks).” RAGHAVAN [0010] teaches: “[…] generative memory is capable of generating the auxiliary data from old tasks for use in training the machine learning model to obtain labels for new tasks for which the machine learning model has not previously been trained.” RAGHAVAN [0043] teaches: “Computation engine 230 includes machine learning system 102, observational module 118 and generative memory 104. Each of machine learning system 102, observational module 118, and generative memory 104 may represent software executable by processing circuitry 206 and stored on storage device 208, or a combination of hardware and software. 
Such processing circuitry 206 may include any one or more of a microprocessor, a controller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or equivalent discrete or integrated logic circuitry.”) Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, NAKATA, AGUIRRE, and RAGHAVAN before them, to include RAGHAVAN’s generative memory in HASSAN, NAKATA, and AGUIRRE’s detection method. One would have been motivated to make such a combination in order to obtain labels for new tasks for which the machine learning model has not previously learned to reduce the occurrence of catastrophic forgetting (RAGHAVAN [0009-0010]). HASSAN in view of NAKATA, AGUIRRE, and RAGHAVAN is not relied upon for teaching, but ANS teaches: generating second pseudo data using the first artificial neural network, or using the second artificial neural network following at least partially transferring knowledge from the first artificial neural network to the second artificial neural network; (ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "This first resulting input layer activity is then reinjected in the hidden layer, which creates a new output (i.e., generating second pseudo data) and input activity. This second input activity is reinjected in the hidden layer, hence recreating a third input–output activity, and so on. This back and forth flow of activity between the hidden and input layers is termed a ‘reverberating’ process." ANS [pg. 
4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 (i.e., using the first artificial neural network) auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. [...] For each of the successive random input seeds, the corresponding pseudoitems generated by reverberation within NET 1 are trained in NET 2.") training the first artificial neural network, or another artificial neural network, with the second pseudo data and second input data; (ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training (i.e., training [...] another artificial neural network). [...] A new population of external items (i.e., second input data) must be trained concurrently with pseudoitems originating continuously (i.e., second pseudo data) from NET 2, these pseudoitems being generated exactly as in NET 1 during stage (I). Subsequently, stages (I) and (II) are supposed to work alternatively for each new occurring population of ‘actual’ items.") Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, NAKATA, AGUIRRE, RAGHAVAN and ANS before them, to include ANS’ reverberating architecture with self-refreshing memory in HASSAN, NAKATA, AGUIRRE, and RAGHAVAN’s detection method. One would have been motivated to make such a combination in order to overcome catastrophic forgetting in sequential learning tasks (ANS [pg. 4, section 2.1 A dual-network architecture]). 
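The ANS reverberating process cited throughout this rejection (a random seed injected into NET 1's auto-associative part, reinjected R times, with the resulting input/output activities transmitted to NET 2 for training) can be sketched as follows. This is a minimal illustration under assumed layer sizes; the toy auto-associator is a stand-in, not ANS' architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AutoAssociator:
    """Toy auto-associative network standing in for ANS' NET 1 / NET 2."""
    def __init__(self, n=8, n_hidden=12):
        self.W_in = rng.normal(0.0, 0.5, (n, n_hidden))
        self.W_out = rng.normal(0.0, 0.5, (n_hidden, n))

    def step(self, x):
        """One pass: input -> hidden -> reconstructed input activity."""
        return sigmoid(sigmoid(x @ self.W_in) @ self.W_out)

def reverberate(net, n=8, R=5):
    """Generate one pseudo-item: a random input seed is injected and the
    resulting activity is reinjected R times (the 'reverberating' process);
    the final input/output pair is transmitted for training."""
    x = rng.integers(0, 2, n).astype(float)  # random input seed
    for _ in range(R):
        x = net.step(x)
    return x, net.step(x)  # (input activity, associated output activity)

net1 = AutoAssociator()
# Stage (I): pseudo-items reverberated within NET 1 form NET 2's training set,
# transporting NET 1's memory towards NET 2 without the original data.
net2_training_set = [reverberate(net1) for _ in range(20)]
```

Stage (II) would then train NET 1 on new external items interleaved with pseudo-items generated the same way from NET 2, alternating the two stages for each new population.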
Regarding Claim 18: HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, and ANS teaches the elements of claim 13 as outlined above. Additionally, the claim recites similar limitations as corresponding claim 7 and is rejected for similar reasons as claim 7 using similar teachings and rationale. NAKATA further teaches: wherein the one or more circuits or processors is configured to generate the first pseudo data […] (NAKATA [0027] teaches: “For example, the units (the generation unit 101, the learning unit 102, the classification unit 103, and the output control unit 104) described above are implemented by a singular processor or plural processors (i.e., one or more […] processors).” NAKATA [0030] teaches: "The generation unit 101 initializes a generation model that generates pseudo data (i.e., configured to generate the first pseudo data) (Step S101).”) Regarding Claim 19: HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, and ANS teaches the elements of claim 13 as outlined above. NAKATA further teaches: wherein the one or more circuits or processors is configured to generate the […] pseudo data […] (NAKATA [0027] teaches: “For example, the units (the generation unit 101, the learning unit 102, the classification unit 103, and the output control unit 104) described above are implemented by a singular processor or plural processors (i.e., one or more […] processors).” NAKATA [0030] teaches: "The generation unit 101 initializes a generation model that generates pseudo data (i.e., configured to generate […] pseudo data) (Step S101).”) ANS further teaches: […] generate the second pseudo data by: a) injecting a second sample into the first or second artificial neural network, wherein the first or second artificial neural network is configured to implement at least an auto-associative function for replicating input samples at one or more of its outputs, at least some of the replicated input samples present at the outputs forming the second pseudo data, wherein the first sample is a 
random sample or a real sample. (ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "This first resulting input layer activity is then reinjected in the hidden layer, which creates a new output and input activity. This second input activity is reinjected in the hidden layer, hence recreating a third input–output activity, and so on. This back and forth flow of activity between the hidden and input layers is termed a ‘reverberating’ process." Examiner’s note: A person of ordinary skill in the art would recognize that ANS teaches generating pseudo auto-associative targets and that NAKATA teaches a generation unit for generating pseudo data, and therefore ANS’ pseudo auto-associative targets (i.e., pseudo data) can be generated using NAKATA’s generation unit.) Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, and ANS as applied to claim 13 above, and further in view of MAJUMDAR. Regarding Claim 14: HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, and ANS teaches the elements of claim 13 as outlined above. ANS further teaches: generate third pseudo data using the first artificial neural network, or using the second artificial neural network following at least partially transferring knowledge again from the first artificial neural network to the second artificial neural network; (ANS [pg. 2, section 1. Introduction] teaches: "Catastrophic interference can be eliminated in sequential learning by using a rehearsal mechanism: the old information previously learned by a network is continually refreshed (i.e. retrained) during the learning of new information." ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "This first resulting input layer activity is then reinjected in the hidden layer, which creates a new output (i.e., generating third pseudo data) and input activity. 
This second input activity is reinjected in the hidden layer, hence recreating a third input–output activity, and so on. This back and forth flow of activity between the hidden and input layers is termed a ‘reverberating’ process." ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 (i.e., using the first artificial neural network) auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. [...] For each of the successive random input seeds, the corresponding pseudoitems generated by reverberation within NET 1 are trained in NET 2." ANS [pg. 4, Figure 1] teaches: “Stage (I): the NET 2 network is learning pseudoitems generated by the reverberating process in NET 1 (transport of NET 1 memory towards NET 2) (i.e., following at least partially transferring knowledge […] from the first artificial neural network to the second artificial neural network). Stage (II): the NET 1 network is learning external items along with pseudoitems generated by the reverberating process in NET 2 (learning with self-refreshing of old information).” ANS [pg. 4, section 2.1 A dual-network architecture] teaches: “Subsequently, stages (I) and (II) are supposed to work alternatively for each new occurring population of ‘actual’ items.” ANS [pg.1, Abstract] teaches: “With a self-refreshing memory network knowledge can be saved for a long time and therefore reused in subsequent acquisitions.” Examiner’s note: ANS teaches a method for acquiring new knowledge such that each time a new population with items that do not belong to a learnt population is to be learned, the learning is done with self-refreshing memory for transferring knowledge from NET 1 to NET 2 (i.e., transferring knowledge again). 
Furthermore, ANS’ Figure 1 Stage (II) teaches NET2 generating auto-associative targets (i.e., pseudo data) after Stage (I), where the transport of NET 1 memory towards NET 2 occurs.) training the first artificial neural network with the third pseudo data and third input data samples. (ANS [pg. 4, section 2.1 A dual-network architecture] teaches: "After a fixed number of reinjections (which is a simulation parameter, denoted R) within the NET 1 auto-associative part, the current generated input and output activities are, respectively, transmitted to the NET 2 input and output layers for training. [...] A new population of external items must be trained concurrently with pseudoitems originating continuously from NET 2, these pseudoitems being generated exactly as in NET 1 during stage (I). Subsequently, stages (I) and (II) are supposed to work alternatively for each new occurring population of ‘actual’ items (i.e., third input data samples).") However, HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, and ANS is not relied upon for teaching, but MAJUMDAR teaches: further comprising a novelty detector configured to: detect whether one or more third input data samples correspond to a class that is already known to the first artificial neural network; (MAJUMDAR [0003] teaches: "Certain aspects of the present disclosure generally relate to machine learning and, more particularly, to improving systems and methods for detecting unknown classes and initializing classifiers for unknown classes." MAJUMDAR [0038] teaches: "Furthermore, a second classifier (i.e., a novelty detector) may be specified to categorize a sample as known or unknown (i.e., configured to detect) based on the classification scores." 
MAJUMDAR [0069] teaches: "Still, in some cases, such as during the test of a network (i.e., the first artificial neural network), a sample (i.e., third input data samples) may not belong to any of the classes that were used to train a multi-class classifier (i.e., first artificial neural network). Thus, it may be desirable to generate a framework to initialize the multi-class classifier to label an object as belonging to a known class or belonging to an unknown class. The multi-class classifier may be referred to as the classifier." MAJUMDAR [0071] teaches: "Additionally, in some cases there may only be one class for a classifier. For example, the classifier may be specified to determine whether an object is a dog. In this example, when an image is presented to the classifier, the object is classified either as a dog or not a dog. Still, it may be desirable to create a framework to initialize a classifier to recognize an input as belonging to the one known class or to assign an unknown label to the input (i.e., correspond to a class that is already known)." Examiner's note: Paragraphs [0136]-[0138] of the specification describe using a novelty detector for implementing incremental learning after training and during the inference phase (e.g., testing phase) of the ANN. Under broadest reasonable interpretation, the one or more third input data samples are input data fed to the artificial neural network during the inference phase. MAJUMDAR [0069] explicitly teaches initializing a classifier (i.e., novelty detector) for labeling the input data as known or unknown during the test of a network.) wherein, if the one or more third input data samples do not correspond to a class that is already known, the one or more circuits or processors is further configured to: (MAJUMDAR [0038] teaches: "Furthermore, a second classifier may be specified to categorize a sample as known or unknown (i.e., configured to detect) based on the classification scores." 
MAJUMDAR [0069] teaches: "Still, in some cases, such as during the test of a network, a sample (i.e., the one or more third input data samples) may not belong to any of the classes that were used to train a multi-class classifier (i.e., first artificial neural network). Thus, it may be desirable to generate a framework to initialize the multi-class classifier to label an object as belonging to a known class or belonging to an unknown class.” Examiner’s note: MAJUMDAR explicitly teaches generating a framework during the test of a network to label objects in cases where a class does not belong to any of the classes used during training (i.e., where, if the one or more third input data samples do not correspond to a class that is already known).) Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, NAKATA, AGUIRRE, RAGHAVAN, ANS and MAJUMDAR before them, to include MAJUMDAR’s method of detecting unknown classes using a classifier to label an object as belonging to a known or unknown class in HASSAN, NAKATA, AGUIRRE, RAGHAVAN, and ANS’ detection method. One would have been motivated to make such a combination so that when a machine learning model receives a sample during a test phase, it can be detected whether the sample belongs to a known or unknown class (MAJUMDAR [0069]). Regarding Claim 15: HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, ANS, and MAJUMDAR teaches the elements of claim 14 as outlined above. AGUIRRE further teaches: further comprising one or more sensors configured to provide the one or more third input data samples. (AGUIRRE [0041] teaches: "The anomaly detection apparatus 306 is provided with the example anomaly detector 412 to detect anomalies represented in collected sensor data (i.e., one or more third input data samples). 
For example, the anomaly detector 412 can detect an anomaly in the probabilistic deviation estimation generated by the extractive deviation distribution analyzer 410." AGUIRRE [0026] teaches: “The example autonomous driving apparatus 300 represents operational components that execute as part of operating the autonomous vehicle 100 to collect sensor data, process the sensor data, and analyze the sensor data as disclosed herein to detect anomalies in the sensor data and control operations of the autonomous vehicle 100.” AGUIRRE [0044] teaches: “To process input sensor data from multiple ones of the sensors (i.e., one or more sensors configured to provide the one or more third input data samples) 202, 204, 206, 304, multiple ones of the auto-encoder 600 can be employed as described below in connection with FIG. 7.”) Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, ANS, and MAJUMDAR as applied to claim 14 above, and further in view of DELGADO. Regarding Claim 16: HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, ANS, and MAJUMDAR teaches the elements of claim 14 as outlined above. Additionally, the claim recites similar limitations as corresponding claim 11 and is rejected for similar reasons as claim 11 using similar teachings and rationale. Additionally, DELGADO teaches: […] the one or more circuits or processors is further configured to provide the third input data samples to an inference module comprising the trained first or further artificial neural network in order to generate a predicted label for controlling the one or more actuators. (DELGADO [0180] teaches: “The method of FIG. 6 may be performed by one or more processors. The term “processor” can be understood as any type of entity that allows the processing of data or signals. 
For example, the data or signals may be treated according to at least one (i.e., one or more than one) specific function performed by the processor.” DELGADO [0058] teaches: “FIG. 6 shows a flow diagram illustrating an exemplary method of the present invention for generating a counterfactual data sample using a neural network.” DELGADO [0010] teaches: “For a counterfactual generation device, the obtained explanations and measures of reliability of the class predictions from a neural network (i.e., inference module) may further be used for the control of devices, such as the vehicle control for automated driving (i.e., for controlling the one or more actuators).”) Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, NAKATA, AGUIRRE, RAGHAVAN, ANS, MAJUMDAR, and DELGADO before them, to include DELGADO’s generating prediction labels for controlling an autonomous vehicle in HASSAN, NAKATA, AGUIRRE, RAGHAVAN, ANS, and MAJUMDAR’s detection method. One would have been motivated to make such a combination in order to find meaningful (understandable and/or plausible for a human being) explanations of the class prediction and/or to identify the root causes of the classification and/or to show to a user of the method which significant and meaningful changes are necessary for the neural network to change the classification score of a given input sensor data sample (DELGADO [0014]). Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, and ANS as applied to claim 13 above, and further in view of MAJUMDAR and DERR. Regarding Claim 17: HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, and ANS teaches the elements of claim 13 as outlined above. 
HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, and ANS is not relied upon for teaching, but MAJUMDAR teaches: further comprising a controller configured to: detect whether one or more third input data samples correspond to a new distribution not already learnt by the first artificial neural network; (MAJUMDAR [0003] teaches: "Certain aspects of the present disclosure generally relate to machine learning and, more particularly, to improving systems and methods for detecting unknown classes and initializing classifiers for unknown classes." MAJUMDAR [0041] teaches: “In an aspect of the present disclosure, the instructions loaded into the general-purpose processor 102 (i.e., controller) may comprise code for generating a first classifier for first classes. In one configuration, the output has a dimension of at least two. The instructions loaded into the general-purpose processor 102 may also comprise code for designing a second classifier to receive the output of the first classifier to decide whether the input data belongs to the first classes or at least one second class.” MAJUMDAR [0038] teaches: "Furthermore, a second classifier may be specified to categorize a sample as known or unknown based on the classification scores." MAJUMDAR [0069] teaches: "Still, in some cases, such as during the test of a network, a sample (i.e., third input data samples) may not belong to any of the classes (i.e., correspond to a new distribution not already learnt) that were used to train a multi-class classifier (i.e., first artificial neural network). Thus, it may be desirable to generate a framework to initialize the multi-class classifier to label an object as belonging to a known class or belonging to an unknown class. The multi-class classifier may be referred to as the classifier." MAJUMDAR [0071] teaches: "Additionally, in some cases there may only be one class for a classifier. For example, the classifier may be specified to determine whether an object is a dog. 
In this example, when an image is presented to the classifier, the object is classified either as a dog or not a dog. Still, it may be desirable to create a framework to initialize a classifier to recognize an input as belonging to the one known class or to assign an unknown label to the input.") Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, NAKATA, AGUIRRE, RAGHAVAN, ANS, and MAJUMDAR before them, to include MAJUMDAR’s processor for detecting unknown classes using a classifier to label an object as belonging to a known or unknown class in HASSAN, NAKATA, AGUIRRE, RAGHAVAN, and ANS’ detection method. One would have been motivated to make such a combination so that when a machine learning model receives a sample during a test phase, it can be detected whether the sample belongs to a known or unknown class (MAJUMDAR [0069]). However, HASSAN in view of NAKATA, AGUIRRE, RAGHAVAN, ANS, and MAJUMDAR is not relied upon for teaching, but DERR teaches: if the one or more third input data samples correspond to the new distribution, to create a new system for learning the one or more third input data samples, the new system comprising at least a further first artificial neural network. (DERR [0186] teaches: "At block 1306, known signals are classified and unknown signals detected using RF measurements 1304 from RF measurement source 1302. Results 1308 including known signal classes and unknown signal classes are reported to users 1310. At block 1312, unknown signal characterization is performed on the detected unknown signals from block 1306 using incremental learning techniques to train new classifiers to classify unknown signals." 
Examiner's note: Paragraph [0136] of the specification states: "Furthermore, after training and during the inference phase of the ANN, it may be desirable to permit incremental learning, such that new classes of data samples can continue to be learn and handled by the system." DERR triggers the incremental learning for training new classifiers to classify unknown signals. A person having ordinary skill in the art would recognize that incremental learning is conceptually equivalent to ANS' sequential learning.) Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HASSAN, NAKATA, AGUIRRE, RAGHAVAN, ANS, MAJUMDAR, and DERR before them, to include DERR’s training of new classifiers when unknown signals (i.e., input data) are received in HASSAN, NAKATA, AGUIRRE, RAGHAVAN, ANS, and MAJUMDAR’s detection method. One would have been motivated to make such a combination in order to incrementally build learned models and deploy classifiers for improving the accuracy of a deployed monitoring system over time (DERR [0183]). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: TERJEK (US 20200372297 A1) relates to a randomly initialized generator that generates fake samples. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Alvaro S Laham Bauzo whose telephone number is (571)272-5650. The examiner can normally be reached Mon-Fri 7:30 AM - 11:00 AM | 1:00 PM - 5:30 PM ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /A.S.L./Examiner, Art Unit 2146 /USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146
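The novelty-detection pattern the rejection maps to claims 14 and 17 (MAJUMDAR's second classifier labeling a sample as known or unknown from the first classifier's scores, and DERR routing unknowns to incremental learning) can be sketched in a few lines. This is a hypothetical illustration only: the softmax-plus-threshold rule, the threshold value, and every name below are assumptions, not taken from either reference.

```python
import numpy as np

def softmax(scores):
    """Convert raw classifier scores into probabilities."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def detect_novelty(scores, threshold=0.6):
    """Second-stage check (a 'novelty detector'): flag a sample as 'unknown'
    when the primary classifier's best probability falls below a threshold."""
    return "known" if softmax(scores).max() >= threshold else "unknown"

label_peaked = detect_novelty(np.array([4.0, 0.1, 0.2]))  # confident scores
label_flat = detect_novelty(np.array([1.0, 1.1, 0.9]))    # ambiguous scores
# An incremental-learning system could route 'unknown' samples to train a
# new classifier, in the spirit of DERR's block 1312.
```

Any design of this shape, whatever the gating rule, is what the examiner reads onto "detect whether one or more third input data samples correspond to a class that is already known."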

Prosecution Timeline

Mar 01, 2023 - Application Filed
Mar 01, 2023 - Response after Non-Final Action
Jan 28, 2026 - Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12475388
MACHINE LEARNING MODEL SEARCH METHOD, RELATED APPARATUS, AND DEVICE
Granted Nov 18, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 33%
With Interview: 99% (+100.0%)
Median Time to Grant: 3y 4m
PTA Risk: Low
Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
