DETAILED ACTION
This action is responsive to the claims filed on 08/15/2025. Claims 1-27 are pending for examination.
This action is Final.
Response to Arguments
Applicant’s arguments with respect to the 35 U.S.C. 102/103 traversal of the claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-6, 10-11, 13-15, and 26-27 are rejected under 35 U.S.C. 103 as being unpatentable over Michielli et al. (Michielli, N., Acharya, U. R., & Molinari, F. (2019). Cascaded LSTM recurrent neural network for automated sleep stage classification using single-channel EEG signals. Computers in Biology and Medicine, 106, 71-81.), hereafter referred to as Michielli, in view of Kim et al. (Kim, J., El-Khamy, M., & Lee, J. (2017). Residual LSTM: Design of a deep recurrent architecture for distant speech recognition. arXiv preprint arXiv:1701.03360.), hereafter referred to as Kim.
Claim 1: Michielli teaches the following limitations:
A method performed by one or more computers, the method comprising: obtaining a network input; (Michielli, abstract, paragraph 2, “In this study, a novel cascaded recurrent neural network (RNN) architecture based on long short-term memory (LSTM) blocks, is proposed for the automated scoring of sleep stages using EEG signals derived from a single-channel. Fifty-five time and frequency-domain features were extracted from the EEG signals and fed to feature reduction algorithms to select the most relevant ones. The selected features constituted as the inputs to the LSTM networks”, features extracted from signals serve as inputs to the network.)
and generating a network output for the network input by, at each time step of a time step sequence comprising a plurality of time steps: (Michielli, page 74, col. 2, section 2.6, paragraph 2, “In the RNN architecture the input layer is a sequence layer which takes the input a sequence of vectors { … } , which contain all the features for each timestep; then the network computes a sequence of hidden activations { … } and the output vector { … } for T timesteps”, network output is generated for each timestep t.)
processing a time step input derived from the network input using a cascaded neural network to generate a candidate network output for the time step, (Michielli, page 75, col. 2, paragraph 1, “The LSTM memory cell consists of five components: the memory cell c <t > (a new variable computed for each timestep), the candidate value c˜ <t > for replacing the memory cell at each timestep and three gates defined as update gate u, forget gate f and output gate o. The memory cell is useful to remember certain values even for a long time during the training process.”, candidate values are provided to the generation of the output for a particular time step.)
wherein the cascaded neural network comprises a plurality of neural network blocks that are arranged in a stack one after another, (Michielli, page 76, col. 1, paragraph 1, “In this study, we employed a cascade of two RNNs with LSTM units. The first network took the input the features selected by mRMR algorithm and performed 4-class (W, N1-REM, N2 and N3) classification (the N1 and REM epochs were merged into a single class), while the second network used the input new features computed by PCA of the correctly classified N1-REM epochs by the first RNN and classified these epochs into two classes (N1 and REM).”, the cascaded neural network comprises a plurality of neural networks arranged in a cascade (a stack one after another).)
and wherein each of the plurality of neural network blocks is configured to, for each particular time step of a plurality of particular time steps in the time step sequence: receive a block input for the neural network block for the particular time step; (Michielli, page 74, col. 2, section 2.6, paragraph 2, “In the RNN architecture the input layer is a sequence layer which takes the input a sequence of vectors { … } , which contain all the features for each timestep;”, each of the RNN blocks in the cascade are provided an input layer for each timestep.)
apply a learned block transformation to the block input for the particular time step to generate a transformed block input for the particular time step; (Michielli, page 72, section 2, paragraph 1, “Both RNNs shared the first three steps: data acquisition, signal pre-processing and feature extraction from single-channel EEG signals. Subsequently, a feature selection or feature transformation method was adopted to reduce the number of input features for neural network.”, a feature transformation is applied to the input to generate a reduced input feature set (a transformed block input) for a particular time step.)
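For context only (not part of the record), the two-stage cascade that Michielli describes — a first network performing 4-class scoring, with the merged N1-REM class re-classified by a second network operating on a reduced feature set — can be sketched as follows. The classifier internals are stubbed out and all names are illustrative, not taken from the reference:

```python
import numpy as np

def cascade_predict(features, stage1, stage2, reduce2):
    """Two-stage cascade sketch: stage1 gives a 4-class label; epochs
    labeled with the merged "N1-REM" class are re-classified by stage2
    on a reduced feature set (illustrative names only)."""
    labels = []
    for f in features:
        y = stage1(f)                 # 4-class: W, N1-REM, N2, N3
        if y == "N1-REM":
            y = stage2(reduce2(f))    # 2-class refinement: N1 vs. REM
        labels.append(y)
    return labels
```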
Kim, in the same field of neural network implementation, teaches the following limitations which Michielli fails to teach:
[media_image1.png, Figure 1 of Kim, Residual LSTM: A shortcut from a prior output layer h_t^(l−1) is added to a projection output m_t^l. W_h^l is a dimension-matching matrix between input and output. If K is equal to M, it is replaced with an identity matrix.]
and generate a block output for the particular time step, comprising applying a skip connection for the neural network block to at least (i) the block input for the particular time step (Kim, page 2, section 3, “a shortcut path is added to an LSTM output layer h_t^l … h_t^l = o_t^l * m_t^l + W_h^l * x_t^l; where W_h^l can be replaced by an identity matrix if the dimension of x_t^l matches that of h_t^l.”, the shortcut path is the skip connection at the block output that adds the block input x_t^l (via W_h^l or the identity matrix) to the block’s internal/“delay path” output m_t^l.)
and (ii) an output of a delay component within the neural network block that operates on respective transformed block inputs that have each been generated by the neural network block by applying the learned block transformation to a respective block input for each of one or more preceding time steps that precede the current time step in the time step sequence. (Kim, page 2, section 2.3, “c_t^l = f_t^l * c_(t−1)^l + i_t^l * tanh(W_x^l * x_t^l + W_hc^l * h_(t−1)^l + b_c^l) … h_t^(l−1) is an input from (l−1)th layer … h_(t−1)^l is an lth output layer at time t−1 and c_(t−1)^l is an internal cell state at t−1.”, this expressly shows the delay component operating on outputs from the immediately preceding time step (t−1); unrolled over time, this captures one or more preceding steps.
Kim, page 2, section 3, “Equations (4), (5), (6) and (7) do not change for residual LSTM. The updated equations are as follows: r_t^l = tanh(c_t^l); m_t^l = W_p^l * r_t^l; h_t^l = o_t^l * m_t^l + W_h^l * x_t^l; where W_h^l can be replaced by an identity matrix if the dimension of x_t^l matches that of h_t^l.”, in Residual LSTM, the same LSTM delay over t−1 remains in place, and the skip is applied at the block output by summing the delay path output m_t^l with the block input x_t^l.)
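For illustration only (not part of the record), the cited Residual LSTM equations — a gated cell state carrying the previous time step forward as the delay component, with the shortcut adding the block input at the output — can be sketched in NumPy. Parameter names are hypothetical, an identity skip is assumed (matching dimensions), and this is not the authors’ code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def residual_lstm_step(x, h_prev, c_prev, p):
    """One residual-LSTM step in the style of Kim's cited equations.
    Parameter dict keys (Wxf, Whf, ..., Wp, bf, ...) are illustrative."""
    # Gates (Kim section 2.3): forget f, input i, output o, candidate g
    f = sigmoid(p["Wxf"] @ x + p["Whf"] @ h_prev + p["bf"])
    i = sigmoid(p["Wxi"] @ x + p["Whi"] @ h_prev + p["bi"])
    o = sigmoid(p["Wxo"] @ x + p["Who"] @ h_prev + p["bo"])
    g = np.tanh(p["Wxc"] @ x + p["Whc"] @ h_prev + p["bc"])
    # Delay component: cell state mixes in the previous step's state
    c = f * c_prev + i * g
    # Residual output path (Kim section 3): r = tanh(c), m = Wp r,
    # then the shortcut adds the block input x (identity W_h assumed)
    r = np.tanh(c)
    m = p["Wp"] @ r
    h = o * m + x          # skip connection at the block output
    return h, c
```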
Claim 2: Michielli in view of Kim teaches the limitations of claim 1. Michielli further teaches:
The method of claim 1, wherein the network output is the candidate network output for the last time step. (Michielli, page 76, col. 1, paragraph 2, “Both RNNs proposed in this study presented the same structure: the input layer was a sequence layer with 30 timesteps; the LSTM layers were used to learn the features from EEG signals; the fully connected (FC) layer was used to convert the output size of the previous layers into the number of sleep stages to recognize”, the input sequence consists of 30 time steps. Naturally, the final output is produced by the fully connected layer after the last time step has been processed by the LSTM layers.)
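For illustration only, the many-to-one readout discussed above — unrolling the recurrence over the time-step sequence and taking the candidate output produced at the last time step as the network output — can be sketched as follows (step_fn is a hypothetical stand-in for one recurrent block step):

```python
def run_sequence(step_fn, inputs, state0):
    """Unroll a recurrent step over the time-step sequence and keep
    only the candidate output from the final time step."""
    state = state0
    out = None
    for x in inputs:
        out, state = step_fn(x, state)
    return out
```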
Claim 3: Michielli in view of Kim teaches the limitations of claim 1. Kim further teaches:
The method of claim 1, wherein generating the block output comprises applying the skip connection for the neural network block to (i) the block input for the particular time step (Kim, page 2, section 3, “a shortcut path is added to an LSTM output layer h_t^l … h_t^l = o_t^l * m_t^l + W_h^l * x_t^l; where W_h^l can be replaced by an identity matrix if the dimension of x_t^l matches that of h_t^l.”, the shortcut path is the skip connection at the block output that adds the block input x_t^l (via W_h^l or the identity matrix) to the block’s internal/“delay path” output m_t^l.)
and (ii) the output of the delay component within the neural network block that operates on only the respective transformed block input generated by the neural network block by applying the learned block transformation to the respective block input for the immediately preceding time step that immediately precedes the particular time step in the time step sequence. (Kim, page 2, section 2.3, “c_t^l = f_t^l * c_(t−1)^l + i_t^l * tanh(W_x^l * x_t^l + W_hc^l * h_(t−1)^l + b_c^l) … h_t^(l−1) is an input from (l−1)th layer … h_(t−1)^l is an lth output layer at time t−1 and c_(t−1)^l is an internal cell state at t−1.”, this expressly shows the delay component operating on only the output from the immediately preceding time step (t−1).
Kim, page 2, section 3, “Equations (4), (5), (6) and (7) do not change for residual LSTM. The updated equations are as follows: r_t^l = tanh(c_t^l); m_t^l = W_p^l * r_t^l; h_t^l = o_t^l * m_t^l + W_h^l * x_t^l; where W_h^l can be replaced by an identity matrix if the dimension of x_t^l matches that of h_t^l.”, in Residual LSTM, the same LSTM delay over t−1 remains in place, and the skip is applied at the block output by summing the delay path output m_t^l with the block input x_t^l.)
Claim 4: Michielli in view of Kim teaches the limitations of claim 3. Michielli further teaches:
The method of claim 3, wherein generating the block output comprises: computing a sum of (i) the block input for the particular time step and (ii) the output of the delay component within the neural network block that operates on the respective transformed block input generated by the neural network block by applying the learned block transformation to the respective block input for the immediately preceding time step that immediately precedes the particular time step in the time step sequence. (Michielli, page 74, section 2.6, paragraph 2, “The activation and the output prediction at time t are expressed as:”
[media_image2.png: Michielli’s activation and output-prediction equations, computing the activation at time t from the current input x^<t> and the previous activation a^<t−1>.]
The current input (i) x_t is explicitly included in the block output computation and further is computed in a sum in combination with (ii) the transformed block input for the immediately preceding timestep a^<t−1>, the delayed component, as shown in the formulas above.)
Claim 5: Michielli in view of Kim teaches the limitations of claim 4. Michielli further teaches:
The method of claim 4, wherein generating the block output further comprises: applying a non-linearity to the sum. (Michielli, page 75, col. 2, paragraph 2, “Finally, the output gate is the section where the activation at the current timestep is generated and can be defined as:”
[media_image3.png: Michielli’s output-gate equation for the activation at the current timestep.]
“In the previous expressions, σ represents the sigmoid function.”, the activation function σ, which is applied to the outputted sum, is a sigmoid function and is inherently non-linear.)
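For illustration only, the sum-then-nonlinearity pattern addressed by claims 4 and 5 — transforming the current block input, adding the delayed activation from the preceding time step, and applying a non-linearity to the sum — can be sketched as a single recurrent step (weight names Waa, Wax, ba are illustrative; tanh stands in for the cited non-linear activation):

```python
import numpy as np

def rnn_step(x_t, a_prev, Waa, Wax, ba):
    """One recurrent step: non-linearity applied to the sum of
    (i) the transformed current block input and
    (ii) the delayed activation from the preceding time step."""
    s = Wax @ x_t + Waa @ a_prev + ba   # the sum of claim 4
    return np.tanh(s)                   # the non-linearity of claim 5
```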
Claim 6: Michielli in view of Kim teaches the limitations of claim 1. Kim further teaches:
The method of claim 1, wherein generating the block output comprises applying the skip connection for the neural network block to (i) the block input for the particular time step, (Kim, page 2, section 3, “a shortcut path is added to an LSTM output layer h_t^l … h_t^l = o_t^l * m_t^l + W_h^l * x_t^l; where W_h^l can be replaced by an identity matrix if the dimension of x_t^l matches that of h_t^l.”, the shortcut path is the skip connection at the block output that adds the block input x_t^l (via W_h^l or the identity matrix) to the block’s internal/“delay path” output m_t^l.)
(ii) the output of the delay component within the neural network block that operates on the respective transformed block input for the particular time step and (Kim, page 2, section 2.3, “c_t^l = f_t^l * c_(t−1)^l + i_t^l * tanh(W_x^l * x_t^l