DUAL RECURRENT NEURAL NETWORK ARCHITECTURE FOR MODELING LONG-TERM DEPENDENCIES IN SEQUENTIAL DATA

Non-Final OA §103
Filed
Sep 24, 2019
Examiner
KWON, JUN
Art Unit
2127
Tech Center
2100 — Computer Architecture & Software
Assignee
Nvidia Corporation
OA Round
5 (Non-Final)
This examiner grants 39% of cases after interview

— +47.0% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 71 resolved cases, 2023–2026
Examiner Intelligence

KWON, JUN View full profile →
Grants only 39% of cases
Career Allowance Rate
28 granted / 71 resolved
-15.6% vs TC avg
Strong +47% interview lift
Without
With
+47.0%
Interview Lift
resolved cases with interview
Typical timeline
4y 7m
Avg Prosecution
20 currently pending
Career history
104
Total Applications
across all art units
Statute-Specific Performance

§101
3.5%
-36.5% vs TC avg
§103
88.8%
+48.8% vs TC avg
§102
6.5%
-33.5% vs TC avg
§112
0.6%
-39.4% vs TC avg
Black line = Tech Center average estimate • Based on career data from 71 resolved cases
Office Action

§103
Detailed Action
	This Office Action is in response to the remarks entered on 10/09/2025. Claims 1-7, 9, 11-22, 24, 26-36 are currently pending. 

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Claim Objections
Claim 1 is objected to because of the following informalities: “indicates separate a weight” in line 15 should read “indicates a separate weight”.  Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 4, 6-7, 9, 11-12, 17-22, 24, 26-27, and 32-36 are rejected under 35 U.S.C. 103 as being unpatentable over Ortiz et al. (Ortiz et al, “Learning State Representations for Query Optimization with Deep Reinforcement Learning”, 2018, hereinafter ‘Ortiz’) in view of Cheng et al. (US 20190171913 A1, hereinafter ‘Cheng’).

Regarding claim 1, Ortiz teaches: 
A method, comprising: identifying a set of hidden states associated with an input sequence, the input sequence being a time-based sequence of data and the hidden states in the set of hidden states corresponding to different time steps in the time-based sequence of data; ([Ortiz, Figure 3; page 3, left column, line 5-23] The paragraph discloses the role of the NNinit which corresponds to the history recurrent neural network.
[Ortiz, page 1, right col, 3rd para, line 17-19, Figure 1] “NNST is a recursive function that takes as input a previous subquery representation as well as an action at time t , to produce the subquery representation for time t + 1.” According to the Figure 1, the query operations are time-based, as the Action t and the Action t+1 which are query operations at time t and t+1 and it is also sequential as the query actions are provided after the previous query actions were provided to the neural networks.
[Ortiz, page 3, 3.2 Preliminary Results, second paragraph, line 6-8] “NNinit contains 50 hidden nodes in the hidden layer. We update the model via stochastic gradient descent with a loss based on relative error and a learning rate of .01.” The paragraph teaches NNinit contains 50 hidden states.) 
processing, by a history recurrent neural network, the set of hidden states to learn a cell state transition function associated with the input sequence, the cell state transition function being a function by which a cell state and hidden states of each node of the history recurrent neural network is changed, ([Ortiz, Figure 3; page 3, left column, line 5-23] The paragraph discloses the role of the NNinit which corresponds to the history recurrent neural network. [Ortiz, page 3, 3.2 Preliminary Results, second paragraph, line 6-8] The paragraph teaches NNinit contains 50 hidden states. [Ortiz, page 2, right column, 3.1 Approach, line 14-18] Both NNinit and the NNObserved processes the set of hidden states and corresponds to the history recurrent neural network)
inputting the input sequence and the cell state transition function to an update recurrent neural network ([Ortiz, page 3, right column, line 9-21; Figure 3] The paragraph and the figure disclose training the NNST with hidden states. The h2 is the cell state transition, and the a1 is the input sequence. The NNST corresponds to the update neural network.);
for each input of the input sequence, updating, by an update recurrent neural network, a current cell state and corresponding hidden states in the update recurrent neural network, wherein the update recurrent neural network uses the cell state transition function to update the current cell state and corresponding hidden states for the input of the input sequence. ([Ortiz, page 3, left column, line 5-23; Figure 3] As shown in the figure, we then include the recursive model, NNST , that takes (ht , at ) as input and predicts the observed variables of the subqueries as well as the representation, ht+1 of the new subquery. We combine these models to train them together. During training, the weights are adjusted based on the combined loss from observed variable predictions.
[Ortiz, page 2, left col, line 20-23] The observed variables, which are expressed as NNObserved in Figure 2, are current cell state. The current cell state is mapped to the hidden state of NNST.
[Ortiz, page 3, right column, line 9-21] The paragraph and the figure disclose training the NNST with hidden states. The NNST corresponds to the update neural network.
[Ortiz, page 4, left column, the 5th paragraph] “Initially, all state-action pairs are random values. At each timestep, the agent selects an action and observes the reward, rt+1 at state st+1. As the agent explores, these state-action pairs will converge to represent the expected reward of the states in future timesteps. At each state transition, each QL(s, a) is updated as follows: QL(st , at ) ←QL(st , at ) + α[rt+1 +γmaxa′QL(st+1, a′) − QL(st , at )] Where themaxa′QL(st+1, a′) represents the maximum value from st+1 given the target policy. We compute the subsequent state given the state transition function, NNST .”)
Ortiz does not specifically disclose: 
wherein the history recurrent neural network includes an attention mechanism that computes attention information for each time step in the time-based sequence of data by individually computing a relationship between a last hidden state corresponding to a last time step in the time-based sequence of data and each earlier hidden state corresponding to each earlier time step in the time-based sequence of data, such that the attention information computed for the time step indicates separate a weight for each earlier hidden state corresponding to each earlier time step in the time-based sequence of data,
wherein the attention information propagates through recurrent connections, and 
wherein output of the attention mechanism is used to learn the cell state transition function.
Cheng teaches: 
wherein the history recurrent neural network includes an attention mechanism that computes attention information for each time step in the time-based sequence of data by individually computing a relationship between a last hidden state corresponding to a last time step in the time-based sequence of data and each earlier hidden state corresponding to each earlier time step in the time-based sequence of data, such that the attention information computed for the time step indicates separate a weight for each earlier hidden state corresponding to each earlier time step in the time-based sequence of data, ([Cheng, 0043] “For each position in the output sequence 48, the attention module 84 configures the decoder recurrent neural network 82 to generate an attention vector ( or attention layer) over the encoder hidden states 46 based on the current output (i.e., the output predicted in the preceding time step) and the encoder hidden states”, [Cheng, 0044] “the hierarchical classification system 80 generates an attention vector of respective scores for the encoder hidden states” and [Cheng, 0045] collectively disclose that the attention vector has a length equal to the number of time steps on the encoder side, which indicates that the attention is calculated for respective scores (i.e., the separate weight for each hidden state) for each encoder hidden state, and is derived by comparing the current decoder hidden state (i.e., last hidden state corresponding to a last time step in the time-based sequence of data) with the encoder hidden state (i.e., each earlier hidden state corresponding to each earlier time step in the time-based sequence of data) )
wherein the attention information propagates through recurrent connections ([Cheng, 0043] discloses that the attention vector is generated over the encoder hidden states 46 based on the current output and the encoder hidden states and the attention module configures the decoder recurrent neural network 82. The decoder RNN processes the attention vector), and 
wherein output of the attention mechanism is used to learn the cell state transition function; ([Cheng, 0027] and [Cheng, 0047] collectively disclose that the training process of the encoder recurrent neural network and the decoder recurrent neural network which involves the utilization of the attention module that generates attention vectors and generates predictive scores. Additionally, in accordance with its training, the RNN updates the hidden states (cell state transition functions). ) 
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Ortiz and Cheng to use the method of including an attention mechanism that computes attention information of Cheng to implement the prediction method of Ortiz. The suggestion and/or motivation to do so is to improve the prediction accuracy of the recurrent neural network, as the attention mechanism permit the network to utilize the most relevant parts of the input sequence in a flexible manner, by a weighted combination of all the encoded input vectors, with the most relevant vectors being attributed the highest weights.

Claim 19 is a system claim having similar limitation to the method claim 1. Therefore, it is rejected with the same rationale as claim 1.

Regarding claim 34, Ortiz teaches a non-transitory computer-readable media storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform a method ([Ortiz, page 1, Introduction, 2nd paragraph] “Recently, thanks to dropping hardware costs and growing datasets available for training, deep learning has successfully been applied to solving computationally intensive learning tasks in other domains. The advantage of these type of models comes from their ability to learn unique patterns and features of the data that are difficult to manually find or design [3].” Ortiz discloses the machine learning process, which runs in computer with at least one processor that runs the program. A computer-readable media storing computer instruction executed by processors is inherent feature of machine learning systems.). Claim 34 is a non-transitory computer-readable media claim having similar limitation to the method claim 1. Therefore, it is rejected with the same rationale as claim 1.

Regarding claim 4, Ortiz in view of Cheng teaches: 
wherein the history recurrent neural network and the update recurrent neural network are long short-term memory (LSTM) networks ([Cheng, 0026] discloses that the encoder recurrent neural network 42 and the decoder recurrent neural network 44 includes Long Short-Term Memory (LSTM) neural network or GRU networks. The history RNN and the updated RNN are taught by Ortiz).

Regarding claim 6, Ortiz in view of Cheng teaches: 
wherein the history recurrent neural network and the update recurrent neural network are gated recurrent unit (GRU) networks ([Cheng, 0026] discloses that the encoder recurrent neural network 42 and the decoder recurrent neural network 44 includes Long Short-Term Memory (LSTM) neural network or GRU networks. The history RNN and the updated RNN are taught by Ortiz).

Regarding claim 7, Ortiz teaches: 
wherein the set of hidden states associated with the input sequence includes all hidden states associated with the input sequence ([Ortiz, page 3, second paragraph – third paragraph; Figure 3] The input x0 goes through a set of hidden states h1 and h2, which are the all hidden states. [Ortiz, page 3, 3.2 Preliminary Results, second paragraph] This paragraph teaches the NNinit contains 50 hidden states.).

Claim 22 is a system claim having similar limitation to the method claim 7. Therefore, it is rejected under the same rationale as claim 7. 

Regarding claim 9, Ortiz in view of Cheng teaches: 
wherein the history recurrent neural network applies the attention mechanism to the set of hidden states associated with the input sequence ([Cheng, 0043] “For each position in the output sequence 48, the attention module 84 configures the decoder recurrent neural network 82 to generate an attention vector ( or attention layer) over the encoder hidden states 46 based on the current output (i.e., the output predicted in the preceding time step) and the encoder hidden states”, [Cheng, 0044] and [Cheng, 0045] collectively disclose that the attention vector has a length equal to the number of time steps on the encoder side, which indicates that the attention score is calculated for each encoder hidden state, and is derived by comparing the current decoder hidden state (i.e., last hidden state corresponding to a last time step in the time-based sequence of data) with the encoder hidden state (i.e., each earlier hidden state corresponding to each earlier time step in the time-based sequence of data). The decoder is interpreted as the history recurrent neural network).

Claim 24 is a system claim having similar limitation to the method claim 9. Therefore, it is rejected under the same rationale as claim 9. 

Regarding claim 11, Ortiz teaches: 
wherein a loss function is utilized to train the history recurrent neural network and the update recurrent neural network ([Ortiz, page 3, left column, 2nd paragraph – 3rd paragraph] “Before using the recursive NNST model, we must learn an additional function, NNinit , as shown in Figure 3. NNinit takes as input (x0,a0), where x0 is a vector that captures the properties of the database D and a0 is a single relational operator. The model outputs the cardinality of the subquery that executes the operation encoded in a0 on D. We define the vector, x0 to represent simple properties of the database, D. The list of properties we provide next is not definitive and more features can certainly be added. Currently, for each attribute in the dataset D, we use the following features to define x0: the min value, the max value, the number of distinct values, and a representation of a 1-D equi-width histogram. As shown in the figure, we then include the recursive model, NNST , that takes (ht , at ) as input and predicts the observed variables of the subqueries as well as the representation, ht+1 of the new subquery. We combine these models to train them together. During training, the weights are adjusted based on the combined loss from observed variable predictions. We want to learn an h1 representation that captures not only enough information to predict the cardinality of that subquery but of other subqueries built by extending it.”).

Claim 26 is a system claim having similar limitation to the method claim 11. Therefore, it is rejected under the same rationale as claim 11. 

Regarding claim 12, Ortiz teaches: 
wherein a perceptual loss is further utilized to train the history recurrent neural network and the update recurrent neural network ([Ortiz, page 3, left column, 3.2 Preliminary Results, second paragraph] “Training NNinit : As a first experiment, we initialize x0 with properties of the IMDB dataset and train NNinit to learn h1. a0 represents a conjunctive selection operation overm attributes from the aka_title relation. We generate 20k unique queries, where 15k are used for training the model and the rest are used for testing. NNinit contains 50 hidden nodes in the hidden layer. We update the model via stochastic gradient descent with a loss based on relative error and a learning rate of .01.” Corresponds to the process of using loss in history RNN.
[Ortiz, page 3, left column, 3rd paragraph] “As shown in the figure, we then include the recursive model, NNST , that takes (ht , at ) as input and predicts the observed variables of the subqueries as well as the representation, ht+1 of the new subquery. We combine these models to train them together. During training, the weights are adjusted based on the combined loss from observed variable predictions.” Corresponds to the process of using loss in updated RNN.).

Claim 27 is a system claim having similar limitation to the method claim 12. Therefore, it is rejected under the same rationale as claim 12. 

Regarding claim 17, Ortiz teaches: 
wherein the history recurrent neural network and the update recurrent neural network form a dual recurrent neural network architecture modeling long-term dependencies in sequential data represented by the input sequence ([Ortiz, page 3, left column, 2nd paragraph – 3rd paragraph] “Before using the recursive NNST model, we must learn an additional function, NNinit , as shown in Figure 3. NNinit takes as input (x0,a0), where x0 is a vector that captures the properties of the database D and a0 is a single relational operator. The model outputs the cardinality of the subquery that executes the operation encoded in a0 on D. We define the vector, x0 to represent simple properties of the database, D. The list of properties we provide next is not definitive and more features can certainly be added. Currently, for each attribute in the dataset D, we use the following features to define x0: the min value, the max value, the number of distinct values, and a representation of a 1-D equi-width histogram. As shown in the figure, we then include the recursive model, NNST , that takes (ht , at ) as input and predicts the observed variables of the subqueries as well as the representation, ht+1 of the new subquery. We combine these models to train them together. During training, the weights are adjusted based on the combined loss from observed variable predictions. We want to learn an h1 representation that captures not only enough information to predict the cardinality of that subquery but of other subqueries built by extending it.”).

Claim 32 is a system claim having similar limitation to the method claim 17. Therefore, it is rejected under the same rationale as claim 17. 

Regarding claim 18, Ortiz teaches: 
further comprising using the dual recurrent neural network architecture to predict long-term future data from the input sequence ([Ortiz, page 4, left column, 5th paragraph] “Initially, all state-action pairs are random values. At each timestep, the agent selects an action and observes the reward, rt+1 at state st+1. As the agent explores, these state-action pairs will converge to represent the expected reward of the states in future timesteps. At each state transition, each QL(s, a) is updated as follows: QL(st , at ) ←QL(st , at ) + α[rt+1 +γmaxa′QL(st+1, a′) − QL(st , at )] Where the maxa′QL(st+1, a′) represents the maximum value from st+1 given the target policy. We compute the subsequent state given the state transition function, NNST .”).

Claim 33 is a system claim having similar limitation to the method claim 18. Therefore, it is rejected under the same rationale as claim 18. 

Regarding claim 21, Ortiz in view of Cheng teaches: 
wherein the history recurrent neural network and the update recurrent neural network are: long short-term memory (LSTM) networks, convolutional long short-term memory (ConvLSTM) networks, or gated recurrent unit (GRU) networks ([Cheng, 0026] discloses that the encoder recurrent neural network 42 and the decoder recurrent neural network 44 includes Long Short-Term Memory (LSTM) neural network or GRU networks).

Claims 2, 5, 20, and 35-36 are rejected under 35 U.S.C. 103 as being unpatentable over Ortiz in view of Cheng and further in view of Jain et al. (US 20170262996 A1, hereinafter ‘Jain’).

Regarding claim 2, Ortiz in view of Cheng teaches: 
The method of claim 1. 
However, Ortiz in view of Cheng does not specifically teach wherein the input sequence is a sequence of frames of video.
Jain teaches wherein the input sequence is a sequence of frames of video ([Jain, 0087] “Based on the training, as each frame is received, the attention recurrent neural network outputs a classification score for a certain action and an attention feature map 704 for each frame. As shown in FIG. 7, multiple attention feature maps 704 are generated from a frame sequence. In one configuration, the attention recurrent neural network generates a classification score for an action class in each frame based on the training.”).
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Ortiz, Cheng and Jain to use the method of inputting a sequence of frames of video of Jain to implement the prediction method of Ortiz. The suggestion and/or motivation to do so is to improve the usability of the recurrent neural network system by enabling the RNN to process more diverse types of input data.

Claim 20 is a system claim having similar limitation to the method claim 2. Therefore, it is rejected under the same rationale as claim 2. 

Regarding claim 5, Ortiz in view of Cheng teaches:
The method of claim 1.
Ortiz in view of Cheng does not specifically disclose: 
wherein the history recurrent neural network and the update recurrent neural network are convolutional long short-term memory (ConvLSTM) networks.
Jain teaches: 
wherein the history recurrent neural network and the update recurrent neural network are convolutional long short-term memory (ConvLSTM) networks ([Jain, Fig. 6] discloses that both inference network and prediction network are LSTM. [Jain, 0063] The LSTM of the Jain reference can be a combination of convolution layers and LSTM, which corresponds to convLSTM).
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Ortiz and Cheng to use the ConvLSTM networks of Jain to implement the prediction method of Ortiz. The suggestion and/or motivation to do so is to improve the efficiency of the method, as ConvLSTM specialize in processing sequential input and video frames are sequential.

Regarding claim 35, Ortiz in view of Cheng and further in view of Jain teaches: 
wherein the input sequence is a sequence of video frames, and wherein the attention information is computed using the hidden states such that the attention information computed for a given video frame is based a spatio-temporal context of each prior frame in addition to pixel-level information in the given frame. (As disclosed in [Jain, 0060], hidden states are used to compute attention information. [Jain, 0067] The hidden unit r19 also receives an motion input f, produced from optical flow. The optical flow input may be a field of vectors that predict how pixels at frame (e.g., frame t) will move to pixels at a next frame (e.g., frame t + 1) … That is, the optical flow tracks action in the pixels and considers motion to predict the most salient features within the video. This paragraph teaches that the frame is based on a spatio-temporal context, which the context belongs to both space and time)

Regarding claim 36, Ortiz in view of Cheng and further in view of Jain teaches: 
wherein connections across temporal and spatial dimensions are utilized, such that each processing layer of the history recurrent neural network covers an entire context of the input sequence. (As disclosed in [Jain, 0060], hidden states are used to compute attention information. [Jain, 0067] The hidden unit r19 also receives an motion input f, produced from optical flow. The optical flow input may be a field of vectors that predict how pixels at frame (e.g., frame t) will move to pixels at a next frame (e.g., frame t + 1) … That is, the optical flow tracks action in the pixels and considers motion to predict the most salient features within the video. This paragraph teaches that the frame is based on a spatio-temporal context, which the context belongs to both space and time)

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Ortiz in view of Cheng and further in view of Li et al. (US 20170293836 A1, hereinafter ‘Li’).

Regarding claim 3, Ortiz in view of Cheng teaches the method of claim 1. 
Ortiz in view of Cheng does not specifically disclose: 
wherein the input sequence is a sequence of speech.
Li teaches wherein the input sequence is a sequence of speech ([Li, 0036] “Each layer can have multiple neuron-like units or nodes (hereinafter “nodes”), with each of the nodes connected to the other nodes. The input layer 210 includes input nodes, the output layer 240 includes output nodes, while the recurrent layer 220 and the aggregate layer 230 include hidden nodes. As an example to which RNN 200 can be applied, in the case of speech where a person utters a spoken digit, the input sequence is the speech signal corresponding to the spoken digit or a representation thereof, which can be unlabeled, while the output can be a label classifying the spoken digit.”).
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Ortiz, Cheng, and Li to use the method of input sequence is a sequence of speech of Li to implement the prediction method of Ortiz. The suggestion and/or motivation to do make prediction of speech. Speech prediction requires the input speech data to make prediction.

Claim 13-14 and 28-29 are rejected under 35 U.S.C. 103 as being unpatentable over Ortiz in view of Cheng and further in view of Kaplanyan et al. (US 20180204314 A1, hereinafter ‘Kaplanyan’).

Regarding claim 13, Ortiz teaches: 
The method of claim 1. 
Ortiz in view of Cheng does not specifically disclose: 
wherein a skip connection is utilized between previous and current recurrent layers.
Kaplanyan teaches wherein a skip connection is utilized between previous and current recurrent layers ([Kaplanyan, 0113; Figure 9] “FIG. 9 illustrates an exemplary internal structure 900 of a recurrent RCNN connection, according to one embodiment. As shown, a first plurality of convolutions 902A-C receives a first input 904, and a second plurality of convolutions 902D-F receives a second input 910. A feedback loop 906 provides a hidden, recurrent state 908 from the first plurality of convolutions 902A-C as input to a second plurality of convolutions 902E-F. In this way, information may be retained between inputs of the recurrent RCNN.” ).
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Ortiz, Cheng, and Kaplanyan to use the method of using skip connection of Kaplanyan to implement the prediction method of Ortiz. The suggestion and/or motivation to do so is to improve the accuracy of the prediction method, as using skip connection between layers enable the network to skip some of the hidden layers which may add more errors to the result.

Claim 28 is a system claim having similar limitation to the method claim 13. Therefore, it is rejected under the same rationale as claim 13. 

Regarding claim 14, Ortiz in view of Cheng teaches: 
The method of claim 13.
Ortiz in view of Cheng does not specifically disclose: 
wherein the skip connection concatenates output of the previous and current recurrent layers.
Kaplanyan teaches:
wherein the skip connection concatenates output of the previous and current recurrent layers ([Kaplanyan, 0113; Figure 9] “FIG. 9 illustrates an exemplary internal structure 900 of a recurrent RCNN connection, according to one embodiment. As shown, a first plurality of convolutions 902A-C receives a first input 904, and a second plurality of convolutions 902D-F receives a second input 910. A feedback loop 906 provides a hidden, recurrent state 908 from the first plurality of convolutions 902A-C as input to a second plurality of convolutions 902E-F. In this way, information may be retained between inputs of the recurrent RCNN.”).

Claim 29 is a system claim having similar limitation to the method claim 14. Therefore, it is rejected under the same rationale as claim 14. 

Claim 15-16 and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Ortiz in view of Cheng and further in view of Kearney et al. (US 20200364624 A1, hereinafter ‘Kearney’).

Regarding claim 15, Ortiz in view of Cheng teaches: 
The method of claim 1. 
Ortiz in view of Cheng does not specifically disclose: 
wherein a gated skip connection is utilized across layers.
Kearney teaches: 
wherein a gated skip connection is utilized across layers ([Kearney, 0153; Fig. 9] As shown in the Figure 9 and the paragraph 0153, the gated connection connects different layers.).
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Ortiz, Cheng, and Kearney to use the method of using gated skip connection of Kearney to implement the prediction method of Ortiz. The suggestion and/or motivation to do so is to improve the accuracy of the prediction method, as using gated skip connection enable the network to select which output from layers to pass.

Claim 30 is a system claim having similar limitation to the method claim 15. Therefore, it is rejected under the same rationale as claim 15. 

Regarding claim 16, Ortiz in view of Cheng and further in view of Kearney teaches: 
wherein the gated skip connection is a multiplicative gate added to control a flow of information across layers ([Kearney, 0153; Fig. 9] Selective propagation process corresponds to the multiplication process, as passing only the selected element can be interpreted as multiplying zero to the non-selected multiplicand.).

Claim 31 is a system claim having similar limitation to the method claim 16. Therefore, it is rejected under the same rationale as claim 16. 

Response to Arguments
Response to arguments under 35 U.S.C. 103
Arguments: Applicant asserts that the Jain reference does not teach or suggest at least applicant’s recited claim limitation that “the history recurrent neural network includes an attention mechanism that computes attention information including, for each time step in the time-based sequence of data, a relationship between a last hidden state and each earlier hidden state to indicate a weight for each earlier hidden state,” as Jain predicts an attention map based on a given input frame, using only the immediate previous hidden state (ht-1) and the current feature map (of the input frame).
Examiner’s Response: Applicant’s arguments with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached Monday – Friday 7:30AM – 4:30PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JUN KWON/Examiner, Art Unit 2127                                                                                                                                                                                         
/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127
Read full office action