Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. JP2023-014633, filed on 02/02/2023.

Status of Claims

Claims 1–12 are pending and examined herein. Claims 3–5 and 7 are rejected under 35 U.S.C. 112(b). Claims 1–12 are rejected under 35 U.S.C. 101. Claims 1–12 are rejected under 35 U.S.C. 103.

Specification

The disclosure is objected to because of the following informalities: paragraphs [0038] and [0048] refer to reference number 41 as “sequence data,” while Fig. 4 refers to 41 as “time series data.” These can refer to different types of data depending on the interpretation of “sequence data.” Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 3–5 and 7 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 3 recites the limitation “having one or more remaining lengths among the N sequences.” The limitation suggests that a single sequence can have multiple remaining lengths. It is unclear how multiple remaining lengths are possible, and the specification neither explains how this is possible nor shows an example of using multiple remaining lengths from a single sequence. For examination purposes, the limitation will be interpreted as referring to one remaining length once the sequence is selected.

Claim 4 recites the limitations “a selectivity among the M sequences” and “selects remaining second sequences.” It is unclear how the selectivity is performed, or whether it is a parameter, a probability, or some metric used to obtain certain sequences from the M sequences. It is also unclear whether the “second sequences” are all of the remaining sequences, a portion of them, or a single sequence from those remaining after the first selection. These limitations therefore render the claim indefinite. For examination purposes, “selectivity” will be interpreted as a simple selection of sequences, and “second sequences” as any selection from the remaining sequences.

The term “small difference” in claim 5 and the term “large amount” in claim 4 are relative terms which render the claims indefinite. The terms “small” and “large amount” are not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. In claim 5, the “difference” between sequences is rendered indefinite by the relative term “small,” and in claim 4, the “remaining lengths” of sequences are rendered indefinite by the relative term “large amount.”
For examination purposes, the terms “small difference” and “large amount” will be treated as the bare minimum difference that makes sequences not identical, i.e., any nonzero difference.

Claim 7 recites the limitation “a case of selecting data of the M sequences for each of the M sequences.” As written, the limitation recites selecting data of the M sequences from each of the M sequences, which does not make sense grammatically.

The claims are generally narrative and indefinite, failing to conform with current U.S. practice. They appear to be a literal translation into English from a foreign document and are replete with grammatical and idiomatic errors.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1–12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1–12, in accordance with these steps, follows.

Step 1 Analysis: Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter). Claims 1–10 are directed to an RNN training apparatus, meaning that they are directed to the statutory category of machine. Claim 11 is directed to an RNN training method, which is in the statutory category of process. Claim 12 is directed to a non-transitory computer-readable medium including computer-executable instructions, which can be an article of manufacture.

Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis: Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101.

Regarding claim 1, the following claim elements are abstract ideas:

and a processing circuitry that constructs a mini-batch by selecting data of M sequences from data of N sequences used for training of the recurrent neural network where M is smaller than N (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components.)

executes optimization calculation of the recurrent neural network based on the unprocessed hidden state and the mini-batch, and (Executing optimization calculation is merely mathematical calculation, which is a mathematical concept.)
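For illustration of the breadth of these two elements under the broadest reasonable interpretation, they can be sketched in a few lines of generic code. The following is a hypothetical sketch only; all identifiers and toy values are illustrative choices and are not taken from the specification.

```python
import random

def build_mini_batch(sequences, m):
    """Select data of M sequences from the N available sequences (M < N)."""
    assert m < len(sequences)
    selected = random.sample(range(len(sequences)), m)
    return selected, [sequences[i] for i in selected]

def optimization_step(hidden_states, selected, batch):
    """Stand-in for the recited optimization calculation: consumes the
    unprocessed hidden states and the mini-batch, returns processed states."""
    return {i: hidden_states[i] + sum(data) for i, data in zip(selected, batch)}

N = 8
sequences = [list(range(5)) for _ in range(N)]   # toy data of N sequences
hidden_states = {i: 0.0 for i in range(N)}       # one stored state per sequence

ids, batch = build_mini_batch(sequences, m=3)    # M = 3 < N = 8
hidden_states.update(optimization_step(hidden_states, ids, batch))
```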
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

a storage that stores a hidden state that is intermediate output data of a recurrent neural network for N sequences; (This falls under mere instructions to apply the abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

, and outputs sequence information identifying the selected sequence, (This is mere data outputting, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. The broadest reasonable interpretation of this claim is storing information in memory, which is a well-understood, routine, and conventional activity. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)

reads an unprocessed hidden state of a sequence corresponding to the sequence information from the storage according to the sequence information, (This is merely transmitting data, which is a well-understood, routine, and conventional activity. It does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d). Therefore, this does not amount to significantly more than the judicial exception.)

writes a processed hidden state in the storage according to the sequence information, (This is merely transmitting data, which is a well-understood, routine, and conventional activity. It does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d). Therefore, this does not amount to significantly more than the judicial exception.)

the processed hidden state being intermediate output data of the recurrent neural network obtained by the optimization calculation. (This is mere data outputting, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. The broadest reasonable interpretation of this claim is storing information in memory, which is a well-understood, routine, and conventional activity. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, claim 2 recites the following abstract ideas: wherein the processing circuitry randomly selects the M sequences from the N sequences. (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components.) Claim 2 does not recite additional elements.

Regarding claim 3, the rejection of claim 2 is incorporated herein. Further, claim 3 recites the following abstract ideas: wherein the processing circuitry randomly selects the M sequences from among sequences having one or more remaining lengths among the N sequences. (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components.) Claim 3 does not recite additional elements.
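For illustration only, the selection recited in claims 2–3 can be sketched as below, under the interpretation adopted in the § 112 analysis that each sequence has one remaining (unprocessed) length. The code is hypothetical; the names and values are illustrative and not taken from the specification.

```python
import random

def select_m_sequences(remaining_lengths, m):
    """Randomly select M sequence indices, restricted to sequences whose
    remaining (unprocessed) length is at least one."""
    candidates = [i for i, r in enumerate(remaining_lengths) if r >= 1]
    return random.sample(candidates, m)

remaining = [4, 0, 2, 7, 0, 1]           # per-sequence unprocessed lengths
print(select_m_sequences(remaining, 2))  # e.g., [3, 5]
```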
Regarding claim 4, the rejection of claim 3 is incorporated herein. Further, claim 4 recites the following abstract ideas: wherein the processing circuitry randomly selects first sequences whose number corresponds to a product of the number of mini-batches and a selectivity among the M sequences, and preferentially selects remaining second sequences by giving priority to a sequence having a large amount of remaining lengths. (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components.) Claim 4 does not recite additional elements.

Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, claim 5 recites the following abstract ideas: wherein the processing circuitry randomly selects the M sequences by giving priority to a sequence having a small difference between a sequence length and a remaining length among the N sequences. (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components. Also, selecting based on a difference between lengths merely recites mathematical calculation, which is a mathematical concept.) Claim 5 does not recite additional elements.

Regarding claim 6, the rejection of claim 1 is incorporated herein. Further, claim 6 recites the following abstract ideas: wherein data of each of the N sequences is divided into blocks having a common TBPTT length. (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components. Also, dividing into blocks having a common length merely recites mathematical calculation, which is a mathematical concept.) Claim 6 does not recite additional elements.

Regarding claim 7, the rejection of claim 6 is incorporated herein. Further, claim 7 recites the following abstract ideas: wherein in a case of selecting data of the M sequences for each of the M sequences, the processing circuitry sequentially selects an unprocessed block among blocks of the selected sequence. (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components.) Claim 7 does not recite additional elements.

Regarding claim 8, the rejection of claim 1 is incorporated herein. Further, claim 8 recites the following abstract ideas: wherein the processing circuitry performs forward propagation calculation, back propagation calculation, and parameter update in the optimization calculation, and calculates the hidden state in the forward propagation calculation and/or the back propagation calculation. (Performing various calculations for optimization is merely mathematical calculation, which is a mathematical concept.) Claim 8 does not recite additional elements.

Regarding claim 9, the rejection of claim 1 is incorporated herein. Further, claim 9 recites the following additional elements: wherein the sequence information has an identifier of the selected sequence. (This falls under mere instructions to apply the abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 10, the rejection of claim 1 is incorporated herein. Further, claim 10 recites the following additional elements: wherein the processing circuitry overwrites the unprocessed hidden state with the processed hidden state. (This is merely transmitting data, which is a well-understood, routine, and conventional activity. It does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d). Therefore, this does not amount to significantly more than the judicial exception.)
Claims 11 and 12 recite subject matter substantially similar to that of claim 1 and are rejected under the same rationale, mutatis mutandis.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1 and 6–12 are rejected under 35 U.S.C. 103 as being unpatentable over Branco et al. (US 12373670 B2) in view of Xu et al. (NPL: “Mini-Batch Learning Strategies for modeling long term temporal dependencies: A study in environmental applications”), further in view of Tang et al. (NPL: “On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition”).

Regarding claim 1, Branco teaches a storage that stores a hidden state that is intermediate output data of a recurrent neural network for N sequences; (Column 4, Lines 32–38 of Branco states “Storage 240 is configured to store states associated with entities (e.g., Card A GRU state, Card B GRU state, etc.) In various embodiments, the storage includes one or more databases, which can be centrally stored or embedded. The storage optionally includes a cache, which may increase the speed of lookups.” Column 13, Lines 3–7 of Branco states “The process updates a state in a storage system for the given card identifier (604). The storage system stores the state of each card seen by the system. Intuitively, the storage system grows linearly on the number of cards since there is one entry per card.” Branco teaches a storage holding one GRU hidden state per sequence, scaling to N sequences in total.)
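By way of illustration only, the storage pattern Branco describes (one saved recurrent state per entity identifier, with a default state for entities not previously seen) can be sketched as follows. The class and identifiers below are hypothetical and are not taken from Branco. The mapping of the remaining limitations of claim 1 continues after the sketch.

```python
# Hypothetical sketch of a per-identifier state store as Branco describes.
DEFAULT_STATE = [0.0] * 4            # a predefined/system-selected default state

class StateStorage:
    def __init__(self):
        self._states = {}            # one entry per identifier; grows linearly

    def read(self, entity_id):
        """Retrieve the saved state, or the default state for a new entity."""
        return self._states.get(entity_id, DEFAULT_STATE)

    def write(self, entity_id, new_state):
        """Update (overwrite) the saved state for this identifier."""
        self._states[entity_id] = new_state

storage = StateStorage()
h = storage.read("card_A")           # unseen entity -> default state
storage.write("card_A", [0.1, 0.2, 0.3, 0.4])
```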
, and outputs sequence information identifying the selected sequence, (Column 5, Lines 4–6 of Branco states “an entity identification field, usually a unique ID of the credit or debit card involved in the transaction, x.sub.id.” Column 12, Lines 59–63 of Branco states “The process begins by obtaining a current state for a given card identifier (600). The state captures the history of transactions of a given card. With this, the time and space complexity of scoring each transaction is constant, regardless of how many (if any) the card has appeared before.”)

reads an unprocessed hidden state of a sequence corresponding to the sequence information from the storage according to the sequence information, (Column 5, Lines 9–18 of Branco states “The process retrieves a first state (302). In various embodiments, if the first transaction is associated with an entity that has not been previously seen, then the first state is a default state. The default state can be a predefined/system-selected state that works well for various entities. If the first transaction is associated with an entity that has been previously seen, the first state is a saved recurrent neural network state for an entity associated with the first transaction.”)

executes optimization calculation of the recurrent neural network based on the unprocessed hidden state …, and (Column 11, Lines 54–62 of Branco states “A transaction that the model needs to learn how to classify goes through the typical forward pass, followed by a backward pass, e.g., backpropagation of the respective gradients. For a non-scorable instance, however, a forward pass is done, but the backward pass is not. As a result, with the forward pass, the recurrent state of the card is updated with new information. The model does not learn how to classify the non-scorable instances, focusing solely on the target use-cases.”)

writes a processed hidden state in the storage according to the sequence information, the processed hidden state being intermediate output data of the recurrent neural network obtained by the optimization calculation. (Column 6, Lines 12–20 of Branco states “The process updates the saved recurrent neural network state for the entity associated with the first transaction to be the second state (308). In various embodiments, each of the three functions ƒ, g, and h contain learnable parameters, which makes RNN models especially suitable for interleaved data streams. GRUs can be used for the recurrent function/block, where each entity is an independent sequence with its state, but sharing the learnable parameters in ƒ, g, and h across sequences.” Column 13, Lines 3–7 of Branco states “The process updates a state in a storage system for the given card identifier (604). The storage system stores the state of each card seen by the system. Intuitively, the storage system grows linearly on the number of cards since there is one entry per card.”)

However, Branco does not explicitly teach:

and a processing circuitry that constructs a mini-batch by selecting data of M sequences from data of N sequences used for training of the recurrent neural network where M is smaller than N

executes optimization calculation of the recurrent neural network based on the … mini-batch, and

Xu teaches and a processing circuitry that constructs a mini-batch by selecting data of M sequences from data of N sequences used for training of the recurrent neural network where M is smaller than N
(Pg. 1, Abstract of Xu states “Stateful RNNs aim to address this issue by passing hidden states between batches. Since Stateful RNNs ignore intra-batch temporal dependency, there exists a trade-off between training stability and capturing temporal dependency. In this paper, we provide a quantitative comparison of different Stateful RNN modeling strategies, and propose two strategies to enforce both intra- and inter-batch temporal dependency. First, we extend Stateful RNNs by defining a batch as a temporally ordered set of training segments, which enables intra-batch sharing of temporal information. While this approach significantly improves the performance, it leads to much larger training times due to highly sequential training. To address this issue, we further propose a new strategy which augments a training segment with an initial value of the target variable from the timestep right before the starting of the training segment. In other words, we provide an initial value of the target variable as additional input so that the network can focus on learning changes relative to that initial value. By using this strategy, samples can be passed in any order (mini-batch training) which significantly reduces the training time while maintaining the performance.” Pg. 3, Section 3.2 (Stateful Mini-Batches) of Xu states “SMB training algorithm (figure 1, plot B) passes RNN hidden states between batches, which makes the segments temporally dependent and respects the nature of time series data. In TensorFlow [13] and Keras [10], we can activate SMB by setting the RNNs' parameter stateful = True; then, the algorithm will use the last hidden states of each RNN segment from a batch to initialize each segment at the same position in the next batch.” Under BRI, the limitation reads on any training system that selects a batch of M sequences from a training set of N total sequences, where M < N. A POSITA would understand from Xu’s teaching that any stateful multi-sequence training system selects M sequences per batch from a larger pool of N, with M < N being the defining property of mini-batch training.)

Tang teaches executes optimization calculation of the recurrent neural network based on the … mini-batch, and (Pg. 1, Abstract of Tang states “We then draw a connection between batch decoding and a popular training approach for recurrent networks, truncated backpropagation through time. Changing the decoding approach restricts the amount of past history recurrent networks can use for prediction, allowing us to analyze their ability to remember.” Under BRI, the claimed limitation is met by any system that performs forward and backward propagation using stored hidden states as the initial state and a batch of sequence data as input, which the combination of Branco and Tang teaches.)

It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Branco, Xu, and Tang. Branco teaches storing and updating a recurrent hidden state for each identified sequence, including retrieving a current state for a given identifier, processing the input, and writing back the updated state. Xu teaches stateful mini-batch training in which hidden states are passed between temporally ordered training segments or batches so that temporal continuity is maintained during training. Tang teaches truncated backpropagation through time as a known training approach for recurrent neural networks in which sequence data are processed in fixed-length time blocks for efficient training. One with ordinary skill in the art would have been motivated to incorporate the teachings of Xu and Tang into Branco to improve training efficiency and reduce the computational burden associated with training on long sequential data in recurrent neural networks. The combination would have been a predictable combination of known recurrent neural network training techniques, resulting in robust and efficient stateful training over multiple sequences.
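To illustrate how the combined teachings interoperate, the following hypothetical sketch shows a per-sequence state store (as in Branco), selection of M of N sequences per batch with states carried between batches (as in Xu), and fixed-length TBPTT blocks (as in Tang). The code is illustrative only and is not taken from any cited reference; all names and toy values are assumptions.

```python
import random

N, M, TBPTT_LEN = 6, 2, 4
sequences = {i: list(range(12)) for i in range(N)}   # toy data of N sequences
states = {i: 0.0 for i in range(N)}                  # stored hidden state per sequence
cursor = {i: 0 for i in range(N)}                    # start of the next unprocessed block

def train_step(state, block):
    # Stand-in for the forward pass, backward pass, and parameter update.
    return state + 1e-3 * sum(block)

for _ in range(3):                                   # a few training iterations
    batch_ids = random.sample(range(N), M)           # select M of the N sequences
    for i in batch_ids:
        block = sequences[i][cursor[i]:cursor[i] + TBPTT_LEN]
        if not block:                                # sequence fully processed
            continue
        states[i] = train_step(states[i], block)     # read state, process, write back
        cursor[i] += TBPTT_LEN
```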
Regarding claim 6, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Branco, Xu, and Tang teaches wherein data of each of the N sequences is divided into blocks having a common TBPTT length. (Pg. 1, Abstract of Tang states “We then draw a connection between batch decoding and a popular training approach for recurrent networks, truncated backpropagation through time. Changing the decoding approach restricts the amount of past history recurrent networks can use for prediction, allowing us to analyze their ability to remember.” Pg. 4, Section 3 (Backpropagation through time) of Tang states “There are many variants of BPTT with the most popular one being truncated BPTT [27]. In the original definition [27], instead of propagating gradients all the way to the start of the unrolled chain, the accumulation stops after a fixed number of time steps. Training recurrent networks with truncated BPTT can be justified if the truncated chains are enough to learn the target recursive functions. In the modern setting [22], truncated BPTT is treated as regular BPTT with batch decoding, and the number of unrolled steps before truncation is the number of context frames in batch decoding. Henceforth, we will use the term BPTT with batch decoding to avoid confusion.” Dividing each sequence into fixed-length blocks according to a uniform TBPTT truncation window is definitional to the TBPTT procedure.)

Regarding claim 7, the rejection of claim 6 is incorporated herein. Furthermore, the combination of Branco, Xu, and Tang teaches wherein in a case of selecting data of the M sequences for each of the M sequences, the processing circuitry sequentially selects an unprocessed block among blocks of the selected sequence. (Pg. 1–2, Introduction of Xu states “First, we extend Stateful RNNs to enable the intra-batch sharing of temporal information by defining a batch as a temporally ordered set of training segments. During the forward pass for a batch, we pass the detached hidden states between segments, and we back-propagate the average loss of the batch to update the models' weights.” Pg. 3, Section 3.2 (Stateful Mini-Batches) of Xu states “SMB training algorithm (figure 1, plot B) passes RNN hidden states between batches, which makes the segments temporally dependent and respects the nature of time series data. In TensorFlow [13] and Keras [10], we can activate SMB by setting the RNNs' parameter stateful = True; then, the algorithm will use the last hidden states of each RNN segment from a batch to initialize each segment at the same position in the next batch.” The same-position initialization across batches is only possible if blocks within each sequence are selected in sequential temporal order. Processing blocks out of order would break the hidden-state chain on which Xu’s framework is built, so sequential selection is directly taught and required by Xu.)
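For illustration only, the block division that is definitional to TBPTT (claim 6) and the in-order consumption of unprocessed blocks (claim 7) can be sketched as follows. This hypothetical code is not taken from Tang or Xu; the names are illustrative.

```python
def split_blocks(seq, tbptt_len):
    """Divide one sequence into consecutive blocks of a common TBPTT length."""
    return [seq[k:k + tbptt_len] for k in range(0, len(seq), tbptt_len)]

blocks = split_blocks(list(range(10)), tbptt_len=4)
# -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
unprocessed = iter(blocks)          # sequential, temporal-order consumption
print(next(unprocessed))            # first unprocessed block: [0, 1, 2, 3]
```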
Regarding claim 8, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Branco, Xu, and Tang teaches wherein the processing circuitry performs forward propagation calculation, back propagation calculation, and parameter update in the optimization calculation, and calculates the hidden state in the forward propagation calculation and/or the back propagation calculation. (Column 11, Lines 54–63 of Branco states “A transaction that the model needs to learn how to classify goes through the typical forward pass, followed by a backward pass, e.g., backpropagation of the respective gradients. For a non-scorable instance, however, a forward pass is done, but the backward pass is not. As a result, with the forward pass, the recurrent state of the card is updated with new information. The model does not learn how to classify the non-scorable instances, focusing solely on the target use-cases.” Pg. 1, Abstract of Tang states “We then draw a connection between batch decoding and a popular training approach for recurrent networks, truncated backpropagation through time. Changing the decoding approach restricts the amount of past history recurrent networks can use for prediction, allowing us to analyze their ability to remember.”)

Regarding claim 9, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Branco, Xu, and Tang teaches wherein the sequence information has an identifier of the selected sequence. (Column 5, Lines 4–6 of Branco states “an entity identification field, usually a unique ID of the credit or debit card involved in the transaction, x.sub.id.” Column 12, Lines 59–61 of Branco states “The process begins by obtaining a current state for a given card identifier (600). The state captures the history of transactions of a given card.” x.sub.id is explicitly a unique per-entity identifier for each sequence, used to index the state storage.)

Regarding claim 10, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Branco, Xu, and Tang teaches wherein the processing circuitry overwrites the unprocessed hidden state with the processed hidden state. (Column 6, Lines 13–18 of Branco states “The process updates the saved recurrent neural network state for the entity associated with the first transaction to be the second state (308). In various embodiments, each of the three functions ƒ, g, and h contain learnable parameters, which makes RNN models especially suitable for interleaved data streams.” Branco’s storage holds exactly one state per card. The new state replaces the prior state at the same storage location, which corresponds to overwriting the prior unprocessed state with the processed state.)

Claims 11 and 12 recite subject matter substantially similar to that of claim 1 and are rejected under the same rationale, mutatis mutandis.

Claims 2–5 are rejected under 35 U.S.C. 103 as being unpatentable over Branco et al. (US 12373670 B2) in view of Xu et al. (NPL: “Mini-Batch Learning Strategies for modeling long term temporal dependencies: A study in environmental applications”) and Tang et al. (NPL: “On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition”), further in view of Doetsch et al. (NPL: “A COMPARATIVE STUDY OF BATCH CONSTRUCTION STRATEGIES FOR RECURRENT NEURAL NETWORKS IN MXNET”).

Regarding claim 2, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Branco, Xu, and Tang does not explicitly teach wherein the processing circuitry randomly selects the M sequences from the N sequences.
Doetsch teaches wherein the processing circuitry randomly selects the M sequences from the N sequences. (Pg. 2, Section 4 (Proposed Approach) of Doetsch states “In order to improve the intra-batch variability we propose a stochastic bucketing process. At the beginning of each epoch the utterances are arranged randomly and then partitioned into bins of equal size.” Pg. 1, Introduction of Doetsch states “A straight-forward strategy to minimize zero padding is to sort the utterances by length and to partition them into batches afterwards. However, there are significant drawbacks to this method. First, the sequence order remains constant in each epoch and therefore the intra-batch variability is very low since the same sequences are usually combined into the same batch.”)

It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Branco, Xu, Tang, and Doetsch. Branco teaches storing and updating a recurrent hidden state for each identified sequence, including retrieving a current state for a given identifier, processing the input, and writing back the updated state. Xu teaches stateful mini-batch training in which hidden states are passed between temporally ordered training segments or batches so that temporal continuity is maintained during training. Tang teaches truncated backpropagation through time as a known training approach for recurrent neural networks in which sequence data are processed in fixed-length time blocks for efficient training. Doetsch teaches constructing mini-batches from a larger pool of training sequences using random and length-aware batch selection techniques in order to reduce zero padding and improve batch efficiency. One with ordinary skill in the art would have been motivated to incorporate the teachings of Doetsch into the combination of Branco, Xu, and Tang so that the combined system can operate efficiently over a large pool of training sequences using random and length-aware batch selection. The combination would have predictably maintained temporal continuity across sequence segments while improving batching efficiency, reducing computational overhead due to padding, and enabling effective recurrent network training over multiple variable-length sequences.

Regarding claim 3, the rejection of claim 2 is incorporated herein. Furthermore, the combination of Branco, Xu, Tang, and Doetsch teaches wherein the processing circuitry randomly selects the M sequences from among sequences having one or more remaining lengths among the N sequences. (Pg. 3, Section 3.2 (Stateful Mini-Batches) of Xu states “SMB training algorithm (figure 1, plot B) passes RNN hidden states between batches, which makes the segments temporally dependent and respects the nature of time series data. In TensorFlow [13] and Keras [10], we can activate SMB by setting the RNNs' parameter stateful = True; then, the algorithm will use the last hidden states of each RNN segment from a batch to initialize each segment at the same position in the next batch.” Xu operates in a stateful multi-sequence training framework that tracks and passes hidden states between training segments across batches.)
Regarding claim 4, the rejection of claim 3 is incorporated herein. Furthermore, the combination of Branco, Xu, Tang, and Doetsch teaches wherein the processing circuitry randomly selects first sequences whose number corresponds to a product of the number of mini-batches and a selectivity among the M sequences, and preferentially selects remaining second sequences by giving priority to a sequence having a large amount of remaining lengths. (Pg. 3, Section 6 (Experiments) of Doetsch states “As expected, sorting the entire training set by utterance length reduces the required time per epoch to a minimum, while the best overall performance is obtained when utterances are shuffled randomly. Both bucketing and the proposed approach are in between. We can observe that our method is able to reach almost the same recognition performance as using a randomly shuffled sequence ordering, while being almost as fast as the sorted utterance scheduler. This allows for a good trade-off between runtime and system performance.” Doetsch identifies and measures the trade-off between random selection and sorted selection and proposes operating “in between” as optimal. Under BRI, the claimed selectivity parameter is the direct and obvious parameterization of this known “in between” trade-off.)

Regarding claim 5, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Branco, Xu, Tang, and Doetsch teaches wherein the processing circuitry randomly selects the M sequences by giving priority to a sequence having a small difference between a sequence length and a remaining length among the N sequences. (Pg. 2–3, Section 4 (Proposed approach) of Doetsch states “At the beginning of each epoch the utterances are arranged randomly and then partitioned into bins of equal size. Each bin is then sorted in alternating directions such that two consecutive bins are sorted in reverse order to each other. Finally, the constructed ordering is partitioned into batches. The overall algorithm can be summarized as follows: For each epoch 1. shuffle training data 2. partition resulting sequence into N bins 3. sort each bin n by the utterance length: - in ascending order if n is odd - in descending order if n is even 4. draw consecutive batches of desired size from the resulting sequence… The alternated sorting approach ensures that utterances at the boundaries of two consecutive bins are of similar length such that the final partitioning into batches requires minimal zero padding. Figure 1 shows the utterance lengths for random and sorted sequence ordering as well as for bucketing in MXNet and the proposed approach. Note that in the case of bucketing batches are put together by randomly choosing one of the buckets first, so the ordering does not directly represent the final set of batches.” Under BRI, the claimed metric (sequence length − remaining length) represents the amount of each sequence already processed; prioritizing sequences with a small value of this metric selects sequences at an early processing stage when producing batches. Doetsch achieves the same alignment principle with its length-based sorting, which follows the remaining-length context.)
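For illustration only, the four-step bucketing procedure quoted from Doetsch can be paraphrased in code as follows. This is a hypothetical sketch of the quoted algorithm; the function name and toy values are illustrative, and for simplicity it assumes the number of sequences is divisible by the number of bins.

```python
import random

def doetsch_order(lengths, num_bins):
    idx = list(range(len(lengths)))
    random.shuffle(idx)                              # 1. shuffle training data
    bin_size = len(idx) // num_bins
    ordered = []
    for n in range(num_bins):                        # 2. partition into bins
        b = idx[n * bin_size:(n + 1) * bin_size]
        b.sort(key=lambda i: lengths[i],             # 3. alternate the sort order
               reverse=(n % 2 == 1))                 #    between consecutive bins
        ordered.extend(b)
    return ordered

lengths = [random.randint(5, 50) for _ in range(12)]
order = doetsch_order(lengths, num_bins=3)
batches = [order[k:k + 4] for k in range(0, len(order), 4)]  # 4. draw batches
```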
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BYUNGKWON HAN, whose telephone number is (571) 272-5294. The examiner can normally be reached M-F, 9:00 AM-6:00 PM PST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BYUNGKWON HAN/
Examiner, Art Unit 2121

/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121