Last updated: May 29, 2026
Application No. 18/302,154
Energy-Efficient Recurrent Neural Network Accelerator

Non-Final OA §101§103
Filed
Apr 18, 2023
Examiner
KHAN, SHAHID K
Art Unit
2146
Tech Center
2100 — Computer Architecture & Software
Assignee
Taiwan Semiconductor Manufacturing Company Ltd.
OA Round
1 (Non-Final)
Interview Optional

— +15.4% interview lift. Examiner has a relatively high allowance rate (74%); +15.4% interview lift. A written response may suffice.
Based on 394 resolved cases, 2023–2026
Examiner Intelligence

KHAN, SHAHID K View full profile →
Grants 74% — above average
Career Allowance Rate
292 granted / 394 resolved
+19.1% vs TC avg
Strong +15% interview lift
Without
With
+15.4%
Interview Lift
resolved cases with interview
Typical timeline
2y 10m
Avg Prosecution
16 currently pending
Career history
424
Total Applications
across all art units
Statute-Specific Performance

§101
2.8%
-37.2% vs TC avg
§103
87.3%
+47.3% vs TC avg
§102
5.6%
-34.4% vs TC avg
§112
3.3%
-36.7% vs TC avg
Black line = Tech Center average estimate • Based on career data from 394 resolved cases
Office Action

§101 §103
DETAILED ACTION
This communication is in response to the application filed 4/18/23 in which claims 1-20 were presented for examination.

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 2/6/25 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Claim 1 recites:
A method comprising:
[1]	batching a plurality of input vectors to form an input matrix;
[2]	multiplying the input matrix by an input vector weight matrix, the multiplication generating input vector partial sums for a plurality of timesteps;
[3]	multiplying a time-delayed hidden vector for a particular timestep by a hidden vector weight matrix, the multiplication generating a hidden vector partial sum for the particular timestep;
[4]	adding the hidden vector partial sum for the particular timestep to the input vector partial sum for the particular timestep, the adding generating a full sum for the particular timestep; and
[5]	processing the full sum for the particular timestep, the processing generating a time-delayed hidden vector for a next time step.
Step 1: YES. The claim recites a series of steps and, therefore, is a process.
Step 2A Prong 1: YES. The claim recites a series of mental process and mathematical calculation steps to produce a full sum of a matrix multiplication in an RNN core. Limitation [1] recites batching a plurality of input vectors into an input matrix. “Batching” in the context of the claim encompasses a user manually combining a set of input vectors to create an input matrix and, therefore, falls under the Mental Processes grouping of abstract ideas. Limitations [2]-[4] recite a set of multiplication and addition steps. Specifically, limitation [2] generates input vector partial sums by multiplying the input matrix with an input vector weight matrix. Limitation [3] generates a hidden vector partial sum for a particular time step by multiplying a time-delayed hidden vector for the time step with a hidden vector weight matrix. Finally, limitation [4] generates a full sum for the time step by adding the results of limitations [2] and [3]. Each of these steps recite a mathematical calculation and, therefore, fall under the Mathematical Concepts grouping of abstract ideas. Accordingly, claim 1 recites an abstract idea.
Step 2A Prong 2: NO. Limitation [4] recites an additional element that includes processing the full sum for the particular time step to generate a time delayed hidden vector for a next time step. As per paragraph 29 of the specification, the full sum vector is received by an activation function block which applies an activation function (e.g., sigmoid or hyperbolic tangent) to the full sum vector. Next, per paragraph 31 of the specification, the output of the activation function is received by a post-processing block which applies a digital filter, buffer, or function to the activation vector to produce RNN output signals that are in turn used as inputs within the time delayed hidden vector in the next time step. As drafted, limitation [4] recites “processing” at a high level of generality and, under a broadest reasonable interpretation, constitutes a mere instruction to “apply” the exception under MPEP 2106.05(f). Accordingly, claim 1 is directed to an abstract idea.
Step 2B: NO. As discussed above, limitation [4] recites “processing” at a high level of generality and, under a broadest reasonable interpretation, constitutes a mere instruction to “apply” the judicial exception under MPEP 2106.05(f). A mere instruction to apply an exception does not cause the abstract idea to amount to significantly more than the exception or provide an inventive concept. Thus, claim 1 is ineligible.
Claim 2 recites:
The method of claim 1, further comprising repeating the steps of:
[1a]	multiplying the time-delayed hidden vector by the hidden vector weight matrix;
[1b] 	adding the hidden vector partial sum to the input vector partial sum; and
[1c] 	processing the full sum for the particular timestep;
[2] 	for each timestep in a time sequence until a full sum for each timestep in the time sequence is generated.
Step 1: YES. The claim recites a series of steps and, therefore, is a process.
Step 2A Prong 1: YES. The claim recites a set of mathematical calculation steps to iteratively perform a multiply-and-accumulate operation for each timestep until a full sum for each timestep in the time sequence is generated. Limitations [1a] and [1b] recite the respective multiplication and addition steps for a particular timestep. Each of these steps recite a mathematical calculation and, therefore, fall under the Mathematical Concepts grouping of abstract ideas. Accordingly, claim 2 recites an abstract idea.
Step 2A Prong 2: NO. Limitation [1c] recites an additional element that includes processing the full sum for the particular time step. As explained above, “processing” is described in paragraphs 29-31 of the specification as applying an activation function (e.g., sigmoid or hyperbolic tangent) to the full sum vector and applying a digital filter, buffer, or function to the activation vector to produce RNN output signals that are in turn used as inputs within the time delayed hidden vector in the next time step. As drafted, limitation [1c] recites “processing” at a high level of generality and, under a broadest reasonable interpretation, constitutes a mere instruction to “apply” the exception under MPEP 2106.05(f). Limitation [2] requires performing limitations [1a]-[1c] until the full sum for all the timesteps in the time sequence is obtained. As the limitation merely requires performing limitations [1a]-[1c], it does not cause the judicial exception to be integrated into a practical application. Accordingly, claim 1 is directed to an abstract idea.
Step 2B: NO. As discussed in the previous step, limitation [1c] recites “processing” at a high level of generality and, under a broadest reasonable interpretation, constitutes a mere instruction to “apply” the judicial exception under MPEP 2106.05(f). A mere instruction to apply an exception does not cause the abstract idea to amount to significantly more than the exception or provide an inventive concept. Further, limitation [2] merely requires performing limitations [1a]-[1c] iteratively until the full sum for all the timesteps in the time sequence is obtained. Performing repetitive calculations does not impose meaningful limits on the scope of the claim and, therefore, does not provide an inventive concept under MPEP 2106.05(d)(II). 
Accordingly, claim 2 is ineligible.
Claims 3 and 4 recite:
The method of claim 1, wherein the input matrix comprises a plurality of input matrix folds and the input vector weight matrix comprises a plurality of input vector weight matrix folds.
The method of claim 3, wherein multiplying the input matrix by the input vector weight matrix comprises multiplying each fold of the input matrix by a corresponding fold of the input vector weight matrix.
Step 1: YES. Both claims further limit the process of claim 1 and, therefore, also recite a process.
Step 2A Prong 2/Step 2B: NO. Claim 3 further describes the input matrix and the input vector weight matrix in terms of matrix folds. Selecting a particular data source or type of data to be manipulated is considered insignificant extra-solution activity and, therefore, does not practically integrate the judicial exception or provide an inventive concept under MPEP 2106.05(g). Claim 4 further describes the multiplication of the input matrix by the input vector weight matrix as the multiplication of corresponding folds of the two matrices. Thus, claim 4 also describes the type of data to be manipulated and does not practically integrate the exception or provide an inventive concept.
Accordingly, claims 3 and 4 are ineligible.
Claim 5 recites:
The method of claim 1, wherein the multiplication of the input matrix by the input vector weight matrix and the multiplication of the time-delayed hidden vector by the hidden vector weight matrix are performed by a multiply-accumulate (MAC) unit.
Step 1: YES. Claim 5 further limits the process of claim 1 and, therefore, also recites a process.
Step 2A Prong 2/Step 2B: NO. Claim 5 recites the use of a multiply-accumulate (MAC) unit to perform the multiplications recited in claim 1. The claim does not cause the practical integration of the judicial exception or provide an inventive concept because it recites a generic computer component recited at a high level of generality and, therefore, constitutes mere instruction to “apply” the exception under MPEP 2106.05(f). 
Accordingly, claim 5 is ineligible.
Claim 6 depends from claim 5 and recites the limitation “receiving, at the MAC unit, the input vector weight matrix or the hidden vector weight matrix.” Receiving data is insignificant extra-solution activity and does not cause the judicial exception to be practically integrated in an application under MPEP 2106.05(g). The type of data received also does not cause practical integration or provide an inventive concept. Id. Recitation of “at the MAC unit” describes a generic computer component at a high level of generality and, thus, constitutes mere instruction to apply the exception under MPEP 2106.05(f).
Accordingly, claim 6 is ineligible.
Claim 7 recites:
The method of claim 6, further comprising:
[1]	based on a reception of the input vector weight matrix at the MAC unit, selecting the input matrix for multiplication by the input vector weight matrix; and
[2]	based on a reception of the hidden vector weight matrix at the MAC unit, selecting the time-delayed hidden vector for multiplication by the hidden vector weight matrix.
Step 2A Prong 1: YES. “Selecting” an input matrix for multiplication, as drafted, and under a broadest reasonable interpretation, encompasses a user mentally choosing an operand for a matrix multiplication operation. Thus, limitation [1] recites a Mental Concept. Similarly, in limitation [2] “selecting” a vector for multiplication, as drafted, and under a broadest reasonable interpretation, encompasses a user mentally choosing an operand for a vector-matrix multiplication calculation and, therefore, also recites a Mental Concept.
Step 2A Prong 2/Step 2B: NO. The additional “reception” elements of claim 6 recite data gathering steps that constitute insignificant extra-solution activity under MPEP 2106.05(g). The type of data does not cause the data gathering to integrate the judicial exception in a practical application. Id. Alternatively, “reception” under a broadest reasonable interpretation encompasses a user manually receiving or observing a particular weight matrix. Recitation of “at the MAC unit” recites a generic computer component at a high level of generality and, therefore, constitutes mere instruction to “apply” the exception under MPEP 2106.05(f).
Accordingly, claim 7 is ineligible.
Claim 8 depends from claim 7 and recites the limitation “wherein the selection of the input matrix and the selection of the time-delayed hidden vectors are performed by a first selection device.” 
Step 2A Prong 1: YES. As discussed above, “selecting” an input matrix for multiplication, as drafted, and under a broadest reasonable interpretation, encompasses a user mentally choosing an operand for a matrix multiplication operation. Thus, the claim recites a Mental Concept. 
Step 2A Prong 2/Step 2B: NO. The additional element “by a first selection device” recites a generic computer component at a high level of generality and, therefore, constitutes mere instruction to “apply” the exception under MPEP 2106.05(f). Therefore, it does not integrate the exception in a practical application or provide an inventive concept.
Accordingly, claim 8 is ineligible.
Claim 9 depends from claim 1 and further describes the time-delayed hidden vectors as “compris[ing] an initial vector, the initial vector being a random vector.” Thus, claim 9 further describes the mathematical calculation performed in claim 1 in terms of a particular type of operand and, therefore, recites an abstract idea.
Claim 10 depends from claim 1 and recites “combining the input vector weight matrix and the hidden vector weight matrix to form a combined weight matrix.” Figure 7 of Applicant’s disclosure illustrates combining weight matrices. “Combining” in the context of the claim and under a broadest reasonable interpretation encompasses a user manually concatenating two matrices to create a combined matrix. Thus, claim 10 recites a Mental Concept. 
Accordingly, claim 10 is ineligible.
Claim 11 depends from claim 1 and recites “receiving the input matrix, the input vector weight matrix, and the hidden vector weight matrix from a memory.” In other words, claim 11 describes the operands in the matrix multiplication operations in claim 1 as being received from a memory. Receiving data is considered insignificant extra-solution activity under MPEP 2106.05(g). The type or source of data does not meaningfully limit the scope of the claim. Id. Thus, claim 11 does not integrate the exception in a practical application or provide an inventive concept.
Accordingly, claim 11 is ineligible.
Claim 12 recites “wherein the memory is a dynamic random-access memory (DRAM).” Recitation of a generic computer component at a high level of abstraction constitutes mere instruction to “apply” the exception under MPEP 2106.05(f) and, therefore, does not cause the integration of the exception in a practical application or provide an inventive concept.
Accordingly, claim 12 is ineligible.
Independent claim 13 recites:
A neural network comprising:
[1] a multiply-accumulate (MAC) unit configured to:
[1a]	receive an input matrix and an input vector weight matrix;
[1b]	multiply the input matrix by the input vector weight matrix, the multiplication generating input vector partial sums;
[1c]	receive time-delayed hidden vectors and a hidden vector weight matrix; and
[1d]	multiply the time-delayed hidden vectors and the hidden vector weight matrix, the multiplication generating hidden vector partial sums;
[2] an accumulator coupled to the MAC unit, the accumulator configured to accumulate and add the input vector partial sums and the hidden vector partial sums, the addition generating a plurality of full sum vectors;
[2a]	wherein the neural network is configured to generate the time-delayed hidden vectors based on the plurality of full sum vectors; and
[3] a first selection device coupled to the MAC unit, the first selection device configured to select between the input matrix and the time-delayed hidden vectors for reception at the MAC unit.
Step 1: YES. The claim is directed to a neural network comprising a MAC unit, an accumulator coupled to the MAC unit, and a first selection device, and, therefore, cannot be software per se. Accordingly, claim 13 is directed to a hardware/software combination which may be interpreted as a machine under 35 USC 101. Accordingly, claim 13 and its dependent claims are directed to a statutory category under 35 U.S.C. 101.
Step 2A Prong 1: YES. Limitations [1b] and [1d] recite a set of matrix multiplication operations based on inputted data and limitation [2] recites an accumulate (addition) operation on the outputs of the MAC unit. These operations constitute mathematical calculations and, therefore, fall under the Mathematical Concepts grouping of abstract ideas. Finally, limitation [3] recites “select between the input matrix and the time-delayed hidden vectors.” “Select” as drafted and under a broadest reasonable interpretation encompasses observations, evaluations, judgements, and opinions. Thus, limitation [3] falls under the Mental Concept grouping of abstract ideas.
Step 2A Prong 2: NO. Limitations [1], [1a], and [1c] collectively recite a MAC unit which is configured to receive the operands of the matrix multiplications recited in limitations [1b] and [1d]. Limitation [2] recites “an accumulator” and limitation [3] recites “a first selection device.” Each of these elements are generic computer components recited at a high level of generality and, therefore, are mere instruction to “apply” the judicial exception. Such elements do not integrate the judicial exception into a practical application under MPEP 2106.05(f). Receiving data is considered insignificant extra-solution activity under MPEP 2106.05(g) and, therefore, limitations [1b] and [1d] do not integrate the judicial exception into a practical application. 
Limitation [2a] recites the neural network as generating the time delayed hidden vectors based on the full sum vectors. However, “generating” in the context of the claim encompasses a user mentally (or with the aid of pen and paper) manipulating the full sum vector to obtain the time-delayed hidden vector. Recitation of the neural network constitutes mere instruction to “apply” the exception using a generic computer component. See MPEP 2106.05(f).
Step 2B: NO. Limitations [1], [1a], and [1c] collectively recite a MAC unit which is configured to receive the operands of the matrix multiplications recited in limitations [1b] and [1d]. Limitation [2] recites “an accumulator” and limitation [3] recites “a first selection device.” Each of these elements are generic computer components recited at a high level of generality and, therefore, are mere instruction to “apply” the judicial exception. Such elements do not integrate the judicial exception into a practical application under MPEP 2106.05(f). Receiving data is considered insignificant extra-solution activity under MPEP 2106.05(g) and, therefore, limitations [1b] and [1d] do not integrate the judicial exception into a practical application. 
Limitation [2a] recites the neural network as generating the time delayed hidden vectors based on the full sum vectors. However, “generating” in the context of the claim encompasses a user mentally (or with the aid of pen and paper) manipulating the full sum vector to obtain the time-delayed hidden vector. Recitation of the neural network constitutes mere instruction to “apply” the exception using a generic computer component. See MPEP 2106.05(f).
The additional elements of the claim relate to the specific combination of the MAC unit, the accumulator, and the first selection device. Specifically, the accumulator is coupled to the MAC unit and is configured to accumulate and add the input vector partial sums and the hidden vector partial sums produced by the MAC unit. The fact that the accumulator is coupled to the MAC unit does not provide an inventive concept because the accumulation and addition functions are performed on the outputs of the MAC unit. Similarly, the first selection device is coupled to the MAC unit and is configured to select between the input matrix and the time-delayed hidden vectors for reception at the MAC unit. The recitation of the selection device as performing the selection between the input matrix and the time-delayed hidden vectors for reception at the MAC device also does not provide an inventive concept because it merely describes the selection as being performed by a device.
Accordingly, claim 13 is ineligible.
Claim 14 recites:
The neural network of claim 13, wherein multiplying the input matrix by the input vector weight matrix comprises:
[1]	loading one of a plurality of folds of the input vector weight matrix into the MAC unit;
[2]	serially multiplying each of a plurality of folds of the input vector by the fold of the input vector weight matrix;
[3]	loading a next fold of the plurality of folds of the input vector weight matrix into the MAC unit; and
[4]	repeating the serial multiplication and the loading of the next fold until the entire input vector has been multiplied by the input vector weight matrix.
Step 2A Prong 1: YES. Limitation [2] recite vector-matrix multiplication of the input vector weight matrix and the input vector and, therefore, fall under the Mathematical Concepts of abstract ideas. 
Step 2A Prong 2: NO. Limitations [1] and [3] recite data gathering/inputting steps in which portions of the input vector weight matrix are loaded into the MAC unit. Data gathering constitutes insignificant extra-solution activity under MPEP 2106.05(g). 
Step 2B: NO. Limitation [4] recites repeating the serial multiplication of the portions of the input vector until the entire input vector has been multiplied by the input vector weight matrix. Performing repetitive calculations is well understood routine, and conventional under MPEP 2106.05(d) and, therefore, does not provide an inventive concept. As discussed in the previous step, mere data gathering does not provide an inventive concept either. 
Accordingly, claim 14 is ineligible.
Claim 15 recites:
The neural network of claim 13, further comprising:
[1]	an input buffer coupled to the first selection device, the input buffer configured to store the input matrix; and
[2]	a weight buffer coupled to the MAC unit, the weight buffer configured to store the input vector weight matrix and the hidden vector weight matrix.
Step 2A Prong 2/Step 2B: NO. Both limitations [1] and [2] recites a buffer, which is a generic computer component recited at a high level of abstraction and, therefore, constitute mere instruction to “apply” the judicial exception under MPEP 2106.05(f). Similarly, the first selection device and MAC unit are also generic computer components recited at a high level of abstraction. Use of a computer or other machinery in its ordinary capacity to receive and store data does not integrate a judicial exception into a practical application or provide significantly more under MPEP 2106.05(f).
Accordingly, claim 15 is ineligible.
Claim 16 recites “an activation function device configured to apply an activation function to the full sum vectors.” 
Step 2A Prong 1: YES. Applying an activation function to the full sum vectors, as drafted, and under its broadest reasonable interpretation encompasses performing a mathematical calculation involving applying a formula for an activation function (e.g., sigmoid) to a full sum vector. Thus, the limitation recites an abstract idea.
Step 2A Prong 2/Step 2B: NO. The claim recites an activation function device as performing the mathematical calculation. However, the device is recited as a generic computer component and, therefore, constitutes a mere instruction to “apply” the judicial exception under MPEP 2106.05(f). Thus, it does not integrate the judicial exception into a practical application or provide significantly more.
Accordingly, claim 16 is ineligible.
Claim 17 recites “an activation buffer coupled to the first selection device, the activation buffer configured to store the time-delayed hidden vectors.”
Step 2A Prong 2/Step 2B: NO. Recitation of a generic computer component (activation buffer) at a high level of generality constitutes a mere instruction to “apply” the judicial exception and does not integrate the judicial exception into a practical application under MPEP 2106.05(f). Similarly, the fact that the buffer is configured to store data does not integrate the judicial exception or provide significantly more.
Accordingly, claim 17 is ineligible.
Claim 18 recites “wherein the time-delayed hidden vectors comprise an initial vector, the initial vector being a random vector.”
Step 2A Prong 2/Step 2B: NO. Claim 18 further describes the time-delayed hidden vectors received in claim 13 as comprising a random vector. The type of data does not cause data inputting to integrate the judicial exception into a practical application or amount to significantly more under MPEP 2106.05(g).
Accordingly, claim 18 is ineligible.
Claim 19 recites:
The neural network of claim 18, further comprising a second selection device coupled to the activation buffer and the activation function device, the second selection device configured to select between the initial vector and a separate time-delayed hidden vector for reception at the activation buffer.
Step 2A Prong 1: YES. Claim 19 describes selecting between the initial vector and time-delayed hidden vector for reception at the activation buffer. Selecting, as drafted, encompasses something that can be performed mentally by a user and, therefore, falls under the Mental Processes grouping of abstract ideas.
Step 2A Prong 2: NO. The claim describes the selecting being performed by a second selection device. However, the selection device is a generic computer component recited at a high level of generality and, therefore, does not integrate the judicial exception into a practical application.
Step 2B: NO. The inclusion of the selection device in combination with the activation buffer and the activation function device to select between the initial vector and the time-delayed hidden vector is a mere instruction to apply the judicial exception and, therefore, does not provide an inventive concept.
Independent claim 20 recites:
A recurrent neural network (RNN) core comprising: 
[1]	an input buffer configured to receive an input matrix from a memory and store the input matrix, the input matrix including a plurality of input vectors; 
[2]	a weight buffer configured to receive a weight matrix from the memory and store the weight matrix; and 
[3]	a multiply-accumulate (MAC) unit coupled to the input buffer and the weight buffer, the MAC unit configured to receive a fold of the input matrix and a corresponding fold of the weight matrix and to multiply the fold of the input matrix by the corresponding fold of the weight matrix, the multiplication generating an input vector partial sum.
Step 1: YES. The claim is directed to a recurrent neural network core comprising an input buffer, weight buffer, and a MAC unit. Since the MAC unit is a well-known hardware component, the claim as a whole is directed to hardware and, therefore, is a statutory category under 35 U.S.C. 101.
Step 2A Prong 1: YES. 
Limitation [3] recites “multiple the fold of the input matrix by the corresponding fold of the weight matrix, the multiplication generating an input vector partial sum.” This amounts to a mathematical calculation and, therefore, falls under the “Mathematical Concepts” grouping of abstract ideas.
Step 2A Prong 2: NO. Limitation [1] recites an input buffer at a high level of generality. Similarly, limitation [2] recites a weight buffer at a high level of generality. Finally, limitation [3] recites a MAC unit at a high level of generality. Generic computer components recited at a high level of generality do not integrate an abstract idea into a practical application and may be considered mere instructions to “apply” the abstract idea. See MPEP 2106.05(f). The remaining portion of limitation [1] describes receiving an input matrix from a memory and storing the input matrix. However, data gathering by itself is considered insignificant extra-solution activity and does not integrate the exception into a practical application. See MPEP 2106.05(g). The type or source of the data does not cause the data gathering activity to integrate the exception into a practical application. Id. Similarly, the remaining portion of limitation [2] recites receiving a weight matrix from memory and storing the weight matrix. Accordingly, these additional limitations also do not cause integration into a practical application. Finally, the remaining portion of limitation [3] describe receiving a fold of the input matrix and a corresponding fold of the weight matrix, which are also data gathering steps. 
Step 2B: NO. As discussed in the previous step, the additional elements of each limitation recite a number of generic computer components at a high level of generality. Accordingly, these components are mere instructions to “apply” the exception and fail to provide an inventive concept. See MPEP 2106.05(f). The remaining portions of each limitation (exclusive of the abstract idea discussed in step 2A prong 1) recite receiving data. Receiving data has been found by courts as well-understood, routine, and conventional. See MPEP 2106.05(d) (Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network)). The additional element of storing the received data in memory is also well-understood, routine, and conventional. See MPEP 2106.05(d) (Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93).
Accordingly, claim 20 is ineligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claims 1-6, 11, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Silfa, Franyell, et al. "E-PUR: An energy-efficient processing unit for recurrent neural networks." Proceedings of the 27th international conference on parallel architectures and compilation techniques (2018) (“Silfa”) in view of Appleyard, Jeremy, Tomas Kocisky, and Phil Blunsom. "Optimizing performance of recurrent neural networks on gpus." arXiv preprint arXiv:1604.01946 (2016) (“Appleyard”).
Regarding claim 1, Silfa discloses [a] method comprising:
multiplying the input matrix by an input vector weight matrix, the multiplication generating input vector partial sums for a plurality of timesteps; (Silfa figure 4 equation (1):
[AltContent: oval]
    PNG
    media_image1.png
    126
    975
    media_image1.png
    Greyscale

(input vector (xt) for a current timestep (t) is multiplied by input vector weight matrix (Wix) generates a partial sum corresponding to the current timestep (t) (the input vector (xt) is interpreted as a matrix having a single column or row))
[AltContent: oval]multiplying a time-delayed hidden vector for a particular timestep by a hidden vector weight matrix, the multiplication generating a hidden vector partial sum for the particular timestep; (Silfa figure 4 equation (1): 
    PNG
    media_image2.png
    90
    694
    media_image2.png
    Greyscale

(time-delayed hidden vector(ht-1) for a particular/previous timestep (t-1) multiplied by a hidden vector weight matrix (Wih))
adding the hidden vector partial sum for the particular timestep to the input vector partial sum for the particular timestep, the adding generating a full sum for the particular timestep; and (Silfa figure 4 equation (1):
[AltContent: oval]
    PNG
    media_image2.png
    90
    694
    media_image2.png
    Greyscale

(hidden vector partial (Wihht-1) for the particular timestep (t) is added to the input vector partial sum (Wixxt) for the particular timestep to generate a full sum (it) for the particular timestep (t))
processing the full sum for the particular timestep, the processing generating a time-delayed hidden vector for a next time step (Silfa figure 4 equations (4)-(6):
[AltContent: oval]
    PNG
    media_image3.png
    164
    668
    media_image3.png
    Greyscale

(the full sum (it) is used to generate ct (LSTM cell state) in equation (4) which is then used to generate the output gate (ot) which is then used to generate the output (ht) of the LSTM cell for the current/next timestep (t)).
Silfa does not expressly disclose batching a plurality of input vectors to form an input matrix; (but see Appleyard pg. 3 Section 2.3 (“A single recurrent layer comprises many cells, the recurrent input of each depending on the output of the previous. The input from the previous layer may not have such a dependency and it is often possible to concatenate the inputs for multiple time steps producing a larger, more efficient, matrix multiplication.)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silfa to incorporate the teachings of Appleyard to concatenate the input vectors for multiple timesteps to form an input matrix to be multiplied to the weight matrix, at least because doing so would produce a larger, more efficient, matrix multiplication.

Regarding claim 2, Silfa, in view of Appleyard, discloses the invention of claim 1 as discussed above. Silfa further discloses repeating the steps of:
multiplying the time-delayed hidden vector by the hidden vector weight matrix; (Silfa figure 4 equation (1):
[AltContent: oval]
    PNG
    media_image2.png
    90
    694
    media_image2.png
    Greyscale

adding the hidden vector partial sum to the input vector partial sum; and (Silfa figure 4 equation (1):
[AltContent: oval]
    PNG
    media_image2.png
    90
    694
    media_image2.png
    Greyscale

processing the full sum for the particular timestep; (Silfa figure 4 equations (3)-(6):

    PNG
    media_image4.png
    236
    630
    media_image4.png
    Greyscale

	(the full sum (it) for the particular timestep (t) is processing by the cell state (ct) which is in turn used to process the output gate (ot) and the output (ht) of the LSTM cell))
for each timestep in a time sequence until a full sum for each timestep in the time sequence is generated (Silfa pg. 4 Section 2 (“In addition, RNNs are recurrently executed for every element in the input sequence and, hence, they can handle variable length input/output, which is a requirement for sequence processing problems.)).

Regarding claim 3, Silfa, in view of Appleyard, discloses the invention of claim 1 as discussed above. Silfa further discloses wherein the input matrix comprises a plurality of input matrix folds and the input vector weight matrix comprises a plurality of input vector weight matrix folds (Silfa pg. 6 Section 3.3.1 The Dot Product Unit (“The DPU performs a integer (INT) dot product between two vectors of length M by splitting them into K sub-vectors of size N. On each cycle, this unit executes the following steps. First, two size N sub-vectors are loaded from two different on-chip scratchpad memories: the Weight Buffer and the Input Buffer. The former keeps all the synaptic weights of a given layer. The latter stores either the input vector xt or the previous output vector ht−1 of the layer being evaluated. Next, the N-element INT Multiplier performs an element-wise multiplication of the two sub-vectors. Then, the resulting vector is sent to the INT N-element Reduction Adder, in order to sum together all its elements, which takes log2(N) cycles. Finally, the resulting value is added to the value stored in a register called Accumulator (i.e. 24 bits), which accumulates the partial dot product until the results of all K subvectors are added together.”)).

Regarding claim 4, Silfa, in view of Appleyard, discloses the invention of claim 3 as discussed above. Silfa further discloses wherein multiplying the input matrix by the input vector weight matrix comprises multiplying each fold of the input matrix by a corresponding fold of the input vector weight matrix (Silfa pg. 6 Section 3.3.1 The Dot Product Unit (“The DPU performs a integer (INT) dot product between two vectors of length M by splitting them into K sub-vectors of size N. On each cycle, this unit executes the following steps. First, two size N sub-vectors are loaded from two different on-chip scratchpad memories: the Weight Buffer and the Input Buffer. The former keeps all the synaptic weights of a given layer. The latter stores either the input vector xt or the previous output vector ht−1 of the layer being evaluated. Next, the N-element INT Multiplier performs an element-wise multiplication of the two sub-vectors. Then, the resulting vector is sent to the INT N-element Reduction Adder, in order to sum together all its elements, which takes log2(N) cycles. Finally, the resulting value is added to the value stored in a register called Accumulator (i.e. 24 bits), which accumulates the partial dot product until the results of all K subvectors are added together.”)).

Regarding claim 5, Silfa, in view of Appleyard, discloses the invention of claim 1 as discussed above. Silfa further discloses wherein the multiplication of the input matrix by the input vector weight matrix and the multiplication of the time-delayed hidden vector by the hidden vector weight matrix are performed by a multiply-accumulate (MAC) unit (Silfa pg. 6 Section 3.3.1 The Dot Product Unit (“The DPU performs a integer (INT) dot product between two vectors of length M by splitting them into K sub-vectors of size N. On each cycle, this unit executes the following steps. First, two size N sub-vectors are loaded from two different on-chip scratchpad memories: the Weight Buffer and the Input Buffer. The former keeps all the synaptic weights of a given layer. The latter stores either the input vector xt or the previous output vector ht−1 of the layer being evaluated. Next, the N-element INT Multiplier performs an element-wise multiplication of the two sub-vectors. Then, the resulting vector is sent to the INT N-element Reduction Adder, in order to sum together all its elements, which takes log2(N) cycles. Finally, the resulting value is added to the value stored in a register called Accumulator (i.e. 24 bits), which accumulates the partial dot product until the results of all K subvectors are added together.”)).

Regarding claim 6, Silfa, in view of Appleyard, discloses the invention of claim 5 as discussed above. Silfa further discloses receiving, at the MAC unit, the input vector weight matrix or the hidden vector weight matrix (Silfa figure 8 Structure of Computation Unit:

    PNG
    media_image5.png
    630
    778
    media_image5.png
    Greyscale
.
(The Dot Product Unit receives the input weight matrix and the input vectors)).

Regarding claim 11, Silfa, in view of Appleyard, discloses the invention of claim 1 as discussed above. Silfa further discloses receiving the input matrix, the input vector weight matrix, and the hidden vector weight matrix from a memory (Silfa figure 8:

    PNG
    media_image6.png
    561
    693
    media_image6.png
    Greyscale

(weight and input buffers supply the weight and input matrices)).

Regarding claim 12, Silfa, in view of Appleyard, discloses the invention of claim 1 as discussed above. Silfa further discloses wherein the memory is a dynamic random-access memory (DRAM) (Silfa pg. 9 Section 4 Evaluation Methodology (“We model 4GB LPDDR4 DRAM.”)).

Regarding claim 20, Silfa discloses [a] recurrent neural network (RNN) core comprising: (Silfa figure 8; Structure of Computation Unit; Silfa pg. 6 Section 3.3 Computation Unit (“The Computation Unit is the hardware structure that implements the formal model of an LSTM cell, described in Figure 4. It is composed of two main components: the Dot Product Unit (DPU) and the Multifunctional Unit (MU).”))
an input buffer configured to receive an input matrix from a memory and store the input matrix, the input matrix including a plurality of input vectors; (Silfa figure 8 (Dot-Product Unit receives input matrix (xt) from the input buffer (xt is interpreted as a single column matrix)). Silfa does not expressly disclose the input matrix including a plurality of input vectors (but see Appleyard pg. 3 Section 2.3 (“A single recurrent layer comprises many cells, the recurrent input of each depending on the output of the previous. The input from the previous layer may not have such a dependency and it is often possible to concatenate the inputs for multiple time steps producing a larger, more efficient, matrix multiplication.)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silfa to incorporate the teachings of Appleyard to concatenate the input vectors xt for multiple timesteps to form an input matrix to be multiplied to the weight matrix, at least because doing so would produce a larger, more efficient, matrix multiplication.
Silfa further discloses a weight buffer configured to receive a weight matrix from the memory and store the weight matrix; and (Silfa figure 8 (Weight Buffer receives input vector weight matrix (Wx) and hidden vector weight matrix (Wh)))
a multiply-accumulate (MAC) unit coupled to the input buffer and the weight buffer, (Silfa figure 8 Dot Product Unit is coupled to the input buffer and the weight buffer and includes an INT N-Multiplier, INT N-Adder Reduction, and Accumulator) (Multiply-Accumulate (MAC) is the fundamental hardware/algorithmic step (a X b + c) used to compute each term within a dot product) the MAC unit configured to receive a fold of the input matrix and a corresponding fold of the weight matrix and to multiply the fold of the input matrix by the corresponding fold of the weight matrix, the multiplication generating an input vector partial sum (Silfa pg. 6 Section 3.3.1 The Dot Product Unit (“The DPU performs a integer (INT) dot product between two vectors of length M by splitting them into K sub-vectors of size N. On each cycle, this unit executes the following steps. First, two size N sub-vectors are loaded from two different on-chip scratchpad memories: the Weight Buffer and the Input Buffer. The former keeps all the synaptic weights of a given layer. The latter stores either the input vector xt or the previous output vector ht−1 of the layer being evaluated. Next, the N-element INT Multiplier performs an element-wise multiplication of the two sub-vectors. Then, the resulting vector is sent to the INT N-element Reduction Adder, in order to sum together all its elements, which takes log2(N) cycles. Finally, the resulting value is added to the value stored in a register called Accumulator (i.e. 24 bits), which accumulates the partial dot product until the results of all K subvectors are added together.”)).

Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Silfa and Appleyard as applied to claim 6 above, and further in view of Liu (US 2023/0196068 A1; published Jun. 22, 2023).
Regarding claim 7, Silfa, in view of Appleyard, discloses the invention of claim 6 as discussed above. Although Silfa teaches that the Dot Product Unit receives the input vector weight matrix Wx and the input vector matrix xt, Silfa does not expressly disclose based on a reception of the input vector weight matrix at the MAC unit, selecting the input matrix for multiplication by the input vector weight matrix; and based on a reception of the hidden vector weight matrix at the MAC unit, selecting the time-delayed hidden vector for multiplication by the hidden vector weight matrix (but see Liu ¶ 61 (“The technical solution provided by the embodiment of the present disclosure is applied. Considering that the calculation of the gate structures occupies most of the calculations of the whole RNN, which are mainly the calculation of matrix and vector multiplication, the present disclosure is provided with the vector multiplication circuit including the N groups of multiplication arrays, and each group of multiplication arrays includes k multiplication units, which is beneficial to increasing the calculation speed. Considering that in the traditional technical solution, Wxxt and Whht-1 are combined together and calculated, and when the dimension of xt or ht-1 is relatively large, the calculation speed will be very slow. In the technical solution of the present disclosure, Wxxt and Whht-1 are calculated in time-sharing and segmentation modes, that is, it is not necessary to wait until all the values of Wxxt and Whht-1 are generated before accumulating, which is beneficial to further improving the acceleration effect of the technical solution. The first cache is configured to circularly switch between the first state and the second state, output Wx1 to WxN in N paths in the first state in parallel with all degrees of parallelism of k, and output Wh1 to WhN in N paths in the second state in parallel with all degrees of parallelism of k, wherein N is a positive integer greater than or equal to 2. The second cache is configured to circularly switch between the first state and the second state, output xt in the first state, and output ht-1 in the second state. The vector multiplication circuit is configured to use the N groups of multiplication arrays to respectively calculate Wx1xt to WxNxt when receiving Wx1 to WxN output by the first cache, and use the N groups of multiplication arrays to respectively calculate Wh1ht-1 to WhN ht-1 when receiving Wh1 to WhN output by the first cache.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silfa and Appleyard to incorporate the teachings of Liu to select xt for multiplication with the input vector weight matrix when the input buffer is in a first state and to select ht-1 for multiplication with the hidden vector weight matrix when the input buffer is in a second state, at least because doing so would increase the calculation speed.

Regarding claim 8, Silfa, in view of Appleyard and Liu, discloses the invention of claim 7 as discussed above. As explained above, Silfa does not expressly disclose wherein the selection of the input matrix and the selection of the time-delayed hidden vectors are performed by a first selection device (but see Liu ¶ 32 (“a first memory, a second memory, a third memory and a fourth memory, all connected to the first multiplexer through a data classifier, configured to output Wxi, Wxf, Wxo, and Wxc sequentially in parallel with all degrees of parallelism of k when the first multiplexer is in the first state, and configured to output Whi, Whf, Who, and Whc sequentially in parallel with all degrees of parallelism of k when the first multiplexer is in the second state”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silfa and Appleyard to incorporate the teachings of Liu to select xt for multiplication with the input vector weight matrix when the input buffer is in a first state and to select ht-1 for multiplication with the hidden vector weight matrix when the input buffer is in a second state, at least because doing so would increase the calculation speed.

Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Silfa and Appleyard as applied to claim 1 above, and further in view of Cammarota (US 2019/0325294 A1; published Oct. 24, 2019).
Regarding claim 9, Silfa, in view of Appleyard, discloses the invention of claim 1 as discussed above. Although both Silfa and Appleyard relate to recurrent neural network training, they do not expressly disclose wherein the time-delayed hidden vectors comprise an initial vector, the initial vector being a random vector (but see Cammarota ¶ 53 (“In operation, the RNN/LSTM (e.g., RNN/LSTM 340) may be used to determine an inference with respect to a given input. In one example, the input may be a sequence of audio data and the RNN/LSTM may be trained for speech recognition. The audio data may be divided into portions or chunks and supplied to the RNN/LSTM 340 as x=[x.sub.1.sup.1 . . . x.sub.1.sup.T]. For instance, each portion may correspond to a word within the sequence of audio data. Cell[1,1] may receive input x.sub.1.sup.1 along with the initial memory state c.sub.0.sup.1 and initial hidden state h.sub.0.sup.1. In some aspects, the initial hidden state and the initial memory state may be initialized to a predefined value (e.g., 0), a random value or other initial value.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silfa and Appleyard to incorporate the teachings of Cammarota to initialize the hidden state of the RNN to random values, at least because doing so may enable improved training speed and generalization.

Regarding claim 10, Silfa, in view of Appleyard, discloses the invention of claim 1 as discussed above. Silfa and Appleyard do not expressly disclose combining the input vector weight matrix and the hidden vector weight matrix to form a combined weight matrix (but see Cammarota ¶ 53 (“In operation, the RNN/LSTM (e.g., RNN/LSTM 340) may be used to determine an inference with respect to a given input. In one example, the input may be a sequence of audio data and the RNN/LSTM may be trained for speech recognition. The audio data may be divided into portions or chunks and supplied to the RNN/LSTM 340 as x = [x11 . . . x1T]. For instance, each portion may correspond to a word within the sequence of audio data. Cell [1,1] may receive input x11 along with the initial memory state c01 and initial hidden state h01. In some aspects, the initial hidden state and the initial memory state may be initialized to a predefined value (e.g., 0), a random value or other initial value. A first processor may execute cell [1,1] to compute the input x12. For example, the first processor may concatenate the received input x11 and the initial hidden state h01, the result of which may be scaled via matrix multiplication based on the input gate parameters
                
                    
                        
                            
                                
                                    
                                        
                                            
                                            W
                                            x
                                            i
                                        
                                    
                                
                                
                                    
                                        
                                            
                                            W
                                            x
                                            i
                                        
                                    
                                
                            
                        
                    
                
            .
R(D+H)xH.”) (The dimensions of the combined weight matrix reflect the number of rows of the input vector matrix (D) and the number of rows of the hidden weight matrix (H))).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silfa and Appleyard to incorporate the teachings of Cammarota to combine the input gate weight matrix and the hidden state weight matrix, at least because doing so would enable concurrently executing cells of the RNN/LSTM. See Cammarota ¶¶ 10-12.

Claims 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over Silfa in view of Liu and Appleyard.
Regarding claim 13, Silfa discloses [a] neural network comprising:
a multiply-accumulate (MAC) unit configured to: (Silfa figure 8 Structure of Computation Unit:

    PNG
    media_image6.png
    561
    693
    media_image6.png
    Greyscale
 
(Dot Product Unit includes an INT N-Multiplier, INT N-Adder Reduction, and Accumulator) (Multiply-Accumulate (MAC) is the fundamental hardware/algorithmic step (a X b + c) used to compute each term within a dot product))
receive an input matrix and an input vector weight matrix; (Silfa figure 8 (Dot-Product Unit receives input vector weight matrix (Wx) and input matrix (xt) (xt is interpreted as a single column matrix))
[AltContent: oval]multiply the input matrix by the input vector weight matrix, the multiplication generating input vector partial sums; (Silfa figure 8 (INT N-Multiplier multiplies Wx and xt); 
    PNG
    media_image1.png
    126
    975
    media_image1.png
    Greyscale

(input vector (xt) for a current timestep (t) is multiplied by input vector weight matrix (Wix) generates a partial sum corresponding to the current timestep (t) (the input vector (xt) is interpreted as a matrix having a single column or row)))
receive time-delayed hidden vectors and a hidden vector weight matrix; and (Silfa figure 8 (Dot Product Unit receives time delayed hidden vectors (ht-1) and hidden vector weight matrix (Wh)))
[AltContent: oval]multiply the time-delayed hidden vectors and the hidden vector weight matrix, the multiplication generating hidden vector partial sums; (Silfa figure 8 (INT N-Multiplier multiplies Wx and xt); Silfa figure 4 equation (1): 
    PNG
    media_image2.png
    90
    694
    media_image2.png
    Greyscale

(time-delayed hidden vector(ht-1) for a particular/previous timestep (t-1) multiplied by a hidden vector weight matrix (Wih))
an accumulator coupled to the MAC unit, the accumulator configured to accumulate and add the input vector partial sums and the hidden vector partial sums, the addition generating a plurality of full sum vectors; (Silfa figure 8 (Dot Product Unit includes an INT N-Multiplier, INT N-Adder Reduction, and Accumulator); Silfa pg. 6 Section 3.3.1 The Dot Product Unit (“The DPU performs a integer (INT) dot product between two vectors of length M by splitting them into K sub-vectors of size N. On each cycle, this unit executes the following steps. First, two size N sub-vectors are loaded from two different on-chip scratchpad memories: the Weight Buffer and the Input Buffer. The former keeps all the synaptic weights of a given layer. The latter stores either the input vector xt or the previous output vectorht−1 of the layer being evaluated. Next, the N-element INT Multiplier performs an element-wise multiplication of the two sub-vectors. Then, the resulting vector is sent to the INT N-element Reduction Adder, in order to sum together all its elements, which takes log2(N) cycles. Finally, the resulting value is added to the value stored in a register called Accumulator (i.e. 24 bits), which accumulates the partial dot product until the results of all K subvectors are added together.”)))
wherein the neural network is configured to generate the time-delayed hidden vectors based on the plurality of full sum vectors; and (Silfa figure 4 equation (1):
[AltContent: oval]
    PNG
    media_image2.png
    90
    694
    media_image2.png
    Greyscale

(hidden vector partial (Wihht-1) for the particular timestep (t) is added to the input vector partial sum (Wixxt) for the particular timestep to generate a full sum (it) for the particular timestep (t))
a first selection device coupled to the MAC unit, the first selection device configured to select between the input matrix and the time-delayed hidden vectors for reception at the MAC unit (Silfa figure 8 (As illustrated, the computation unit is designed such that the Dot Product Unit receives and performs either the multiplication of the input matrix (xt) and input vector weight matrix (Wx) or the multiplication of the time-delayed hidden vectors (ht-1) and the hidden vector weight matrix (Wh)).
Silfa does not expressly disclose a first selection device coupled to the MAC unit, the first selection device configured to select between the input matrix and the time-delayed hidden vectors for reception at the MAC unit (but see Liu (US 2023/0196068 A1; published Jun. 22, 2023) ¶ 109 (“According to the present disclosure, Wx includes Wxi, Wxf, Wxo, and Wxc, the vector multiplication circuit 30 includes four groups of multiplication arrays, and therefore, the data classifier 104 is needed for classification, that is, Wxi, Wxf, Wxo, and Wxc output by the first multiplexer 103 need to be transmitted to different multiplication arrays. Wh is similar to Wx in terms of processing manner. In an embodiment shown in FIG. 4, the first memory 105, the second memory 106, the third memory 107, and the fourth memory 108 are all first-in-first-out (FIFO) memories. FIFO-Wi 105 in FIG. 4 represents the first memory 105 and is configured to output Wxi and Whi.”); ¶ 111 (“It can be seen that according to the present disclosure, when the calculations of the first four formulas are performed, Wx and Wh are provided by the first cache 10 in a time-sharing mode, and xt and ht-1 are provided by the second cache 20 in a time-sharing mode.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silfa to incorporate the teachings of Liu to provide the input matrix (xt) and the time-delayed hidden vectors (ht-1) in a time-sharing mode, at least because doing so would enable the Dot Product unit to accelerate the recurrent neural network. See Liu ¶ 8.
Silfa does not expressly disclose an input matrix; (but see Appleyard pg. 3 Section 2.3 (“A single recurrent layer comprises many cells, the recurrent input of each depending on the output of the previous. The input from the previous layer may not have such a dependency and it is often possible to concatenate the inputs for multiple time steps producing a larger, more efficient, matrix multiplication.)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silfa to incorporate the teachings of Appleyard to concatenate the input vectors for multiple timesteps to form an input matrix to be multiplied to the weight matrix, at least because doing so would produce a larger, more efficient, matrix multiplication.

Regarding claim 14, Silfa, in view of Liu and Appleyard, discloses the invention of claim 13 as discussed above. Silfa further discloses wherein multiplying the input matrix by the input vector weight matrix comprises:
loading one of a plurality of folds of the input vector weight matrix into the MAC unit; (Silfa pg. 6 Section 3.3.1 The Dot Product Unit (“The DPU performs a integer (INT) dot product between two vectors of length M by splitting them into K sub-vectors of size N. On each cycle, this unit executes the following steps. First, two size N sub-vectors are loaded from two different on-chip scratchpad memories: the Weight Buffer and the Input Buffer. The former keeps all the synaptic weights of a given layer. The latter stores either the input vector xt or the previous output vector ht−1 of the layer being evaluated.”))
serially multiplying each of a plurality of folds of the input vector by the fold of the input vector weight matrix; (Silfa pg. 6 Section 3.3.1 The Dot Product Unit (“The DPU performs a integer (INT) dot product between two vectors of length M by splitting them into K sub-vectors of size N. On each cycle, this unit executes the following steps. First, two size N sub-vectors are loaded from two different on-chip scratchpad memories: the Weight Buffer and the Input Buffer. The former keeps all the synaptic weights of a given layer. The latter stores either the input vector xt or the previous output vector ht−1 of the layer being evaluated. Next, the N-element INT Multiplier performs an element-wise multiplication of the two sub-vectors.”))
loading a next fold of the plurality of folds of the input vector weight matrix into the MAC unit; and (Silfa pg. 6 Section 3.3.1 The Dot Product Unit (“The DPU performs a integer (INT) dot product between two vectors of length M by splitting them into K sub-vectors of size N. On each cycle, this unit executes the following steps. First, two size N sub-vectors are loaded from two different on-chip scratchpad memories: the Weight Buffer and the Input Buffer. The former keeps all the synaptic weights of a given layer. The latter stores either the input vector xt or the previous output vector ht−1 of the layer being evaluated. Next, the N-element INT Multiplier performs an element-wise multiplication of the two sub-vectors. Then, the resulting vector is sent to the INT N-element Reduction Adder, in order to sum together all its elements, which takes log2(N) cycles. Finally, the resulting value is added to the value stored in a register called Accumulator (i.e. 24 bits), which accumulates the partial dot product until the results of all K subvectors are added together.”))
repeating the serial multiplication and the loading of the next fold until the entire input vector has been multiplied by the input vector weight matrix (Silfa pg. 6 Section 3.3.1 The Dot Product Unit (“The DPU performs a integer (INT) dot product between two vectors of length M by splitting them into K sub-vectors of size N. On each cycle, this unit executes the following steps. First, two size N sub-vectors are loaded from two different on-chip scratchpad memories: the Weight Buffer and the Input Buffer. The former keeps all the synaptic weights of a given layer. The latter stores either the input vector xt or the previous output vector ht−1 of the layer being evaluated. Next, the N-element INT Multiplier performs an element-wise multiplication of the two sub-vectors. Then, the resulting vector is sent to the INT N-element Reduction Adder, in order to sum together all its elements, which takes log2(N) cycles. Finally, the resulting value is added to the value stored in a register called Accumulator (i.e. 24 bits), which accumulates the partial dot product until the results of all K subvectors are added together.”)).

Regarding claim 15, Silfa, in view of Liu and Appleyard, discloses the invention of claim 13 as discussed above. Silfa further discloses an input buffer coupled to the first selection device, the input buffer configured to store the input matrix; and (Silfa figure 8 (Input Buffer stores the input vectors xt/ht-1))
a weight buffer coupled to the MAC unit, the weight buffer configured to store the input vector weight matrix and the hidden vector weight matrix (Silfa figure 8 (Weight Buffer stores the input weight matrix (Wx) and hidden vector weight matrix (Wh)).

Regarding claim 16, Silfa, in view of Liu and Appleyard, discloses the invention of claim 13 as discussed above. Silfa further discloses an activation function device configured to apply an activation function to the full sum vectors (Silfa pg. 7 Section 3.3.2 The Multifunctional Unit (“The MUs for the input and forget gates perform very similar operations: they perform the multiplications for peephole connections and add the bias. Next, they apply the sigmoid function to the result. After this, the resulting value is sent to the MU of the cell updater gate, which uses this information to proceed with the computation of the cell state, i.e. ct, and, then, it applies the hyperbolic tangent function to this value. Once this information is computed, it is sent to the MU of the output gate, which computes and quantized the kth element of the output vector, i.e. ht, corresponding to the current element of the input sequence (i.e. xt).”)).

Regarding claim 17, Silfa, in view of Liu and Appleyard, discloses the invention of claim 13 as discussed above. Silfa further discloses an activation buffer coupled to the first selection device, the activation buffer configured to store the time-delayed hidden vectors (Silfa figure 8 (Input Buffer stores the time delayed hidden vectors (ht-1)).

Claims 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Silfa, Liu, and Appleyard as applied to claim 17 above, and further in view of Cammarota.
Regarding claim 18, Silfa, in view of Liu and Appleyard, discloses the invention of claim 17 as discussed above. Silfa does not expressly disclose wherein the time-delayed hidden vectors comprise an initial vector, the initial vector being a random vector but see Cammarota ¶ 53 (“In operation, the RNN/LSTM (e.g., RNN/LSTM 340) may be used to determine an inference with respect to a given input. In one example, the input may be a sequence of audio data and the RNN/LSTM may be trained for speech recognition. The audio data may be divided into portions or chunks and supplied to the RNN/LSTM 340 as x=[x.sub.1.sup.1 . . . x.sub.1.sup.T]. For instance, each portion may correspond to a word within the sequence of audio data. Cell[1,1] may receive input x.sub.1.sup.1 along with the initial memory state c.sub.0.sup.1 and initial hidden state h.sub.0.sup.1. In some aspects, the initial hidden state and the initial memory state may be initialized to a predefined value (e.g., 0), a random value or other initial value.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silfa and Appleyard to incorporate the teachings of Cammarota to initialize the hidden state of the RNN to random values, at least because doing so may enable improved training speed and generalization.

Regarding claim 19, Silfa, in view of Liu, Appleyard, and Cammarota, discloses the invention of claim 18 as discussed above. Although Silfa describes the computational unit of an LSTM/RNN cell as including an input buffer to hold recurrent time-delayed state vectors (ht-1) that supplies inputs to the Dot Product Unit and the Multifunctional Computation Unit that performs various activation functions, Silfa does not expressly disclose a second selection device coupled to the activation buffer and the activation function device, the second selection device configured to select between the initial vector and a separate time-delayed hidden vector for reception at the activation buffer (but see Liu ¶ 124 (“The third multiplexer 206 selects h0 only at the first selection, and h0 represents the hidden state data at time t=1, that is, the hidden state data h0 for a first time step is from off-chip storage, and the hidden state data for other time steps is from the state updating circuit 60.”); Liu ¶ 117 (“a third multiplexer 206, configured to obtain h0 from the off-chip storage, receive ht sent by the state updating circuit 60, and select h0 only at the first selection, wherein h0 represents hidden state data at time t=1.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silfa to incorporate the teachings of Liu to include a multiplexer coupled to the input buffer and the Multifunctional Unit to select between ho only at the first selection, at least because doing so would enable the initialization of the LSTM/RNN cell at time t=0.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Holmes, Connor, et al. "Grnn: Low-latency and scalable rnn inference on gpus." Proceedings of the Fourteenth EuroSys Conference 2019. 2019.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHID KHAN whose telephone number is (571)270-0419. The examiner can normally be reached M-F, 9-5 est.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached at (571)272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SHAHID K KHAN/Primary Examiner, Art Unit 2146
Read full office action
Prosecution Timeline

Apr 18, 2023
Application Filed
Mar 12, 2026
Non-Final Rejection mailed — §101, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/492,294
Patent 12626095
MACHINE LEARNING OUTPUTS WITH HIGH CONFIDENCE EXPLANATIONS
4y 7m to grant Granted May 12, 2026
17/411,410
Patent 12619683
ARTIFICIAL INTELLIGENCE (AI) BASED DATA MATCHING AND ALIGNMENT
4y 8m to grant Granted May 05, 2026
17/807,290
Patent 12591768
DEEP LEARNING ACCELERATION WITH MIXED PRECISION
3y 9m to grant Granted Mar 31, 2026
18/675,206
Patent 12579516
System and Method for Organizing and Designing Comment
1y 9m to grant Granted Mar 17, 2026
18/525,525
Patent 12566813
SYSTEMS AND METHODS FOR RENDERING INTERACTIVE WEB PAGES
2y 3m to grant Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

1-2
Expected OA Rounds
74%
Grant Probability
90%
With Interview (+15.4%)
2y 10m (~0m remaining)
Median Time to Grant
Low
PTA Risk
Based on 394 resolved cases by this examiner. Grant probability derived from career allowance rate.