Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
The claim amendments of 11/25/2025 have overcome the 35 U.S.C. 101 rejection of claim 25 as directed to non-statutory subject matter. As a result, the 35 U.S.C. 101 rejection of claim 25 as directed to non-statutory subject matter has been withdrawn.
The Applicant’s argument on page 8, “the neural processing unit comprising a multiplication accumulation engine configured to calculate at least one of the first score matrix and the second score matrix. Figure 7 in the instant application shows an NPU 106 and a multiplication accumulation (MAC) engine 116 and paragraph [0009] explains: ‘Figure 7 shows a hardware schematic, and instruction and data flow, between hardware elements.’ Both of the NPU and the MAC engine are structure to perform the recited functions. See also the NPU 206 and MAC engine 216 in Figure 9 and paragraph [0011]. Thus, the claimed NPU and MAC engine are structural hardware elements,” is persuasive. As a result, the 35 U.S.C. 112(f) interpretation has been withdrawn.
The claim amendments of 11/25/2025 have overcome the 112(b) rejection. As a result, the 112(b) rejection has been withdrawn.
On pages 8-9 of the remarks, the Applicant argued that “Claim 1 describes a specific neural processing unit, which is hardware and not a mathematical concept, that further includes specialized hardware-a multiplication accumulation engine. The claimed NPU including a MAC engine is specialized hardware configured to solve technical problems in how neural processing units process dynamically generated information. The specification, in paragraphs [0016-18], explains specific technical problems with some neural processing units (NPUs), such as those in the Ethos-U family of NPUs from ARM™. These NPUs are unable to multiply together two matrices which are both dynamically generated at runtime (for example, input feature maps and activations in the case of convolutional neural networks). Instead, such NPUs require, by virtue of their architecture, at least one of the matrices to have been pre-calculated”.
On page 9 of the remarks, the Applicant argued that “During machine learning inference, when such an NPU multiply two dynamically generated matrices, the NPU must offload this multiplication step to another processing element capable of performing the multiplication, such as a central processing unit (CPU). As NPUs are designed to perform matrix multiplication ("matmul") calculations more quickly than CPUs, this offloading results in a relative slowdown of the inference process. This slowdown in the inference process is a real-world technical problem”.
On page 9 of the remarks, the Applicant argued that “Paragraph [0030] of the specification points out that as some NPUs have to offload parts of the process of calculating the attention mechanism to a CPU capable of, but not specialized in, performing matmul of two input-dependent, dynamically generated matrices. Because CPUs are slower than NPUs at performing matmul calculations, and because this offloading requires multiple writes and reads to memory so that the NPU and a CPU for example can communicate the necessary data to one another, the offloading process is not optimal. Ways of increasing inference speed and efficiency, while using less memory bandwidth and electrical power, are desirable and needed”.
The arguments above are not persuasive because the “specialized hardware” (the multiplication accumulation engine) in the neural processing unit and the dynamically generated matrices that, as argued by the Applicant, solve the slowdown of the inference process are not implemented in a practical application that realizes the claimed improvement in the technological field.
On pages 9-10 of the remarks, the Applicant argued that “The claimed technology solves these technical problems and improves on the above NPUs by increasing inference speed and efficiency and by using less memory bandwidth and electrical power. Claim 1 defines a new type of neural processing unit (NPU) for calculating an attention mechanism, with the NPU specifically comprising a multiplication accumulation engine configured to calculate at least one of the first score matrix and the second score matrix. Because the claimed NPU contains a multiplication accumulation engine that computes at least one score matrix on-chip, it can reduce CPU offloading and can provide improved inference speed, memory-bandwidth use, and power consumption when those on-chip computations replace work that would otherwise run on a CPU. Integrating a hardware multiplication accumulation engine is integrated into the hardware of the NPU to internally handle calculations, particularly for dynamically generated matrices, is not abstract and significantly more than mathematical concepts”.
On page 11 of the remarks, the Applicant argued that “Similarly, the technology in claim 1 also integrates matrix multiplications into a specialized hardware element-a hardware NPU that includes a hardware MAC engine which allows the claimed NPU to internally handle multiplication of dynamically generated matrices in a much more efficient manner”.
The above argument is not persuasive because the increased inference speed and efficiency, and the reduced memory bandwidth and electrical power, argued by the Applicant are not implemented in any application in the technological field in the claimed invention. The claimed invention needs to be applied in order to achieve the improvement above.
On page 11 of the remarks, the Applicant argued that “Applying that reasoning under Step 2A - Prong 2 to claim 1 of this application, the new NPU with a MAC integrates the math performed into a practical application where inference processes can be performed much faster and more efficiently … Similar arguments apply to all of the independent claims. Thus, the independent claims integrate any "mathematical concepts" idea into a practical application”.
The above argument is not persuasive because the neural processing unit (NPU) comprising a multiplication accumulation engine (MAC), which are additional elements, performs the abstract idea of an inference process, i.e., calculation of an attention matrix. The additional elements do not integrate the abstract idea into a practical application because the neural processing unit (NPU), including the multiplication accumulation engine (MAC), that performs the abstract idea of attention matrix calculation is not applied to applications in the technological field.
On pages 11-12 of the remarks, the Applicant argued that “Claim 1 defines a neural processing unit (NPU) for calculating an attention mechanism, with the NPU specifically comprising a multiplication accumulation engine configured to calculate at least one of the first score matrix and the second score matrix. Because the claimed NPU contains a multiplication accumulation engine that computes at least one score matrix on-chip, it can reduce some CPU offloading and can provide improved inference speed, memory-bandwidth use, and power consumption when those on-chip computations replace work that would otherwise run on a CPU. This specific solution, where a multiplication accumulation engine is integrated into the NPU to internally handle calculations, particularly for dynamically generated matrices, is not taught or suggested by the cited prior art”.
The above argument is not persuasive because Tu in view of Delp and Yang teaches the limitations of claim 1 as detailed in the office action.
On page 13 of the remarks, the Applicant argued that “Delp also fails to disclose at least a neural processing unit comprising a multiplication accumulation engine configured to calculate at least one of the first score matrix and the second score matrix. Although Delp discloses calculating and combining similarity matrices, they are based on visual and nutritional features for food classification, not on query, key, and learned key differences as defined for the "first score matrix" and "second score matrix" in the present claims. Thus, Delp also fails to disclose calculating a similarity matrix based on a combination of these specific types of score matrices recited in claim 1”.
The above argument is not persuasive because Tu already teaches calculating a first score matrix based on differences between a query matrix and a key matrix (Step 302: Calculate a logical similarity degree between a query vector sequence and a key vector sequence in a current subspace [0059]; The query vector sequence Q, the key vector sequence K, and the value vector sequence V are matrices of I×d [0041]; calculating the logical similarity degree between the query vector sequence and the key vector sequence in the current subspace by using a Euclidean distance [0060]) and calculating a second score matrix based on differences between the key matrix and a learned key matrix (Step 206: Calculate a space difference degree between the subspaces by using the neural network model [0046]; The space difference degree is used for measuring a difference between the subspaces [0047]; … n subspaces including a key vector sequence (abstract); the key vector sequence K … is a matrix of I×d [0041]; The learnable parameter matrices WiK of the ith subspace are matrices of d×d [0084]. The Examiner notes that the space difference degree between the subspaces involves a key vector sequence (key matrix) and the learnable parameter matrices WiK (learned key matrix), and the instant specification discloses that the key projection matrix Wk comprises elements which are learned [0034]). The above teachings indicate that Tu already teaches first and second score matrices based on the key matrix, the query matrix, and the difference between the key matrix and a learned key matrix. Delp was relied on to teach calculating a similarity matrix based on a combination of the first score matrix and the second score matrix. Fig. 13 of Delp teaches similarity between two scores, which broadly reads on the limitation “calculate a similarity matrix based on a combination of the first score matrix and second score matrix” of claim 1.
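For context, the Euclidean-distance “logical similarity degree” cited from Tu [0060] reduces to a pairwise distance computation between rows of the query and key matrices. The sketch below is illustrative only; the function name and the convention of negating distance so that smaller distances yield higher similarity are assumptions, not disclosures of Tu.

```python
import numpy as np

def logical_similarity(Q, K):
    """Illustrative sketch: pairwise Euclidean distances between the rows
    of a query vector sequence Q and a key vector sequence K (both I x d),
    negated so that a smaller distance gives a higher similarity score."""
    diff = Q[:, None, :] - K[None, :, :]        # shape (I, I, d)
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # shape (I, I)
    return -dist
```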
On page 13 of the remarks, the Applicant argued that “Tu and Delp do not teach comprising a multiplication accumulation engine configured to calculate at least one of the first score matrix and the second score matrix. The OA then relies on a third reference to Yang. Yang teaches a 4-bit multiplication accumulation engine shown in Fig. 3 that calculates a partial result stored in a register P of a weight value W and a feature value F each stored in a respective register. Each processing element (PE) "can perform 8 bit-8 bit MAC (multiply- accumulate) mode by taking four cycles in a timing-multiplexing manner." Page 511. Yang's MAC engine is disclosed in a distinctly different context: dynamic token-based quantization for mixed-precision tokens (see the Abstract). Yang's MAC engine is configured for calculations related to its quantization scheme-not for preventing offloading multiplications of dynamically generated matrices to other processing units as in claim 1. The technical problem solved by Yang is therefore quite different from the specific problem addressed by claim 1”.
The above argument is not persuasive because Yang teaches a multiplication accumulation engine (Fig. 3, pg. 511) configured to calculate at least one of the first score matrix and the second score matrix (A register P stores the partial result. Each PE can perform 8 bit-8 bit MAC (multiply-accumulate) mode by taking four cycles in a timing-multiplexing manner, (pg. 511, left col., last para.); the two matrices Q and KT are fed into the systolic PE array, pg. 514, right col., second para. The Examiner notes that the PE performs matrix calculation). These teachings of Yang, under the broadest reasonable interpretation, read on “a multiplication accumulation engine configured to calculate at least one of the first score matrix and the second score matrix”. Moreover, claim 1 does not recite any limitations relating to using the claimed MAC for “preventing offloading multiplications of dynamically generated matrices to other processing units” as argued by the Applicant. It appears the Applicant is arguing what is not claimed.
On page 13 of the remarks, the Applicant argued that “Therefore, the Examiner's generic motivation to combine for "efficiency" fails to provide a technical reason for a person of ordinary skill in the art to combine these references to arrive at claim 1's specific solution. For at least the reasons set forth above, claim 1 is non-obvious and patentable. Claims 24 and 25 are allowable for the same reasons, and the dependent claims are allowable by virtue of their dependencies”.
The above argument is not persuasive because claim 1 is obvious over Tu in view of Delp and further in view of Yang. Furthermore, a person having ordinary skill in the art would have modified the primary reference Tu with the teachings of the secondary references because the benefit of efficiency provided by the secondary references would have improved Tu’s neural network training and provided optimal performance. As a result, “efficiency” is a technical reason that is conventionally known in the art of improving neural network training.
In addition, claims 24 and 25 are similar to claim 1, and the same rationale applies. The dependent claims, which depend directly or indirectly from claims 1, 24, and 25, are not allowable because the Applicant’s arguments are not persuasive for reasons similar to those set forth above regarding claim 1.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1
Independent claim 1 is directed to a device, and falls into one of the four statutory categories.
Step 2A, Prong 1
Claim 1 recites the following abstract ideas:
calculate a first score matrix based on differences between a query matrix and a key matrix (Mathematical concepts directed to the calculation of first score matrix using the difference between query matrix and key matrix);
calculate a second score matrix based on differences between the key matrix and a learned key matrix (Mathematical concepts directed to the calculation of second score matrix using the difference between key matrix and learned key matrix);
calculate a similarity matrix based on a combination of the first score matrix and second score matrix (Mathematical concepts directed to the calculation of similarity matrix using the combination of first score matrix and second score matrix) and
calculate an attention matrix comprising applying a normalisation function to the similarity matrix (Mathematical concepts directed to the calculation of attention matrix by applying normalisation function).
to calculate at least one of the first score matrix and the second score matrix (Mathematical concepts directed to the calculation of the first and second score matrix).
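For illustration only, the recited calculations map onto straightforward array operations. The following is a minimal sketch, assuming the sum-of-absolute-differences scoring recited in dependent claims 8 and 9, a simple sum as the combination of the two score matrices, and a softmax as the normalisation function of claim 12; all function and variable names are hypothetical.

```python
import numpy as np

def attention_sketch(Q, K, K_learned):
    """Hedged sketch of the recited calculation chain (not the claimed
    hardware): L1-difference scoring, additive combination, softmax."""
    # first score matrix: sums of absolute differences between
    # rows of the query matrix and rows of the key matrix
    S1 = np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1)
    # second score matrix: sums of absolute differences between
    # rows of the key matrix and rows of the learned key matrix
    S2 = np.abs(K[:, None, :] - K_learned[None, :, :]).sum(axis=-1)
    # similarity matrix: one possible combination of the two scores
    sim = S1 + S2
    # attention matrix: normalisation function (here, a row-wise softmax)
    e = np.exp(sim - sim.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```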
Step 2A, Prong 2
Claim 1 recites the following additional elements:
neural processing unit for calculating an attention mechanism comprising an attention matrix during machine learning inference, the neural processing unit configured to (this limitation is directed to merely using a computer (neural processing unit) as a tool to perform an abstract idea. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(f)):
comprising a multiplication accumulation engine configured (this limitation is directed to using a computer (accumulation engine) as a tool to perform the abstract idea. This does not integrate the abstract idea into practical application. See MPEP 2106.05(f))
Step 2B
Claim 1 recites the following additional elements:
neural processing unit for calculating an attention mechanism comprising an attention matrix during machine learning inference, the neural processing unit configured to (this limitation is directed to merely using a computer (neural processing unit) as a tool to perform an abstract idea. This does not amount to significantly more. See MPEP 2106.05(f)):
comprising a multiplication accumulation engine configured (this limitation is directed to using a computer (accumulation engine) as a tool to perform the abstract idea. This does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
4. Dependent claim 2 is directed to a device, and falls into one of the four statutory categories.
Claim 2 recites the following abstract ideas:
to calculate at least one input to a layer of a neural network by multiplying together at least one element of the attention matrix and at least one element of a learned value matrix (Mathematical concepts directed to the calculation of input layer using the multiplication).
Claim 2 does not recite any additional elements.
5. Dependent claim 3 is directed to a device, and falls into one of the four statutory categories.
Claim 3 does not recite any abstract ideas.
Claim 3 recites the following additional elements:
wherein the learned value matrix is identical to the learned key matrix (this limitation is directed to a particular type or source of data, which is a field of use, and it does not integrate the abstract ideas into a practical application. See MPEP 2106.05(h)).
Claim 3 recites the following additional elements:
wherein the learned value matrix is identical to the learned key matrix (this limitation is directed to a particular type or source of data, which is a field of use, and it does not amount to significantly more than the judicial exception. See MPEP 2106.05(h)).
6. Dependent claim 4 is directed to a device, and falls into one of the four statutory categories.
Claim 4 recites the following abstract ideas:
wherein the combination comprises calculating weighted values (Mathematical concepts directed to the combination of first score matrix and second score matrix that also include calculating weighted values).
Claim 4 does not recite any additional elements.
7. Dependent claim 5 is directed to a device, and falls into one of the four statutory categories.
Claim 5 recites the following abstract ideas:
wherein the combination comprises maximization (Mathematical concepts directed to the combination of first score matrix and second score matrix that also include maximization).
Claim 5 does not recite any additional elements.
8. Dependent claim 6 is directed to a device, and falls into one of the four statutory categories.
Claim 6 recites the following abstract ideas:
multiply together an input matrix and a query projection matrix to obtain the query matrix (Mathematical concepts directed to multiplying an input matrix with query projection matrix).
Claim 6 does not recite any additional elements.
9. Dependent claim 7 is directed to a device, and falls into one of the four statutory categories.
Claim 7 recites the following abstract ideas:
to multiply together an input matrix and a key projection matrix to obtain the key matrix (Mathematical concepts directed to multiplication of input matrix and key projection matrix).
Claim 7 does not recite any additional elements.
10. Dependent claim 8 is directed to a device, and falls into one of the four statutory categories.
Claim 8 recites the following abstract ideas:
wherein calculating the first score matrix comprises calculating at least one sum of absolute values of differences between elements of the query matrix and elements of the key matrix (Mathematical concepts directed to the calculation of the absolute values of the differences between query matrix and key matrix).
Claim 8 does not recite any additional elements.
11. Dependent claim 9 is directed to a device, and falls into one of the four statutory categories.
Claim 9 recites the following abstract ideas:
wherein calculating the second score matrix comprises calculating at least one sum of absolute values of differences between elements of the key matrix and elements of the learned key matrix (Mathematical concepts directed to the calculation of a second score matrix using the sum of the absolutes of the differences between key matrix and learned key matrix).
Claim 9 does not recite any additional elements.
12. Dependent claim 10 is directed to a device, and falls into one of the four statutory categories.
Claim 10 recites the following abstract ideas:
wherein calculating the similarity matrix comprises calculating maxima of a sum of the first score matrix and the second score matrix (Mathematical concepts directed to the calculation of similarity matrix by calculating the maxima of the sum of the first and second score matrix).
Claim 10 does not recite any additional elements.
13. Dependent claim 11 is directed to a device, and falls into one of the four statutory categories.
Claim 11 recites the following abstract ideas:
wherein calculating the similarity matrix comprises calculating maxima of a reciprocal of a sum of the first score matrix and the second score matrix (Mathematical concepts directed to the calculation of similarity matrix by calculating the maxima of the reciprocal of the sum of the first and second score matrix).
Claim 11 does not recite any additional elements.
14. Dependent claim 12 is directed to a device, and falls into one of the four statutory categories.
Claim 12 recites the following abstract ideas:
wherein the normalisation function comprises at least one of: a softmax function; a normalization by subtracting a mean and dividing by a standard deviation; a hyperbolic tangent function; and a sigmoid function (Mathematical concepts directed to the calculation of the normalisation function using softmax function, normalization, hyperbolic tangent function and sigmoid function).
Claim 12 does not recite any additional elements.
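For reference, each of the four normalisation functions recited in claim 12 reduces to a one-line array operation. The sketch below is illustrative only; the function names are assumptions, not claim language.

```python
import numpy as np

def softmax(x):
    # shift by the row maximum for numerical stability
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_std_normalise(x):
    # subtract the mean and divide by the standard deviation
    return (x - x.mean(axis=-1, keepdims=True)) / x.std(axis=-1, keepdims=True)

def tanh_normalise(x):
    # hyperbolic tangent, output in (-1, 1)
    return np.tanh(x)

def sigmoid_normalise(x):
    # logistic sigmoid, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))
```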
15. Dependent claim 13 is directed to a device, and falls into one of the four statutory categories.
Claim 13 recites the following abstract ideas:
further configured to apply a scaling function to the similarity matrix based on one or more dimensions of the similarity matrix (Mathematical concepts directed to applying a scaling function to the similarity matrix).
Claim 13 does not recite any additional elements.
16. Dependent claim 14 is directed to a device, and falls into one of the four statutory categories.
Claim 14 does not recite any abstract ideas.
Claim 14 recites the following additional elements:
wherein the neural processing unit comprises an Ethos-U processor (this limitation is directed to generally linking the use of a judicial exception to a particular technological environment. This limitation does not integrate the abstract idea into a practical application. See MPEP 2106.05(h)).
Claim 14 recites the following additional elements:
wherein the neural processing unit comprises an Ethos-U processor (this limitation is directed to generally linking the use of a judicial exception to a particular technological environment. This does not amount to significantly more than judicial exception. See MPEP 2106.05(h)).
17. Dependent claim 15 is directed to a device, and falls into one of the four statutory categories.
Claim 15 does not recite any abstract ideas.
Claim 15 recites the following additional elements:
comprising a direct memory access element configured to fetch the learned key matrix and/or the learned value matrix from a memory external to the neural processing unit (This limitation is directed to insignificant extra-solution activity of mere data gathering. This limitation does not integrate the abstract idea into practical application. See MPEP 2106.05(g)).
Claim 15 recites the following additional elements:
comprising a direct memory access element configured to fetch the learned key matrix and/or the learned value matrix from a memory external to the neural processing unit (This limitation is directed to transmission of data, which is a well-understood, routine, and conventional activity. This limitation does not amount to significantly more. See MPEP 2106.05(d)(II), example i).
18. Dependent claim 16 is directed to a device, and falls into one of the four statutory categories.
Claim 16 does not recite any abstract ideas.
Claim 16 recites the following additional elements:
wherein the direct memory access element is configured to prefetch the learned key matrix and/or learned value matrix from the memory to a buffer (This limitation is directed to insignificant extra-solution activity of mere data gathering. This limitation does not integrate the abstract idea into practical application. See MPEP 2106.05(g)).
Claim 16 recites the following additional elements:
wherein the direct memory access element is configured to prefetch the learned key matrix and/or learned value matrix from the memory to a buffer (This limitation is directed to transmission of data, which is a well-understood, routine, and conventional activity. This limitation does not amount to significantly more. See MPEP 2106.05(d)(II), example i).
19. Dependent claim 17 is directed to a device, and falls into one of the four statutory categories.
Claim 17 does not recite any abstract ideas.
Claim 17 recites the following additional elements:
wherein the buffer is a further memory external to the neural processing unit (this limitation is directed to generally linking the use of a judicial exception to a particular technological environment. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(h)).
Claim 17 recites the following additional elements:
wherein the buffer is a further memory external to the neural processing unit (this limitation is directed to generally linking the use of a judicial exception to a particular technological environment. This does not amount to significantly more than judicial exception. See MPEP 2106.05(h)).
20. Dependent claim 18 is directed to a device, and falls into one of the four statutory categories.
Claim 18 does not recite any abstract ideas.
Claim 18 recites the following additional elements:
wherein the buffer is a scratch buffer (this limitation is directed to generally linking the use of a judicial exception to a particular technological environment. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(h)).
Claim 18 recites the following additional elements:
wherein the buffer is a scratch buffer (this limitation is directed to generally linking the use of a judicial exception to a particular technological environment. This does not amount to significantly more than judicial exception. See MPEP 2106.05(h)).
21. Dependent claim 19 is directed to a device, and falls into one of the four statutory categories.
Claim 19 does not recite any abstract ideas.
Claim 19 recites the following additional elements:
wherein the neural processing unit comprises a shared buffer, and wherein the buffer is the shared buffer (this limitation is directed to generally linking the use of a judicial exception to a particular technological environment. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(h)).
Claim 19 recites the following additional elements:
wherein the neural processing unit comprises a shared buffer, and wherein the buffer is the shared buffer (this limitation is directed to generally linking the use of a judicial exception to a particular technological environment. This does not amount to significantly more than judicial exception. See MPEP 2106.05(h)).
22. Dependent claim 20 is directed to a device, and falls into one of the four statutory categories.
Claim 20 recites the following abstract ideas:
to calculate at least one input to a layer of a neural network by multiplying together at least one element of the attention matrix and at least one element of the learned value matrix (Mathematical concepts directed to the calculation of one input layer by the multiplication of attention matrix and learned value matrix).
Claim 20 recites the following additional elements:
wherein the neural processing unit is further configured (this limitation is directed to using a computer (processing unit) as a tool to perform the abstract idea. This does not integrate the abstract idea into practical application. See MPEP 2106.05(f))
wherein the direct memory access element is configured to write the calculated at least one input to the memory external to the neural processing unit (this limitation is directed to insignificant extra-solution activity of the transfer of data. This does not integrate the abstract idea into practical application. See MPEP 2106.05(g)).
Claim 20 recites the following additional elements:
wherein the neural processing unit is further configured (this limitation is directed to using a computer (neural processing unit) as a tool to perform the abstract idea. This does not amount to significantly more than judicial exception. See MPEP 2106.05(f))
wherein the direct memory access element is configured to write the calculated at least one input to the memory external to the neural processing unit (This limitation is directed to insignificant extra-solution activity of the transfer of data, which is a well-understood, routine, and conventional activity. This limitation does not amount to significantly more. See MPEP 2106.05(d)(II), example i).
23. Dependent claim 22 is directed to a device, and falls into one of the four statutory categories.
Claim 22 recites the following abstract ideas:
to calculate at least one input to a layer of a neural network by multiplying together at least one element of the attention matrix and at least one element of the learned value matrix (Mathematical concepts directed to the calculation of one input layer by the multiplication of attention matrix and learned value matrix).
Claim 22 recites the following additional elements:
wherein the multiplication accumulation engine is configured (this limitation is directed to using a computer (accumulation engine) as a tool to perform the abstract idea. This does not integrate the abstract idea into practical application. See MPEP 2106.05(f))
Claim 22 recites the following additional elements:
wherein the multiplication accumulation engine is configured (this limitation is directed to using a computer (accumulation engine) as a tool to perform the abstract idea. This does not amount to significantly more than judicial exception. See MPEP 2106.05(f))
24. Dependent claim 23 is directed to a device, and falls into one of the four statutory categories.
Claim 23 does not recite any abstract ideas.
Claim 23 recites the following additional element:
an activation output element configured to, together with the multiplication accumulation engine, calculate the attention matrix (This limitation is directed to using a computer (activation output element) as a tool to implement the abstract idea. This does not integrate the abstract idea into practical application. See MPEP 2106.05(f)).
Claim 23 recites the following additional element:
an activation output element configured to, together with the multiplication accumulation engine, calculate the attention matrix (This limitation is directed to using a computer (activation output element) as a tool to implement the abstract idea. This does not amount to significantly more than judicial exception. See MPEP 2106.05(f)).
25. Independent claim 24 is directed to a device, and falls into one of the four statutory categories.
With regard to claim 24, it is substantially similar to claim 1 and is rejected in the same manner, with the same reasoning applying.
Claim 24 further recites “an apparatus comprising at least one neural processing unit and at least one memory, the memory configured to pass, on demand, a learned key matrix to the neural processing unit.” These limitations are directed to using a computer (the neural processing unit of the apparatus) as a tool to perform the implementation of the abstract idea. These limitations do not integrate the abstract idea into a practical application and do not amount to significantly more. See MPEP 2106.05(f).
26. Independent claim 25 is directed to a device, and falls into one of the four statutory categories.
With regard to claim 25, it is substantially similar to claim 1 and is rejected in the same manner, with the same reasoning applying.
Claim 25 further recites “a computer program product comprising a non-transitory computer readable medium having computer readable program code stored thereon which, when executed by a neural processing unit.” These limitations are directed to using a computer (the neural processing unit) as a tool to implement the abstract idea. These limitations do not integrate the abstract idea into a practical application and do not amount to significantly more. See MPEP 2106.05(f).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
27. Claims 1, 2, 4, 5, 8, 9, 11-13, 15, 18, 19, 24 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Tu et al. (US20210027165) in view of Delp, III et al. (US20230222821 PCT filed 04/28/2021, hereinafter “Delp”) and further in view of Yang et al. ("DTATrans: Leveraging dynamic token-based quantization with accuracy compensation mechanism for efficient transformer architecture." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42.2 (2022): 509-520).
Regarding claim 1, Tu teaches a neural processing unit for calculating an attention mechanism comprising an attention matrix (As shown in FIG. 16, the computer device includes a processor [0174]; a neural network training apparatus may be implemented in a form of a computer program, and the computer program may be run on the computer device shown in FIG. 16 [0176]; The neural network model includes a plurality of attention networks [0031]; FIG. 9 is a schematic flowchart of a step of calculating an attention matrix difference degree according to attention matrices corresponding to adjacent subspaces [0017]) during machine learning inference (so that a translated sentence can be determined according to the outputted target network representation sequence [0094]. The Examiner notes that the translated sentence is an inference),
the neural processing unit configured to: calculate a first score matrix based on differences between a query matrix and a key matrix (Step 302: Calculate a logical similarity degree between a query vector sequence and a key vector sequence in a current subspace [0059]; The query vector sequence Q, the key vector sequence K, and the value vector sequence V are matrices of I×d [0041]; calculating the logical similarity degree between the query vector sequence and the key vector sequence in the current subspace by using a Euclidean distance [0060]);
calculate a second score matrix based on differences between the key matrix and a learned key matrix (Step 206: Calculate a space difference degree between the subspaces by using the neural network model [0046]; The space difference degree is used for measuring a difference between the subspaces [0047]; … n subspaces including a key vector sequence (abstract); the key vector sequence K … is matrix of I×d [0041]; The learnable parameter matrices Wi K of the ith subspace are matrices of d×d [0084]. The Examiner notes that space difference degree between the subspaces include a key vector sequence (key matrix) and learnable parameter matrices Wi K (learned key matrix) and the instant specification discloses key projection matrix Wk comprise elements which are learned [0034]);
and calculate an attention matrix comprising applying a normalisation function to the similarity matrix (Specifically, after logical similarity degrees corresponding to the subspaces are obtained, the logical similarity degrees corresponding to the subspaces are normalized, and the attention matrix corresponding to the current subspace is finally obtained [0066]).
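To illustrate the normalization step the Examiner maps to Tu at [0066], the following is a minimal sketch, with names of my own choosing, of applying a row-wise softmax to a score matrix to obtain an attention matrix; it is not Tu's actual implementation.

```python
import math

def softmax(row):
    # subtract the row maximum for numerical stability before exponentiating
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention_from_scores(scores):
    # normalize each row of the score matrix so it sums to 1
    return [softmax(row) for row in scores]

scores = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
attn = attention_from_scores(scores)
```

Each row of `attn` is a probability distribution over the keys, consistent with a softmax-normalized attention matrix.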
Tu does not explicitly teach calculate a similarity matrix based on a combination of the first score matrix and second score matrix, the neural processing unit comprising a multiplication accumulation engine configured to calculate at least one of the first score matrix and the second score matrix.
Delp teaches calculate a similarity matrix based on a combination of the first score matrix and second score matrix (The similarity scores in the…similarity matrix 1332 can be combined with the similarity scores in the…similarity matrix 1334 [0117], Figure 13B);
Since Tu desires a neural network training method applied in an image annotation application scenario [0134] to improve accuracy of an output result [0005], and Delp teaches an output can include an annotated image [0046] using neural networks [0081] to improve the efficiency and accuracy [0045], it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Tu to incorporate the teachings of Delp for the benefit of improving efficiency and accuracy [0045] using neural networks (Delp [0081]).
Modified Tu does not explicitly teach a multiplication accumulation engine configured to calculate at least one of the first score matrix and the second score matrix.
Yang teaches the neural processing unit comprising a multiplication accumulation engine (Fig. 3, pg. 511) configured to calculate at least one of the first score matrix and the second score matrix (A register P stores the partial result. Each PE can perform 8 bit-8 bit MAC (multiply-accumulate) mode by taking four cycles in a timing-multiplexing manner, (pg. 511, left col., last para.); the two matrices Q and KT are fed into the systolic PE array, pg. 514, right col., second para. The Examiner notes that PE performs matrix calculation).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Tu to incorporate the teachings of Yang for the benefit of designing the DTATrans model, which has a much smaller computation amount in the attention operation, causing an average 1.37× speedup, low-precision PEs (processing elements), and lower inference latency (Yang, pg. 519, right col., first para.).
Regarding claim 2, Modified Tu teaches the neural processing unit of claim 1, Tu teaches further configured to calculate at least one input to a layer of a neural network (Referring to FIG. 6, inputs are the same for each layer, and each of the inputs is an output of an upper layer. Subsequently, the input is divided into a plurality of sub-inputs, and the same transformation is performed on the sub-inputs by using respective network parameters of a plurality of subspaces (also referred to as heads), to obtain outputs of all the subspaces [0093]) by multiplying together at least one element of the attention matrix and at least one element of a learned value matrix (Alternatively, similarity degrees between the attention matrices of the adjacent subspaces may be measured by multiplying the attention matrices corresponding to the adjacent subspaces in the neural network model according to an element matrix [0099]).
Regarding claim 4, Modified Tu teaches the neural processing unit of claim 1, Tu teaches wherein the combination comprises calculating weighted values (performing weighted summation on the attention matrices corresponding to the adjacent subspaces to obtain the attention matrix difference degree [0099]).
Regarding claim 5, Modified Tu teaches the neural processing unit of claim 1, Tu teaches wherein the combination comprises maximization (The convergence condition may be that both the space difference degree and the output similarity degree are maximized [0054]).
Regarding claim 8, Modified Tu teaches the neural processing unit of claim 1, Tu teaches wherein calculating the first score matrix (Step 302: Calculate a logical similarity degree between a query vector sequence and a key vector sequence in a current subspace [0059]; The query vector sequence Q, the key vector sequence K, and the value vector sequence V are matrices of I×d [0041]) comprises calculating at least one sum of absolute values of differences between elements of the query matrix and elements of the key matrix (The calculation manner may be customized by: … a Manhattan distance similarity degree calculation manner [0108]. The Examiner notes that the Manhattan distance is the sum of absolute differences between points).
Regarding claim 9, Modified Tu teaches the neural processing unit of claim 1, Tu teaches wherein calculating the second score matrix comprises (Step 206: Calculate a space difference degree between the subspaces by using the neural network model [0046]; The space difference degree is used for measuring a difference between the subspaces [0047]; n subspaces including … a key vector sequence (abstract); the key vector sequence K … is a matrix of I×d [0041]; The learnable parameter matrix Wi K of the ith subspace is a matrix of d×d [0084]. The Examiner notes that the instant specification discloses the key projection matrix Wk comprises elements which are learned) calculating at least one sum of absolute values of differences between elements of the key matrix and elements of the learned key matrix (The calculation manner may be customized by: … a Manhattan distance similarity degree calculation manner [0108]. The Examiner notes that the Manhattan distance is the sum of absolute differences between points).
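To illustrate the Manhattan-distance mapping relied on for claims 8 and 9, the following is a hedged sketch (variable names are illustrative, not from the references) of a score matrix built from sums of absolute differences between the rows of two matrices.

```python
def manhattan_score_matrix(A, B):
    # score[i][j] = sum over k of |A[i][k] - B[j][k]|, the Manhattan (L1) distance
    return [[sum(abs(a - b) for a, b in zip(ra, rb)) for rb in B] for ra in A]

# e.g. a query matrix against a key matrix
Q = [[1.0, 2.0], [0.0, 0.0]]
K = [[1.0, 0.0], [3.0, 2.0]]
S = manhattan_score_matrix(Q, K)
```

The same function applies equally to the key matrix against a learned key matrix, as in claim 9.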
Regarding claim 11, Modified Tu teaches the neural processing unit of claim 1, Tu teaches wherein calculating the similarity matrix (Here, J(θ) is the target function, likelihood is the output similarity degree, … and arg max is an arguments of the maxima for obtaining a maximized value [0056]) comprises
calculating maxima of a reciprocal of a sum of the first score matrix and the second score matrix (see the equation reproduced as an image in Tu at [0056]).
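To illustrate the claim 11 language as the Examiner maps it, the following is a hedged sketch (names and shapes are illustrative only) of a similarity matrix formed as the elementwise reciprocal of the sum of two score matrices.

```python
def reciprocal_sum_similarity(S1, S2):
    # sim[i][j] = 1 / (S1[i][j] + S2[i][j]); smaller combined distance
    # yields larger similarity, whose maxima can then be taken
    return [[1.0 / (a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(S1, S2)]

S1 = [[1.0, 3.0]]   # first score matrix (e.g. query-vs-key distances)
S2 = [[1.0, 1.0]]   # second score matrix (e.g. key-vs-learned-key distances)
sim = reciprocal_sum_similarity(S1, S2)
```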
Regarding claim 12, Modified Tu teaches the neural processing unit of claim 1, Tu teaches wherein the normalisation function comprises at least one of: a softmax function; a normalization by subtracting a mean and dividing by a standard deviation; a hyperbolic tangent function; and a sigmoid function (Subsequently, in the hth subspace, non-linear transformation is performed on the logical similarity degree Eh by using the softmax function to obtain an attention matrix Ah corresponding to the hth subspace: Ah=soft max(E h) (15) [0090]).
Regarding claim 13, Modified Tu teaches the neural processing unit of claim 1, Tu teaches further configured to apply a scaling function to the similarity matrix (A calculation process of the logical similarity degree matrix E is described below through specific calculation: [0063]) based on one or more dimensions of the similarity matrix (Q=(q1, q2, . . . , qi, . . . , qI) and K=(k1, k2, . . . , ki, . . . , kI). qi and ki are d-dimensional column vectors, and are respectively a query vector and a key vector that correspond to the source vector representation zi. In the logical similarity degree matrix E=(e1, e2, . . . , ei, . . . , eI), the element ei is logical similarity degrees between the query vector qi corresponding to the source vector representation zi and key vectors k1, k2, . . . , ki, . . . , kI corresponding to all the elements in the training sample. ei is an element in an ith column of E, ei is an I-dimensional column vector, and a calculation formula is
reproduced as an image in Tu at [0064]. The Examiner notes that the (1/√d) factor in that formula is referred to as attention scaling).
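To illustrate the (1/√d) attention-scaling factor the Examiner identifies in Tu's formula at [0064], the following is a minimal sketch with names of my own choosing: dot-product scores between query and key vectors divided by the square root of the vector dimension.

```python
import math

def scaled_scores(Q, K):
    # e[i][j] = (q_i . k_j) / sqrt(d), where d is the vector dimension
    d = len(Q[0])
    return [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in K]
            for qi in Q]

Q = [[1.0, 0.0, 1.0, 0.0]]   # one query vector, d = 4
K = [[1.0, 1.0, 1.0, 1.0]]   # one key vector
E = scaled_scores(Q, K)
```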
Regarding claim 15, Modified Tu teaches the neural processing unit of claim 1, Yang teaches comprising a direct memory access element configured to fetch the learned key matrix and/or the learned value matrix (DMA directly accesses/fetches data from DRAM as shown in Fig. 8, pg. 514. The Examiner notes that DMA can fetch data like learned key matrix) from a memory (DRAM, Fig. 8, pg. 514) external to the neural processing unit (Neural processing unit refers to all components in Fig. 8 except DRAM).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Tu to incorporate the teachings of Yang for the benefit of designing the DTATrans model, which has a much smaller computation amount in the attention operation, causing an average 1.37× speedup, low-precision PEs (processing elements), and lower inference latency (Yang, pg. 519, right col., first para.).
Regarding claim 18, Modified Tu teaches the neural processing unit of claim 15, Yang teaches wherein the buffer is a scratch buffer (In each layer, the Q, K, and V vectors of valid token (4-bit tokens, 8-bit tokens, and a presentative token) are sent and stored in the corresponding line buffers. Then, the two matrices Q and KT are fed into the systolic PE array in a stepwise style as shown in Fig. 10(a) to meet the dataflow requirement in the systolic array, pg. 514, right col., second para. The Examiner notes the line buffer is a scratch buffer because it is used for temporary storage during operations).
The same motivation to combine set forth for dependent claim 15 applies here.
Regarding claim 19, Modified Tu teaches the neural processing unit of claim 15, Yang teaches wherein the neural processing unit comprises a shared buffer, and wherein the buffer is the shared buffer (line buffer, Fig. 9, pg. 514 is a shared buffer because multiple processes access the line buffer memory space).
The same motivation to combine set forth for dependent claim 15 applies here.
Regarding claim 24, claim 24 is similar to claim 1. It is rejected in the same manner, with the same reasoning applying.
Further, Tu teaches an apparatus comprising at least one neural processing unit and at least one memory (Fig. 16 is an apparatus that includes a processor and a memory), but does not explicitly teach the memory configured to pass, on demand, a learned key matrix to the neural processing unit.
Yang teaches the memory configured to pass, on demand, a learned key matrix to the neural processing unit (The output of the multihead attention and the input tokens will be added up and then executing the Layer_Norm operation in the first residual module ❸. After that, the output of ❸ buffered in R1 buffer will be sent to the FFN module ❹ and be reused in the second residual module ❺ to generate the transformer output (output tokens), pg. 514, left col., first para. The Examiner notes that the output of ❸ buffered in the R1 buffer includes the learned key matrix).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Tu to incorporate the teachings of Yang for the benefit of designing the DTATrans model, which has a much smaller computation amount in the attention operation, causing an average 1.37× speedup, low-precision PEs (processing elements), and lower inference latency (Yang, pg. 519, right col., first para.).
Regarding claim 25, claim 25 is similar to claim 1. It is rejected in the same manner, with the same reasoning applying.
Further, Tu teaches a computer program product comprising a computer readable medium having computer readable program code stored thereon which, when executed by a neural processing unit for calculating an attention mechanism comprising an attention matrix during machine learning inference, causes the neural processing unit to (According to an embodiment, there is provided a non-transitory computer-readable storage medium storing computer program code to cause at least one processor to: [0008]; FIG. 9 is a schematic flowchart of a step of calculating an attention matrix difference degree according to attention matrices corresponding to adjacent subspaces [0017]):
28. Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Tu et al. (US20210027165) in view of Delp, III et al. (US20230222821 PCT filed 04/28/2021 hereinafter “Delp”) in view of Yang et al. ("DTATrans: Leveraging dynamic token-based quantization with accuracy compensation mechanism for efficient transformer architecture." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42.2 (2022): 509-520) and further in view of Kim et al. (US20210183074)
Regarding claim 10, Modified Tu teaches the neural processing unit of claim 1, Tu teaches calculating maxima of a result (in a case that the model adjustment reference result is maximized, that the neural network model meets the convergence condition may specifically be performed according to the following formula: J=arg max{L+D} (20) [0127]; Here, J represents the model adjustment reference result, arg max represents arguments of the maxima in which the model adjustment reference result is maximized [0128]) but does not explicitly teach calculating the similarity matrix comprises calculating maxima of a sum of the first score matrix and the second score matrix.
Kim teaches wherein calculating the similarity matrix comprises calculating maxima of a sum of the first score matrix and the second score matrix (… to maximize the sum of similarity values in the matrix).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Tu to incorporate the teachings of Kim for the benefit of training an integrated similarity neural network to increase training efficiency (Kim [0122]).
29. Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Tu et al. (US20210027165) in view of Delp, III et al. (US20230222821 PCT filed 04/28/2021 hereinafter “Delp”) in view of Yang et al. ("DTATrans: Leveraging dynamic token-based quantization with accuracy compensation mechanism for efficient transformer architecture." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42.2 (2022): 509-520.) and further in view of Tomkins et al. (US20210295822)
Regarding claim 3, Modified Tu teaches the neural processing unit of claim 2, Modified Tu does not explicitly teach wherein the learned value matrix is identical to the learned key matrix.
Tomkins teaches wherein the learned value matrix is identical to the learned key matrix (Some embodiments may use a single index that includes one or more keys based on an n-gram …or a domain category value [0210]; wherein the first n-gram and the second n-gram are identical [0372])
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Tu to incorporate the teachings of Tomkins for the benefit of improving the accuracy of output recommendations (Tomkins [0126]).
30. Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Tu et al. (US20210027165) in view of Delp, III et al. (US20230222821 PCT filed 04/28/2021 hereinafter “Delp”) in view of Yang et al. ("DTATrans: Leveraging dynamic token-based quantization with accuracy compensation mechanism for efficient transformer architecture." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42.2 (2022): 509-520.) and further in view of Huang et al. (US20250202679 PCT filed 03/30/2022)
Regarding claim 6, Modified Tu teaches the neural processing unit of claim 1, Modified Tu does not explicitly teach further configured to multiply together an input matrix and a query projection matrix to obtain the query matrix.
Huang teaches further configured to multiply together an input matrix and a query projection matrix to obtain the query matrix (The query vector Q … may be computed by multiplying the homomorphically encrypted input embedding vector 24 by a query projection layer WQ… The projection layers WQ … may include matrix elements that are parameters of the transformer network 30 [0036]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Tu to incorporate the teachings of Huang for the benefit of protecting privacy of the user's data when performing inferencing at a transformer network (Huang [0023]).
Regarding claim 7, Modified Tu teaches the neural processing unit of claim 1, Modified Tu does not explicitly teach further configured to multiply together an input matrix and a key projection matrix to obtain the key matrix.
Huang teaches further configured to multiply together an input matrix and a key projection matrix to obtain the key matrix (the key vector K … may be computed by multiplying the homomorphically encrypted input embedding vector 24 by … a key projection layer WK … The projection layers … WK … may include matrix elements that are parameters of the transformer network 30 [0036]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Tu to incorporate the teachings of Huang for the benefit of protecting privacy of the user's data when performing inferencing at a transformer network (Huang [0023]).
31. Claims 14, 16, 20, 22 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Tu et al. (US20210027165) in view of Delp, III et al. (US20230222821 PCT filed 04/28/2021 hereinafter “Delp”) in view of Yang et al. ("DTATrans: Leveraging dynamic token-based quantization with accuracy compensation mechanism for efficient transformer architecture." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42.2 (2022): 509-520) and further in view of Sung et al. (US20240069500 filed 05/13/2022)
Regarding claim 14, Modified Tu teaches the neural processing unit of claim 1, Modified Tu does not explicitly teach wherein the neural processing unit comprises an Ethos-U processor.
Sung teaches wherein the neural processing unit comprises an Ethos-U processor (ARM's two new processors with AI capabilities, Arm Cortex-M55 and Ethos-U55, which is a neural processing unit (NPU), specifically designed for Internet of Things (IoT) endpoint devices [0182]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Tu to incorporate the teachings of Sung for the benefit of providing up to 480 times the machine learning performance improvement (Sung [0182]).
Regarding claim 16, Modified Tu teaches the neural processing unit of claim 14,
Yang teaches wherein the direct memory access element (DMA, Fig. 8, pg. 514) is configured to prefetch the learned key matrix and/or learned value matrix from the memory (DRAM, Fig. 8, pg. 514) to a buffer (The DMA can then fetch the tokens to on-chip buffers with the address and data length (pg. 513, right col., last para.); Our quantization method dynamically tracks the token tolerance and adjusts the precision of token feature vectors, i.e., query (Q), key (K), and value (V) vectors in each block of attention-based NLP models, pg. 510, left col., second para.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Tu, Delp, and Sung to incorporate the teachings of Yang for the benefit of designing the DTATrans model, which has a much smaller computation amount in the attention operation, causing an average 1.37× speedup, low-precision PEs (processing elements), and lower inference latency (Yang, pg. 519, right col., first para.).
Regarding claim 20, Modified Tu teaches the neural processing unit of claim 14, Yang teaches wherein the neural processing unit (Neural processing unit refers to all components in Fig. 8 except DRAM) is further configured to calculate at least one input to a layer of a neural network by multiplying together at least one element of the attention matrix and at least one element of the learned value matrix (In the attention layer, the attention probabilities (attention_prob) are produced by employing softmax on Q × KT. The attention output (attention_out) is obtained by multiplying attention_prob with V, pg. 510, right col., second to the last paragraph), and
wherein the direct memory access element is configured to write the calculated at least one input to the memory external to the neural processing unit (Thus, we can sequentially store the block results of ordered input tokens on DRAM, pg. 516, left col., first para.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Tu to incorporate the teachings of Yang for the benefit of designing the DTATrans model, which has a much smaller computation amount in the attention operation, causing an average 1.37× speedup, low-precision PEs (processing elements), and lower inference latency (Yang, pg. 519, right col., first para.).
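To illustrate the attention-output step Yang describes (attention_out obtained by multiplying attention_prob with V), the following is a hedged sketch: a plain matrix multiply standing in for the MAC-engine computation, with illustrative names of my own choosing.

```python
def matmul(A, B):
    # each output element is a multiply-accumulate over the shared
    # dimension, the operation a MAC engine performs in hardware
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

attention_prob = [[0.5, 0.5]]   # one query row, normalized over two keys
V = [[2.0, 0.0], [0.0, 2.0]]    # value matrix
attention_out = matmul(attention_prob, V)
```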
Regarding claim 22, Modified Tu teaches the neural processing unit of claim 20, Yang teaches wherein the multiplication accumulation engine (Systolic-base design, Systolic-optimal design, and our design have 3168 (≈ 16 × 18 × 11) 4-bit MAC units, pg. 516, left col., last para.) is configured to calculate at least one input to a layer of a neural network (We can draw that reordering the input tokens results in a reordered Attention_out without impacting the output values. The following FFN will not change the order of Attention_out, pg. 515, right col., last para.) by multiplying together at least one element of the attention matrix and at least one element of the learned value matrix (Module ❶ is the systolic array for Attention_prob × V, which can produce the final results of the attention, pg. 514, right col., last para.).
The same motivation to combine set forth for dependent claim 20 applies here.
Regarding claim 23, Modified Tu teaches the neural processing unit of claim 20, Yang teaches comprising an activation output element configured to, together with the multiplication accumulation engine, calculate the attention matrix (Then, the two matrices Q and KT are fed into the systolic PE array in a stepwise style as shown in Fig. 10(a) to meet the dataflow requirement in the systolic array. A softmax unit then processes the result of Q×KT to get attention probabilities (pg. 514, right col., last para.); Systolic-base design, Systolic-optimal design, and our design have 3168 (≈ 16 × 18 × 11) 4-bit MAC units, pg. 516, left col., last para.).
The same motivation to combine set forth for dependent claim 20 applies here.
32. Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Tu et al. (US20210027165) in view of Delp, III et al. (US20230222821 PCT filed 04/28/2021 hereinafter “Delp”) in view of Yang et al. ("DTATrans: Leveraging dynamic token-based quantization with accuracy compensation mechanism for efficient transformer architecture." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42.2 (2022): 509-520.) and further in view of Shin et al. (US20220156569 filed 11/08/2021)
Regarding claim 17, Modified Tu teaches the neural processing unit of claim 15, Modified Tu does not explicitly teach wherein the buffer is a further memory external to the neural processing unit.
Shin teaches wherein the buffer (The activation buffer registers 207 may be configured as RAM, such as static random-access memory (SRAM) or dynamic random-access memory (DRAM) [0041], Fig. 2A) is a further memory external to the neural processing unit (an array 201 of processing elements (PEs) … Each PE may include K0 multipliers 204 (of which only one multiplier 204 is indicated), and an accumulator (adder tree) 205 connected as shown [0040], Fig. 2A).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Tu to incorporate the teachings of Shin for the benefit of providing an accelerator core for a neural network [0002] that performs checking for similarity between all the generated query and key vectors (Shin [0033]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 8am-5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle T Bechtold can be reached on (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.G./Examiner, Art Unit 2148
/MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148