Prosecution Insights
Last updated: April 19, 2026
Application No. 17/500,598

VECTOR ACTIVATION FUNCTION TO SUPPORT MACHINE LEARNING INFERENCE AND OTHER PROCESSES

Final Rejection (§101, §103, §112)
Filed: Oct 13, 2021
Examiner: BOSTWICK, SIDNEY VINCENT
Art Unit: 2124
Tech Center: 2100 — Computer Architecture & Software
Assignee: Mellanox Technologies Ltd.
OA Round: 4 (Final)

Grant Probability: 52% (Moderate)
Projected OA Rounds: 5-6
Projected Time to Grant: 4y 7m
Grant Probability with Interview: 90%

Examiner Intelligence

Career Allow Rate: 52% (71 granted / 136 resolved; -2.8% vs TC avg)
Interview Lift: +38.2% among resolved cases with an interview (a strong lift)
Typical Timeline: 4y 7m average prosecution; 68 applications currently pending
Career History: 204 total applications across all art units

Statute-Specific Performance

§101: 24.4% (-15.6% vs TC avg)
§103: 40.9% (+0.9% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§112: 21.9% (-18.1% vs TC avg)
Note: Tech Center averages are estimates. Based on career data from 136 resolved cases.
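As a sanity check, the headline figures in the panels above can be reproduced from the raw counts. The snippet below simply restates the reported rates and TC-average deltas; no new data is introduced.

```python
# Recompute the examiner statistics reported above.
granted, resolved = 71, 136
career_allow_rate = granted / resolved   # 71/136 ≈ 0.522, i.e. the 52% shown

# Statute-specific allowance rates and deltas vs. the Tech Center average,
# exactly as reported in the panel above.
stats = {
    "101": (0.244, -0.156),
    "103": (0.409, +0.009),
    "102": (0.120, -0.280),
    "112": (0.219, -0.181),
}
for statute, (rate, delta) in stats.items():
    tc_avg = rate - delta  # the Tech Center average implied by each delta
    print(f"§{statute}: {rate:.1%} (TC avg ≈ {tc_avg:.1%})")
```

The implied Tech Center average for §101, for example, works out to 24.4% + 15.6% = 40.0%.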

Office Action

§101 §103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Remarks

This Office Action is responsive to Applicant's Amendment filed on January 28, 2026, in which claims 1, 3, 11, and 18 are currently amended. Claims 1-20 are currently pending.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on January 27, 2026 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments

The rejections of claims 11-17 under 35 U.S.C. § 101 are hereby withdrawn in view of Applicant's amendments and remarks.

Applicant's arguments with respect to the rejection of claims 1-10 and 18-20 under 35 U.S.C. § 101 have been considered but are not persuasive. With respect to Applicant's argument on p. 7 of the Remarks submitted 1/28/2026 that using a lookup table comprised in a processor to match a plurality of activation functions to activation circuits could not be considered generic, Examiner respectfully disagrees. One of ordinary skill in the art could readily look up information (observation) with or without the assistance of tools such as pen and paper. The mere addition of a generic processor is seen as instructions to apply the judicial exception using generic computer components, and is not seen as integrating the judicial exception into a practical application.

With respect to Applicant's argument on p. 7 of the Remarks submitted 1/28/2026 that the claims present a technical improvement, Examiner respectfully disagrees. The claims as a whole are directed towards matching a process with a device suited to perform the process.
Just as it would not be considered a technical improvement to determine whether or not particular software is compatible with a particular computer, the claimed process of determining whether or not an activation function can be performed by a generic computer is not seen as providing a technical improvement. In fact, Examiner notes that the instant claims appear to be directed explicitly towards using unmodified generic computer components based on a determination that the mental process is performed more efficiently on the generic computer component; the generic computer component is not itself improved, but rather the mental process is improved by applying the generic computer component (see also MPEP 2106.07(a)(II): "employing well-known computer functions to execute an abstract idea, even when limiting the use of the idea to one particular environment, does not integrate the exception into a practical application"). Neither the instant claims nor Applicant's arguments appear to provide an objective improvement to a generic computer system (see MPEP 2106.05(a): "An important consideration in determining whether a claim improves technology is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome.").

With respect to Applicant's argument on p. 8 of the Remarks submitted 1/28/2026 that the instant claims are directly analogous to Ex parte Desjardins, Examiner respectfully disagrees. The training process in Ex parte Desjardins is a critical factor absent from the instant claims. For at least these reasons, and those further detailed below, Examiner asserts that it is reasonable and appropriate to maintain the rejections of claims 1-10 and 18-20 under 35 U.S.C. § 101.

Applicant's arguments with respect to the rejection of claims 1-17 under 35 U.S.C. § 103 have been considered.
The argument is moot in view of a new ground of rejection set forth below, necessitated by Applicant's amendments.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claims 1 and 11, "without performing a memory lookup in separate memory" is indefinite. Claim 1 does not introduce a memory, so it is not clear to one of ordinary skill in the art what the "separate memory" is separate from. In fact, Examiner asserts that it is unclear whether or not the "separate memory" is simply a compound noun identifying any memory, as opposed to being interpreted literally as a memory separate from another memory. Examiner asserts that, even if it is taken at face value that the "separate memory" is separate from another memory, this would leave the claim so broad as to be non-limiting, since what the "separate memory" is separate from is undefined, such that reading from any set of memory could reasonably be construed as without performing a memory lookup in separate memory. Since the scope of the claim cannot be reasonably determined, the claim is seen as being indefinite.
In the interest of further examination, the claim is interpreted very broadly such that it is read as "without performing a memory lookup in off-chip memory". Claims 2-10 and 12-17 are rejected by virtue of their dependence from rejected claims 1 and 11.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-10 and 18-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1: Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1 Analysis: Claim 1 is directed to a processor, which is a product, one of the statutory categories.

Step 2A Prong One Analysis: Claim 1, under its broadest reasonable interpretation, recites a series of mental processes.
For example, but for the generic computer component language, the claim limitations in context encompass activation function processing, including the following:

- a lookup table that matches a plurality of activation functions to a plurality of activation function circuits such that each activation function is matched with an activation function circuit that is capable of performing that activation function (observation, evaluation, and judgement);
- identify, from the lookup table and based on the processor instruction, a first activation function circuit from among the plurality of activation function circuits for performing the first activation function (observation, evaluation, and judgement);
- access the lookup table to identify, based on the processor instruction, a first circuit from among the plurality of circuits for performing the first activation function (observation, evaluation, and judgement);
- perform, using the input vector, the first activation function (observation, evaluation, and judgement);
- generate an output vector based on the performed first activation function (observation, evaluation, and judgement).

Therefore, claim 1 recites an abstract idea, which is a judicial exception.

Step 2A Prong Two Analysis: Claim 1 recites the additional elements "processor", "activation function circuits", "approximation circuitry", and "with the identified first circuit". However, these additional features are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component. An additional element that merely recites the words "apply it" (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application (see MPEP 2106.05(f)).
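For orientation, the claim limitations recited above (a lookup table matching activation functions to circuits, identification of a circuit from a processor instruction, and application of that circuit to an input vector) can be modeled in software. The sketch below is illustrative only; the names, the instruction encoding, and the use of Python callables in place of hardware circuits are assumptions, not features of the application.

```python
import math

# Stand-ins for hardware activation-function circuits (illustrative only).
def relu_circuit(v):    return [max(0.0, x) for x in v]
def sigmoid_circuit(v): return [1.0 / (1.0 + math.exp(-x)) for x in v]
def tanh_circuit(v):    return [math.tanh(x) for x in v]

# The claimed lookup table: each activation function is matched with a
# circuit capable of performing that activation function.
LOOKUP_TABLE = {
    "relu": relu_circuit,
    "sigmoid": sigmoid_circuit,
    "tanh": tanh_circuit,
}

def execute(instruction):
    """Model the claimed flow: identify the circuit from the lookup table
    based on the instruction, apply it to the input vector, and return
    the output vector."""
    circuit = LOOKUP_TABLE[instruction["activation"]]
    return circuit(instruction["input_vector"])

print(execute({"activation": "relu", "input_vector": [-1.0, 0.5, 2.0]}))
# → [0.0, 0.5, 2.0]
```

In hardware, the dictionary lookup and dispatch would of course be a decode stage routing the operand to a dedicated circuit rather than a function call; the sketch only mirrors the claim's data flow.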
Claim 1 also recites the additional elements "without performing a memory lookup in separate memory: receive a processor instruction comprising an input vector for a first activation function of the plurality of activation functions", which amount to gathering data, an insignificant extra-solution activity (see MPEP 2106.05(g)). Therefore, claim 1 is directed to a judicial exception.

Step 2B Analysis: Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component and insignificant extra-solution activity. The gathering of data is considered well-understood, routine, and conventional in the art (see MPEP 2106.05(d)(II)(i)).

For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under § 101. This rejection applies equally to independent claim 18, which recites a device, as well as to dependent claims 2-10 and 19-20. The additional limitations of the dependent claims are addressed briefly below:

Dependent claim 2 recites additional instructions to apply the judicial exception using generic computer components (see MPEP 2106.05(f)): "a lookup table that is referenced during the hardware approximation of the activation function and wherein the lookup table references associated hardware to use during the hardware approximation of the activation function".

Dependent claims 3 and 19 recite additional observation, evaluation, and judgement: "the output vector is used for machine learning inference".

Dependent claim 4 recites additional instructions to apply the judicial exception using a generic computer component: "the lookup table is used for data aggregation during the hardware approximation of the activation function".
Dependent claim 5 recites additional observation, evaluation, and judgement: "the input vector is part of a matrix".

Dependent claim 6 recites the additional element "the activation function is made available on a vector instruction list", which amounts to generally linking the judicial exception to a particular field or technology.

Dependent claim 7 recites mathematical calculations and relationships: "the activation function comprises a sigmoid function".

Dependent claim 8 recites mathematical calculations and relationships: "the activation function comprises a tan h(x) function".

Dependent claim 9 recites mathematical calculations and relationships: "the activation function comprises a non-linear function".

Dependent claim 10 recites the additional element "the input vector comprises N elements and wherein each of the N elements are processed during a single clock cycle", which amounts to mere instructions to apply the judicial exception using generic computer components (a processor).

Dependent claim 20 recites additional mathematical calculations and relationships: "the activation function comprises at least one of a sigmoid function and a tanh(x) function."

Therefore, when considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1-10 and 18-20 are rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office Action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 2, 3, 5, and 9 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Zhang ("nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices", 2021) and Henry (US20170103300A1).

[Image: Table 6 of Zhang]

Regarding claim 1, Zhang teaches A processor, comprising: a lookup table that matches a plurality of activation functions to a plurality of activation function circuits such that each activation function is matched with an activation function circuit that is capable of performing that activation function ([p. 82 §1] "We propose and design nn-Meter, a novel and efficient system to accurately predict inference latency of CNN models on diverse edge devices"; see Table 6, which is interpreted as a lookup table that matches a plurality of activation functions with circuits capable of performing the respective activation functions (hswish, relu, etc.)), and perform, using the input vector, the first activation function with the identified first activation function circuit and generate an output vector based on the performed first activation function ([p. 88 §6] "Given a model/kernel configuration, we generate the graph in both the Tensorflow protobuf and tflite format, which are generally supported by edge inference frameworks. We send the target model to the measurement platform and collect the returned inference latency.
To measure the latency on the CPU, we set CPU frequency to the highest 2.42GHz. The latency on the CPU is measured by the TFLite benchmark tool. Since TFLite currently doesn't support operator-level profiling for GPU, we implement an operator profiler in TFLite for GPU backend. For VPU latency measurement, we convert the protobuf format into OpenVINO IR, and measure the latency by the OpenVINO™ toolkit. The latency number is the average of 50 inference runs"; [p. 87 §5.2] "Driven by the two goals, we first prune the rarely-considered configurations by constraining the sampling distribution. This leverages the observation that many configurations are unlikely selected in state-of-the-art CNN models. And, the considered configurations are non-uniformly distributed in the sample space". Inference is interpreted as synonymous with generating an output vector based on the input vector and based on the performed first activation function (see Table 6, which goes as far as showing fused operators including the convolution input vector and fused relu activation). Examiner notes that due to the broad claim language the output vector could alternatively be interpreted as one of the plurality of hyperparameter configuration vectors in the search space from the pruned hyperparameter search space.

However, Zhang does not explicitly teach approximation circuitry to: without performing a memory lookup in separate memory: receive a processor instruction comprising an input vector for a first activation function of the plurality of activation functions; and identify, from the lookup table and based on the processor instruction, a first activation function circuit from among the plurality of activation function circuits for performing the first activation function.
Henry, in the same field of endeavor, teaches approximation circuitry to: without performing a memory lookup in separate memory: receive a processor instruction comprising an input vector for a first activation function of the plurality of activation functions ([¶0056] "The sequencer 128 fetches instructions from the program memory 129 and executes them"; [¶0077] "The instructions flow down the pipeline and control the various functional units […] the initialize NPU instruction specifies the activation function to be performed on the accumulator 202 value 217, and a value indicating the specified activation function is saved in a configuration register for later use by the AFU 212 portion of the pipeline once the final accumulator 202 value 217 has been generated"; [¶0085] "As shown, on clock 1, the 512 16-bit data words of row 17 are read out of the data RAM 122 and provided to the 512 NPUs 126". Henry clearly discloses a processor whose NNU/sequencer fetches and executes instructions and that the activation function data is read from RAM.), and identify, from the lookup table and based on the processor instruction, a first activation function circuit from among the plurality of activation function circuits for performing the first activation function ([¶0256] "The mux 3032 selects the appropriate input specified by the activation function 2934 value and provides the selection to the sign restorer 3034, which converts the positive form output of the mux 3032 to a negative form if the original accumulator 202 value 217 was a negative value, e.g., to two's-complement form."; [¶0077] "the initialize NPU instruction specifies the activation function to be performed on the accumulator 202 value 217").

Zhang and Henry are both directed towards neural network hardware acceleration. Therefore, Zhang and Henry are analogous art in the same field of endeavor.
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Zhang with the teachings of Henry by using a table-driven architectural mapping approach as taught by Zhang to provide an explicit lookup table that matches activation functions to corresponding activation circuits within Henry's neural network accelerator. In other words, it would have been obvious to use the processor system in Henry as one of the hardware targets in Zhang. Henry provides additional motivation for the combination ([¶0130] "the media registers 118 can concurrently write to or read from the buffer 1704 while the NPUs 126 are also reading from or writing to the weight RAM 124 (although preferably the NPUs 126 stall, if they are currently executing, to avoid accessing the weight RAM 124 while the buffer 1704 is accessing the weight RAM 124). This may advantageously provide improved performance"). This motivation for combination also applies to the remaining claims which depend on this combination.

Regarding claim 2, the combination of Zhang and Henry teaches The processor of claim 1, wherein the output vector is used in a machine learning operation (Zhang [p. 88 §6] and [p. 87 §5.2], as quoted above with respect to claim 1; inference is interpreted as synonymous with generating an output vector based on the input vector and based on the performed first activation function (see Table 6, which goes as far as showing fused operators including the convolution input vector and fused relu activation); Examiner notes that due to the broad claim language the output vector could alternatively be interpreted as one of the plurality of hyperparameter configuration vectors in the search space from the pruned hyperparameter search space).

Regarding claim 3, the combination of Zhang and Henry teaches The processor of claim 1, wherein the output vector is used for machine learning inference (Zhang [p. 88 §6] and [p. 87 §5.2], as quoted above with respect to claim 1, under the same interpretation of inference).

Regarding claim 5, the combination of Zhang and Henry teaches The processor of claim 1, wherein the input vector is part of a matrix (Zhang [p. 89 §7.1] "The GCN in BRP-NAS takes as input a feature description matrix and a description of the graph structure as an adjacency matrix. BRP-NAS encodes the cell of NASBench201 model for representation. The GCN input is a 9×6 feature matrix and a 9×9 adjacent matrix. However, for the non-cell-based models in our dataset, we should encode the complete model graph. Therefore, we encode all the kernel nodes in a model graph. Besides, we also encode the 5-dimension configuration as the node attributes. Finally, the graph representations are larger than the BRP-NAS. Specifically, the NASBench201 models representation are a 133×22 feature matrix and a 133×133 adjacent matrix").

Regarding claim 9, the combination of Zhang and Henry teaches The processor of claim 1, wherein the activation function comprises a non-linear function (Zhang [p. 83 §2] "activations (e.g., relu, relu6 and hswish)").

Claims 4, 7, and 8 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Zhang and Henry, and further in view of Xie ("A Twofold Lookup Table Architecture for Efficient Approximation of Activation Functions", 2020).

Regarding claim 4, the combination of Zhang and Henry teaches The processor of claim 1.
However, the combination of Zhang and Henry does not explicitly teach wherein the lookup table is used for data aggregation during the first activation function.

Xie, in the same field of endeavor, teaches the lookup table is used for data aggregation during the first activation function ([p. 2554 §IV] "d-LUT and e-LUT require an 8-bit and an 11-bit addressing and the output value to have 8 and 2 bits, respectively. The address width and the data width for a few activation functions, such as tanh and softsign are presented in Table III. Once the data are retrieved from the LUTs, to satisfy the generalized bit-width requirements, the retrieved values are concatenated with zeros and summed and presented as 12-bit outputs").

The combination of Zhang and Henry, as well as Xie, are directed towards hardware-aware neural network operation optimization, and are therefore reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Zhang and Henry with the teachings of Xie by adding the circuit in Xie to the list of edge devices. Zhang explicitly teaches ([p. 92 §10] "We propose nn-Meter, a kernel-based prediction system that accurately predicts the latency of DNN models on diverse edge devices"), such that it would be obvious before the effective filing date of the claimed invention to include the circuit described by Xie as one of the edge devices in Zhang. Xie provides additional motivation for the combination ([p. 2540 Abstract] "when RALUT and our architecture were combined, it improved the compressibility of the RALUT-based result by up to additional 10.21% for a tanh activation function").

Regarding claim 7, the combination of Zhang and Henry teaches The processor of claim 1. However, the combination of Zhang and Henry does not explicitly teach wherein the activation function comprises a sigmoid function.
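The general idea behind the LUT-based activation approximation Xie describes can be illustrated with a simplified single-table sketch. Note the hedges: Xie's actual architecture is twofold (a d-LUT and an e-LUT whose retrieved values are concatenated with zeros and summed), whereas the sketch below uses one table; the input range and table size are assumptions chosen for illustration, with 256 entries echoing the 8-bit addressing of Xie's d-LUT.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Precompute one lookup table over an assumed input range [-8, 8).
N_ENTRIES = 256                       # 8-bit addressing (cf. Xie's d-LUT)
X_MIN, X_MAX = -8.0, 8.0
STEP = (X_MAX - X_MIN) / N_ENTRIES
LUT = [sigmoid(X_MIN + i * STEP) for i in range(N_ENTRIES)]

def sigmoid_lut(x):
    """Approximate sigmoid(x) by a table lookup instead of evaluating exp.
    Inputs outside the table range are clamped to the nearest entry."""
    x = min(max(x, X_MIN), X_MAX - STEP)
    idx = int((x - X_MIN) / STEP)
    return LUT[idx]

# Worst-case error of the approximation over a sweep of inputs.
err = max(abs(sigmoid_lut(x / 100.0) - sigmoid(x / 100.0))
          for x in range(-800, 800))
print(f"max abs error: {err:.4f}")
```

Because sigmoid's slope never exceeds 0.25, the quantization error of this floor-indexed table is bounded by 0.25 × STEP ≈ 0.016, which is the kind of accuracy/size trade-off a twofold table is designed to improve.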
Xie, in the same field of endeavor, teaches The processor of claim 1, wherein the activation function comprises a sigmoid function ([p. 2542 §III] "In this section, we will demonstrate the process of designing a t-LUT for a sigmoid function."). The combination of Zhang and Henry, as well as Xie, are directed towards hardware-aware neural network operation optimization and are therefore reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Zhang and Henry with the teachings of Xie by adding the circuit in Xie to the list of edge devices. Zhang explicitly teaches ([p. 92 §10] "We propose nn-Meter, a kernel-based prediction system that accurately predicts the latency of DNN models on diverse edge devices"), such that it would be obvious before the effective filing date of the claimed invention to include the circuit described by Xie as one of the edge devices in Zhang. Xie provides additional motivation for the combination ([p. 2540 Abstract] "when RALUT and our architecture were combined, it improved the compressibility of the RALUT-based result by up to additional 10.21% for a tanh activation function").

Regarding claim 8, the combination of Zhang and Henry teaches The processor of claim 1. However, the combination of Zhang and Henry does not explicitly teach wherein the activation function comprises a tan h(x) function.

Xie, in the same field of endeavor, teaches The processor of claim 1, wherein the activation function comprises a tan h(x) function ([p. 2545] "Designed circuit for sigmoid, tanh, softsign"; see also Table IV). The combination of Zhang and Henry, as well as Xie, are directed towards hardware-aware neural network operation optimization and are therefore reasonably pertinent analogous art.
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Zhang and Henry with the teachings of Xie by adding the circuit in Xie to the list of edge devices. Zhang explicitly teaches ([p. 92 §10] "We propose nn-Meter, a kernel-based prediction system that accurately predicts the latency of DNN models on diverse edge devices"), such that it would be obvious before the effective filing date of the claimed invention to include the circuit described by Xie as one of the edge devices in Zhang. Xie provides additional motivation for the combination ([p. 2540 Abstract] "when RALUT and our architecture were combined, it improved the compressibility of the RALUT-based result by up to additional 10.21% for a tanh activation function").

Claim 6 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Zhang and Henry, and in further view of Chen (US20200034698A1).

Regarding claim 6, the combination of Zhang and Henry teaches The processor of claim 1. However, the combination of Zhang and Henry does not explicitly teach wherein the activation function is made available on a vector instruction list. Chen, in the same field of endeavor, teaches The processor of claim 1, wherein the activation function is made available on a vector instruction list.
([¶0045] "the controller unit 11 is configured to extract a first instruction from the storage unit, parse the first instruction to obtain an operation code of the operation instruction and an operation domain, extract input data and weight data corresponding to the operation domain, and the operation code, the input data and the weight data are transmitted to the operation unit, and the operation code includes at least one of the following codes: an operation code of the matrix operation instruction, a vector operation instruction operation code, an activation operation instruction operation code, an offset operation instruction operation code, a convolution operation instruction operation code, a conversion operation instruction operation code and the like").

The combination of Zhang and Henry, as well as Chen, are directed towards neural network accelerators and are therefore analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Zhang and Henry with the teachings of Chen by having the activation function as part of a vector instruction list. Chen provides additional motivation for the combination ([Abstract] "The technical solution provided by this application has advantages of a fast calculation speed and energy-saving").

Claim 10 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Zhang and Henry, and Lo (US20200210839A1).

Regarding claim 10, the combination of Zhang and Henry teaches The processor of claim 1. However, the combination of Zhang and Henry does not explicitly teach wherein the input vector comprises N elements and wherein each of the N elements are processed during a single clock cycle. Lo, in the same field of endeavor, teaches the input vector comprises N elements and wherein each of the N elements are processed during a single clock cycle.
([¶0106] "if 16 activation values are generated for a given unit of computation (e.g., a single clock cycle, or a predetermined number of clock cycles), then a predetermined number (e.g., one or a few) of activation values having outlier values are stored per compute cycle."). The combination of Zhang and Henry as well as Lo are directed towards neural network accelerators. Therefore, Zhang and Henry as well as Lo are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Zhang and Henry with the teachings of Lo by performing a number of activations in a single clock cycle. Lo provides as additional motivation for combination ([¶0108] “storing fewer activation values with outlier values decreases storage costs, while storing more activation values with outlier values increases precision of the neural network”). Claims 11-16 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Henry and Zhang. Regarding claim 11, Henry teaches a system, comprising: ([¶0045] "The instruction fetch unit 101 controls the fetching of architectural instructions 103 from system memory (not shown) into the instruction cache 102. 
The instruction fetch unit 101 provides a fetch address to the instruction cache 102 that specifies a memory address at which the processor 100 fetches a cache line of architectural instruction bytes into the instruction cache 102") and a processor to, without performing a memory lookup in separate memory: receive a processor instruction comprising an input vector for a first activation function of a plurality of activation functions; ([¶0056] "The sequencer 128 fetches instructions from the program memory 129 and executes them" [¶0077] "The instructions flow down the pipeline and control the various functional units […] the initialize NPU instruction specifies the activation function to be performed on the accumulator 202 value 217, and a value indicating the specified activation function is saved in a configuration register for later use by the AFU 212 portion of the pipeline once the final accumulator 202 value 217 has been generated" [¶0085] "As shown, on clock 1, the 512 16-bit data words of row 17 are read out of the data RAM 122 and provided to the 512 NPUs 126" Henry clearly discloses a processor whose NNU/sequencer fetches and executes instructions and that the activation function data is read from RAM.) identify, from the lookup table and based on the processor instruction, a first activation function circuit from among the plurality of activation function circuits for performing the first activation function; ([¶0256] "The mux 3032 selects the appropriate input specified by the activation function 2934 value and provides the selection to the sign restorer 3034, which converts the positive form output of the mux 3032 to a negative form if the original accumulator 202 value 217 was a negative value, e.g., to two's-complement form." 
[¶0077] "the initialize NPU instruction specifies the activation function to be performed on the accumulator 202 value 217") perform, using the input vector, the first activation function with the identified first activation function circuit such that all elements of the input vector are processed during a single clock cycle; ([¶0065] "circuitry of the AFU 212 performs the activation function in a single clock cycle" [¶0073] "the 512 data words from row 17 of the data RAM 122 are provided to the corresponding data input 207 of the 512 NPUs 126 and the 512 weight words from row 0 of the weight RAM 124 are provided to the corresponding weight input 206 of the 512 NPUs 126." [¶0076] "the execution of the write AFU output instruction may be overlapped with the execution of other instructions in a pipelined nature such that the write AFU output instruction effectively executes in a single clock cycle" The input vector is interpreted as the set of values held across the NPUs/accumulators) and generate an output vector based on the performed first activation function. ([¶0053] "The weight RAM 124 is arranged as W rows of N weight words, and the data RAM 122 is arranged as D rows of N data words. Each data word and each weight word is a plurality of bits, preferably 8 bits, 9 bits, 12 bits or 16 bits. Each data word functions as the output value (also sometimes referred to as an activation) of a neuron of the previous layer in the network" Activation data word (bit vector) interpreted as output vector based on the performed first activation function). However, Henry does not explicitly teach a lookup table that matches a plurality of activation functions to a plurality of activation function circuits such that each activation function is matched with an activation function circuit capable of performing that activation function. 
Zhang, in the same field of endeavor, teaches a lookup table that matches a plurality of activation functions to a plurality of activation function circuits such that each activation function is matched with an activation function circuit capable of performing that activation function; ([p. 82 §1] "We propose and design nn-Meter, a novel and efficient system to accurately predict inference latency of CNN models on diverse edge devices" See Table 6, which is interpreted as a lookup table that matches a plurality of activation functions with circuits capable of performing the respective activation functions (hswish, relu, etc.).). Henry as well as Zhang are directed towards neural network hardware acceleration. Therefore, Zhang as well as Henry are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Zhang with the teachings of Henry by using a table-driven architectural mapping approach as taught by Zhang to provide an explicit lookup table that matches activation functions to corresponding activation circuits within Henry’s neural network accelerator. In other words, it would have been obvious to use the processor system in Henry as one of the hardware targets in Zhang. Henry provides as additional motivation for combination ([¶0130] “the media registers 118 can concurrently write to or read from the buffer 1704 while the NPUs 126 are also reading from or writing to the weight RAM 124 (although preferably the NPUs 126 stall, if they are currently executing, to avoid accessing the weight RAM 124 while the buffer 1704 is accessing the weight RAM 124). This may advantageously provide improved performance”). This motivation for combination also applies to the remaining claims which depend on this combination. Regarding claim 12, the combination of Henry and Zhang teaches The system of claim 11, wherein the processor comprises a Central processor. (Zhang [p. 
86 §5.1] "Conv and DWConv take 94.2%, 91.91%, 75.5% of the model latency on the CPU, GPU, and VPU, respectively" See Table 6). Regarding claim 13, the combination of Henry and Zhang teaches The system of claim 11, wherein the processor comprises a Graphics processor. (Zhang [p. 86 §5.1] "Conv and DWConv take 94.2%, 91.91%, 75.5% of the model latency on the CPU, GPU, and VPU, respectively" See Table 6). Regarding claim 14, the combination of Henry and Zhang teaches The system of claim 11, wherein the processor comprises a Data processor. (Zhang [p. 86 §5.1] "Conv and DWConv take 94.2%, 91.91%, 75.5% of the model latency on the CPU, GPU, and VPU, respectively" VPU interpreted as data processor. See also Table 6.). Regarding claim 15, the combination of Henry and Zhang teaches The system of claim 11, wherein the processor is to perform the activation function in an absence of a memory read or memory write. (Zhang [p. 88 §6] "Given a model/kernel configuration, we generate the graph in both the Tensorflow protobuf and tflite format, which are generally supported by edge inference frameworks. We send the target model to the measurement platform and collect the returned inference latency. To measure the latency on the CPU, we set CPU frequency to the highest 2.42GHz. The latency on the CPU is measured by the TFLite benchmark tool. Since TFLite currently doesn’t support operator-level profiling for GPU, we implement an operator profiler in TFLite for GPU backend. For VPU latency measurement, we convert the protobuf format into OpenVINO IR, and measure the latency by the OpenVINO™ toolkit. The latency number is the average of 50 inference runs" [p. 87 §5.2] "Driven by the two goals, we first prune the rarely-considered configurations by constraining the sampling distribution. This leverages the observation that many configurations are unlikely selected in state-of-the-art CNN models. 
And, the considered configurations are non-uniformly distributed in the sample space" Zhang does not rely on a lookup table for performing the activation; therefore, the activation function in Zhang is interpreted as being performed in an absence of performing a memory read and write.). Regarding claim 16, the combination of Henry and Zhang teaches The system of claim 11, wherein the activation function comprises a non-linear function. (Zhang [p. 83 §2] "activations (e.g., relu, relu6 and hswish)"). Claim 17 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Henry and Zhang and further in view of Xie. Regarding claim 17, the combination of Henry and Zhang teaches The system of claim 11. However, the combination of Henry and Zhang does not explicitly teach wherein the activation function comprises at least one of a sigmoid function and a tanh(x) function. Xie, in the same field of endeavor, teaches the activation function comprises at least one of a sigmoid function and a tanh(x) function. ([p. 2542 §III] "In this section, we will demonstrate the process of designing a t-LUT for a sigmoid function."). The combination of Henry and Zhang as well as Xie are directed towards hardware aware neural network operation optimization. Therefore, the combination of Zhang and Henry as well as Xie are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Zhang and Henry with the teachings of Xie by adding the circuit in Xie to the list of edge devices. Zhang explicitly teaches ([p. 92 §10] "We propose nn-Meter, a kernel-based prediction system that accurately predicts the latency of DNN models on diverse edge devices") such that it would be obvious before the effective filing date of the claimed invention to include the circuit described by Xie as one of the edge devices in Zhang. Xie provides as additional motivation for combination ([p. 
2540 Abstract] "when RALUT and our architecture were combined, it improved the compressibility of the RALUT-based result by up to additional 10.21% for a tanh activation function"). Claims 18 and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Zhang and Rossi (WO2018222299A1), which corresponds to US10585703B2. Regarding claim 18, Zhang teaches A device, comprising: a processor to: ([p. 89] "We evaluate nn-Meter on the benchmark dataset (Table 2) for CPU, GPU, and VPU") receive a processor instruction comprising an input vector for a first activation function from among a plurality of activation functions performable by the processor; ([p. 86 §5.1] "A large sample space of Conv. The possible configurations of a kernel decides the sample space. For the latency-dominating Conv and DWConv kernels, the primary configuration parameter includes: input height H, input width W, kernel size K, stride S, input channel number Cin and output channel Cout. Since H usually is equal to W for a kernel in CNN models, we encode it as a 5-dimension vector: (HW, K, S, Cin, Cout)") wherein the lookup table matches the plurality of activation functions to the plurality of activation function circuits such that each activation function is matched with an activation function circuit that is capable of performing that activation function; ([p. 82 §1] "We propose and design nn-Meter, a novel and efficient system to accurately predict inference latency of CNN models on diverse edge devices" See Table 6, which is interpreted as a lookup table that matches a plurality of activation functions with circuits capable of performing the respective activation functions (hswish, relu, etc.).) perform, using the input vector, the first activation function with the identified first activation function circuit; and generate an output vector based on the performed first activation function ([p. 
88 §6] "Given a model/kernel configuration, we generate the graph in both the Tensorflow protobuf and tflite format, which are generally supported by edge inference frameworks. We send the target model to the measurement platform and collect the returned inference latency. To measure the latency on the CPU, we set CPU frequency to the highest 2.42GHz. The latency on the CPU is measured by the TFLite benchmark tool. Since TFLite currently doesn’t support operator-level profiling for GPU, we implement an operator profiler in TFLite for GPU backend. For VPU latency measurement, we convert the protobuf format into OpenVINO IR, and measure the latency by the OpenVINO™ toolkit. The latency number is the average of 50 inference runs" [p. 87 §5.2] "Driven by the two goals, we first prune the rarely-considered configurations by constraining the sampling distribution. This leverages the observation that many configurations are unlikely selected in state-of-the-art CNN models. And, the considered configurations are non-uniformly distributed in the sample space" Inference interpreted as synonymous with generating an output vector based on the input vector and based on the performed first activation function (see Table 6, which goes as far as showing fused operators including the convolution input vector and fused relu activation). Examiner notes that due to the broad claim language the output vector could alternatively be interpreted as one of the plurality of hyperparameter configuration vectors in the search space from the pruned hyperparameter search space.). However, Zhang does not explicitly teach identify, from a lookup table and based on the processor instruction, a first activation function circuit from among a plurality of activation function circuits for performing the first activation function. 
Rossi, in the same field of endeavor, teaches identify, from a lookup table and based on the processor instruction, a first activation function circuit from among a plurality of activation function circuits for performing the first activation function ([¶0026] "The neural network annotator 210 in an example can utilize a lookup table to determine whether the CPU or GPU would have better performance in running a particular operation"). Zhang as well as Rossi are directed towards hardware analysis and selection for neural network processing. Therefore, Zhang as well as Rossi are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Zhang with the teachings of Rossi by using the latency prediction in Zhang (summarized in Tables 6-9) to populate the LUT in Rossi for LUT-based hardware selection. Zhang explicitly discloses that the model is publicly available at https://github.com/microsoft/nn-Meter (as of 10/3/2025). Rossi provides as additional motivation for combination ([¶0024] “in instances in which the operation is supported by the CPU and the GPU, the neural network annotator 210 can determine which processor completes the operation in a faster amount of time (e.g., by looking at the architecture of the device to determine whether the device would run the operation better on either the CPU or GPU).”). Regarding claim 19, the combination of Zhang and Rossi teaches The device of claim 18, wherein the activation function is performed in an absence of performing a memory read or memory write. (Zhang [p. 88 §6] "Given a model/kernel configuration, we generate the graph in both the Tensorflow protobuf and tflite format, which are generally supported by edge inference frameworks. We send the target model to the measurement platform and collect the returned inference latency. To measure the latency on the CPU, we set CPU frequency to the highest 2.42GHz. 
The latency on the CPU is measured by the TFLite benchmark tool. Since TFLite currently doesn’t support operator-level profiling for GPU, we implement an operator profiler in TFLite for GPU backend. For VPU latency measurement, we convert the protobuf format into OpenVINO IR, and measure the latency by the OpenVINO™ toolkit. The latency number is the average of 50 inference runs" [p. 87 §5.2] "Driven by the two goals, we first prune the rarely-considered configurations by constraining the sampling distribution. This leverages the observation that many configurations are unlikely selected in state-of-the-art CNN models. And, the considered configurations are non-uniformly distributed in the sample space" Zhang does not rely on a lookup table for performing the activation; therefore, the activation function in Zhang is interpreted as being performed in an absence of performing a memory read and write.). Regarding claim 20, the combination of Zhang and Rossi teaches The device of claim 18. However, the combination of Zhang and Rossi does not explicitly teach wherein the activation function comprises at least one of a sigmoid function and a tanh(x) function. Xie, in the same field of endeavor, teaches the activation function comprises at least one of a sigmoid function and a tanh(x) function. ([p. 2542 §III] "In this section, we will demonstrate the process of designing a t-LUT for a sigmoid function."). The combination of Zhang and Rossi as well as Xie are directed towards hardware aware neural network operation optimization. Therefore, the combination of Zhang and Rossi as well as Xie are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Zhang and Rossi with the teachings of Xie by adding the circuit in Xie to the list of edge devices. Zhang explicitly teaches ([p. 
92 §10] "We propose nn-Meter, a kernel-based prediction system that accurately predicts the latency of DNN models on diverse edge devices") such that it would be obvious before the effective filing date of the claimed invention to include the circuit described by Xie as one of the edge devices in Zhang. Xie provides as additional motivation for combination ([p. 2540 Abstract] "when RALUT and our architecture were combined, it improved the compressibility of the RALUT-based result by up to additional 10.21% for a tanh activation function"). Conclusion THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on (571)270-7092. 
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /SIDNEY VINCENT BOSTWICK/Examiner, Art Unit 2124 /MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124

Prosecution Timeline

Oct 13, 2021
Application Filed
Feb 18, 2025
Non-Final Rejection — §101, §103, §112
May 09, 2025
Interview Requested
May 15, 2025
Applicant Interview (Telephonic)
May 15, 2025
Examiner Interview Summary
May 27, 2025
Response Filed
Jun 20, 2025
Final Rejection — §101, §103, §112
Sep 11, 2025
Interview Requested
Sep 22, 2025
Examiner Interview Summary
Sep 22, 2025
Applicant Interview (Telephonic)
Sep 24, 2025
Request for Continued Examination
Oct 02, 2025
Response after Non-Final Action
Oct 24, 2025
Non-Final Rejection — §101, §103, §112
Jan 28, 2026
Response Filed
Feb 27, 2026
Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561604
SYSTEM AND METHOD FOR ITERATIVE DATA CLUSTERING USING MACHINE LEARNING
2y 5m to grant Granted Feb 24, 2026
Patent 12547878
Highly Efficient Convolutional Neural Networks
2y 5m to grant Granted Feb 10, 2026
Patent 12536426
Smooth Continuous Piecewise Constructed Activation Functions
2y 5m to grant Granted Jan 27, 2026
Patent 12518143
FEEDFORWARD GENERATIVE NEURAL NETWORKS
2y 5m to grant Granted Jan 06, 2026
Patent 12505340
STASH BALANCING IN MODEL PARALLELISM
2y 5m to grant Granted Dec 23, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 52%
With Interview: 90% (+38.2%)
Median Time to Grant: 4y 7m
PTA Risk: High
Based on 136 resolved cases by this examiner. Grant probability derived from career allow rate.
