Prosecution Insights
Last updated: April 19, 2026
Application No. 18/323,473

ENCODING METHOD AND ENCODING CIRCUIT

Non-Final OA: §101, §103

Filed: May 25, 2023
Examiner: LU, HWEI-MIN
Art Unit: 2142
Tech Center: 2100 — Computer Architecture & Software
Assignee: Macronix International Co. Ltd.
OA Round: 1 (Non-Final)

Grant Probability: 62% (Moderate)
OA Rounds: 1-2
Time to Grant: 3y 1m
With Interview: 99%

Examiner Intelligence

Career allow rate: 62% (134 granted / 217 resolved cases; +6.8% vs TC avg)
Interview lift: +39.5% (strong), comparing allowance rates for resolved cases with vs. without an examiner interview
Typical timeline: 3y 1m average prosecution; 37 applications currently pending
Career history: 254 total applications across all art units

Statute-Specific Performance

§101: 11.2% (-28.8% vs TC avg)
§103: 43.8% (+3.8% vs TC avg)
§102: 9.4% (-30.6% vs TC avg)
§112: 33.0% (-7.0% vs TC avg)
TC averages are estimates. Based on career data from 217 resolved cases.

Office Action

§101, §103
DETAILED ACTION Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . This office action is in responsive to communication(s): original application filed on 05/23/2023, said application claims a priority filing date of 02/22/2023. Claims 1-10 are pending. Claims 1 and 6 are independent. Specification The disclosure is objected to because of the following informalities: in Abstract, lines 5-6, "… binding the second generated by the activation function with a random vector …" appears to be "… binding the second vector generated by the activation function with a random vector …"; in ¶ [0005], "… binding the second generated by the activation function with a random vector …" appears to be "… binding the second vector generated by the activation function with a random vector …"; in ¶ [0006], "… a binding circuit coupled to the activation circuit for binding the second generated by the activation function circuit with a random vector …" appears to be "… a binding circuit coupled to the activation circuit for binding the second vector generated by the activation function circuit with a random vector …" . Appropriate correction is required. Claim Objections Claims 1-2 and 6-7 are objected to because of the following informalities: in Claim 1, lines 6-7, "… binding the second generated by the activation function with a random vector …" appears to be "… binding the second vector generated by the activation function with a random vector …"; in Claim 1, line 8, "… adding the binding results to generate an adding result …" appears to be "… adding the plurality of binding results to generate an adding result …"; in Claim 2, lines 1-2, "… wherein the convolution layer performs linear conversion on the input …" appears to be "… wherein the convolution layer performs the linear conversion on the input …"; in Claim 6, lines 9-10, "… a binding circuit coupled to the activation circuit for binding the second generated by the activation function circuit with a random vector …" appears to be "… a binding circuit coupled to the activation circuit for binding the second vector generated by the activation function circuit with a random vector …"; in Claim 6, lines 12-13, "… adding the binding results to generate an adding result …" appears to be "… adding the plurality of binding results to generate an adding result …"; in Claim 7, lines 1-2, "… wherein the convolution layer circuit performs linear conversion on the input …" appears to be "… wherein the convolution layer circuit performs the linear conversion on the input …". Appropriate correction is required. Claim Rejections - 35 USC § 101 35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title. Claims 1-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to abstract idea without significantly more. Independent Claims 1 and 6 Step 1: Claim 1 is a process claim and Claim 6 is a circuit claim. These claims are fall within at least one of the four categories of patent eligible subject matter. 
Step 2A Prong 1: The claim(s) recite(s) "performing linear conversion on an input into a first vector based on a weight (by a convolution layer)", "comparing the first vector generated from the convolution layer with a reference value to generate a second vector (by an activation function)", "binding the second vector generated by the activation function with a random vector to generate a plurality of binding results", "adding the binding results to generate an adding result", and "operating the adding result by a Signum function and a normalization function to generate an output vector" which can be reasonably considered as mental processes (i.e., which "can be performed in the human mind, or by a human using a pen and paper") or mathematical concepts/algorithms/calculations. Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) recite(s) additional elements/limitations of "encoding circuit" (Claim 6), "memory device" (Claim 6), "convolution layer circuit" (Claim 6), "activation circuit" (Claim 6), "binding circuit" (Claim 6), "adding circuit" (Claim 6), "Signum function and normalization circuit" (Claim 6), and "the output vector is written into the memory device" (Claim 6) which only amount to "apply it" with the use of generic computer components (e.g., "memory device" and various "circuits" for performing various abstract functions described in Step 2A Prong 1) or insignificant extra solution activity (e.g., "the output vector is written into the memory device"). None of the additional elements/limitations, taken alone or in combination, integrate the abstract idea into a practical application. Step 2B:The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because (1) . Claims 2 and 7 Step 1: Claim 2 is a process claim and Claim 7 is a circuit claim. These claims are fall within at least one of the four categories of patent eligible subject matter. Step 2A Prong 1: The claim(s) further recite(s) "the convolution layer performs linear conversion on the input into the first vector based on the weight and a bias value" which can be reasonably considered as mental processes (i.e., which "can be performed in the human mind, or by a human using a pen and paper") or mathematical concepts/algorithms/calculations. Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) does/do not further recite(s) additional elements/limitations. Step 2B:The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea. Claims 3 and 8 Step 1: Claim 3 is a process claim and Claim 8 is a circuit claim. These claims are fall within at least one of the four categories of patent eligible subject matter. Step 2A Prong 1: The claim(s) further recite(s) "when the input is a 32-bit floating point input, the first vector is a floating point vector; and the second vector and the output vector are both binary vectors" which can be reasonably considered as mental processes (i.e., which "can be performed in the human mind, or by a human using a pen and paper") or mathematical concepts/algorithms/calculations. 
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) does/do not further recite(s) additional elements/limitations. Step 2B:The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea. Claims 4 and 9 Step 1: Claim 4 is a process claim and Claim 9 is a circuit claim. These claims are fall within at least one of the four categories of patent eligible subject matter. Step 2A Prong 1: The claim(s) recite(s) "in a training stage, the activation function is a hyperbolic tangent function" (Claim 4), "in an inference stage, the activation function is a Signum function" (Claim 4), "in a training stage, performs a hyperbolic tangent function" (Claim 9), and "in an inference stage, performs a Signum function" (Claim 9) which can be reasonably considered as mental processes (i.e., which "can be performed in the human mind, or by a human using a pen and paper") or mathematical concepts/algorithms/calculations. Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) does/do not further recite(s) additional elements/limitations. Step 2B:The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea. Claims 5 and 10 Step 1: Claim 5 is a process claim and Claim 10 is a circuit claim. These claims are fall within at least one of the four categories of patent eligible subject matter. Step 2A Prong 1: The claim(s) recite(s) "the second vector is bound with the random vector by an XOR logic operation" (Claim 5) which can be reasonably considered as mental processes (i.e., which "can be performed in the human mind, or by a human using a pen and paper") or mathematical concepts/algorithms/calculations. Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) recite(s) additional element of "XOR logic gate" (Claim 5) which only amount to "apply it" with the use of generic computer components or insignificant extra solution activity. None of the additional elements/limitations, taken alone or in combination, integrate the abstract idea into a practical application. Step 2B:The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional limitation/element of "XOR logic gate" is a common component for performing a well-understood, routine and conventional (WURC) activity such as an XOR logic operation. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-3 and 6-8 are rejected under 35 U.S.C. 103 as being unpatentable over Qi (US 2022/0114413 A1, pub. date: 04/14/2022), hereinafter Qi in view of Kleyko et al. ("Density Encoding Enables Resource-Efficient Randomly Connected Neural Networks", IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 32, NO. 8, Aug. 24, 2020, pp. 3777-3783), hereinafter Kleyko. Independent Claims 1 and 6 Qi discloses an encoding method (Qi, ¶¶ [0002]-[0003]: to achieve real-time and low power in deep convolution neural network (DCNN) inference, the models are compressed in terms of memory size and the number of computation operation when deployed in limited-resource devices; most current mechanisms require a non-trivial normalization stages before and after the convolution operation; the hardware cost of the normalization stage exceeds that of the convolution operation; a possible solution is presented for quantization implementation that utilizes only an extra shifter for the normalization, thus significantly reducing the hardware cost; the proposed solution transforms the multiplication-based normalization into a shift-only one by limiting the quantization in a power-of-two scale; ¶ [0024]: allow a convolution to be determined utilizing minimal hardware resources while retaining an inference accuracy comparable to a corresponding floating point model; specifically, a proposed hardware implementation for the fused convolution layer operation utilizing only a shifter in the normalization stage is proposed; the proposed method and system may support multiple different quantization configurations such as symmetric or asymmetric, layer-wise or channel-wise, and multiple activation functions such as rectified linear unit (ReLU), parametric rectified linear unit (pReLU), filtered rectified linear unit (ReLUx), and hard-sigmoid; the proposed solution may provide zero-numerical-error hardware deployment by utilizing a quantization library; the library may directly quantize individual tensors in a computational graph from public deep learning (DL) frameworks without modifying the kernel implementation of operations from the original framework), comprising: performing linear conversion on an input into a first vector based on a weight by a convolution layer (Qi, ¶ [0005]: quantizing an input tensor into a first power-of-two value, quantizing a weight tensor into a second power-of-two value, performing a convolution based on the quantized input tensor and the quantized weight tensor, quantizing a bias tensor into a third power-of-two value, bias-adding an output of the convolution and the quantized bias tensor and outputting a bias-addition; ¶¶ [0025]-[0042]: A typical 2D convolution layer in the convolutional neural network (CNN) may be described in following equations: P = W*Ai-1 + b (1), Ai = σ(P) (2), where Ai is the output of ith layer in shape of Ci × Hi × Wi, W is the weight matrix of the layer in 
shape of Ci × C i-1 × KH × Kw, b is the bias vector of the layer in shape of Ci × 1, P is the intermediate multiply-accumulation result of the layer in shape of Ci × Hi × Wi, σ is the non-linear activation function, Ci is the number of channel of ith layer, Hi, Wi, is the height and width of the feature map of ith layer, * denotes the 2D convolution operation; to limit the data precision for computation efficiency, an affine mapping is commonly used to linearly quantize the floating-point value to a finite bit-width integer representation; v ~ = s v - - z - (3), where v ~ is the dequantized value in floating point that approximate the real value v, v - is s the integer representation of the quantized value, s ,   z - are the quantization parameters where s is the normalization factor of v in floating-point format and z the zero point of v in integer format, these tuples may be computed offline; convolution layer operations are given by Equations (1) and (2), with quantized variables and illustrate integer-based implementations; the offline-computed parameters may be in a lower-bit format to reduce model and memory footprint, the heavy computation kernel (convolution) may also be lower-bit integer-based for low-cost hardware implementation and the offline-computed parameters may be fused when possible while minimizing the induced bias in the fusion; plugging equation (3) into equation (1) with respective quantization parameters (s, z), the equivalent integer-based 2D convolution operation is given by equation (4); separating the integer operations and floating-point operations in equation (5); the intermediate result of the integer-based multiply-accumulation operation is designated as P - in equations (6)-(7); ¶¶ [0059]-[0064] with 310 and 314 in FIG. 3: a multiply-accumulate layer 310 outputs an intermediate result of the multiply-accumulate P - 314 as an integer variable; asymmetric quantization is supported in the proposed data path when using ReLU activation function; Weight quantization: since the number of MACs operations may be double, it is recommended to avoid quantizing weight asymmetrically; Activation functions: when using symmetric quantization or partial asymmetric quantization (asymmetric weight and symmetric activation), several activation functions are feasible: ReLU (a=0, d=0), ReLU6 (x=6× 2 n A n w ), leakyReLU (a≈0.125, d=0), pReLU (p≈2n), hard-sigmoid (with slope 2-n); ¶¶ [0065]-[0066] with 410-424 and 432-438 FIG. 4: the software quantization library may compute the parameters by inserting quasi-quantization nodes into the original floating-point computational graph in the DL framework without replacing the original math-related operation kernels (2D convolutional layer (Conv2D), bias addition (BiasAdd), Non-linear), as shown in FIG. 4; an example fused convolutional method includes (a) quantizing 414 an input tensor 410 Ai-1 434 into a first power-of-two value; (b) quantizing 416 a weight tensor 412 W 432 into a second power-of-two value and performing a convolution 418 based on the quantized input tensor and the quantized weight tensor; (c) quantizing 422 a bias tensor 420 b 436 into a third power-of-two value, bias-adding 424 an output of the convolution and the quantized bias tensor and outputting a bias-addition P ~ 438; ¶ [0068] with 510-518 in FIG. 
5: quantizing 510 an input tensor into a first power-of-two value, quantizing 512 a weight tensor into a second power-of-two value and performing 514 a convolution based on the quantized input tensor and the quantized weight tensor; quantizing 516 a bias tensor into a third power-of two value, bias-adding 518 an output of the convolution and the quantized bias tensor and outputting a bias-addition); comparing the first vector generated from the convolution layer with a reference value to generate a second vector by an activation function; binding the second vector generated by the activation function (Qi, ¶¶ [0043]-[0059] and [0004] with 312-332 and in FIG. 3: considering the final non-linear operation in equation (2) and the output expressed in quantized integer-format to cascade to the next layer as equation (8); most non-linear activation functions may be relaxed or approximated by a Parametric Rectifier Linear Unit (PReLU) with upper bound clamping as equation (9); plugging equation (9) into equation (8), and rounding the remaining floating-point parts to integer or converting them to fixed-point representation yields equations (10)-(11); the operation is formulated as in equation (6) and (10); by constraining the floating-point scaling factor s and a to power-of-two values as in equation (12); the normalization stage may be simplified as equations (13)-(14); under either one of the following conditions, the addition operation in the normalization stage may be avoided; z A i = 0 indicates that activation may be symmetrically quantized; ReLU (a=0) is used as the activation function a(x); in this case, z A i may be fused in the bias-stage as equations (15)-(17); the data path is illustrated in FIG. 3; accumulator precision, to simulate the precision of the mantissa of float 32, a 24-bit integer may be used for the intermediate accumulation P - ; as long as the accumulation operation does not overflow, the result may be bit-exact compared to software quantization, which implies that the weights and input activation data may be properly normalized, which is satisfied when using a batch normalization layer in the model; the precision of fused bias b ~ as the bias term may be added into the accumulator, a reasonable maximum precision may be designated as 24-bit; in equation (15), the first two terms are in the same scale as the accumulation result P - ; no overflow may occur if the normalization requirement is satisfied; the third term indicates a conversion of the original bias vector b to fixed-point and normalization on the scale of the accumulation result P - ; when quantizing the original bias vector b in the software library, no affine mapping is needed; the fixed-point bit-precision and Q factor may be designed to maintain bit-exactness between software quasi-quantization and a hardware quantization implementation; the last term may be completely fused without introducing numerical error, as it remains the lower-bit precision of z A i ; the precision of the fused zero-point z ~ may impact the subsequent non-linear operations, for the entries of P - close to z ~ ; the induced bias may be amplified in the output activation and may be propagated in the subsequent layers; the intermediate multiply-accumulate is routed to a comparator 318 that also receives a first zero point z 316; a first multiplexer 320 is coupled to the comparator 318, the first multiplexer 320 receives multiple of power-of-two exponent values ni 322; a shift normalizer 324 which normalizes based on shift-only is 
coupled to the first multiplexer 320; the shift normalizer 324 receives the multiply-accumulation result 314 and the multiple power-of-two exponent values 322; the shift normalizer 324 limits a quantization of the multiply-accumulation result 314 to a power-of-two scale; a second multiplexer 326 is coupled to an output of the shift normalizer 324, the first multiplexer 320 and receives a second zero point z A i 328 and outputs an activation A i - 332; the multiply-accumulation result 314 may be an integer variable; the first zero point 316 may be fused or an approximated integer variable; the activation 332 may be an integer variable; ¶ [0063]: Activation functions: when using symmetric quantization or partial asymmetric quantization (asymmetric weight and symmetric activation), several activation functions are feasible: ReLU (a=0, d=0), ReLU6 (x=6× 2 n A n w ), leakyReLU (a≈0.125, d=0), pReLU (p≈2n), hard-sigmoid (with slope 2-n); ¶¶ [0065]-[0067] with 426-430 and 439-440 in FIG. 4: non-linearizes 426 the output of the bias-addition and quantizes 428 the output of the non-linearization into an activation Ai taking a form of a fourth power-of-two value output tensor 430; ¶ [0068] with 520-522 in FIG. 5: non-linearizing 520 the output of the bias-addition and quantizing 522 the output of the non-linearization into an activation taking a form of a fourth power-of-two value output tensor). Qi further discloses an encoding circuit coupled to a memory device, the encoding circuit comprising: a convolution layer circuit coupled to the memory device; an activation circuit coupled to the convolution layer circuit; a binding circuit coupled to the activation circuit; an adding circuit coupled to the binding circuit; and a Signum function and normalization circuit coupled to the adding circuit, wherein the output vector is written into the memory device (Qi, ¶¶ [0015]-[0023] with FIGS. 1-2: implement neural nets associated with the operation of one or more portions or steps of process 500; the processors associated with the hybrid system comprise a field programmable gate army (FPGA) 122, a graphical processor unit (GPU) 120 and a central processing unit (CPU) 118, which have the capability of providing a neural net; an FPGA is a field programmable device, it has the ability to be reconfigured and perform in hardwired circuit fashion any function that may be programmed into a CPU or GPU; there are other types of processors that the system may encompass such as an accelerated processing unit (APUs) which comprise a CPU with GPU elements on chip and digital signal processors (DSPs) which are designed for performing high speed numerical data processing. Application specific integrated circuits (ASICs) may also perform the hardwired functions of an FPGA; the graphical processor unit 120, central processing unit 118 and field programmable gate arrays 122 are connected and are connected to a memory interface controller 112; the FPGA is connected to the memory interface through a programmable logic circuit to memory interconnect 130; this additional device is utilized due to the fact that the FPGA is operating with a very large bandwidth and to minimize the circuitry utilized from the FPGA to perform memory tasks. 
The memory and interface controller 112 is additionally connected to persistent memory disk 110, system memory 114 and read only memory (ROM) 116; the processors associated with the hybrid system comprise a field programmable gate array (FPGA) 210 and a central processing unit (CPU) 220; the FPGA is electrically connected to an FPGA controller 212 which interfaces with a direct memory access (DMA) 218. The DMA is connected to input buffer 214 and output buffer 216, which are coupled to the FPGA to buffer data into and out of the FPGA respectively; the DMA 218 includes of two first in first out (FIFO) buffers one for the host CPU and the other for the FPGA, the DMA allows data to be written to and read from the appropriate buffer; on the CPU side of the DMA are a main switch228 which shuttles data and commands to the DMA; the DMA is also connected to an SD RAM controller 224 which allows data to be shuttled to and from the FPGA to the CPU 220, the SDRAM controller is also connected to external SDRAM 226 and the CPU 220; the main switch 228 is connected to the peripherals interface 230. A flash controller 222 controls persistent memory and is connected to the CPU 220; ¶ [0059] with FIG. 3: a fused convolutional layer 300, having a shift-only normalizer; a multiply-accumulate layer 310 outputs an intermediate result of the multiply-accumulate 314 as an integer variable; the intermediate multiply-accumulate is routed to a comparator 318 that also receives a first zero point 316; a first multiplexer 320 is coupled to the comparator 318, the first multiplexer 320 receives multiple of power-of-two exponent values, 322; a shift normalizer 324 which normalizes based on shift-only is coupled to the first multiplexer 320; the shift normalizer 324 receives the multiply-accumulation result 314 and the multiple power-of-two exponent values 322; the shift normalizer 324 limits a quantization of the multiply-accumulation result 314 to a power-of-two scale; a second multiplexer 326 is coupled to an output of the shift normalizer 324, the first multiplexer 320 and receives a second zero point 328 and outputs an activation 332). Qi fails to explicitly discloses binding the second vector generated by the activation function with a random vector to generate a plurality of binding results. 
Kleyko teaches a system a method relating to machine learning algorithms for neural networks (Abstract in Page 3777), wherein binding the second vector generated by the activation function with a random vector to generate a plurality of binding results; adding the binding results to generate an adding result (Qi, Abstract and Section I of Page 3777: resource-efficient randomly connected neural networks known as random vector functional link (RVFL) networks since their simple design and extremely fast training time make them very attractive for solving many applied classification tasks; propose to represent input features via the density-based encoding known in the area of stochastic computing and use the operations of binding and bundling from the area of hyperdimensional computing for obtaining the activations of the hidden neurons; the proposed approach demonstrates higher average accuracy than the conventional RVFL; also demonstrate that it is possible to represent the readout matrix using only integers in a limited range with minimal loss in the accuracy; in this case, the proposed approach operates only on small n-bits integers, which results in a computationally efficient architecture; finally, through hardware field-programmable gate array (FPGA) implementations, show that such an approach consumes approximately 11 times less energy than that of the conventional RVFL; RFVLs provide a universal approximation for continuous maps and functional approximations that converge in the Kullback–Leibler divergence when the target function is a probability density function; present an approach for an order of magnitude increase of the resource-efficiency (memory footprint, computational complexity, and energy consumption) of RVFLs operations; the proposed approach combines techniques from two fields of computer science: stochastic computing and hyperdimensional computing; the fundamental idea is in the realization of activations of the hidden layer with the computationally simple operations of hyperdimensional computing and the usage of the density-based encoding of the input features as in stochastic computing; moreover, enhance this approach with the integer-only readout matrix; this combination allows us to use integer arithmetics end-to-end; Section II.A with FIG. 1 of Pages 3777-3778: Fig. 
1 depicts the architecture of the conventional RVFL, which includes three layers of neurons; in general, the connectivity of an RVFL is described by two matrices and a vector; a matrix Win [Symbol font/0xCE] [N × K] describes connections between the input layer neurons and the hidden layer neurons; this matrix projects the given input features to the hidden layer; each neuron in the hidden layer has a parameter called a bias; biases of the hidden layer are stored in a vector and denoted as b [Symbol font/0xCE] [N × 1]; the other matrix of readout connections Wout [Symbol font/0xCE] [L × N] between the hidden and output layers transforms the current activations in the hidden layer stored in h into the network’s output y; the main feature of the RVFL is that matrix Win and vector b are randomly generated at the network initialization and stay fixed during the network’s lifetime; there are no strict limitations for the generation of Win and b; they are usually randomly drawn from either normal or uniform distributions; here, both Win and b are generated from a uniform distribution; the range for Win is [−1, 1], while the range for b is [−0.1, 0.1]; since Win and b are fixed, the process of training RVFL is focused on learning the values of the readout matrix Wout; the main advantage of training only Wout is that the corresponding optimization problem is strictly convex; thus, the solution could be found in a single analytical step; the activations of the network’s hidden layer h are described by the following equation h = g(Win x + b), where g(x) is a nonlinear activation function applied to each neuron; here, the sigmoid function g(x) = (1/1 + e−x ) is used; the predictions issued by the output layer are calculated as y = Wout h ; the standard way of acquiring weights of the trainable connections between the hidden and the output layers in the Wout matrix is via solving the ridge regression (which is a special case of the Tikhonov regularization) problem, which minimizes the mean square error between predictions (2) and the ground truth; the activations of the hidden layer hT for each training example are collected together in matrix H [Symbol font/0xCE] [M × N]; matrix Y [Symbol font/0xCE] [M × L] stores the corresponding ground-truth classifications using one-hot encodings; given H and Y, Wout is calculated as equation (3); Section II.B with FIG. 2 of Pages 3778-3779: stochastic computing operates with scalars between 0 and 1, which are represented as random bit vectors where the scalar being encoded determines the probability of generating ones; thus, the density of ones in the obtained bit vector encodes the scalar; hence, such a representation method is called the density-based encoding; generating random streams is important because the independence of two vectors is a prerequisite for using the Boolean operations to implement the arithmetics on them (e.g., AND for multiplication); note that, for the proposed approach, no arithmetic operations will be performed with the density-based encodings of scalars; therefore, the randomness of representations for encoding scalars is not compulsory in this study; in fact, from the simplicity point of view, it is more advantageous to use a structured version of the density-based encoding, which does not require a source of randomness; use the structured version of the density-based encoding also known under the name thermometric encoding for the rest of this brief; Fig. 
2 illustrates all possible values, which could be encoded when the dimensionality of the representation5 is set to N = 4; Fig. 2 indicates that using the density-based encoding, it is possible to represent N + 1 different values; the most convenient way of denoting these values is by using integers in the range [0, N] (nodes on the left in the figure); in this case, in order to obtain the encoding of a given value v, it is necessary to set v leftmost positions of the vector to “one” (hashed red nodes in the figure), while the rest of the vector is set to “zero” (filled green nodes); in the case of bipolar representations used in the following, “one” corresponds to −1, while “zero” corresponds to 1; recall, however, that input features are not integers in the range [0, N]; instead, it is assumed that a feature xi is represented by a real number in the range [0, 1; the task is to represent the current value of the feature as a vector f [Symbol font/0xCE] [N × 1] using the abovementioned density-based encoding; since the encoding requires a finite set of values between 0 and N, real numbers are first discretized using a fixed quantization step, which is determined by N; given the current value of the feature, it is quantized to the closest integer as equation (4); the obtained v will determine the density-based encoding f; the presented procedure allows generating density-based encodings for the whole feature vector x; Matrix F [Symbol font/0xCE] [N × K], where K denotes the number of features, contains the density-based encodings f of the current values of x; Section II.C with of Page-3779: hyperdimensional computing also known as vector symbolic architectures is a family of bioinspired methods of representing and manipulating concepts for cognitive architectures and their meanings in a high-dimensional space; vectors of high (but fixed) dimensionality (denoted as N) are the basis for representing information in hyperdimensional computing; the information is distributed across the HD vector’s positions; therefore, HD vectors use distributed representations; distributed representations are contrary to the localist representations since any subset of the positions can be interpreted; this is very relevant to the density-based encoding introduced in Section II-B since the encoding in f is also distributed; in the scope of this brief paper, columns of Win matrix are interpreted as HD vectors, which are generated randomly; these HD vectors are bipolar (Win [Symbol font/0xCE] {−1,+1}[N×K]) and random with equal probabilities for +1 and −1. 
It is worth noting that an important property of high-dimensional spaces is that with an extremely high probability, all random HD vectors are dissimilar to each other (quasi-orthogonal); in order to manipulate HD vectors, hyperdimensional computing defines operations on them; in this brief, implicitly use only two key operations: binding and bundling; the binding operation is used to associate two HD vectors together; the result of binding is another HD vector; here, the result of binding (denoted as z) two vectors x and y is calculated as follows: z = x [Symbol font/0xC4] y, where the notation [Symbol font/0xC4] for the Hadamard product is used to denote the binding operation since this brief uses position-wise multiplication for binding; an important property of the binding operation is that the resultant HD vector z is quasi-orthogonal to the HD vectors being bound; the second operation is called bundling; the bundling operation combines several HD vectors into a single HD vector; its simplest realization is a position-wise addition; however, when using the position-wise addition, the vector space becomes unlimited; therefore, it is practical to limit the values of the result; this could be achieved with, e.g., a clipping function [denoted as fκ(*)]; in the clipping function, κ is a configurable threshold parameter; thus, in this brief, the bundling operation is implemented via position-wise addition limited via the clipping function; e.g., the result (denoted as a) of bundling HD vectors x and y is simply a = fκ(x + y); in contrast to the binding operation, the resultant HD vector a is similar to all bundled HD vectors, which allows, e.g., storing information in HD vectors; Section III with FIGS. 3-4 in Pages 3779-3780: presents an architecture of the RVFL utilizing the density-based encoding; the approach is illustrated in Fig. 3; the major difference is that the proposed approach is illustrated with four layers of neurons: input layer (x, K neurons); density-based representation layer (F, N × K neurons); hidden layer (h, N neurons); and output layer (y, L neurons); thus, in contrast to the conventional RVFL, the hidden layer is not connected directly to the input layer; instead, each input feature is first transformed into a row of neurons storing its density-based encodings; these vectors constitute the density-based representation layer, which, in turn, is connected to the hidden layer; note also that the input and density-based representation layers are not fully connected; each neuron in the input layer is only connected to N neurons in the corresponding row of the next layer; moreover, these connections (blue lines in Fig. 3) are called “feature-dependent” because the activation of the ith input neuron xi will be quantized to the closest integer v according to (4); in turn, v determines the number of the rightmost connections, which transmits −1, the remaining connections from that neuron transmit +1; since each neuron in the density-based representation layer has only one incoming connection, the input activations are projected in the form of the bipolar matrix F; it is also important to mention that the density-based representation and hidden layers are not fully connected; in fact, each neuron in the density-based representation layer has only one outgoing connection; therefore, the matrix Win describing the fixed random connections to the hidden layer is still Win [Symbol font/0xCE] [N × K]; moreover, these connections have a clear structure; in Fig. 
3, the connections are structured in such a way that each column in F is connected to one of the hidden layer neurons; it explains why the number of hidden neurons N also determines the dimensionality of the density-based encoding of features: each hidden neuron has its corresponding column in F (see Fig. 4); similar to the conventional RVFL, the values of Win are also generated randomly; however, the values are drawn equiprobably from {−1,+1}; thus, similar to F, Win is also a bipolar matrix; when reflecting to the ideas of hyperdimensional computing, Win should be interpreted as K N-dimensional bipolar HD vectors; i.e., each feature is assigned with the corresponding HD vector; thus, a conceptual intermediate step before getting input values of the hidden neurons is the binding operation between features’ HD vectors and their current density-based encoding; finally, the proposed approach uses different nonlinear activation function in the hidden layer; the clipping function (5) is used instead of the sigmoid function. The clipping function is characterized by the threshold value κ regulating nonlinear behavior of the neurons and limiting the range of activation values. Summarizing the aforementioned differences, activations of the hidden layer h are obtained as follows: h = fκ (Σ F [Symbol font/0xC4] Win), where Σ is a column-wise summation; note that in contrast to (1), there is no bias term since it has been found empirically that its presence does not improve classification performance; in order to make operations of the proposed approach more intuitive, Fig. 4 presents a numerical example of acquiring the activations of the hidden layer; first, the input layer with K = 5 neurons sets the values of the current feature vector; the quantized values determine the neurons of the density-based encoding, which are set to −1 (the rest is +1); the bottom left figure shows a randomly generated Win; once F is obtained, we calculate the Hadamard product F [Symbol font/0xC4] Win, which is denoted as “bound representations” in Fig. 4; the row-wise summation of the resultant matrix represents the input values of the hidden layer; finally, the clipping function (κ = 2 in Fig. 4) is used in the hidden layer to get h). Qi and Kleyko are analogous art because they are from the same field of endeavor, a system a method relating to machine learning algorithms for neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to apply the teaching of Kleyko to Qi. Motivation for doing so would improve accuracy and enhance resource-efficienc. 
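For reference, the hidden-layer computation the examiner cites from Kleyko (thermometer/density-based encoding of each feature, Hadamard-product binding with a fixed random bipolar matrix Win, and a clipped sum for bundling, i.e. h = f_κ(Σ F ⊗ Win)) can be sketched in a few lines. This is an illustrative reconstruction from the passages quoted above, not code from the paper; the dimensions N and K and the threshold κ are arbitrary choices here.

```python
# Illustrative sketch of the Kleyko hidden-layer activation cited in the OA:
# thermometer (density-based) encoding, Hadamard-product binding, clipped-sum bundling.
# Dimensions and kappa are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
N, K, kappa = 8, 5, 2          # hidden neurons, input features, clipping threshold

x = rng.uniform(0, 1, size=K)  # input features in [0, 1]

# Density-based (thermometer) encoding: quantize each feature to an integer v in [0, N],
# set the v leftmost positions to -1 and the rest to +1 (bipolar convention).
v = np.rint(x * N).astype(int)
F = np.where(np.arange(N)[:, None] < v[None, :], -1, 1)   # shape (N, K)

# Fixed random bipolar projection Win in {-1, +1}^(N x K): one HD vector per feature.
Win = rng.choice([-1, 1], size=(N, K))

bound = F * Win                                 # binding: position-wise (Hadamard) product
h = np.clip(bound.sum(axis=1), -kappa, kappa)   # bundling: sum over features, then clip
print(h)
```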
Claims 2 and 7 Qi in view of Kleyko discloses all the elements as stated in Claims 1 and 6 further discloses wherein the convolution layer performs linear conversion on the input into the first vector based on the weight and a bias value (Qi, ¶¶ [0025]-[0042]: A typical 2D convolution layer in the convolutional neural network (CNN) may be described in following equations: P = W*Ai-1 + b (1), Ai = σ(P) (2), where Ai is the output of ith layer in shape of Ci × Hi × Wi, W is the weight matrix of the layer in shape of Ci × C i-1 × KH × Kw, b is the bias vector of the layer in shape of Ci × 1, P is the intermediate multiply-accumulation result of the layer in shape of Ci × Hi × Wi, σ is the non-linear activation function, Ci is the number of channel of ith layer, Hi, Wi, is the height and width of the feature map of ith layer, * denotes the 2D convolution operation; to limit the data precision for computation efficiency, an affine mapping is commonly used to linearly quantize the floating-point value to a finite bit-width integer representation; v ~ = s v - - z - (3), where v ~ is the dequantized value in floating point that approximate the real value v, v - is s the integer representation of the quantized value, s ,   z - are the quantization parameters where s is the normalization factor of v in floating-point format and z the zero point of v in integer format, these tuples may be computed offline; convolution layer operations are given by Equations (1) and (2), with quantized variables and illustrate integer-based implementations; the offline-computed parameters may be in a lower-bit format to reduce model and memory footprint, the heavy computation kernel (convolution) may also be lower-bit integer-based for low-cost hardware implementation and the offline-computed parameters may be fused when possible while minimizing the induced bias in the fusion; plugging equation (3) into equation (1) with respective quantization parameters (s, z), the equivalent integer-based 2D convolution operation is given by equation (4); separating the integer operations and floating-point operations in equation (5); the intermediate result of the integer-based multiply-accumulation operation is designated as P - in equations (6)-(7)) (). 
Claims 3 and 8 Qi in view of Kleyko discloses all the elements as stated in Claims 1 and 6 further discloses wherein when the input is a 32-bit floating point input, the first vector is a floating point vector; and the second vector and the output vector are both binary vectors (Qi, ¶¶ [0025]-[0042]: A typical 2D convolution layer in the convolutional neural network (CNN) may be described in following equations: P = W*Ai-1 + b (1), Ai = σ(P) (2), where Ai is the output of ith layer in shape of Ci × Hi × Wi, W is the weight matrix of the layer in shape of Ci × C i-1 × KH × Kw, b is the bias vector of the layer in shape of Ci × 1, P is the intermediate multiply-accumulation result of the layer in shape of Ci × Hi × Wi, σ is the non-linear activation function, Ci is the number of channel of ith layer, Hi, Wi, is the height and width of the feature map of ith layer, * denotes the 2D convolution operation; to limit the data precision for computation efficiency, an affine mapping is commonly used to linearly quantize the floating-point value to a finite bit-width integer representation; v ~ = s v - - z - (3), where v ~ is the dequantized value in floating point that approximate the real value v, v - is s the integer representation of the quantized value, s ,   z - are the quantization parameters where s is the normalization factor of v in floating-point format and z the zero point of v in integer format, these tuples may be computed offline; convolution layer operations are given by Equations (1) and (2), with quantized variables and illustrate integer-based implementations; the offline-computed parameters may be in a lower-bit format to reduce model and memory footprint, the heavy computation kernel (convolution) may also be lower-bit integer-based for low-cost hardware implementation and the offline-computed parameters may be fused when possible while minimizing the induced bias in the fusion; plugging equation (3) into equation (1) with respective quantization parameters (s, z), the equivalent integer-based 2D convolution operation is given by equation (4); separating the integer operations and floating-point operations in equation (5); the intermediate result of the integer-based multiply-accumulation operation is designated as P - in equations (6)-(7); accumulator precision, to simulate the precision of the mantissa of float 32, a 24-bit integer may be used for the intermediate accumulation P) (Kleyko, Section III with FIGS. 3-4 in Pages 3779-3780: presents an architecture of the RVFL utilizing the density-based encoding; the approach is illustrated in Fig. 3; the major difference is that the proposed approach is illustrated with four layers of neurons: input layer (x, K neurons); density-based representation layer (F, N × K neurons); hidden layer (h, N neurons); and output layer (y, L neurons); thus, in contrast to the conventional RVFL, the hidden layer is not connected directly to the input layer; instead, each input feature is first transformed into a row of neurons storing its density-based encodings; these vectors constitute the density-based representation layer, which, in turn, is connected to the hidden layer; note also that the input and density-based representation layers are not fully connected; each neuron in the input layer is only connected to N neurons in the corresponding row of the next layer; moreover, these connections (blue lines in Fig. 
3) are called “feature-dependent” because the activation of the ith input neuron xi will be quantized to the closest integer v according to (4); in turn, v determines the number of the rightmost connections, which transmits −1, the remaining connections from that neuron transmit +1; since each neuron in the density-based representation layer has only one incoming connection, the input activations are projected in the form of the bipolar matrix F; it is also important to mention that the density-based representation and hidden layers are not fully connected; in fact, each neuron in the density-based representation layer has only one outgoing connection; therefore, the matrix Win describing the fixed random connections to the hidden layer is still Win [Symbol font/0xCE] [N × K]; moreover, these connections have a clear structure; in Fig. 3, the connections are structured in such a way that each column in F is connected to one of the hidden layer neurons; it explains why the number of hidden neurons N also determines the dimensionality of the density-based encoding of features: each hidden neuron has its corresponding column in F (see Fig. 4); similar to the conventional RVFL, the values of Win are also generated randomly; however, the values are drawn equiprobably from {−1,+1}; thus, similar to F, Win is also a bipolar matrix; when reflecting to the ideas of hyperdimensional computing, Win should be interpreted as K N-dimensional bipolar HD vectors; i.e., each feature is assigned with the corresponding HD vector; thus, a conceptual intermediate step before getting input values of the hidden neurons is the binding operation between features’ HD vectors and their current density-based encoding; finally, the proposed approach uses different nonlinear activation function in the hidden layer; the clipping function (5) is used instead of the sigmoid function. The clipping function is characterized by the threshold value κ regulating nonlinear behavior of the neurons and limiting the range of activation values. Summarizing the aforementioned differences, activations of the hidden layer h are obtained as follows: h = fκ (Σ F [Symbol font/0xC4] Win), where Σ is a column-wise summation; note that in contrast to (1), there is no bias term since it has been found empirically that its presence does not improve classification performance; in order to make operations of the proposed approach more intuitive, Fig. 4 presents a numerical example of acquiring the activations of the hidden layer; first, the input layer with K = 5 neurons sets the values of the current feature vector; the quantized values determine the neurons of the density-based encoding, which are set to −1 (the rest is +1); the bottom left figure shows a randomly generated Win; once F is obtained, we calculate the Hadamard product F [Symbol font/0xC4] Win, which is denoted as “bound representations” in Fig. 4; the row-wise summation of the resultant matrix represents the input values of the hidden layer; finally, the clipping function (κ = 2 in Fig. 4) is used in the hidden layer to get h). Claims 4 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Qi in view of Kleyko as applied to Claims 1 and 6 respectively above, and further in view of Chen et al. ("FxpNet: Training a deep convolutional neural network in fixed-point representation", 2017 International Joint Conference on Neural Networks (IJCNN), May 14-19, 2017, pp. 2494-2501), hereinafter Chen. 
Claims 4 and 9 Qi in view of Kleyko discloses all the elements as stated in Claims 1 and 6 further discloses wherein in a training stage, the activation function is a ReLU function; and in an inference stage, the activation function is a Signum function (Qi, ¶ [0063]: Activation functions: when using symmetric quantization or partial asymmetric quantization (asymmetric weight and symmetric activation), several activation functions are feasible: ReLU (a=0, d=0), ReLU6 (x=6× 2 n A n w ), leakyReLU (a≈0.125, d=0), pReLU (p≈2n), hard-sigmoid (with slope 2-n)). Qi in view of Kleyko fails to explicitly disclose wherein in a training stage, the activation function circuit performs a hyperbolic tangent function; and in an inference stage, the activation function circuit performs a Signum function. Chen teaches a system and a method relating to deep convolutional neural network (Chen, Abstract of Page 2494), wherein in a training stage, the activation function circuit performs a hyperbolic tangent function; and in an inference stage, the activation function circuit performs a Signum function (Chen, Abstract of Page 2494: introduce FxpNet, a framework to train deep convolutional neural networks with low bit-width arithmetics in both forward pass and backward pass; during training FxpNet further reduces the bit-width of stored parameters (also known as primal parameters) by adaptively updating their fixed-point formats; these primal parameters are usually represented in the full resolution of floating-point values in previous binarized and quantized neural networks; in FxpNet, during forward pass fixed-point primal weights and activations are first binarized before computation, while in backward pass all gradients are represented as low resolution fixed-point values and then accumulated to corresponding fixed-point primal parameters; to have highly efficient implementations in FPGAs, ASICs and other dedicated devices, FxpNet introduces Integer Batch Normalization (IBN) and Fixed-point ADAM (FxpADAM) methods to further reduce the required floating-point operations, which will save considerable power and chip area; Section II.A in Page 2495: binarizing weight and activation can significantly speedup the performance by the bit convolution kernels; there are two binarization approaches, deterministic and stochastic, used to transform floating-point value into one single bit; stochastic binarization could get slightly better performance at the cost of more complex implementation since it requires hardware to generate random bits when quantizing; thus, propose using only the deterministic binarization method (a simple sign function): wb = sign(w) = +1 for w ≥ 0, − 1 otherwise; binarization dramatically reduces computation and memory consumption in forward pass, nevertheless, the derivative of the sign function is 0 almost everywhere, makes the gradients of the cost c can’t be propagated in backward pass; to address this problem, adopt the "straight-through estimator" (STE) method, and use the same STE formulation in equation (2); above STE preserves gradient information and cancels the gradient when ri is too large; no-cancelling will cause a significant performance drop; as also pointed out in QNN STE can also be seen as applying the well-known hard tanh activation function to ri, defined as in equation (3); correspondingly, the derivative of hard tanh is defined as equation (4) which is exactly the STE defined in Equation 2; with Equation 2 and 4, BNN binarizes both activations and weights during forward 
pass, while still reserving real-valued gradients of weights to guarantee that Stochastic Gradient Descent (SGD) works well; BNN further proposes shift-based Batch Normalization (BN) and a shift-based ADAM learning rule to accelerate training and reduce the impact of weights’ scale). Qi in view of Kleyko, and Chen are analogous art because they are from the same field of endeavor, a system and a method relating to deep convolutional neural network. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to apply the teaching of Chen to Qi in view of Kleyko. Motivation for doing so would reduce computation and memory consumption in forward pass, and reserve real-valued gradients of weights to guarantee that Stochastic Gradient Descent (SGD) works well in backward pass (Chen, Section II.A in Page 2495). Claims 5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Qi in view of Kleyko as applied to Claims 1 and 6 respectively above, and further in view of Rosing et al. (US 2022/0019441 A1, pub. date: 01/20/2022), hereinafter Rosing. Claims 5 and 10 Qi in view of Kleyko discloses all the elements as stated in Claims 1 and 6 except failing to explicitly disclose wherein the second vector is bound with the random vector by an XOR logic operation or wherein the binding circuit is an XOR logic gate. Rosing teaches a system and a method relating to deep learning (Rosing, ¶ [0004]), wherein the second vector is bound with the random vector by an XOR logic operation or wherein the binding circuit is an XOR logic gate (Rosing, ¶¶ [0147]-[0151] and [0154]-[0160] with FIGS. 11(a): the human brain is more capable of recognizing patterns than calculating with numbers; this fact motivates us to simulate the process of brain's computing with points in high-dimensional space; these points can effectively model the neural activity patterns of the brain's circuits; this capability makes hyperdimensional vectors very helpful in many real-world tasks; a new hyper-vector can be based on vector or Boolean operations, such as binding that forms a new hyper-vector which associates two base hyper-vectors, and bundling that combines several hyper-vectors into a single composite hyper-vector; component-wise XOR: bind two hyper-vectors A and B by component-wise XOR and denote the operation as A[Symbol font/0xC4]B. The result of this operation is a new hyper-vector that is dissimilar to its constituents (i.e., d(A[Symbol font/0xC4]B; A)≈D/2), where d() is the Hamming distance; hence XOR can be used to associate two hyper-vectors; ¶¶ [0284]-[0286] with FIGS. 31(a)-(c): the encoder, shown in FIG. 31a, implements bitwise XOR operations between hyper-vectors P and L over different features, and thresholds the results; enable in-memory XOR operations by making a small modification to the sense amplifier of the crossbar memory, as shown in FIG. 31b; FIG. 31c shows the sense amplifier designed to implement the majority function). Qi in view of Kleyko, and Rosing are analogous art because they are from the same field of endeavor, a system and a method relating to deep learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to apply the teaching of Rosing to Qi in view of Kleyko. Motivation for doing so would be more e. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Yu et al. 
("Accurate and Efficient Stochastic Computing Hardware for Convolutional Neural Networks", 2017 IEEE International Conference on Computer Design (ICCD), Nov. 5-8, 2017, pp. 105-112) discloses in ABSTRACT of Page 105 that (1) present an efficient unipolar stochastic computing hardware for convolutional neural networks (CNNs); (2) it includes stochastic ReLU and optimized max function, which are key components in a CNN; (3) to avoid the range limitation problem of stochastic numbers and increase the signal-to-noise ratio, perform weight normalization and upscaling; (4) in addition, to reduce the overhead of binary-to-stochastic conversion, propose a scheme for sharing stochastic number generators among the neurons in a CNN; and (5) the approach outperforms the previous ones based on stochastic computing in terms of accuracy, area, and energy consumption. Yu further discloses in Section III.C with FIG. 5 in Page 108 that (1) propose an optimized stochastic max (Smax) function shown in Fig. 5(a) to reduce the SNG overhead; (2) the basic concept of Smax is updating only the difference of the two input SC bitstreams; (3) the difference can be easily calculated with a single XOR gate; (4) in Fig. 5(a), for example, if A and B are different, the Tanh module (implemented as an FSM) is enabled to update its own state; (5) the input from A to Tanh works as a bipolar-encoded number, so 1 increases the state and 0 decreases it; (6) thus, if A is larger, Tanh tends to stay on the high state side; if B is larger, it tends to stay on the low state side; (7) the Mux in Fig. 5(a) selects A when Tanh is at a state higher than half of the highest state; (8) when the enable value is 0 (i.e., A and B are the same), Tanh does not update its state; (9) Fig. 5(a) and (c) show Smax results and mean absolute errors, respectively; and (10) the errors are very small, implying that the Smax module well approximates the conventional max operation. Any inquiry concerning this communication or earlier communications from the examiner should be directed to HWEI-MIN LU whose telephone number is (313)446-4913. The examiner can normally be reached Mon - Fri: 9:00 AM - 6:00 PM EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D. Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /HWEI-MIN LU/Primary Examiner, Art Unit 2142

Prosecution Timeline

May 25, 2023
Application Filed
Feb 21, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602578
LIGHT SOURCE COLOR COORDINATE ESTIMATION SYSTEM AND DEEP LEARNING METHOD THEREOF
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12596954
MACHINE LEARNING FOR MANAGEMENT OF POSITIONING TECHNIQUES AND RADIO FREQUENCY USAGE
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12591770
PREDICTING A STATE OF A COMPUTER-CONTROLLED ENTITY
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12579466
DYNAMIC USER-INTERFACE COMPARISON BETWEEN MACHINE LEARNING OUTPUT AND TRAINING DATA
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12561222
REDUCING BIAS IN MACHINE LEARNING MODELS UTILIZING A FAIRNESS DEVIATION CONSTRAINT AND DECISION MATRIX
Granted Feb 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 62%
With Interview: 99% (+39.5%)
Median Time to Grant: 3y 1m
PTA Risk: Low
Based on 217 resolved cases by this examiner. Grant probability derived from career allow rate.
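As a rough consistency check on the figures above, the headline numbers reproduce from the raw counts in the Examiner Intelligence panel. The sketch below assumes the tool derives grant probability directly from the career allow rate and treats the interview lift as an additive percentage-point adjustment; the tool's actual internal methodology is not published.

```python
# Consistency check under assumed methodology; not the tool's actual formulas.
granted, resolved = 134, 217
career_allow_rate = granted / resolved              # 0.6175... -> reported as 62%
with_interview = 0.99                               # reported with-interview probability
interview_lift = 0.395                              # +39.5 percentage points
implied_baseline = with_interview - interview_lift  # ~0.595 without an interview
print(f"{career_allow_rate:.1%}  {implied_baseline:.1%}")
```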
