Last updated: May 29, 2026
Application No. 18/344,831
NEURAL NETWORK INFERENCE BASED ON TABLE LOOKUP

Non-Final OA §102§103§112
Filed
Jun 29, 2023
Examiner
SPRAUL III, VINCENT ANTON
Art Unit
2129
Tech Center
2100 — Computer Architecture & Software
Assignee
Microsoft Technology Licensing, LLC
OA Round
1 (Non-Final)
Interview Optional

— +26.7% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 60% grant rate with +26.7% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 37 resolved cases, 2023–2026
Examiner Intelligence

SPRAUL III, VINCENT ANTON View full profile →
Grants 60% of resolved cases
Career Allowance Rate
22 granted / 37 resolved
+4.5% vs TC avg
Strong +27% interview lift
Without
With
+26.7%
Interview Lift
resolved cases with interview
Typical timeline
4y 4m
Avg Prosecution
19 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
2.8%
-37.2% vs TC avg
§103
93.8%
+53.8% vs TC avg
§102
0.6%
-39.4% vs TC avg
§112
1.7%
-38.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 37 resolved cases
Office Action

§102 §103 §112
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Claim Objections
Claims 3 and 15 objected to because of the following informalities. Both claims recite “a probability for a centroid in a codebook indicating a probability that the centroid is closet to a sample input sub-vector associated with the codebook relative to other centroids in the codebook.” Examiner respectfully suggests that “closet” was intended to be “closest,” and in further examination below, the claims will be interpreted as so written.

Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6 and 18 rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
	Claims 6 and 18 both recite “wherein the differentiable function is a softmax function.” However, no antecedent basis is provided for “the differentiable function.” Examiner respectfully suggests that claim 6 was intended to depend up claim 4, and claim 18 was intended to depend upon claim 16. In further examination below, claims 6 and 18 will be interpreted as depending upon claims 4 and 16, respectively.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-4, 6-9, 12-16, and 18-20 rejected under 35 U.S.C. 102(a) (1) as being anticipated by Ran et al., “PECAN: A Product-Quantized Content Addressable Memory Network,” April 2023,  2023 Design, Automation & Test in Europe Conference & Exhibition (hereafter Ran).

Regarding claim 1 and analogous claims 13 and 20: 
Ran teaches:
“A computer-implemented method, comprising”: Ran, Abstract, “A novel deep neural network (DNN) architecture [computer-implemented method] is proposed wherein the filtering and linear transform are realized solely with product quantization (PQ)”; Ran, section IV, paragraph2, “All experiments are run on a machine equipped with four NVIDIA Tesla V100 GPU with 24GB frame buffer, and all codes are implemented by PyTorch.”
“dividing a first input for a first layer of a neural network into a first plurality of input sub-vectors”: Ran, section III, paragraph 1, “It is natural to set each prototype in PQ to be a k2 x 1 subvector (viz. same size as a vectorized kernel), with p prototypes in each of the cin input channels according to the patterns of flattened matrices. With this setting, there are two main components in a trained PECAN that require memory storage in each layer, namely, i) pcin prototypes for ‘quantizing’ the input subvectors [dividing a first input for a first layer of a neural network into a first plurality of input sub-vectors]; ii) coutcinp inner product values between the (sub)rows in F and each prototype.”
“determining respective target centroids for the first plurality of input sub-vectors based on respective distances between the first plurality of input sub-vectors and respective centroids in a first plurality of codebooks for the first layer, a centroid representing a cluster of sub- vectors with matched feature information”: Ran, section III, paragraph 1, “For an intermediate CNN layer, consider the flattened feature matrix X ∈ ℝcink2×HoutWout, where cin and k are the number of input channels and the kernel size, Hout and Wout are height and width of the output feature, codebooks C ∈ ℝcink2 x p are assigned with parameters to construct an embedding table for the features, where p is the number of choices for each codebook C(j), j = 1, 2, …, D. C(j)m ∈ ℝd are called prototypes [determining respective target centroids for the first plurality of input sub-vectors … a centroid representing a cluster of sub- vectors with matched feature information], m = 1, 2, . . . , p (cf. Fig. 1(c))”; Ran, section III, paragraph 2, “In short, PECAN is mapping (quantizing) the original input features onto prototype patterns in compact codebooks, then multiplication between weights (F) and features (X) can be approximated by lookup table operation during inference [based on respective distances between the first plurality of input sub-vectors and respective centroids in a first plurality of codebooks for the first layer]”; Ran, Fig. 2 caption, “When approximating subvectors with the closest prototypes, PECAN-A and PECAN-D adopt different assignment schemes.”
“and the respective centroids in the first plurality of codebooks being determined along with a first weight matrix for the first layer through a training procedure of the neural network”: Ran, section III. A, paragraph 1, “For PECAN-A, we compute the approximated matrix ˜X by splitting its rows into D = cin groups, each with subvectors of dimension d = k2, and get the attention scores Ki(j) to formulate the combination of prototypes Cm(j). […] Since the dot product distance function with softmax is differentiable, mapping features to prototypes can be learned end-to-end. It is worth noting that all intermediate features are replaced with the combination of learned prototypes after training [the respective centroids in the first plurality of codebooks being determined along with a first weight matrix for the first layer through a training procedure of the neural network].”
“selecting, from a lookup table, respective target computation results of the respective target centroids with the first weight matrix, the lookup table comprising respective computation results of the respective centroids in the first plurality of codebooks with the first weight matrix”; Ran, section III. C, paragraph 1, “For the original convolution, the computation im2col complexity is O(cinHoutWoutk2cout). During inference, our method includes two stages, the first is to get the indices by computing the distance between the flattened features and prototypes, while the second is to retrieve the product between weights and prototypes computed in advance, i.e., a simple table lookup [selecting, from a lookup table, respective target computation results of the respective target centroids with the first weight matrix, the lookup table comprising respective computation results of the respective centroids in the first plurality of codebooks with the first weight matrix].”
“determining a first output corresponding to the first input for the first layer based on aggregation of the respective target computation results”: Ran, section III. C, paragraph 1, “For the original convolution, the computation im2col complexity is O(cinHoutWoutk2cout). During inference, our method includes two stages, the first is to get the indices by computing the distance between the flattened features and prototypes, while the second is to retrieve the product between weights and prototypes computed in advance, i.e., a simple table lookup”; Ran, “Algorithm 1 Inference Algorithm of PECAN,” “Input: Codebook C ∈ ℝcink2×p, 4-D learned kernel tensor K ∈ ℝcout×cin×k×k, unfolded features X ∈ ℝcink2×HoutWout. Output: The approximated convolution output ˜Y ∈ ℝcout×HoutWout [determining a first output corresponding to the first input for the first layer based on aggregation of the respective target computation results].”

Regarding claim 2 and analogous claim 14: 
Ran teaches “The method of claim 1.” 
Ran further teaches “wherein the respective centroids in the first plurality of codebooks are updated by decreasing a loss function for training the neural network during a backpropagation of the training procedure”: Ran, section III. B, paragraph 2, “Based on this, we can now use the argmax function in the forward pass and softmax function during backpropagation [a backpropagation of the training procedure]”; Ran, section IV, paragraph 2, “To implement the PECAN framework for the CIFAR-10 and CIFAR-100 tasks, we use the co-optimization strategy that update the prototypes and weights together. We set the training epochs for PECAN-A and PECAN-D as 150 and 300, respectively. The learning rate for PECAN-A is set to 0.01 initially, decaying every 50 epoch, while that of PECAN-D is initialized as 0.001, decaying at epoch 200. For both datasets, we employ softmax function and set the temperature τ at 1 and 0.5 for PECAN-A and PECAN-D, respectively. We set the batch size to 64, and use cross-entropy as the loss function [the respective centroids in the first plurality of codebooks are updated by decreasing a loss function for training the neural network ], which is optimized by Adam.”

Regarding claim 3 and analogous claim 15: 
Ran teaches “The method of claim 2.” 
Ran further teaches:
“wherein a sample input for the first layer corresponding to a training sample is divided into a first plurality of sample input sub-vectors, a sample input sub-vector for the first layer is associated with one of the first plurality of codebooks”: Ran, section III, paragraph 2, “In short, PECAN is mapping (quantizing) the original input features onto prototype patterns in compact codebooks, then multiplication between weights (F) and features (X) can be approximated by lookup table operation during inference”; Ran, Fig. 2 caption, “When approximating subvectors with the closest prototypes [a sample input sub-vector for the first layer is associated with one of the first plurality of codebooks], PECAN-A and PECAN-D adopt different assignment schemes”; Ran, section III, paragraph 1, “It is natural to set each prototype in PQ to be a k2 x 1 subvector (viz. same size as a vectorized kernel), with p prototypes in each of the cin input channels according to the patterns of flattened matrices. With this setting, there are two main components in a trained PECAN that require memory storage in each layer, namely, i) pcin prototypes for ‘quantizing’ the input subvectors [a sample input for the first layer corresponding to a training sample is divided into a first plurality of sample input sub-vectors]; ii) coutcinp inner product values between the (sub)rows in F and each prototype.”
“and wherein the loss function is determined based at least in part on respective probabilities for the respective centroids in the first plurality of codebooks, a probability for a centroid in a codebook indicating a probability that the centroid is closest to a sample input sub-vector associated with the codebook relative to other centroids in the codebook” (underlined text as interpreted by Examiner; see claim objections, above): Ran, section III. A, paragraph 1, “For PECAN-A, we compute the approximated matrix ˜X by splitting its rows into D = cin groups, each with subvectors of dimension d = k2, and get the attention scores Ki(j) to formulate the combination of prototypes Cm(j):

    PNG
    media_image1.png
    35
    448
    media_image1.png
    Greyscale

where i = 1, 2, . . . ,HoutWout. Since the dot product distance function with softmax is differentiable, mapping features to prototypes can be learned end-to-end [softmax transforms results to probabilities that a particular result is most correct, used here as part of the training, hence, the loss function is determined based at least in part on respective probabilities for the respective centroids in the first plurality of codebooks, a probability for a centroid in a codebook indicating a probability that the centroid is closest to a sample input sub-vector associated with the codebook relative to other centroids in the codebook]. It is worth noting that all intermediate features are replaced with the combination of learned prototypes after training.”

Regarding claim 4 and analogous claim 16: 
Ran teaches “The method of claim 3.” 
Ran further teaches “wherein the probability for the centroid in the codebook is determined through a differentiable function, the differentiable function being determined based at least in part on a distance between the centroid and the sample input sub-vector, and wherein the respective centroids in the first plurality of codebooks are updated through a backpropagation based on gradient information generated via the differentiable function”: Ran, section III. B, paragraph 2, “Based on this, we can now use the argmax function in the forward pass and softmax function during backpropagation [a backpropagation based on gradient information generated via the differentiable function]”; Ran, section III. A, paragraph 1, “For PECAN-A, we compute the approximated matrix ˜X by splitting its rows into D = cin groups, each with subvectors of dimension d = k2, and get the attention scores Ki(j) to formulate the combination of prototypes Cm(j):

    PNG
    media_image1.png
    35
    448
    media_image1.png
    Greyscale

where i = 1, 2, . . . ,HoutWout. Since the dot product distance function [the differentiable function being determined based at least in part on a distance between the centroid and the sample input sub-vector] with softmax is differentiable [the probability for the centroid in the codebook is determined through a differentiable function], mapping features to prototypes can be learned end-to-end [the respective centroids in the first plurality of codebooks are updated through a backpropagation based on gradient information generated via the differentiable function]. It is worth noting that all intermediate features are replaced with the combination of learned prototypes after training.”

Regarding claim 6 and analogous claim 18: 
Ran teaches “The method of claim -4” (underlined text as interpreted by Examiner; see 112(b) claim rejection, above).
Ran further teaches “wherein the differentiable function is a softmax function”: Ran, section III. A, paragraph 1, “For PECAN-A, we compute the approximated matrix ˜X by splitting its rows into D = cin groups, each with subvectors of dimension d = k2, and get the attention scores Ki(j) to formulate the combination of prototypes Cm(j):

    PNG
    media_image1.png
    35
    448
    media_image1.png
    Greyscale

where i = 1, 2, . . . ,HoutWout. Since the dot product distance function with softmax is differentiable [wherein the differentiable function is a softmax function], mapping features to prototypes can be learned end-to-end. It is worth noting that all intermediate features are replaced with the combination of learned prototypes after training.”

Regarding claim 7: 
Ran teaches “The method of claim 3.”
Ran further teaches “wherein the respective centroids in the first plurality of codebooks are initialized in the training procedure by clustering a set of sample inputs for the first layer corresponding to a set of training samples in a training dataset for the neural network”: Ran, section III, paragraph 1, “The convolution operation in a CNN is conceptually illustrated as a window sliding across the cin-channel input feature (cf. Fig. 1(a)) [clustering a set of sample inputs for the first layer corresponding to a set of training samples in a training dataset for the neural network, interpreted as including operations that divide training data into groups]. Actual implementations often unfold the convolution into a matrix-matrix product (cf. Fig. 1(b)). Specifically, the im2col command stretches the input entries covered in each filter stride into a column and concatenates the columns into a matrix X, whereas the kernel tensors are reshaped into a filter matrix F, such that PQ can be used to approximate FX. For an intermediate CNN layer, consider the flattened feature matrix X ∈ ℝcink2×HoutWout, where cin and k are the number of input channels and the kernel size, Hout and Wout are height and width of the output feature, codebooks C ∈ ℝcink2 x p are assigned with parameters to construct an embedding table for the features [initialized in the training procedure], where p is the number of choices for each codebook C(j), j = 1, 2, …, D. C(j)m ∈ ℝd are called prototypes, m = 1, 2, . . . , p (cf. Fig. 1(c))”; Ran, section III, paragraph 2, “In short, PECAN is mapping (quantizing) the original input features onto prototype patterns in compact codebooks, then multiplication between weights (F) and features (X) can be approximated by lookup table operation during inference.”

Regarding claim 8 and analogous claim 19:
Ran teaches “The method of claim 1.”
Ran further teaches “wherein the lookup table is a quantized lookup table comprising respective quantized computation results of the respective centroids in the first plurality of codebooks with the first weight matrix”: Ran, section III, paragraph 2, “In short, PECAN is mapping (quantizing) the original input features onto prototype patterns in compact codebooks, then multiplication between weights (F) and features (X) can be approximated by lookup table operation during inference [wherein the lookup table is a quantized lookup table comprising respective quantized computation results of the respective centroids in the first plurality of codebooks with the first weight matrix].”

Regarding claim 9:
Ran teaches “The method of claim 1.”
Ran further teaches:
“wherein an intermediate lookup table is used during a backpropagation of the training procedure, the intermediate lookup table comprising respective intermediate real-value computation results of respective intermediate centroids in the first plurality of codebooks with an intermediate weight matrix for the first layer”: Ran, section III, paragraph 1, “For an intermediate CNN layer, consider the flattened feature matrix X ∈ ℝcink2×HoutWout, where cin and k are the number of input channels and the kernel size, Hout and Wout are height and width of the output feature, codebooks C ∈ ℝcink2 x p are assigned with parameters to construct an embedding table for the features, where p is the number of choices for each codebook C(j), j = 1, 2, …, D. C(j)m ∈ ℝd are called prototypes, m = 1, 2, . . . , p (cf. Fig. 1(c))”; Ran, section III, paragraph 2, “In short, PECAN is mapping (quantizing) the original input features onto prototype patterns in compact codebooks, then multiplication between weights (F) and features (X) can be approximated by lookup table operation during inference “; Ron, section III. B, paragraphs 1-2, “To enable optimization for prototypes with the non-differentiable function argmax, we approximate it with a differentiable softmax function:

    PNG
    media_image2.png
    69
    406
    media_image2.png
    Greyscale

[where C is the set of codebooks used for lookups of computations between weights and features, hence, an intermediate lookup table … comprising respective intermediate real-value computation results of respective intermediate centroids in the first plurality of codebooks with an intermediate weight matrix for the first layer, intermediate interpreted applying to values used and/or updated during an ongoing process] where τ is the temperature to relax the softmax function. […] Now the approximated index ˜Ki(j) is fully differentiable when τ != 0. However, this yields the combination of prototypes for ˜Xi(j) again, while we need τ → 0 to get discrete indices during the forward inference. To this end, we follow [3] and define a new index to solve both non-differentiable and discrete problems in one go. Specifically, in the forward and backward passes during training, we adopt

    PNG
    media_image3.png
    52
    448
    media_image3.png
    Greyscale

where sg is stop gradient [K is computed using C, hence, intermediate lookup table is used during a backpropagation of the training procedure], which takes the identity function in the forward pass and drops the gradient inside it in the backward pass. Based on this, we can now use the argmax function in the forward pass and softmax function during backpropagation.”
“and wherein an intermediate quantized lookup table is used during a forward propagation of the training procedure, the intermediate quantized lookup table comprising respective intermediate quantized computation results of the respective intermediate centroids in the first plurality of codebooks with the intermediate weight matrix for the first layer”: Ran, section III, paragraph 2, “In short, PECAN is mapping (quantizing) the original input features onto prototype patterns in compact codebooks, then multiplication between weights (F) and features (X) can be approximated by lookup table operation during inference [hence the prototypes are a quantized lookup table]”; Ron, section III. B, paragraphs 1-2, “To enable optimization for prototypes with the non-differentiable function argmax, we approximate it with a differentiable softmax function:

    PNG
    media_image2.png
    69
    406
    media_image2.png
    Greyscale

[where C is the set of codebooks used for lookups of computations between weights and features, containing quantized prototypes, hence, an intermediate lookup table … intermediate quantized lookup table comprising respective intermediate quantized computation results of the respective intermediate centroids in the first plurality of codebooks with the intermediate weight matrix for the first layer, intermediate interpreted applying to values used and/or updated during an ongoing process] where τ is the temperature to relax the softmax function. […] Now the approximated index ˜Ki(j) is fully differentiable when τ != 0. However, this yields the combination of prototypes for ˜Xi(j) again, while we need τ → 0 to get discrete indices during the forward inference. To this end, we follow [3] and define a new index to solve both non-differentiable and discrete problems in one go. Specifically, in the forward and backward passes during training, we adopt

    PNG
    media_image3.png
    52
    448
    media_image3.png
    Greyscale

where sg is stop gradient, which takes the identity function in the forward pass and drops the gradient inside it in the backward pass. Based on this, we can now use the argmax function in the forward pass and softmax function during backpropagation.”

Regarding claim 12:
Ran teaches “The method of claim 1.”
	Ran further teaches “wherein a second plurality of codebooks are determined for a second layer of the neural network, respective centroids in the second plurality of codebooks being determined along with a second weight matrix for the second layer through the training procedure of the neural network”: Ran, Fig. 3 caption, “Fig. 3. The flattened features and codebooks for five different layers in VGG-Small [including a second layer of the neural network], (a)-(e) for conv1-conv5. For each subfigure, the upper image is the input feature after im2col operation, the second image shows the approximation matrix [a second weight matrix ] after substitution with PECAN-D which is composed of the corresponding codebook shown in the third row [respective centroids in the second plurality of codebooks].”

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 5 and 17 rejected under 35 U.S.C. 103 over Ran in view of Ichino et al., US Pre-Grant Publication No. 2023/0360440 (hereafter Ichino). 
Ran teaches “The method of claim 4.” 
Ran further teaches (bold only) “wherein the differentiable function is determined further based on a learning coefficient for the first layer, the learning coefficient is configured to control a distribution of probabilities for centroids in a codebook, and is determined along with the respective centroids in the first plurality of codebooks through the training procedure of the neural network”: Ran, section III. B, paragraph 1, “To enable optimization for prototypes with the non-differentiable function argmax, we approximate it with a differentiable softmax function:

    PNG
    media_image4.png
    60
    408
    media_image4.png
    Greyscale

where τ is the temperature to relax the softmax function [the learning coefficient is configured to control a distribution of probabilities for centroids in a codebook]”; Ran, section IV, paragraph 2, “To implement the PECAN framework for the CIFAR-10 and CIFAR-100 tasks, we use the co-optimization
strategy that update the prototypes and weights together. We set the training epochs for PECAN-A and PECAN-D as 150 and 300, respectively. The learning rate for PECAN-A is set to 0.01 initially, decaying every 50 epoch, while that of PECAN-D is initialized as 0.001, decaying at epoch 200. For both datasets, we employ softmax function and set the temperature τ at 1 and 0.5 for PECAN-A and PECAND,
respectively [and is determined along with the respective centroids in the first plurality of codebooks].”
Ran does not explicitly teach (bold only) “wherein the differentiable function is determined further based on a learning coefficient for the first layer, the learning coefficient is configured to control a distribution of probabilities for centroids in a codebook, and is determined along with the respective centroids in the first plurality of codebooks through the training procedure of the neural network.”
Ichino teaches (bold only) “wherein the differentiable function is determined further based on a learning coefficient for the first layer, the learning coefficient is configured to control a distribution of probabilities for centroids in a codebook, and is determined along with the respective centroids in the first plurality of codebooks through the training procedure of the neural network”: Ichino, paragraph 0182, “For example, the attention processing unit 304 may be configured to multiply the attention map input from training data by a temperature coefficient S and input a multiplication result to the Softmax function and the multiplication unit 313 may be configured to multiply a result of the convolution calculation unit 311 by an obtained function value. In this case, the temperature coefficient S may be a parameter adjusted through learning [through the training procedure of the neural network].”
Ichino and Ran are analogous arts as they are both related to neural network training using softmax. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the trainable softmax temperature of Ichino with the teachings of Ran to arrive at the present invention, in order to improve training, as stated in Ichino, paragraph 0184, “Thereby, the authentication device 100 is expected to perform authentication with relatively high accuracy using a feature in which a part having many features for personal identification is emphasized.”

Claim 10 rejected under 35 U.S.C. 103 over Ran in view of Chen, US Pre-Grant Publication No. 2020/0074318 (hereafter Chen). 
Ran teaches “The method of claim 1.” 
Ran does not explicitly teach “in accordance with a determination that an inference service for the neural network is activated, storing the first plurality of codebooks in a cache.”
Chen teaches “accordance with a determination that an inference service for the neural network is activated, storing the first plurality of codebooks in a cache”: Chen, paragraph 0273, “Example 7 includes the subject matter of Examples 1 -6, wherein the one or more processors are further to: assign one or more caches to one or more layers of the plurality of layers of the neural network to expedite inference acceleration for the neural network [in accordance with a determination that an inference service for the neural network is activated, storing the first plurality of codebooks in a cache].”
Chen and Ran are analogous arts as they are both related to neural network efficiency improvements. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the caching of neural network data of Chen with the teachings of Ran to arrive at the present invention, in order to improve inference processing speed, as stated in Chen, paragraph 0273, “Example 7 includes the subject matter of Examples 1 -6, wherein the one or more processors are further to: assign one or more caches to one or more layers of the plurality of layers of the neural network to expedite inference acceleration for the neural network.”

Claim 11 rejected under 35 U.S.C. 103 over Ran in view of Wikipedia, “Divide-and-conquer algorithm,” version of May 2023 (hereafter Wikipedia).
	Ran teaches “The method of claim 1.” 
	Ran further teaches (bold only) “wherein a first codebook of the first plurality of codebooks is divided into a plurality of sub-codebooks, a sub-codebook comprising two or more centroids, the first codebook being associated with a first input sub-vector of the first plurality of input sub-vectors, and wherein determining the respective target centroids comprises selecting a plurality of candidate target centroids from the plurality of sub-codebooks based on parallel distance comparison for the plurality of sub-codebooks, the parallel distance comparison is configured to compare distances between the first input sub-vector and respective centroids in the plurality of sub-codebooks; and determining a target centroid for the first input sub-vector from the plurality of candidate target centroids by comparing distances between the first input sub-vector and the plurality of candidate target centroids”: Ran, section III, paragraph 1, “For an intermediate CNN layer, consider the flattened feature matrix X ∈ ℝcink2×HoutWout, where cin and k are the number of input channels and the kernel size, Hout and Wout are height and width of the output feature, codebooks C ∈ ℝcink2 x p are assigned with parameters to construct an embedding table for the features [the first codebook being associated with a first input sub-vector of the first plurality of input sub-vectors], where p is the number of choices for each codebook C(j), j = 1, 2, …, D. C(j)m ∈ ℝd are called prototypes [determining the respective target centroids comprises selecting a plurality of candidate target centroids], m = 1, 2, . . . , p (cf. Fig. 1(c))”; Ran, section III, paragraph 2, “In short, PECAN is mapping (quantizing) the original input features onto prototype patterns in compact codebooks, then multiplication between weights (F) and features (X) can be approximated by lookup table operation during inference”; Ran, Fig. 2 caption, “When approximating subvectors with the closest prototypes [based on … distance comparison; the … distance comparison is configured to compare distances between the first input sub-vector and respective centroids … ; and determining a target centroid for the first input sub-vector], PECAN-A and PECAN-D adopt different assignment schemes.”
	Ran does not explicitly teach (bold only) “wherein a first codebook of the first plurality of codebooks is divided into a plurality of sub-codebooks, a sub-codebook comprising two or more centroids, the first codebook being associated with a first input sub-vector of the first plurality of input sub-vectors, and wherein determining the respective target centroids comprises selecting a plurality of candidate target centroids from the plurality of sub-codebooks based on parallel distance comparison for the plurality of sub-codebooks, the parallel distance comparison is configured to compare distances between the first input sub-vector and respective centroids in the plurality of sub-codebooks; and determining a target centroid for the first input sub-vector from the plurality of candidate target centroids by comparing distances between the first input sub-vector and the plurality of candidate target centroids.”
Wikipedia teaches (bold only) “wherein a first codebook of the first plurality of codebooks is divided into a plurality of sub-codebooks, a sub-codebook comprising two or more centroids, the first codebook being associated with a first input sub-vector of the first plurality of input sub-vectors, and wherein determining the respective target centroids comprises selecting a plurality of candidate target centroids from the plurality of sub-codebooks based on parallel distance comparison for the plurality of sub-codebooks, the parallel distance comparison is configured to compare distances between the first input sub-vector and respective centroids in the plurality of sub-codebooks; and determining a target centroid for the first input sub-vector from the plurality of candidate target centroids by comparing distances between the first input sub-vector and the plurality of candidate target centroids”: Wikipedia, paragraph 4, “The divide-and-conquer paradigm is often used to find an optimal solution of a problem. Its basic idea is to decompose a given problem into two or more similar, but simpler, subproblems [is divided into a plurality of sub-codebooks, a sub-codebook comprising two or more centroids], to solve them in turn [from the plurality of sub-codebooks based on parallel distance comparison for the plurality of sub-codebooks], and to compose their solutions to solve the given problem [from the plurality of candidate target centroids by comparing distances between the first input sub-vector and the plurality of candidate target centroids]. Problems of sufficient simplicity are solved directly. For example, to sort a given list of n natural numbers, split it into two lists of about n/2 numbers each, sort each of them in turn, and interleave both results appropriately to obtain the sorted version of the given list (see the picture). This approach is known as the merge sort algorithm.”
Wikipedia and Ran are analogous arts as they are both related to algorithmic efficiency. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the divide and conquer algorithm from Wikipedia with the closest-value search of Ran to arrive at the present invention’s finding of a closest value through dividing the list of candidates and comparing the closest result in each sub-list, in order to take advantage of parallelism for performance improvements, as stated in Wikipedia, “Parallelism,” “Divide-and-conquer algorithms are naturally adapted for execution in multi-processor machines, especially shared-memory systems where the communication of data between processors does not need to be planned in advance because distinct sub-problems can be executed on different processors.”

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Bagherinezhad et al., US Pre-Grant Publication No. 2019/0026600, discloses a lookup-based convolutional neural network using a sparse tensor that is updated during training using back-propagation.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINCENT SPRAUL whose telephone number is (703) 756-1511. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MICHAEL HUNTLEY can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/VAS/               Examiner, Art Unit 2129                                                                                                                                                                                         
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129
Read full office action
Prosecution Timeline

Jun 29, 2023
Application Filed
Mar 17, 2026
Non-Final Rejection mailed — §102, §103, §112
May 05, 2026
Examiner Interview Summary
May 05, 2026
Applicant Interview (Telephonic)
Precedent Cases

Applications granted by this same examiner with similar technology

17/163,396
Patent 12619905
FLEXIBLE EMBEDDING SYSTEMS AND METHODS FOR REAL-TIME COMPARISONS
5y 3m to grant Granted May 05, 2026
17/557,599
Patent 12608446
DETERMINING PERFORMANCE CHANGE WITHIN A DATASET WITH AN APPLIED CONDITION USING MACHINE LEARNING MODELS
4y 4m to grant Granted Apr 21, 2026
17/163,383
Patent 12591634
COMPOSITE EMBEDDING SYSTEMS AND METHODS FOR MULTI-LEVEL GRANULARITY SIMILARITY RELEVANCE SCORING
5y 2m to grant Granted Mar 31, 2026
17/249,028
Patent 12591796
INTELLIGENT DISTANCE PROMPTING
5y 1m to grant Granted Mar 31, 2026
17/353,931
Patent 12572620
RELIABLE INFERENCE OF A MACHINE LEARNING MODEL
4y 8m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

1-2
Expected OA Rounds
60%
Grant Probability
86%
With Interview (+26.7%)
4y 4m (~1y 5m remaining)
Median Time to Grant
Low
PTA Risk
Based on 37 resolved cases by this examiner. Grant probability derived from career allowance rate.