Prosecution Insights
Last updated: April 19, 2026
Application No. 18/320,896

APPARATUS AND METHOD WITH QUANTIZATION CONFIGURATOR

Non-Final OA: §101, §103, §112

Filed: May 19, 2023
Examiner: PHAM, JESSICA THUY
Art Unit: 2121
Tech Center: 2100 — Computer Architecture & Software
Assignee: Seoul National University R&DB Foundation
OA Round: 1 (Non-Final)

Grant Probability: 33% (At Risk)
OA Rounds: 1-2
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 33% (1 granted / 3 resolved; -21.7% vs TC avg). Grants only 33% of cases.
Interview Lift: -33.3% (minimal lift, from resolved cases with interview)
Typical Timeline: 3y 3m average prosecution
Career History: 41 total applications across all art units; 38 currently pending

Statute-Specific Performance

§101: 26.8% (-13.2% vs TC avg)
§103: 35.5% (-4.5% vs TC avg)
§102: 11.0% (-29.0% vs TC avg)
§112: 22.7% (-17.3% vs TC avg)
Tech Center averages are estimates, based on career data from 3 resolved cases.

Office Action

§101, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

Claims 1-20 are pending and examined herein. Claims 1-20 are rejected under 35 U.S.C. 112(b), 35 U.S.C. 101, and 35 U.S.C. 103.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement

The information disclosure statement (IDS) filed on 5/19/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the IDS is being considered by the examiner.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claims 1 and 11 recite the limitations “generating genes by cataloging possible combinations of a quantization precision and a calibration method for each of layers of a pre-trained neural network” and “determining layer sensitivity for each of the layers based on combinations corresponding to the genes”. It is unclear whether the “combinations corresponding to the genes” in the second limitation refers to the “possible combinations of a quantization precision and a calibration method” in the first limitation or to another set of combinations. The claims are therefore indefinite. For purposes of examination, the “combinations corresponding to the genes” in the second limitation will be treated as referring to the “possible combinations” recited in the first limitation. Dependent claims 2-10 and 12-20 fail to resolve the issue and are rejected under the same rationale.

Claims 2 and 12 recite the limitations “performing post training quantization (PTQ) on a first of the layers; determining prediction accuracy of the pre-trained neural network by applying the PTQ to the first layer; and determining a difference between prediction accuracy of the pre-trained neural network and prediction accuracy obtained by applying the PTQ to the first layer.” It is unclear whether PTQ is applied to the first layer once, in the “performing” step, with the two determinations using the result of that single application, or whether PTQ is performed three times on the first layer, once in each step. The claims are therefore indefinite.
For purposes of examination, the claims will be interpreted as involving one application of PTQ to the first layer, with both determinations using the result of that application.

Claims 2 and 12 also recite a first limitation, “determining prediction accuracy of the pre-trained neural network by applying the PTQ to the first layer,” and a second limitation, “determining a difference between prediction accuracy of the pre-trained neural network and prediction accuracy obtained by applying the PTQ to the first layer.” It is unclear which element of the second limitation, if either, refers back to the “prediction accuracy of the pre-trained neural network” in the first limitation. Because the accuracy in the first limitation is itself determined “by applying the PTQ to the first layer,” both elements of the second limitation appear to refer to the same value, and it is unclear why a difference would be determined between two identical values. The claims are therefore indefinite. For purposes of examination, the “prediction accuracy of the pre-trained neural network” in the second limitation will be treated as the accuracy of the unquantized pre-trained network (not referring back to the first limitation), while the “prediction accuracy obtained by applying the PTQ to the first layer” will be treated as the accuracy obtained after applying PTQ to the first layer.

Claims 3 and 13 recite the limitation “the quantization precision” in the second paragraph of each claim. There is insufficient antecedent basis for this limitation in the claims.

Claims 3 and 13 recite the limitation “calibration available for the quantization configuration,” while claims 1 and 11 recite “a calibration method.” It is unclear whether the former refers to the latter or to a separate calibration. The claims are therefore indefinite. For purposes of examination, “calibration available for the quantization configuration” in claims 3/13 will be treated as referring to “a calibration method” in claims 1/11.

Claims 8 and 18 recite the limitation “determining a quantization precision and a calibration method to be applied to each of the layers, considering the prediction accuracy of the neural network and a fitness evaluation function for energy.” It is unclear whether “a quantization precision and a calibration method” refers to the identically worded limitation in claims 1/11 or to a separate precision and method. The claims are therefore indefinite. For purposes of examination, the limitation will be treated as referring to “a quantization precision and a calibration method” in claims 1/11.

Claims 8 and 18 also recite the limitation “the prediction accuracy of the neural network.” There is insufficient antecedent basis for this limitation in the claims.
Claims 9 and 19 recite the limitation “the generated quantization configuration to be applied to each of the layers.” There is insufficient antecedent basis for this limitation in the claims.

Claim 20 recites the limitation “a quantization precision and a calibration method.” It is unclear whether this refers to “a quantization precision and a calibration method” in claim 11 or to a separate precision and method. The claim is therefore indefinite. For purposes of examination, the limitation will be treated as referring to “a quantization precision and a calibration method” in claim 11.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1-20, in accordance with these steps, follows.

Step 1 Analysis: Step 1 determines whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter). Claims 1-9 are directed to a process, claim 10 is directed to an article of manufacture, and claims 11-20 are directed to a machine. All claims are directed to statutory categories, and the analysis proceeds.

Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis: Step 2A Prong One asks whether the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If so, the analysis proceeds to Step 2A Prong Two, which asks whether the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, the analysis proceeds to Step 2B, which asks whether the claim amounts to significantly more than the judicial exception. If it does not, the claim is not eligible subject matter under 35 U.S.C. 101. None of the claims represents an improvement to technology.

Regarding claim 1, the following are abstract ideas:

generating genes by cataloging possible combinations of a quantization precision and a calibration method for each of layers of a pre-trained neural network; (Generating genes (i.e., generating a code) by cataloging combinations can be practically performed in the human mind. For example, one could determine possible precisions, determine calibration methods, and generate a code representing the combinations. This is a mental process.)

determining layer sensitivity for each of the layers based on combinations corresponding to the genes; (Determining a layer sensitivity can be practically performed in the human mind, e.g., using the accuracy before and after quantization to determine a value representing layer sensitivity for the combinations indicated by the genes. This is a mental process.)

determining priorities of the genes and selecting some of the genes based on the respective priorities of the genes; (Determining priorities and selecting genes can be practically performed in the human mind. This is a mental process.)
generating progeny genes by performing crossover on the selected genes; (One could practically perform crossover in the human mind. For example, given the data for the genes, one could determine parts of two genes to cross over to create another gene: if one selected gene code was 1234 and another was 4321, crossover at the midpoint of each gene would generate the code 1221. This is a mental process.)

calculating layer sensitivity for each of the layers corresponding to a combination of the crossover; and (Calculating a value is a mathematical calculation, which is a mathematical concept.)

updating one or more of the genes using the progeny genes based on a comparison of layer sensitivity of the genes and layer sensitivity of the progeny genes. (Updating the genes based on a comparison of layer sensitivities, i.e., selecting progeny genes to add to the gene pool, can be practically performed in the human mind. This is a mental process.)

The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

A processor-implemented method of generating a quantization configuration, the method comprising: (This recites a generic computer component, which amounts to mere instructions to apply an exception.)

Regarding claim 2, the rejection of claim 1 is incorporated herein. The following is an abstract idea:

determining a difference between prediction accuracy of the pre-trained neural network and prediction accuracy obtained by applying the PTQ to the first layer. (Determining a difference between accuracies is a mathematical calculation, which is a mathematical concept.)

The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

performing post training quantization (PTQ) on a first of the layers; (Post training quantization is a generic machine learning process. This amounts to mere instructions to apply an exception.)

determining prediction accuracy of the pre-trained neural network by applying the PTQ to the first layer; and (Determining an accuracy of a neural network is a generic machine learning process. This amounts to mere instructions to apply an exception.)

Regarding claim 3, the rejection of claim 1 is incorporated herein. The following are abstract ideas:

determining a quantization precision available for the quantization configuration; and (Determining available precisions for a quantization configuration can be practically performed in the human mind. This is a mental process.)

generating a zero point and a scale factor corresponding to calibration available for the quantization configuration. (Generating a zero point and a scale factor based on the calibration can be practically performed in the human mind. This is a mental process.)

Regarding claim 4, the rejection of claim 1 is incorporated herein. The following is an abstract idea:

determining the priority of the genes using at least one of a Pareto-front or a crowding distance for each of the layers. (Determining the priority of the genes using a Pareto-front or a crowding distance can be practically performed in the human mind. This is a mental process.)
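The examiner's midpoint illustration above (gene codes 1234 and 4321 crossed at the midpoint to yield 1221) maps directly onto single-point crossover. A minimal sketch of that operation, with lists of integers standing in for the claimed gene codes (the function name and representation are illustrative, not from the record):

```python
import random

def single_point_crossover(parent_a, parent_b, point=None):
    """Cross two equal-length genes at a single reference point."""
    if point is None:
        point = random.randrange(1, len(parent_a))  # interior cut point
    return parent_a[:point] + parent_b[point:]

# The examiner's illustration: midpoint crossover of 1234 and 4321 -> 1221.
child = single_point_crossover([1, 2, 3, 4], [4, 3, 2, 1], point=2)
print(child)  # [1, 2, 2, 1]
```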
Regarding claim 5, the rejection of claim 1 is incorporated herein. The following is an abstract idea:

selecting some of the genes using tournament selection or biased roulette wheel for each of the layers. (Selecting genes using tournament selection or a biased roulette wheel can be practically performed in the human mind. This is a mental process.)

Regarding claim 6, the rejection of claim 1 is incorporated herein. The following is an abstract idea:

selecting a reference point for the crossover. (Selecting a reference point can be practically performed in the human mind. This is a mental process.)

Regarding claim 7, the rejection of claim 1 is incorporated herein. The following is an abstract idea:

randomly changing the quantization precision and/or the calibration method of the combination of the crossover through a mutation process. (Changing the values of the crossover through a mutation process, i.e., randomly selecting values to change in the crossover genes, can be practically performed in the human mind. This is a mental process.)

Regarding claim 8, the rejection of claim 1 is incorporated herein. The following is an abstract idea:

determining a quantization precision and a calibration method to be applied to each of the layers, considering the prediction accuracy of the neural network and a fitness evaluation function for energy. (Determining a precision and a method for each layer using the accuracy and a fitness evaluation function can be practically performed in the human mind. This is a mental process.)

Regarding claim 9, the rejection of claim 1 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

re-training the pre-trained neural network based on the generated quantization configuration to be applied to each of the layers. (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.)

Regarding claim 10, the rejection of claim 1 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1. (This recites generic computer components and processes. This amounts to mere instructions to apply an exception.)

Regarding claim 11, the following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

An apparatus for generating a quantization configuration, the apparatus comprising: (This recites generic computer components and processes. This amounts to mere instructions to apply an exception.)

a memory configured to store instructions; and (This recites generic computer components and processes. This amounts to mere instructions to apply an exception.)

one or more processors configured to execute the instructions to configure the one or more processors to: (This recites generic computer components and processes. This amounts to mere instructions to apply an exception.)

The remainder of claim 11 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Claims 12-19 recite substantially similar subject matter to claims 2-9, respectively, and are rejected with the same rationale, mutatis mutandis.

Regarding claim 20, the rejection of claim 11 is incorporated herein. The following is an abstract idea:

determine a quantization precision and a calibration method to be applied to each of the layers based on a fitness evaluation function. (Determining a precision and a calibration method for each of the layers can be practically performed in the human mind. This is a mental process.)

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yuan (“EvoQ: Mixed Precision Quantization of DNNs via Sensitivity Guided Evolutionary Search”, July 19, 2020), Chang (“RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions”, October 30, 2021), and Fasfous (“HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology”, September 17, 2021).
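Before the element-by-element mapping, it may help to fix a concrete picture of the claimed genes. A minimal sketch, assuming (as an illustration only, not something taken from the claims or the cited references) that a gene is a per-layer list of (precision, calibration-method) pairs drawn from a cataloged option set:

```python
import itertools
import random

# Hypothetical option sets; the cited references use bit-widths such as 4/8,
# but the calibration-method names here are illustrative only.
PRECISIONS = [4, 8]
CALIBRATIONS = ["minmax", "percentile"]

def catalog_combinations():
    """All (precision, calibration) pairs available to a single layer."""
    return list(itertools.product(PRECISIONS, CALIBRATIONS))

def random_gene(num_layers):
    """A gene assigns one cataloged combination to each network layer."""
    options = catalog_combinations()
    return [random.choice(options) for _ in range(num_layers)]

population = [random_gene(num_layers=5) for _ in range(8)]
print(population[0])  # e.g. [(8, 'minmax'), (4, 'percentile'), ...]
```

Under this encoding, a bit-width-only policy like Yuan's would hold the calibration field fixed, which is the gap the examiner fills with Chang below.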
Regarding claim 1, Yuan teaches A processor-implemented method of generating a quantization configuration, the method comprising: (Page 2 states, "In this work, we use an evolutionary algorithm to explore mixed precision quantization policy in a heuristic manner. Given a pre-trained full precision model $M$, our target is to find an optimized quantization policy $\Pi(b_1 \ldots b_l)$, where $b_i$ denotes the quantization bit-width of the $i$-th layer." Page 4 states, "All experiments are conducted in Pytorch [31]." As Pytorch is a framework used on a computer, the method must be implemented using a processor.)

generating genes by cataloging possible combinations of a quantization precision … for each of layers of a pre-trained neural network; (Page 2 states, "Given a pre-trained full precision model $M$, our target is to find an optimized quantization policy $\Pi(b_1 \ldots b_l)$, where $b_i$ denotes the quantization bit-width of the $i$-th layer. The problem can be defined as $\max_{\Pi(b_1 \ldots b_l)} F(\Pi(b_1 \ldots b_l), M)$ s.t. $\frac{\sum_{i=1}^{l} C_i \cdot b_i}{\sum_{i=1}^{l} C_i} \le b_{target}$ (1), where $F(\cdot)$ denotes the quantization policy evaluation function, $C_i$ is the parameter size of the $i$-th layer, $l$ represents the layer numbers, and $b_{target}$ represents the target average bit-width. $b_{min}$ and $b_{max}$ denote the min and max bit-width, respectively. The total search space is exponential and equals $(b_{max} - b_{min})^l$." Page 3, "B. Search strategy," states, "To automatically search for high-performing quantization policy, we employ a classical evolutionary algorithm, tournament selection. The procedure is summarized in Algorithm 1. It keeps a population of P quantization policy throughout the experiment. The population is initialized with a uniform quantization policy and its random perturbations. After this, evolution improves the initial population in iterations. Each individual (quantization policy) is evaluated according to (3) using N unlabeled samples." The quantization policies are interpreted as the genes, which are generated from the search space. As the genes are generated, they catalog possible combinations of bit-widths, interpreted as the quantization precision. As the policy $\Pi(b_1 \ldots b_l)$ contains a bit-width for each layer, each layer has possible quantization precisions.)

determining layer sensitivity for each of the layers based on combinations corresponding to the genes; (Pages 3-4 state, "To optimize the search efficiency, we use the quantization sensitivity of each layer to optimize the mutation direction. We first employ N samples to evaluate the quantization error per-layer as: $E_{b_j} = \frac{1}{N}\sum_{i=1}^{N}\lVert Q_{b_j}(x_i) - M(x_i)\rVert^2$ (4), where $Q_{b_j}$ denotes the quantization model in which the $j$-th layer is quantized to $b_j$ bits." Page 4 states, "Based on the bit-width allocation of the individual, we calculate the relative gain or loss per-layer when we increase or reduce quantization bit-width." Therefore, the quantization sensitivity, interpreted as the layer sensitivity, is determined for each layer. As the bit-width allocation (part of the quantization configuration, interpreted as the genes) is used to calculate the relative gain/loss, the layer sensitivity is based on combinations corresponding to the genes.)

determining priorities of the genes and selecting [a gene] based on the respective priorities of the genes; (Page 3 states, "At each evolutionary step, S quantization policies are randomly sampled from the population. The quantization policy with the highest fitness in the sample is selected as the parent." Therefore, the fitness determines priorities of the genes, and the gene with the highest priority is selected as the parent.)

generating progeny genes by performing [mutation on the selected gene]; (Line 14 of Algorithm 1 shows that a child quantization policy, interpreted as the progeny gene, is generated by mutating the parent gene.)

calculating layer sensitivity for each of the layers corresponding to a combination of [the mutation]; and (Note that, when combining the references of Yuan and Fasfous as shown below, the progeny gene would be produced by both mutation and crossover, and therefore would be used for the calculation of the layer sensitivity. Line 14 of Algorithm 1 shows that the progeny gene is the result of a mutation from the parent. Pages 3-4 state, "To optimize the search efficiency, we use the quantization sensitivity of each layer to optimize the mutation direction. We first employ N samples to evaluate the quantization error per-layer as: $E_{b_j} = \frac{1}{N}\sum_{i=1}^{N}\lVert Q_{b_j}(x_i) - M(x_i)\rVert^2$ (4), where $Q_{b_j}$ denotes the quantization model in which the $j$-th layer is quantized to $b_j$ bits." Page 4 states, "Based on the bit-width allocation of the individual, we calculate the relative gain or loss per-layer when we increase or reduce quantization bit-width. For the layers that are more sensitive to current bit-width, we increase their probability for higher bit-width. For the layers that are less sensitive to current bit-width, we increase their probability for lower bit-width. This modification can improve the efficiency of search and helps escape from the local minimum." Therefore, the possible mutation of increasing or reducing the quantization of the layer is used to calculate the layer sensitivity.)

updating one or more of the genes using the progeny genes based on a comparison of layer sensitivity of the genes and layer sensitivity of the progeny genes. (Page 4 states, "Based on the bit-width allocation of the individual, we calculate the relative gain or loss per-layer when we increase or reduce quantization bit-width. For the layers that are more sensitive to current bit-width, we increase their probability for higher bit-width. For the layers that are less sensitive to current bit-width, we increase their probability for lower bit-width. This modification can improve the efficiency of search and helps escape from the local minimum." As the current bit-width (the genes) is used to determine the relative gain or loss per layer when the quantization bit-width is increased or reduced (the mutation that produces the progeny genes), the quantization sensitivity is compared between the genes and the progeny genes. As the progeny gene is pushed into the population in Algorithm 1, line 16, the genes (population) are modified using the progeny genes.)

Yuan does not appear to explicitly teach: [cataloging possible combinations of a quantization precision] and a calibration method [for each of layers of a neural network]; [selecting] some of the genes; or [generating progeny genes by performing] crossover on the selected genes.

However, Chang—directed to analogous art—teaches [cataloging possible combinations of a quantization precision] and a calibration method [for a neural network] (Page 5252 states, "This is the first effort to apply mixed quantization schemes and multiple precisions within layers, targeting for simplified operations in hardware inference, while preserving the accuracy. Specifically, two quantization schemes i.e., Power-of-Two (PoT) and Fixed-point (Fixed), and two precisions i.e., 4-bit and 8-bit are adopted and explored for quantization on weights and activations, to reduce inference computation and preserve accuracy." Page 5255 catalogs the potential combinations for each layer: "The RMSMP quantization algorithm can train a DNN model from scratch or quantize a pre-trained model into a quantized one, such that for each layer, the numbers of filters quantized into PoT-W4A4, Fixed-W4A4, and Fixed-W8A4 follow the predefined ratio of $S_{PoT\text{-}4} : S_{Fixed\text{-}4} : S_{Fixed\text{-}8} = A : B : C$, where $A + B + C = 100$.")

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Chang and Yuan because, as Chang states on page 5253, "The Fixed-point (Fixed) quantization scheme has superior accuracy performance, and the Power-of-Two (PoT) is the most computationally efficient quantization scheme (with still acceptable accuracy performance) to speedup inference since multiplications can be replaced by bit shifting operations. Therefore, this work proposes a novel row wise mixed-scheme quantization approach with Fixed for preserving accuracy and PoT for reducing computation of inference."

The combination of Chang and Yuan does not appear to explicitly teach [selecting] some of the genes or [generating progeny genes by performing] crossover on the selected genes.

However, Fasfous—directed to analogous art—teaches [selecting] some of the genes (Pages 7 and 8 state, "Based on the GA configuration, $\psi$ and $\varphi$ define the fitness of each individual $\rho_0 \in P_0$. As depicted in Figure 1, $\psi$ and $\varphi$ are fed back to a selection phase in $G$, to constrain the cardinality of the population to $|P| = m$. Individuals survive this phase based on their fitness." The individuals are interpreted as the genes, and the surviving individuals are interpreted as the selected genes.)

[generating progeny genes by performing] crossover on the selected genes; (Page 8 states, "Survivors are allowed to mate and produce offspring in $P_1$, which inherit alleles from two survivor parents through crossover. A round of mutation takes place, altering alleles of the offspring in $P_1$. The population goes through the same phases of fitness evaluation, selection and crossover for n subsequent generations.")

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Chang, Yuan, and Fasfous because, as stated by Fasfous on pages 7 and 8, "Bitwidth-to-layer encoding can be captured intuitively in sequential genomes, which leads to a sensible use of GA operators, such as single-point crossover (example in Figure 4). Neighbouring CNN layers have higher feature correlation than distant layers. Therefore, quantized layer relationships encoded in neighboring genetic loci can survive in a population and be reused through single-point crossover to create more efficient offspring. The more fit the parents become throughout the generations, the better genetic localities they will have to create better individuals. Mutation further allows offspring to escape local minima of their parents." Additionally, page 9 states, "The mutation, crossover and selection operations are pivotal to the GA's efficacy. We apply single-point crossover, which intuitively has a high probability of capturing attractive bitwidth-to-layer encodings of two fit individuals and maintains inter-layer dependencies across segments of the CNN, as shown in Figure 4."
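Yuan's per-layer error measure quoted above, $E_{b_j} = \frac{1}{N}\sum_{i=1}^{N}\lVert Q_{b_j}(x_i) - M(x_i)\rVert^2$ (4), admits a compact sketch. The toy network, quantizer, and data below are stand-ins under stated assumptions, not Yuan's implementation:

```python
import numpy as np

def quantize_weights(w, bits):
    """Uniform symmetric weight quantization (generic sketch)."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def forward(weights, x):
    """Toy fully-connected network: linear layers with ReLU."""
    for w in weights:
        x = np.maximum(w @ x, 0.0)
    return x

def layer_sensitivity(weights, xs, j, bits):
    """Mean squared output error with only layer j quantized to `bits`,
    in the spirit of E_{b_j} = (1/N) * sum_i ||Q_{b_j}(x_i) - M(x_i)||^2."""
    q_weights = [quantize_weights(w, bits) if i == j else w
                 for i, w in enumerate(weights)]
    return float(np.mean([np.sum((forward(q_weights, x) - forward(weights, x)) ** 2)
                          for x in xs]))

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 8)) for _ in range(3)]
xs = [rng.standard_normal(8) for _ in range(16)]
print([round(layer_sensitivity(weights, xs, j, bits=4), 4) for j in range(3)])
```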
Regarding claim 2, the rejection of claim 1 is incorporated herein. Yuan teaches performing post training quantization (PTQ) on a first of the layers; (Pages 3-4 state, "To optimize the search efficiency, we use the quantization sensitivity of each layer to optimize the mutation direction. We first employ N samples to evaluate the quantization error per-layer as: $E_{b_j} = \frac{1}{N}\sum_{i=1}^{N}\lVert Q_{b_j}(x_i) - M(x_i)\rVert^2$ (4), where $Q_{b_j}$ denotes the quantization model in which the $j$-th layer is quantized to $b_j$ bits." As the fully trained model is quantized, the quantization model is obtained using post training quantization applied to layer $j$, interpreted as the first layer.)

determining prediction accuracy of the pre-trained neural network by applying the PTQ to the first layer; and (In the same passage, as the fully trained model is quantized, the quantization model is obtained using post training quantization. $Q_{b_j}(x_i)$ is interpreted as the prediction accuracy of the pre-trained neural network.)

determining a difference between prediction accuracy of the pre-trained neural network and prediction accuracy obtained by applying the PTQ to the first layer. ($Q_{b_j}(x_i) - M(x_i)$ is the difference between the accuracy of the pre-trained neural network, $M(x_i)$, and the prediction accuracy obtained by applying the PTQ to the first layer, $Q_{b_j}(x_i)$.)

Regarding claim 3, the rejection of claim 1 is incorporated herein. Yuan does not appear to explicitly teach determining a quantization precision available for the quantization configuration; and generating a zero point and a scale factor corresponding to calibration available for the quantization configuration.

However, Chang—directed to analogous art—teaches determining a quantization precision available for the quantization configuration; and (Page 5254 states, "The majority of the rows use the 4-bit precision for weights/activations i.e., PoT-W4A4 and Fixed-W4A4, because 2-bit has large accuracy loss and 3-bit is not suitable for hardware implementation, which prefers operands in 2-bit, 4-bit, 8-bit, etc. To boost accuracy, a higher precision with 8-bit weights and 4-bit activations is used on the Fixed scheme, i.e., Fixed-W8A4. The PoT scheme is not applied the higher precision because of its rigid resolution issue." Therefore, quantization precisions are determined for each scheme, interpreted as configurations.)

generating a zero point and a scale factor corresponding to calibration available for the quantization configuration. (Page 5254 states, "The Fixed scheme uses uniform quantization levels, with the quantized weights mapped as a scaling factor times the quantization levels." Therefore, a scaling factor is generated. One of ordinary skill in the art would realize that, for a uniform quantization level, the zero point is zero, which is therefore generated when the quantization occurs.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yuan and Chang for the reasons given above in regard to claim 1.
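The scale factor and zero point the examiner reads onto Chang's uniform quantization levels can be illustrated with textbook min/max calibration; this generic sketch is not Chang's method, and the function names are hypothetical:

```python
import numpy as np

def minmax_calibrate(values, bits=8):
    """Derive a scale factor and zero point from the observed value range."""
    qmin, qmax = 0, 2 ** bits - 1
    lo, hi = float(np.min(values)), float(np.max(values))
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(values, scale, zero_point, bits=8):
    """Map real values onto the uniform integer grid."""
    q = np.round(values / scale) + zero_point
    return np.clip(q, 0, 2 ** bits - 1).astype(np.uint8)

w = np.linspace(-1.0, 1.0, 11)
scale, zp = minmax_calibrate(w)
print(scale, zp, quantize(w, scale, zp))
```

In a symmetric signed scheme the zero point would come out at zero, consistent with the examiner's observation about uniform levels; the unsigned variant above instead centers it on the integer grid.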
Regarding claim 4, the rejection of claim 1 is incorporated herein. The combination of Yuan and Chang does not appear to explicitly teach determining the priority of the genes using at least one of a Pareto-front or a crowding distance for each of the layers.

However, Fasfous—directed to analogous art—teaches determining the priority of the genes using at least one of a Pareto-front or a crowding distance for each of the layers. (Page 8 states, "In the case of NSGA-II optimization, the algorithm evaluates the Pareto optimality of each individual w.r.t. the population P. This relieves the burden of crafting a single fitness function, which may not always guarantee a fair balance between multiple objectives. Additionally, having an array of potential solutions in a Pareto-front is a better approach for design space exploration, compared to having a single solution suggested by the search algorithm." Page 9 states, "On the other hand, NSGA-II selection is based on the crowded-comparison-operator." One of ordinary skill in the art would realize that the crowded-comparison-operator requires the use of a crowding distance.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yuan and Chang with the teachings of Fasfous for the reasons given above in regard to claim 1. Additionally, as Fasfous states on page 8, "Additionally, having an array of potential solutions in a Pareto-front is a better approach for design space exploration, compared to having a single solution suggested by the search algorithm. Design space exploration is a fundamental part of HW-SW co-design making NSGA-II an attractive alternative to SOGA."

Regarding claim 5, the rejection of claim 1 is incorporated herein. Yuan teaches selecting some of the genes using tournament selection or biased roulette wheel for each of the layers. (Page 3 states, "To automatically search for high-performing quantization policy, we employ a classical evolutionary algorithm, tournament selection. The procedure is summarized in Algorithm 1.")

Regarding claim 6, the rejection of claim 1 is incorporated herein. The combination of Yuan and Chang does not appear to explicitly teach selecting a reference point for the crossover.

However, Fasfous—directed to analogous art—teaches selecting a reference point for the crossover. (Page 9 states, "The mutation, crossover and selection operations are pivotal to the GA's efficacy. We apply single-point crossover, which intuitively has a high probability of capturing attractive bitwidth-to-layer encodings of two fit individuals and maintains inter-layer dependencies across segments of the CNN, as shown in Figure 4. With mutation probability $p_{mut}$ a single allele at a randomly selected genetic locus is replaced by another from the set of possible alleles." The single point is interpreted as the reference point.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yuan and Chang with the teachings of Fasfous for the reasons given above in regard to claim 1. Additionally, as stated by Fasfous on page 9, "We apply single-point crossover, which intuitively has a high probability of capturing attractive bitwidth-to-layer encodings of two fit individuals and maintains inter-layer dependencies across segments of the CNN, as shown in Figure 4."
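Tournament selection as quoted from Yuan (sample S policies, keep the fittest as parent) is short enough to sketch directly; the population and fitness function below are arbitrary toys, not Yuan's evaluation (3):

```python
import random

def tournament_select(population, fitness, s=4):
    """Sample s individuals at random; the fittest in the sample is the
    parent, mirroring the tournament procedure quoted above."""
    sample = random.sample(population, s)
    return max(sample, key=fitness)

# Toy usage: genes are per-layer bit-widths; lower total bits scores higher.
population = [[random.choice([4, 8]) for _ in range(6)] for _ in range(20)]
parent = tournament_select(population, fitness=lambda gene: -sum(gene))
print(parent)
```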
Regarding claim 7, the rejection of claim 1 is incorporated herein. The combination of Yuan and Chang does not appear to explicitly teach randomly changing the quantization precision and/or the calibration method of the combination of the crossover through a mutation process.

However, Fasfous—directed to analogous art—teaches randomly changing the quantization precision and/or the calibration method of the combination of the crossover through a mutation process. (Page 9 states, "The mutation, crossover and selection operations are pivotal to the GA's efficacy. We apply single-point crossover, which intuitively has a high probability of capturing attractive bitwidth-to-layer encodings of two fit individuals and maintains inter-layer dependencies across segments of the CNN, as shown in Figure 4. With mutation probability $p_{mut}$ a single allele at a randomly selected genetic locus is replaced by another from the set of possible alleles.")

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yuan and Chang with the teachings of Fasfous for the reasons given above in regard to claim 1.

Regarding claim 8, the rejection of claim 1 is incorporated herein. Yuan does not appear to explicitly teach determining a quantization precision and a calibration method to be applied to each of the layers, considering the prediction accuracy of the neural network and a fitness evaluation function for energy.

However, Chang—directed to analogous art—teaches determining a quantization precision and a calibration method to be applied to each of the layers (Page 5255 states, "For the assignment of quantization schemes and precisions to the filters of each layer, we use the Hessian-based method to determine which filters should use Fixed-W8A4 (higher precision). And for the rest filters, we determine PoT-W4A4 vs Fixed-W4A4 based on the variances of the weights in each filter. Once it is determined the assignment of quantization scheme and precision (PoT-W4A4, Fixed-W4A4, and Fixed-W8A4) down to the filter level for each layer, the Straight Through Estimator (STE) [1, 35].")

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yuan with the teachings of Chang for the reasons given above in regard to claim 1.

The combination of Yuan and Chang does not appear to explicitly teach [determining a quantization precision], considering the prediction accuracy of the neural network and a fitness evaluation function for energy.

However, Fasfous—directed to analogous art—teaches this limitation (Page 7 states, "Referring back to Figure 1, on the top-left an initial population $P_0$ is randomly generated at the start of the genetic algorithm G, with each individual encoding the quantization levels of each layer of the CNN in its genes. The individuals of $P_0$ are briefly fine-tuned and evaluated based on their task accuracy $\psi$ on a validation set (Figure 1 top-right), as well as HW-estimates $\varphi$ of the HW-model through inference simulation (Figure 1 bottom-right). Based on the GA configuration, $\psi$ and $\varphi$ define the fitness of each individual $\rho_0 \in P_0$." The task accuracy $\psi$ on a validation set is interpreted as the prediction accuracy of the neural network. Fig. 3 shows that $\varphi$ in phase 3 includes the total energy, meaning that it is included in the fitness evaluation function. Page 9 states, "At the end of the search, when a solution is chosen, we train it from scratch, without loading any pre-trained weights." Therefore, the genetic algorithm, which includes the fitness evaluation function covering accuracy and energy, results in precision levels being chosen for each layer.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yuan and Chang with the teachings of Fasfous for the reasons given above in regard to claim 1.

Regarding claim 9, the rejection of claim 1 is incorporated herein. Yuan teaches re-training the pre-trained neural network based on the generated quantization configuration to be applied to each of the layers. (Page 4 states, "The performance of the quantization model can be further improved by calibrating the features using the teacher-student framework. The pre-trained full precision model is considered as a teacher, and the quantization model is considered as a student." One of ordinary skill in the art would realize that the quantization model is the pre-trained model with the quantization configuration applied to each layer. Algorithm 2 on page 4 shows the feature calibration, which includes retraining the quantization model, i.e., the pre-trained neural network with the quantization configuration applied.)

Regarding claim 10, the rejection of claim 1 is incorporated herein. Yuan teaches A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1. (Page 4 states, "All experiments are conducted in Pytorch [31]." As Pytorch is a framework used on a computer, the method must be implemented using a processor and a non-transitory computer-readable storage medium storing instructions that, when executed by the processor, cause the processor to perform the method.)

Regarding claim 11, Yuan teaches An apparatus for generating a quantization configuration, the apparatus comprising: (Page 2 states, "In this work, we use an evolutionary algorithm to explore mixed precision quantization policy in a heuristic manner. Given a pre-trained full precision model $M$, our target is to find an optimized quantization policy $\Pi(b_1 \ldots b_l)$, where $b_i$ denotes the quantization bit-width of the $i$-th layer." Page 4 states, "All experiments are conducted in Pytorch [31]." As Pytorch is a framework used on a computer, the method must be implemented on a computer, which is interpreted as the apparatus.)

a memory configured to store instructions; and (As Pytorch is a framework used on a computer, the method must be implemented using a memory configured to store instructions.)

one or more processors configured to execute the instructions to configure the one or more processors to: (As Pytorch is a framework used on a computer, the method must be implemented using a processor to execute the instructions stored in the memory.)

The remainder of claim 11 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.
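Claim 7's mutation step (randomly changing the precision and/or calibration method of a crossover result) can be sketched as per-locus random replacement in the spirit of Fasfous's allele mutation; the option sets and probability are assumptions carried over from the encoding sketch above:

```python
import random

# Hypothetical option sets matching the earlier gene-encoding sketch.
PRECISIONS = [4, 8]
CALIBRATIONS = ["minmax", "percentile"]

def mutate(gene, p_mut=0.1):
    """With probability p_mut per layer, redraw the precision or the
    calibration method of that (precision, calibration) locus at random."""
    child = []
    for bits, calib in gene:
        if random.random() < p_mut:
            if random.random() < 0.5:
                bits = random.choice(PRECISIONS)
            else:
                calib = random.choice(CALIBRATIONS)
        child.append((bits, calib))
    return child

gene = [(8, "minmax")] * 6
print(mutate(gene, p_mut=0.5))
```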
Claims 12-19 recite substantially similar subject matter to claims 2-9, respectively, and are rejected with the same rationale, mutatis mutandis.

Regarding claim 20, the rejection of claim 11 is incorporated herein. Yuan teaches determine a quantization precision … to be applied to each of the layers based on a fitness evaluation function. (Page 3 states, "Each individual (quantization policy) is evaluated according to (3) using N unlabeled samples. At each evolutionary step, S quantization policies are randomly sampled from the population. The quantization policy with the highest fitness in the sample is selected as the parent. A new quantization policy, called the child, is constructed from the parent by mutation operation. The quantization policy with the worst fitness in the sample is excluded from the population, and the mutated child is pushed into the population. This scheme uses repeated competitions of random individuals to search for an optimized quantization policy." Page 2 states, "Given a pre-trained full precision model $M$, our target is to find an optimized quantization policy $\Pi(b_1 \ldots b_l)$, where $b_i$ denotes the quantization bit-width of the $i$-th layer." Therefore, the policy that includes a precision for each layer is determined based on the fitness evaluation function.)

Yuan does not appear to explicitly teach [determining a quantization precision] and a calibration method [to be applied to each of the layers].

However, Chang—directed to analogous art—teaches [determining a quantization precision] and a calibration method [to be applied to each of the layers] (Page 5255 states, "For the assignment of quantization schemes and precisions to the filters of each layer, we use the Hessian-based method to determine which filters should use Fixed-W8A4 (higher precision). And for the rest filters, we determine PoT-W4A4 vs Fixed-W4A4 based on the variances of the weights in each filter. Once it is determined the assignment of quantization scheme and precision (PoT-W4A4, Fixed-W4A4, and Fixed-W8A4) down to the filter level for each layer, the Straight Through Estimator (STE) [1, 35].")

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yuan with the teachings of Chang for the reasons given above in regard to claim 1.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM, whose telephone number is (571) 272-2605. The examiner can normally be reached Monday - Friday, 9:00 A.M. - 5:00 P.M.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.T.P./
Examiner, Art Unit 2121

/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121

Prosecution Timeline

May 19, 2023: Application Filed
Mar 04, 2026: Non-Final Rejection — §101, §103, §112 (current)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 33%
With Interview: 0% (-33.3%)
Median Time to Grant: 3y 3m
PTA Risk: Low

Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
