Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, namely a mathematical concept, without significantly more. The claims recite calculating the weights of a neural network using forward propagation, determining a fit using series expansion and binarization, calculating the gradient of the error with respect to the weights, calculating subfunctions of the series expansion, fitting the error with a neural network, and using a Fourier, wavelet, or discrete Fourier series as the expansion. This judicial exception is not integrated into a practical application because the recitation of a specific data type merely links the abstract idea to the field of computers. The claims do not include additional elements sufficient to amount to significantly more than the judicial exception because components such as the memory and processor are generic computer parts.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 2, 4, 5, 7-9, 11, 12, 14-16, 18 and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks by Gong et al. ("Gong").
Gong teaches claims 1, 8 and 15. A neural network training method, comprising: (Gong abs. “DSQ can automatically evolve during training to gradually approximate the standard quantization.”)
performing, in a forward propagation process and using a binarization function, (Gong sec. 3.1 “For 1-bit binary quantization, the binary neural network (BNN) limits its activations and weights to either -1 or +1 usually using the binary function… sgn(x)…” The binarization happens in the forward pass; see Algorithm 1 in section 3.5, reproduced below.)
[Image: media_image1.png (grayscale): Gong Algorithm 1, sec. 3.5]
binarization processing on a target weight to obtain a weight of a first neural network layer in a neural network, (See Gong Algorithm 1 above, where wq is the binarized layer weight obtained as a function of binarization of the soft-quantized weight wsq; these are the layer weights.) or on an activation value of a second neural network layer in the neural network to obtain an input of the first neural network layer; (The inputs are quantized/binarized as well, because Gong sec. 4.1 states “When building up a quantized model, we simply insert DSQ function to all places that will be quantized, e.g., the inputs and weights of a convolution layer.”)
determining a fitting function based on series expansion of the binarization function; and (Gong sec. 1 “DSQ employs a series of hyperbolic tangent functions to gradually approach the staircase function for low-bit quantization (e.g., sign for 1-bit case), and meanwhile keeps the smoothness for easy gradient calculation.” The binarization function is approximated, i.e., fitted, by a series of hyperbolic tangent functions.)
calculating, in a backward propagation process, a first gradient of a loss function with respect to the target weight using a second gradient of the fitting function as a third gradient of the binarization function. (Gong sec. 1 “DSQ employs a series of hyperbolic tangent functions to gradually approach the staircase function for low-bit quantization (e.g., sign for 1-bit case), and meanwhile keeps the smoothness for easy gradient calculation.” Gong abs. “Owing to its differentiable property, DSQ can help pursue the accurate gradients in backward propagation…” This shows that Gong uses a series of smooth, differentiable hyperbolic tangent functions to approximate the gradient used in backpropagation. This teaches taking the gradient of the fitting function in place of the gradient of the binarization function, because the binarization function is not differentiable: its gradient is zero almost everywhere.)
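The mechanism described above can be sketched as follows. This is an illustrative simplification, not Gong's actual DSQ implementation: the single tanh surrogate and the temperature k are assumptions made for clarity.

```python
import numpy as np

def binarize_forward(w):
    """Forward pass: hard binarization with the sign function."""
    return np.where(w >= 0, 1.0, -1.0)

def surrogate_grad(w, k=5.0):
    """Backward pass: derivative of the smooth fit tanh(k*w) to sign(w),
    d/dw tanh(k*w) = k * (1 - tanh(k*w)**2)."""
    t = np.tanh(k * w)
    return k * (1.0 - t * t)

# Toy loss L = sum(b * x) with b = sign(w); dL/db = x.
w = np.array([-0.8, 0.1, 0.5])
x = np.array([1.0, 2.0, 3.0])
b = binarize_forward(w)                  # hard binarization used in the forward pass
grad_loss_wrt_w = x * surrogate_grad(w)  # chain rule via the fitted (smooth) gradient
```

The key point is the mismatch between passes: the hard sign function is used forward, while the smooth fit supplies the gradient backward.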
Gong teaches claims 2, 9 and 16. The neural network training method of claim 1, further comprising determining a plurality of subfunctions based on the series expansion, wherein the fitting function comprises the plurality of subfunctions and an error function. (Gong sec. 1 “DSQ employs a series of hyperbolic tangent functions to gradually approach the staircase function for low-bit quantization (e.g., sign for 1-bit case), and meanwhile keeps the smoothness for easy gradient calculation.” The subfunctions are the hyperbolic tangent functions; the error function is included because the calculated gradient is the gradient of the error, i.e., the loss.)
Gong teaches claims 4, 11 and 18. The neural network training method of claim 2, further comprising fitting the error function using at least one neural network layer, wherein calculating the first gradient comprises:
calculating, in the backward propagation process, fourth gradients of the plurality of subfunctions with respect to the target weight;
calculating a fifth gradient of the at least one neural network layer with respect to the target weight; and
calculating the first gradient based on the fourth gradients and the fifth gradient. (Gong Algorithm 1 and Equation 6, reproduced below. The gradient with respect to alpha is the gradient of the subfunctions. The gradients are taken with respect to the weights because Gong sec. 4.1 states “When building up a quantized model, we simply insert DSQ function to all places that will be quantized, e.g., the inputs and weights of a convolution layer.”)
[Image: media_image2.png (grayscale): Gong Algorithm 1]
[Image: media_image3.png (grayscale): Gong Equation 6]
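As a hedged illustration of the gradient composition recited in claims 4, 11 and 18, the following sketch sums the gradients of assumed tanh subfunctions (the fourth gradients) with the gradient of a hypothetical linear error layer (the fifth gradient). Neither the subfunction scales nor the error layer come from Gong's Equation 6; they are stand-ins for clarity.

```python
import numpy as np

def subfunction_grads(w, ks=(1.0, 3.0, 9.0)):
    """Fourth gradients: gradient of each tanh subfunction w.r.t. the weight."""
    return [k * (1.0 - np.tanh(k * w) ** 2) for k in ks]

def error_layer_grad(w, a=0.1):
    """Fifth gradient: gradient of a hypothetical linear layer a*w fitted to the error."""
    return a * np.ones_like(w)

def first_gradient(w, upstream):
    """First gradient: upstream loss gradient times the sum of the
    subfunction gradients plus the error-layer gradient."""
    return upstream * (sum(subfunction_grads(w)) + error_layer_grad(w))

w = np.array([0.0, 0.5])
g = first_gradient(w, upstream=np.ones_like(w))
```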
Gong teaches claims 5, 12 and 19. The neural network training method of claim 1, further comprising determining a plurality of subfunctions based on the series expansion, wherein the fitting function comprises the plurality of subfunctions. (Gong sec. 1 “DSQ employs a series of hyperbolic tangent functions to gradually approach the staircase function for low-bit quantization (e.g., sign for 1-bit case), and meanwhile keeps the smoothness for easy gradient calculation.” The hyperbolic tangent functions are the subfunctions of the series expansion and, collectively, they constitute the fitting function.)
Gong teaches claims 7 and 14. The neural network training method of claim 1, wherein a data type of the target weight is a 32-bit floating point type, a 64-bit floating point type, a 32-bit integer type, or an 8-bit integer type. (Gong sec. 2.2 “frameworks usually support 8-bit integer arithmetic…” Gong sec. 4.5 states “while existing open-source high performance inference frameworks (e.g., NCNN-8-bit [31]) usually only support 8-bit operations. In practice, the lower bitwidth doesn’t mean a faster inference speed, mainly due to the overflow and transferring among the registers…” This teaches that the 8-bit integer type was the standard before Gong's paper; accordingly, the 8-bit integer type was known at the time of filing.)
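For context on the 8-bit integer type discussed, a minimal sketch of symmetric float32-to-int8 weight quantization follows. The max-magnitude scale choice is an assumption for illustration, not Gong's or NCNN's actual scheme.

```python
import numpy as np

def quantize_int8(w_fp32):
    """Symmetric per-tensor quantization: map float32 weights to int8 codes
    plus a single float scale (the largest magnitude maps to 127)."""
    scale = np.max(np.abs(w_fp32)) / 127.0
    q = np.clip(np.round(w_fp32 / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([-0.5, 0.0, 0.25, 0.5], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = q.astype(np.float32) * s  # dequantized approximation of w
```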
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks by Gong et al. and Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm by Liu et al. ("Liu").
Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks by Gong et al. and https://www.physics.smu.edu/scalise/P4321sp20/fs.pdf (Olver).
Regarding claims 3, 10 and 17, Gong teaches the neural network training method of claim 2, further comprising fitting the error function by using a (Gong sec. 4.1 “All convolution and fully-connected layers except the first and the last one are quantized with DSQ.”)
Gong does not teach a two-layer fully connected neural network with a residual connection.
However, Liu teaches a two-layer fully connected neural network with a residual. (Liu p. 14 “keep the weights and activations in the first convolution and the last fully-connected layers to be real-valued.” Liu p. 2 “we propose to keep these real activations via adding a simple yet effective shortcut, dubbed Bi-Real net. As shown in Fig. 1(b), the shortcut connects the real activations to an addition operator with the real-valued activations of the next block.” The shortcut connection is the residual connection.)
Liu, Gong and the claims are all directed to training fully connected neural networks. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to have a residual shortcut to “keep these real activations” (Liu p. 14) because “the representational capability of the Bi-Real net is significantly enhanced and the additional cost on computation is negligible.” Liu abs.
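The shortcut Liu describes can be sketched as a toy in one dimension. The elementwise "block" below stands in for a convolution block and is an illustrative assumption, not Bi-Real Net's actual architecture.

```python
import numpy as np

def binarized_block(x, w):
    """Toy block with binarized activations and weights (elementwise for brevity)."""
    xb = np.where(x >= 0, 1.0, -1.0)
    wb = np.where(w >= 0, 1.0, -1.0)
    return xb * wb

def bi_real_block(x, w):
    """Shortcut in the style Liu describes: add the real-valued activation x
    back onto the binarized block's output, preserving real activations."""
    return binarized_block(x, w) + x

y = bi_real_block(np.array([0.3]), np.array([-0.2]))  # binarized part gives -1, shortcut adds 0.3
```

The shortcut costs only an addition, which is why Liu characterizes the extra computation as negligible.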
Regarding claims 6, 13 and 20, Gong teaches the neural network training method of claim 1, wherein the series expansion is a (Gong sec. 1 “DSQ employs a series of hyperbolic tangent functions to gradually approach the staircase function for low-bit quantization (e.g., sign for 1-bit case), and meanwhile keeps the smoothness for easy gradient calculation.”)
Gong does not teach a Fourier series expansion of the binarization function.
However, Olver teaches a Fourier series expansion of the binarization function. (Olver p. 645 “Thus, the Fourier series converges, as expected, to f(x) at all points of continuity; at discontinuities, the Fourier series can’t decide whether to converge to the right or left hand limit, and so ends up ‘splitting the difference’ by converging to their average; see Figure 12.4.” Fig. 12.4, reproduced below, shows a Fourier series of a binary/step function, and Equation 12.41 shows a Fourier series expansion of a step function.)
[Image: media_image4.png (grayscale): Olver Fig. 12.4]
[Image: media_image5.png (grayscale): Olver Equation 12.41]
Olver, Gong and the claims all turn a discontinuous step function into a smooth function. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use a Fourier series for the conversion because “If f(x) is any piecewise continuous function, then its Fourier coefficients are well defined — the integrals (12.28) exist and are finite.” (Olver p. 644.)
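Olver's observation can be checked numerically: the 2π-periodic square wave, a binarization-like step function, is approximated by its standard odd-harmonic Fourier series. The truncation order below is an assumption made for illustration.

```python
import numpy as np

def square_wave_fourier(x, n_terms=50):
    """Truncated Fourier series of the 2*pi-periodic step function sign(sin x):
    f(x) ~ (4/pi) * sum over odd k of sin(k*x)/k."""
    s = np.zeros_like(x, dtype=float)
    for k in range(1, 2 * n_terms, 2):  # odd harmonics 1, 3, 5, ...
        s += np.sin(k * x) / k
    return (4.0 / np.pi) * s

# At a point of continuity the partial sums approach the step value sign(sin(pi/2)) = 1;
# at the discontinuity x = 0 every term vanishes, matching the average of the two limits.
approx = square_wave_fourier(np.array([np.pi / 2]))
```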
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Austin Hicks whose telephone number is (571)270-3377. The examiner can normally be reached Monday - Thursday 8-4 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AUSTIN HICKS/Primary Examiner, Art Unit 2142