DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 12/02/2025 have been fully considered and they are partially persuasive.
Regarding applicant’s remarks directed to the rejection of claims under 35 U.S.C. § 103, the arguments are directed to newly amended limitations that were not previously examined. Therefore, applicant’s arguments are rendered moot. The examiner refers to the rejection under 35 U.S.C. § 103 in the current Office action for more details.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-4, 7-12, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Schuld, Maria, et al. "Circuit-centric quantum classifiers." arXiv preprint arXiv:1804.00633 (2018). (“Schuld”) in view of Huang, Lei, et al. "Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks." arXiv preprint arXiv:1709.06079 (2017). (“Huang”)
In regards to claim 1,
Schuld teaches A method for training a layer of a neural network with an [orthogonal] weight matrix, the method comprising: executing layers of BS gates of a quantum circuit, each BS gate being a single parameterized two-qubit gate,
(Schuld, Section II B., “Given an encoded feature vector ϕ(x) which is now a ‘ket’ vector in the Hilbert space of a n qubit system, the model circuit maps this ket vector to another ket vector ϕ 0 = Uθϕ(x) by a unitary operation Uθ which is parametrised by a set of variables θ.
As described before, we decompose U into
[media_image1.png]
where each Ul is either a single qubit or a two-qubit quantum gate [each BS gate being a single parameterized two-qubit gate].”)
(Schuld, Section II C., “As a product of elementary gates, the model circuit Ux can be understood as a sequence of linear layers of a neural network with the same number of units in each “hidden layer”. This perspective facilitates the comparison of the circuit-centric quantum classifier with widely studied neural network models, and visualises the connectivity power of (controlled) single qubit gates. The position of the qubit (as well as the control) determine the architecture of each layer, i.e. which units are connected and which “weights” are tied in a “gate-layer” [executing layers of BS gates of a quantum circuit].”)
Schuld teaches wherein weights of the [orthogonal] weight matrix are based on values of parameters of the BS gates;
(Schuld, Section II. C., “Note that although we speak of linear layers here, the weights (i.e., the entries of the weight matrix representing a gate) have a nonlinear dependency on the model parameters θ [wherein weights of the [orthogonal] weight matrix are based on values of parameters of the BS gates], a circumstance that plays a role for the convergence of the hybrid training method.”)
Schuld teaches determining gradients of a cost function C with respect to parameters of the BS gates of the quantum circuit;
(Schuld, Section IV. B., “The derivative [determining gradients of a cost function C with respect to parameters of the BS gates of the quantum circuit] of the objective function with respect to a model parameter ν = b, µ (where µ ∈ θ is a circuit parameter) for a single data sample {(x m, ym)} is calculated as
[media_image2.png]
”)
Schuld teaches and updating values of parameters of the BS gates of the quantum circuit based on the gradients of the cost function C, wherein updating values of the parameters of the BS gates of the quantum circuit based on gradients of the cost function comprises: updating a value of a parameter
[media_image3.png] of a BS gate of the quantum circuit based on the value of the parameter [media_image3.png] and [media_image4.png] wherein [media_image4.png] is the gradient of the cost function C with respect to the parameter [media_image3.png].
(Schuld, Section IV. A., “We choose a standard least-squares objective to evaluate the cost of a parameter configuration θ and a bias b given a training set, D = {(x^1, y^1), ..., (x^M, y^M)},
[media_image5.png]
here π is the continuous output of the model defined in Equation (7)…
Gradient descent updates each parameter µ [updating values of parameters of the BS gates of the quantum circuit based on the gradients of the cost function C] from the set of circuit parameters θ via
[media_image6.png]
[ wherein updating values of the parameters of the BS gates of the quantum circuit based on gradients of the cost function comprises: updating a value of a parameter
[media_image3.png] of a BS gate of the quantum circuit based on the value of the parameter [media_image3.png] and [media_image4.png] wherein [media_image4.png] is the gradient of the cost function C with respect to the parameter [media_image3.png].]”)
However, Schuld does not explicitly teach an orthogonal weight matrix; updated values of the parameters preserving the orthogonality of the orthogonal weight matrix,
Huang teaches an orthogonal weight matrix; updated values of the parameters preserving the orthogonality of the orthogonal weight matrix,
(Huang, Abstract, “We also propose a novel orthogonal weight normalization [an orthogonal weight matrix] method to solve OMDSM. Particularly, it constructs orthogonal transformation over proxy parameters to ensure the weight matrix is orthogonal [updated values of the parameters preserving the orthogonality of the orthogonal weight matrix] and back-propagates gradient information through the transformation during training.”)
Schuld is considered to be analogous to the claimed invention because both are in the same field of quantum neural networks. Huang is considered to be analogous to the claimed invention because both are in the same field of orthogonal neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schuld to incorporate the teachings of Huang in order to provide a method of orthogonal weight normalization, as doing so stabilizes the network activations and regularizes FNNs. (Huang, Abstract, “Orthogonal matrix has shown advantages in training Recurrent Neural Networks (RNNs), but such matrix is limited to be square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize such square orthogonal matrix to orthogonal rectangular matrix and formulating this problem in feedforward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM). We show that the rectangular orthogonal matrix can stabilize the distribution of network activations and regularize FNNs. We also propose a novel orthogonal weight normalization method to solve OMDSM. Particularly, it constructs orthogonal transformation over proxy parameters to ensure the weight matrix is orthogonal and back-propagates gradient information through the transformation during training. To guarantee stability, we minimize the distortions between proxy parameters and canonical weights over all tractable orthogonal transformations. In addition, we design an orthogonal linear module (OLM) to learn orthogonal filter banks in practice, which can be used as an alternative to standard linear module.
Extensive experiments demonstrate that by simply substituting OLM for standard linear module without revising any experimental protocols, our method largely improves the performance of the state-of-the-art networks, including Inception and residual networks on CIFAR and ImageNet datasets. In particular, we have reduced the test error of wide residual network on CIFAR-100 from 20.04% to 18.61% with such simple substitution. Our code is available online for result reproduction.”)
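As an illustration of why a suitable parameterization can preserve orthogonality under gradient updates (examiner's illustrative sketch, not code from either reference; the gradient value and learning rate below are hypothetical): a 2 × 2 rotation matrix is orthogonal for every angle θ, so any gradient update to θ yields an orthogonal weight matrix by construction, mirroring how a parameterized two-qubit gate remains unitary for all parameter values.

```python
import math

def rotation(theta):
    # 2x2 rotation matrix; orthogonal for every value of theta
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def is_orthogonal(m, tol=1e-9):
    # check that M^T M = I
    mt = [[m[j][i] for j in range(2)] for i in range(2)]  # transpose
    prod = [[sum(mt[i][k] * m[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    ident = [[1.0, 0.0], [0.0, 1.0]]
    return all(abs(prod[i][j] - ident[i][j]) < tol
               for i in range(2) for j in range(2))

theta = 0.3
grad = 0.12                  # hypothetical gradient dC/dtheta
lr = 0.05                    # hypothetical learning rate
theta = theta - lr * grad    # gradient-descent update on the parameter
assert is_orthogonal(rotation(theta))  # orthogonality preserved by construction
```

Note that the weight entries (cos θ, sin θ) depend nonlinearly on the parameter θ, consistent with Schuld's observation in Section II.C.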
In regards to claim 2,
Schuld and Huang teach The method of claim 1,
Schuld teaches wherein determining gradients of the cost function comprises determining gradients of the cost function with respect to the parameter of each BS gate of the quantum circuit.
(Schuld, Section IV. B., “The derivative [wherein determining gradients of the cost function comprises determining gradients of the cost function with respect to the parameter of each BS gate of the quantum circuit] of the objective function with respect to a model parameter ν = b, µ (where µ ∈ θ is a circuit parameter) for a single data sample {(x m, ym)} is calculated as
[media_image2.png]
”)
In regards to claim 3,
Schuld and Huang teach The method of claim 1,
Schuld teaches wherein executing layers of BS gates of the quantum circuit comprises: measuring a resulting quantum state
[media_image7.png] after each layer [media_image8.png] of the quantum circuit is executed.
(Schuld, Section II C., “After executing the quantum circuit Uθϕ(x) in Step 2 [measuring a resulting quantum state
[media_image7.png] after each layer [media_image8.png]
of the quantum circuit is executed], the measurement of the first qubit (Step 3) results in state 1 with probability[32]
[media_image9.png]
To resolve these statistics we have to run the entire circuit S times and measure the first qubit. We estimate p(q0 = 1) from these samples s1, ..., sS. This is a Bernoulli parameter estimation problem which we discuss in Section IV E.
The classical postprocessing (Step 4) consists of adding a learnable bias term b to produce the continuous output of the model,
[media_image10.png]
”)
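The shot statistics Schuld describes reduce to a Bernoulli parameter estimate: p(q0 = 1) is estimated as the sample mean of S single-shot 0/1 measurement outcomes. A minimal sketch (illustrative only; the probability and shot count below are hypothetical):

```python
import random

def estimate_bernoulli(samples):
    # estimate p(q0 = 1) as the sample mean of 0/1 measurement outcomes
    return sum(samples) / len(samples)

random.seed(0)
p_true = 0.7                  # hypothetical measurement probability
S = 10000                     # hypothetical number of circuit repetitions (shots)
shots = [1 if random.random() < p_true else 0 for _ in range(S)]
p_hat = estimate_bernoulli(shots)
assert abs(p_hat - p_true) < 0.05  # estimate converges at rate O(1/sqrt(S))
```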
In regards to claim 4,
Schuld and Huang teach The method of claim 3,
Schuld teaches further comprising determining errors
[media_image11.png] for layers [media_image8.png]
of the quantum circuit.
(Schuld, Section IV E.2., “The continuous output of the circuit-centric quantum classifier was based on the probability of measuring the first qubit in state 1. To resolve this number, we have to repeat the entire algorithm multiple times. Each measurement samples from the Bernoulli distribution p(q0 = 1) = ν, and we want to estimate ν from the S samples q 1 1 , ..., qS 1 . The number of samples needed to estimate ν at error
[media_image12.png] with probability > 2/3 scales as O(Var(σz)/[media_image12.png]²), where Var(σz) is the variance of the sigma-z operator that we measure with respect to the final quantum state [12, 34]. If amplitude estimation is used then the number of repetitions of circuit centric classifier falls into O(1/[media_image12.png]) at a price of increasing the circuit depth by a factor of O(1/
).”)
In regards to claim 7,
Schuld and Huang teach The method of claim 1, wherein updating the value of the parameter [media_image3.png] of the BS gate of the quantum circuit based on the value of the parameter [media_image3.png] and [media_image4.png] comprises: updating a value of a parameter [media_image3.png] of a BS gate of the quantum circuit according to [media_image13.png] where [media_image14.png]
is the learning rate.
(Schuld, Section IV. A., “We choose a standard least-squares objective to evaluate the cost of a parameter configuration θ and a bias b given a training set, D = {(x^1, y^1), ..., (x^M, y^M)},
[media_image5.png]
here π is the continuous output of the model defined in Equation (7)…
Gradient descent updates each parameter µ from the set of circuit parameters θ via
[media_image6.png] [updating a value of a parameter [media_image3.png] of a BS gate of the quantum circuit according to [media_image13.png] where [media_image14.png]
is the learning rate]”)
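For readability, the gradient-descent update cited above can be written in standard notation (this is the conventional form; the exact rendering of media_image13.png is assumed, using Schuld's symbol µ for a circuit parameter):

```latex
\mu \;\leftarrow\; \mu - \eta \,\frac{\partial C}{\partial \mu}
```

where η is the learning rate and ∂C/∂µ is the gradient of the cost function C with respect to the parameter µ.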
In regards to claim 8,
Schuld and Huang teach The method of claim 1,
Schuld teaches wherein a quantum computing system executes the layers of the BS gates of the quantum circuit.
(Schuld, Section IV E.1., “Each of these gates has to be decomposed into the elementary constant gate set used in the physical implementation of the quantum computer.”)
Claims 9 and 17 are rejected on the same rationale under 35 U.S.C. 103 as claim 1.
Claims 10 and 18 are rejected on the same rationale under 35 U.S.C. 103 as claim 2.
Claims 11 and 19 are rejected on the same rationale under 35 U.S.C. 103 as claim 3.
Claims 12 and 20 are rejected on the same rationale under 35 U.S.C. 103 as claim 4.
Claim 15 is rejected on the same rationale under 35 U.S.C. 103 as claim 7.
Claim 16 is rejected on the same rationale under 35 U.S.C. 103 as claim 8.
Claim(s) 5-6 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Schuld and Huang in further view of Michael A. Nielsen, “Chapter 2 How the backpropagation algorithm works”, Neural Networks and Deep Learning, Determination Press, 2015 (Last update: Thu Dec 26 15:26:33 2019)
In regards to claim 5,
Schuld and Huang teach The method of claim 4,
Nielsen teaches wherein determining errors
[media_image11.png] for layers [media_image8.png]
of the quantum circuit comprises determining errors for each layer of the quantum circuit in reverse order according to:
[media_image15.png] where [media_image16.png] is the transpose of [media_image17.png], where [media_image18.png] is the error for layer [media_image19.png] of the quantum circuit and [media_image20.png] is a matrix representation of BS gates in layer [media_image19.png]
of the quantum circuit.
(Nielsen, (BP2), “An equation for the error δl in terms of the error in the next layer, δl+1: In particular
[media_image21.png]
where (wl+1)T is the transpose of the weight matrix wl+1 for the (l+1)th layer.”)
Nielsen is considered to be analogous to the claimed invention because it is reasonably pertinent to the problem the inventor faced (understanding the mathematics behind gradient descent and backpropagation). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schuld and Huang to incorporate the teachings of Nielsen in order to compute gradients via backpropagation and better understand the mathematics behind it (Nielsen, paragraphs 3-4, “This chapter is more mathematically involved than the rest of the book. If you're not crazy about mathematics you may be tempted to skip the chapter, and to treat backpropagation as a black box whose details you're willing to ignore. Why take the time to study those details?
The reason, of course, is understanding. At the heart of backpropagation is an expression for the partial derivative ∂C/∂w of the cost function C with respect to any weight w (or bias b) in the network. The expression tells us how quickly the cost changes when we change the weights and biases. And while the expression is somewhat complex, it also has a beauty to it, with each element having a natural, intuitive interpretation. And so backpropagation isn't just a fast algorithm for learning. It actually gives us detailed insights into how changing the weights and biases changes the overall behaviour of the network. That's well worth studying in detail.”)
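For reference, Nielsen's equation (BP2), which media_image21.png is assumed to render, has the standard form:

```latex
\delta^{l} = \left( (w^{l+1})^{T}\, \delta^{l+1} \right) \odot \sigma'(z^{l})
```

where ⊙ denotes the elementwise (Hadamard) product, σ′ is the derivative of the activation function, and z^l is the weighted input to layer l.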
In regards to claim 6,
Schuld and Huang teach The method of claim 5,
Schuld teaches wherein the gradient of the cost function C with respect to a parameter
[media_image3.png] of a BS gate acting on qubits i and i + 1 is defined by: [media_image22.png]
(Schuld, Section II B., “To make the single qubit gates trainable we need to formulate them in terms of parameters that can be learnt. The way the parametrisation is defined can have a significant impact on training, since it defines the shape of the cost function [wherein the gradient of the cost function C with respect to a parameter
[media_image3.png]
of a BS gate]. A single qubit gate G is a 2 × 2 unitary, which can always be written [30] as
[media_image23.png]
”)
(Schuld, Section III C., “To show an example, consider a Hilbert space of dimension 2n with n = 2 qubits |q0q1> [acting on qubits i and i + 1]. A single qubit unitary G applied to q0 would have the following matrix representation
[media_image24.png]
”)
Claim 13 is rejected on the same rationale under 35 U.S.C. 103 as claim 5.
Claim 14 is rejected on the same rationale under 35 U.S.C. 103 as claim 6.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASMINE THAI whose telephone number is (703)756-5904. The examiner can normally be reached M-F 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.T./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129