Prosecution Insights
Last updated: April 19, 2026
Application No. 18/157,608

SCALABLE TENSOR NETWORK CONTRACTION USING REINFORCEMENT LEARNING

Status: Final Rejection — §103
Filed: Jan 20, 2023
Examiner: MULLINAX, CLINT LEE
Art Unit: 2123
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 2 (Final)

Grant Probability: 48% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 4y 4m
With Interview: 86%

Examiner Intelligence

Career Allow Rate: 48% (grants 48% of resolved cases; 59 granted / 123 resolved; -7.0% vs TC avg)
Interview Lift: +38.3% (strong), based on resolved cases with interview
Avg Prosecution: 4y 4m (typical timeline)
Currently Pending: 25
Total Applications: 148 (career history, across all art units)
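The headline figures in the cards above are simple derivations from the examiner's raw career counts; a minimal sketch of the arithmetic (variable names are illustrative, not the tool's API):

```python
# Reproduce the dashboard's headline numbers from the raw counts shown above.
granted, resolved = 59, 123            # examiner's career totals
allow_rate = granted / resolved        # ~0.48, reported as "48% Career Allow Rate"

interview_lift = 0.383                 # "+38.3% Interview Lift"
with_interview = allow_rate + interview_lift   # ~0.86, reported as "86% With Interview"

print(f"Career allow rate: {allow_rate:.0%}")
print(f"With interview:    {with_interview:.0%}")
```

The "With Interview" figure appears to be an additive lift on the base allow rate (48% + 38.3% ≈ 86%), not a multiplicative one.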

Statute-Specific Performance

§101: 22.8% (-17.2% vs TC avg)
§103: 53.6% (+13.6% vs TC avg)
§102: 6.3% (-33.7% vs TC avg)
§112: 13.1% (-26.9% vs TC avg)
Tech Center averages are estimates • Based on career data from 123 resolved cases
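Each row above reports both the examiner's per-statute allow rate and its delta versus the Tech Center average, so the TC baseline can be recovered from any row; a quick check (assuming the deltas are simple differences):

```python
# rate (%) and delta vs Tech Center average (%) for each statute row above
rows = {"101": (22.8, -17.2), "103": (53.6, 13.6),
        "102": (6.3, -33.7), "112": (13.1, -26.9)}
for statute, (rate, delta) in rows.items():
    print(f"§{statute}: implied TC average = {rate - delta:.1f}%")
```

Every row happens to imply the same ~40% baseline, consistent with the deltas being computed against a single TC-wide average rather than per-statute averages.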

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

This action is responsive to the application filed on 12/08/2025. Claims 1, 4-14, 17-18, and 21-25 are pending. Claims 1, 4-10, 12-14, 17-18, and 21-24 have been amended. Claims 2-3, 15-16, and 20 have been canceled. Claim 25 has been added.

Response to Arguments

Applicant’s arguments with respect to the rejection(s) of claim(s) 1, 4-14, 17-18, and 21-25 under 35 U.S.C. 101 have been fully considered and are persuasive. The rejection(s) under 35 U.S.C. 101 of claim(s) 1, 4-14, 17-18, and 21-25 has been withdrawn.

Applicant’s arguments with respect to the rejection(s) of claim(s) 1 and 14 under 35 U.S.C. 103 have been considered but are not persuasive. Applicant argues that no reference teaches the amended limitations at a high level. The examiner respectfully disagrees. Due to the breadth of the claim language, the combination of Liu and Xu has been found to teach the argued limitations. See the 35 U.S.C. 103 section for a full mapping of the claim limitations necessitated by applicant’s amendments.

Claim Objections

Claims 1 and 14 are objected to because of the following informalities: Claims 1 and 14 recite a typo saying “generating a distribution a weights over edges of a sample graph representation of a sample tensor network”. An optional solution to overcome this objection is to change the claims to read “generating a distribution of weights over edges of a sample graph representation”. Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 4-14, 17-18, 21, 23, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Quantum Tensor Networks for Variational Reinforcement Learning”, 2020), hereinafter Liu, in view of Xu et al. (“Multi-Graph Tensor Networks”, 2021), hereinafter Xu.

Regarding claims 1 and 14, Liu teaches a computer-implemented method for contracting a tensor network; a device implementing an agent for contracting a tensor network, comprising: one or more processors; and a non-transitory computer-readable medium, having computer-executable instructions stored thereon, the computer-executable instructions, when executed by the one or more processors, causing the one or more processors to; a non-transitory computer-readable medium implementing an agent, having computer-executable instructions stored thereon, for model training, the computer-executable instructions, when executed by one or more processors, causing the one or more processors to (sections 1, 3.1, and 4.2 teach determining computational and storage costs for algorithm calculations, directed to program execution efficiency on one or more processors in a computer system communicatively coupled to one or more memories for performing the embodiments of the disclosure):

generating a graph representation of the tensor network (sections 2.2 and 4.3 teach “Tensor networks have a highly intuitive graph representation [19]. As shown by Fig. 1, the modes are represented as edges. Two nodes connected by a common edge means the corresponding tensors are contracted in that mode”), the graph representation comprising nodes and edges connecting the nodes, wherein each node represents a tensor in the tensor network and each edge represents a set of shared indices between two tensors (sections 2.2 and 4.3 teach “Tensor networks have a highly intuitive graph representation [19]. As shown by Fig. 1, the modes are represented as edges. Two nodes connected by a common edge means the corresponding tensors are contracted in that mode”);

processing, by a policy model of a trained agent, the graph representation to determine a contraction path for the tensor network (section 4.2 teaches a reinforcement learning “agent” determining rewards for “state-action pair[s]” of the tensor representations; sections 4.3-4.4 teach “Having obtained representations for the reward and policy (policy model of a trained agent) using tensor networks, we connect the two MPS’s on every dangling edge. This connection operation corresponds to the inner product in (12), giving us a tensor network that calculates a scalar output for the total energy H’(π) when contracted (determine a contraction path).”); and

processing the tensor network in accordance with the contraction path to generate a contracted tensor network (sections 4.3-4.4 teach “Having obtained representations for the reward and policy using tensor networks, we connect the two MPS’s on every dangling edge (processing the tensor network). This connection operation corresponds to the inner product in (12), giving us a tensor network that calculates a scalar output for the total energy H’(π) when contracted (in accordance with the contraction path to generate a contracted tensor network).”);

wherein the policy model of the trained agent is trained by a reinforcement learning (RL) process comprising performing, for one or more iterations: generating a distribution a weights over edges of a sample graph representation of a sample tensor network (section 4.2 teaches a reinforcement learning “agent” determining rewards for “state-action pair[s]” of the tensor representations; sections 2.1-2.2 and 4.3-4.4 teach determining policies as a distribution of edge vectors in the network), selecting an edge from the distribution and performing a pairwise tensor contraction along the selected edge to update the sample graph representation (section 4.2 teaches a reinforcement learning “agent” determining rewards for “state-action pair[s]” of the tensor representations; sections 2.2 and 4.3-4.4 teach “Having obtained representations for the reward and policy using tensor networks, we connect the two MPS’s on every dangling edge. This connection operation corresponds to the inner product in (12), giving us a tensor network that calculates a scalar output for the total energy H’(π) when contracted”; wherein “Two nodes connected by a common edge means the corresponding tensors are contracted in that mode (pairwise)”), determining a loss corresponding to the pairwise tensor contraction along the selected edge and adding the loss to an accumulated path loss (sections 2 and 2.2 teach “mode-k product (a.k.a, tensor contraction, Einstein sum)” of the edges; sections 3.2 and 4.3-4.4 teach computing an “error between the true tensor and the estimation” of connected (contraction) tensor cores for an optimal “path” total (accumulated path loss)), and updating, based on a terminal state accumulated path loss corresponding to a selected sequence of pairwise tensor contractions, parameters of the policy model (sections 4.3-4.4 teach repeating the error computation of tensor core connections “until the error between the true tensor and the estimation is smaller than ε1” while updating the policy).

Liu at least implies processing, by an agent that implements a reinforcement learning algorithm, the graph representation to determine a contraction for the tensor network, and generating a distribution a weights over edges of a sample graph representation of a sample tensor network (see mappings above); however, Xu teaches processing, by a policy model of a trained agent, the graph representation to determine a contraction path for the tensor network (sections 2.1, 2.4, and 3.2-3.3 teach a tensor network wherein “each tensor is represented as a node, while the number of edges that extends from that node corresponds to tensor order [6]. If two nodes are connected through an edge, it represents a linear contraction between two tensors over modes of equal dimensions” as determined in a “multi-graph tensor network” with matrix “contractions” using “Double Deep Q-Learning [16], where a separate target network, ~Q, is used to compute the estimated Q value” for the policy), and generating a distribution a weights over edges of a sample graph representation of a sample tensor network (sections 3.1-3.3 teach multi-graph network learning “through a series of multi-linear graph filter and weight matrix contractions, which essentially iterates the graph filtering operation across all M graph domains”).
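For orientation, the claim 1/14 training loop the examiner maps above (a distribution of weights over edges, sampling an edge, a pairwise contraction, and an accumulated path loss) can be sketched as a toy in Python. This is an illustrative reconstruction of the claimed loop, not code from the application or the cited references; in particular, the cost-based softmax below stands in for the claimed learned policy model, and the policy-parameter update step is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tensor network as a graph: each node is a tensor holding its set of
# index labels; an edge exists wherever two tensors share an index
# (the claimed "set of shared indices between two tensors").
tensors = {"A": {"i", "j"}, "B": {"j", "k"}, "C": {"k", "l"}, "D": {"l", "i"}}
dims = {"i": 8, "j": 16, "k": 8, "l": 16}

def edges(net):
    """Enumerate the graph's edges: pairs of tensors with shared indices."""
    names = sorted(net)
    return [(a, b, net[a] & net[b])
            for n, a in enumerate(names) for b in names[n + 1:]
            if net[a] & net[b]]

def contraction_cost(net, edge):
    """FLOP-count proxy for contracting one edge: product of the dimensions
    of every index carried by the two tensors involved."""
    a, b, _ = edge
    cost = 1.0
    for idx in net[a] | net[b]:
        cost *= dims[idx]
    return cost

def rollout(net):
    """One episode: repeatedly sample an edge from a distribution of weights
    over edges, contract it pairwise, and accumulate the path loss."""
    net = {name: set(ix) for name, ix in net.items()}
    path, path_loss = [], 0.0
    while len(net) > 1:                      # terminal state: one tensor left
        es = edges(net)
        costs = np.array([contraction_cost(net, e) for e in es])
        logits = -np.log(costs)              # cheaper contractions weigh more
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()             # distribution of weights over edges
        choice = rng.choice(len(es), p=weights)
        a, b, shared = es[choice]
        # Pairwise contraction: the merged tensor keeps only unshared indices.
        net[a + b] = (net.pop(a) | net.pop(b)) - shared
        path.append((a, b))
        path_loss += costs[choice]           # loss added to accumulated path loss
    return path, path_loss

path, loss = rollout(tensors)
print(path, loss)
```

In the claimed method, a GNN policy (claims 4 and 17) would produce the edge logits instead of these raw contraction costs, and the terminal `path_loss` would drive a gradient update of the policy parameters.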
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Xu’s teachings of multi-graph tensor network weight contractions through reinforcement Q-learning into Liu’s teaching of graphically representing a tensor network with contractions and reinforcement learning in order to achieve an “efficient and meaningful modelling strategy in a deep learning setting” superior to comparative models (Xu, section 6).

Regarding claims 4 and 17, the combination of Liu and Xu teach all the claim limitations of claims 1 and 14 above; and further teach wherein the agent comprises a graph neural network (GNN) configured to: receive, as input, a current graph representation of a current state of a tensor network (Xu, sections 2.4 and 5 teach “matricized” graph “input samples” for the time mode to a “Graph Convolutional Network (GCN)” as an agent for finding the optimal policy values for reinforcement learning), and generate, based on the current state of the tensor network, a distribution of weights over edges for the current graph representation of the tensor network (Xu, sections 3.1-3.3 and 5 teach multi-graph network learning “through a series of multi-linear graph filter and weight matrix contractions, which essentially iterates the graph filtering operation across all M graph domains” according to the input time mode), wherein each respective weight is determined by a weight function that represents a time and/or a space complexity of a pairwise tensor contraction operation along a respective edge of the current graph representation (Xu, sections 3.1-3.3 and 5 teach multi-graph network learning according to equation 2, including graph weight values of the modes and weight matrix contractions). Liu and Xu are combinable for the same rationale as set forth above with respect to claims 1 and 14.
Regarding claim 5, the combination of Liu and Xu teach all the claim limitations of claim 1 above; and further teach wherein the terminal state accumulated path loss is a sum of approximate values to estimate losses of a number of pairwise tensor contractions of the selected sequence of pairwise tensor contraction (Liu, sections 2 and 2.2 teach “mode-k product (a.k.a, tensor contraction, Einstein sum)” of the edges. Sections 4.3-4.4 teach computing an “error (loss) between the true tensor and the estimation” of connected tensor cores (selected sequence of pairwise tensor contraction), and repeating the error computation of tensor core connections “until the error between the true tensor and the estimation is smaller than ε1” (terminal state)).

Regarding claim 6, the combination of Liu and Xu teach all the claim limitations of claim 1 above; and further teach wherein the RL process includes, during at least one of the one or more iterations, determining that an intermediate accumulated path loss exceeds a threshold value (Liu, sections 2 and 2.2 teach “mode-k product (a.k.a, tensor contraction, Einstein sum)” of the edges. Sections 4.2-4.4 teach computing an “error between the true tensor and the estimation” of connected (accumulated path loss) tensor cores, and if it is above (exceeds) “threshold values”, iterating the reinforcement learning including connecting k tensor cores).

Regarding claim 7, the combination of Liu and Xu teach all the claim limitations of claim 6 above; and further teach wherein the threshold value is a smallest observed accumulated path loss associated with a respective selected sequence of pairwise tensor contractions (Liu, sections 4.3-4.4 teach repeating the error computation of tensor core connections of the current policy “until the error between the true tensor and the estimation is smaller than ε1” being a “small threshold value” (smallest) for the chosen “contraction path”).
Regarding claim 8, the combination of Liu and Xu teach all the claim limitations of claim 7 above; and further teach determining, based on the threshold value, an approximation value to estimate the loss of the number of pairwise tensor contractions in the selected sequence of pairwise tensor contractions (Liu, sections 4.3-4.4 teach the error computation of tensor core connections “until the error between the true tensor and the estimation is smaller than ε1” and the variation of the energy function with regard to tensor cores (approximation) for the chosen “contraction path”, “where ε1, ε2 ∈ ℝ are small threshold values”).

Regarding claims 9 and 18, the combination of Liu and Xu teach all the claim limitations of claims 4 and 17 above; and further teach wherein an off-policy element is applied in the policy model to shift the policy model towards a target policy (Liu, sections 3.1 and 4.3-4.4 teach RL policy learning by inserting dangling edges to connect the MPS’s and pass to the tensor completion algorithm and energy minimization via the Hamiltonian equation). Liu at least implies wherein an off-policy element is applied in the reinforcement learning algorithm to shift the policy adopted therein towards a target policy (see mappings above); however, Xu teaches wherein an off-policy element is applied in the reinforcement learning algorithm to shift the policy adopted therein towards a target policy (section 2.4 teaches using “Double Deep Q-Learning [16], where a separate target network, ~Q, is used to compute the estimated Q value”, and “To alleviate the issue of non-stationary targets, experience replay is used, whereby past experiences are stored in a buffer, from which a batch is sampled at every time instant to train the network with back-propagation and stochastic gradient descent”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Xu’s teachings of multi-graph tensor network contractions through reinforcement Q-learning of stored data into Liu’s teaching of graphically representing a tensor network with contractions and reinforcement learning in order to achieve an “efficient and meaningful modelling strategy in a deep learning setting” superior to comparative models (Xu, section 6).

Regarding claims 10 and 19, the combination of Liu and Xu teach all the claim limitations of claims 9 and 18 above; and further teach wherein the off-policy element is determined based on samples drawn from an optimistic buffer, the method further comprising: retrieving a plurality of data from the optimistic buffer (Xu, section 2.4 teaches using “Double Deep Q-Learning [16], where a separate target network, ~Q, is used to compute the estimated Q value”, and “To alleviate the issue of non-stationary targets, experience replay is used, whereby past experiences are stored in a buffer, from which a batch is sampled at every time instant to train the network with back-propagation and stochastic gradient descent”; section 3.3 teaches the inputs being based on “time-steps”); assigning scores to each of the plurality of data; determining, based on the distribution of the scores of the plurality of data, the samples to be drawn from the optimistic buffer; determining the off-policy element based on the samples (Xu, section 2.4 teaches past experiences are “stored in a buffer, from which a batch is sampled at every time instant to train the network with back-propagation and stochastic gradient descent”; section 3.3 teaches the inputs being sampled based on “time-steps”); and updating the policy model (Xu, section 2.4 teaches updating the Q-network function policy from the past samples). Liu and Xu are combinable for the same rationale as set forth above with respect to claims 9 and 18.
Regarding claim 11, the combination of Liu and Xu teach all the claim limitations of claim 1 above; and further teach extracting features from the tensor network, the features associated with a distribution (Xu, sections 3.2-3.3 teach “extracting feature map[s]” across “all M graph domains”); assigning a dynamic range to one or more tails of the distribution (Xu, sections 3.2-3.3 teach “using a single weight matrix…for all of the graph domains, where J1 controls the number of hidden units (feature maps)” (dynamic range)); and compressing the dynamic range for the one or more tails of the distribution (Xu, sections 3.2-3.3: “it is customary to flatten the extracted features and pass them through dense layers to generate the final output. To further reduce the complexity, the weight matrices of the dense layers can be tensorized and represented in TT format, as discussed in [13]. This further reduces the number of parameters, while maintaining compatibility”). Liu and Xu are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 12, the combination of Liu and Xu teach all the claim limitations of claim 1 above; and further teach wherein the agent further implements a solver, and the solver provides action scores as additional features to be learned by the policy model (Liu, sections 3.2 and 4.2 teach “We can interpret a particular state-action as a particle in motion, the immediate reward following it as its momentum, and the future expected rewards down the path as its potential energy. Therefore, the optimal policy dictates a path that obeys the principle of least action by maximizing the (negative) sum of kinetic and potential energies”).
Regarding claim 13, the combination of Liu and Xu teach all the claim limitations of claim 1 above; and further teach wherein the agent further implements a solver, wherein the solver calculates a number of contractions corresponding to a contraction path, and the policy model reinforcement learning algorithm in the agent calculates the rest of the contractions in the corresponding contraction path (Liu, sections 2 and 2.2 teach “mode-k product (a.k.a, tensor contraction, Einstein sum)” of the edges. Sections 3.2 and 4.2-4.4 teach computing an “error between the true tensor and the estimation” of connected (contracted) tensor cores of the current policy, and if it is above (exceeds) “threshold values”, iterating the reinforcement learning including connecting k tensor cores (calculating the number of contractions) for an optimal “path”).

Regarding claims 21 and 23, the combination of Liu and Xu teach all the claim limitations of claims 1 and 14 above; and further teach wherein the policy model is trained based on, at least in part, a memory limit parameter associated with a processor configured to implement the policy model (Liu, sections 2.2 and 4.2-4.4 teach determining RL policy transitional probabilities for an optimal policy while monitoring “computational costs” and determining storage “space-efficiency” when storing connected tensor elements when contracting).

Regarding claim 25, claims 1 and 4 are analogous, and the combination of Liu and Xu teach all the claim limitations of claims 1 and 4 above. Liu and Xu are combinable for the same rationale as set forth above with respect to claim 1.

Claims 22 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Quantum Tensor Networks for Variational Reinforcement Learning”, 2020), hereinafter Liu, in view of Xu et al. (“Multi-Graph Tensor Networks”, 2021), hereinafter Xu, and further in view of Huang et al. (“Efficient parallelization of tensor network contraction for simulating quantum computation”, 2021).

Regarding claims 22 and 24, the combination of Liu and Xu teach all the claim limitations of claims 4 and 17 above; and further teach wherein the GNN is configured to generate the distribution of weights over edges for the graph representation of the tensor network (Xu, sections 2.4 and 5 teach using a “Graph Convolutional Network (GCN)” as an agent for finding the optimal policy values for reinforcement learning, including, as taught in sections 3.1-3.3, multi-graph network learning “through a series of multi-linear graph filter and weight matrix contractions, which essentially iterates the graph filtering operation across all M graph domains”). Liu and Xu are combinable for the same rationale as set forth above with respect to claims 1 and 14.

However, the combination does not explicitly teach and, for each edge, a corresponding probability p used to determine whether to perform a slicing operation. Huang teaches and, for each edge, a corresponding probability p used to determine whether to perform a slicing operation (section “Methods” teaches “We use index-slicing-incorporated sequential pairwise contraction to contract tensor networks”, based on the “probability” of the bitstrings).
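The slicing operation Huang is cited for can be illustrated independently of the rejection: slicing fixes one shared index to each of its values, contracts the resulting smaller networks separately (e.g., on different devices), and sums the partial results. A minimal einsum sketch (a generic illustration, not Huang's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 6))   # tensor with indices (i, j)
B = rng.normal(size=(6, 5))   # tensor with indices (j, k)

# Full contraction over the shared index j.
full = np.einsum("ij,jk->ik", A, B)

# Sliced contraction: fix j to each of its 6 values, contract each slice
# independently (candidates for parallel execution), then sum the partials.
sliced = sum(np.einsum("i,k->ik", A[:, j], B[j, :]) for j in range(6))

assert np.allclose(full, sliced)
```

Each slice is smaller and independent, which reduces peak memory and enables parallelism; the claimed per-edge probability p decides which edges are worth slicing.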
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Liu’s teaching of graphically representing a tensor network with contractions and reinforcement learning, as modified by Xu’s teachings of multi-graph tensor network contractions through reinforcement Q-learning, to include tensor network index slicing operations and probability determinations as taught by Huang in order to reduce computational time complexities (Huang, section Discussion).

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX, whose telephone number is 571-272-3241. The examiner can normally be reached Mon - Fri 8:00-4:30 PT. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.M./
Examiner, Art Unit 2123

/ALEXEY SHMATOV/
Supervisory Patent Examiner, Art Unit 2123

Prosecution Timeline

Jan 20, 2023 — Application Filed
Sep 24, 2025 — Non-Final Rejection (§103)
Dec 08, 2025 — Response Filed
Mar 19, 2026 — Final Rejection (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561620 — Machine Learning-Based URL Categorization System With Noise Elimination (granted Feb 24, 2026; 2y 5m to grant)
Patent 12554962 — Configurable Processor Element Arrays for Implementing Convolutional Neural Networks (granted Feb 17, 2026; 2y 5m to grant)
Patent 12547887 — System for Detecting Electric Signals (granted Feb 10, 2026; 2y 5m to grant)
Patent 12518169 — Systems and Methods for Sample Generation for Identifying Manufacturing Defects (granted Jan 06, 2026; 2y 5m to grant)
Patent 12493771 — Deep Learning Model for Energy Forecasting (granted Dec 09, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 48%
With Interview: 86% (+38.3%)
Median Time to Grant: 4y 4m
PTA Risk: Moderate
Based on 123 resolved cases by this examiner. Grant probability derived from career allow rate.
