Prosecution Insights
Last updated: April 19, 2026
Application No. 18/157,608

SCALABLE TENSOR NETWORK CONTRACTION USING REINFORCEMENT LEARNING

Status: Final Rejection — §103
Filed: Jan 20, 2023
Examiner: MULLINAX, CLINT LEE
Art Unit: 2123
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 2 (Final)

Grant Probability: 48% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 4y 4m
With Interview: 86%

Examiner Intelligence

Career Allow Rate: 48% (grants 48% of resolved cases; 59 granted / 123 resolved; -7.0% vs TC avg)
Interview Lift: +38.3% (strong), based on resolved cases with interview
Avg Prosecution: 4y 4m (typical timeline)
Currently Pending: 25
Total Applications: 148 (career history, across all art units)
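The headline figures in the cards above are simple derivations from the examiner's raw career counts; a minimal sketch of the arithmetic (variable names are illustrative, not the tool's API):

```python
# Reproduce the dashboard's headline numbers from the raw counts shown above.
granted, resolved = 59, 123            # examiner's career totals
allow_rate = granted / resolved        # ~0.48, reported as "48% Career Allow Rate"

interview_lift = 0.383                 # "+38.3% Interview Lift"
with_interview = allow_rate + interview_lift   # ~0.86, reported as "86% With Interview"

print(f"Career allow rate: {allow_rate:.0%}")
print(f"With interview:    {with_interview:.0%}")
```

The "With Interview" figure appears to be an additive lift on the base allow rate (48% + 38.3% ≈ 86%), not a multiplicative one.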

Statute-Specific Performance

§101: 22.8% (-17.2% vs TC avg)
§103: 53.6% (+13.6% vs TC avg)
§102: 6.3% (-33.7% vs TC avg)
§112: 13.1% (-26.9% vs TC avg)
Tech Center averages are estimates • Based on career data from 123 resolved cases
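Each row above reports both the examiner's per-statute allow rate and its delta versus the Tech Center average, so the TC baseline can be recovered from any row; a quick check (assuming the deltas are simple differences):

```python
# rate (%) and delta vs Tech Center average (%) for each statute row above
rows = {"101": (22.8, -17.2), "103": (53.6, 13.6),
        "102": (6.3, -33.7), "112": (13.1, -26.9)}
for statute, (rate, delta) in rows.items():
    print(f"§{statute}: implied TC average = {rate - delta:.1f}%")
```

Every row happens to imply the same ~40% baseline, consistent with the deltas being computed against a single TC-wide average rather than per-statute averages.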

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

This action is responsive to the application filed on 12/08/2025. Claims 1, 4-14, 17-18, and 21-25 are pending. Claims 1, 4-10, 12-14, 17-18, and 21-24 have been amended. Claims 2-3, 15-16, and 20 have been canceled. Claim 25 has been added.

Response to Arguments

Applicant’s arguments with respect to the rejection(s) of claim(s) 1, 4-14, 17-18, and 21-25 under 35 U.S.C. 101 have been fully considered and are persuasive. The rejection(s) under 35 U.S.C. 101 of claim(s) 1, 4-14, 17-18, and 21-25 has been withdrawn.

Applicant’s arguments with respect to the rejection(s) of claim(s) 1 and 14 under 35 U.S.C. 103 have been considered but are not persuasive. Applicant argues that no reference teaches the amended limitations at a high level. The examiner respectfully disagrees. Due to the breadth of the claim language, the combination of Liu and Xu has been found to teach the argued limitations. See the 35 U.S.C. 103 section for a full mapping of the claim limitations necessitated by applicant’s amendments.

Claim Objections

Claims 1 and 14 are objected to because of the following informalities: Claims 1 and 14 recite a typo saying “generating a distribution a weights over edges of a sample graph representation of a sample tensor network”. An optional solution to overcome this objection is to change the claims to read “generating a distribution of weights over edges of a sample graph representation”. Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 4-14, 17-18, 21, 23, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Quantum Tensor Networks for Variational Reinforcement Learning”, 2020), hereinafter Liu, in view of Xu et al. (“Multi-Graph Tensor Networks”, 2021), hereinafter Xu.

Regarding claims 1 and 14, Liu teaches a computer-implemented method for contracting a tensor network; a device implementing an agent for contracting a tensor network, comprising: one or more processors; and a non-transitory computer-readable medium, having computer-executable instructions stored thereon, the computer-executable instructions, when executed by the one or more processors, causing the one or more processors to; a non-transitory computer-readable medium implementing an agent, having computer-executable instructions stored thereon, for model training, the computer-executable instructions, when executed by one or more processors, causing the one or more processors to (sections 1, 3.1, and 4.2 teach determining computational and storage costs for algorithm calculations, directed to program execution efficiency on one or more processors in a computer system communicatively coupled to one or more memories for performing the embodiments of the disclosure):

generating a graph representation of the tensor network (sections 2.2 and 4.3 teach “Tensor networks have a highly intuitive graph representation [19]. As shown by Fig. 1, the modes are represented as edges. Two nodes connected by a common edge means the corresponding tensors are contracted in that mode”), the graph representation comprising nodes and edges connecting the nodes, wherein each node represents a tensor in the tensor network and each edge represents a set of shared indices between two tensors (sections 2.2 and 4.3 teach “Tensor networks have a highly intuitive graph representation [19]. As shown by Fig. 1, the modes are represented as edges. Two nodes connected by a common edge means the corresponding tensors are contracted in that mode”);

processing, by a policy model of a trained agent, the graph representation to determine a contraction path for the tensor network (section 4.2 teaches a reinforcement learning “agent” determining rewards for “state-action pair[s]” of the tensor representations; sections 4.3-4.4 teach “Having obtained representations for the reward and policy (policy model of a trained agent) using tensor networks, we connect the two MPS’s on every dangling edge. This connection operation corresponds to the inner product in (12), giving us a tensor network that calculates a scalar output for the total energy H’(π) when contracted (determine a contraction path).”); and

processing the tensor network in accordance with the contraction path to generate a contracted tensor network (sections 4.3-4.4 teach “Having obtained representations for the reward and policy using tensor networks, we connect the two MPS’s on every dangling edge (processing the tensor network). This connection operation corresponds to the inner product in (12), giving us a tensor network that calculates a scalar output for the total energy H’(π) when contracted (in accordance with the contraction path to generate a contracted tensor network).”);

wherein the policy model of the trained agent is trained by a reinforcement learning (RL) process comprising performing, for one or more iterations: generating a distribution a weights over edges of a sample graph representation of a sample tensor network (section 4.2 teaches a reinforcement learning “agent” determining rewards for “state-action pair[s]” of the tensor representations; sections 2.1-2.2 and 4.3-4.4 teach determining policies as a distribution of edge vectors in the network), selecting an edge from the distribution and performing a pairwise tensor contraction along the selected edge to update the sample graph representation (section 4.2 teaches a reinforcement learning “agent” determining rewards for “state-action pair[s]” of the tensor representations; sections 2.2 and 4.3-4.4 teach “Having obtained representations for the reward and policy using tensor networks, we connect the two MPS’s on every dangling edge. This connection operation corresponds to the inner product in (12), giving us a tensor network that calculates a scalar output for the total energy H’(π) when contracted”; wherein “Two nodes connected by a common edge means the corresponding tensors are contracted in that mode (pairwise)”), determining a loss corresponding to the pairwise tensor contraction along the selected edge and adding the loss to an accumulated path loss (sections 2 and 2.2 teach “mode-k product (a.k.a, tensor contraction, Einstein sum)” of the edges; sections 3.2 and 4.3-4.4 teach computing an “error between the true tensor and the estimation” of connected (contraction) tensor cores for an optimal “path” total (accumulated path loss)), and updating, based on a terminal state accumulated path loss corresponding to a selected sequence of pairwise tensor contractions, parameters of the policy model (sections 4.3-4.4 teach repeating the error computation of tensor core connections “until the error between the true tensor and the estimation is smaller than ε1” while updating the policy).

Liu at least implies processing, by an agent that implements a reinforcement learning algorithm, the graph representation to determine a contraction for the tensor network, and generating a distribution a weights over edges of a sample graph representation of a sample tensor network (see mappings above); however, Xu teaches processing, by a policy model of a trained agent, the graph representation to determine a contraction path for the tensor network (sections 2.1, 2.4, and 3.2-3.3 teach a tensor network wherein “each tensor is represented as a node, while the number of edges that extends from that node corresponds to tensor order [6]. If two nodes are connected through an edge, it represents a linear contraction between two tensors over modes of equal dimensions” as determined in a “multi-graph tensor network” with matrix “contractions” using “Double Deep Q-Learning [16], where a separate target network, ~Q, is used to compute the estimated Q value” for the policy), and generating a distribution a weights over edges of a sample graph representation of a sample tensor network (sections 3.1-3.3 teach multi-graph network learning “through a series of multi-linear graph filter and weight matrix contractions, which essentially iterates the graph filtering operation across all M graph domains”).
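For orientation, the claim 1/14 training loop the examiner maps above (a distribution of weights over edges, sampling an edge, a pairwise contraction, and an accumulated path loss) can be sketched as a toy in Python. This is an illustrative reconstruction of the claimed loop, not code from the application or the cited references; in particular, the cost-based softmax below stands in for the claimed learned policy model, and the policy-parameter update step is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tensor network as a graph: each node is a tensor holding its set of
# index labels; an edge exists wherever two tensors share an index
# (the claimed "set of shared indices between two tensors").
tensors = {"A": {"i", "j"}, "B": {"j", "k"}, "C": {"k", "l"}, "D": {"l", "i"}}
dims = {"i": 8, "j": 16, "k": 8, "l": 16}

def edges(net):
    """Enumerate the graph's edges: pairs of tensors with shared indices."""
    names = sorted(net)
    return [(a, b, net[a] & net[b])
            for n, a in enumerate(names) for b in names[n + 1:]
            if net[a] & net[b]]

def contraction_cost(net, edge):
    """FLOP-count proxy for contracting one edge: product of the dimensions
    of every index carried by the two tensors involved."""
    a, b, _ = edge
    cost = 1.0
    for idx in net[a] | net[b]:
        cost *= dims[idx]
    return cost

def rollout(net):
    """One episode: repeatedly sample an edge from a distribution of weights
    over edges, contract it pairwise, and accumulate the path loss."""
    net = {name: set(ix) for name, ix in net.items()}
    path, path_loss = [], 0.0
    while len(net) > 1:                      # terminal state: one tensor left
        es = edges(net)
        costs = np.array([contraction_cost(net, e) for e in es])
        logits = -np.log(costs)              # cheaper contractions weigh more
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()             # distribution of weights over edges
        choice = rng.choice(len(es), p=weights)
        a, b, shared = es[choice]
        # Pairwise contraction: the merged tensor keeps only unshared indices.
        net[a + b] = (net.pop(a) | net.pop(b)) - shared
        path.append((a, b))
        path_loss += costs[choice]           # loss added to accumulated path loss
    return path, path_loss

path, loss = rollout(tensors)
print(path, loss)
```

In the claimed method, a GNN policy (claims 4 and 17) would produce the edge logits instead of these raw contraction costs, and the terminal `path_loss` would drive a gradient update of the policy parameters.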
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Xu’s teachings of multi-graph tensor network weight contractions through reinforcement Q-learning into Liu’s teaching of graphically representing a tensor network with contractions and reinforcement learning in order to achieve an “efficient and meaningful modelling strategy in a deep learning setting” superior to comparative models (Xu, section 6).

Regarding claims 4 and 17, the combination of Liu and Xu teach all the claim limitations of claims 1 and 14 above; and further teach wherein the agent comprises a graph neural network (GNN) configured to: receive, as input, a current graph representation of a current state of a tensor network (Xu, sections 2.4 and 5 teach “matricized” graph “input samples” for the time mode to a “Graph Convolutional Network (GCN)” as an agent for finding the optimal policy values for reinforcement learning), and generate, based on the current state of the tensor network, a distribution of weights over edges for the current graph representation of the tensor network (Xu, sections 3.1-3.3 and 5 teach multi-graph network learning “through a series of multi-linear graph filter and weight matrix contractions, which essentially iterates the graph filtering operation across all M graph domains” according to the input time mode), wherein each respective weight is determined by a weight function that represents a time and/or a space complexity of a pairwise tensor contraction operation along a respective edge of the current graph representation (Xu, sections 3.1-3.3 and 5 teach multi-graph network learning according to equation 2, including graph weight values of the modes and weight matrix contractions). Liu and Xu are combinable for the same rationale as set forth above with respect to claims 1 and 14.
Regarding claim 5, the combination of Liu and Xu teach all the claim limitations of claim 1 above; and further teach wherein the terminal state accumulated path loss is a sum of approximate values to estimate losses of a number of pairwise tensor contractions of the selected sequence of pairwise tensor contraction (Liu, sections 2 and 2.2 teach “mode-k product (a.k.a, tensor contraction, Einstein sum)” of the edges. Sections 4.3-4.4 teach computing an “error (loss) between the true tensor and the estimation” of connected tensor cores (selected sequence of pairwise tensor contraction), and repeating the error computation of tensor core connections “until the error between the true tensor and the estimation is smaller than ε1” (terminal state)).

Regarding claim 6, the combination of Liu and Xu teach all the claim limitations of claim 1 above; and further teach wherein the RL process includes, during at least one of the one or more iterations, determining that an intermediate accumulated path loss exceeds a threshold value (Liu, sections 2 and 2.2 teach “mode-k product (a.k.a, tensor contraction, Einstein sum)” of the edges. Sections 4.2-4.4 teach computing an “error between the true tensor and the estimation” of connected (accumulated path loss) tensor cores, and if it is above (exceeds) “threshold values”, iterating the reinforcement learning including connecting k tensor cores).

Regarding claim 7, the combination of Liu and Xu teach all the claim limitations of claim 6 above; and further teach wherein the threshold value is a smallest observed accumulated path loss associated with a respective selected sequence of pairwise tensor contractions (Liu, sections 4.3-4.4 teach repeating the error computation of tensor core connections of the current policy “until the error between the true tensor and the estimation is smaller than ε1” being a “small threshold value” (smallest) for the chosen “contraction path”).
Regarding claim 8, the combination of Liu and Xu teach all the claim limitations of claim 7 above; and further teach determining, based on the threshold value, an approximation value to estimate the loss of the number of pairwise tensor contractions in the selected sequence of pairwise tensor contractions (Liu, sections 4.3-4.4 teach the error computation of tensor core connections “until the error between the true tensor and the estimation is smaller than ε1” and the variation of the energy function with regard to tensor cores (approximation) for the chosen “contraction path”, “where ε1, ε2 ∈ ℝ are small threshold values”).

Regarding claims 9 and 18, the combination of Liu and Xu teach all the claim limitations of claims 4 and 17 above; and further teach wherein an off-policy element is applied in the policy model to shift the policy model towards a target policy (Liu, sections 3.1 and 4.3-4.4 teach RL policy learning by inserting dangling edges to connect the MPS’s and pass to the tensor completion algorithm and energy minimization via the Hamiltonian equation). Liu at least implies wherein an off-policy element is applied in the reinforcement learning algorithm to shift the policy adopted therein towards a target policy (see mappings above); however, Xu teaches wherein an off-policy element is applied in the reinforcement learning algorithm to shift the policy adopted therein towards a target policy (section 2.4 teaches using “Double Deep Q-Learning [16], where a separate target network, ~Q, is used to compute the estimated Q value”, and “To alleviate the issue of non-stationary targets, experience replay is used, whereby past experiences are stored in a buffer, from which a batch is sampled at every time instant to train the network with back-propagation and stochastic gradient descent”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Xu’s teachings of multi-graph tensor network contractions through reinforcement Q-learning of stored data into Liu’s teaching of graphically representing a tensor network with contractions and reinforcement learning in order to achieve an “efficient and meaningful modelling strategy in a deep learning setting” superior to comparative models (Xu, section 6).

Regarding claims 10 and 19, the combination of Liu and Xu teach all the claim limitations of claims 9 and 18 above; and further teach wherein the off-policy element is determined based on samples drawn from an optimistic buffer, the method further comprising: retrieving a plurality of data from the optimistic buffer (Xu, section 2.4 teaches using “Double Deep Q-Learning [16], where a separate target network, ~Q, is used to compute the estimated Q value”, and “To alleviate the issue of non-stationary targets, experience replay is used, whereby past experiences are stored in a buffer, from which a batch is sampled at every time instant to train the network with back-propagation and stochastic gradient descent”; section 3.3 teaches the inputs being based on “time-steps”); assigning scores to each of the plurality of data; determining, based on the distribution of the scores of the plurality of data, the samples to be drawn from the optimistic buffer; determining the off-policy element based on the samples (Xu, section 2.4 teaches past experiences are “stored in a buffer, from which a batch is sampled at every time instant to train the network with back-propagation and stochastic gradient descent”; section 3.3 teaches the inputs being sampled based on “time-steps”); and updating the policy model (Xu, section 2.4 teaches updating the Q-network function policy from the past samples). Liu and Xu are combinable for the same rationale as set forth above with respect to claims 9 and 18.
Regarding claim 11, the combination of Liu and Xu teach all the claim limitations of claim 1 above; and further teach extracting features from the tensor network, the features associated with a distribution (Xu, sections 3.2-3.3 teach “extracting feature map[s]” across “all M graph domains”); assigning a dynamic range to one or more tails of the distribution (Xu, sections 3.2-3.3 teach “using a single weight matrix…for all of the graph domains, where J1 controls the number of hidden units (feature maps)” (dynamic range)); and compressing the dynamic range for the one or more tails of the distribution (Xu, sections 3.2-3.3: “it is customary to flatten the extracted features and pass them through dense layers to generate the final output. To further reduce the complexity, the weight matrices of the dense layers can be tensorized and represented in TT format, as discussed in [13]. This further reduces the number of parameters, while maintaining compatibility”). Liu and Xu are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 12, the combination of Liu and Xu teach all the claim limitations of claim 1 above; and further teach wherein the agent further implements a solver, and the solver provides action scores as additional features to be learned by the policy model (Liu, sections 3.2 and 4.2 teach “We can interpret a particular state-action as a particle in motion, the immediate reward following it as its momentum, and the future expected rewards down the path as its potential energy. Therefore, the optimal policy dictates a path that obeys the principle of least action by maximizing the (negative) sum of kinetic and potential energies”).
Regarding claim 13, the combination of Liu and Xu teach all the claim limitations of claim 1 above; and further teach wherein the agent further implements a solver, wherein the solver calculates a number of contractions corresponding to a contraction path, and the policy model reinforcement learning algorithm in the agent calculates the rest of the contractions in the corresponding contraction path (Liu, sections 2 and 2.2 teach “mode-k product (a.k.a, tensor contraction, Einstein sum)” of the edges. Sections 3.2 and 4.2-4.4 teach computing an “error between the true tensor and the estimation” of connected (contracted) tensor cores of the current policy, and if it is above (exceeds) “threshold values”, iterating the reinforcement learning including connecting k tensor cores (calculating the number of contractions) for an optimal “path”).

Regarding claims 21 and 23, the combination of Liu and Xu teach all the claim limitations of claims 1 and 14 above; and further teach wherein the policy model is trained based on, at least in part, a memory limit parameter associated with a processor configured to implement the policy model (Liu, sections 2.2 and 4.2-4.4 teach determining RL policy transitional probabilities for an optimal policy while monitoring “computational costs” and determining storage “space-efficiency” when storing connected tensor elements when contracting).

Regarding claim 25, claims 1 and 4 are analogous, and the combination of Liu and Xu teach all the claim limitations of claims 1 and 4 above. Liu and Xu are combinable for the same rationale as set forth above with respect to claim 1.

Claims 22 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Quantum Tensor Networks for Variational Reinforcement Learning”, 2020), hereinafter Liu, in view of Xu et al. (“Multi-Graph Tensor Networks”, 2021), hereinafter Xu, and further in view of Huang et al. (“Efficient parallelization of tensor network contraction for simulating quantum computation”, 2021).

Regarding claims 22 and 24, the combination of Liu and Xu teach all the claim limitations of claims 4 and 17 above; and further teach wherein the GNN is configured to generate the distribution of weights over edges for the graph representation of the tensor network (Xu, sections 2.4 and 5 teach using a “Graph Convolutional Network (GCN)” as an agent for finding the optimal policy values for reinforcement learning, including, as taught in sections 3.1-3.3, multi-graph network learning “through a series of multi-linear graph filter and weight matrix contractions, which essentially iterates the graph filtering operation across all M graph domains”). Liu and Xu are combinable for the same rationale as set forth above with respect to claims 1 and 14.

However, the combination does not explicitly teach and, for each edge, a corresponding probability p used to determine whether to perform a slicing operation. Huang teaches and, for each edge, a corresponding probability p used to determine whether to perform a slicing operation (section “Methods” teaches “We use index-slicing-incorporated sequential pairwise contraction to contract tensor networks”, based on the “probability” of the bitstrings).
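The slicing operation Huang is cited for can be illustrated independently of the rejection: slicing fixes one shared index to each of its values, contracts the resulting smaller networks separately (e.g., on different devices), and sums the partial results. A minimal einsum sketch (a generic illustration, not Huang's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 6))   # tensor with indices (i, j)
B = rng.normal(size=(6, 5))   # tensor with indices (j, k)

# Full contraction over the shared index j.
full = np.einsum("ij,jk->ik", A, B)

# Sliced contraction: fix j to each of its 6 values, contract each slice
# independently (candidates for parallel execution), then sum the partials.
sliced = sum(np.einsum("i,k->ik", A[:, j], B[j, :]) for j in range(6))

assert np.allclose(full, sliced)
```

Each slice is smaller and independent, which reduces peak memory and enables parallelism; the claimed per-edge probability p decides which edges are worth slicing.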
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Liu’s teaching of graphically representing a tensor network with contractions and reinforcement learning, as modified by Xu’s teachings of multi-graph tensor network contractions through reinforcement Q-learning, to include tensor network index slicing operations and probability determinations as taught by Huang in order to reduce computational time complexities (Huang, section Discussion).

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX, whose telephone number is 571-272-3241. The examiner can normally be reached Mon - Fri 8:00-4:30 PT. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.M./
Examiner, Art Unit 2123

/ALEXEY SHMATOV/
Supervisory Patent Examiner, Art Unit 2123

Prosecution Timeline

Jan 20, 2023 — Application Filed
Sep 24, 2025 — Non-Final Rejection (§103)
Dec 08, 2025 — Response Filed
Mar 19, 2026 — Final Rejection (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561620 — Machine Learning-Based URL Categorization System With Noise Elimination (granted Feb 24, 2026; 2y 5m to grant)
Patent 12554962 — Configurable Processor Element Arrays for Implementing Convolutional Neural Networks (granted Feb 17, 2026; 2y 5m to grant)
Patent 12547887 — System for Detecting Electric Signals (granted Feb 10, 2026; 2y 5m to grant)
Patent 12518169 — Systems and Methods for Sample Generation for Identifying Manufacturing Defects (granted Jan 06, 2026; 2y 5m to grant)
Patent 12493771 — Deep Learning Model for Energy Forecasting (granted Dec 09, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 48%
With Interview: 86% (+38.3%)
Median Time to Grant: 4y 4m
PTA Risk: Moderate
Based on 123 resolved cases by this examiner. Grant probability derived from career allow rate.
