Last updated: May 29, 2026
Application No. 17/115,631
NEURAL NETWORK SCHEDULER

Non-Final OA §103
Filed: Dec 08, 2020
Examiner: LEY, SALLY THI
Art Unit: 2147
Tech Center: 2100 - Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 5 (Non-Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 19% grant rate with a +33.3% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 36 resolved cases, 2023–2026
Examiner Intelligence

LEY, SALLY THI
Career Allowance Rate: 19% (7 granted / 36 resolved), -35.6% vs TC avg
Interview Lift: +33.3% (resolved cases with interview)
Avg Prosecution: 4y 8m (typical timeline); 17 currently pending
Statute-Specific Performance

§101: 10.3% (-29.7% vs TC avg)
§103: 83.2% (+43.2% vs TC avg)
§102: 3.8% (-36.2% vs TC avg)
§112: 2.7% (-37.3% vs TC avg)
Based on career data from 36 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 24 March 2026 has been entered.
 
Status of Claims
	This Office Action is in response to the communication filed on 24 Mar 2026.
	Claims 1-30 are being considered on the merits.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 06 Feb 2026 and 04 Apr 2026 have been considered. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, initialed and dated copies of Applicant's IDS forms 1499 are attached to the instant Office action.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-30 are rejected under 35 U.S.C. 103 as being unpatentable over Choudhury et al. (US 2020/0125926 A1; hereinafter “Choudhury”) in view of Oskooi et al. (US 2021/0174206 A1; hereinafter “Oskooi”).

Regarding Claim 1, Choudhury as modified by Oskooi teaches:
One or more processors, comprising: circuitry to: (Choudhury, para. 0038: “Additionally, an embodiment of the present invention can make use of software running on a computer or workstation. With reference to FIG. 6, such an implementation might employ, for example, a processor 602, a memory 604, and an input/output interface formed, for example, by a display 606 and a keyboard 608. The term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term “processor” may refer to more than one individual processor.”)
cause one or more first neural networks to infer computing resource usage characteristics (Choudhury, para. 0003 and 0014: “An exemplary computer-implemented method can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints” “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input.”) of one or more second neural networks based, at least in part, on performance metrics of prior performance of the one or more second neural networks executed in parallel with one or more additional neural networks using one or more of a plurality of computing resources; (Oskooi paras. 0021, 0051-0053, and figures 5-6: “Massively parallel scientific simulations provide only a limited amount of training data for predicting running time. As the number n of inputs (of the simulation or the hardware) is increased, a larger and larger set of training data is generally used for an ANN to fully characterize the problem space. However, acquiring training data is costly in this case, since each data point is a large-scale parallel simulation. In order to reduce the amount of training runs required to obtain accurate predictions for heterogeneity with many inputs {right arrow over (p)}∈ [Image Omitted] n, the execution time T(p) is factorized to exploit crude a priori knowledge” “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models).
select one or more computing resources of the plurality of computing resources to perform one or more inferencing tasks using the one or more second neural networks based, at least in part, on the inferred computing resource usage characteristics; and  (Oskooi paras. 0021, 0051-0053, and figures 5-6: “ Massively parallel scientific simulations provide only a limited amount of training data for predicting running time. As the number n of inputs (of the simulation or the hardware) is increased, a larger and larger set of training data is generally used for an ANN to fully characterize the problem space. However, acquiring training data is costly in this case, since each data point is a large-scale parallel simulation. In order to reduce the amount of training runs required to obtain accurate predictions for heterogeneity with many inputs {right arrow over (p)}∈ [Image Omitted] n, the execution time T(p) is factorized to exploit crude a priori knowledge” “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models).   
load balance performance of the one or more second neural networks and the one or more additional neural networks based, at least in part, on the selection of the one or more computing resources. (Oskooi, para. 0012: “FIG. 3 is a schematic drawing that illustrates a non-limiting example of an even distribution of simulation work and a load-balanced distribution of simulation work according to various aspects of the present disclosure.”) 

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury. Choudhury teaches dynamic batch sizing for inferencing of deep neural networks in resource-constrained environments; Oskooi teaches a method for optimal parallel execution of a simulation of a design where one or more features are inputs into one or more machine learning models to determine prediction of execution times. One of ordinary skill would have been motivated to combine the teachings of Oskooi into Choudhury in order to predict and implement optimal parallel execution of a simulation of a design, thereby enabling more efficient use of available processing elements (Oskooi, para. 0004 and 0006).

Regarding Claim 2, Choudhury, as modified, teaches claim 1 (above). Oskooi further teaches:  
The one or more processors of claim 1, wherein the selected one or more computing resources, of the plurality of computing resources, performs inference operations using the one or more first neural networks. (Oskooi paras. 0051-0053, and figures 5-6: “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models where a model training engine which may employ a first neural network is used to train one or more machine learning models).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury, as set forth above with respect to claim 1.

Regarding Claim 3, Choudhury, as modified, teaches claim 1 (above). Choudhury further teaches:  
The one or more processors of claim 1, wherein the circuitry is (Choudhury, para. 0038: “Additionally, an embodiment of the present invention can make use of software running on a computer or workstation. With reference to FIG. 6, such an implementation might employ, for example, a processor 602, a memory 604, and an input/output interface formed, for example, by a display 606 and a keyboard 608. The term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term “processor” may refer to more than one individual processor.”) to use the one or more first neural networks to predict performance requirements for inference operations to be performed on a candidate computing resource of the plurality of computing resources. (Choudhury, para. 0028: “By way of illustration, FIG. 4 depicts input 402 and input 404, wherein input 402 includes a feed forward model and input 404 includes resource constraints for the given system (such as, for example, available memory, permissible latency, etc.). Inputs 402 and 404 are provided to pre-processing component 406 and optimal batch size sequence determination component 408. As depicted in FIG. 4, the pre-processing component 406 determines, for each layer of the feed forward network 402, a set of statistics related to resource utilization. Such statistics can include, for example, working memory, input and output activation size for every batch size, time and/or energy to compute the layer for every batch size, etc. The input/output activation sizes for each batch size, the working memory for each batch size maxio(⋅,⋅,⋅), etc. can be statically computed. Determining time/energy to compute a layer for a batch size requires a run through each layer with the corresponding batch sizes. All of these entries can be computed once for a given model.”)

Regarding Claim 4, Choudhury, as modified, teaches claim 1 (above). Choudhury further teaches:
The one or more processors of claim 1, wherein performance requirements are predicted based, at least in part, on an identity of a computing resource, of the plurality of computing resources, that is a candidate for performing inference. (Choudhury, para. 0014: “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input. Additionally, such an embodiment includes computing a set of statistics related to resource utilization (such as activation memory size, working memory, inference time, etc.)” Examiner notes that Choudhury teaches determining resource constraints and utilization which necessarily requires the identification of such resources in the first place).

Regarding Claim 5, Choudhury, as modified, teaches claim 1 (above). Choudhury further teaches:
The one or more processors of claim 1, wherein each of the plurality of computing resources is a candidate for performing inference operations. (Choudhury, para. 0014: “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input. Additionally, such an embodiment includes computing a set of statistics related to resource utilization (such as activation memory size, working memory, inference time, etc.)” Examiner notes that Choudhury teaches determining resource constraints and utilization of computing resources such that the resources are capable of performing operations i.e. they are a candidate).

Regarding Claim 6, Choudhury, as modified, teaches claim 1 (above). Oskooi further teaches:
The one or more processors of claim 1, wherein the circuitry to train the one or more first neural networks in response to a change in the plurality of computing resources identified to perform inference operations. (Oskooi, para. 0020: “Different materials or data processing will often require vastly disparate computational resources, and merely dividing such a domain into equal-volume chunks for each processing element can result in an imbalanced computational load, such that some processing elements are idle while others complete their work. This both degrades performance and makes performance prediction more difficult since it depends on the precise spatial layout. Hence, some embodiments of the present disclosure also apply a data-driven approach to load balancing, in which a small number of simulations are used to estimate the costs of different model components, leading to a new partitioning algorithm that produces unequal domains as appropriate with nearly equal costs per process. This heterogeneity may also be an input to the ANN, and despite the complexity of such unequal-chunk parallel computations it is possible to predict the execution time of simulations drawn from real applications with a mean error of around 20±10% on Amazon EC2 cloud-computing clusters. Load balancing allows the ANN to predict execution based on what kinds of physics are present but without needing to know the exact spatial distribution, enabling a 6-input ANN to be trained with ˜104 simulations.”) 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury, as set forth above with respect to claim 1.

Regarding Claim 7, Choudhury, as modified, teaches claim 1 (above). Choudhury further teaches: 
The one or more processors of claim 1, wherein the one or more first neural networks are trained to predict computing resource requirements of inferences operations on each of a plurality of computing resources previously assigned to perform inference operations. (Choudhury, para. 0031 and 0033: “Accordingly, at least one embodiment of the invention can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints; computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks; determining, based at least in part on (i) the obtained input and (ii) the computed set of statistics, multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks; and outputting, to at least one user, the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks.” “the inferencing model can include a feed forward model” Examiner notes that Choudhury teaches computing statistics for one or more deep neural networks where such statistics are indicative of resource requirements).

Regarding Claim 8, Choudhury, as modified, teaches claim 1 (above). Choudhury further teaches:
The one or more processors of claim 1, wherein an application programming interface (Choudhury, para. 0038: “Additionally, an embodiment of the present invention can make use of software running on a computer or workstation. With reference to FIG. 6, such an implementation might employ, for example, a processor 602, a memory 604, and an input/output interface formed, for example, by a display 606 and a keyboard 608” Examiner notes that the broadest reasonable interpretation of an application programming interface (“API”) is software that facilitates communication between two other pieces of software or computer components such as those inherently existing in a computer system) provides one or more metrics indicative of computing resource requirements of inference operations (Choudhury, para. 0031 and 0033: “Accordingly, at least one embodiment of the invention can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints; computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks; determining, based at least in part on (i) the obtained input and (ii) the computed set of statistics, multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks; and outputting, to at least one user, the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks.” “the inferencing model can include a feed forward model” Examiner notes that Choudhury teaches computing statistics for one or more deep neural networks where such statistics are indicative of resource requirements).

Regarding Claim 9, Choudhury as modified by Oskooi teaches:
A system, comprising: one or more processors to: (Choudhury, para. 0038: “Additionally, an embodiment of the present invention can make use of software running on a computer or workstation. With reference to FIG. 6, such an implementation might employ, for example, a processor 602, a memory 604, and an input/output interface formed, for example, by a display 606 and a keyboard 608. The term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term “processor” may refer to more than one individual processor.”)
cause a first one or more neural networks (Choudhury, para. 0003 and 0014: “An exemplary computer-implemented method can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints” “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input.”) of one or more second neural networks based, at least in part, on performance metrics of prior performance of the one or more second neural networks executed in parallel with one or more additional neural networks using one or more of a plurality of computing resources; (Oskooi paras. 0021, 0051-0053, and figures 5-6: “Massively parallel scientific simulations provide only a limited amount of training data for predicting running time. As the number n of inputs (of the simulation or the hardware) is increased, a larger and larger set of training data is generally used for an ANN to fully characterize the problem space. However, acquiring training data is costly in this case, since each data point is a large-scale parallel simulation. In order to reduce the amount of training runs required to obtain accurate predictions for heterogeneity with many inputs {right arrow over (p)}∈ [Image Omitted] n, the execution time T(p) is factorized to exploit crude a priori knowledge” “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models).
select one or more computing resources of the plurality of computing resources to perform one or more inferencing tasks using the one or more second neural networks based, at least in part, on the inferred computing resource usage characteristics; and  (Oskooi paras. 0021, 0051-0053, and figures 5-6: “ Massively parallel scientific simulations provide only a limited amount of training data for predicting running time. As the number n of inputs (of the simulation or the hardware) is increased, a larger and larger set of training data is generally used for an ANN to fully characterize the problem space. However, acquiring training data is costly in this case, since each data point is a large-scale parallel simulation. In order to reduce the amount of training runs required to obtain accurate predictions for heterogeneity with many inputs {right arrow over (p)}∈ [Image Omitted] n, the execution time T(p) is factorized to exploit crude a priori knowledge” “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models).   
load balance performance of the one or more second neural networks and the one or more additional neural networks based, at least in part, on the selection of the one or more computing resources. (Oskooi, para. 0012: “FIG. 3 is a schematic drawing that illustrates a non-limiting example of an even distribution of simulation work and a load-balanced distribution of simulation work according to various aspects of the present disclosure.”) 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury. Choudhury teaches dynamic batch sizing for inferencing of deep neural networks in resource-constrained environments; Oskooi teaches a method for optimal parallel execution of a simulation of a design where one or more features are inputs into one or more machine learning models to determine prediction of execution times. One of ordinary skill would have been motivated to combine the teachings of Oskooi into Choudhury in order to predict and implement optimal parallel execution of a simulation of a design, thereby enabling more efficient use of available processing elements (Oskooi, para. 0004 and 0006).

Regarding Claim 10, Choudhury, as modified, teaches claim 9 (above). Oskooi further teaches:
The system of claim 9, wherein the one or more processors to select the one or more computing resources, of the plurality of computing resources to perform inference operations using the second one or more neural networks. (Oskooi paras. 0051-0053, and figures 5-6 “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models where a model training engine which may employ a first neural network is used to train a second one or more machine learning models).   
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury, as set forth above with respect to claim 9.

Regarding Claim 11, Choudhury, as modified, teaches claim 9 (above). Choudhury further teaches:
The system of claim 9, the one or more processors (Choudhury, para. 0038: “Additionally, an embodiment of the present invention can make use of software running on a computer or workstation. With reference to FIG. 6, such an implementation might employ, for example, a processor 602, a memory 604, and an input/output interface formed, for example, by a display 606 and a keyboard 608. The term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term “processor” may refer to more than one individual processor.”) to use the first one or more neural networks to predict one or more performance requirements of using the one or more computing resources, of the plurality of computing resources, to perform an inference operation of the second one or more neural networks. (Choudhury, para. 0028: “By way of illustration, FIG. 4 depicts input 402 and input 404, wherein input 402 includes a feed forward model and input 404 includes resource constraints for the given system (such as, for example, available memory, permissible latency, etc.). Inputs 402 and 404 are provided to pre-processing component 406 and optimal batch size sequence determination component 408. As depicted in FIG. 4, the pre-processing component 406 determines, for each layer of the feed forward network 402, a set of statistics related to resource utilization. Such statistics can include, for example, working memory, input and output activation size for every batch size, time and/or energy to compute the layer for every batch size, etc. The input/output activation sizes for each batch size, the working memory for each batch size maxio(⋅,⋅,⋅), etc. can be statically computed. Determining time/energy to compute a layer for a batch size requires a run through each layer with the corresponding batch sizes. All of these entries can be computed once for a given model.”)

Regarding Claim 12, Choudhury, as modified, teaches claim 9 (above). Choudhury further teaches: 
The system of claim 9, wherein the first one or more neural networks predict one or more performance requirements of the second one or more neural networks based, at least in part, on input, to the first one or more neural networks, (Choudhury, para. 0003: “An exemplary computer-implemented method can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints”) comprising an identifier of the one or more computing resources of the plurality of computing resources. (Choudhury, para. 0014: “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input. Additionally, such an embodiment includes computing a set of statistics related to resource utilization (such as activation memory size, working memory, inference time, etc.)” Examiner notes that Choudhury teaches determining resource constraints and utilization which necessarily requires the identification of such resources in the first place).

Regarding Claim 13, Choudhury, as modified, teaches claim 9 (above). Choudhury further teaches: 
The system of claim 9, wherein the plurality of computing resources comprise a plurality of computing devices, and wherein each of the plurality of computing devices is a candidate for being identified to perform inference operations of the second one or more neural networks. (Choudhury, para. 0014: “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input. Additionally, such an embodiment includes computing a set of statistics related to resource utilization (such as activation memory size, working memory, inference time, etc.)” Examiner notes that Choudhury teaches determining resource constraints and utilization which necessarily requires the identification of such resources in the first place).

Regarding Claim 14, Choudhury, as modified, teaches claim 9 (above). Oskooi further teaches:
The system of claim 9, wherein one or more computing devices train the first one or more neural networks in response to a change in the plurality of computing resources identified to perform inference operations of the second one or more neural networks. (Oskooi, para. 0020: “Different materials or data processing will often require vastly disparate computational resources, and merely dividing such a domain into equal-volume chunks for each processing element can result in an imbalanced computational load, such that some processing elements are idle while others complete their work. This both degrades performance and makes performance prediction more difficult since it depends on the precise spatial layout. Hence, some embodiments of the present disclosure also apply a data-driven approach to load balancing, in which a small number of simulations are used to estimate the costs of different model components, leading to a new partitioning algorithm that produces unequal domains as appropriate with nearly equal costs per process. This heterogeneity may also be an input to the ANN, and despite the complexity of such unequal-chunk parallel computations it is possible to predict the execution time of simulations drawn from real applications with a mean error of around 20±10% on Amazon EC2 cloud-computing clusters. Load balancing allows the ANN to predict execution based on what kinds of physics are present but without needing to know the exact spatial distribution, enabling a 6-input ANN to be trained with ˜104 simulations.”) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury, as set forth above with respect to claim 9.
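
For illustration only, the load-balancing idea quoted from Oskooi para. 0020 (unequal domains with nearly equal costs per process) can be sketched as follows. This is a minimal sketch under stated assumptions, not the partitioning algorithm of Oskooi: the one-dimensional domain, the per-cell cost list, and the function names balanced_cuts and chunk_costs are all hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def balanced_cuts(costs, num_procs):
    """Return boundaries of contiguous chunks with roughly equal total cost.

    Hypothetical helper: `costs` holds an estimated cost per cell of a
    one-dimensional domain; the result has num_procs + 1 boundaries.
    """
    prefix = list(accumulate(costs))  # running total of cell costs
    total = prefix[-1]
    cuts = [0]
    for k in range(1, num_procs):
        target = total * k / num_procs
        # First cell index at which the running cost reaches the target.
        idx = bisect_left(prefix, target) + 1
        cuts.append(max(idx, cuts[-1]))  # keep boundaries non-decreasing
    cuts.append(len(costs))
    return cuts


def chunk_costs(costs, cuts):
    """Total cost assigned to each chunk defined by consecutive boundaries."""
    return [sum(costs[a:b]) for a, b in zip(cuts, cuts[1:])]


# Two expensive cells followed by eight cheap ones, split across two processes.
costs = [4, 4, 1, 1, 1, 1, 1, 1, 1, 1]
equal_volume = chunk_costs(costs, [0, 5, 10])   # equal cell counts
balanced = chunk_costs(costs, balanced_cuts(costs, 2))
```

In this example the equal-volume split assigns costs of 11 and 5 to the two processes, while the cost-based split assigns 8 and 8 using chunks of unequal size.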

Regarding Claim 15, Choudhury, as modified, teaches claim 9 (above). Choudhury further teaches: 
The system of claim 9, wherein the first one or more neural networks are trained to predict computing resource utilization, by the second one or more neural networks, on each computing resource of the plurality of computing resources.  (Choudhury, para. 0031 and 0033: “Accordingly, at least one embodiment of the invention can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints; computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks; determining, based at least in part on (i) the obtained input and (ii) the computed set of statistics, multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks; and outputting, to at least one user, the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks.” “the inferencing model can include a feed forward model” Examiner notes that Choudhury teaches using a deep neural network to compute resource utilization for one or more deep neural networks)

Regarding Claim 16, Choudhury, as modified, teaches claim 9 (above). Choudhury further teaches:
The system of claim 9, wherein the second one or more neural networks are associated with an application programming interface to provide one or more metrics indicative of computing requirements of the second one or more neural networks. (Choudhury, para. 0038: “Additionally, an embodiment of the present invention can make use of software running on a computer or workstation. With reference to FIG. 6, such an implementation might employ, for example, a processor 602, a memory 604, and an input/output interface formed, for example, by a display 606 and a keyboard 608” Examiner notes that the broadest reasonable interpretation of an application programming interface (“API”) is software that facilitates communication between two other pieces of software or computer components, such as those inherently existing in a computer system, where “to provide” is merely intended use).

Regarding Claim 17, Choudhury, as modified, teaches: 
A non-transitory machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least (Oskooi, para. 0078: “The techniques described may constitute machine-executable instructions embodied within a tangible or non-transitory machine (e.g., computer) readable storage medium, that when executed by a machine will cause the machine to perform the operations described.”)
cause a first one or more neural networks to infer computing resource usage characteristics (Choudhury, para. 0003 and 0014: “An exemplary computer-implemented method can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints” “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input.”) of one or more second neural networks based, at least in part, on performance metrics of prior performance of the one or more second neural networks executed in parallel with one or more additional neural networks using one or more of a plurality of computing resources; (Oskooi paras. 0021, 0051-0053, and figures 5-6: “Massively parallel scientific simulations provide only a limited amount of training data for predicting running time. As the number n of inputs (of the simulation or the hardware) is increased, a larger and larger set of training data is generally used for an ANN to fully characterize the problem space. However, acquiring training data is costly in this case, since each data point is a large-scale parallel simulation. In order to reduce the amount of training runs required to obtain accurate predictions for heterogeneity with many inputs {right arrow over (p)}∈ [Image Omitted] n, the execution time T(p) is factorized to exploit crude a priori knowledge” “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models).
select one or more computing resources of the plurality of computing resources to perform one or more inferencing tasks using the one or more second neural networks based, at least in part, on the inferred computing resource usage characteristics; and  (Oskooi paras. 0021, 0051-0053, and figures 5-6: “ Massively parallel scientific simulations provide only a limited amount of training data for predicting running time. As the number n of inputs (of the simulation or the hardware) is increased, a larger and larger set of training data is generally used for an ANN to fully characterize the problem space. However, acquiring training data is costly in this case, since each data point is a large-scale parallel simulation. In order to reduce the amount of training runs required to obtain accurate predictions for heterogeneity with many inputs {right arrow over (p)}∈ [Image Omitted] n, the execution time T(p) is factorized to exploit crude a priori knowledge” “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models).   
load balance performance of the one or more second neural networks and the one or more additional neural networks based, at least in part, on the selection of the one or more computing resources. (Oskooi, para. 0012: “FIG. 3 is a schematic drawing that illustrates a non-limiting example of an even distribution of simulation work and a load-balanced distribution of simulation work according to various aspects of the present disclosure.”) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury. Choudhury teaches dynamic batch sizing for inferencing of deep neural networks in resource-constrained environments; Oskooi teaches a method for optimal parallel execution of a simulation of a design where one or more features are inputs into one or more machine learning models to determine prediction of execution times. One of ordinary skill would have been motivated to combine the teachings of Oskooi into Choudhury in order to predict and implement optimal parallel execution of a simulation of a design, thereby enabling more efficient use of available processing elements (Oskooi, para. 0004 and 0006).

Regarding Claim 18, Choudhury, as modified, teaches claim 17 (above). Oskooi further teaches:
The non-transitory machine-readable medium of claim 17, comprising further instructions which, if performed by one or more processors, cause the one or more processors to at least: select the one or more computing resources, of the computing resources, to perform inference operations associated with the second one or more neural networks. (Oskooi paras. 0051-0053, and figures 5-6 “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models where a model training engine which may employ a first neural network is used to train one or more machine learning models).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury, as set forth above with respect to claim 17.

Regarding Claim 19, Choudhury, as modified, teaches claim 17 (above). Choudhury further teaches: 
The non-transitory machine-readable medium of claim 17, comprising further instructions which, if performed by one or more processors, cause the one or more processors to at least: use the first one or more neural networks to predict one or more performance requirements of the second one or more neural networks.  (Choudhury, para. 0028: “By way of illustration, FIG. 4 depicts input 402 and input 404, wherein input 402 includes a feed forward model and input 404 includes resource constraints for the given system (such as, for example, available memory, permissible latency, etc.). Inputs 402 and 404 are provided to pre-processing component 406 and optimal batch size sequence determination component 408. As depicted in FIG. 4, the pre-processing component 406 determines, for each layer of the feed forward network 402, a set of statistics related to resource utilization. Such statistics can include, for example, working memory, input and output activation size for every batch size, time and/or energy to compute the layer for every batch size, etc. The input/output activation sizes for each batch size, the working memory for each batch size maxio(⋅,⋅,⋅), etc. can be statically computed. Determining time/energy to compute a layer for a batch size requires a run through each layer with the corresponding batch sizes. All of these entries can be computed once for a given model.”)

Regarding Claim 20, Choudhury, as modified, teaches claim 17 (above). Choudhury further teaches: 
The non-transitory machine-readable medium of claim 17, wherein the first one or more neural networks predict one or more performance requirements of the second one or more neural networks based, at least in part, on input comprising an identifier of the one or more computing resources, of the computing resources. (Choudhury, para. 0014: “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input. Additionally, such an embodiment includes computing a set of statistics related to resource utilization (such as activation memory size, working memory, inference time, etc.)” Examiner notes that Choudhury teaches determining resource constraints and utilization, which necessarily requires the identification, i.e., an identifier, of such resources in the first place).

Regarding Claim 21, Choudhury, as modified, teaches claim 17 (above). Choudhury further teaches: 
The non-transitory machine-readable medium of claim 17, wherein the computing resources comprise a plurality of computing devices, and wherein each of the plurality of computing devices is a candidate for being identified to perform inference operations of the second one or more neural networks. (Choudhury, para. 0014: “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input. Additionally, such an embodiment includes computing a set of statistics related to resource utilization (such as activation memory size, working memory, inference time, etc.)” Examiner notes that Choudhury teaches determining resource constraints and utilization, which necessarily requires the identification of such resources in the first place).

Regarding Claim 22, Choudhury, as modified, teaches claim 17 (above). Oskooi further teaches:
The non-transitory machine-readable medium of claim 17, comprising further instructions which, if performed by one or more processors, cause the one or more processors to at least: train the first one or more neural networks subsequent to a change in computing resources identified to perform inference operations of the second one or more neural networks. (Oskooi, para. 0020: “Different materials or data processing will often require vastly disparate computational resources, and merely dividing such a domain into equal-volume chunks for each processing element can result in an imbalanced computational load, such that some processing elements are idle while others complete their work. This both degrades performance and makes performance prediction more difficult since it depends on the precise spatial layout. Hence, some embodiments of the present disclosure also apply a data-driven approach to load balancing, in which a small number of simulations are used to estimate the costs of different model components, leading to a new partitioning algorithm that produces unequal domains as appropriate with nearly equal costs per process. This heterogeneity may also be an input to the ANN, and despite the complexity of such unequal-chunk parallel computations it is possible to predict the execution time of simulations drawn from real applications with a mean error of around 20±10% on Amazon EC2 cloud-computing clusters. Load balancing allows the ANN to predict execution based on what kinds of physics are present but without needing to know the exact spatial distribution, enabling a 6-input ANN to be trained with ˜104 simulations.”) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury, as set forth above with respect to claim 17.

Regarding Claim 23, Choudhury, as modified, teaches claim 17 (above). Choudhury further teaches: 
The non-transitory machine-readable medium of claim 17, wherein the first one or more neural networks are trained to predict computing resource utilization, by the second one or more neural networks, on each computing resource of the computing resources. (Choudhury, para. 0031 and 0033: “Accordingly, at least one embodiment of the invention can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints; computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks; determining, based at least in part on (i) the obtained input and (ii) the computed set of statistics, multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks; and outputting, to at least one user, the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks.” “the inferencing model can include a feed forward model” Examiner notes that Choudhury teaches computing statistics for one or more deep neural networks, including computing resource constraints)

Regarding Claim 24, Choudhury, as modified, teaches:
A method, comprising: causing a first one or more neural networks to infer computing resource characteristics (Choudhury, para. 0003 and 0014: “An exemplary computer-implemented method can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints” “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input.”) of one or more second neural networks based, at least in part, on performance metrics of prior performance of the one or more second neural networks executed in parallel with one or more additional neural networks using one or more of a plurality of computing resources; (Oskooi paras. 0021, 0051-0053, and figures 5-6: “Massively parallel scientific simulations provide only a limited amount of training data for predicting running time. As the number n of inputs (of the simulation or the hardware) is increased, a larger and larger set of training data is generally used for an ANN to fully characterize the problem space. However, acquiring training data is costly in this case, since each data point is a large-scale parallel simulation. In order to reduce the amount of training runs required to obtain accurate predictions for heterogeneity with many inputs {right arrow over (p)}∈ [Image Omitted] n, the execution time T(p) is factorized to exploit crude a priori knowledge” “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models).
selecting one or more computing resources of the plurality of computing resources to perform one or more inferencing tasks using the one or more second neural networks based, at least in part, on the inferred computing resource usage characteristics; and  (Oskooi paras. 0021, 0051-0053, and figures 5-6: “ Massively parallel scientific simulations provide only a limited amount of training data for predicting running time. As the number n of inputs (of the simulation or the hardware) is increased, a larger and larger set of training data is generally used for an ANN to fully characterize the problem space. However, acquiring training data is costly in this case, since each data point is a large-scale parallel simulation. In order to reduce the amount of training runs required to obtain accurate predictions for heterogeneity with many inputs {right arrow over (p)}∈ [Image Omitted] n, the execution time T(p) is factorized to exploit crude a priori knowledge” “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models).   
load balance performance of the one or more second neural networks and the one or more additional neural networks based, at least in part, on the selection of the one or more computing resources. (Oskooi, para. 0012: “FIG. 3 is a schematic drawing that illustrates a non-limiting example of an even distribution of simulation work and a load-balanced distribution of simulation work according to various aspects of the present disclosure.”) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury. Choudhury teaches dynamic batch sizing for inferencing of deep neural networks in resource-constrained environments; Oskooi teaches a method for optimal parallel execution of a simulation of a design where one or more features are inputs into one or more machine learning models to determine prediction of execution times. One of ordinary skill would have been motivated to combine the teachings of Oskooi into Choudhury in order to predict and implement optimal parallel execution of a simulation of a design, thereby enabling more efficient use of available processing elements (Oskooi, para. 0004 and 0006).

Regarding Claim 25, Choudhury, as modified, teaches claim 24 (above). Oskooi further teaches: 
The method of claim 24, further comprising: balancing computing resource utilization between the plurality of computing resources based, at least in part, on a prediction of one or more performance requirements of the second one or more neural networks. (Oskooi paras. 0051-0053, and figures 5-6 “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models where a model training engine which may employ a first neural network is used to train one or more machine learning models).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury, as set forth above with respect to claim 24.

Regarding Claim 26, Choudhury, as modified, teaches claim 24 (above). Choudhury further teaches: 
The method of claim 24, further comprising: identifying one or more computing resources of the plurality of computing resources to perform inference operations by using the first one or more neural networks; and (Choudhury, para. 0031 and 0033: “Accordingly, at least one embodiment of the invention can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints; computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks; determining, based at least in part on (i) the obtained input and (ii) the computed set of statistics, multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks; and outputting, to at least one user, the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks.” “the inferencing model can include a feed forward model” Examiner notes that Choudhury teaches using a deep neural network to compute statistics for one or more deep neural networks).
causing the identified one or more computing resources to perform the inference operations using the second one or more neural networks.  (Choudhury, para. 0016-0017: “By way merely of example, with uniform batch size, a memory requirement of layer L2 104 can restrict the batch size that can be processed for the network. Additionally, a larger batch size of b can be used for layers L1 102 and L3 106, while a b′<b batch size can be used for layer L2 104…Accordingly, such an example embodiment (and as depicted in FIG. 1) can include processing layer L1 102 with a batch size of b, producing output activations of b samples at layer L1 102. This can be followed by b/b′ phases, wherein in each phase, layer L2 104 is processed with a batch size of b′. Activations of b samples are available as input for layer L3 106, and one or more embodiments of the invention can include processing layer L3 106 with a batch size of b.” Examiner notes that Choudhury teaches processing layers of a neural network i.e. performing inference operations).	
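
For illustration only, the layer-wise batching described in the quoted Choudhury paras. 0016-0017 (layer L1 processed with batch size b, layer L2 processed in b/b′ phases of size b′, layer L3 processed with batch size b) can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Choudhury: the function name run_layers and the toy layer functions are hypothetical stand-ins for real network layers.

```python
def run_layers(samples, layers, batch_sizes):
    """Pass all samples through each layer, using a per-layer batch size.

    Hypothetical helper: each layer is a function mapping a list of
    activations to a list of activations. A layer whose batch size is
    smaller than the number of pending activations runs in several phases.
    """
    activations = list(samples)
    phases_per_layer = []
    for layer, batch_size in zip(layers, batch_sizes):
        outputs = []
        phases = 0
        for start in range(0, len(activations), batch_size):
            outputs.extend(layer(activations[start:start + batch_size]))
            phases += 1
        activations = outputs  # all outputs are available to the next layer
        phases_per_layer.append(phases)
    return activations, phases_per_layer


# Toy stand-ins for layers L1, L2 and L3.
layers = [
    lambda xs: [x + 1 for x in xs],
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x - 3 for x in xs],
]

# b = 8 for L1 and L3, b' = 2 for the memory-constrained L2.
outputs, phases = run_layers(range(8), layers, [8, 2, 8])
```

With b = 8 and b′ = 2, the middle layer runs in b/b′ = 4 phases while the first and last layers each run once, and the final outputs match what a uniform batch size would produce.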

Regarding Claim 27, Choudhury, as modified, teaches claim 24 (above). Choudhury further teaches: 
The method of claim 24, further comprising: using the first one or more neural networks to generate predictions of one or more performance requirements of the second one or more neural networks; and (Choudhury, para. 0028: “By way of illustration, FIG. 4 depicts input 402 and input 404, wherein input 402 includes a feed forward model and input 404 includes resource constraints for the given system (such as, for example, available memory, permissible latency, etc.). Inputs 402 and 404 are provided to pre-processing component 406 and optimal batch size sequence determination component 408. As depicted in FIG. 4, the pre-processing component 406 determines, for each layer of the feed forward network 402, a set of statistics related to resource utilization. Such statistics can include, for example, working memory, input and output activation size for every batch size, time and/or energy to compute the layer for every batch size, etc. The input/output activation sizes for each batch size, the working memory for each batch size maxio(⋅,⋅,⋅), etc. can be statically computed. Determining time/energy to compute a layer for a batch size requires a run through each layer with the corresponding batch sizes. All of these entries can be computed once for a given model.”)
identifying the one or more computing resources based, at least in part, on the predictions of performance requirements.  (Choudhury, para. 0019: “Further, in at least one embodiment of the invention, a configuration <i, b, mem> is feasible if the total memory required for performing inferencing computations at layer Li with a batch size of b, is at most mem (that is, in(i, b)+ws(i, b)+out(i, b)≤mem).” Examiner notes that the broadest reasonable interpretation of “performance requirements” of a neural network includes performance requirements of a single layer of such neural network)
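
For illustration only, the feasibility condition quoted from Choudhury para. 0019, in(i, b) + ws(i, b) + out(i, b) ≤ mem, can be sketched as follows. This is a minimal sketch under stated assumptions: the LayerStats record, the linear per-sample memory model, and the function names are hypothetical and are not taken from Choudhury.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerStats:
    """Hypothetical per-layer memory statistics, in arbitrary units."""
    in_per_sample: int   # input activation size per sample
    workspace: int       # working memory, assumed independent of batch size
    out_per_sample: int  # output activation size per sample


def required_memory(stats, b):
    """in(i, b) + ws(i, b) + out(i, b) under the assumed linear model."""
    return stats.in_per_sample * b + stats.workspace + stats.out_per_sample * b


def is_feasible(layers, i, b, mem):
    """Configuration <i, b, mem> is feasible if layer i at batch b fits in mem."""
    return required_memory(layers[i], b) <= mem


def largest_feasible_batch(layers, i, mem, candidates):
    """Largest candidate batch size that is feasible for layer i, or None."""
    feasible = [b for b in candidates if is_feasible(layers, i, b, mem)]
    return max(feasible) if feasible else None


layers = [LayerStats(10, 100, 10), LayerStats(10, 500, 40)]
candidates = [1, 2, 4, 8, 16, 32, 64]
best_l0 = largest_feasible_batch(layers, 0, 1000, candidates)
best_l1 = largest_feasible_batch(layers, 1, 1000, candidates)
```

Under these assumed numbers the second layer admits a smaller batch size than the first for the same memory budget, which is the per-layer requirement the Examiner's note refers to.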

Regarding Claim 28, Choudhury, as modified, teaches claim 24 (above). Choudhury further teaches:
The method of claim 24, further comprising: training the first one or more neural networks to generate predictions of one or more performance requirements of the second one or more neural networks, (Choudhury, para. 0028: “By way of illustration, FIG. 4 depicts input 402 and input 404, wherein input 402 includes a feed forward model and input 404 includes resource constraints for the given system (such as, for example, available memory, permissible latency, etc.). Inputs 402 and 404 are provided to pre-processing component 406 and optimal batch size sequence determination component 408. As depicted in FIG. 4, the pre-processing component 406 determines, for each layer of the feed forward network 402, a set of statistics related to resource utilization. Such statistics can include, for example, working memory, input and output activation size for every batch size, time and/or energy to compute the layer for every batch size, etc. The input/output activation sizes for each batch size, the working memory for each batch size maxio(⋅,⋅,⋅), etc. can be statically computed. Determining time/energy to compute a layer for a batch size requires a run through each layer with the corresponding batch sizes. All of these entries can be computed once for a given model.”)
wherein the prediction is based, at least in part, on input to the first one or more neural networks comprising an identifier of the one or more computing resources of the plurality of computing resources to be used to perform inference operations of the second one or more neural networks (Choudhury, para. 0014: “Such an embodiment includes determining individual layer batch sizes for inferencing using one or more models used for inferencing and resource constraints (such as total available memory, maximum latency for inferencing, maximum energy for inferencing, etc.) as input. Additionally, such an embodiment includes computing a set of statistics related to resource utilization (such as activation memory size, working memory, inference time, etc.)” Examiner notes that Choudhury teaches determining resource constraints and utilization, which necessarily requires the identification of such resources in the first place)

Regarding Claim 29, Choudhury, as modified, teaches claim 24 (above). Choudhury does not explicitly disclose the following limitation, which Oskooi teaches:
The method of claim 24, further comprising: training the first one or more neural networks subsequent to a change in the plurality of computing resources identified to perform inference operations associated with the second one or more neural networks (Oskooi, para. 0020: “Different materials or data processing will often require vastly disparate computational resources, and merely dividing such a domain into equal-volume chunks for each processing element can result in an imbalanced computational load, such that some processing elements are idle while others complete their work. This both degrades performance and makes performance prediction more difficult since it depends on the precise spatial layout. Hence, some embodiments of the present disclosure also apply a data-driven approach to load balancing, in which a small number of simulations are used to estimate the costs of different model components, leading to a new partitioning algorithm that produces unequal domains as appropriate with nearly equal costs per process. This heterogeneity may also be an input to the ANN, and despite the complexity of such unequal-chunk parallel computations it is possible to predict the execution time of simulations drawn from real applications with a mean error of around 20±10% on Amazon EC2 cloud-computing clusters. Load balancing allows the ANN to predict execution based on what kinds of physics are present but without needing to know the exact spatial distribution, enabling a 6-input ANN to be trained with ˜104 simulations.”) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury, as set forth above with respect to claim 24.
 

Regarding Claim 30, Choudhury, as modified, teaches claim 24 (above). Choudhury further teaches:  
The method of claim 24, further comprising: obtaining one or more metrics indicative of computing resources utilized by inference operations of the second one or more neural networks; and (Choudhury, para. 0031 and 0033: “Accordingly, at least one embodiment of the invention can include obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (ii) one or more resource constraints; computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks; determining, based at least in part on (i) the obtained input and (ii) the computed set of statistics, multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks; and outputting, to at least one user, the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks.” “the inferencing model can include a feed forward model” Examiner notes that Choudhury teaches computing statistics for one or more deep neural networks where such statistics are indicative of resource requirements)
training the first one or more neural networks based, at least in part, on the one or more metrics (Oskooi paras. 0021, 0051-0053, and figures 5-6: “Massively parallel scientific simulations provide only a limited amount of training data for predicting running time. As the number n of inputs (of the simulation or the hardware) is increased, a larger and larger set of training data is generally used for an ANN to fully characterize the problem space. However, acquiring training data is costly in this case, since each data point is a large-scale parallel simulation. In order to reduce the amount of training runs required to obtain accurate predictions for heterogeneity with many inputs $\vec{p} \in \mathbb{R}^n$, the execution time T(p) is factorized to exploit crude a priori knowledge” “At block 512, the model training engine 414 trains one or more machine learning models using the sets of features and the performance measurements as training data…By accurately predicting an optimum number of processing elements for conducting the simulation, peak performance can be achieved without utilizing computing resources that will either not meaningfully contribute to performance gains or that will actually cause worse performance.” Examiner notes figures 5-6 illustrate the flow chart of training, storing and executing one or more machine learning models).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Oskooi into Choudhury, as set forth above with respect to claim 24. 

Response to Applicant Arguments and Remarks

35 U.S.C. § 101
	In light of applicant’s amendments and remarks, the previously asserted § 101 rejection has been withdrawn. 

35 U.S.C. § 103
	In light of applicant’s amendments, the § 103 rejection has been updated and claims 1-30 now stand rejected over Choudhury in view of Oskooi for the reasons set forth in the rejection above. 
	In the middle of page 11 of applicant’s remarks, applicant argues that neither Choudhury nor Justus teaches load balancing of a neural network from one computing resource to another computing resource as now recited in amended independent claim 1.
However, Oskooi teaches interconnected compute platforms comprising disparate computational resources and a calculation of optimal parallel execution among such compute platforms. 
	Towards the bottom of page 11, applicant states that amended independent claims 9, 17, and 24 recite similar new claim limitations. Similarly, claims 9, 17, and 24 also now stand rejected over Choudhury in view of Oskooi. 
	Applicant makes no independent argument regarding dependent claims 2-8, 10-16, 18-23, and 25-30. Therefore, for at least the reasons set forth above with respect to each respective independent claim and for the reasons set forth in the § 103 rejections for each dependent claim, such dependent claims are also rejected over Choudhury in view of Oskooi. 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sally T. Ley whose telephone number is (571)272-3406. The examiner can normally be reached Monday - Thursday, 10:00am - 6:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo, can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/STL/Examiner, Art Unit 2147                                                                                                                                                                                                        
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147
Prosecution Timeline

Oct 24, 2025: Response Filed
Dec 02, 2025: Final Rejection mailed (§103)
Jan 16, 2026: Interview Requested
Jan 29, 2026: Applicant Interview (Telephonic)
Feb 01, 2026: Examiner Interview Summary
Mar 24, 2026: Request for Continued Examination
Mar 26, 2026: Response after Non-Final Action
May 11, 2026: Non-Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/981,796 (Patent 12632746): A METHOD AND APPARATUS FOR DISPLAYING CATEGORIZED CARBON EMISSIONS. 3y 6m to grant; granted May 19, 2026.
16/733,393 (Patent 12443830): COMPRESSED WEIGHT DISTRIBUTION IN NETWORKS OF NEURAL PROCESSORS. 5y 9m to grant; granted Oct 14, 2025.
16/835,892 (Patent 12135927): EXPERT-IN-THE-LOOP AI FOR MATERIALS DISCOVERY. 4y 7m to grant; granted Nov 05, 2024.
17/992,958 (Patent 11880776): GRAPH NEURAL NETWORK (GNN)-BASED PREDICTION SYSTEM FOR TOTAL ORGANIC CARBON (TOC) IN SHALE. 1y 2m to grant; granted Jan 23, 2024.
Study what changed to get past this examiner. Based on 4 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 19%
With Interview (+33.3%): 53%
Median Time to Grant: 4y 8m (~0m remaining)
PTA Risk: High
Based on 36 resolved cases by this examiner. Grant probability derived from career allowance rate.
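The projection figures can be cross-checked with a short calculation. The sketch below assumes the grant probability is the examiner's career allowance rate (7 grants out of 36 resolved cases, as reported in the examiner profile) and that the interview lift is added in percentage points. The page does not state how the "With Interview" figure is computed, so the additive rule is an assumption.

```python
# Figures as displayed on the page; the additive-lift rule is an assumption.
granted = 7
resolved = 36
interview_lift_points = 33.3  # percentage points, as displayed

allowance_rate = 100 * granted / resolved  # career allowance rate in percent
with_interview = allowance_rate + interview_lift_points

print(f"Career allowance rate: {allowance_rate:.1f}%")
print(f"With interview (additive assumption): {with_interview:.1f}%")
```

Under that assumption the results round to 19% and 53%, which matches the displayed values.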