Prosecution Insights
Last updated: April 19, 2026
Application No. 18/230,311

BATCH SCHEDULING FOR EFFICIENT EXECUTION OF MULTIPLE MACHINE LEARNING MODELS

Non-Final OA: §101, §103
Filed: Aug 04, 2023
Examiner: MENGISTU, TEWODROS E
Art Unit: 2127
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 1 (Non-Final)

Grant Probability: 49% (Moderate)
OA Rounds: 1-2
To Grant: 4y 5m
Grant Probability With Interview: 77%

Examiner Intelligence

Career Allow Rate: 49% (62 granted / 127 resolved cases; -6.2% vs TC avg)
Interview Lift: +28.2% (strong lift in resolved cases with interview)
Avg Prosecution: 4y 5m (typical timeline)
Currently Pending: 34
Total Applications: 161 (career history, across all art units)

Statute-Specific Performance

§101: 27.9% (-12.1% vs TC avg)
§103: 44.5% (+4.5% vs TC avg)
§102: 9.6% (-30.4% vs TC avg)
§112: 14.7% (-25.3% vs TC avg)

Rates compared against Tech Center average estimates; based on career data from 127 resolved cases.

Office Action

§101, §103
Detailed Action

Notice of Pre-AIA or AIA Status: The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-21 are pending for examination. Claims 1, 16, and 21 are independent.

Claim Objections

Claim 8 is objected to because of the following informalities: claim 8 recites two different claims and is being treated as two claims; applicant needs to renumber all the claims. Appropriate correction is required. Claim 3 is objected to because of the following informalities: claim 3 recites "The method claim 1" and appears to be missing "of" to recite "the method of claim 1". Claims 12 and 17 are objected to because of the following informalities: claim 12 recites "The method of claim 12" and thus depends on itself; claim 17 is objected to for similar reasons. Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: According to the first part of the analysis, in the instant case, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Regarding Claim 1:

2A Prong 1: generating a first batch queue comprising one or more MLM batches, wherein at least one MLM batch comprises one or more MLMs of the plurality of MLMs, at least one MLM batch having a combined expected utilization of the set of computational resources not exceeding a threshold utilization; (This step of generating a queue comprising an MLM batch is practically performable in the human mind and is understood to be a recitation of a mental process, i.e., evaluation.)

2A Prong 2: This judicial exception is not integrated into a practical application. Additional elements:

receiving an identification of a plurality of machine learning models (MLMs) for execution on a set of computational resources; (This step is directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity and data gathering. See MPEP 2106.05(g).)

obtaining execution metrics characterizing expected utilization of the set of computational resources during execution of individual MLMs of the plurality of MLMs; (This step is directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity and data gathering. See MPEP 2106.05(g).)

initiating parallel execution of a first MLM batch of the one or more MLM batches of the first batch queue. (This step adds the words "apply it" (or an equivalent) to the judicial exception, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f).)

The additional elements disclosed above, alone or in combination, do not integrate the judicial exception into a practical application, as they are insignificant extra-solution activity combined with generic computer functions implemented to perform the abstract idea identified above.
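The claim 1 limitation characterized above as a mental process (grouping MLMs into batches whose combined expected utilization stays under a threshold) can be made concrete. A minimal Python sketch; the greedy first-fit policy and all names are illustrative assumptions, not the application's actual algorithm:

```python
from dataclasses import dataclass

@dataclass
class MLM:
    name: str
    expected_utilization: float  # fraction of the resource set, 0.0-1.0

def generate_batch_queue(models, threshold=1.0):
    """Greedily group models into batches whose combined expected
    utilization does not exceed the threshold (claim 1's constraint)."""
    batches = []
    for m in models:
        for batch in batches:
            if sum(x.expected_utilization for x in batch) + m.expected_utilization <= threshold:
                batch.append(m)  # fits an existing batch
                break
        else:
            batches.append([m])  # open a new batch
    return batches

queue = generate_batch_queue(
    [MLM("resnet", 0.6), MLM("bert", 0.5), MLM("yolo", 0.3)],
    threshold=1.0,
)
# resnet and yolo share a batch (0.6 + 0.3 <= 1.0); bert starts a second batch
```

The point of the sketch is only that the grouping step is a small, checkable computation over per-model utilization estimates.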
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

receiving an identification of a plurality of machine learning models (MLMs) for execution on a set of computational resources; (This step is directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity and is well-understood, routine, and conventional activity of transmitting and receiving data as identified by the court. See MPEP 2106.05(d)(II)(i).)

obtaining execution metrics characterizing expected utilization of the set of computational resources during execution of individual MLMs of the plurality of MLMs; (Same rationale: insignificant extra-solution activity and well-understood, routine, and conventional receipt of data. See MPEP 2106.05(d)(II)(i).)

initiating parallel execution of a first MLM batch of the one or more MLM batches of the first batch queue. (This step adds the words "apply it" (or an equivalent) to the judicial exception, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f).)

The additional elements disclosed above, in combination with the abstract idea, are not sufficient to amount to significantly more than the judicial exception, as they are well-understood, routine, and conventional activity combined with generic computer functions implemented to perform the abstract idea identified above.

Regarding Claim 16: see the rejection of claim 1 above; the same rationale applies.
2A Prong 2 & 2B: The claim recites another additional element, "A system comprising: a memory device; and a processor, communicatively coupled to the memory device, to:" (mere instructions to apply the exception using a generic computer component; see MPEP 2106.05(f)).

Regarding Claim 21: see the rejection of claim 1 above; the same rationale applies. 2A Prong 2 & 2B: The claim recites another additional element, "A processor comprising processing circuitry to perform operations comprising:" (mere instructions to apply the exception using a generic computer component; see MPEP 2106.05(f)).

Regarding Claim 2: 2A Prong 1: The claim does not recite any abstract idea. 2A Prong 2 & 2B: wherein the execution metrics characterizing expected utilization of the set of computational resources include at least one of: a size of input data into an MLM of the plurality of MLMs, a total memory used during execution of the MLM, a peak memory use during execution of the MLM, or a peak processing clock speed during execution of the MLM. (The limitation further specifies the execution metrics and is understood to be a field-of-use limitation. See MPEP 2106.05(h).)

Regarding Claim 3: 2A Prong 1: The claim does not recite any abstract idea. 2A Prong 2 & 2B: wherein the execution metrics further include expected utilization of one or more virtual processing units supported by the set of computational resources. (The limitation further specifies the execution metrics and is understood to be a field-of-use limitation. See MPEP 2106.05(h).)

Regarding Claim 4: 2A Prong 1: The claim does not recite any abstract idea. 2A Prong 2: wherein obtaining the execution metrics for an MLM of the plurality of MLMs comprises: collecting the execution metrics during individual execution of the MLM. (This step is directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity and data gathering. See MPEP 2106.05(g).) 2B: the same collecting step is well-understood, routine, and conventional activity of transmitting and receiving data as identified by the court. See MPEP 2106.05(d)(II)(i).

Regarding Claim 5: 2A Prong 1: The claim does not recite any abstract idea. 2A Prong 2: further comprising: storing the collected execution metrics in a memory device. (This step, directed to storing information, is understood to be insignificant extra-solution activity and data gathering. See MPEP 2106.05(g).) 2B: the same storing step is well-understood, routine, and conventional activity as identified by the court. See MPEP 2106.05(d)(II)(iv).

Regarding Claim 6: 2A Prong 1: wherein obtaining the execution metrics for an MLM of the plurality of MLMs comprises estimating the execution metrics for the MLM using one or more of: an architecture of the MLM, a size of an input into the MLM, a number of computational operations associated with the MLM, or one or more number formats used by the computational operations associated with the MLM. (This step is practically performable in the human mind and is understood to be a recitation of a mental process, i.e., evaluation/judgment.) 2A Prong 2 & 2B: The claim does not recite any additional elements.

Regarding Claim 7: 2A Prong 1: The claim does not recite any abstract idea.
2A Prong 2 & 2B: wherein the combined expected utilization of the set of computational resources by the first MLM batch characterizes expected utilization of memory resources during parallel execution of one or more MLMs of the first MLM batch. (The limitation further specifies the combined expected utilization and is understood to be a field-of-use limitation. See MPEP 2106.05(h).)

Regarding Claim 8: 2A Prong 1: The claim does not recite any abstract idea. 2A Prong 2 & 2B: wherein the combined expected utilization of the set of computational resources by the first MLM batch characterizes expected utilization of one or more processing units during parallel execution of the one or more MLMs of the first MLM batch. (The limitation further specifies the combined expected utilization and is understood to be a field-of-use limitation. See MPEP 2106.05(h).) The method of claim 1, wherein of the set of computational resources comprises at least one of a central processing unit (CPU), a data processing unit (DPU), or a graphics processing unit (GPU). (The limitation further specifies the computational resources and is understood to be a field-of-use limitation. See MPEP 2106.05(h).)

Regarding Claim 9: 2A Prong 1: The claim does not recite any abstract idea. 2A Prong 2 & 2B: initiating, concurrently with the parallel execution of the first MLM batch, parallel execution of a second MLM batch of the one or more MLM batches of the first batch queue, the first MLM batch and the second MLM batch being executed on: two or more separate graphics processing units (GPUs), or two or more separate virtual GPUs supported by a same GPU. (This step adds the words "apply it" (or an equivalent) to the judicial exception, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f).)

Regarding Claim 10: 2A Prong 1: The claim does not recite any abstract idea. 2A Prong 2 & 2B: responsive to completing execution of the first MLM batch, initiating parallel execution of a second MLM batch of the one or more MLM batches of the first batch queue. (Same rationale: mere instructions to implement the abstract idea on a computer. See MPEP 2106.05(f).)

Regarding Claim 11: 2A Prong 1: subsequent to initiating execution of the first MLM batch, generating at least a second batch queue, wherein the second batch queue comprises at least one MLM batch that is different from at least one other MLM batch of the first batch queue. (This step is practically performable in the human mind and is understood to be a recitation of a mental process, i.e., evaluation/judgment.) 2A Prong 2 & 2B: The claim does not recite any additional elements.

Regarding Claim 12: 2A Prong 1: determining first performance metrics associated with execution of the first batch queue; computing second performance metrics associated with prospective execution of the second batch queue; and responsive to a comparison of the first performance metrics and the second performance metrics, switching from the execution of the first batch queue to an execution of the second batch queue. (These steps are practically performable in the human mind and are understood to be a recitation of a mental process, i.e., evaluation/judgment.) 2A Prong 2 & 2B: The claim does not recite any additional elements.
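Claim 12's compare-and-switch logic (measure the running queue, estimate the prospective queue, switch when the comparison favors it) reduces to a small decision function. A hedged Python sketch; the metric names and the lower-latency-wins rule are assumptions for illustration, not taken from the application:

```python
def choose_queue(first_measured: dict, second_estimated: dict) -> str:
    """Compare measured metrics of the executing first batch queue with
    estimated metrics of the prospective second queue (cf. claim 12),
    returning which queue should execute next."""
    if second_estimated["latency_ms"] < first_measured["latency_ms"]:
        return "second"  # prospective queue looks better: switch
    return "first"       # keep executing the current queue

decision = choose_queue({"latency_ms": 42.0}, {"latency_ms": 31.5})
# decision is "second": the estimated queue beats the measured one
```

Any scalar comparison (throughput, utilization, a weighted score) would fit the same shape.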
Regarding Claim 13: 2A Prong 1: responsive to receiving, from the user, a selection of the second batch queue, switching from execution of the first batch queue to execution of the second batch queue. (This step is practically performable in the human mind and is understood to be a recitation of a mental process, i.e., evaluation/judgment.) 2A Prong 2: displaying a first efficiency report to a user, wherein the first efficiency report comprises runtime performance metrics associated with execution of the first batch queue; displaying a second efficiency report to the user, wherein the second efficiency report comprises estimated performance metrics associated with prospective execution of the second batch queue. (These steps, directed to presenting information, are understood to be insignificant extra-solution activity. See MPEP 2106.05(g).) 2B: the same displaying steps are well-understood, routine, and conventional activity of presenting information as identified by the court. See MPEP 2106.05(d)(II)(iv).

Regarding Claim 14: 2A Prong 1: The claim does not recite any abstract idea. 2A Prong 2: storing at least one of the first batch queue or the second batch queue in a memory device. (This step, directed to storing information, is understood to be insignificant extra-solution activity and data gathering. See MPEP 2106.05(g).) 2B: the same storing step is well-understood, routine, and conventional activity as identified by the court. See MPEP 2106.05(d)(II)(iv).

Regarding Claim 15: 2A Prong 1: wherein generating the first batch queue comprises: forming, using a priority metric, a priority queue for the plurality of MLMs; and performing a plurality of MLM placement operations, wherein individual MLM placement operations comprise: selecting a next MLM in the priority queue; placing the selected MLM, using the threshold utilization and the execution metrics for the selected MLM, into at least one of: an existing MLM batch of the first batch queue, or a new MLM batch of the first batch queue. (These steps are practically performable in the human mind and are understood to be a recitation of a mental process, i.e., evaluation/judgment.) 2A Prong 2 & 2B: The claim does not recite any additional elements.

Regarding Claim 17: 2A Prong 1: estimate the execution metrics for the MLM using one or more of: an architecture of the MLM, a size of an input into the MLM, a number of computational operations associated with the MLM, or one or more number formats used by the computational operations associated with the MLM. (This step is practically performable in the human mind and is understood to be a recitation of a mental process, i.e., evaluation/judgment.) 2A Prong 2: wherein to obtain the execution metrics for an MLM of the plurality of MLMs, the processing device is to perform at least one of: collect the execution metrics during individual execution of the MLM. (This step is directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity and data gathering. See MPEP 2106.05(g).) 2B: the same collecting step is well-understood, routine, and conventional activity of transmitting and receiving data as identified by the court. See MPEP 2106.05(d)(II)(i).

Regarding Claim 18: 2A Prong 1: The claim does not recite any abstract idea. 2A Prong 2 & 2B: wherein the combined expected utilization of the set of computational resources by the first MLM batch characterizes at least one of: an expected utilization of memory resources during parallel execution of one or more MLMs of the first MLM batch, or an expected utilization of one or more processing units during parallel execution of the one or more MLMs of the first MLM batch. (The limitation further specifies the combined expected utilization and is understood to be a field-of-use limitation. See MPEP 2106.05(h).)

Regarding Claim 19: 2A Prong 1: wherein the processing device is further to: subsequent to initiating execution of the first MLM batch, generate at least a second batch queue, wherein the second batch queue comprises at least one MLM batch that is different from each MLM batch of the first batch queue; determine first performance metrics associated with execution of the first batch queue; compute second performance metrics associated with prospective execution of the second batch queue; and responsive to a comparison of the first performance metrics and the second performance metrics, switch from the execution of the first batch queue to an execution of the second batch queue. (These steps are practically performable in the human mind and are understood to be a recitation of a mental process, i.e., evaluation/judgment.)
2A Prong 2 & 2B: The claim does not recite any additional elements.

Regarding Claim 20: 2A Prong 1: The claim does not recite any abstract idea. 2A Prong 2 & 2B: wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system implemented using one or more application programming interfaces; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system for generating or presenting at least one of augmented reality content, virtual reality content, or mixed reality content; a system implemented using a robot; a system for performing conversational AI operations; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources. (The limitation further specifies the environment of the system and is understood to be a field-of-use limitation. See MPEP 2106.05(h).)

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-11, 15-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Choi et al. ("Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing", hereinafter "Choi") in view of Narayanan et al. ("Accelerating Model Search with Model Batching", hereinafter "Narayanan").
Regarding Claim 1, Choi discloses a method comprising:

receiving an identification of a plurality of machine learning models (MLMs) for execution on a set of computational resources ([Section 4.1, Section 4.3, Algorithm 1, and Fig. 8] describe multi-model ML inference serving, which aims to assign incoming inference requests to the minimal number of GPUs);

obtaining execution metrics characterizing expected utilization of the set of computational resources during execution of individual MLMs of the plurality of MLMs ([Page 200, left column, last para, Section 4.1, Section 4.3, and Algorithm 1] describe collecting profile information (i.e., execution metrics) for each model);

generating a first batch queue comprising one or more MLM batches ([Sections 4.2-4.3 and Algorithm 1] describe M models placed in partitions and gpulets (i.e., MLM batches)), wherein at least one MLM batch comprises one or more MLMs of the plurality of MLMs, at least one MLM batch having a combined expected utilization of the set of computational resources not exceeding a threshold utilization ([Section 4.3, Section 5.2, and Algorithm 1] describe staying within SLO constraints; [Sections 4.4-4.5 and Algorithm 2] describe inference-aware scheduling to prevent GPUs from exceeding their given limits); and

initiating parallel execution of a first MLM batch of the one or more MLM batches of the first batch queue ([Page 200, right column, third bullet, Section 4.4, second para, and Section 5.1 under "ML Models:"] describe gpulets for concurrent ML inference execution (i.e., parallel execution) on partitions of GPUs).
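Choi obtains execution metrics by profiling each model; claim 6 alternatively estimates them from static model properties (architecture, input size, operation count, number formats). A rough Python illustration of such an estimate; the formulas (about 2 FLOPs per parameter, doubled activation buffers) are crude simplifying assumptions, not taken from the application or either reference:

```python
def estimate_metrics(num_params: int, input_elems: int, bytes_per_value: int = 4) -> dict:
    """Statically estimate per-inference resource use from model size,
    input element count, and number-format width (cf. claim 6's
    estimation option)."""
    weight_bytes = num_params * bytes_per_value
    activation_bytes = input_elems * bytes_per_value * 2  # input + output buffers, crude
    flops = 2 * num_params                                # ~1 multiply + 1 add per weight
    return {"peak_memory_bytes": weight_bytes + activation_bytes, "flops": flops}

# e.g., a ~25M-parameter vision model on a 224x224x3 input in float32
m = estimate_metrics(num_params=25_000_000, input_elems=224 * 224 * 3)
```

Estimates of this kind are what a scheduler would plug into the threshold comparison when profiling data is unavailable.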
Choi does not explicitly disclose MLM batches. However, Narayanan discloses, in the same field of endeavor:

receiving an identification of a plurality of machine learning models (MLMs) for execution on a set of computational resources ([Abstract, Section 2, and Figure 1] describe simultaneously running multiple models (called a model batch) on a GPU);

generating a first batch queue comprising one or more MLM batches, wherein at least one MLM batch comprises one or more MLMs of the plurality of MLMs ([Abstract, Section 2, and Figure 1] explicitly disclose model batches); and

initiating parallel execution of a first MLM batch of the one or more MLM batches of the first batch queue ([Abstract, Section 2, and Figure 1] describe parallel executions of a model batch).

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to implement the model-batching function disclosed by Narayanan in the multi-GPU model-serving method disclosed by Choi to explicitly disclose model batches. The modification would have been obvious because one of ordinary skill in the art would be motivated to utilize the model-batching feature disclosed by Narayanan, as all the references are in the field of machine learning. A person of ordinary skill in the art would have been motivated to perform the combination to be able to share processing steps among different models and obtain performance gains.

Regarding Claim 16, Choi in view of Narayanan discloses: A system comprising: a memory device; and a processor, communicatively coupled to the memory device, to: ([Sections 4-5 and Fig. 8], Choi discloses a system. Claim 16 is a system claim that corresponds to claim 1, and the rest of the limitations are rejected on the same grounds.)

Regarding Claim 21, Choi in view of Narayanan discloses: A processor comprising processing circuitry to perform operations comprising: ([Sections 4-5 and Fig. 8], Choi discloses processing circuitry. Claim 21 corresponds to claim 1, and the rest of the limitations are rejected on the same grounds.)

Regarding Claim 2, Choi in view of Narayanan discloses: The method of claim 1, wherein the execution metrics characterizing expected utilization of the set of computational resources include at least one of: a size of input data into an MLM of the plurality of MLMs, a total memory used during execution of the MLM, a peak memory use during execution of the MLM, or a peak processing clock speed during execution of the MLM. ([Page 200, left column, last para, Section 4.1, Section 4.3, and Algorithm 1], Choi describes collecting profile information (i.e., execution metrics) for each model, including SLO and batch size.)

Regarding Claim 3, Choi in view of Narayanan discloses: The method claim 1, wherein the execution metrics further include expected utilization of one or more virtual processing units supported by the set of computational resources. ([Abstract, Section 4.1, Section 4.3, Algorithm 1, and Fig. 8], Choi describes SLO and batch size for virtual GPUs.)

Regarding Claim 4, Choi in view of Narayanan discloses: The method of claim 1, wherein obtaining the execution metrics for an MLM of the plurality of MLMs comprises: collecting the execution metrics during individual execution of the MLM. ([Section 4.3 and Algorithm 1], Choi discloses executing each model in Algorithm 1.)

Regarding Claim 5, Choi in view of Narayanan discloses: The method of claim 4, further comprising: storing the collected execution metrics in a memory device. ([Sections 4-5, Table 1, Algorithm 1, and Fig. 8], Choi describes collecting profile information and metrics stored in variables.)

Regarding Claim 6, Choi in view of Narayanan discloses: The method of claim 1, wherein obtaining the execution metrics for an MLM of the plurality of MLMs comprises estimating the execution metrics for the MLM using one or more of: an architecture of the MLM, a size of an input into the MLM, a number of computational operations associated with the MLM, or one or more number formats used by the computational operations associated with the MLM. ([Page 200, left column, last para, Section 4.1, Section 4.3, and Algorithm 1], Choi describes collecting profile information (i.e., execution metrics) for each model, including SLO and batch size.)

Regarding Claim 7, Choi in view of Narayanan discloses: The method of claim 1, wherein the combined expected utilization of the set of computational resources by the first MLM batch characterizes expected utilization of memory resources during parallel execution of one or more MLMs of the first MLM batch. ([Section 4.3, Section 5.2, and Algorithm 1], Choi describes staying within SLO constraints; [Sections 4.4-4.5 and Algorithm 2], Choi describes inference-aware scheduling to prevent GPUs from exceeding their given limits.)

Regarding Claim 8, Choi in view of Narayanan discloses: The method of claim 7, wherein the combined expected utilization of the set of computational resources by the first MLM batch characterizes expected utilization of one or more processing units during parallel execution of the one or more MLMs of the first MLM batch. ([Section 4.3, Section 5.2, and Algorithm 1], Choi describes staying within SLO constraints; [Sections 4.4-4.5 and Algorithm 2], Choi describes inference-aware scheduling to prevent GPUs from exceeding their given limits.)
Choi in view of Narayanan also discloses: The method of claim 1, wherein of the set of computational resources comprises at least one of a central processing unit (CPU), a data processing unit (DPU), or a graphics processing unit (GPU). ([Section 4.3, Section 5.2, and Algorithm 1], Choi describes staying within SLO constraints; [Sections 4.4-4.5 and Algorithm 2], Choi describes inference-aware scheduling to prevent GPUs from exceeding their given limits.)

Regarding Claim 9, Choi in view of Narayanan discloses: The method of claim 1, further comprising: initiating, concurrently with the parallel execution of the first MLM batch, parallel execution of a second MLM batch of the one or more MLM batches of the first batch queue, the first MLM batch and the second MLM batch being executed on: two or more separate graphics processing units (GPUs), or two or more separate virtual GPUs supported by a same GPU. ([Section 4.3, Section 1, and Algorithm 1], Choi describes heterogeneous ML models mapped to multiple gpulets (i.e., separate virtual GPUs).)

Regarding Claim 10, Choi in view of Narayanan discloses: The method of claim 1, further comprising: responsive to completing execution of the first MLM batch, initiating parallel execution of a second MLM batch of the one or more MLM batches of the first batch queue. ([Sections 4.2-4.3 and Algorithm 1] describe M models placed in multiple partitions and gpulets (i.e., MLM batches).)

Regarding Claim 11, Choi in view of Narayanan discloses: The method of claim 1, further comprising: subsequent to initiating execution of the first MLM batch, generating at least a second batch queue, wherein the second batch queue comprises at least one MLM batch that is different from at least one other MLM batch of the first batch queue. ([Sections 4.2-4.3 and Algorithm 1] describe M models placed in multiple partitions and gpulets (i.e., MLM batches).)
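Claims 9-10, mapped above to Choi's gpulets, concern launching batches concurrently on separate (virtual) GPUs. A minimal Python sketch using threads as stand-ins for GPU dispatch; run_model and the batch contents are hypothetical placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def run_model(name: str) -> str:
    # Stand-in for dispatching one MLM to its assigned (virtual) GPU.
    return f"{name}: done"

def run_batches_concurrently(first_batch: list, second_batch: list) -> list:
    """Execute two MLM batches at the same time (cf. claim 9), with the
    models inside each batch also submitted in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_model, first_batch + second_batch))

results = run_batches_concurrently(["resnet", "bert"], ["yolo"])
```

In a real serving system the pool would be replaced by per-GPU (or per-gpulet) execution streams; the control flow is the same.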
Regarding Claim 15 Choi in view of Narayanan discloses: The method of claim 1, wherein generating the first batch queue comprises: forming, using a priority metric, a priority queue for the plurality of MLMs ; and performing a plurality of MLM placement operations, wherein individual MLM placement operations comprise: selecting a next MLM in the priority queue; placing the selected MLM, using the threshold utilization and the execution metrics for the selected MLM, into at least one of: an existing MLM batch of the first batch queue, or a new MLM batch of the first batch queue. ([Section 4 Algorithm 1 and Fig 8], Choi describes model priority where each model is sorted in ascending order by rate and SLO, then placing models in partitions and gpulets.) Regarding Claim 17 Choi in view of Narayanan discloses: The system of claim 17, wherein to obtain the execution metrics for an MLM of the plurality of MLMs, the processing device is to perform at least one of: collect the execution metrics during individual execution of the MLM; or. estimate the execution metrics for the MLM using one or more of:an architecture of the MLM, a size of an input into the MLM, a number of computational operations associated with the MLM, or one or more number formats used by the computational operations associated with the MLM. ([Page 200 left column last para, Section 4.1, Section 4.3, and Algorithm 1], Choi describes collecting profile information (i.e. execution metrics) for each model including SLO and batch size.) Regarding Claim 18 Choi in view of Narayanan discloses: The system of claim 17, wherein the combined expected utilization of the set of computational resources by the first MLM batch characterizes at least one of: an expected utilization of memory resources during parallel execution of one or more MLMs of the first MLM batch, or an expected utilization of one or more processing units during parallel execution of the one or more MLMs of the first MLM batch. 
([Section 4.3, Section 5.2, and Algorithm 1], Choi describes staying within SLO constraints. [Sections 4.4-4.5 and Algorithm 2], Choi describes inference-aware scheduling to prevent GPUs from exceeding their given limits.)

Regarding Claim 20

Choi in view of Narayanan discloses: The system of claim 17, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system implemented using one or more application programming interfaces; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system for generating or presenting at least one of augmented reality content, virtual reality content, or mixed reality content; a system implemented using a robot; a system for performing conversational AI operations; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources. ([Section 4.3, Section 1, and Algorithm 1], Choi describes heterogeneous ML models mapped to multiple gpulets (i.e., virtual GPUs).)

Claims 12-14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Choi in view of Narayanan and Abrol et al. (US 10,884,636 B1, hereinafter "Abrol").
Regarding Claim 12

Choi in view of Narayanan does not explicitly disclose: The method of claim 12, further comprising: determining first performance metrics associated with execution of the first batch queue; computing second performance metrics associated with prospective execution of the second batch queue; and responsive to a comparison of the first performance metrics and the second performance metrics, switching from the execution of the first batch queue to an execution of the second batch queue.

However, Abrol discloses, in the same field of endeavor: The method of claim 12, further comprising: determining first performance metrics associated with execution of the first batch queue; computing second performance metrics associated with prospective execution of the second batch queue; and responsive to a comparison of the first performance metrics and the second performance metrics, switching from the execution of the first batch queue to an execution of the second batch queue. ([Col 29 lines 38 lines 1-35, Col 68 lines 15-53, Fig. 3C, and Figs. 4-10] describes workload performance in a storage system and switching computing instances.)

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to implement the workload-performance function disclosed by Abrol into the method of Choi in view of Narayanan to determine performance metrics and switch executions. The modification would have been obvious because one of ordinary skill in the art would be motivated to utilize the workload-performance feature disclosed by Abrol, as all the references are in the field of machine learning. A person of ordinary skill in the art would have been motivated to perform the combination to be able to evaluate workload performance and perform preventive actions.
Regarding Claim 13

Choi in view of Narayanan and Abrol discloses: The method of claim 12, further comprising: displaying a first efficiency report to a user, wherein the first efficiency report comprises runtime performance metrics associated with execution of the first batch queue; displaying a second efficiency report to the user, wherein the second efficiency report comprises estimated performance metrics associated with prospective execution of the second batch queue; and responsive to receiving, from the user, a selection of the second batch queue, switching from execution of the first batch queue to execution of the second batch queue. ([Cols. 69-77 and Fig. 10], Abrol discloses a user interface for displaying performance metrics and receiving user changes.)

Regarding Claim 14

Choi in view of Narayanan and Abrol discloses: The method of claim 12, further comprising: storing at least one of the first batch queue or the second batch queue in a memory device. ([Section 4, Table 2, Fig. 8, and Artifact Appendix], Choi describes utilizing memory for scheduling.)

Regarding Claim 19

Choi in view of Narayanan and Abrol discloses: The system of claim 17, wherein the processing device is further to: subsequent to initiating execution of the first MLM batch, generate at least a second batch queue, wherein the second batch queue comprises at least one MLM batch that is different from each MLM batch of the first batch queue; determine first performance metrics associated with execution of the first batch queue; compute second performance metrics associated with prospective execution of the second batch queue; and responsive to a comparison of the first performance metrics and the second performance metrics, switch from the execution of the first batch queue to an execution of the second batch queue. ([Col 29 lines 38 lines 1-35, Col 68 lines 15-53, Fig. 3C, and Figs. 4-10], Abrol describes workload performance in a storage system and switching computing instances.)
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Kim et al. (US 20210117977 A1) describes parallel machine learning models.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TEWODROS E MENGISTU, whose telephone number is (571) 270-7714. The examiner can normally be reached Mon-Fri, 9:30-5:30.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, ABDULLAH KAWSAR, can be reached at (571) 270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/TEWODROS E MENGISTU/
Examiner, Art Unit 2127

Prosecution Timeline

Aug 04, 2023
Application Filed
Mar 20, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12566817
AUTOMATIC MACHINE LEARNING MODEL EVALUATION
2y 5m to grant Granted Mar 03, 2026
Patent 12482032
Selective Data Rejection for Computationally Efficient Distributed Analytics Platform
2y 5m to grant Granted Nov 25, 2025
Patent 12450465
NEURAL NETWORK SYSTEM, NEURAL NETWORK METHOD, AND PROGRAM
2y 5m to grant Granted Oct 21, 2025
Patent 12400252
ARTIFICIAL INTELLIGENCE BASED TRANSACTIONS CONTEXTUALIZATION PLATFORM
2y 5m to grant Granted Aug 26, 2025
Patent 12380369
HYPERPARAMETER TUNING IN AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA) MODELS
2y 5m to grant Granted Aug 05, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
49%
Grant Probability
77%
With Interview (+28.2%)
4y 5m
Median Time to Grant
Low
PTA Risk
Based on 127 resolved cases by this examiner. Grant probability derived from career allow rate.
