Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner Notes
Examiner cites particular columns and line numbers in the references as applied to the claims below for convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that in preparing responses, the applicant fully consider the references cited in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.
Specification
The disclosure is objected to because of the following informalities:
Paragraph [0014], where it states ",,".
Paragraph [0031], where it states "a measurements".
Paragraph [0033], where it states "a parameters".
Paragraph [0033], where it states "20 word".
Paragraph [0053], where it states "practices .For".
Paragraph [0053], where it states "third party".
Appropriate correction is required.
35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112 requires the specification to be written in “full, clear, concise, and exact terms.” The specification is replete with terms which are not clear, concise, and exact. The specification should be revised carefully in order to comply with 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112. An example of an unclear, inexact, or verbose passage in the specification is paragraph [0053], where it states "because it end user rather than the service provider that shoulders the cost of the hardware performs the load balancing operations".
The disclosure is objected to because of the following informalities:
Paragraph [0072], where it states "AI mode;".
Paragraph [0079], where it states "AI mode;".
Appropriate correction is required.
Claim Objections
Claim 8 is objected to because of the following informality:
Claim 8, line 7, states: “comprising compute resources the a plurality of model endpoints executing instances of the”. It is unclear whether “plurality of model endpoints” is an established name or a short reference to the previously recited endpoints.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a mental process without significantly more.
Step 1:
Claim 1 recites “A system for improved utilization of compute hardware distributed among a plurality of endpoints of a cloud-based service operating a trained machine learning model, the system comprising:” and is therefore nominally directed to a machine, which is one of the four statutory categories.
Step 2A, Prong One:
Claim 1 recites the limitations:
[…]to determine a net resource consumption for processing tasks in a workload […];
determine a distribution of available resource capacity […];
allocate parallelizable tasks of the workload […];
All of which can be performed in the human mind through observation, evaluation, judgment, and opinion, with the aid of pen and paper, and therefore recite a mental process.
Accordingly, claim 1 recites a judicial exception (i.e., an abstract idea).
Step 2A Prong Two:
The additional elements recited in claim 1 include:
in a workload generated by a client application
In a shared resource pool comprising compute resources
Among the compute resources at the multiple model endpoints
Regarding the additional element (i), the limitation merely specifies the source of the aforementioned “tasks” and acts as mere instructions to implement the abstract idea, which can be done by the human mind or a generic computer. See MPEP 2106.05(f) and 2106.05(g).
Regarding the additional element (ii), the recited limitation amounts to well-understood activity, as a “distribution” of a resource is commonly understood to come from a source such as a “pool” of data. See MPEP 2106.05(g).
Regarding the additional element (iii), the limitation merely specifies where the aforementioned “tasks” are allocated and acts as mere instructions to implement the abstract idea, which can be done by the human mind or a generic computer. See MPEP 2106.05(f) and 2106.05(g).
Step 2B:
Regarding the additional element (i), the limitation is reciting the location in which the process for processing tasks is taking place. The courts have found that insignificant extra-solution activity is not enough to amount to significantly more than the recited judicial exception. See MPEP 2106.05(g).
Regarding the additional element (ii), the limitation recites the distribution of capacity found within a shared resource pool, which is later used to execute activities performed by the trained machine learning model. These actions can be done by the human mind, making the limitation mere instructions to apply the exception. The courts have found that adding mere instructions to apply the exception is not enough to amount to significantly more than the recited judicial exception. See MPEP 2106.05(a) and 2106.05(f).
Regarding the additional element (iii), the limitation recites the allocation of tasks from the workload to the resources found at the endpoints based on the “net resource consumption” of the tasks and how much available capacity is found within the shared resource pool. Once again, these actions can be done by the human mind, making the limitation mere instructions to apply the exception. The courts have found that adding mere instructions to apply the exception is not enough to amount to significantly more than the recited judicial exception. See MPEP 2106.05(a) and 2106.05(f).
The combination of these additional elements amounts to a method comprising steps which can be performed mentally, implemented by generic computing components, and comprising a step of insignificant extra-solution and well-understood, routine and conventional activity.
Therefore, the additional elements, when considered individually and in combination, fail to add an inventive concept to the claim.
Consequently, claim 1 as a whole does not amount to significantly more than the recited judicial exceptions and the claim is not eligible.
Claim 2 is dependent on claim 1, and therefore inherits the same judicial exception recited in claim 1. Further, claim 2 recites “wherein the parallelizable tasks of the workload are allocated among the plurality of model endpoints according to a target allocation distribution that is based on a fractional distribution of the available resource capacity across the plurality of model endpoints within the shared resource pool,” which can be performed in the human mind through observation, evaluation, judgment, and opinion, with the aid of pen, paper, or a computer, and therefore recites a mental process.
Accordingly, for the same reasons presented with respect to claim 1, the additional elements are not indicative of integration into a practical application, nor do they amount to significantly more than the recited judicial exceptions. Thus, claim 2 is not eligible.
Claim 3 is dependent on claim 2, and therefore inherits the judicial exceptions recited in claims 1 and 2, including the judicial exception “wherein the parallelizable tasks of the workload are allocated among the plurality of model endpoints according to a target allocation distribution that is based on a fractional distribution of the available resource capacity across the plurality of model endpoints within the shared resource pool” recited in claim 2. Claim 3 recites the limitation “wherein the target allocation distribution is a distribution of the net resource consumption of the tasks in the workload among the plurality of model endpoints that is proportional to the fractional distribution of the available resource capacity among the plurality of model endpoints.” Since a person would still be able to base their allocation on the allocation details established in claim 3 in the human mind, through observation, evaluation, judgment, and opinion, with the aid of pen, paper, or a computer, the limitation in claim 3 still recites a mental process.
Accordingly, for the same reasons presented with respect to claims 1 and 2, the additional elements are not indicative of integration into a practical application, nor do they amount to significantly more than the recited judicial exceptions. Thus, claim 3 is not eligible.
Claim 4 is dependent on claim 1, and therefore inherits the same judicial exception recited in claim 1. Further, claim 4 recites “determine resource consumption characteristics of the workload; and determine the net resource consumption for each of the parallelizable tasks of the workload […],” which can be performed in the human mind through observation, evaluation, judgment, and opinion, with the aid of pen, paper, or a computer, and therefore recites a mental process.
Claim 4 recites the additional element of “based on the resource consumption characteristics,” which amounts to mere instructions to apply the exception for the same reasons presented with respect to claim 1.
These additional elements of mere instructions to apply the exception are not indicative of integration into a practical application. Further, these additional elements of mere instructions to apply the exception are not enough to amount to significantly more than the recited judicial exceptions. Even when considered in combination with the additional elements of claim 1, the additional elements do not amount to significantly more than the recited judicial exceptions and do not provide an inventive concept. Thus, claim 4 is not eligible.
Claim 5 is dependent on claim 4, and therefore inherits the same judicial exception recited in claim 4. Further, claim 5 recites “[…] determined based on an identity of the transformer model and a set of inputs to the workload,” which can be performed in the human mind through observation, evaluation, judgment, and opinion, with the aid of pen, paper, or a computer, and therefore recites a mental process.
Accordingly, for the same reasons presented with respect to claim 4, the additional elements are not indicative of integration into a practical application, nor do they amount to significantly more than the recited judicial exceptions. Thus, claim 5 is not eligible.
Claim 6 is dependent on claim 4, and therefore inherits the same judicial exception recited in claim 4. Further, claim 6 recites “[…] determined at least in part based on a size of data input to each of the parallelizable tasks and an estimated size of data output in response to processing of the data input,” which can be performed in the human mind through observation, evaluation, judgment, and opinion, with the aid of pen, paper, or a computer, and therefore recites a mental process.
Accordingly, for the same reasons presented with respect to claim 4, the additional elements are not indicative of integration into a practical application, nor do they amount to significantly more than the recited judicial exceptions. Thus, claim 6 is not eligible.
Claim 7 is dependent on claim 1, and therefore inherits the same judicial exception recited in claim 1. Further, claim 7 recites “determines the distribution of available resource capacity,” which can be performed in the human mind through observation, evaluation, judgment, and opinion, with the aid of pen, paper, or a computer, and therefore recites a mental process.
Claim 7 recites the following additional elements: an “endpoint discovery mechanism” and “capacity measurements,” which act only as mere instructions to apply the exception for the same reasons presented with respect to claim 1.
Accordingly, for the same reasons presented with respect to claim 1, the additional elements are not indicative of integration into a practical application, nor do they amount to significantly more than the recited judicial exceptions. Thus, claim 7 is not eligible.
Claim 8 recites “A method for improved utilization of compute hardware distributed among multiple model endpoints of a cloud-based service operating a trained machine learning model, the method comprising: determining a net resource consumption for […], determining a distribution of available resource capacity […], and allocating parallelizable tasks of the workload […],” which performs substantially the same steps recited in claim 1. Thus, for the same reasons presented with respect to claim 1, claim 8 is rejected because the claimed invention is directed to an abstract idea without significantly more.
For clarity of the record, the additional elements recited above amount to mere instructions to apply the exception, which is neither indicative of integration into a practical application nor amounts to significantly more than the recited judicial exceptions.
Claims 9-14 recite substantially the same limitations as those recited in claims 2-7, respectively, applied to the method of claim 8. Thus, for the same reasons presented with respect to claims 2-7, claims 9-14 are directed to an abstract idea without significantly more and are not eligible.
Claim 15 recites “One or more tangible computer-readable storage media encoding processor-executable instructions for executing a computer process for […], the computer process comprising: determining a net resource consumption […], determining a distribution of available resource capacity […], allocating parallelizable tasks of the workload […],” which performs substantially the same steps recited in claim 1. Thus, for the same reasons presented with respect to claim 1, claim 15 is rejected because the claimed invention is directed to an abstract idea without significantly more.
For clarity of the record, the additional elements recited above amount to mere instructions to apply the exception, which is neither indicative of integration into a practical application nor amounts to significantly more than the recited judicial exceptions.
Claims 16-20 recite substantially the same limitations as those recited in claims 2-7, respectively, applied to the computer-readable storage media of claim 15. Thus, for the same reasons presented with respect to claims 2-7, claims 16-20 are directed to an abstract idea without significantly more and are not eligible.
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because it is directed to software per se.
Regarding claim 1, it recites, “A system for improved utilization of compute hardware distributed among a plurality of endpoints of a cloud-based service operating a trained machine learning model, the system comprising:”
Although the claim is drafted as a “system”, the claim does not recite any structural components that would constitute a statutory machine. Instead, the claim merely recites instructions configured to perform certain machine learning functions. The claim is not tied to any physical components that constitute a machine, and as explained in MPEP 2106.03, non-statutory subject matter includes software per se, such as a program or instructions not embodied in a computer readable medium or implemented by a machine.
Claims 2-7 depend on claim 1, do not cure this deficiency, and are therefore rejected on the same basis.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Fang (U.S. Pub No. 2025/0097163 A1).
Regarding claim 1, Fang teaches
a load balancer stored in the memory and executable to: determine a distribution of available resource capacity in a shared resource pool comprising compute resources at the plurality of model endpoints executing instances of the trained machine learning model ([0005] –“estimating that (i) the first client is using a first subset of the first target amount of resource and not using a second subset of first target amount of resource, and (ii) the second client is using a third subset of the second target amount of resource and not using a fourth subset of second target amount of resource; determining that the second subset of first target amount of resource is greater than the fourth subset of second target amount of resource”)
For clarity of the record, the examiner would like to point out paragraph [0018] of the specification, which recites, “According to one implementation, the disclosed load balancing logic allocates parallelizable workload tasks among cloud-based endpoints of a cloud-based AI service, such as a transformer model, in a manner that ensures the net resource consumption of the allocated tasks is distributed across the multiple endpoints within the shared resource pool according to a target allocation distribution. In one implementation, the target allocation distribution is based on (e.g., proportional to) a fractional distribution of the available resource capacity across the multiple endpoints within the shared resource pool.” Thus, it is understood that, through use of a load balancer, resources are fractionally distributed in accordance with the set amount of resource for each endpoint, which in other iterations can be understood as “clients,” as both receive the allocation.
Regarding claim 2, Fang teaches the system of claim 1, wherein the parallelizable tasks of the workload are allocated among the plurality of model endpoints according to a target allocation distribution that is based on a fractional distribution of the available resource capacity across the plurality of model endpoints within the shared resource pool ([0005] – “allocating, to respectively a first client and a second client, a first target amount of resource and a second target amount of resource for using a service”)
For clarity of the record, the Examiner would like to point to paragraph [0018] of the Specification, which recites, “the target allocation distribution is based on (e.g., proportional to) a fractional distribution of the available resource capacity across the multiple endpoints within the shared resource pool.” Thus, the tasks found in the workload are allocated via the fractional distribution of available resource capacity to the endpoints processing the workload.
Regarding claim 3, Fang teaches the system of claim 2, wherein the target allocation distribution is a distribution of the net resource consumption of the tasks in the workload among the plurality of model endpoints that is proportional to the fractional distribution of the available resource capacity among the plurality of model endpoints ([0005] – “allocating, to respectively a first client and a second client, a first target amount of resource and a second target amount of resource for using a service; receiving, from a third client, a request for allocating resources for using the service; estimating that (i) the first client is using a first subset of the first target amount of resource and not using a second subset of first target amount of resource, and (ii) the second client is using a third subset of the second target amount of resource and not using a fourth subset of second target amount of resource; determining that the second subset of first target amount of resource is greater than the fourth subset of second target amount of resource; and allocating at least a portion of the second subset of first target amount of resource as a third target amount of resource to the third client, responsive at least in part to determining that the second subset of first amount of resource is greater than the fourth subset of second amount of resource.”)
For clarity of the record, the Examiner would like to point to paragraph [0018] of the Specification, which recites, “the target allocation distribution is based on (e.g., proportional to) a fractional distribution of the available resource capacity across the multiple endpoints within the shared resource pool.” Thus, it is understood that the targeted allocation distribution is dependent on the comparison, determination, and allocation of available resource capacity found across the various endpoints.
Regarding claim 4, Fang teaches the system of claim 1, wherein the consumption determination engine is configured to determine resource consumption characteristics of the workload ([0005] – “estimating that (i) the first client is using a first subset of the first target amount of resource and not using a second subset of first target amount of resource, and (ii) the second client is using a third subset of the second target amount of resource and not using a fourth subset of second target amount of resource”) and determine the net resource consumption for each of the parallelizable tasks of the workload based on the resource consumption characteristics ([0005] – “determining that the second subset of first target amount of resource is greater than the fourth subset of second target amount of resource; and allocating at least a portion of the second subset of first target amount of resource as a third target amount of resource to the third client, responsive at least in part to determining that the second subset of first amount of resource is greater than the fourth subset of second amount of resource”)
For clarity of the record, the examiner would like to point to paragraph [0014] of the Specification, which states, “According to one implementation, a method provides load-balancing of tasks cross various model endpoints of a trained machine learning model. The method includes determining net resource consumption for processing tasks of a workload generated by a client application for input to the trained machine learning model; determining a distribution of available resource capacity across each of the model endpoints; and allocating parallelizable tasks of the workload to compute resources at the multiple model endpoints based on the net resource consumption of the tasks and on the distribution of available resource capacity in the shared resource pool.” Thus, it is understood that determining the resource consumption characteristics of the workload and determining the net resource consumption for each of the parallelizable tasks of the workload is analogous to estimating resource capacity based on the characteristics of the workload and the endpoints, determining which has a greater capacity than the other, and allocating resources based on that determination.
Regarding claim 5, Fang teaches the system of claim 4, wherein the trained machine learning model is a transformer model and the net resource consumption for each of the parallelizable tasks is determined based on an identity of the transformer model and a set of inputs to the workload – ([0002] – “While generative artificial intelligence (GenAI) is still in its early stages of adoption, several dedicated platforms have emerged that specialize in training and generating the foundation models. Machine-learning models can be trained on big datasets and leverage deep-learning technologies. For example, a machine-learning model may use a transformer model and/or large language model (LLM).”)
For clarity of the record, the Examiner would like to point to paragraph [0011] of the Specification, which recites, “transformer model is a neural network that learns context and meaning by tracking relationships in sequential data. Examples of transformer-based models include GPT (Generative Pre-trained Transformer), OPT (Open Pretrained Transformer), and Bloom language model (Bioscience Large Open-science Open-access Multilingual). It is common for transformer models to be provided to end customers as cloud-based software services.” Thus, any referenced “transformer” is to be understood as a means to take in data and track how it relates to other processes and data within the machine.
Regarding claim 6, Fang teaches the system of claim 4, wherein the net resource consumption for each of the parallelizable tasks is determined at least in part based on a size of data input to each of the parallelizable tasks and an estimated size of data output in response to processing of the data input ([0228] – “Once properly validated, OMS 1150 may then invoke the order provisioning subsystem (OPS) 1155 that is configured to provision resources for the order including processing, memory, and networking resources. The provisioning may include allocating resources for the order and configuring the resources to facilitate the service requested by the client order. The manner in which resources are provisioned for an order and the type of the provisioned resources may depend upon the type of cloud service that has been ordered by the client. For example, according to one workflow, OPS 1155 may be configured to determine the particular cloud service being requested and identify a number of pods that may have been pre-configured for that particular cloud service. The number of pods that are allocated for an order may depend upon the size/amount/level/scope of the requested service. For example, the number of pods to be allocated may be determined based upon the number of users to be supported by the service, the duration of time for which the service is being requested, and the like. The allocated pods may then be customized for the particular requesting client for providing the requested service.”)
For clarity of the record, the Examiner would like to point to paragraph [0030], which recites, “In general, resource consumption characteristics are characteristics usable to quantify the resource capacity (e.g., GPU utilization) needed to execute a task or workload. Example capacity consumption characteristics include characteristics of input files identified in the workload 110, such as a size of each file, the amount of data on each file, the type of data in each file, and/or the amount of memory required to read each file.” Thus, when dealing with the determining of resource consumption of the parallelizable tasks, size is a factor that must be taken into consideration when dealing with the processing of the data it contains.
Regarding claim 7, Fang teaches the system of claim 1, wherein the load balancer determines the distribution of available resource capacity by requesting, from an endpoint discovery mechanism, capacity measurements pertaining to availability of compute resources supporting execution of the model instances at the model endpoints, the endpoint discovery mechanism being configured to retrieve the capacity measurements from the model endpoints ([0085] – “The serving operator 225 is a cloud component or service that may facilitate the deployment and management of machine-learning (ML) models for performing real-time inference in a cloud 107. It may automate tasks related to model serving, including scaling the inference service based on demand, load balancing, and routing requests to the appropriate model version or instance.”)
For clarity of the record, the Examiner would like to point to paragraph [0018], which states, “According to one implementation, the disclosed load balancing logic allocates parallelizable workload tasks among cloud-based endpoints of a cloud-based AI service, such as a transformer model, in a manner that ensures the net resource consumption of the allocated tasks is distributed across the multiple endpoints within the shared resource pool according to a target allocation distribution.” Thus, the load balancing logic is understood to be a means of routing and allocating workload tasks among instances of the machine-learning (ML) model.
Regarding claim 8, which recites, “A method for improved utilization of compute hardware distributed among multiple model endpoints of a cloud-based service operating a trained machine learning model, the method comprising: determining a net resource consumption for processing tasks in a workload generated by a client application for input to the trained machine learning model; determining a distribution of available resource capacity in a shared resource pool comprising compute resources the a plurality of model endpoints executing instances of the trained machine learning model; and allocating parallelizable tasks of the workload among the compute resources at the multiple model endpoints based on the net resource consumption of the tasks and on the distribution of available resource capacity in the shared resource pool,” the claim performs substantially the same steps recited in claim 1. Thus, for the same reasons presented with respect to claim 1, claim 8 is taught by Fang. Claims 9-14 recite substantially the same limitations as those recited in claims 2-7, respectively, applied to the method of claim 8. Thus, for the same reasons presented with respect to claims 2-7, Fang teaches claims 9-14.
Regarding claim 15, which recites, “One or more tangible computer-readable storage media encoding processor-executable instructions for executing a computer process for improved utilization of compute hardware distributed among a plurality of model endpoints of a cloud-based service operating a trained machine learning model, the computer process comprising: determining a net resource consumption for processing tasks in a workload generated by a client application for input to the trained machine learning model; determining a distribution of available resource capacity in a shared resource pool comprising compute resources at the plurality of model endpoints executing instances of the cloud-based AI mode; and allocating parallelizable tasks of the workload among the compute resources at the multiple model endpoints based on the net resource consumption of the tasks and on the distribution of available resource capacity in the shared resource,” the claim performs substantially the same steps recited in claim 1. Thus, for the same reasons presented with respect to claim 1, claim 15 is taught by Fang. Claims 16-20 recite substantially the same limitations as those recited in claims 2-7, respectively, applied to the computer-readable storage media of claim 15. Thus, for the same reasons presented with respect to claims 2-7, Fang teaches claims 16-20.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL NWUHA whose telephone number is (571) 272-9367. The examiner can normally be reached Monday-Friday, 7:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kevin Young can be reached at (571) 270-3180. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SAMUEL OBINNA NNAJI NWUHA/Examiner, Art Unit 2194
/KEVIN L YOUNG/Supervisory Patent Examiner, Art Unit 2194