DETAILED ACTION
This office action is in response to claims filed 2 March 2026.
Claims 1-9 and 12-34 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to the rejection made under 35 USC § 103 have been considered but are moot because the arguments do not specifically challenge the new references (METSCH and CAO) applied in the prior rejection of record.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) in claims 21, 27, and 28 are:
“An artificial intelligence (AI) inference service…configured to execute”;
“A scheduler…configured to identify…identify…calculate…select…assign…execute”;
“The scheduler is further configured to schedule”; and
“The scheduler is further configured to schedule”.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. Paragraph [0047] of the specification states that the artificial intelligence inference service and the scheduler are run “via processors and memory,” implying that they are software stored in memory and executed by processors.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Allowable Subject Matter
Claims 29 and 30 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Additionally, claim 31 was not rejected using prior art, but stands rejected under other statutes.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 21-28, 31, and 34 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 21,
On lines 24-25, the claim fails to particularly point out and distinctly claim which component executes the machine learning model: the “artificial intelligence (AI) inference service…[is] configured to execute the machine learning model” (lines 7-8), or the “scheduler, communicatively coupled to the AI inference service [is] configured to…execute the machine learning model” (lines 9 and 24-25). In other words, the amendment removed clarifying language, and the claim now states that both the AI inference service and the scheduler execute the machine learning model. The claim therefore does not particularly point out and distinctly claim which one executes the machine learning model. For examination purposes, the examiner will interpret the machine learning model as being executed by the AI inference service.
Regarding claims 22-28, 31, and 34, they are dependent upon rejected claim 21, and fail to resolve the deficiencies thereof. They are therefore rejected for similar rationale.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 7-9, 12-13, 17-19, 21-23, 27, and 32-34 are rejected under 35 U.S.C. 103 as being unpatentable over METSCH et al. Pub. No.: US 2019/0324799 A1 (hereafter METSCH), in view of ROSS et al. Patent No.: US 10,685,295 B1 (hereafter ROSS), in view of CAO et al. Patent No.: US 10,827,020 B1 (hereafter CAO).
ROSS was cited previously.
Regarding claim 1, METSCH teaches the invention substantially as claimed, including:
At least one non-transitory computer readable medium encoded with instructions which, when executed, cause a system to perform actions ([0048] A non-transitory machine readable storage medium including program code, when executed, to cause a programmable processor to perform the method described above or below) comprising:
receiving a request to execute a [computational task] ([0002] A user can request from the computing service to solve a computational task);
identifying one or more nodes of a clustered computing system having a plurality of nodes…each of the one or more nodes comprising a plurality of candidate hardware resources for executing the [computational task] ([0067] FIG. 7 shows an example of predicting 700 workload characteristics. Predicting workload characteristics may enable to distribute workloads on a system such that resources of the system are used more equally and/or a utilization level of resources of the system is more uniform. When receiving 702 an orchestration event, predicting 700 may be performed in a foreground flow during provisioning of a workload. Out of all possible (or shortlisted) resource aggregates/compute hosts (i.e., “nodes”) in a cluster/system those are picked or determined 704, which satisfy the resource request requirements);
identifying one or more candidate hardware resources from the plurality of candidate hardware resources ([0068] For each candidate (or determined shared resource), the involved resources (respectively subsystems) may be identified and the models as stored in the database (see e.g. the examples according to FIG. 4 and FIG. 5) can be loaded 706. The models may be provided 708 by the database. Fingerprints of the workload that is to be handled, and the fingerprint describing the current behavior of the resource may be used, and a future fingerprint may be predicted 710), each of the plurality of hardware resources comprising a processing unit, an accelerator, or a combination thereof ([0035] For example, the subsystems of the shared resource may be at least two of a processor, a network controller interface, an input/output channel, a memory, and a data storage. [0119] The term “processor” or “controller” is by far not limited to hardware exclusively capable of executing software, but may include digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC) (i.e., at least ASICs perform specific tasks at accelerated rates, and therefore are interpreted as “accelerators”), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage);
calculating a score for each of the identified one or more candidate hardware resources ([0069] The prediction can be used to calculate 712 a utility (i.e. utility “score”, as referenced in [0072] cited below) of a resource) based at least on execution priorities for the [computational process] ([0069] This utility calculation can be based on the dominant state (e.g. its expected level of utilization/saturation/throughput/latency etc.) that the resource will reside in once workload(s) to be handled are place on it, as well as the probability/frequency to stay in the state. [0070] The factor C can be used to positively or negatively reward the fact that the resource is for example not in a single high utilization state. Factor C can be a fixed value, or a term based on the probability or the frequency of a state transition (i.e., factor C rewards, or “prioritizes” a predicted execution state of the resource while executing a computational process));
selecting one of the identified candidate hardware resources based at least on the calculating ([0071] The Utility of a resource can then be used to pick (or select 130) the best candidate (or shared resource, respectively). Furthermore other utilities about the resources behavior (e.g. for preventing interference) can be included. [0072] These individual utility scores can be comparable between the sub-systems/resources of a resource aggregate (e.g. a compute hosts) and therefore may allow for fast reasoning);
assigning the [computational process] to the selected identified candidate hardware resource; and executing the [computational process] using the selected identified candidate hardware resource ([0023] Further, the predicted workload characteristics may be provided and selecting 130 one of the at least two determined shared resources is based on the predicted workload characteristics according to the method 100. In some examples, the predicted workload characteristics of the determined shared resources differ from each other, for example if another computational process is already running on one of the determined shared resources and/or the determined shared resources comprise different subsystems, e.g. hardware components. The differing workload characteristics can be used to select 130 the one shared resource to perform the computational process.).
While METSCH discusses selecting and allocating of nodes and corresponding resources to computational processes for execution, METSCH does not explicitly teach that these processes are
machine learning models;
However, in analogous art that similarly teaches allocation of resources, ROSS teaches computational processes as:
machine learning models ([Column 3, Line 66-Column 4, Line 1] The processing system (100) receives a machine learning model to be executed on a special purpose machine learning processor (202). [Column 5, Lines 18-21] The processing system (100) allocates resources of the special purpose machine learning model processor based on the determined amount of resources required by the executable binary (212));
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to have combined ROSS’s teaching of a machine learning model as a computational process to execute using allocated resources, with METSCH’s teaching of allocating resources to a computational process based on node and resource evaluations, to realize, with a reasonable expectation of success, a system that allocates resources to a computational process based on node and resource evaluations, as in METSCH, where the computational processes are machine learning models, as in ROSS. A person having ordinary skill would have been motivated to make this combination to maximize performance and efficiency of a machine learning model task by choosing the optimal combination of machine learning models that will run the machine learning task (ROSS Column 8, Lines 32-37).
While METSCH and ROSS discuss identification of particular nodes in a clustered computing system that satisfy resource request requirements, METSCH and ROSS do not explicitly teach:
identifying one or more nodes of a clustered computing system having a plurality of nodes based at least in part on node affinity of the one or more nodes.
However, in analogous art that similarly allocates computational processes to nodes of a clustered computing system, CAO teaches:
identifying one or more nodes of a clustered computing system having a plurality of nodes based at least in part on node affinity of the one or more nodes ([Column 6, Line 57-Column 7, Line 2] Scheduler extension 230 may further include a configuration executor 236. Configuration executor 236 may execute the assignments of microservices to cluster nodes as determined by microservice allocator 234…In an example implementation, the container orchestration platform (not shown) may include node affinity data to guide the container orchestration platform to place the microservices in the correspondingly assigned cluster nodes (i.e., cluster node identification for selection is based at least partially on “node affinity data”)).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to have combined CAO’s teaching of identifying nodes for selection based on node affinity, with the combination of METSCH and ROSS’s teaching of identifying nodes for use in executing machine learning models that satisfy resource request requirements, to realize, with a reasonable expectation of success, a system that selects nodes for use in executing machine learning models that satisfy resource request requirements, as in METSCH and ROSS, where the requirements include a node affinity requirement, as in CAO. A person having ordinary skill would have been motivated to make this combination to make more efficient workload placement decisions (CAO Column 2, Lines 13-15).
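For illustration only, the combined flow mapped above (identifying nodes by affinity and resource requirements, scoring candidate hardware resources, and selecting one) can be sketched as follows. This is a minimal sketch under stated assumptions and is not an implementation taken from METSCH, ROSS, or CAO: the `Node` and `Resource` types, the label-matching affinity rule, the memory requirement, and the precomputed `utility` values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    kind: str       # e.g. "cpu" or "accelerator" (hypothetical categories)
    utility: float  # hypothetical precomputed utility score


@dataclass
class Node:
    name: str
    labels: dict           # hypothetical node labels used for affinity matching
    free_memory_gb: float  # hypothetical stand-in for a resource request requirement
    resources: list = field(default_factory=list)


def identify_nodes(nodes, required_labels, min_memory_gb):
    """Keep nodes that match every affinity label and satisfy the request."""
    return [
        n for n in nodes
        if all(n.labels.get(k) == v for k, v in required_labels.items())
        and n.free_memory_gb >= min_memory_gb
    ]


def select_resource(nodes):
    """Return the (node, resource) pair with the highest utility, or None."""
    candidates = [(n, r) for n in nodes for r in n.resources]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1].utility)


# Hypothetical three-node cluster.
cluster = [
    Node("node-a", {"zone": "edge-1"}, 16.0,
         [Resource("cpu0", "cpu", 0.4), Resource("gpu0", "accelerator", 0.9)]),
    Node("node-b", {"zone": "edge-2"}, 64.0,
         [Resource("gpu1", "accelerator", 0.95)]),
    Node("node-c", {"zone": "edge-1"}, 4.0,
         [Resource("gpu2", "accelerator", 0.99)]),
]

eligible = identify_nodes(cluster, {"zone": "edge-1"}, min_memory_gb=8.0)
chosen_node, chosen_resource = select_resource(eligible)
print(chosen_node.name, chosen_resource.name)
```

In this sketch, node-b is excluded by the affinity rule and node-c by the memory requirement, so the selection is made only among the resources of node-a.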
Regarding claim 2, METSCH further teaches:
wherein identifying the one or more candidate hardware resources is further based at least in part on analyzing compute resource metrics of the [computational process] ([0068] For each candidate (or determined shared resource), the involved resources (respectively subsystems) may be identified and the models as stored in the database (see e.g. the examples according to FIG. 4 and FIG. 5) can be loaded 706. The models may be provided 708 by the database. Fingerprints of the workload that is to be handled, and the fingerprint describing the current behavior of the resource (i.e., resource fingerprints represent analysis of metrics related to the resources) may be used, and a future fingerprint may be predicted 710) to determine an approximate compute resource need to run the machine learning model ([0069] This utility calculation can be based on the dominant state (e.g. its expected level of utilization/saturation/throughput/latency etc.) that the resource will reside in once workload(s) to be handled are place on it, as well as the probability/frequency to stay in the state (i.e., expected utilization/saturation/throughput/latency etc. represents the approximate usage or “need” of the resource in question)).
Regarding claim 3, ROSS further teaches:
the actions further comprising: analyzing compute resource metrics based at least in part on analyzing a deep learning model graph structure including what operations are performed on each graph node of a plurality of graph nodes ([Column 6, Lines 50-61] Another benefit to knowing the amount of resources necessary to run a machine learning model is improving data center efficiency. As described above, an example system can determine the amount of resources necessary to run each machine learning model (i.e., “compute resource metrics”). Special purpose machine learning model processors can be tasked with a specific number of machine learning model operations. Thus the number of operations, the amount of IO, and the amount of storage required to execute the operations of computational dataflow graphs representing machine learning models (i.e., “deep learning model graph structure”) that are assigned to execute in the datacenter may be known with a high degree of precision (i.e., operations executed on resources of a computational dataflow graph represent “operations performed on graph nodes”)).
Regarding claim 7, ROSS further teaches:
wherein the assigning is further based at least on compute resource metrics for the machine learning model ([Column 5, Lines 18-21] The processing system (100) allocates resources of the special purpose machine learning model processor based on the determined amount of resources required by the executable binary (212) (i.e., the determined amount of resources required by the executable binary is indicative of “compute resource metrics” required by the compiled “machine learning model” as discussed in at least Column 1, Lines 28-38)).
Regarding claim 8, METSCH further teaches:
the clustered computing environment is a multi-node edge ([0067] Out of all possible (or shortlisted) resource aggregates/compute hosts in a cluster/system those are picked or determined 704, which satisfy the resource request requirements (i.e., a cluster comprises multiple host nodes, and is therefore considered a “multi-node edge”)).
Regarding claims 9, 12-13, 17, and 19, they comprise limitations similar to those of claims 1-3 and 7. They are therefore rejected for at least similar rationale.
Regarding claim 18, ROSS further teaches:
the compute resource metrics comprise graphics processing unit (GPU) utilization, GPU memory, machine learning model approximated FLOPS requirements, machine learning model memory requirements, central processing unit (CPU) utilization, host memory, inference request count, number of k8 pods, inference request latency, or combinations thereof ([Column 6, Lines 36-49] For example, one computational graph representing a machine learning model may need to perform 20,000 operations per unit time (i.e., “FLOPS requirements”, which along with IO requirements is interpreted as being indicative of CPU or GPU utilization requirements) and use 30 gigabytes per unit time of input/output (IO) to communicate information. Another computational graph representing a second model may need to perform 80,000 operations per unit time and use 10 gigabytes per unit time of IO. If an example special purpose machine learning model processor can perform 100,000 operations per unit time and has 100 gigabytes per unit time of IO, these two models can be run together. By knowing the operations per unit time and the IO per unit time required of a model at compile time, an example system can load-balance special purpose machine learning model processors automatically for optimal model execution (i.e., Column 6 goes on to describe an amount of storage required to execute the operations of computational dataflow graphs (lines 57-61), which are interpreted as host memory requirements)).
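For illustration only, the arithmetic in the ROSS example quoted above can be checked as follows. The figures (20,000 and 80,000 operations per unit time, 30 and 10 gigabytes per unit time of IO, and a processor capacity of 100,000 operations and 100 gigabytes per unit time) are taken from the quoted passage. The function name and dictionary keys are hypothetical, and the simple additive capacity check is an assumption about how "run together" is evaluated.

```python
def fits_together(models, processor):
    """Return True if the summed demand of all models fits within the processor."""
    total_ops = sum(m["ops_per_unit_time"] for m in models)
    total_io = sum(m["io_gb_per_unit_time"] for m in models)
    return (
        total_ops <= processor["ops_per_unit_time"]
        and total_io <= processor["io_gb_per_unit_time"]
    )


# Figures quoted from ROSS, Column 6, Lines 36-49.
models = [
    {"ops_per_unit_time": 20_000, "io_gb_per_unit_time": 30},
    {"ops_per_unit_time": 80_000, "io_gb_per_unit_time": 10},
]
processor = {"ops_per_unit_time": 100_000, "io_gb_per_unit_time": 100}

print(fits_together(models, processor))  # True: 100,000 <= 100,000 and 40 <= 100
```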
Regarding claims 21-23, and 27, they comprise limitations similar to those of claims 1-3, and 7. They are therefore rejected for at least similar rationale.
Regarding claim 32, METSCH further teaches:
wherein identifying the one or more candidate hardware resources is further based at least in part on analyzing the one or more candidate hardware resources in a particular order ([0068] For each candidate (or determined shared resource), the involved resources (respectively subsystems) may be identified and the models as stored in the database (see e.g. the examples according to FIG. 4 and FIG. 5) can be loaded 706. The models may be provided 708 by the database. Fingerprints of the workload that is to be handled, and the fingerprint describing the current behavior of the resource may be used, and a future fingerprint may be predicted 710 (i.e., each resource of the candidate nodes is identified in some “order”)).
Regarding claims 33-34, they comprise limitations similar to claim 32, and are therefore rejected for similar rationale.
Claims 5-6, 15-16, 20, 25-26, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over METSCH, in view of ROSS, in view of CAO, as cited in claims 1, and 9 above, and in further view of ZHAO et al. Pub. No.: US 2019/0384641 A1 (hereafter ZHAO).
ZHAO was cited previously.
Regarding claim 5, while METSCH, ROSS, and CAO discuss calculating scores for node resources, they do not explicitly teach:
wherein the calculating is further based at least on a weighted sum of execution priorities for each of the identified one or more candidate hardware resources
However, in analogous art that similarly teaches scoring of resources in nodes for allocation, ZHAO teaches:
wherein the calculating is further based at least on a weighted sum of execution priorities for each of the identified one or more candidate hardware resources ([0037] At block 440, multiple computing resources (i.e., “identified candidate hardware resources”) are ranked based on status information 330 (i.e., “scores”) of the multiple computing resources so as to obtain a resource list 332. The status information 330 here may involve indicators of various aspects of the computing resources 160. According to example embodiments of the present disclosure, the status information of the multiple computing resources 160 comprises at least any one indicator of processing capacity information, memory resource information and bandwidth resource information of the multiple computing resources 160. [0044] In Equation 1, Status (i) represents status information of the i.sup.th computing resource in the resource pool 320, ProcessingCapacity represents processing capacity information, Weight.sub.processing capacity represents importance of the processing capacity information, MemoryCapacity represents memory resource information, Weight.sub.memory capacity represents importance of the memory resource information, BandWidth represents bandwidth resource information, and Weight.sub.bandwidth represents importance of the bandwidth resource information (i.e., equation 1 sums the weighted values for each resource)).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to have combined ZHAO’s teaching of using a weighted sum of values to determine a score for resources, with the combination of METSCH, ROSS, and CAO’s teaching of determining scores for resources, to realize, with a reasonable expectation of success, a system that determines scores for resources, as in METSCH, ROSS, and CAO, based on a weighted sum, as in ZHAO. A person having ordinary skill would have been motivated to make this combination to enable a user or system to adapt resource scores to better achieve allocation objectives.
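For illustration only, a weighted sum of the kind described for ZHAO's Equation 1 can be sketched as follows. The three indicators and their weights correspond to the terms named in ZHAO [0044]. The exact form of Equation 1 is not reproduced in this action, so the plain sum of weight-times-indicator products, the function names, and all numeric values below are assumptions.

```python
def status(indicators, weights):
    """Weighted sum of a resource's indicators (assumed form of Equation 1)."""
    return sum(weights[key] * indicators[key] for key in weights)


def rank(resources, weights):
    """Order resource names from highest to lowest status."""
    return sorted(
        resources,
        key=lambda name: status(resources[name], weights),
        reverse=True,
    )


# Hypothetical weights and normalized indicator values.
weights = {"processing_capacity": 0.5, "memory_capacity": 0.3, "bandwidth": 0.2}
resources = {
    "gpu0": {"processing_capacity": 0.9, "memory_capacity": 0.5, "bandwidth": 0.4},
    "gpu1": {"processing_capacity": 0.6, "memory_capacity": 0.9, "bandwidth": 0.8},
}

print(rank(resources, weights))  # ['gpu1', 'gpu0']
```

With these values, gpu0 scores approximately 0.68 and gpu1 approximately 0.73, so changing the weights changes which resource is ranked first.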
Regarding claim 6, ZHAO further teaches:
the execution priorities for the machine learning model comprise per-node per-time period inference request counts, per-node number of kubernetes (k8) pods running, per-node machine learning models running, free processor memory space, processor utilization, or combinations thereof ([0037] At block 440, multiple computing resources are ranked (i.e., “scored”) based on status information 330 of the multiple computing resources so as to obtain a resource list 332. The status information 330 here may involve indicators of various aspects of the computing resources 160. According to example embodiments of the present disclosure, the status information of the multiple computing resources 160 comprises at least any one indicator of processing capacity information, memory resource information (i.e., the priority placed on resources by the machine learning model include at least “free processor memory space” and “processor utilization”) and bandwidth resource information of the multiple computing resources 160 (i.e., [0054] also describes how status information is indicative of % of resource utilization, including processor utilization or presumably memory space utilization)).
Regarding claims 15-16, they comprise limitations similar to claims 5-6, and are therefore rejected for similar rationale.
Regarding claim 20, ZHAO further teaches:
the assigning is further based at least on a comparison between a determined approximate compute resource need to run the machine learning model, and the weighted sum of the execution priorities for each of the identified candidate hardware resources ([0044] In Equation 1, Status (i) represents status information of the i.sup.th computing resource in the resource pool 320, ProcessingCapacity represents processing capacity information, Weight.sub.processing capacity represents importance of the processing capacity information, MemoryCapacity represents memory resource information, Weight.sub.memory capacity represents importance of the memory resource information, BandWidth represents bandwidth resource information, and Weight.sub.bandwidth represents importance of the bandwidth resource information (i.e., equation 1 sums the weighted values for each resource). [0038] At block 450, a mapping between a corresponding layer among the multiple layers and a corresponding computing resource among the multiple computing resources is determined based on the layer list and the resource list, the mapping indicating that one computing resource among the multiple computing resources will process parameters associated with one layer among the multiple layers (i.e., status information represents a weighted sum of execution priorities of the hardware resources, and as discussed in the rejection of claim 1, ZHAO uses the resource status information and layer ranking to determine the mapping, or “assignment” between layers and resources)).
Regarding claims 25-26 and 28, they comprise limitations similar to claims 5-6 and 20, and are therefore rejected for similar rationale.
Claims 4, 14, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over METSCH, in view of ROSS, in view of CAO, as applied to claims 1, 9, and 22 above, and in further view of NAMBIAR et al. Pub. No.: US 2016/0234071 A1 (hereafter NAMBIAR).
NAMBIAR was cited previously.
Regarding claim 4, while the combination of METSCH, ROSS, and CAO teaches identifying candidate resources for executing a machine learning model based on node affinity, the combination of METSCH, ROSS, and CAO does not explicitly teach:
wherein the identifying is further based at least in part on rack-awareness in the clustered computing system.
However, in analogous art that similarly identifies resources based on affinities, NAMBIAR teaches:
wherein the identifying is further based at least in part on rack-awareness in the clustered computing system ([0032] In various embodiments, when application scheduler 32 receives a request to execute a job within distributed application 30, application scheduler 32 determines what resources are available for executing the requested job, including what resources are available for placing (storing) data associated with the workloads. Application scheduler 32 can determine where to place data among hosts 16 using a data placement policy 34, along with a scheduling policy. Data placement policy 34 can specify guidelines for selecting network nodes (such as hosts 16), including but not limited to, node availability, node capacity, node locality, data placement cost, data transfer costs, network topology, user preferences associated with the storage node, and/or other data placement guideline…data placement policy 34 and/or replica placement policy 36 may define a rack awareness policy that specifies that data should be placed on network nodes associated with different racks (i.e., rack awareness policy also specifies the affinity/anti-affinity data has with particular nodes on particular racks)).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to have combined NAMBIAR’s teaching of selecting resources to execute a task based on rack awareness policy that specifies affinity between nodes and racks, with the combination of METSCH, ROSS, and CAO’s teaching of selecting resources to execute a machine learning model based on a node affinity, to realize, with a reasonable expectation of success, a system that selects resources based at least on a rack awareness policy, as in NAMBIAR, and a node affinity that is used to execute a machine learning model, as in METSCH, ROSS, and CAO. A person having ordinary skill would have been motivated to make this combination to optimize deployment of distributed application frameworks while enhancing network performance associated with using the frameworks (NAMBIAR [0002]).
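For illustration only, a rack awareness policy of the kind quoted from NAMBIAR [0032] (placing data on network nodes associated with different racks) can be sketched as follows. This is a minimal sketch under stated assumptions, not NAMBIAR's implementation: the function name, the first-fit selection order, and the host and rack names are hypothetical.

```python
def place_on_distinct_racks(nodes, replicas):
    """Pick `replicas` nodes, at most one per rack, in the order given.

    `nodes` is a list of (node_name, rack_name) pairs.
    """
    chosen, used_racks = [], set()
    for node, rack in nodes:
        if len(chosen) == replicas:
            break
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
    if len(chosen) < replicas:
        raise ValueError("not enough distinct racks for the requested replicas")
    return chosen


# Hypothetical hosts and racks.
hosts = [
    ("host-1", "rack-A"),
    ("host-2", "rack-A"),
    ("host-3", "rack-B"),
    ("host-4", "rack-C"),
]

print(place_on_distinct_racks(hosts, 2))  # ['host-1', 'host-3']
```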
Regarding claims 14, and 24, they comprise limitations similar to claim 4, and are therefore rejected for similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
KAWASHIMA et al. Pub. No.: US 2007/0180314 A1 discloses determining primary CPU node failure, and in response, determining whether a backup node is available to take over.
BOSE et al. Pub. No.: US 2004/0209580 A1 discloses determining a primary server failure, and activating a backup server.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL W AYERS whose telephone number is (571)272-6420. The examiner can normally be reached M-F 8:30-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL W AYERS/
Primary Examiner, Art Unit 2195