DETAILED ACTION
Status of Claims
This action is in reply to the communication filed on 12/31/2025.
Claims 1-3, 5-11, 13-19 and 21-23 are currently pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/31/2025 has been entered.
Response to Arguments
Applicant’s arguments filed 12/04/2025 (entered 12/31/2025) with respect to the rejections under 35 USC § 103 have been considered but are not persuasive.
On pg. 15 of the Remarks, Applicant essentially argues:
"the Final Office Action cites paragraph 175 of the specification, which provides that "a larger cross-node quantity indicates a larger occupied bandwidth." However, in the cited paragraph the larger cross-node quantity indicates a larger occupied bandwidth, but is not the larger occupied bandwidth itself:
"For example, a smoothed value of the bandwidth used in real time by the existing job on a network link may be monitored by using a monitoring system, and is denoted as B. A current node is scored on this basis, score= 1 + 1/(B+1), a larger cross-node quantity indicates a larger occupied bandwidth and a lower score, and a new job should be prevented from being placed on the node." Specification, 0175
As shown, the larger cross-node quantity merely indicates a larger occupied bandwidth. However, the larger cross-node quantity is not the larger occupied bandwidth itself. Likewise, the larger cross-node quantity also indicates a lower score. Indication is not the same as equivalency...elsewhere the specification indicates the term "cross-node quantity" refers to an actual frequency of data exchange between nodes. See, e.g., Specification, ¶0170. In contrast, bandwidth refers to a possible maximum rate of data transfer [wikipedia Bandwidth(computing) link]. In short, the cross-node quantity refers to an actual frequency, while bandwidth refers to a possible rate. For those reasons, Wang's residual bandwidth cannot be equated to the claimed cross-node quantity."
Examiner respectfully disagrees that the highlighted language precludes the interpretation and mapping to Wang provided in the § 103 rejections. As can be seen in the AppSpec ¶0175 quote, the variable 'B', i.e., the "bandwidth used by the existing job", is explicitly the operative variable in the scoring formula and thus acts as the "cross-node quantity".
Applicant's bandwidth (BW) definition corresponds to what Wang refers to as a server's 'link capacity'/'total bandwidth'; however, a server's used/residual (Capacity - used)1 BW represents the amount of data actually transferred across the link, e.g., if a server's residual BW == 0, its network link is completely saturated by network traffic. Accordingly, Examiner maintains that used/residual BW is reasonably interpreted as a metric for measuring the 'frequency of data exchange between nodes' and that its mapping to "cross-node quantity" in the rejection(s) is consistent with the description in the AppSpec.
If Applicant intends for the term "cross-node quantity" to require a particular metric other than bandwidth usage, e.g., the "quantity of other servers with which the server needs to exchange data" (¶0179) or the "quantity of network transmission connections" (¶0230-0231), then Examiner suggests amending the claims to recite the desired species.
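For illustration only, the scoring relationship described in AppSpec ¶0175 can be sketched as follows; the function name is the Examiner's illustration and is not part of the record:

```python
def node_score(used_bandwidth_b: float) -> float:
    """Sketch of the AppSpec ¶0175 formula: score = 1 + 1/(B + 1).

    B is the smoothed bandwidth currently used by existing jobs on the
    node's network link; a larger B (a larger cross-node quantity, per
    the Examiner's mapping) yields a lower score, steering new jobs away
    from saturated nodes.
    """
    return 1.0 + 1.0 / (used_bandwidth_b + 1.0)

# Fuller links score lower: B = 0 -> 2.0, B = 1 -> 1.5, B = 3 -> 1.25.
assert node_score(0.0) > node_score(1.0) > node_score(3.0)
```

The sketch simply makes concrete that the used bandwidth B, not the link capacity, is the quantity the formula operates on.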
On pg. 16 of the Remarks, Applicant essentially argues:
As shown, Wang's case III considers subtrees. Wang's case III does not consider an entire candidate node. In addition, case III estimates the residual bandwidth. As shown above, Wang's residual bandwidth cannot be equated to the claimed cross-node quantity. Furthermore, Wang's aggregate bandwidths are the bandwidths that are needed, not an actual frequency and therefore likewise cannot be equated to the claimed cross-node quantity. Thus, Wang fails to disclose that when the n tasks cannot all be placed in the candidate node, the larger cross-node quantity indicates the smaller network transmission performance score and the smaller cross-node quantity indicates the larger network transmission performance score.
As explicitly described at Case I of the algorithm, actual placement of the VMs (tasks) occurs at servers (i.e., the lowest subtrees): "When a given node is a server (the lowest subtree that has no further subtree as shown in line 2), MAPLE attempts to allocate the entire VM ensemble placement request into a same server". Accordingly, reaching Case III means there is no singular candidate server where the entire ensemble (n tasks) can be placed, which corresponds to the recited condition "when the n tasks cannot all be placed in the candidate node". For the reasons described above, Examiner maintains the mapping of used/residual BW to "cross-node quantity" is proper; Wang's aggregate BW essentially corresponds to the recited "cross-node degree of the n tasks".
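For illustration only, the Case I / Case III behavior discussed above can be sketched as follows; the structure, names, and the slot-based capacity model are the Examiner's simplification and are not a reproduction of Wang's Algorithm:

```python
def place_ensemble(vm_demands, servers):
    """Simplified sketch of Wang's MAPLE placement logic.

    Case I: if any single server can host the entire VM ensemble, all
    VMs are placed there and zero cross-server bandwidth is needed.
    Case III: otherwise the ensemble must be split across servers, and
    servers with more residual (unused) bandwidth are preferred so the
    aggregate inter-group bandwidth can be accommodated.

    servers: list of (name, free_slots, residual_bandwidth).
    """
    # Case I: the whole ensemble fits on one server.
    for name, free_slots, residual_bw in servers:
        if free_slots >= len(vm_demands):
            return {name: list(vm_demands)}
    # Case III: greedily split across servers, highest residual BW first.
    placement, remaining = {}, list(vm_demands)
    for name, free_slots, residual_bw in sorted(servers, key=lambda s: -s[2]):
        take, remaining = remaining[:free_slots], remaining[free_slots:]
        if take:
            placement[name] = take
    return placement
```

The sketch makes concrete that reaching the splitting branch presupposes no single server could host the whole ensemble, mirroring the recited "cannot all be placed" condition.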
Additionally, as noted in the 09/04/2025 Final OA, the last wherein clause of claim 1 can arguably be treated as non-limiting intended use under the interpretation that the recited n = 1 (an embodiment explicitly addressed in the AppSpec and included in the scope of the claims), since it then recites a condition that can never be true (see MPEP 2111.04).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3, 5-9, 11, 13-17, 19, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Jayaram et al. (“FfDL: A Flexible Multi-tenant Deep Learning Platform”, ver.: arXiv:1909.06526v1, 09/2019) in view of Wang et al. (“Network-aware Placement of Virtual Machine Ensembles using Effective Bandwidth Estimation”, 2014).
Claims 1, 9, and 17:
Jayaram discloses the limitations as shown in the rejections below:
job scheduling apparatus (FfDL (Fabric for Deep Learning) platform) comprising: a communication interface (API service, REST and/or gRPC endpoint thereof) configured to receive a target job comprising n tasks (pods) (see at least pg. 1, Abstract; pg. 4, Fig. 1; pg. 4-5, § 3.1 - 3.2; pg. 9, § 5).
a processor (executing scheduler/lifecycle manager (LCM)) coupled to the receiver and configured to: perform node filtering in a node cluster based on the n tasks to obtain n candidate node sets (pg. 6, § 3.5, para. 2-3).
select, from an mth candidate node set corresponding to an mth task in the n tasks, a candidate node with a network transmission performance score (NTPS) that is the highest (highest rank) as a target node of the mth task, wherein the target node is for processing the mth task, wherein the NTPS is based on…a node leisure degree (maximize free resources/pack utilization) (pg. 5-6, § 3.4; pg. 6, § 3.5, para. 2-3). Exemplary quotation:
“we made a decision to use the Pack placement policy, where pods from a DL job are packed (“crammed”) into as few physical machines as possible. We implemented an extension to the K8S scheduler to support Pack. In the scenario outlined above, Pack would place all four jobs on the same machine, leaving three machines free with 4 GPUs/machine (pg. 5, § 3.4, para. 2)…scheduler matches the requested resource demands of all the pods in a DL job (e.g. CPU, memory, GPU, and storage) with the available resources on the nodes…scheduler assigns a node to the pod by (1) filtering the nodes that satisfy the pod resource requirements and other predicate constraints, (2) ranking the candidate nodes based on priority functions, and (3) selecting the node with the highest rank…Since in a DL platform, GPU is typically a scarce resource, the objective is to pack GPU resources. The default filtering and ranking steps are mapped onto node preferences (or biases) for placement of the pods” (pg. 6, § 3.5, para. 2-3).
Jayaram discloses (pg. 5, § 3.4; pg. 11) packing/cramming the tasks into as few nodes as possible when placing them (when the n tasks can all be placed in the candidate node), considering CPU, memory, and primarily GPU resources when scoring nodes for task placement, but does not describe considering task communication requirements and/or network load (cross-node degree/quantity) and accordingly does not specifically disclose the remaining limitations.
Wang, however, discloses (pg. 100, Abstract, § I, para. 2-4) an analogous placement scheme, "MAPLE", for placing VMs (tasks) of ensembles (jobs) that "seeks to minimize both the nominally allocated network bandwidth and the number of servers in which VMs are placed" (pg. 103, § V). In MAPLE, "VM placement decisions are then made taking into account the estimated available residual bandwidth at each server" (pg. 103, Fig. 3) (select the candidate node by determining a cross-node quantity of a candidate node (server/"subtree") in the mth candidate node set when the candidate node processes another job (VM ensemble) in an operating state (consuming bandwidth)). Wang further elaborates (pg. 103-104) that first "MAPLE attempts to allocate the entire VM ensemble placement request into a same server", so the VMs of the ensemble have a bandwidth requirement of zero (cross-node degree of the n tasks) (pg. 103, Fig. 4), and the candidate node/server preferentially selected (larger…NTPS) is the one with the relatively highest bandwidth utilization/smallest "residual bandwidth" (larger cross-node quantity). When the VMs of the ensemble cannot all be placed on the same server and need to be spread out, candidate servers with lower network utilization are preferred in proportion to the aggregate bandwidth needed for communication between the subgroups/individual tasks/VMs of the ensemble (smaller cross-node quantity indicates the larger NTPS) (Wang pg. 104, col. 1, para. 4 – pg. 105, col. 1, para. 1). ("When the algorithm cannot find any subtree that can host the entire VM ensemble, it attempts to allocate the requested VMs into different subtrees…whenever a VM request cannot be entirely placed into one subtree (case III), the requesting VMs will be divided into two groups. The aggregate bandwidths needed by each group is determined as the minimum aggregated bandwidths between the two groups").
It would have been obvious to one of ordinary skill in the art prior to the filing date of the invention to modify Jayaram’s placement policy with Wang’s network bandwidth aware placement scheme in order to “allocate computing and network resources in a manner that balances efficiency of resource utilization with performance predictability” and increase task throughput (Wang pg. 107, § VII; pg. 100, Abstract).
Claims 3, 11, and 19:
The combination of Jayaram/Wang discloses the limitations as shown in the rejections above. Wang further discloses that higher affinity between the n tasks indicates a higher NTPS (pg. 103, col. 2). Jayaram further discloses (pg. 4, § 3.1, para. 2; pg. 6, § 3.5, para. 2; pg. 14, § 6, para. 4) support for jobs which utilize the parameter server (PS) architecture comprising parameter server and learner (worker) type tasks ("DL training job typically consists of a set of learning processes ("learners")…A distributed job may also include one or more parameter servers"), and further discloses selecting the candidate node further comprises: determining a type (e.g. parameter server, learner, helper) of the mth task; and performing first steps…comprise: determining, when the type is a worker node task, whether another one of the n tasks (of the same job) needs to be placed in a candidate node in the mth candidate node set; increasing, when the worker node task or a parameter node task needs to be placed in the candidate node (can be packed into), the NTPS, in at least Jayaram pg. 5-6, § 3.4 – 3.6, disclosing the scheduler identifies all learner pods/tasks of the job that need to be placed and schedules them holistically as a group/gang such that they are placed into as few nodes as possible. Exemplary quotation:
“all components of a DL job are scheduled as a gang. In general, a DL job comprises a collection of Kubernetes sets (e.g. stateful sets), where each set is a collection of homogeneous pods where tasks, such as learners and parameter servers, run. In addition to pods, the DL job deployment…We will refer to a scheduler whose function is to place all pods that belong to a DL job onto nodes in the cluster holistically, as a gang scheduler.” (pg. 6, § 3.5).
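For illustration only, the gang (holistic) scheduling with a Pack preference discussed above can be sketched as follows; the function, its all-or-nothing return convention, and the slot-based capacity model are the Examiner's simplification, not Jayaram's implementation:

```python
def gang_schedule(pods, nodes):
    """Sketch of gang scheduling with a Pack preference: every pod of
    the DL job is placed, packed into as few nodes as possible, or the
    whole job is rejected (returns None).

    pods: list of per-pod resource demands; nodes: {name: free_capacity}.
    """
    placement, free = {}, dict(nodes)
    for pod in sorted(pods, reverse=True):
        # Prefer the fullest node (least free capacity) that still fits
        # the pod, i.e. the Pack ("cram") policy.
        fits = [n for n, cap in free.items() if cap >= pod]
        if not fits:
            return None  # gang semantics: all pods place together or none do
        target = min(fits, key=lambda n: free[n])
        placement.setdefault(target, []).append(pod)
        free[target] -= pod
    return placement
```

The sketch makes concrete the holistic aspect: placement of a later pod of the job can cause the entire job, including already-tentatively-placed pods, to be rejected.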
Claims 5, 13, and 21:
The combination of Jayaram/Wang discloses the limitations as shown in the rejections above. Jayaram further discloses wherein a lower node leisure degree (idleness) indicates a higher NTPS, and wherein the processor is further configured to: determine whether hardware resources that are of a candidate node in the mth candidate node set and that are used for job training are used; and increase, when the hardware resources are used, an NTPS of the candidate node, in at least pg. 5-6, § 3.4; pg. 6, § 3.5, para. 3; pg. 9, § 5.2, disclosing that when selecting a node for a pod/task their scheduler employs a "Pack placement policy, where pods from a DL job are packed ("crammed") into as few physical machines as possible", and thus prefers (assigns a higher rank/NTPS to) nodes whose hardware resources are used relative to nodes whose resources are idle, with the strongest preference for the node with the highest utilization that can still accommodate the pod.
Claims 6, 14, and 22:
The combination of Jayaram/Wang discloses the limitations as shown in the rejections above. Jayaram further discloses select the candidate node by: determining an allocation rate (utilization) of the hardware resources; and increase the NTPS based on the allocation rate, wherein a higher allocation rate indicates a larger increasing amplitude for the NTPS and a lower allocation rate indicates a smaller increasing amplitude for the NTPS, in at least pg. 5-6, § 3.4; pg. 6, § 3.5, para. 3; pg. 9, § 5.2, disclosing that when selecting a node for a pod/task their scheduler employs a "Pack placement policy, where pods from a DL job are packed ("crammed") into as few physical machines as possible", and thus has the strongest preference for the node with the highest utilization that can still accommodate the pod (a higher allocation rate indicates a larger increasing amplitude for the NTPS).
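For illustration only, the filter-then-rank Pack step mapped above can be sketched as follows; the function name, tuple layout, and the use of the raw allocation rate as the score are the Examiner's illustration, not Jayaram's scheduler extension:

```python
def rank_nodes_pack(nodes, pod_demand):
    """Sketch of a Pack ("cram") ranking step.

    nodes: list of (name, capacity, allocated). Nodes that cannot fit
    the pod are filtered out; the survivors are ranked so that a higher
    allocation rate (utilization) yields a higher score, i.e. a larger
    increasing amplitude for the node's rank.
    """
    feasible = [n for n in nodes if n[1] - n[2] >= pod_demand]
    # Score each node by its allocation rate: fuller nodes rank higher.
    return sorted(feasible, key=lambda n: n[2] / n[1], reverse=True)

# (name, GPU capacity, GPUs allocated); "c" is full and is filtered out,
# so the most-utilized feasible node "a" ranks first.
nodes = [("a", 8, 6), ("b", 8, 2), ("c", 8, 8)]
assert rank_nodes_pack(nodes, pod_demand=2)[0][0] == "a"
```

The sketch makes concrete the two-step behavior in the mapped quotation: infeasible nodes are removed at the filtering step, and the allocation rate only differentiates among nodes that can still accommodate the pod.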
Claims 7, 15, and 23:
The combination of Jayaram/Wang discloses the limitations as shown in the rejections above. Jayaram further discloses wherein the n tasks carry hardware resource requirements, wherein the method further comprises further performing the node filtering based on the hardware resource requirement, and wherein hardware resources of the n candidate node sets match the hardware resource requirements (pg. 6, § 3.5, para. 2-3): “scheduler matches the requested resource demands of all the pods in a DL job (e.g. CPU, memory, GPU, and storage) with the available resources on the nodes, finding a set of nodes on to which pods are placed…scheduler assigns a node to the pod by (1) filtering the nodes that satisfy the pod resource requirements and other predicate constraints.”
Claims 8 and 16:
The combination of Jayaram/Wang discloses the limitations as shown in the rejections above. Jayaram further discloses wherein the target job comprises a training job of an artificial intelligence (AI) (Deep Learning (DL)) model (pg. 1, Abstract).
Claims 2, 10, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Jayaram in view of Wang in further view of Gao et al. (“GAI: A Centralized Tree-Based Scheduler for Machine Learning Workload in Large Shared Clusters”, 2018).
Claims 2, 10, and 18:
The combination of Jayaram/Wang discloses the limitations as shown in the rejections above. The combination of Jayaram/Wang does not specifically disclose wherein a higher aggregation degree of the n tasks on the same rack indicates a higher NTPS, and wherein the processor is further configured to further select the candidate node by: determining whether the n tasks can all be placed on a rack on which a candidate node in the mth candidate node set is located.
Gao, however, discloses "Gatekeeper for AI (GAI), a centralized scheduler for ML workload on large shared clusters" (pg. 612, para. 4), analogous to the schedulers of the claims and Jayaram. Gao further discloses (pg. 617-619, § 4 – 4.1) that GAI employs rack-aware scheduling which prefers (indicates a higher NTPS) to schedule all the tasks of the same ML job to the same machine/server (node) or to the same rack, and thus discloses determining whether the n tasks can all be placed on a rack on which a candidate node in the mth candidate node set is located; and increasing, when the n tasks can all be placed on the rack, an NTPS of the candidate node (preferring such placements). Exemplary quotation:
“GAI uses a centralized rack-aware tree scheduling method and maintains a resource tree in memory to place all tasks of the ML training jobs in one machine or in the machines belong to the same rack as far as possible” (pg. 617, last para.)
For clarity, Examiner notes that preferring placement (increasing NTPS) on nodes that belong to the same rack as much as possible inherently teaches avoiding placement (decreasing NTPS) on nodes that belong to a rack that cannot accommodate all the tasks of the job as much as possible.
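For illustration only, the rack-aware preference (and its inherent converse) noted above can be sketched as follows; the function name, the base/bonus score values, and the slot-count rack model are the Examiner's illustration, not drawn from Gao:

```python
def rack_aware_score(node_rack, rack_free_slots, n_tasks,
                     base_score=1.0, bonus=0.5):
    """Sketch of GAI-style rack-aware scoring.

    If all n tasks of the job fit within the free slots of the node's
    rack, the node's score is increased (preferring that placement);
    otherwise the node keeps only the base score, which inherently
    deprioritizes racks that cannot host the whole job.
    """
    if rack_free_slots[node_rack] >= n_tasks:
        return base_score + bonus
    return base_score

free = {"rack1": 8, "rack2": 2}
assert rack_aware_score("rack1", free, n_tasks=4) == 1.5  # whole job fits
assert rack_aware_score("rack2", free, n_tasks=4) == 1.0  # rack too small
```

The sketch makes concrete the Examiner's inherency point: boosting rack-feasible nodes and leaving the rest unboosted is one and the same scoring decision.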
It would have been obvious to one of ordinary skill in the art prior to the filing date of the invention to modify Jayaram/Wang to increase the placement rank/score of nodes that allow all the tasks of an ML job to be scheduled to the same rack (and implicitly decrease the rank of those that do not), as taught by Gao, to decrease network communication overhead costs when running the ML job (Gao, pg. 616).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
US 20130014101 A1 is directed to techniques for optimal co-location of communicating clusters of VMs, very similar to Wang's approach (and corresponds to Wang reference [6]).
“A Traffic-Aware Virtual Machine Placement Method for Cloud Data Centers” and US 8392575 B1 are directed to network traffic aware task resource assignment methods.
Each of US 20090248865, US 20080225710, and US 20140143423 describes alternative metrics for measuring network load.
Any inquiry of a general nature or relating to the status of this application or concerning this communication or earlier communications from the Examiner should be directed to Paul Mills whose telephone number is 571-270-5482. The Examiner can normally be reached on Monday-Friday 11:00am-8:00pm. If attempts to reach the examiner by telephone are unsuccessful, the Examiner’s supervisor, April Blair can be reached at 571-270-1014.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/P. M./
Paul Mills
02/21/2026
/APRIL Y BLAIR/Supervisory Patent Examiner, Art Unit 2196
1 “the residual bandwidth of a server is calculated as (C − Reff), where Reff in this case refers to the corresponding empirically estimated effective bandwidth of the aggregated traffic currently transferred across the link” (Wang pg. 102, col. 1, last para)