DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Remarks
The present application having Application No. 18/185,641 filed on 03/17/2023 presents claims 1-20 for examination.
Examiner Notes
Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/15/2023 is acknowledged, the submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The applicant’s drawings submitted are acceptable for examination purposes.
Claim Objections
Applicant is advised that should claim 11 be found allowable, claim 16 will be objected to under 37 CFR 1.75 as being a substantial duplicate thereof. When two claims in an application are duplicates or else are so close in content that they both cover the same thing, despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim. See MPEP § 608.01(m).
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a communication scheduler to generate schedule trees…” and “data reduction logic to identify…” in claims 1-5 and 8-10. The terms “scheduler” and “logic” are generic, nonce terms that do not have a generally recognized structural meaning in the art and are each recited solely in terms of the functions they perform, without any recited structure in the claim itself.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
As discussed above, the “communication scheduler” and “data reduction logic” limitations in claim 1 are interpreted under 35 U.S.C. 112(f). However, the specification does not clearly define the scope of the corresponding structures and algorithms for these elements, nor does it clearly delineate which implementation of scheduling and reduction functionality are included within, or excluded from, the claims.
For the “communication scheduler,” the specification discloses that the scheduler 124 is configured to generate schedule trees…for scheduling collective communications, and provides example multi-tree schedulers in Figs. 2-4 and a high-level procedure 500 in Fig. 5. The description, however, does not specify what aspects of the disclosed scheduling behavior are required to fall within the claim—e.g., whether any multi-tree schedule is sufficient, whether single-tree or ring schedules are encompassed, or whether skew-aware generation of the schedule trees is essential. Similarly, for “data reduction logic,” the specification describes logic 126 only in functional terms (identify skewed nodes, perform first and second operations, trigger based on thresholds) and does not set a clear structural boundary.
Because the claims rely on generic nonce terms (“scheduler” and “logic”) whose scope depends entirely on ambiguous, function-only descriptions, a POSITA would not be able to determine with reasonable certainty what specific implementations of scheduling and reduction hardware/software fall inside the scope of “communication scheduler” and “data reduction logic” as claimed. The metes and bounds of the claims therefore cannot be ascertained with the clarity.
Claims 2-10 are also rejected for the same rationale discussed above with respect to claim 1, by virtue of their dependency to claim 1.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-10 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Under the 112(f) interpretation, “communication scheduler” in claim 1 covers the structures and acts in the specification that generates first and second sets of schedule trees to perform skew-tolerant multi-tree reductions as recited. The specification, however, provides only high-level, example-driven descriptions of schedule tree generation (e.g., 2x2 mesh in Figs. 2-3 and 16-node tree in Fig. 4) and the overall procedure 500 in Fig. 5. It does not set out a sufficient algorithm for generating schedule trees for topologies of skewed/non-skewed nodes as implied by the claims, nor does it specify concrete selection or optimization rules for routing, load balancing, or conflict resolution beyond general statements that the scheduler accounts for network topology and optimally schedules communications.
Similarly, “data reduction logic” is claimed broadly as any logic that can (i) identify one or more skewed nodes, (ii) perform a first operations on non-skewed nodes…, and (iii) perform a second operation to generate final results based on partial results and skewed-node data, with optional trigger behaviors based on thresholds. The specification describes one approach using bitmap all-reduce, a custom barrier, and threshold based triggering, but does not disclose underlying algorithms or control structures in sufficient detail to enable all implementations across the claimed breadth without undue experimentation. Nor does it demonstrate possession of all variations encompassed by the broad functional recitations, such as all possible probe operations, threshold strategies, or integrations rules for skewed-node data at different phases of the collective operation.
Accordingly, the specification does not reasonably convey to those skilled in the art that the inventor has possession of, or has enabled, the full scope of the “communication scheduler” and “data reduction logic” limitations as construed under 112(f), as required by 112(a).
Claims 2-10 are also rejected for the same rationale discussed above with respect to claim 1, by virtue of their dependency to claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6, 8-9 and 11-20 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Underwood et al. (US 2023/0035657 A1) (hereinafter Underwood) in view of Huang et al. (IDS: “Communication Algorithm-Architecture Co-Design for Distributed Deep Learning”2021 ACM/IEEE 48th Annual ISCA; DOI 10.1109/ISCA52012.2021.00023) (hereinafter Huang) and further in view of Faraj et al. (US 2013/0145379 A1) (hereinafter Faraj).
As per claim 1, A system (e.g. Underwood: [Fig. 1] [0023] discloses a distributed computing system including a plurality of noes, with each compute node including CPU coupled to a NIC.) comprising: a communication scheduler to generate schedule trees for scheduling data communication among a plurality of nodes configured to perform a collective operation using data contributed from the plurality of nodes; and data reduction logic (e.g. Underwood: [Fig. 3] [0037] illustrates exemplary allreduce operation where four node FAA in which each input buffer is decomposed into eight transfers. The FAA can simultaneously send the first two blocks from every node, i.e., no nodes 312-318, to node 312; the second two blocks from every node to node 314; the third two blocks from every node to node 316; and so forth. The phase denoted as reduce scatter phase in which the FAA can form partial results or intermediate results on each node. [Fig. 4] [0038] illustrates an exemplary allreduce operation with segmentation and distribution of data segments among compute nodes in a network… In the FAA, each of the N nodes… can compute a final result for 1/Nth of the input array. The FAA divides the input array into N segments, i.e., segments 410-416…” This shows how data are distributed among nodes. Although it refers to them as FAA patterns rather than “schedule trees,” it implies that these per-segment, structured, multi-hop patters are tree-like schedules. Also see [Fig. 2] [0032-0033]. These passages show that Underwood’s NIC-based logic organizes and sequences data transfers among multiple nodes for a collective allreduce operation using data from all nodes. The collective logic (in NIC and software) defines which segments each node sends and receives, thereby scheduling data communication among a plurality of nodes configured to perform a collective allreduce operation using data contributed form the plurality of nodes. ) to: identify one or more skewed nodes among the plurality of nodes, perform, according to a first set of schedule trees, a first operation to generate partial results based on data contributed from non-skewed nodes (e.g. Underwood: [0014] discloses Allreduce is a collective operation in which compute nodes contribute partial results. [Fig. 2] [0032] discloses data segments are transferred/contributed among nodes according to allreduce algorithm, each node creates partial result, and the partial results from all the nodes continue up the tree. [Fig. 3] [0037] “FIG. 3 illustrates an exemplary allreduce operation… The example shown in FIG. 3 illustrates a FAA… The FAA can simultaneously send the first two blocks from every node… This phase is denoted as reduce scatter phase 304 in which the FAA can form a quarter of the partial results or intermediate results 306 on each node.” FIG. 3 labels “Intermediate results 306: partially reduced data.” Also see [0064-0072]. Thus, Underwood’s reduce-scatter phase corresponds to a first operation that generates partial results from data contributed by the nodes.), and perform, according to a second set of schedule trees, a second operation to generate final results based on the partial results and data contributed from the one or more skewed nodes (e.g. Underwood: [Fig. 3] [0037] discloses after reduce scatter phase in which the FAA can form partial results on each node, performing Allgather operation which can result in output buffers including full result. Underwood’s Allgather operation is a second operation that generates the final results based on previously generated partial results using data segments transferred/contributed by the nodes. [0014] discloses Allreduce is a collective operation in which every compute node contributes a partial result and the allreduce can combine the results and distribute them to all the participating processes. [0032-0033] discloses data segments are transferred/contributed among nodes according to allreduce algorithm, each node creates partial result, and the partial results from all the nodes continue up the tree to form a full result.).
Underwood teaches a system for transferring data segments via structured communication patterns among nodes for the collective operation, a NIC-based allreduce with a first operation generating partial results and a second operation generating final results as discussed above. Underwood does not expressly disclose generating “schedule trees”; identify one or more skewed nodes among the plurality of nodes.
Underwood teaches scheduling data communication among nodes for allreduce but does not explicitly disclose “generating schedule trees” including “a first set of schedule trees” and “a second set of schedule trees.”
However, Huang discloses generating schedule trees for scheduling data communication among a plurality of nodes configured to perform a collective operation using data contributed from the plurality of nodes; schedule trees including a first set of schedule trees and a second set of schedule trees (e.g. Huang: [Page 184, Fig. 3 and related description] [Page 185, Col. 1] “constructs schedule trees for a Mesh network as shown in Fig. 3…These newly constructed trees are used to build the reduce-scatter schedule trees and finally, are adjusted to generate all-gather schedule trees, as shown in Fig. 3d and 3e, respectively.” “Multitree all-reduce algorithm initializes a tree for each node in the network as the root…then it stats to construct the schedule trees for the all-gather phase…For every new time step t, a full topology graph…These newly constructed trees are used to build the reduce-scatter schedule trees and finally, are adjusted to generate all-gather schedule trees.” These passages show that Huang’s MULTITREE algorithm generates schedule trees, including Reduce-scatter schedule trees [a first set of schedule trees] and All-gather schedule trees [a second set of schedule trees] (See Fig. 3), based on graph, and schedules communications along these trees across time steps. Huang discloses a scheduler (MULTITREE algorithm) that constructs schedule trees and per-node schedule table in the network interface, which are used to schedule all-reduce communication among nodes. This corresponds to a “communication schedule to generate schedule trees for scheduling data communication among a plurality of nodes configured to perform a collective operation using data contributed from the plurality of nodes. [Page 186, Col. 2] Huang then describes NI-resident schedule tables corresponding to these schedule trees: We co-design the network interface (NI) to facilitate MULTITREE all-reduce scheduling. Algorithm 1 constructs trees for each data chunk. These tree schedules can be converted into schedule tables (one table per node). Fig. 5 shows the all-reduce schedule tables for the example in §III-B. Each table entry consists of an Op field for the opcode, a FlowID field for the tree flow (tree ID), a Parent and Children fields for the dependencies in this tree flow. In addition, the Step field indicates the time step in which this communication should be initiated. The Start Addr and Size fields are for the starting address and the size for the gradient message, respectively.” The Op field includes “Reduce, Gather, and NOP,” and the step filed defines when to initiate each communication, while FlowID, Parent, and Children define the tree structure.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Huang’s Multitree allreduce algorithm to generate schedule trees and network interface schedule tables into Underwood’s NIC-based allreduce system because Huang’s MULTITREE’s topology-aware multi-tree scheduling and network interface co-design improves scalability and achieves higher data communication speed between nodes over the state-of-the-art approach (See Huang: [Page 182, Col. 2] [Abstract]). Both references address all-reduce in distributed deep learning, and Huang even calls out ring and double binary tree allreduce (similar to Underwood’s context), so using Huang’s schedule trees as Underwood’s “communication scheduler” is a natural co-design.
The combination of Underwood and Huang does not expressly disclose “identify one or more skewed nodes among the plurality of nodes.”
However, Faraj discloses identify[ing] one or more skewed nodes among the plurality of nodes (e.g. Faraj: [0013] [0037] [0040] Methods, apparatus, and products for determining collective barrier operation skew in a parallel computer are disclosed in this specification. The parallel computer includes a plurality of compute nodes organized into an operational group and determining collective barrier operation skew includes an iterative process carried out once for each of the compute nodes until each compute node has been selected as a delayed node. The iterative process includes: selecting one of the plurality compute nodes as a delayed node; entering, by each compute node other than the delayed node, a collective barrier operation; entering, after a predetermined delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time, where the barrier completion time is the period of time between the delayed node entering the barrier and receiving the exit signal. Once each compute node in the operational group has measured a barrier completion time, the barrier operation skew may be calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum barrier completion time and the minimum barrier completion time. Thus, nodes that consistently have longer completion times are treated as “delayed nodes” are effectively skewed nodes and nodes that complete quickly are non-skewed. The nodes that are not identified as delayed nodes are implicitly considered as non-skewed nodes. Skew information is intended to influence how collective operations are implemented.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of identifying delayed nodes (skewed nodes) and non-skewed nodes as taught by Faraj into the combination of Underwood and Huang because Faraj explicitly teaches measuring skew and modifying collective code in dependence on the skew (See Faraj: [0061]). All three references are in the HPC/distributed training domain and seek to improve collective performance. A POSITA would naturally use Underwood’s NIC-based two-phase allreduce as the base algorithm, adopt Huang’s MULTITREE schedule trees and NI schedule tables, and incorporate Faraj’s skew measurement to identify skewed nodes which would allow determining which nodes participate in which phase (letting non-skewed nodes participate in the first phase and integrating skewed nodes in a second phase), thereby handling skew in multi-tree allreduce.
As per claim 2, the combination of Underwood, Huang and Faraj discloses The system of claim 1 [See rejection to claim 1 above], wherein the data reduction logic is configured to identify a skewed node based on a determination that the skewed node is unavailable to contribute data according to the first set of schedule trees (e.g. Faraj: [0035] Barrier operation skew…refers to an amount of time between a first completion…and last completion…In a non-homogenous system, compute nodes may take different amounts of time to enter and complete a collective barrier operation. [0037] discloses delayed node enters the barrier later compare to other nodes. This implies that delayed node is not available to contribute when non-skewed nodes (nodes other than delayed nodes) are. The delayed node concept corresponds to a node that is not yet at the barrier when others have entered it, effectively implies delayed nodes are unavailable to provide data when others are ready.).
As per claim 3, the combination of Underwood, Huang and Faraj discloses The system of claim 1 [See rejection to claim 1 above], wherein the data reduction logic is to perform the first operation to generate the partial results without the data contributed from the one or more skewed nodes (e.g. Underwood: [0032-0033] discloses each participating nodes provide partial result. [0037] discloses reduce scatter phase in which the partial results or intermediate results of allreduce operation are generated. [0070] in the first phase the FAA can compute partial results. Furthermore, Faraj’s skew detection that identifies delayed nodes and Underwood’s two-phase allreduce. A POSITA would have found it obvious to allow non-skewed nodes to proceed in the first phase without waiting for delayed or skewed nodes.).
As per claim 4, the combination of Underwood, Huang and Faraj discloses The system of claim 1 [See rejection to claim 1 above], wherein the communication scheduler is to generate the second set of schedule trees based on identifying the one or more skewed nodes (e.g. Huang: [Page 184, Col. 2, Fig. 3 and related description] [Page 185] discloses MULTITREE algorithm constructs schedule trees form a topology graph G(V,E), and that the algorithm can be adapted for asymmetric or irregular networks by prioritizing trees with “larger remaining height” so that communication on the longest path is scheduled earlier. Huang explains that the algorithm ca be run when a new set of nodes is allocated for the workloads, and that the topology graph can be extended to handle variations. [Fig. 3 (d),(e)] discloses generating reduce-scatter schedule trees and All-gather schedule trees. Faraj: [0013] [0037] [0040] discloses determining collective barrier operation skew to identify delayed node with a maximum barrier completion time. [0061] discloses modifying code that when executed carries out the collective barrier operation in dependence upon the barrier operation skew. A POSITA, combining these teachings, would find it obvious to use the skew information (which identifies skewed nodes) as part of the input when constructing the second set of schedule trees (Huang’s all-gather schedule trees) so that these trees account for known slow or unavailable nodes.).
As per claim 5, the combination of Underwood, Huang and Faraj discloses The system of claim 1 [See rejection to claim 1 above], wherein to perform the second operation according to the second set of schedule trees the data reduction logic is further configured to distribute the final results to the plurality of nodes (e.g. Underwood: [0014] discloses every compute node contributes a partial result and the allreduce can combine the results and distribute them to all the participating processes/nodes. [0032-0033] [Fig. 2] discloses RA ensures that full result is stored on each node in the distributed computing system. [Fig. 3] [0037] discloses Allgather operation which results in outputting full result on every node. [Fig. 4] [0038] In the FAA, each of the N nodes, can compute a final result and broadcast the results to every other node. Underwood’s Allreduce clearly performs a second operation (Allgather) that distributes final results to all nodes. Also see [0050] [0064-0065]. Huang: [Page 184, Fig. 3, (e)] also discloses generating All-gather schedule trees (broadcast). Huang’s all-gather schedule trees are explicitly designed for broadcast all-gather to all nodes, and its network interface schedule tables’ gather operations send messages from parent to children until all nodes receive the final result. Thus, the feature is taught by Underwood’s allgather combined with Huang’s all-gather trees and NI schedule tables.).
As per claim 6, the combination of Underwood, Huang and Faraj discloses The system of claim 1 [See rejection to claim 1 above] , further comprising a network interface controller configured to perform a probe operation and the data reduction logic identifies the one or more skewed nodes based on the probe operation (e.g. Underwood: [0019] discloses a system and method for a novel implementation of collective operation by providing novel NIC enhancements to accelerate bulk data allreduce algorithm and a broad class of bulk data collective operations. [0021] NIC-based implementation manages sequencing of operations and optimizes the performance of allreduce algorithm. Faraj: [0013] [0035] [0037] [0040] [0055] discloses a more explicit “probe” procedure to characterize nodes’ timing behavior at a collective. Faraj’s iterative method for determining barrier operation skew (selecting delayed node) and measuring completion times. This repeated collective barrier plus measurement is effectively a probe of each nodes’ timing characteristics in the collective. It would have been obvious to implement Faraj’s skew-measurement procedure as a NIC-assisted “probe operation,”, where the NIC participates in the barrier and timing measurement to determine or identify the skewed nodes.).
As per claim 8, the combination of Underwood, Huang and Faraj discloses The system of claim 1 [See rejection to claim 1 above], wherein the data reduction logic is further configured to trigger performance of the second operation in response to a determination that at least one of the one or more skewed nodes is available to contribute data according to the second set of schedule trees (e.g. Underwood: [0027] discloses trigger operations can provide ability to setup network operations that can synchronously trigger with the completion of other network operations. [0031] the system can leverage triggered operations to pace (add flow control to) operations across all participants (nodes) in the network, thereby enabling the novel allreduce algorithm. [0042] a triggered operation is an operation that is scheduled to occur when a counting event reaches a threshold. [0043] discloses the system can apply the following triggered operations: TriggerPUT to notify operation has completed; TriggeredAtomic operation to send the input data; and TriggeredPut notification to disseminate the final result. [0045] when the system identifies an arrival of a “put” operation from a node, the system may release corresponding PTLTriggeredAtomic to Node. When the system receives all inputs, the system can apply a PtlTriggeredPut to send the combined result in the segment to every other node. Also see [0050]. As discussed above Faraj teaches measuring completion time to determine skew. Underwood demonstrates that NIC logic can trigger operations based on completion threshold and disseminating final results; Faraj provides the skew-related condition; and Huang provides the second set of schedule trees that integrates the results in Allgather stage. The combination yields the claimed triggering behavior.) .
As per claim 9, the combination of Underwood, Huang and Faraj discloses The system of claim 1 [See rejection to claim 1 above], wherein the data reduction logic is further configured to trigger performance of the second operation in response to a determination that all the one or more skewed nodes are available to contribute data according to the second set of schedule trees (e.g. Underwood: [0014] Allreduce is a collective operation in which every node contributes a partial result and the allreduce can combine the results. [0027] discloses trigger operations can provide ability to setup network operations that can synchronously trigger with the completion of other network operations. [0031] the system can leverage triggered operations to pace (add flow control to) operations across all participants (nodes) in the network, thereby enabling the novel allreduce algorithm. [0032-0033] [0037] discloses obtaining partial results from all the nodes and then generating full result. [0043] discloses the system can apply the following triggered operations: TriggerPUT to notify operation has completed; TriggeredAtomic operation to send the input data; and TriggeredPut notification to disseminate the final result. [0045] when the system identifies an arrival of a “put” operation from a node, the system may release corresponding PTLTriggeredAtomic to Node. When the system receives all inputs, the system can apply a PtlTriggeredPut to send the combined result in the segment to every other node. Huang: [Figs. 1, 3 and related description] also discloses performing All-gather phase after reduce-scatter phase where data from all nodes are received.).
As per claims 11-15, these are method claims having similar limitations as cited in system claims 1-5, respectively. Thus, claims 11-15 are also rejected under the same rationale as cited in the rejection of rejected claims 1-5, respectively.
As per claims 16-20, these are method claims having similar limitations as cited in system claims 1-5, respectively. Thus, claims 16-20 are also rejected under the same rationale as cited in the rejection of rejected claims 1-5, respectively.
Claim 7 is rejected under AIA 35 U.S.C. 103 as being unpatentable over Underwood in view of Huang and Faraj and further in view of Meher Aditya Kumar Addepalli (US 12,073,253 B1) (hereinafter Addepalli).
As per claim 7, the combination of Underwood, Huang and Faraj discloses The system of claim 6 [See rejection to claim 6 above], but does not expressly disclose wherein the network interface controller is to update, based on the probe operation, a bitmap that includes a bit for each of the plurality of nodes, wherein the bit is indicative of whether a corresponding node is available to contribute data for the collective operation.
However, Addepalli discloses wherein the network interface controller is to update, based on the probe operation, a bitmap that includes a bit for each of the plurality of nodes, wherein the bit is indicative of whether a corresponding node is available to contribute data for the collective operation (e.g. Addepalli: [Fig. 1] [Col. 5, lines 64-67; Col. 6, lines 1-15] discloses maintaining a bit map where the state of the bit indicates whether the corresponding computing resource (node) is available or not. In some embodiment, a “0” state of a bit indicates that the managed resource (node) is available. [Col. 2, lines 1-6] discloses using a bitmap to manage computing resources by maintaining the current states of the computing resources in the bitmap. Also see [Col. 7, lines 9-25] [Fig. 2] [Col. 8, lines 48-58].).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of maintaining bitmap where bit indicates status of the computing resource as taught by Addepalli into the combination of Underwood, Huang and Faraj because it improves efficiency for managing computing resources. Bitmap provides ways to manage computing resources in an efficient manner (See Addepalli: [Col. 1, lines 48-50] [Col. 8, lines 48-58]).
Claim 10 is rejected under AIA 35 U.S.C. 103 as being unpatentable over Underwood in view of Huang and Faraj and further in view of Chakravorty et al. (US 2012/0096468 A1) (hereinafter Chakravorty).
As per claim 10, the combination of Underwood, Huang and Faraj discloses The system of claim 1 [See rejection to claim 1 above], wherein the data reduction logic is further configured to trigger performance of the first operation in response to a determination that at least a threshold number of the plurality of nodes are non-skewed (e.g. Underwood: [0002] [0015] discloses increasing number of GPUs are required to perform collective operations in parallel. [0040] discloses FAA segments the data based on the number of nodes or size constraints when the node count exceeds a specific threshold. [0042] discloses a triggered operation is an operation that is scheduled to occur when a counting event reaches a threshold. Also see [0047] [0051].).
The combination does not expressly disclose to trigger performance of the first operation in response to a determination that at least a threshold number of the plurality of nodes are non-skewed.
However, Chakravorty discloses to trigger performance of the first operation in response to a determination that at least a threshold number of the plurality of nodes are non-skewed (e.g. Chakravorty: [0002] [0005-0010] discloses specifying a maximum and a minimum number [threshold] of resources for a job. The schedulers queues jobs as they are received and assigns the jobs to resources as the resources become available. When the cluster contains enough resources to meet the minimum requirements [threshold] of the job, assigning the resources to that job. [0023] discloses when total compute resources are inadequate to enable allocation of a designated minimum for all jobs ready for execution, some jobs may be queued until resources to meet their minimum requirements are available. [0039] discloses parameters include a minimum number of nodes to allocate to the jobs. Allocator does not allocate nodes to job unless a number of nodes meeting the designated minimum [threshold number of available nodes] are available for allocation to the job. Also see [0042] [0054] [0056]. Thus, execution of job only triggers when a minimum number of resources designated for the job is available.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of determining whether a minimum number of nodes are available to execute job as taught by Chakravorty into the combination of Underwood, Huang and Faraj because it would allow allocator to determine an appropriate, user desired number of nodes to assign to each job scheduled for execution according to SLA, prior to initiating execution of the job and also allow prioritizing jobs based on available and a minimum required resources (See Chakravorty: [0041-0042]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hiren Patel whose telephone number is (571) 270-3366. The examiner can normally be reached on Monday-Friday 9:30 AM to 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) Form at https://www.uspto.gov/patents/uspto-automated- interview-request-air-form.
If attempts to reach the above noted Examiner by telephone are unsuccessful, the Examiner’s supervisor, April Y. Blair, can be reached at the following telephone number: (571) 270-1014. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from Patent Center and the Private Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from Patent Center or Private PAIR. Status information for unpublished applications is available through Patent Center or Private PAIR to authorized users only. Should you have questions on access to Patent Center or the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
February 6, 2026
/HIREN P PATEL/Primary Examiner, Art Unit 2196