Prosecution Insights
Last updated: April 19, 2026
Application No. 17/945,290

DYNAMIC CROSS-ARCHITECTURE APPLICATION ADAPTION

Final Rejection — §103

Filed: Sep 15, 2022
Examiner: AKBARI, FARAZ TIMA
Art Unit: 2196
Tech Center: 2100 — Computer Architecture & Software
Assignee: Intel Corporation
OA Round: 2 (Final)

Grant Probability: 0% (At Risk)
Expected OA Rounds: 3-4
Time to Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (grants 0% of cases; 0 granted / 2 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift; based on resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Total Applications: 38 across all art units (36 currently pending)

Statute-Specific Performance

§101: 13.0% (-27.0% vs TC avg)
§103: 71.2% (+31.2% vs TC avg)
§102: 1.1% (-38.9% vs TC avg)
§112: 14.7% (-25.3% vs TC avg)

Deltas are relative to the Tech Center average estimate • Based on career data from 2 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This office action is in response to Applicant’s amendment filed 1/26/2026. Claims 1-29 are pending.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-29 are rejected under 35 U.S.C. 103 as being unpatentable over Ramanathan et al. (US 20220116289 A1) in view of Phull et al. (US 20120233486 A1), hereinafter referred to as Ramanathan and Phull, respectively.

Regarding Claim 1, Ramanathan discloses A non-transitory machine-readable medium storing instructions, which when executed by a processor cause the processor to ([0093] In an example, the instructions 782 provided via the memory 754, the storage 758, or the processor 752 may be embodied as a non-transitory, machine-readable medium 760 including code to direct the processor 752 to perform electronic operations. Please note the non-transitory machine-readable medium 760 including code to direct the processor 752 to perform electronic operations corresponds to Applicant’s non-transitory machine-readable medium storing instructions executed by a processor and causing the processor to carry out the operations.): wherein the workload comprises a high-performance computing (HPC) or artificial intelligence (AI) workload ([0001] Components that can perform edge computing operations (“edge nodes”) can reside in whatever location needed by the system architecture or ad hoc service (e.g., in a high performance compute data center). Please note the edge nodes residing in a high performance compute data center corresponds to Applicant’s workload comprising an HPC workload, as it inherently means the nodes are performing HPC operations.); for a portion of the section currently executing on a compute resource of a plurality of heterogeneous compute resources of a node of the cluster compute system, determine an alternate placement among the plurality of heterogeneous compute resources ([0109] A node instance auto-scale operation may automatically scale worker nodes from a pool of hetero or homogenous servers in an edge cluster.; [0134] The edge node instance auto-scaler may run a series of checks to identify nodes that are removable. Application instances may be identified that may be relocated to other nodes. The edge node instance auto-scaler may use criteria such as first selecting nodes not running any application instances for removal, or selecting a node with a minimum number of application instances. Other nodes may be identified to which these instances may be migrated, for example with minimum cost. Please note that the edge node instance auto-scaler identifying application instances of nodes that may be migrated with minimum cost corresponds to determining an alternate placement among the plurality of compute resources.
Since the nodes can be scaled from a pool of heterogeneous servers in an edge cluster, this corresponds to the plurality of compute resources of the node of the cluster computer system being heterogeneous.); and after predicting an improvement to the FOM based on the alternate placement, relocate the portion to the alternate placement ([0137] The edge cluster auto-scaler or node instance autoscaler supports migrating container workloads from (e.g., least loaded) nodes, for example with consideration for high availability, to the other nodes in the cluster that have capacity to handle those workloads. Please note that migrating container workloads from least loaded nodes to other nodes in the cluster to handle those workloads corresponds to relocating the portion to the alternate placement, as the section of the workload identified as significant to the FOM of computation time as subsequently disclosed by Phull could be relocated to another node in the cluster to improve this computation time by having a node that has greater capacity to handle it.).

Ramanathan does not explicitly disclose during execution of a workload on a cluster computer system: identify a section of the workload as significant to a figure of merit (FOM) of the workload. However, Phull discloses during execution of a workload on a cluster computer system: identify a section of the workload as significant to a figure of merit (FOM) of the workload ([0055] the workload/data is divided based on the compute capabilities of the processing units involved.; [0056] As such, in accordance with aspects of the present principles, a dynamic data partitioning scheme can be employed, where a run-time balancer analyzes the discrepancy in the computation and communication patterns of different MPI processes and directs the repartition of the data set accordingly. After running for a fixed number of iterations, the slave processes send their own timing profiles (computation and communication time) to the master process. The master process observes the computation and communication pattern for each process and suggests a new partitioning ratio to balance the computation across the processes to achieve optimal performance. Please note that the run-time balancer analyzing the computation and communication patterns of different MPI processes in order to divide the workload into partitions based on the compute capabilities of the processing units involved corresponds to identifying a section of the workload as significant to a FOM of the workload during execution of a workload on a cluster computer system, as while the workload is being executed by the processes, it identifies a partition, i.e., a section of the workload, that is significant to improving the computation time of each process. Since Applicant states in [0021] that “Typically, the FOM of a workload is expressed in terms of time (e.g., latency and/or time to complete a particular function or set of activities),” the time of computation being improved corresponds to being significant to a FOM of the workload, as computation time corresponds to the time to complete the set of activities of the workload.); Ramanathan and Phull are both considered to be analogous to the claimed invention because they are in the same field of HPC cluster node performance improvement.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Ramanathan to incorporate the teachings of Phull to modify the HPC workload alternate placement relocation system to identify a section of the workload as significant to a FOM of the workload, allowing for improvement of system efficiency and performance by partitioning the workload optimally based on available resources of nodes, as described in Phull.

Regarding Claim 2, for Ramanathan-Phull as described in Claim 1, Ramanathan further discloses determination of an alternative placement includes predicting a best placement of the portion among the plurality of heterogeneous compute resources based on one or more of: a predicted increase or decrease in data transfer via interconnects coupling the plurality of heterogeneous compute resources; a predicted increase or decrease in memory access cost; a predicted increase or decrease in compute efficiency; parallel efficiency; a number of concurrent contexts allowed; a predicted increase or decrease of power utilization; a predicted increase or decrease in thermal behavior; a predicted increase or decrease in scheduling schemes; a predicted increase or decrease in concurrency; and other architectural features of the node ([0023] The systems and methods described herein may be used to provide power considerations while scaling application instances' resources (e.g., scaling up or down). Please note that providing power considerations while scaling application instances’ resources up corresponds to Applicant’s determination of an alternative placement including prediction of a best placement of the portion among the plurality of heterogeneous compute resources based on a predicted decrease of power utilization, as the scaling system would consider the decrease of power utilization in its relocation or alternative placement. Note that since Applicant states “one or more of”, Examiner is interpreting “based on a predicted increase or decrease of power utilization” as fulfilling the requirements of the claim.).

Regarding Claim 3, for Ramanathan-Phull as described in Claim 1, Phull further discloses identification of the section of the workload is based on an annotation, a string, an interrupt, or a profiling control contained within a binary representation of the workload that identifies a beginning or an end of the section ([0057] With reference to FIG. 3, the repartition can assign larger data blocks 302.sub.2 and 302.sub.4 (as compared to blocks 210.sub.2 and 210.sub.4) to processor nodes 202 and 204, respectively, and can assign smaller data blocks 302.sub.6 and 302.sub.8 (as compared to blocks 210.sub.6 and 210.sub.8) to processor nodes 206 and 208, respectively. Please note that, referencing Fig. 3, the workload has been partitioned into distinct data blocks 302.sub.2, 302.sub.4, 302.sub.6, and 302.sub.8, meaning that internally to the system, there is inherently a means by which the beginnings and ends of these sections are identified, corresponding to the identification of the section of the workload being based on an annotation.).
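To make the disputed Phull scheme concrete, here is a minimal sketch of the repartitioning logic the rejection attributes to Phull [0056]: workers run a fixed number of iterations, report timing profiles, and a master proposes new partition fractions that balance computation time. The function name and the equal-initial-partition assumption are illustrative, not from the references.

```python
def propose_partition_ratios(timing_profiles: dict[int, float]) -> dict[int, float]:
    """Given per-worker computation times for equal-sized partitions, return
    new partition fractions inversely proportional to observed time, so
    faster workers receive proportionally larger data blocks."""
    # Effective speed of each worker is roughly 1 / observed computation time.
    speeds = {rank: 1.0 / t for rank, t in timing_profiles.items()}
    total = sum(speeds.values())
    return {rank: s / total for rank, s in speeds.items()}

# Example: worker 2 runs twice as fast as workers 0 and 1, so the next
# repartition assigns it a correspondingly larger share of the data.
profiles = {0: 4.0, 1: 4.0, 2: 2.0}  # seconds per iteration block
print(propose_partition_ratios(profiles))  # {0: 0.25, 1: 0.25, 2: 0.5}
```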
Regarding Claim 4, for Ramanathan-Phull as described in Claim 1, Phull further discloses capture information indicative of a behavior of the workload, wherein identification of the section of the workload is based on the behavior ([0055] the workload/data is divided based on the compute capabilities of the processing units involved.; [0056] As such, in accordance with aspects of the present principles, a dynamic data partitioning scheme can be employed, where a run-time balancer analyzes the discrepancy in the computation and communication patterns of different MPI processes and directs the repartition of the data set accordingly. Please note that carrying out dynamic data partitioning of the workload where the run-time balancer analyzes the computation patterns of the MPI processes and directs the repartition of the data set accordingly corresponds to Applicant’s capturing information indicative of a behavior of the workload, wherein identification of the section of the workload is based on the behavior, as the partitions of the workload correspond to the sections of the workload, which are identified based on the captured behavior of the MPI processes associated with the workload.).

Regarding Claim 5, for Ramanathan-Phull as described in Claim 4, Phull further discloses the behavior comprises execution of the section meeting or exceeding a measure of work done or a repetition threshold ([0056] As such, in accordance with aspects of the present principles, a dynamic data partitioning scheme can be employed, where a run-time balancer analyzes the discrepancy in the computation and communication patterns of different MPI processes and directs the repartition of the data set accordingly. For example, assume P0 is a master process with P1, P2 and P3 as slave processes. After running for a fixed number of iterations, the slave processes send their own timing profiles (computation and communication time) to the master process. The master process observes the computation and communication pattern for each process and suggests a new partitioning ratio to balance the computation across the processes to achieve optimal performance. Please note that carrying out dynamic data partitioning of the workload where the slave processes are run for a fixed number of iterations after which the master process observes the pattern for each process and suggests a new partitioning ratio corresponds to Applicant’s behaviors comprising execution of the section meeting a repetition threshold, as the fixed number of iterations each slave process is run for corresponds to meeting a repetition threshold, and the pattern of the process corresponds to the behavior of the execution of the section.).

Regarding Claim 6, for Ramanathan-Phull as described in Claim 4, Phull further discloses capturing of information indicative of the behavior of the workload is performed via one or more of function interposition, event queries, hardware counters, dynamic instruction count, and other measures of work ([0056] After running for a fixed number of iterations, the slave processes send their own timing profiles (computation and communication time) to the master process. The master process observes the computation and communication pattern for each process and suggests a new partitioning ratio to balance the computation across the processes to achieve optimal performance.
Please note that the slave processes sending timing profiles (computation and communication time) to the master process for the master process to observe the pattern and suggest a new partitioning ratio corresponds to Applicant’s capturing of information indicative of the behavior of the workload being performed via hardware counters, as it is known in the art that hardware counters measure time intervals in computer systems, and therefore one would need to be employed to capture a timing profile containing the computation time that is indicative of the behavior of the workload. Note that since Applicant states “via one or more of” different measures of work, Examiner is interpreting “being performed via hardware counters” as fulfilling the requirements of the claim.).

Regarding Claim 7, for Ramanathan-Phull as described in Claim 6, Phull further discloses wherein the information indicative of the behavior includes existence or non-existence of dependencies between the portion and one or more other portions of the section ([0055] the workload/data is divided based on the compute capabilities of the processing units involved. One way to accomplish this is to characterize the cluster by profiling it statically and generating a map of relative computation power for the different nodes involved, and then using this information for generating data partitions […] Second, in the case of multi-tenancy where applications share resources in the cluster, it would be difficult to predict the execution time of an application statically. Please note that profiling the cluster to divide the workload by generating a map of relative computation power for the different nodes involved and using that information for generating data partitions, in the case of multi-tenancy where applications share resources in the cluster, corresponds to Applicant’s information indicative of the behavior including existence of dependencies between the portion and other portions of the section, as the nodes involved correspond to the portions of the section and the shared resources between their applications correspond to the existence of dependencies between the portions of the section, as they would depend on the same resources, and therefore this would need to be considered in determining the behavior of the workload.).

Regarding Claim 8, for Ramanathan-Phull as described in Claim 6, Phull further discloses wherein the information indicative of the behavior includes whether the portion exhibits data locality ([0055] the workload/data is divided based on the compute capabilities of the processing units involved. One way to accomplish this is to characterize the cluster by profiling it statically and generating a map of relative computation power for the different nodes involved, and then using this information for generating data partitions […] a cluster of heterogeneous CPUs with different memory bandwidths, cache levels and processing elements. Please note that the cluster of heterogeneous CPUs having different memory bandwidths being a factor that must be considered when profiling the processes to determine partitions of the workload corresponds to Applicant’s information indicative of the behavior including whether the portion exhibits data locality, as whether a portion exhibits data locality affects the memory bandwidth it requires, and therefore the fact that memory bandwidth must be considered in the dynamic data partitioning scheme means that data locality is inherently included as part of the memory bandwidth analysis issue that is solved.).
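The Claims 4-6 mapping turns on capturing per-section timing behavior against a repetition threshold. Below is a hedged sketch of that idea using software timers as a stand-in for the hardware counters the rejection reads into Phull; names such as profile_section and REPETITION_THRESHOLD are assumptions for illustration, not from the record.

```python
import time
from collections import defaultdict

REPETITION_THRESHOLD = 100  # iterations before a section is even considered
_sections = defaultdict(lambda: {"count": 0, "total_time": 0.0})

class profile_section:
    """Context manager that accumulates per-section timing behavior."""
    def __init__(self, name: str):
        self.name = name
    def __enter__(self):
        self.start = time.perf_counter()
        return self
    def __exit__(self, *exc):
        rec = _sections[self.name]
        rec["count"] += 1
        rec["total_time"] += time.perf_counter() - self.start
        return False

def significant_sections() -> list[str]:
    """Sections that met the repetition threshold, ranked by total time
    consumed (a stand-in for 'significant to the figure of merit')."""
    hot = {n: r for n, r in _sections.items() if r["count"] >= REPETITION_THRESHOLD}
    return sorted(hot, key=lambda n: hot[n]["total_time"], reverse=True)
```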
Regarding Claim 9, for Ramanathan-Phull as described in Claim 1, Ramanathan further discloses wherein the processor is internal to the cluster computer system ([0086] The edge computing node 750 may include or be coupled to acceleration circuitry 764, which may be embodied by […] one or more CPUs, […] or other forms of specialized processors or circuitry designed to accomplish one or more specialized tasks. Please note the edge computing node 750 including acceleration circuitry 764 which is embodied by specialized processors corresponds to Applicant’s processor being internal to the cluster computer system, as the node of the cluster has a processor internal to it.).

Regarding Claim 10, for Ramanathan-Phull as described in Claim 1, Ramanathan further discloses wherein the processor is external to the cluster computer system ([0001] components that can perform edge computing operations (“edge nodes”) can reside in whatever location needed by the system architecture or ad hoc service (e.g., in a high performance compute data center); [0004] compute resources may be placed in locations that are remote from a conventional data center. Please note that an instance of the system in which the edge nodes of the cluster computer system reside in an HPC data center but have compute resources placed in locations that are remote from a conventional data center corresponds to the processor being external to the cluster computer system, as the system could still be implemented but with the compute resources, i.e., the processor of the node, being remote to the HPC data center.).

Regarding Claim 11, Ramanathan discloses A method ([0095] cause the machine to perform any one or more of the methodologies of the present disclosure. Please note that the methodologies of the disclosure of Ramanathan correspond to Applicant’s method.) comprising: wherein the workload comprises a high-performance computing (HPC) or artificial intelligence (AI) workload ([0001] Components that can perform edge computing operations (“edge nodes”) can reside in whatever location needed by the system architecture or ad hoc service (e.g., in a high performance compute data center). Please note the edge nodes residing in a high performance compute data center corresponds to Applicant’s workload comprising an HPC workload, as it inherently means the nodes are performing HPC operations.); for a portion of the section currently executing on a compute resource of a plurality of heterogeneous compute resources of a node of the cluster compute system, determining an alternate placement among the plurality of heterogeneous compute resources ([0109] A node instance auto-scale operation may automatically scale worker nodes from a pool of hetero or homogenous servers in an edge cluster.; [0134] The edge node instance auto-scaler may run a series of checks to identify nodes that are removable. Application instances may be identified that may be relocated to other nodes. The edge node instance auto-scaler may use criteria such as first selecting nodes not running any application instances for removal, or selecting a node with a minimum number of application instances. Other nodes may be identified to which these instances may be migrated, for example with minimum cost. Please note that the edge node instance auto-scaler identifying application instances of nodes that may be migrated with minimum cost corresponds to determining an alternate placement among the plurality of compute resources.
Since the nodes can be scaled from a pool of heterogeneous servers in an edge cluster, this corresponds to the plurality of compute resources of the node of the cluster computer system being heterogeneous.); and after predicting an improvement to the FOM based on the alternate placement, relocating the portion to the alternate placement ([0137] The edge cluster auto-scaler or node instance autoscaler supports migrating container workloads from (e.g., least loaded) nodes, for example with consideration for high availability, to the other nodes in the cluster that have capacity to handle those workloads. Please note that migrating container workloads from least loaded nodes to other nodes in the cluster to handle those workloads corresponds to relocating the portion to the alternate placement, as the section of the workload identified as significant to the FOM of computation time as subsequently disclosed by Phull could be relocated to another node in the cluster to improve this computation time by having a node that has greater capacity to handle it.).

Ramanathan does not explicitly disclose during execution of a workload on a cluster computer system: identify a section of the workload as significant to a figure of merit (FOM) of the workload. However, Phull discloses during execution of a workload on a cluster computer system: identifying a section of the workload as significant to a figure of merit (FOM) of the workload ([0055] the workload/data is divided based on the compute capabilities of the processing units involved.; [0056] As such, in accordance with aspects of the present principles, a dynamic data partitioning scheme can be employed, where a run-time balancer analyzes the discrepancy in the computation and communication patterns of different MPI processes and directs the repartition of the data set accordingly. After running for a fixed number of iterations, the slave processes send their own timing profiles (computation and communication time) to the master process. The master process observes the computation and communication pattern for each process and suggests a new partitioning ratio to balance the computation across the processes to achieve optimal performance. Please note that the run-time balancer analyzing the computation and communication patterns of different MPI processes in order to divide the workload into partitions based on the compute capabilities of the processing units involved corresponds to identifying a section of the workload as significant to a FOM of the workload during execution of a workload on a cluster computer system, as while the workload is being executed by the processes, it identifies a partition, i.e., a section of the workload, that is significant to improving the computation time of each process. Since Applicant states in [0021] that “Typically, the FOM of a workload is expressed in terms of time (e.g., latency and/or time to complete a particular function or set of activities),” the time of computation being improved corresponds to being significant to a FOM of the workload, as computation time corresponds to the time to complete the set of activities of the workload.); Ramanathan and Phull are both considered to be analogous to the claimed invention because they are in the same field of HPC cluster node performance improvement.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Ramanathan to incorporate the teachings of Phull to modify the HPC workload alternate placement relocation system to identify a section of the workload as significant to a FOM of the workload, allowing for improvement of system efficiency and performance by partitioning the workload optimally based on available resources of nodes, as described in Phull.

Regarding Claim 12, for Ramanathan-Phull as described in Claim 11, Ramanathan further discloses determining an alternative placement comprises predicting a best placement of the portion among the plurality of heterogeneous compute resources based on one or more of: a predicted increase or decrease in data transfer via interconnects coupling the plurality of heterogeneous compute resources; a predicted increase or decrease in memory access cost; a predicted increase or decrease in compute efficiency; parallel efficiency; a number of concurrent contexts allowed; a predicted increase or decrease of power utilization; a predicted increase or decrease in thermal behavior; a predicted increase or decrease in scheduling schemes; a predicted increase or decrease in concurrency; and other architectural features of the node ([0023] The systems and methods described herein may be used to provide power considerations while scaling application instances' resources (e.g., scaling up or down). Please note that providing power considerations while scaling application instances’ resources up corresponds to Applicant’s determination of an alternative placement comprising prediction of a best placement of the portion among the plurality of heterogeneous compute resources based on a predicted decrease of power utilization, as the scaling system would consider the decrease of power utilization in its relocation or alternative placement. Note that since Applicant states “based on one or more of” different architectural features of the node, Examiner is interpreting “based on a predicted decrease of power utilization” as fulfilling the requirements of the claim.).

Regarding Claim 13, for Ramanathan-Phull as described in Claim 11, Phull further discloses identifying a section of the workload is based on an annotation, a string, an interrupt, or a profiling control contained within a binary representation of the workload that identifies a beginning or an end of the section ([0057] With reference to FIG. 3, the repartition can assign larger data blocks 302.sub.2 and 302.sub.4 (as compared to blocks 210.sub.2 and 210.sub.4) to processor nodes 202 and 204, respectively, and can assign smaller data blocks 302.sub.6 and 302.sub.8 (as compared to blocks 210.sub.6 and 210.sub.8) to processor nodes 206 and 208, respectively. Please note that, referencing Fig. 3, the workload has been partitioned into distinct data blocks 302.sub.2, 302.sub.4, 302.sub.6, and 302.sub.8, meaning that internally to the system, there is inherently a means by which the beginnings and ends of these sections are identified, corresponding to the identification of the section of the workload being based on an annotation.).
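Claim 12 (like Claim 2) lists placement factors behind an "one or more of" enumeration. One hypothetical way to realize that as a weighted score over predicted deltas is sketched below; the factor names, weights, and candidate labels are assumptions for illustration only and appear in neither reference.

```python
FACTOR_WEIGHTS = {
    "data_transfer_delta": -1.0,      # predicted extra interconnect traffic hurts
    "memory_access_delta": -1.0,      # predicted extra memory-access cost hurts
    "compute_efficiency_delta": 2.0,  # predicted efficiency gain helps
    "power_delta": -0.5,              # predicted increase in power draw hurts
}

def placement_score(predicted_deltas: dict[str, float]) -> float:
    """Higher is better; any subset of the factors may be supplied."""
    return sum(FACTOR_WEIGHTS[k] * v for k, v in predicted_deltas.items())

def best_placement(candidates: dict[str, dict[str, float]]) -> str:
    """Pick the candidate resource whose predicted deltas score highest."""
    return max(candidates, key=lambda name: placement_score(candidates[name]))

# Example: a GPU candidate predicted to raise compute efficiency and power.
print(best_placement({
    "cpu_socket_1": {"compute_efficiency_delta": 0.0, "power_delta": 0.0},
    "gpu_0": {"compute_efficiency_delta": 0.4, "power_delta": 0.3},
}))  # gpu_0  (score 2.0*0.4 - 0.5*0.3 = 0.65 beats 0.0)
```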
Regarding Claim 14, for Ramanathan-Phull as described in Claim 11, Phull further discloses capturing information indicative of a behavior of the workload, wherein said identifying a section of the workload is based on the behavior ([0055] the workload/data is divided based on the compute capabilities of the processing units involved.; [0056] As such, in accordance with aspects of the present principles, a dynamic data partitioning scheme can be employed, where a run-time balancer analyzes the discrepancy in the computation and communication patterns of different MPI processes and directs the repartition of the data set accordingly. Please note that carrying out dynamic data partitioning of the workload where the run-time balancer analyzes the computation patterns of the MPI processes and directs the repartition of the data set accordingly corresponds to Applicant’s capturing information indicative of a behavior of the workload, wherein identification of the section of the workload is based on the behavior, as the partitions of the workload correspond to the sections of the workload, which are identified based on the captured behavior of the MPI processes associated with the workload.).

Regarding Claim 15, for Ramanathan-Phull as described in Claim 14, Phull further discloses the behavior comprises execution of the section meeting or exceeding a predetermined or configurable measure of work done or a repetition threshold ([0056] As such, in accordance with aspects of the present principles, a dynamic data partitioning scheme can be employed, where a run-time balancer analyzes the discrepancy in the computation and communication patterns of different MPI processes and directs the repartition of the data set accordingly. For example, assume P0 is a master process with P1, P2 and P3 as slave processes. After running for a fixed number of iterations, the slave processes send their own timing profiles (computation and communication time) to the master process. The master process observes the computation and communication pattern for each process and suggests a new partitioning ratio to balance the computation across the processes to achieve optimal performance. Please note that carrying out dynamic data partitioning of the workload where the slave processes are run for a fixed number of iterations after which the master process observes the pattern for each process and suggests a new partitioning ratio corresponds to Applicant’s behaviors comprising execution of the section meeting a repetition threshold, as the fixed number of iterations each slave process is run for corresponds to meeting a repetition threshold, and the pattern of the process corresponds to the behavior of the execution of the section.).

Regarding Claim 16, for Ramanathan-Phull as described in Claim 14, Phull further discloses capturing information indicative of a behavior of the workload is performed via one or more of function interposition, event queries, hardware counters, dynamic instruction count, and other measures of work ([0056] After running for a fixed number of iterations, the slave processes send their own timing profiles (computation and communication time) to the master process. The master process observes the computation and communication pattern for each process and suggests a new partitioning ratio to balance the computation across the processes to achieve optimal performance.
Please note that the slave processes sending timing profiles (computation and communication time) to the master process for the master process to observe the pattern and suggest a new partitioning ratio corresponds to Applicant’s capturing of information indicative of the behavior of the workload being performed via hardware counters, as it is known in the art that hardware counters measure time intervals in computer systems, and therefore one would need to be employed to capture a timing profile containing the computation time that is indicative of the behavior of the workload. Note that since Applicant states “via one or more of” different measures of work, Examiner is interpreting “via hardware counters” as fulfilling the requirements of the claim.).

Regarding Claim 17, for Ramanathan-Phull as described in Claim 16, Phull further discloses the information indicative of the behavior includes existence or non-existence of dependencies between the portion and one or more other portions of the section ([0055] the workload/data is divided based on the compute capabilities of the processing units involved. One way to accomplish this is to characterize the cluster by profiling it statically and generating a map of relative computation power for the different nodes involved, and then using this information for generating data partitions […] Second, in the case of multi-tenancy where applications share resources in the cluster, it would be difficult to predict the execution time of an application statically. Please note that profiling the cluster to divide the workload by generating a map of relative computation power for the different nodes involved and using that information for generating data partitions, in the case of multi-tenancy where applications share resources in the cluster, corresponds to Applicant’s information indicative of the behavior including existence of dependencies between the portion and other portions of the section, as the nodes involved correspond to the portions of the section and the shared resources between their applications correspond to the existence of dependencies between the portions of the section, as they would depend on the same resources, and therefore this would need to be considered in determining the behavior of the workload.).

Regarding Claim 18, for Ramanathan-Phull as described in Claim 16, Phull further discloses the information indicative of the behavior includes whether the portion exhibits data locality ([0055] the workload/data is divided based on the compute capabilities of the processing units involved. One way to accomplish this is to characterize the cluster by profiling it statically and generating a map of relative computation power for the different nodes involved, and then using this information for generating data partitions […] a cluster of heterogeneous CPUs with different memory bandwidths, cache levels and processing elements. Please note that the cluster of heterogeneous CPUs having different memory bandwidths being a factor that must be considered when profiling the processes to determine partitions of the workload corresponds to Applicant’s information indicative of the behavior including whether the portion exhibits data locality, as whether a portion exhibits data locality affects the memory bandwidth it requires, and therefore the fact that memory bandwidth must be considered in the dynamic data partitioning scheme means that data locality is inherently included as part of the memory bandwidth analysis issue that is solved.).
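Claims 17-18 (like Claims 7-8) recite dependencies and data locality as behavior information consulted before moving a portion. A sketch of how such a record might gate relocation follows; the structure and field names are hypothetical and appear in neither reference.

```python
from dataclasses import dataclass, field

@dataclass
class PortionBehavior:
    portion_id: str
    depends_on: set[str] = field(default_factory=set)  # other portions of the section
    exhibits_data_locality: bool = False

def safe_to_relocate(p: PortionBehavior, moving_together: set[str]) -> bool:
    """A portion should move only alongside the portions it depends on;
    strong data locality on the current resource argues against moving."""
    if p.exhibits_data_locality:
        return False
    return p.depends_on <= moving_together
```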
Regarding Claim 19, Ramanathan discloses A cluster computer system comprising: a compute node having a plurality of heterogeneous compute resources ([0109] A node instance auto-scale operation may automatically scale worker nodes from a pool of hetero or homogenous servers in an edge cluster. Please note that since the nodes can be scaled from a pool of heterogeneous servers in an edge cluster, this corresponds to the compute node having a plurality of compute resources that are heterogeneous. Additionally, since the nodes are in the edge cluster, this corresponds to the cluster computer system comprising the compute node.); a head node having a processor ([0086] The edge computing node 750 may include or be coupled to acceleration circuitry 764, which may be embodied by […] one or more CPUs, […] or other forms of specialized processors or circuitry designed to accomplish one or more specialized tasks; [0115] The initial edge cluster of diagram 900 may be an edge cluster initiated with a minimal number of worker nodes in initial state 902. As application instances are scaled up or down based on existing or predicted metrics, new nodes may be added or removed. Please note that with an initial edge cluster with a minimal number of worker nodes in initial state 902 after which new nodes may be added, there may be an instance in which there is only one initial node of the cluster prior to new nodes being added, corresponding to the head node. Furthermore, the edge computing node 750 including acceleration circuitry 764 which is embodied by specialized processors corresponds to Applicant’s head node having a processor.); and instructions that when executed by the processor or one or more of the plurality of heterogeneous compute resources cause the cluster computer system to ([0093] In an example, the instructions 782 provided via the memory 754, the storage 758, or the processor 752 may be embodied as a non-transitory, machine-readable medium 760 including code to direct the processor 752 to perform electronic operations. Please note the code to direct the processor 752 to perform electronic operations corresponds to Applicant’s instructions executed by a processor and causing the cluster computer system to carry out the operations.): wherein the workload comprises a high-performance computing (HPC) or artificial intelligence (AI) workload ([0001] Components that can perform edge computing operations (“edge nodes”) can reside in whatever location needed by the system architecture or ad hoc service (e.g., in a high performance compute data center). Please note the edge nodes residing in a high performance compute data center corresponds to Applicant’s workload comprising an HPC workload, as it inherently means the nodes are performing HPC operations.); for a portion of the section currently executing on a compute resource of the plurality of heterogeneous compute resources, determine an alternate placement among the plurality of heterogeneous compute resources ([0109] A node instance auto-scale operation may automatically scale worker nodes from a pool of hetero or homogenous servers in an edge cluster.; [0134] The edge node instance auto-scaler may run a series of checks to identify nodes that are removable. Application instances may be identified that may be relocated to other nodes. The edge node instance auto-scaler may use criteria such as first selecting nodes not running any application instances for removal, or selecting a node with a minimum number of application instances.
Other nodes may be identified to which these instances may be migrated, for example with minimum cost. Please note that the edge node instance auto-scaler identifying application instances of nodes that may be migrated with minimum cost corresponds to determining an alternate placement among the plurality of compute resources. Since the nodes can be scaled from a pool of heterogeneous servers in an edge cluster, this corresponds to the plurality of compute resources of the node of the cluster computer system being heterogeneous.); and after predicting an improvement to the FOM based on the alternate placement, relocate the portion to the alternate placement ([0137] The edge cluster auto-scaler or node instance autoscaler supports migrating container workloads from (e.g., least loaded) nodes, for example with consideration for high availability, to the other nodes in the cluster that have capacity to handle those workloads. Please note that migrating container workloads from least loaded nodes to other nodes in the cluster to handle those workloads corresponds to relocating the portion to the alternate placement, as the section of the workload identified as significant to the FOM of computation time as subsequently disclosed by Phull could be relocated to another node in the cluster to improve this computation time by having a node that has greater capacity to handle it.).

Ramanathan does not explicitly disclose during execution of a workload on a cluster computer system: identify a section of the workload as significant to a figure of merit (FOM) of the workload. However, Phull discloses during execution of a workload on the cluster computer system: identify a section of the workload as significant to a figure of merit (FOM) of the workload ([0055] the workload/data is divided based on the compute capabilities of the processing units involved.; [0056] As such, in accordance with aspects of the present principles, a dynamic data partitioning scheme can be employed, where a run-time balancer analyzes the discrepancy in the computation and communication patterns of different MPI processes and directs the repartition of the data set accordingly. After running for a fixed number of iterations, the slave processes send their own timing profiles (computation and communication time) to the master process. The master process observes the computation and communication pattern for each process and suggests a new partitioning ratio to balance the computation across the processes to achieve optimal performance. Please note that the run-time balancer analyzing the computation and communication patterns of different MPI processes in order to divide the workload into partitions based on the compute capabilities of the processing units involved corresponds to identifying a section of the workload as significant to a FOM of the workload during execution of a workload on a cluster computer system, as while the workload is being executed by the processes, it identifies a partition, i.e., a section of the workload, that is significant to improving the computation time of each process.
Since Applicant states in [0021] that “Typically, the FOM of a workload is expressed in terms of time (e.g., latency and/or time to complete a particular function or set of activities),” the time of computation being improved corresponds to being significant to a FOM of the workload, as computation time corresponds to the time to complete the set of activities of the workload.); Ramanathan and Phull are both considered to be analogous to the claimed invention because they are in the same field of HPC cluster node performance improvement. Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Ramanathan to incorporate the teachings of Phull to modify the HPC workload alternate placement relocation system to identify a section of the workload as significant to a FOM of the workload, allowing for improvement of system efficiency and performance by partitioning the workload optimally based on available resources of nodes, as described in Phull.

Regarding Claim 20, for Ramanathan-Phull as described in Claim 19, Ramanathan further discloses the plurality of heterogeneous compute resources include one or more of (i) a central processing unit (CPU) or a CPU core, (ii) a graphics processing unit (GPU) or a GPU core, (iii) a field-programmable gate array (FPGA), and (iv) another type of accelerator ([0066] In some examples, the compute node 700 may be embodied as a single device such as an integrated circuit, an embedded system, a field-programmable gate array (FPGA), a system-on-a-chip (SOC), or other integrated system or device. In the illustrative example, the compute node 700 includes or is embodied as a processor 704 and a memory 706. The processor 704 may be embodied as any type of processor.; [0109] A node instance auto-scale operation may automatically scale worker nodes from a pool of hetero or homogenous servers in an edge cluster. Please note the compute node 700 being embodied as an FPGA corresponds to Applicant’s plurality of heterogeneous compute resources including an FPGA. Since the nodes can be scaled from a pool of heterogeneous servers in an edge cluster, this corresponds to the compute resources being heterogeneous. Note that since Applicant states “the plurality of heterogeneous compute resources include one or more of” different accelerators, Examiner is interpreting “include an FPGA” as fulfilling the requirements of the claim.).

Regarding Claim 21, for Ramanathan-Phull as described in Claim 20, Ramanathan further discloses the compute resource is the CPU or the CPU core in a given socket or a given non-uniform memory access (NUMA) node and the alternate placement is a same CPU or CPU core in a different socket or a different NUMA node ([0076] The processor 752 and accompanying circuitry may be provided in a single socket form factor, multiple socket form factor, or a variety of other formats, including in limited hardware configurations or configurations that include fewer than all elements shown in FIG. 7B; [0086] The edge computing node 750 may include or be coupled to acceleration circuitry 764, which may be embodied by […] an arrangement of GPUs, […] one or more CPUs.
Please note that the edge computing node being embodied by a CPU in a particular socket form factor or alternatively the same CPU in another socket form factor corresponds to Applicant’s compute resource being the CPU in a given socket and the alternate placement being the same CPU in a different socket, as both are capable of embodying the nodes utilized in the alternate placement system, and therefore the node could alternatively be placed on the same CPU in a different socket form factor in a different hardware configuration.).

Regarding Claim 22, for Ramanathan-Phull as described in Claim 20, Ramanathan further discloses the compute resource is the CPU or the CPU core and the alternate placement is the GPU, the GPU core, the FPGA, or said another type of accelerator ([0086] The edge computing node 750 may include or be coupled to acceleration circuitry 764, which may be embodied by […] an arrangement of GPUs, […] one or more CPUs. Please note that the edge computing node being embodied by a CPU or alternatively an arrangement of GPUs corresponds to Applicant’s compute resource being the CPU and the alternate placement being the GPU, as both are capable of embodying the nodes utilized in the alternate placement system, and therefore the node could alternatively be placed on a GPU instead of the CPU.).

Regarding Claim 23, for Ramanathan-Phull as described in Claim 20, Ramanathan further discloses the compute resource is the GPU or the GPU core and the alternate placement is the CPU, the CPU core, the FPGA, or said another type of accelerator ([0086] The edge computing node 750 may include or be coupled to acceleration circuitry 764, which may be embodied by […] an arrangement of GPUs, […] one or more CPUs. Please note that the edge computing node being embodied by an arrangement of GPUs or alternatively a CPU corresponds to Applicant’s compute resource being the GPU and the alternate placement being the CPU, as both are capable of embodying the nodes utilized in the alternate placement system, and therefore the node could alternatively be placed on a CPU instead of the GPU.).
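Claims 21-23 enumerate alternate placements by resource kind: the same CPU in a different socket or NUMA node, a CPU portion moved to a GPU/FPGA/accelerator, or a GPU portion moved to a CPU/FPGA/accelerator. The minimal sketch below mirrors that claim structure only; the enum and function are illustrative assumptions, not an implementation from the record.

```python
from enum import Enum, auto

class ResourceKind(Enum):
    CPU = auto()
    GPU = auto()
    FPGA = auto()
    OTHER_ACCELERATOR = auto()

def alternate_kinds(current: ResourceKind) -> list[ResourceKind]:
    """Candidate alternate placements for a portion currently on `current`:
    every other resource kind, plus the same kind (which, per Claim 21,
    would mean a different socket or NUMA node of that same kind)."""
    return list(ResourceKind)
```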
Regarding Claim 24, for Ramanathan-Phull as described in Claim 19, Ramanathan further discloses determination of an alternative placement includes predicting a best placement of the portion among the plurality of heterogeneous compute resources based on one or more of: a predicted increase or decrease in data transfer via interconnects coupling the plurality of heterogeneous compute resources; a predicted increase or decrease in memory access cost; a predicted increase or decrease in compute efficiency; parallel efficiency; a number of concurrent contexts allowed; a predicted increase or decrease of power utilization; a predicted increase or decrease in thermal behavior; a predicted increase or decrease in scheduling schemes; a predicted increase or decrease in concurrency; and other architectural features of the node ([0023] The systems and methods described herein may be used to provide power considerations while scaling application instances' resources (e.g., scaling up or down). Please note that providing power considerations while scaling application instances’ resources up corresponds to Applicant’s determination of an alternative placement including prediction of a best placement of the portion among the plurality of heterogeneous compute resources based on a predicted decrease of power utilization, as the scaling system would consider the decrease of power utilization in its relocation or alternative placement. Note that since Applicant states “one or more of”, Examiner is interpreting “based on a predicted increase or decrease of power utilization” as fulfilling the requirements of the claim.).

Regarding Claim 25, for Ramanathan-Phull as described in Claim 19, Phull further discloses identification of the section of the workload is based on an annotation, a string, an interrupt, or a profiling control contained within a binary representation of the workload that identifies a beginning or an end of the section ([0057] With reference to FIG. 3, the repartition can assign larger data blocks 302.sub.2 and 302.sub.4 (as compared to blocks 210.sub.2 and 210.sub.4) to processor nodes 202 and 204, respectively, and can assign smaller data blocks 302.sub.6 and 302.sub.8 (as compared to blocks 210.sub.6 and 210.sub.8) to processor nodes 206 and 208, respectively. Please note that, referencing Fig. 3, the workload has been partitioned into distinct data blocks 302.sub.2, 302.sub.4, 302.sub.6, and 302.sub.8, meaning that internally to the system, there is inherently a means by which the beginnings and ends of these sections are identified, corresponding to the identification of the section of the workload being based on an annotation.).

Regarding Claim 26, for Ramanathan-Phull as described in Claim 19, Phull further discloses capture information indicative of a behavior of the workload, wherein identification of the section of the workload is based on the behavior ([0055] the workload/data is divided based on the compute capabilities of the processing units involved.; [0056] As such, in accordance with aspects of the present principles, a dynamic data partitioning scheme can be employed, where a run-time balancer analyzes the discrepancy in the computation and communication patterns of different MPI processes and directs the repartition of the data set accordingly. Please note that carrying out dynamic data partitioning of the workload where the run-time balancer analyzes the computation patterns of the MPI processes and directs the repartition of the data set accordingly corresponds to Applicant’s capturing information indicative of a behavior of the workload, wherein identification of the section of the workload is based on the behavior, as the partitions of the workload correspond to the sections of the workload, which are identified based on the captured behavior of the MPI processes associated with the workload.).

Regarding Claim 27, for Ramanathan-Phull as described in Claim 19, Phull further discloses the behavior comprises execution of the section meeting or exceeding a measure of work done or a repetition threshold ([0056] As such, in accordance with aspects of the present principles, a dynamic data partitioning scheme can be employed, where a run-time balancer analyzes the discrepancy in the computation and communication patterns of different MPI processes and directs the repartition of the data set accordingly. For example, assume P0 is a master process with P1, P2 and P3 as slave processes. After running for a fixed number of iterations, the slave processes send their own timing profiles (computation and communication time) to the master process. The master process observes the computation and communication pattern for each process and suggests a new partitioning ratio to balance the computation across the processes to achieve optimal performance.
Please note that carrying out dynamic data partitioning of the workload where the slave processes are run for a fixed number of iterations after which the master process observes the pattern for each process and suggests a new partitioning ratio corresponds to Applicant’s behaviors comprising execution of the section meeting a repetition threshold, as the fixed number of iterations each slave process is run for corresponds to meeting a repetition threshold, and the pattern of the process corresponds to the behavior of the execution of the section.).

Regarding Claim 28, for Ramanathan-Phull as described in Claim 26, Phull further discloses capturing of information indicative of the behavior of the workload is performed via one or more of function interposition, event queries, hardware counters, dynamic instruction count, and other measures of work ([0056] After running for a fixed number of iterations, the slave processes send their own timing profiles (computation and communication time) to the master process. The master process observes the computation and communication pattern for each process and suggests a new partitioning ratio to balance the computation across the processes to achieve optimal performance. Please note that the slave processes sending timing profiles (computation and communication time) to the master process for the master process to observe the pattern and suggest a new partitioning ratio corresponds to Applicant’s capturing of information indicative of the behavior of the workload being performed via hardware counters, as it is known in the art that hardware counters measure time intervals in computer systems, and therefore one would need to be employed to capture a timing profile containing the computation time that is indicative of the behavior of the workload. Note that since Applicant states “via one or more of” different measures of work, Examiner is interpreting “being performed via hardware counters” as fulfilling the requirements of the claim.).

Regarding Claim 29, for Ramanathan-Phull as described in Claim 26, Phull further discloses the information indicative of the behavior includes one or more of (i) existence or non-existence of dependencies between the portion and one or more other portions of the section and (ii) whether the portion exhibits data locality ([0055] the workload/data is divided based on the compute capabilities of the processing units involved. One way to accomplish this is to characterize the cluster by profiling it statically and generating a map of relative computation power for the different nodes involved, and then using this information for generating data partitions […] Second, in the case of multi-tenancy where applications share resources in the cluster, it would be difficult to predict the execution time of an application statically.
Please note that profiling the cluster to divide the workload by generating a map of relative computation power for the different nodes involved and using that information for generating data partitions, in the case of multi-tenancy where applications share resources in the cluster, corresponds to Applicant’s information indicative of the behavior including existence of dependencies between the portion and other portions of the section, as the nodes involved correspond to the portions of the section and the shared resources between their applications correspond to the existence of dependencies between the portions of the section, as they would depend on the same resources, and therefore this would need to be considered in determining the behavior of the workload. Note that since Applicant states “information indicative of the behavior includes one or more of” (i) or (ii), Examiner is interpreting (i) “existence of dependencies between the portion and one or more other portions of the section” as fulfilling the requirements of the claim.).

Response to Arguments

Applicant's arguments filed 01/26/2026 have been fully considered but they are not persuasive. Applicant’s arguments are summarized as follows: (A) Phull is an old reference dating to 2011 that discloses an archaic technique for load balancing on heterogeneous processing clusters implementing parallel execution. (B) Phull does not teach or reasonably suggest for a portion of the section currently executing on a compute resource of a plurality of heterogeneous compute resources of a node of the cluster compute system, determine an alternate placement among the plurality of heterogeneous compute resources as recited by Claim 1, and instead discloses “a dynamic data partitioning scheme, where a run-time balancer analyzes the discrepancy in the computation and communication patterns of different MPI processes” and “assigning larger data blocks to faster processor nodes and smaller data blocks to slower processor nodes to achieve relative parity of computation times.” Phull’s approach is data repartitioning, i.e., redistributing data among processors to balance load, which is not the same as relocating code/portions to different types of compute resources. Phull fails to teach relocating “a portion of the section” to an alternate placement among heterogeneous compute resources as recited by Claim 1. Therefore, the rejection of Claim 1 under 35 U.S.C. 103 should be withdrawn. (C) Claims 11 and 19 contain similar limitations to Claim 1, and therefore, their rejections under 35 U.S.C. 103 should also be withdrawn. (D) The dependent claims of Claims 1, 11, and 19 should likewise have their rejections under 35 U.S.C. 103 withdrawn.

Regarding A, in response to Applicant's argument based upon the age of the references, contentions that the reference patents are old are not impressive absent a showing that the art tried and failed to solve the same problem notwithstanding its presumed knowledge of the references. See In re Wright, 569 F.2d 1124, 193 USPQ 332 (CCPA 1977).

Regarding B, the Examiner respectfully disagrees. The quoted citations from paragraphs 0056 and 0057 of Phull in Applicant’s response were not cited in the original office action as being pertinent to Claim 1.
As stated above, Ramanathan was cited as teaching for a portion of the section currently executing on a compute resource of a plurality of heterogeneous compute resources of a node of the cluster compute system, determine an alternate placement among the plurality of heterogeneous compute resources, as recited from [0109] and [0134]: the edge node instance auto-scaler identifying application instances of nodes that may be migrated with minimum cost corresponds to determining an alternate placement among the plurality of compute resources, and since the nodes can be scaled from a pool of heterogeneous servers in an edge cluster, this corresponds to the plurality of compute resources of the node of the cluster computer system being heterogeneous. Therefore, in effect, even if Phull does not teach the aforementioned limitations as Applicant argues, a system meeting the requirements of the limitations may be obtained by a person of ordinary skill in the art by combining the teachings of Phull with Ramanathan. Therefore, the recited features can be found in the cited combination of references, and independent Claim 1 remains rejected under 35 U.S.C. 103 for the reasons stated above, and the combinations cited would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the application. The rejections under 35 U.S.C. 103 are maintained.

Regarding C, the Examiner respectfully disagrees. Independent Claims 11 and 19 contain similar limitations to rejected independent Claim 1 and do not add limitations that overcome the rejection; therefore, they likewise remain rejected, and the application is not in condition for allowance. The rejections under 35 U.S.C. 103 are maintained.

Regarding D, the Examiner respectfully disagrees. Dependent Claims 2-10, 12-18, and 20-29 depend on unpatentable claims and do not add limitations that overcome the rejection; therefore, they likewise remain rejected, and the application is not in condition for allowance. The rejections under 35 U.S.C. 103 are maintained.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Korupolu et al. (US 20080295094 A1) discloses optimized placement of applications within a heterogeneous network, and optimized placement of applications with consideration of affinities between nodes (see [0004-0009, 0022-0025]).

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARAZ T AKBARI whose telephone number is (571)272-4166. The examiner can normally be reached Monday-Thursday 9:30am-7:30pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, April Blair, can be reached at (571)270-1014. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FARAZ T AKBARI/
Examiner, Art Unit 2196

Prosecution Timeline

Sep 15, 2022 — Application Filed
Nov 04, 2022 — Response after Non-Final Action
Nov 17, 2025 — Non-Final Rejection — §103
Jan 26, 2026 — Response Filed
Mar 03, 2026 — Final Rejection — §103 (current)


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 0%
With Interview: 0% (+0.0%)
Median Time to Grant: 3y 3m
PTA Risk: Moderate

Based on 2 resolved cases by this examiner. Grant probability derived from career allow rate.
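The 0% figure comes from a career allow rate of 0 granted out of 2 resolved, a very small sample. As a caution against reading 0% literally, here is a sketch using add-one (Laplace) smoothing, one assumed way to temper such an estimate; the product's actual model is not disclosed here.

```python
def smoothed_allow_rate(granted: int, resolved: int) -> float:
    """Add-one (Laplace) smoothing: (granted + 1) / (resolved + 2)."""
    return (granted + 1) / (resolved + 2)

print(smoothed_allow_rate(0, 2))   # 0.25  -- two cases leave real mass above 0%
print(smoothed_allow_rate(0, 50))  # ~0.019 -- a larger sample justifies ~0%
```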
