Prosecution Insights
Last updated: April 19, 2026
Application No. 17/330,710

DYNAMIC LOAD BALANCING OF OPERATIONS FOR REAL-TIME DEEP LEARNING ANALYTICS

Final Rejection under §101 and §103

Filed: May 26, 2021
Examiner: XU, ZUJIA
Art Unit: 2195
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 6 (Final)

Grant Probability: 68% (Favorable)
Estimated OA Rounds: 7-8
Estimated Time to Grant: 3y 6m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 68% (above average; 114 granted / 169 resolved; +12.5% vs TC avg)
Interview Lift: +81.5% (strong), measured across resolved cases with vs. without an interview
Typical Timeline: 3y 6m average prosecution; 33 applications currently pending
Career History: 202 total applications across all art units

Statute-Specific Performance

§101: 16.0% (-24.0% vs TC avg)
§103: 46.2% (+6.2% vs TC avg)
§102: 2.0% (-38.0% vs TC avg)
§112: 31.0% (-9.0% vs TC avg)

Tech Center averages are estimates. Based on career data from 169 resolved cases.

Office Action

Grounds of rejection: §101 and §103
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA. This Office Action is in response to Applicant's Amendment and Remarks filed on 10 December 2025. Claims 1-34 are pending in this application.

Claim Objections

Claims 2, 6-8, 13 and 19-21 are objected to because of the following informalities: claims 2, 6-8, 13 and 19-21 recite "the batch of frames"; for consistency with "a batch of video frames" in claim 1, this should be amended to "the batch of video frames." Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-34 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.

Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1, Statutory Category: Yes. Claim 1 is a system that performs a series of steps and therefore falls within the statutory category of a machine.

Step 2A, Prong 1, Judicial Exception Recited: Yes. The claim recites: "identify that a first hardware accelerator is a pre-selected hardware accelerator to execute video frame transformations corresponding to a set of clients based, at least in part, on configuration information; assign the set of clients to cause the first hardware accelerator to execute the video frame transformations to a batch of video frames to modify one or more individual frames in the batch of video frames; generate a determination that a first metric associated with each of the set of clients exceeds a threshold, wherein the metric is computed based, at least in part, on a type of the video frame transformations; assign a subset of clients of the set of clients to a set of second hardware accelerators to cause the metric associated with each of the set of clients to be below the threshold in response to the determination; as a result of assigning the subset of clients, reassign at least one of the subset of clients to the first hardware accelerator to cause the metric associated with each of the set of clients to remain below the threshold."

As drafted, the claim as a whole recites a system that performs a series of steps that could be performed in the human mind, but for the recitation of generic computing components. The human mind can readily identify (judge, evaluate) a first hardware accelerator as a pre-selected hardware accelerator to execute a set of clients based on configuration information; assign (schedule, plan) the set of processes to the first hardware accelerator for processing; determine whether the computed metric associated with each of the set of clients exceeds a threshold; assign (schedule, plan) a subset of the processes to another set of hardware accelerators so that the metric falls below the threshold; and then reassign (schedule, plan) at least one of the subset of clients/processes back to the first hardware accelerator if the determined metric would remain below the threshold.
Therefore, but for the recitation of generic computing components, these steps are mental processes that can be performed in the human mind (including an observation, evaluation, judgment, or opinion). Accordingly, the claims do recite judicial exceptions.

Step 2A, Prong 2, Integrated into a Practical Application: No. This judicial exception is not integrated into a practical application. In particular, the additional elements "one or more processors; and memory storing instructions that, as a result of being executed by the one or more processors, cause the system to" and "wherein the configuration information comprises a mapping between the video frame transformations and hardware accelerators" are directed to generic computing components/functions (MPEP § 2106.05(b)) merely applying the abstract idea (MPEP § 2106.05(f)). Further, the limitation "cause the first hardware accelerator to execute at least one of the video frame transformations corresponding to the subset of clients" merely applies the judicial exception or abstract idea (see MPEP § 2106.05(f)): the claim does not define any particular machine to "cause" the "video frame transformations" other than a generic machine such as the "hardware accelerator," and provides no details whatsoever on how the claimed function (i.e., the video frame transformations) will occur. The combination of these additional elements amounts to no more than mere instructions to apply the exception using a generic computer component (MPEP § 2106.05(f)). Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is therefore directed to the abstract idea.

Step 2B, Claim Provides an Inventive Concept: No. The additional elements "one or more processors; and memory storing instructions that, as a result of being executed by the one or more processors, cause the system to" and "wherein the configuration information comprises a mapping between the video frame transformations and hardware accelerators" are directed to generic computing components/functions (MPEP § 2106.05(b)) merely applying the abstract idea (MPEP § 2106.05(f)). Further, the limitation "cause the first hardware accelerator to execute at least one of the video frame transformations corresponding to the subset of clients" merely applies the judicial exception or abstract idea (see MPEP § 2106.05(f)), for the reasons given above. The same analysis applies at Step 2B: mere instructions to apply an exception on a generic computer cannot provide an inventive concept (MPEP § 2106.05(f)). These additional elements, alone and in combination, do not amount to significantly more than the exception itself. For these reasons, there is no inventive concept in the claim, and the claim is ineligible.
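For orientation, the following is a minimal sketch of the threshold-driven assign/offload/reassign loop that the rejection characterizes above. Every name (Accelerator, per_client_metric, rebalance) and the toy cost model are hypothetical and are not drawn from the application or the cited art.

```python
# Minimal sketch of the claimed assign/offload/reassign loop. All names and
# the toy cost model are hypothetical, not taken from the application.

from dataclasses import dataclass, field

@dataclass
class Accelerator:
    name: str
    clients: list = field(default_factory=list)

def per_client_metric(accel: Accelerator, client: str, cost: dict) -> float:
    # Toy model: a client's metric grows with the accelerator's occupancy,
    # standing in for "computed based on a type of the transformations".
    return cost[client] * (1.0 + 0.1 * len(accel.clients))

def rebalance(primary: Accelerator, secondaries: list, cost: dict, threshold: float):
    # Offload: move clients whose metric exceeds the threshold from the
    # pre-selected accelerator to a set of second accelerators.
    for client in list(primary.clients):
        if per_client_metric(primary, client, cost) > threshold:
            target = min(secondaries, key=lambda a: len(a.clients))
            primary.clients.remove(client)
            target.clients.append(client)
    # Reassign: bring a client back only if its metric would stay below
    # the threshold on the primary accelerator.
    for sec in secondaries:
        for client in list(sec.clients):
            if per_client_metric(primary, client, cost) < threshold:
                sec.clients.remove(client)
                primary.clients.append(client)

vic = Accelerator("VIC", ["cam0", "cam1", "cam2"])
gpus = [Accelerator("GPU0"), Accelerator("GPU1")]
rebalance(vic, gpus, cost={"cam0": 8.0, "cam1": 30.0, "cam2": 9.0}, threshold=33.3)
print(vic.clients, [g.clients for g in gpus])  # cam1 stays offloaded to GPU0
```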
Independent claim 22 is rejected for the same reasons as claim 1 above. In addition, claim 22 further recites "assigning a set of application clients to a video image compositor (VIC) engine," "determining that an average time used by the VIC engine to execute the video frame transformations for the set of application clients exceeds a frame processing threshold," "assigning, to a set of processing units, distinct from VIC engine, a subset of application clients of the set of application clients to cause an average time used…to be below the frame processing threshold…," and "reassigning the subset of application clients to cause the average time used by the VIC engine to remain below the frame processing threshold…," which are treated as part of the abstract idea and are analogous to mental processes, such that the concept can be performed in the human mind. For example, the human mind can readily assign (evaluate, plan, schedule) a set of application clients to a video image compositor (VIC) engine for processing the requests; determine (judge, compare) whether the average processing time exceeds a threshold; based on that determination, assign (plan, schedule) the processing load to different processing devices to improve the processing time; continue to determine whether reassigning the subset of application clients causes the average time used by the VIC engine to remain below the frame processing threshold; and then move (schedule) the processes back to the previous VIC for processing to improve the processing time. Therefore, but for the recitation of generic computing components, these steps are mental processes that can be performed in the human mind (including an observation, evaluation, judgment, or opinion). Further, the claimed "video image compositor (VIC) engine" and "processing unit" are directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)). And the limitation "cause the VIC engine to execute the one or more portions of the video frame transformations as a result of reassigning the subset of application clients" merely applies the judicial exception or abstract idea (see MPEP § 2106.05(f)): the claim does not define any particular machine to "cause" the "video frame transformations" other than a generic machine such as the "VIC engine," and provides no details whatsoever on how the claimed function will occur.

Independent claim 30 is rejected for the same reasons as claims 1 and 22 above. In addition, claim 30 further recites "A non-transitory computer readable storage medium storing thereon executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to," which is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)). In addition, the limitation "obtain first performance data associated with a video image compositor (VIC)" is insignificant pre-solution data gathering (see MPEP § 2106.05(g)) that is well-understood, routine, conventional activity in the field. The "obtain" steps are for the purpose of "communication" and "transmitting the data," consistent with the court decisions collected in MPEP § 2106.05(d)(II) (receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016)). Accordingly, the conclusion that "obtain" is well-understood, routine, conventional activity is supported under Berkheimer option 2. Further, the limitation "cause the set of processors to execute at least a portion of the video frame transformations corresponding to the subset of application clients to relieve load from the VIC" merely applies the judicial exception or abstract idea (see MPEP § 2106.05(f)): the claim does not define any particular machine to "cause" the "video frame transformations" other than a generic machine such as the "VIC engine," and provides no details whatsoever on how the claimed function will occur.
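To make the determination step of claims 22 and 30 concrete before turning to the dependent claims, the following sketch derives a frame processing threshold from a framerate and tests whether the VIC engine's average per-frame time exceeds it. The function names and sample numbers are invented for illustration, not taken from the application.

```python
# Hypothetical illustration of the claim 22/30-style determination: derive
# a frame processing threshold from the framerate and compare the VIC
# engine's average per-frame transformation time against it.

def frame_budget_ms(framerate_hz: float) -> float:
    # A 30 fps stream leaves roughly 33.3 ms to transform each frame.
    return 1000.0 / framerate_hz

def vic_overloaded(frame_times_ms: list, framerate_hz: float) -> bool:
    average = sum(frame_times_ms) / len(frame_times_ms)
    return average > frame_budget_ms(framerate_hz)

# An average of 41 ms per frame exceeds the 33.3 ms budget at 30 fps, so a
# subset of application clients would be offloaded to the processing units.
print(vic_overloaded([38.0, 44.0, 41.0], framerate_hz=30.0))  # True
```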
With respect to dependent claim 2, the claim elaborates that the threshold is determined based, at least in part, on a framerate associated with the batch of frames. This limitation is treated as part of the abstract idea and is analogous to a mental process; the claim as a whole remains a mental process that can be performed in the human mind (including an observation, evaluation, judgment, or opinion).

With respect to dependent claim 3, the claim elaborates that the metric further comprises a value indicating a percentage of activity of the first hardware accelerator. "A value indicating a percentage of activity" is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 4, the claim elaborates that the first hardware accelerator further comprises a video image compositor (VIC) and the set of second hardware accelerators further comprises a graphics processing unit (GPU). The "video image compositor" and "GPU" are generic computing components merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 5, the claim elaborates that the set of clients submit application programming interface (API) calls to cause the first hardware accelerator to execute the video frame transformations. Submitting API calls to cause the first hardware accelerator to execute is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 6, the claim elaborates that the metric further comprises an amount of time at least one of the set of clients utilizes the first hardware accelerator to perform processing of the batch of frames, which is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 7, the claim elaborates that the metric further comprises a percentage of processing capability the first hardware accelerator utilized to process the batch of frames on behalf of at least one of the set of clients. "A percentage of processing capability" is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 8, the claim elaborates that the metric further comprises an average amount of load generated by at least processing the batch of frames provided by at least one of the set of clients, which is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 9, the claim elaborates that the determination is generated during an interval of time, which is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 10, the claim elaborates that the determination is generated based at least in part on historical data, which is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 11, the claim elaborates that the configuration information is generated based, at least in part, on user input. This limitation is treated as part of the abstract idea and is analogous to a mental process; the claim as a whole remains a mental process that can be performed in the human mind (including an observation, evaluation, judgment, or opinion).

With respect to dependent claim 12, the claim elaborates that the threshold is designated by a user, which is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 13, the claim elaborates that the type comprises modifying resolutions of the batch of frames, converting formats of the batch of frames, or modifying color of the batch of frames. This limitation is treated as part of the abstract idea and is analogous to a mental process; the claim as a whole remains a mental process that can be performed in the human mind (including an observation, evaluation, judgment, or opinion).

With respect to dependent claim 14, the claim elaborates that the metric is obtained at least from a hardware performance counter included in the first hardware accelerator. "Obtained" is insignificant pre-solution data gathering (see MPEP § 2106.05(g)), consistent with the court decisions collected in MPEP § 2106.05(d)(II) (receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016)).

With respect to dependent claim 15, the claim elaborates that the first hardware accelerator further comprises a field-programmable gate array (FPGA). The "field-programmable gate array" is a generic computing component merely applying the abstract idea (MPEP § 2106.05(f)).
With respect to dependent claim 16, the claim elaborates that the memory further stores instructions that, as a result of being executed by the one or more processors, cause the system to obtain, through a system call, the metric. "Obtain" is insignificant pre-solution data gathering (see MPEP § 2106.05(g)), consistent with the court decisions collected in MPEP § 2106.05(d)(II) (Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016)).

With respect to dependent claim 17, the claim elaborates that the set of clients further comprise a set of components of an artificial intelligence pipeline. The "artificial intelligence pipeline" is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 18, the claim elaborates that the artificial intelligence pipeline includes one or more neural networks, which is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 19, the claim elaborates that the memory further stores instructions that cause the system to cause the first hardware accelerator to process the batch of frames by at least converting the batch of frames from a first format to a second format. This conversion is treated as insignificant extra-solution activity and mere data manipulation (see MPEP § 2106.05(g), "selecting a particular data source or type of data to be manipulated").

With respect to dependent claim 20, the claim elaborates that the memory further stores instructions that cause the system to cause the first hardware accelerator to process the batch of frames by at least scaling the batch of frames. "Scaling the batch of frames" is treated as insignificant extra-solution activity and mere data manipulation (see MPEP § 2106.05(g), "selecting a particular data source or type of data to be manipulated").

With respect to dependent claim 21, the claim elaborates that the memory further stores instructions that cause the system to cause the first hardware accelerator to process the batch of frames by at least modifying one or more color values associated with at least one frame of the batch of frames. "Modifying one or more color values" is insignificant extra-solution activity and mere data manipulation (see MPEP § 2106.05(g), "selecting a particular data source or type of data to be manipulated").

With respect to dependent claim 23, the claim elaborates that the assignment and the reassignment of the subset of application clients are repeated one or more times. This limitation is treated as part of the abstract idea and is analogous to a mental process; the claim as a whole remains a mental process that can be performed in the human mind (including an observation, evaluation, judgment, or opinion).

With respect to dependent claim 24, the claim elaborates that the frame processing threshold is determined based at least in part on a framerate of video processing. This limitation is treated as part of the abstract idea and is analogous to a mental process; the claim as a whole remains a mental process that can be performed in the human mind (including an observation, evaluation, judgment, or opinion).

With respect to dependent claim 25, the claim elaborates that the assigning of the subset of application clients is executed using a load balancer. The "load balancer" is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 26, the claim elaborates that the load balancer that assigns the subset of application clients comprises a processing thread. "Assigns the subset of application clients" is treated as part of the abstract idea and is analogous to a mental process, and "wherein the load balancer comprises a processing thread" is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 27, the claim elaborates that a load balancer that assigns the subset of application clients maintains a table of the set of application clients. "Assigns the subset of application clients" is treated as part of the abstract idea and is analogous to a mental process. "Maintains a table of the set of application clients" is treated as insignificant extra-solution activity (storing data), which is additionally well-understood, routine, conventional activity (see MPEP § 2106.05(d)), consistent with the court decisions collected there (storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93).

With respect to dependent claim 28, the claim elaborates that the table of the set of application clients includes a processing engine assigned to each client and an average time taken by the client to compute the video frame transformations, which is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 29, the claim elaborates that the set of processing units comprises a graphics processing unit (GPU). The "GPU" is a generic computing component (MPEP § 2106.05(b)) merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 31, the claim elaborates that the threshold is determined based, at least in part, on one or more framerates associated with the set of application clients. This limitation is treated as part of the abstract idea and is analogous to a mental process; the claim as a whole remains a mental process that can be performed in the human mind (including an observation, evaluation, judgment, or opinion).
With respect to dependent claim 32, the claim elaborates that the set of processors comprises a field-programmable gate array (FPGA). The "FPGA" is directed to generic computing components/functions merely applying the abstract idea (MPEP § 2106.05(f)).

With respect to dependent claim 33, the claim elaborates that the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the computer system to maintain a table of the set of application clients of which each application client is a member. "Cause the computer system to maintain a table" is insignificant extra-solution activity and mere data storing (see MPEP § 2106.05(g)).

With respect to dependent claim 34, the claim elaborates that the table of the set of application clients includes information indicating that the set of processors is assigned to the subset of application clients, and an average time taken by the set of processors to compute video frame transformations for the subset of application clients. "The table of the set of application clients includes information" is directed to generic computing components/functions (MPEP § 2106.05(b)).
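For concreteness, here is a small sketch of the client table recited in claims 27-28 and 33-34 as characterized above: per application client, a record of the assigned processing engine and a running average transformation time. The field names, the exponentially weighted update, and the sample values are hypothetical, not taken from the application.

```python
# Sketch of the client table from claims 27-28 and 33-34: the load balancer
# tracks, per client, the assigned engine and an average transformation
# time. Field names, the EWMA update, and sample values are invented.

client_table = {
    "client-a": {"engine": "VIC", "avg_time_ms": 12.4},
    "client-b": {"engine": "VIC", "avg_time_ms": 29.8},
    "client-c": {"engine": "GPU0", "avg_time_ms": 8.1},
}

def record_sample(table: dict, client: str, engine: str, sample_ms: float,
                  alpha: float = 0.2) -> None:
    # Track the assigned engine and an exponentially weighted average time.
    row = table.setdefault(client, {"engine": engine, "avg_time_ms": sample_ms})
    row["engine"] = engine
    row["avg_time_ms"] += alpha * (sample_ms - row["avg_time_ms"])

record_sample(client_table, "client-b", "VIC", 35.0)
print(client_table["client-b"])  # avg_time_ms rises to ~30.84
```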
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office Action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 9, 13 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Bernat et al. (US Pub. 2018/0027062 A1) in view of Dutta (US Patent 9,313,134 B2), and further in view of Diard (US Patent 7,075,541 B2), Lowry et al. (US Pub. 2018/0262684 A1) and Shraer et al. (US Pub. 2017/0353536 A1). Bernat, Dutta, Diard and Shraer were cited in the previous Office Action.

As per claim 1, Bernat teaches the invention substantially as claimed, including:

A system (Bernat, Fig. 12) comprising: one or more processors; and memory storing instructions that, as a result of being executed by the one or more processors, cause the system to (Bernat, [0164] lines 3-6, compute device comprising one or more processors; one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors):

identify that a first hardware accelerator is a pre-selected hardware accelerator to execute operations corresponding to a set of clients based, at least in part, on configuration information (Bernat, [0084] lines 18-32, workload assignor 1632 includes an acceleration assignor 1634 which is configured to assign certain workloads or portions thereof to corresponding accelerators (e.g., the accelerators 1250, 1260). In doing so, the acceleration assignor 1634 may identify a type of the workload based on a profile of resource utilizations of the workload over time or based on a tag, an analysis of the computer-executable instructions within the workload (as the set of clients), a header of the workload, metadata indicative of the types of operations to be executed in the workload (as the set of clients; see specification [0057], "a client (e.g., application, processes, or other components)"), or from a request from a CPU (e.g., the CPU 1272) to offload a portion of a workload onto a particular accelerator 1250, 1260 or type of accelerator (e.g., FPGA, graphics accelerator, cryptography accelerator, compression accelerator, etc.); [0058] lines 2-4, an accelerator (e.g., the accelerator 1250) is configured to execute assigned workloads; [Examiner's note: the corresponding/particular hardware accelerator is identified (as the first/pre-selected hardware accelerator) for assignment based on the configuration]);

assign the set of clients to cause the first hardware accelerator to execute the operations (Bernat, [0058] lines 2-4, an accelerator (e.g., the accelerator 1250) is configured to execute assigned workloads; [0084] lines 28-32, metadata indicative of the types of operations to be executed in the workload (as including the set of clients), or from a request from a CPU (e.g., the CPU 1272) to offload a portion of a workload onto a particular accelerator 1250, 1260 or type of accelerator (e.g., FPGA, graphics accelerator, cryptography accelerator, compression accelerator, etc.), and assign the workload to a corresponding accelerator 1250, 1260 for execution);

generate a determination that a metric associated with each of the set of clients exceeds a threshold (Bernat, Fig. 18, 1726 execute the workload with the corresponding logic portion of the accelerator; 1734 monitor utilization of the resources (as the metric) associated with the resource utilization threshold; 1740 threshold satisfied?; [0085] lines 1-10, the resource utilization analyzer, in the illustrative embodiment, is configured to receive the resource utilization data 1604 (as the metric) and determine whether the present utilization of resources among the managed nodes 1230, such as the utilization by logic portions (e.g., the logic portions 1252, 1254) of shared resources within an accelerator (e.g., the accelerator 1250), is in violation of the resource utilization threshold indicated in the resource utilization threshold data 1606; [0089] line 14, resource utilizations that exceed the corresponding resource utilization thresholds); and

cause the first hardware accelerator to execute at least one of the operations corresponding to the subset of clients (Bernat, Fig. 18, 1740; Fig. 19, 1904; [0084] lines 18-32, quoted above; also see [0002], accelerating the execution of a set of operations in a workload; [0091], determine to manage accelerator resource utilizations if at least one accelerator having the ability to execute workloads on separate logic portions 1252, 1254 is present in the system 1210…assigns a workload to be accelerated by a logic portion (e.g., the logic portion 1252) of an accelerator (e.g., the accelerator 1250)…the orchestrator server 1220 may make the determination to assign a workload in response to identifying the workload or a portion of the workload (as including the subset) as being amenable to a particular type of acceleration for which an accelerator (e.g., the accelerator 1250) is present).

Bernat fails to specifically teach that, when executing the operations, the operations are video frame transformations; wherein the configuration information comprises a mapping between the video frame transformations and hardware accelerators; and executing the video frame transformations on a batch of video frames to modify one or more individual frames in the batch of video frames.

However, Dutta teaches that, when executing the operations, the operations are transformations (Dutta, Col 1, lines 56-60, an example method for leveraging hardware accelerators for scalable distributed streams in a network environment is provided and includes allocating a plurality of hardware accelerators to a corresponding plurality of bolts of a distributed stream in a network; Col 2, lines 11-14, the "bolt" implements processing logic to process (e.g., run functions, filter tuples (as transformations), perform stream aggregations, talk to databases, etc.) the data elements in the stream); and wherein the configuration information comprises a mapping between the transformations and hardware accelerators, and executing the transformations (Dutta, Fig. 1, 16(1)-(N) hardware accelerators, 20(1)-(N) bolts (which include the process of filtering tuples, i.e., transformations); Fig. 2, 44 input data to hardware accelerator for execution to generate output data 46; Col 4, lines 11-25, broker 14 may receive capability information from bolts 20(1)-20(N) and hardware accelerators 16(1)-16(N) and map hardware accelerators 16(1)-16(N) to corresponding bolts 20(1)-20(N). The capability information from bolts 20(1)-20(N) may include respective locations in distributed streams 17(1)-17(M) and identities; the capability information from hardware accelerators 16(1)-16(N) may include respective network locations (e.g., Internet Protocol (IP) address) and capabilities (e.g., RegEx processor, graphics processor, etc.). The mapping may be formatted into any suitable table, spreadsheet, memory mapping, etc. as suitable and based on particular needs. According to various embodiments, the mapping may be used to route the data elements of distributed streams 17(1)-17(M) to appropriate hardware accelerators 16(1)-16(N) for stream processing; Col 8, lines 23-27, distributed stream 17 may be processed by a set of computing devices called worker nodes 54; according to various embodiments, worker nodes 54 may include hardware accelerators 16(1)-16(N) and/or other computing devices).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat with Dutta, because Dutta's mapping of accelerators to the corresponding transformation operations would have provided Bernat's system with the capability to easily assign operations to the corresponding hardware accelerators based on the mapping, improving system performance and processing speed.
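A rough sketch of the Dutta-style capability mapping summarized above may help: a broker records each accelerator's capability and builds a mapping that routes each transformation type to a capable accelerator, playing the role of the claimed "configuration information." All identifiers are invented for illustration.

```python
# Rough sketch of a broker-built capability mapping in the style Dutta is
# cited for. Identifiers are invented, not taken from Dutta or the claims.

accelerator_capabilities = {
    "accel-0": "regex",
    "accel-1": "graphics",
    "accel-2": "graphics",
}

def build_mapping(transform_types: list, capabilities: dict) -> dict:
    # Map each transformation type to every accelerator advertising it.
    return {t: [a for a, cap in capabilities.items() if cap == t]
            for t in transform_types}

mapping = build_mapping(["graphics", "regex"], accelerator_capabilities)

def route(transform_type: str) -> str:
    # Route to the first capable accelerator; a real broker could also
    # weigh load, as in the combination with Diard discussed below.
    return mapping[transform_type][0]

print(route("graphics"))  # accel-1
```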
The GPU is a "slave" processor that operates in response to commands received from a driver program executing on a "master" processor, generally the central processing unit (CPU) of the system; also see Col 2, lines 66-67, The graphics processors are instructed to render a number of frames, wherein the first and second graphics process; Col 5, lines 37-40, lighting transformations, coordinate transformations, scan-conversion of geometric primitives to rasterized data, shading computations, shadow rendering, texture blending); wherein the metric is computed based, at least in part, on a type of the frame transformations (Diard, Col 1, lines 31-38, The typical GPU is a highly complex integrated circuit device optimized to perform graphics computations (e.g., matrix transformations, scan-conversion and/or other rasterization techniques, texture blending, etc.) and write the results to the graphics memory. The GPU is a "slave" processor that operates in response to commands received from a driver program executing on a "master" processor; Col 3, lines 31-35, The load coefficient may be, e.g., an average of the recorded numeric values that can be compared to an arithmetic mean of the numeric values of the processor identifiers in order to determine whether an imbalance exists; Col 11, lines 18-27, determined whether the load coefficient exceeds a "high" threshold. The high threshold is preselected and may be exactly 0.5 or a somewhat higher value (e.g., 0.55 or 0.6). If the load coefficient exceeds the high threshold, then the loads are adjusted at step 512 by moving the boundary line P in FIG. 2 down by a preset amount (e.g., one line, five lines, ten lines). This reduces the fraction of the display area that is rendered by GPU-1, which will tend to reduce the load on GPU-1 and increase the load on GPU-0; Col 16, lines 8-23, In some multi-card embodiments used to render scenes in which foreground regions (most often but not always at the bottom of the display area) are consistently more complex than background regions, a performance advantage can be gained by assigning GPU 914a to process the background region of the scene and assigning GPU 914b to process the foreground region. For example, in FIG. 2, suppose that the foreground appears toward the bottom of display area 200. In that case, GPU 914a would be assigned to render top region 202 while GPU 914b would be assigned to render bottom region 204. The higher complexity of the foreground (bottom) region tends to increase the rendering time of GPU 914b. In response, the load-balancing processes described herein will tend to move the boundary line P toward the bottom of the display area [Examiner noted: the metric (i.e., load coefficient) is computed based, at least in part, on a type of the transformations (i.e., render scenes in which foreground regions are more complex than the background regions (as different types of transformations)]); assign a subset of clients of the set of clients to a second hardware accelerator to cause the metric associated with each of the set of clients to be below the threshold in response to the determination (Diard, Fig. 5, 510 does average exceed high threshold, YES to 512 move boundary line down by preselected amount (as to cause the metric (i.e., load coefficient) below the threshold); Col 11, lines 11-27, determined whether the load coefficient exceeds a "high" threshold. The high threshold is preselected and may be exactly 0.5 or a somewhat higher value (e.g., 0.55 or 0.6). 
If the load coefficient exceeds the high threshold, then the loads are adjusted at step 512 by moving the boundary line P in FIG. 2 down by a preset amount (e.g., one line, five lines, ten lines) (as including a subset of clients of the set of clients). This reduces the fraction of the display area that is rendered by GPU-1, which will tend to reduce the load on GPU-1 and increase the load on GPU-0; see Fig. 2, 202 and 204, the load is adjusted between two GPUs; also see Col 10, lines 48-55, graphics driver may balance the load after some number (Q) of frames (as set of processes/frame renderings; i.e., set of clients); where Q might be, e.g., 1, 2, 5, 10, 20, etc…Alternatively, load balancing may be performed at regular time intervals (e.g., once per second) or according to any other criteria) as a result of assigning the subset of clients, reassign at least one of the subset of clients to the first hardware accelerator to cause the metric associated with each of the set of clients to remain below the threshold (Diard, Abstract, lines 9-12, re-partitioned to increase a size of the portion assigned to the less heavily loaded processor and to decrease a size of the portion assigned to the more heavily loaded processor; Fig. 2, 202, 204, P and P’; Fig. 5, 510 does average exceed high threshold? NO to 514 is average below low threshold? YES to 516, move boundary line up by preselected amount (as reassigning to cause the metric associated with each of the set of clients to remain below the threshold); Col 8, lines 55-67, rendering commands 308 and associated rendering data for the next frame F1…At this point, the clip rectangles for each GPU may be modified by the graphics driver program based on the feedback data received in response to the various write notifier commands (e.g., commands 306, 310). For example, where the display area is divided as shown in FIG. 2, the value of P may be modified (e.g., to P') in response to feedback data: if the GPU that processes top portion 202 tends to finish its frames first, the value of P is increased, and if the GPU that processes bottom portion 204 tends to finish first, the value of P is decreased. Specific embodiments of re-partitioning a display area in response to feedback data are described below; Col 11, lines 18-27, at step 510 it is determined whether the load coefficient exceeds a "high" threshold… at step 514, it is determined whether the load coefficient is less than a "low" threshold. The low threshold is predefined and may be exactly 0.5 or a somewhat lower value (e.g., 0.45 or 0.4). If the load coefficient is below the low threshold, then the loads are adjusted at step 516 by moving the boundary line P in FIG. 2 up by a preset amount (e.g., one line, five lines, ten lines) (as reassign at least one of the subset of clients based on the dynamic load balancing between GPUs; also see Col 10, lines 48-55, graphics driver may balance the load after some number (Q) of frames (as set of processes/frame renderings). It would have been obvious to one having ordinary skill in the art before the effective filling date of the claimed invention to have combined the teaching of Bernat and Dutta with Diard because Diard’s teaching of dynamically adjusting the processing load among the GPUs based on the feedback data would have provided Bernat and Dutta’s system with the advantage and capability to allow the system to efficiently utilizing the resources based on the load which improving the processing speed and system efficiency. 
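The Diard mechanism quoted above can be sketched compactly. One reading is assumed here and should be treated as an interpretation, not Diard's verbatim algorithm: a recorded identifier of 1 marks a frame in which GPU-1 (the bottom region) was the more heavily loaded GPU, matching the quoted statement that an average above 0.5 indicates GPU-1 is more heavily loaded; with y growing downward, increasing p moves the boundary line down and shrinks GPU-1's region. Thresholds and step size follow the example values quoted from Diard.

```python
# Sketch of the Diard-style split-frame balancing: average 0/1 processor
# identifiers over recent frames into a load coefficient, then move the
# boundary line P when the coefficient leaves the low/high band.
# Assumption: a recorded 1 means GPU-1 (bottom) was more heavily loaded.

def adjust_boundary(p: int, ids: list, high: float = 0.55,
                    low: float = 0.45, step: int = 5) -> int:
    coefficient = sum(ids) / len(ids)   # load coefficient over Q frames
    if coefficient > high:
        return p + step   # GPU-1 overloaded: give it less of the display
    if coefficient < low:
        return p - step   # GPU-0 overloaded: give it less of the display
    return p              # roughly balanced: leave the partition alone

# GPU-1 was the heavily loaded GPU in 8 of the last 10 frames (0.8 > 0.55),
# so the boundary moves down by the preset amount.
print(adjust_boundary(540, [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]))  # 545
```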
Although Bernat, Dutta and Diard teach frame transformations, they fail to specifically teach that the frame transformations are video frame transformations applied to a batch of video frames to modify one or more individual frames in the batch of video frames. However, Lowry teaches video frame transformations applied to a batch of video frames to modify one or more individual frames in the batch (Lowry, [0041] lines 3-9, embodiments provide that an NVIDIA® K6000 (NVIDIA® is a registered trademark of NVIDIA) may be used as the GPU. Embodiments provide this as the first stage in the video pipeline (VP). In the GPU (14), video frames may be debayered, as well as other necessary video transformations, such as motion compensation, white balance, black level correction, etc.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta and Diard with Lowry, because Lowry's video frame transformations would have provided Bernat, Dutta and Diard's system with the capability to process video and image frame transformations (e.g., augmented reality), improving system performance and efficiency (see Lowry, [0004]).

Although Bernat, Dutta, Diard and Lowry teach assigning a subset of clients of the set of clients to the second hardware accelerator, they fail to specifically teach that the subset of clients is assigned to a set of second hardware accelerators. However, Shraer teaches assigning a subset of clients of the set of clients to a set of second hardware accelerators (Shraer, [0016] lines 3-9, each set of partitions 132 may include multiple partitions…all of the partitions of all of the data sets 132 form the data set for the application job that is executed by the application system; [0033] lines 8-12, move operations can be considered for worker computers that are in a top subset of worker computers with high load measures, e.g., the top x %, or all worker computers with loads above a threshold load measure; [0062] lines 10-12, a partition may be migrated from one worker computer to two or more other worker computers (as a set of computers/hardware accelerators; note: the second hardware accelerator was taught by Bernat, Dutta and Diard)). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard and Lowry with Shraer, because Shraer's migration of an operation/partition to two or more other worker computers for processing would have provided the combined system with the capability to re-balance the load among different computers/accelerators and efficiently utilize resources, improving system efficiency and performance.

As per claim 2, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Diard further teaches wherein the threshold is determined based, at least in part, on a framerate associated with the batch of frames (Diard, Col 10, line 64 - Col 11, line 6, the graphics driver might not know whether the GPUs have finished a particular frame, and the GPUs may be rendering a frame that is several frames earlier in the command stream than the current frame in the graphics driver. Where the feedback array is written in a circular fashion, as in process 400 described above, selecting Q to be equal to B provides an average over the B most recently rendered frames. In some embodiments, a weighted average may be used, e.g., giving a larger weight to more recently rendered frames; Col 11, lines 7-17, the load coefficient is used to determine whether an adjustment to the clip rectangles for the GPUs needs to be made. If the GPUs are equally loaded, the likelihood of either GPU finishing a frame first is about 50%, and the average value over a suitable number of frames (e.g., 20) will be about 0.5 if identifier values of 0 and 1 are used. An average value in excess of 0.5 indicates that GPU-1 (which renders the bottom portion of the image) is more heavily loaded than GPU-0, and an average value below 0.5 indicates that GPU-0 (which renders the top portion of the image) is more heavily loaded than GPU-1; also see Col 12, lines 30-31, 30 frames are rendered per second).

As per claim 9, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat further teaches wherein the determination is generated during an interval of time (Bernat, [0002] line 29, execution time of the workload; [0058] lines 7-9, during execution of the workloads (as during an interval of time, i.e., the time for execution of the workload), the accelerator 1250 is to monitor the actual utilization of the resources allocated to the logic portions 1252, 1254).

As per claim 13, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Diard further teaches wherein the type comprises modifying resolutions of the batch of frames, converting formats of the batch of frames, or modifying color of the batch of frames (Diard, Col 5, lines 36-40, these functions can include, for example, lighting transformations, coordinate transformations, scan-conversion of geometric primitives to rasterized data, shading computations, shadow rendering, texture blending, and so on).

As per claim 15, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat further teaches wherein the first hardware accelerator further comprises a field-programmable gate array (FPGA) (Bernat, [0002] lines 1-2, typical architectures for accelerator devices such as field programmable gate arrays (FPGAs)).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Fawcett (US Pub. 2020/0225989 A1). Fawcett was cited in the previous Office Action.

As per claim 3, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Diard further teaches wherein the metric further comprises a value indicating a level of activity of the first hardware accelerator (Diard, Col 11, lines 11-27, it is determined whether the load coefficient exceeds a "high" threshold (as a level of load activity). The high threshold is preselected and may be exactly 0.5 or a somewhat higher value (e.g., 0.55 or 0.6).
If the load coefficient exceeds the high threshold, then the loads are adjusted at step 512 by moving the boundary line P in FIG. 2 down by a preset amount (e.g., one line, five lines, ten lines). This reduces the fraction of the display area that is rendered by GPU-1, which will tend to reduce the load on GPU-1 and increase the load on GPU-0; see Fig. 2, 202 and 204, the load is adjusted between two GPUs; note: the first hardware accelerator was taught by Bernat).

Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the metric further comprises a value indicating a percentage of activity. However, Fawcett teaches wherein the metric further comprises a value indicating a percentage of activity (Fawcett, lines 1-4, a load metric is a measurement of the current usage level of a resource relative to the resource's total capacity. For example, a load metric of 75 percent means that the resource is at 75 percent of its total capacity). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Fawcett, because Fawcett's load metric indicating the percentage of load capacity would have provided the combined system with the capability to easily identify the current load capacity, enabling the system to allocate resources based on load and improving system performance.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Takemoto et al. (US Patent 5,347,622). Takemoto was cited in the previous Office Action.

As per claim 4, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Diard further teaches that the set of second hardware accelerators further comprises a graphics processing unit (GPU) (Diard, Fig. 6, GPU 1, GPU 2). Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the first hardware accelerator further comprises a video image compositor (VIC). However, Takemoto teaches wherein the first hardware accelerator further comprises a video image compositor (VIC) (Takemoto, Col 2, lines 17-20, a first plurality of crosspoint switches connect the plurality of digital video signal inputs to the key processing subsystem and to the video image compositor). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Takemoto, because Takemoto's video image compositor would have provided the combined system with the capability to process digital video images, improving system performance and efficiency.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Da Silva et al. (US Pub. 2022/0156639 A1; hereafter: Silva). Silva was cited in the previous Office Action.

As per claim 5, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Dutta teaches the first hardware accelerator executing the frame transformations (Dutta, Fig. 1, 16(1)-(N) hardware accelerators, 20(1)-(N) bolts (which include the process of filtering tuples, i.e., transformations); Fig. 2, 44 input data to hardware accelerator for execution to generate output data 46; Col 4, lines 11-25, quoted under claim 1 above (broker 14 maps hardware accelerators 16(1)-16(N) to corresponding bolts 20(1)-20(N), and the mapping may be used to route the data elements of distributed streams 17(1)-17(M) to appropriate hardware accelerators for stream processing); Col 8, lines 23-27, distributed stream 17 may be processed by a set of computing devices called worker nodes 54, which may include hardware accelerators 16(1)-16(N) and/or other computing devices; also see Col 2, lines 66-67, the graphics processors are instructed to render a number of frames; Col 5, lines 37-40, lighting transformations, coordinate transformations, scan-conversion of geometric primitives to rasterized data, shading computations, shadow rendering, texture blending). In addition, Lowry teaches that the frame transformations are video frame transformations (Lowry, [0041] lines 3-9, an NVIDIA® K6000 may be used as the GPU as the first stage in the video pipeline (VP); in the GPU (14), video frames may be debayered, as well as other necessary video transformations, such as motion compensation, white balance, black level correction, etc.).

Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the set of clients submit one or more application programming interface (API) calls to cause the first hardware accelerator to execute. However, Silva teaches wherein the set of clients submit one or more application programming interface (API) calls to cause the first hardware accelerator to execute (Silva, [0052] lines 4-20, client devices 328 may send project requests that request a service and/or API for machine learning models 336. The apparatus 302 may receive the project requests. The processor 304 may execute the director instructions 314 to implement a director mechanism. The project requests may be provided to the director mechanism. The director mechanism may predict resource instances 320 and/or processing resources to be used for projects and for triggering target resource instances 320. In some examples, the resource instances 320 may be containers, virtual machines, and/or physical machines. In some examples, a resource instance 320 may include NVM 334 and/or heterogeneous processors (e.g., CPUs 322, GPUs 324, and/or TPUs 330). The processors (e.g., CPUs 322, GPUs 324, and/or TPUs 330) may be capable of processing machine learning model processing workloads).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Silva because Silva’s teaching of a set of clients that submit API requests/calls causing the GPUs to process the operations would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to allow the user to easily control and instruct the hardware accelerators to process the operations based on the API calls, which improves the user experience and system performance. Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Ben ZEEV et al. (US Pub. 2017/0132046 A1; hereafter: ZEEV). ZEEV was cited in the previous Office Action. As per claim 6, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat teaches the first hardware accelerator (Bernat, [0002] lines 1-2, Typical architectures for accelerator devices such as field programmable gate arrays (FPGAs)). In addition, Diard teaches perform processing of the batch of frames (Diard, Col 10, lines 64-Col 11, line 6, the graphics driver might not know whether the GPUs have finished a particular frame and the GPUs may be rendering a frame that is several frames earlier in the command stream than a current frame in the graphics driver. Where the feedback array is written in a circular fashion, as in process 400 described above, selecting Q to be equal to B provides an average over the B most recently rendered frames. In some embodiments, a weighted average may be used, e.g., giving a larger weight to more recently-rendered frames; Col 11, lines 7-17, The load coefficient is used to determine whether an adjustment to the clip rectangles for the GPUs needs to be made. If the GPUs are equally loaded, the likelihood of either GPU finishing a frame first is about 50%, and the average value over a suitable number of frames (e.g., 20) will be about 0.5 if identifier values of 0 and 1 are used. An average value in excess of 0.5 indicates that GPU-1 (which renders the bottom portion of the image) is more heavily loaded than GPU-0, and an average value below 0.5 indicates that GPU-0 (which renders the top portion of the image) is more heavily loaded than GPU-1; also see Col 12, lines 30-31, 30 frames are rendered per second). Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the metric further comprises an amount of time at least one of the set of clients utilizes the first hardware accelerator to perform processing of the batch of frames. However, ZEEV teaches wherein the metric further comprises an amount of time at least one of the set of clients utilizes the first hardware accelerator to perform processing of the batch of frames (ZEEV, [0027] lines 1-7, Information related to usage of the resource for an individual time interval may comprise, for example, a number of requests from the tenant for the resource, a percentage of processing capability of the resource used by the tenant, an amount of time of the time interval during which processing capability was used by the resource, and/or other metric to measure usage of the resource by the tenant during the time interval).
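As a purely illustrative aside, the interval-usage metric described in ZEEV’s paragraph [0027] as quoted above (an amount of time during which a client used the resource) could be tracked along these lines; the identifiers are hypothetical and not drawn from the reference:

```python
import time
from collections import defaultdict

# Hypothetical tracker for "amount of time a client utilized the
# accelerator during an interval" (in the spirit of ZEEV [0027]).
busy_seconds = defaultdict(float)

def process_batch(client_id, frames):
    start = time.perf_counter()
    _ = [f * 2 for f in frames]  # stand-in for frame-transformation work
    busy_seconds[client_id] += time.perf_counter() - start

for _ in range(100):
    process_batch("client-A", list(range(1000)))

print(f"client-A used the accelerator for {busy_seconds['client-A']:.4f}s")
```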
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with ZEEV because ZEEV’s teaching of metric data including the amount of time used for processing would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to easily determine the processing time needed for task processing, which improves system performance and efficiency. As per claim 7, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Diard teaches process the batch of frames on behalf of at least one of the set of clients (Diard, Col 10, lines 64-Col 11, line 6, the graphics driver might not know whether the GPUs have finished a particular frame and the GPUs may be rendering a frame that is several frames earlier in the command stream than a current frame in the graphics driver. Where the feedback array is written in a circular fashion, as in process 400 described above, selecting Q to be equal to B provides an average over the B most recently rendered frames. In some embodiments, a weighted average may be used, e.g., giving a larger weight to more recently-rendered frames; Col 11, lines 7-17, The load coefficient is used to determine whether an adjustment to the clip rectangles for the GPUs needs to be made. If the GPUs are equally loaded, the likelihood of either GPU finishing a frame first is about 50%, and the average value over a suitable number of frames (e.g., 20) will be about 0.5 if identifier values of 0 and 1 are used. An average value in excess of 0.5 indicates that GPU-1 (which renders the bottom portion of the image) is more heavily loaded than GPU-0, and an average value below 0.5 indicates that GPU-0 (which renders the top portion of the image) is more heavily loaded than GPU-1; also see Col 12, lines 30-31, 30 frames are rendered per second). In addition, Bernat teaches the first hardware accelerator (Bernat, [0002] lines 1-2, Typical architectures for accelerator devices such as field programmable gate arrays (FPGAs)). Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the metric further comprises a percentage of processing capability the first hardware accelerator utilized for processing. However, ZEEV teaches wherein the metric further comprises a percentage of processing capability the first hardware accelerator utilized for processing (ZEEV, [0027] lines 1-7, Information related to usage of the resource for an individual time interval may comprise, for example, a number of requests from the tenant for the resource, a percentage of processing capability of the resource used by the tenant, an amount of time of the time interval during which processing capability was used by the resource, and/or other metric to measure usage of the resource by the tenant during the time interval). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with ZEEV because ZEEV’s teaching of metric data including the percentage of processing capability would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to easily determine the processing capability of the device for task processing, which improves system performance and efficiency.
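To make the role of these metrics in claim 1’s threshold test concrete, a minimal sketch follows, assuming a one-second sampling interval and an illustrative 80% threshold (neither value comes from the references, and the client names are hypothetical):

```python
def utilization_pct(busy_s: float, interval_s: float) -> float:
    """Percentage of the accelerator's capacity consumed in an interval,
    in the spirit of ZEEV's percentage-of-processing-capability metric."""
    return 100.0 * busy_s / interval_s

# Per-client busy time over a 1-second interval (hypothetical sample).
busy = {"client-A": 0.42, "client-B": 0.31, "client-C": 0.19}
THRESHOLD_PCT = 80.0  # illustrative threshold only

total = utilization_pct(sum(busy.values()), 1.0)
if total > THRESHOLD_PCT:
    # Move the heaviest clients to a second accelerator until the
    # remaining load falls below the threshold (mirrors the reassignment
    # step the claims recite; real policies would be more elaborate).
    moved = []
    for client, _ in sorted(busy.items(), key=lambda kv: -kv[1]):
        moved.append(client)
        remaining = {c: s for c, s in busy.items() if c not in moved}
        if utilization_pct(sum(remaining.values()), 1.0) <= THRESHOLD_PCT:
            break
    print("reassign to second accelerator:", moved)
```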
Claims 8 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Ueda (US Pub. 2012/0254443 A1). Ueda was cited in the previous Office Action. As per claim 8, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Diard teaches the batch of frames provided by at least one of the set of clients (Diard, Fig. 1, 116b, 123a, 123b, 122b; Col 6, lines 41-43, a graphics driver program (or other program) executing on CPU 102 delivers rendering commands and associated data for processing by GPUs 114a, 114b; Col 10, lines 64-Col 11, line 6, the graphics driver might not know whether the GPUs have finished a particular frame and the GPUs may be rendering a frame that is several frames earlier in the command stream than a current frame in the graphics driver. Where the feedback array is written in a circular fashion, as in process 400 described above, selecting Q to be equal to B provides an average over the B most recently rendered frames. In some embodiments, a weighted average may be used, e.g., giving a larger weight to more recently-rendered frames; Col 11, lines 7-17, The load coefficient is used to determine whether an adjustment to the clip rectangles for the GPUs needs to be made. If the GPUs are equally loaded, the likelihood of either GPU finishing a frame first is about 50%, and the average value over a suitable number of frames (e.g., 20) will be about 0.5 if identifier values of 0 and 1 are used. An average value in excess of 0.5 indicates that GPU-1 (which renders the bottom portion of the image) is more heavily loaded than GPU-0, and an average value below 0.5 indicates that GPU-0 (which renders the top portion of the image) is more heavily loaded than GPU-1; also see Col 12, lines 30-31, 30 frames are rendered per second). Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the metric further comprises an average amount of load generated by at least processing. However, Ueda teaches wherein the metric further comprises an average amount of load generated by at least processing (Ueda, [0043] lines 1-7, the transfer condition for transferring to the alternate server may include threshold conditions for various metrics of the instances in the load distribution target server group for the designated load balancer 110, such as the average CPU utilization rate, the average memory utilization rate, the average degree of I/O utilization). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Ueda because Ueda’s teaching of metric data including the average amount of load/utilization would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to easily determine the average processing load for task processing and to distribute tasks based on that average load, which improves system performance and efficiency. As per claim 10, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the first determination is generated based at least in part on historical data.
However, Ueda teaches wherein the first determination is generated based at least in part on historical data (Ueda, [0043] lines 1-7, the transfer condition for transferring to the alternate server may include threshold conditions for various metrics of the instances in the load distribution target server group for the designated load balancer 110, such as the average CPU utilization rate, the average memory utilization rate, the average degree of I/O utilization; also see [0060] in addition to the target server size based on the demands quantified by the load balancer, a predicted server size based on demand prediction using history information is determined. If the demands quantified by the load balancer are underestimated compared with the demands predicted with the history information, the server size based on the demand prediction may be selected). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Ueda because Ueda’s teaching of utilizing historical data to determine the target server for processing would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to predict the processing load based on historical data for task assignment, which improves system performance and efficiency. Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of IZENBERG et al. (US Pub. 2017/0195173 A1). IZENBERG was cited in the previous Office Action. As per claim 11, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat, Dutta, Diard, Lowry and Shraer fail to explicitly teach wherein the configuration information is generated based, at least in part, on user input. However, IZENBERG teaches wherein the configuration information is generated based, at least in part, on user input (IZENBERG, [0066] lines 5-10, the resource manager may recommend that the pre-existing instance 1053A be used for the second FPGA-utilizing application (assuming that the same FPGA can be used for the second application). In other embodiments, or based on the preferences of the client (as configuration is generated based on the user input), a new instance such as 1053B may be launched for the second application; [0067] in their programmatic interactions with the resource manager, and the resource manager may take the appropriate resource allocation choices based on the client's preferences). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with IZENBERG because IZENBERG’s teaching of providing/generating resource allocation preferences based on user input would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to allocate different resources based on the user's selections and preferences, which improves the user experience and system performance. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Bernat et al. (US Pub. 2020/0409748 A1; hereafter Bernat ’748). Bernat ’748 was cited in the previous Office Action.
As per claim 12, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the threshold is designated by a user. However, Bernat ’748 teaches wherein the threshold is designated by a user (Bernat ’748, [0111] one or more power consumption thresholds specified in a policy (e.g., a SLA, QoS requirements, load balancing policies, application specifications, etc.); [0121] lines 1-4, as a function of a policy (e.g., a SLA, QoS requirements, a load balancing policy, user-defined specifications in an application associated with the workload, etc.)). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Bernat ’748 because Bernat ’748’s teaching of a resource utilization threshold defined by the user would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to allow the user to easily control the resource limit for running the workload, which improves the user experience and system performance. Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Bonebakker et al. (US Patent 8,688,430 B1). Bonebakker was cited in the previous Office Action. As per claim 14, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the metric is obtained at least from a hardware performance counter included in the first hardware accelerator. However, Bonebakker teaches wherein the metric is obtained from a hardware performance counter included in the first hardware accelerator (Bonebakker, Col 3, lines 29-31, one or more system metrics can be sampled values obtained from hardware counters within processor 102; please note: the first hardware accelerator was taught by Bernat). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Bonebakker because Bonebakker’s teaching of obtaining metrics from hardware counters would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to obtain the metrics from counters, which improves system performance and efficiency. Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Joglekar et al. (US Pub. 2021/0374027 A1). Joglekar was cited in the previous Office Action. As per claim 16, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the memory further stores instructions that, as a result of being executed by the one or more processors, cause the system to obtain, through a system call, the metric.
However, Joglekar teaches wherein the memory further stores instructions that, as a result of being executed by the one or more processors, cause the system to obtain, through a system call, the metric (Joglekar, [0082] the metrics collecting agents 214 can retrieve and interpret metrics and their associated metric values from system calls made by the monitored services 210 and 212). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Joglekar because Joglekar’s teaching of obtaining metrics from system calls would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to obtain the metrics from system calls, which improves system performance and efficiency. Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Alvelda, VII et al. (US Pub. 2020/0106829 A1). Alvelda was cited in the previous Office Action. As per claim 17, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach wherein the set of clients further comprise a set of components of an artificial intelligence pipeline. However, Alvelda teaches wherein the set of clients further comprise a set of components of an artificial intelligence pipeline (Alvelda, Abstract, lines 1-7, methods suitable for partitioning processes between devices. In particular, the present invention relates to partitioning client devices and server devices based on the performance and available resources of the respective devices to efficiently execute artificial intelligent, machine learning, and other processes; [0028] lines 1-3, The present disclosure makes use of a code base and library of executable files where machine learning and AI algorithms; [0030] lines 3-7, Specifically, FIG. 1 depicts an illustrative system 100 for partitioning data processing pipelines (e.g., for AI and machine learning algorithms) over a plurality of client devices). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Alvelda because Alvelda’s teaching of partitioning machine learning data processing pipelines over a plurality of client devices would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to process AI-driven applications, which improves system performance and efficiency. As per claim 18, Bernat, Dutta, Diard, Lowry, Shraer and Alvelda teach the invention according to claim 17 above. Alvelda further teaches wherein the artificial intelligence pipeline includes one or more neural networks (Alvelda, [0028] lines 1-3, The present disclosure makes use of a code base and library of executable files where machine learning and AI algorithms; [0030] lines 3-7, Specifically, FIG. 1 depicts an illustrative system 100 for partitioning data processing pipelines (e.g., for AI and machine learning algorithms) over a plurality of client devices; [0027] encrypted machine-readable data in complex distributed data stores such as neural weight matrices). Claim 19 is rejected under 35 U.S.C.
103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Galluzzi et al. (US Patent 11,019,298 B2). Galluzzi was cited in the previous Office Action. As per claim 19, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat teaches cause the first hardware accelerator to process the operations (Bernat, [0084] lines 28-32, metadata indicative of the types of operations to be executed in the workload (as including operations), or from a request from a CPU (e.g., the CPU 1272) to offload a portion of a workload onto a particular accelerator 1250, 1260 or type of accelerator (e.g., FPGA, graphics accelerator, cryptography accelerator, compression accelerator, etc.), and assign the workload; Fig. 18, 1726 execute the workload with corresponding logic portion of the accelerator). In addition, Diard teaches that, when processing the operations, it is processing the batch of frames (Diard, Col 1, lines 24-38, Graphics processing subsystems are designed to render realistic animated images in real time, e.g., at 30 or more frames per second. These subsystems are most often implemented on expansion cards that can be inserted into appropriately configured slots on a motherboard of a computer system and generally include one or more dedicated graphics processing units (GPUs) and dedicated graphics memory. The typical GPU is a highly complex integrated circuit device optimized to perform graphics computations (e.g., matrix transformations, scan-conversion and/or other rasterization techniques, texture blending, etc.) and write the results to the graphics memory. The GPU is a "slave" processor that operates in response to commands received from a driver program executing on a "master" processor, generally the central processing unit (CPU) of the system). Bernat, Dutta, Diard, Lowry and Shraer fail to explicitly teach that, when processing the batch of frames, it is by at least converting the batch of frames from a first format to a second format. However, Galluzzi teaches that, when processing the batch of frames, it is by at least converting the batch of frames from a first format to a second format (Galluzzi, claim 1, grabbing the video frames in the first format and converting the video frames to a second format different from the first format). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Galluzzi because Galluzzi’s teaching of converting video frames from a first format to a second format would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to process different formats of video frames, which improves system performance and efficiency. Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Wu et al. (US Pub. 2017/0064313 A1). Wu was cited in the previous Office Action. As per claim 20, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above.
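The format conversion mapped to Galluzzi above can be illustrated with a small sketch; RGB-to-luma is one hypothetical instance of converting a batch of frames from a first format to a second format, not necessarily the conversion Galluzzi claims:

```python
import numpy as np

def batch_to_grayscale(frames_rgb: np.ndarray) -> np.ndarray:
    """Convert a batch of RGB frames to single-channel luma -- a toy
    instance of 'converting a batch of frames from a first format to a
    second format'. Input shape: (batch, height, width, 3); output:
    (batch, height, width). Uses the standard BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return frames_rgb @ weights

batch = np.random.randint(0, 256, size=(4, 2, 2, 3)).astype(np.float32)
print(batch_to_grayscale(batch).shape)  # (4, 2, 2)
```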
Bernat teaches cause the first hardware accelerator to process operations (Bernat, [0084] lines 28-32, metadata indicative of the types of operations to be executed in the workload (as including operations), or from a request from a CPU (e.g., the CPU 1272) to offload a portion of a workload onto a particular accelerator 1250, 1260 or type of accelerator (e.g., FPGA, graphics accelerator, cryptography accelerator, compression accelerator, etc.), and assign the workload; Fig. 18, 1726 execute the workload with corresponding logic portion of the accelerator). In addition, Diard teaches that, when processing the operations, it is processing the batch of frames (Diard, Col 10, lines 64-Col 11, line 6, the graphics driver might not know whether the GPUs have finished a particular frame and the GPUs may be rendering a frame that is several frames earlier in the command stream than a current frame in the graphics driver. Where the feedback array is written in a circular fashion, as in process 400 described above, selecting Q to be equal to B provides an average over the B most recently rendered frames. In some embodiments, a weighted average may be used, e.g., giving a larger weight to more recently-rendered frames; also see Col 12, lines 30-31, 30 frames are rendered per second). Bernat, Dutta, Diard, Lowry and Shraer fail to specifically teach that, when processing the batch of frames, it is by at least scaling the batch of frames. However, Wu teaches that, when processing the batch of frames, it is by at least scaling the batch of frames (Wu, [0185] The accelerator scales the reference frames (2110), producing scaled versions (2130) of the reference frames at the second spatial resolution, which matches the spatial resolution of the current frame (2120). The scaled versions (2130) of the reference frames are labeled 088′, 089′, and 090′ for the sake of illustration, but the labels need not correspond to values tracked by the accelerator. Then, the accelerator performs motion compensation for blocks of the current frame (2120) using motion-compensated prediction values from the scaled versions (2130) of the reference frames). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Wu because Wu’s teaching of scaling the batch of frames for processing would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to improve frame processing speed, which improves system performance and efficiency. Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Diard, Lowry and Shraer, as applied to claim 1 above, and further in view of Saulters (US Pub. 2014/0125670 A1). Saulters was cited in the previous Office Action. As per claim 21, Bernat, Dutta, Diard, Lowry and Shraer teach the invention according to claim 1 above. Bernat teaches cause the first hardware accelerator to process operations (Bernat, [0084] lines 28-32, metadata indicative of the types of operations to be executed in the workload, or from a request from a CPU (e.g., the CPU 1272) to offload a portion of a workload onto a particular accelerator 1250, 1260 or type of accelerator (e.g., FPGA, graphics accelerator, cryptography accelerator, compression accelerator, etc.), and assign the workload; Fig. 18, 1726 execute the workload with corresponding logic portion of the accelerator).
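The reference-frame scaling quoted from Wu above might look like the following toy sketch; nearest-neighbour resampling is an assumption chosen for brevity, not Wu’s method, and the array shapes are hypothetical:

```python
import numpy as np

def scale_batch(frames: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour rescale of a (batch, h, w) frame stack -- a toy
    stand-in for the reference-frame scaling Wu attributes to the
    accelerator ([0185]); production code would use a proper filter."""
    b, h, w = frames.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frames[:, rows[:, None], cols[None, :]]

refs = np.arange(2 * 4 * 4).reshape(2, 4, 4)
print(scale_batch(refs, 8, 8).shape)  # (2, 8, 8)
```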
In addition, Diard teaches that, when processing the operations, it is processing the batch of frames (Diard, Col 10, lines 64-Col 11, line 6, the graphics driver might not know whether the GPUs have finished a particular frame and the GPUs may be rendering a frame that is several frames earlier in the command stream than a current frame in the graphics driver. Where the feedback array is written in a circular fashion, as in process 400 described above, selecting Q to be equal to B provides an average over the B most recently rendered frames. In some embodiments, a weighted average may be used, e.g., giving a larger weight to more recently-rendered frames; also see Col 12, lines 30-31, 30 frames are rendered per second; Col 2, lines 66-67, The graphics processors are instructed to render a number of frames, wherein the first and second graphics process; Col 5, lines 37-40, lighting transformations, coordinate transformations, scan-conversion of geometric primitives to rasterized data, shading computations, shadow rendering, texture blending). Bernat, Dutta, Diard, Lowry and Shraer fail to explicitly teach that, when processing, it is by at least modifying one or more color values associated with at least one frame of the batch of frames. However, Saulters teaches that, when processing, it is by at least modifying one or more color values associated with at least one frame of the batch of frames (Saulters, [0046] lines 1-8, Graphics driver 140 loads a shader to GPU 122, so as to enable GPU 122 to adjust color values of one or more sample areas on the current rendered frame, based on at least the values of the frame transformation matrix for the current rendered frame and the previous rendered frame (as discussed in Step S01) and the depth values of the current rendered frame (as discussed in Step S02), whereby a motion blur effect is created or approximated in the current rendered frame. Also see [0049] With such information, GPU 122 can determine, for example, the locations and sizes of sample areas 252 and 254 on the rendered frame 220 and adjust the color values of the sample areas 252). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Diard, Lowry and Shraer with Saulters because Saulters’s teaching of adjusting color values of one or more sample areas on the current rendered frame would have provided Bernat, Dutta, Diard, Lowry and Shraer’s system with the advantage and capability to adjust the color values of the frame, improving system performance and efficiency. Claims 22-25 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Diard (US Patent 7,075,541 B2) in view of Bernat et al. (US Pub. 2018/0027062 A1) and further in view of Dutta (US Patent 9,313,134 B2), Takemoto et al. (US Patent 5,347,622), Lowry et al. (US Pub. 2018/0262684 A1), Desai et al. (US Pub. 2011/0078318 A1), and Shraer et al. (US Pub. 2017/0353536 A1). Diard, Bernat, Dutta, Takemoto, Desai and Shraer were cited in the previous Office Action.
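The color-value adjustment mapped to Saulters above, blending a sample area of the current frame toward the previous frame to approximate motion blur, might be sketched as follows; the region bounds and blend factor are illustrative only and are not taken from the reference:

```python
import numpy as np

def blur_region(curr: np.ndarray, prev: np.ndarray,
                y0: int, y1: int, x0: int, x1: int,
                alpha: float = 0.5) -> np.ndarray:
    """Modify colour values in one sample area of the current frame by
    blending toward the previous frame -- a toy approximation of the
    motion-blur colour adjustment mapped to Saulters."""
    out = curr.copy()
    out[y0:y1, x0:x1] = (alpha * curr[y0:y1, x0:x1]
                         + (1.0 - alpha) * prev[y0:y1, x0:x1])
    return out

prev = np.zeros((4, 4, 3), dtype=np.float32)
curr = np.full((4, 4, 3), 200.0, dtype=np.float32)
print(blur_region(curr, prev, 1, 3, 1, 3)[1, 1])  # [100. 100. 100.]
```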
As per claim 22, Diard teaches the invention substantially as claimed including a method comprising: assigning a set of application clients to a GPU to execute frame transformations corresponding to the set of application clients (Diard, Col 1, lines 58-60, For example, each GPU may be instructed to render pixel data for a different portion of the displayable image, such as a number of lines of a raster-based display (as assigning the set of application clients to a GPU for processing; see specs [0057] “a client (e.g., application, processes, or other components)”), Col 1, lines 24-38, Graphics processing subsystems are designed to render realistic animated images in real time, e.g., at 30 or more frames per second. These subsystems are most often implemented on expansion cards that can be inserted into appropriately configured slots on a motherboard of a computer system and generally include one or more dedicated graphics processing units (GPUs) and dedicated graphics memory. The typical GPU is a highly complex integrated circuit device optimized to perform graphics computations (e.g., matrix transformations, scan-conversion and/or other rasterization techniques, texture blending, etc.) and write the results to the graphics memory. The GPU is a "slave" processor that operates in response to commands received from a driver program executing on a "master" processor, generally the central processing unit (CPU) of the system; Col 12, lines 30-31, 30 frames are rendered per second; Col 2, lines 66-67, The graphics processors are instructed to render a number of frames, wherein the first and second graphics process; Col 5, lines 37-40, lighting transformations, coordinate transformations, scan-conversion of geometric primitives to rasterized data, shading computations, shadow rendering, texture blending); determining that time/load used by the GPU to execute the frame transformations for the set of application clients exceeds a frame processing threshold based, at least in part, on a type of frame transformations (Diard, Col 13, lines 12-24, different feedback data may be used instead of or in addition to the GPU identifiers described above. For example, instead of providing one feedback array in system memory, with both GPUs writing feedback data to the same location for a given frame, each GPU may write to a corresponding entry of a different feedback array, and the feedback data may include timing information, e.g., a timestamp indicating when each GPU finished a particular frame. In this embodiment, the graphics driver is configured to use the timing information to determine whether one GPU is consistently using more time per frame than another and adjust the clip rectangles accordingly to balance the load; also see Col 11, lines 17-25, determined whether the load coefficient exceeds a "high" threshold. The high threshold is preselected and may be exactly 0.5 or a somewhat higher value (e.g., 0.55 or 0.6). If the load coefficient exceeds the high threshold, then the loads are adjusted at step 512 by moving the boundary line P in FIG.
2 down by a preset amount (e.g., one line, five lines, ten lines) (as determining that the time/related load used exceeds a frame processing threshold); Col 16, lines 8-23, In some multi-card embodiments used to render scenes in which foreground regions (most often but not always at the bottom of the display area) are consistently more complex than background regions, a performance advantage can be gained by assigning GPU 914a to process the background region of the scene and assigning GPU 914b to process the foreground region. For example, in FIG. 2, suppose that the foreground appears toward the bottom of display area 200. In that case, GPU 914a would be assigned to render top region 202 while GPU 914b would be assigned to render bottom region 204. The higher complexity of the foreground (bottom) region tends to increase the rendering time of GPU 914b. In response, the load-balancing processes described herein will tend to move the boundary line P toward the bottom of the display area [Examiner noted: as type of the transformations (i.e., rendering scenes in which foreground regions are more complex than the background regions, as different types of transformations)]); assigning, to a processing unit distinct from the GPU, a subset of the application clients of the set of application clients to cause time/load used by the processing unit to be below the frame processing threshold, wherein assigning the subset of application clients causes the processing unit to execute one or more portions of the frame transformations to modify one or more individual frames in a batch of frames (Diard, Col 13, lines 12-24, different feedback data may be used instead of or in addition to the GPU identifiers described above. For example, instead of providing one feedback array in system memory, with both GPUs writing feedback data to the same location for a given frame, each GPU may write to a corresponding entry of a different feedback array, and the feedback data may include timing information, e.g., a timestamp indicating when each GPU finished a particular frame. In this embodiment, the graphics driver is configured to use the timing information to determine whether one GPU is consistently using more time per frame than another and adjust the clip rectangles accordingly to balance the load; Col 11, lines 18-27, at step 510 it is determined whether the load coefficient exceeds a "high" threshold...then the loads are adjusted at step 512 by moving the boundary line P in FIG. 2 down by a preset amount (e.g., one line, five lines, ten lines). This reduces the fraction of the display area that is rendered by GPU-1, which will tend to reduce the load on GPU-1 and increase the load on GPU-0.
(as assigning, to a processing unit distinct from the original GPU, a subset of the application clients to cause time used by the set of processing units to be below the threshold); Col 2, lines 66-67, The graphics processors are instructed to render a number of frames, wherein the first and second graphics process; Col 5, lines 37-40, lighting transformations, coordinate transformations, scan-conversion of geometric primitives to rasterized data, shading computations, shadow rendering, texture blending); and reassigning the subset of application clients to cause the time/load used by the GPU to remain below the frame processing threshold (Diard, Abstract, lines 9-12, re-partitioned to increase a size of the portion assigned to the less heavily loaded processor and to decrease a size of the portion assigned to the more heavily loaded processor; Fig. 2, 202, 204, P and P’; Fig. 5, 510 does average exceed high threshold? NO to 514 is average below low threshold? YES to 516, move boundary line up by preselected amount (as reassigning to cause the metric associated with each of the set of clients to remain below the threshold); Col 8, lines 55-67, rendering commands 308 and associated rendering data for the next frame F1…At this point, the clip rectangles for each GPU may be modified by the graphics driver program based on the feedback data received in response to the various write notifier commands (e.g., commands 306, 310). For example, where the display area is divided as shown in FIG. 2, the value of P may be modified (e.g., to P') in response to feedback data: if the GPU that processes top portion 202 tends to finish its frames first, the value of P is increased, and if the GPU that processes bottom portion 204 tends to finish first, the value of P is decreased. Specific embodiments of re-partitioning a display area in response to feedback data are described below; Col 11, lines 18-27, at step 510 it is determined whether the load coefficient exceeds a "high" threshold… at step 514, it is determined whether the load coefficient is less than a "low" threshold. The low threshold is predefined and may be exactly 0.5 or a somewhat lower value (e.g., 0.45 or 0.4). If the load coefficient is below the low threshold, then the loads are adjusted at step 516 by moving the boundary line P in FIG. 2 up by a preset amount (e.g., one line, five lines, ten lines) (as reassigning at least one of the subset of clients based on the dynamic load balancing between GPUs; also see Col 10, lines 48-55, graphics driver may balance the load after some number (Q) of frames (as a set of processes/frame renderings))). Diard further teaches cause the GPU to execute the one or more portions of the frame transformations as a result of reassigning the subset of application clients (Diard, Abstract, lines 9-12, re-partitioned to increase a size of the portion assigned to the less heavily loaded processor and to decrease a size of the portion assigned to the more heavily loaded processor; Fig. 2, 202, 204, P and P’; Fig. 5, 510 does average exceed high threshold? NO to 514 is average below low threshold? YES to 516, move boundary line up by preselected amount; Col 8, lines 55-67, rendering commands 308 and associated rendering data for the next frame F1…At this point, the clip rectangles for each GPU may be modified by the graphics driver program based on the feedback data received in response to the various write notifier commands (e.g., commands 306, 310). For example, where the display area is divided as shown in FIG.
2, the value of P may be modified (e.g., to P') in response to feedback data: if the GPU that processes top portion 202 tends to finish its frames first, the value of P is increased, and if the GPU that processes bottom portion 204 tends to finish first, the value of P is decreased. Specific embodiments of re-partitioning a display area in response to feedback data are described below; Col 11, lines 18-27, at step 510 it is determined whether the load coefficient exceeds a "high" threshold… at step 514, it is determined whether the load coefficient is less than a "low" threshold. The low threshold is predefined and may be exactly 0.5 or a somewhat lower value (e.g., 0.45 or 0.4). If the load coefficient is below the low threshold, then the loads are adjusted at step 516 by moving the boundary line P in FIG. 2 up by a preset amount (e.g., one line, five lines, ten lines); Col 5, lines 37-40, lighting transformations, coordinate transformations, scan-conversion of geometric primitives to rasterized data, shading computations, shadow rendering, texture blending). Diard fails to specifically teach that, when assigning to a GPU, it is indicated as a pre-selected hardware accelerator to execute, based, at least in part, on configuration information. However, Bernat teaches that, when assigning to a GPU, it is indicated as a pre-selected hardware accelerator to execute, based, at least in part, on configuration information (Bernat, [0084] lines 18-32, workload assignor 1632 includes an acceleration assignor 1634 which is configured to assign certain workloads or portions thereof to corresponding accelerators (e.g., the accelerators 1250, 1260). In doing so, the acceleration assignor 1634 may identify a type of the workload based on a profile of resource utilizations of the workload over time or based on a tag, an analysis of the computer-executable instructions within the workload, a header of the workload, metadata indicative of the types of operations to be executed in the workload, or from a request from a CPU (e.g., the CPU 1272) to offload a portion of a workload onto a particular accelerator 1250, 1260 or type of accelerator (e.g., FPGA, graphics accelerator, cryptography accelerator, compression accelerator, etc.) and assign the workload (as including a set of tasks/operations) to a corresponding accelerator 1250, 1260 for execution; [Examiner noted: the corresponding/particular hardware accelerator is identified (as the first hardware accelerator/pre-selected hardware accelerator) for assignment based on the type of the operations within the workload because the type of the operation and that particular accelerator need to correspond with each other]). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Diard with Bernat because Bernat’s teaching of assigning the processes, based on the configuration of the processes, to the corresponding accelerator would have provided Diard’s system with the advantage and capability to easily determine the correct type of accelerator for processing the corresponding type of process, which improves resource utilization and system efficiency. Diard and Bernat fail to specifically teach wherein the configuration information indicates a mapping between the video frame transformations and hardware accelerators.
However, Dutta teaches wherein the configuration information indicates a mapping between the transformations and hardware accelerators (Dutta, Fig. 1, 16(1)-(N) hardware accelerators, 20(1)-(N) Bolt (i.e., which include the process of filtering tuples (i.e., as transformations)); Fig. 2, 44 input data to hardware accelerator for execution to generate output data 46; Col 4, lines 11-25, broker 14 may receive capability information from bolts 20(1)-20(N) and hardware accelerators 16(1)-16(N) and map hardware accelerators 16(1)-16(N) to corresponding bolts 20(1)-20(N). The capability information from bolts 20(1)-20(N) may include respective locations in distributed streams 17(1)-17(M) and identities; the capability information from hardware accelerators 16(1)-16(N) may include respective network locations (e.g., Internet Protocol (IP) address), and capabilities (e.g., RegEx processor, graphics processor, etc.). The mapping may be formatted into any suitable table, spreadsheet, memory mapping, etc. as suitable and based on particular needs. According to various embodiments, the mapping may be used to route the data elements of distributed streams 17(1)-17(M) to appropriate hardware accelerators 16(1)-16(N) for stream processing; Col 8, lines 23-27, distributed stream 17 may be processed by a set of computing devices called worker nodes 54. According to various embodiments, worker nodes 54 may include hardware accelerators 16(1)-16(N) and/or other computing devices). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Diard and Bernat with Dutta because Dutta’s teaching of mapping the accelerators to the corresponding transformation operations for processing would have provided Diard and Bernat’s system with the advantage and capability to easily assign the operations to the corresponding hardware accelerators based on the mapping, improving system performance and processing speed. Diard, Bernat and Dutta fail to specifically teach that the GPU is a video image compositor (VIC) engine. However, Takemoto teaches the GPU is a video image compositor (VIC) engine (Takemoto, Col 2, lines 17-20, A first plurality of crosspoint switches connect the plurality of digital video signal inputs to the key processing subsystem and to the video image compositor). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Diard, Bernat and Dutta with Takemoto because Takemoto’s teaching of a video image compositor would have provided Diard, Bernat and Dutta’s system with the advantage and capability to process digital video images, which improves system performance and efficiency. Diard, Bernat, Dutta and Takemoto fail to specifically teach that the frame transformations are video frame transformations to a batch of video frames to modify one or more individual frames in the batch of video frames. However, Lowry teaches that the frame transformations are video frame transformations to a batch of video frames to modify one or more individual frames in the batch of video frames (Lowry, [0041] lines 3-9, Embodiments provide that a NVIDIA® K6000 (NVIDIA® is a registered trademark of NVIDIA) may be used as the GPU. Embodiments provide this as the first stage in the video pipeline (VP).
In the GPU (14), video frames may be debayered, as well as other necessary video transformations, such as motion compensation, white balance, black level correction, etc.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Diard, Bernat, Dutta and Takemoto with Lowry because Lowry’s teaching of video frame transformations would have provided Diard, Bernat, Dutta and Takemoto’s system with the advantage and capability to process video and image frame transformations (e.g., for augmented reality), which improves system performance and efficiency (see Lowry, [0004]). Diard, Bernat, Dutta, Takemoto and Lowry fail to specifically teach that, when comparing and determining the time/load used, it is an average time. However, Desai teaches that, when comparing and determining the time used for load balancing, it is an average time (Desai, Fig. 3, 210 load balancing agent; [0003] load balancing can be determined based in part on an analysis of the current load on each server. Load in this instance can comprise at least the amount of resources allocated to executing user sessions. When a user requests access to an application or otherwise initiates the creation of a user session, the system can responsively determine which server has the least amount of load, and then establish the user session on that server; [0008] lines 1-7, determining the current load value can include evaluating any of the following: a number of page faults; an amount of memory used by the second computer; an average amount of time the second computer uses a central processing unit of the second computer). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Diard, Bernat, Dutta, Takemoto and Lowry with Desai because Desai’s teaching of determining the average amount of time used as the load of each server for performing load balancing would have provided Diard, Bernat, Dutta, Takemoto and Lowry’s system with the advantage and capability to utilize the average processing time as the load indicator for load balancing, balancing the processing times among the different processing devices and improving resource utilization and efficiency. Although Diard, Bernat, Dutta, Takemoto, Lowry and Desai teach assigning a subset of clients of the set of clients to a processing unit, Diard, Bernat, Dutta, Takemoto, Lowry and Desai fail to specifically teach that, when assigning a subset of clients of the set of clients, it is assigned to a set of processing units. However, Shraer teaches that, when assigning a subset of clients of the set of clients, it is assigned to a set of processing units (Shraer, [0016] lines 3-9, each set of partitions 132 may include multiple partitions...All of the partitions of all of the data sets 132 form the data set for the application job that is executed by the application system; [0033] lines 8-12, move operations can be considered for worker computers that are in a top subset of worker computers with high load measures, e.g., the top x %, or all worker computers with loads above a threshold load measure; [0062] lines 10-12, a partition may be migrated from one worker computer to two or more other worker computers (as a set of computers/hardware accelerators; please note: the second hardware accelerator was taught by Bernat and Diard)).
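As an illustrative composite of the Desai and Shraer passages quoted above (selecting by average load and migrating work from a hot worker to two or more others), consider this sketch; the worker names, load values, and threshold are hypothetical:

```python
# Hypothetical composite of Desai's least-loaded selection and Shraer's
# migration of a partition from a hot worker to two or more others.
loads = {"worker-0": 0.91, "worker-1": 0.35, "worker-2": 0.28}
HOT = 0.80  # illustrative load threshold

hot = [w for w, l in loads.items() if l > HOT]
for w in hot:
    # Split the excess load across the two least-loaded other workers.
    targets = sorted((v for v in loads if v != w), key=loads.get)[:2]
    share = (loads[w] - HOT) / len(targets)
    for t in targets:
        loads[t] += share
    loads[w] = HOT
print(loads)  # excess from worker-0 is spread over worker-1 and worker-2
```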
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Diard, Bernat, Dutta, Takemoto, Lowry and Desai with Shraer because Shraer’s teaching of assigning/migrating the operation/partition to two or more other worker computers for processing would have provided Diard, Bernat, Dutta, Takemoto, Lowry and Desai’s system with the advantage and capability to re-balance the load among different computers/accelerators and efficiently utilize the resources, which improves system efficiency and performance. As per claim 23, Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer teach the invention according to claim 22 above. Diard further teaches wherein the assignment and the reassignment of the subset of application clients are repeated one or more times (Diard, Col 2, lines 28-31, It would, therefore, be desirable to provide a mechanism whereby the processing load on each GPU can be monitored and the division of the display area among the GPUs can be dynamically adjusted to balance the loads; also see Col 8, lines 17-25, each GPU provides feedback data to the graphics driver program (or another program executing on CPU 102). The feedback data provides information about the time taken by a particular GPU to render its portion of the image. The graphics driver program uses this feedback to dynamically balance the load among the GPUs by modifying the clip rectangle from time to time, e.g., by changing the dividing line to a different line P', based on the relative loads on the two GPUs). As per claim 24, Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer teach the invention according to claim 22 above. Diard further teaches wherein the frame processing threshold is determined based at least in part on a framerate of processing (Diard, Col 10, lines 64-Col 11, line 6, the graphics driver might not know whether the GPUs have finished a particular frame and the GPUs may be rendering a frame that is several frames earlier in the command stream than a current frame in the graphics driver. Where the feedback array is written in a circular fashion, as in process 400 described above, selecting Q to be equal to B provides an average over the B most recently rendered frames. In some embodiments, a weighted average may be used, e.g., giving a larger weight to more recently-rendered frames; Col 11, lines 7-17, The load coefficient is used to determine whether an adjustment to the clip rectangles for the GPUs needs to be made. If the GPUs are equally loaded, the likelihood of either GPU finishing a frame first is about 50%, and the average value over a suitable number of frames (e.g., 20) will be about 0.5 if identifier values of 0 and 1 are used. An average value in excess of 0.5 indicates that GPU-1 (which renders the bottom portion of the image) is more heavily loaded than GPU-0, and an average value below 0.5 indicates that GPU-0 (which renders the top portion of the image) is more heavily loaded than GPU-1). In addition, Takemoto teaches video processing (Takemoto, Col 1, line 35, digital video image compositing; Col 2, lines 17-20, A first plurality of crosspoint switches connect the plurality of digital video signal inputs to the key processing subsystem and to the video image compositor). As per claim 25, Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer teach the invention according to claim 22 above.
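The Diard feedback loop underlying claims 22-24 above (averaging per-frame GPU identifiers into a load coefficient and moving the boundary line P when it leaves the low/high band) can be sketched as follows; B, the thresholds, and the step size reuse the illustrative values quoted from Diard, while everything else (names, the initial scan line) is hypothetical:

```python
from collections import deque

# Sketch of Diard's loop (Cols 10-11): the driver averages per-frame GPU
# identifiers over the last B frames to get a load coefficient, then
# moves the boundary line P when the coefficient leaves [LOW, HIGH].
B, HIGH, LOW, STEP = 20, 0.55, 0.45, 5
feedback = deque(maxlen=B)  # last B per-frame identifiers (0 or 1)
boundary_p = 540            # scan line dividing the two GPUs' regions

def record_frame(identifier: int) -> int:
    """Record one frame's feedback identifier and nudge the boundary."""
    global boundary_p
    feedback.append(identifier)
    coeff = sum(feedback) / len(feedback)
    if coeff > HIGH:        # GPU-1 (bottom region) more heavily loaded:
        boundary_p += STEP  # move P down, shrinking GPU-1's share
    elif coeff < LOW:       # GPU-0 (top region) more heavily loaded:
        boundary_p -= STEP  # move P up, shrinking GPU-0's share
    return boundary_p

for _ in range(30):
    record_frame(1)
print(boundary_p)  # P keeps moving down while GPU-1 stays overloaded
```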
Desai further teaches wherein the assigning the subset of application clients is executed using a load balancer (Desai, Fig. 3, 210 load balancing agent; [0003] load balancing can be determined based in part on an analysis of the current load on each server. Load in this instance can comprise at least the amount of resources allocated to executing user sessions. When a user requests access to an application or otherwise initiates the creation of a user session, the system can responsively determine which server has the least amount of load, and then establish the user session on that server; [0008] lines 1-7, determining the current load value can include evaluating any of the following: a number of page faults; an amount of memory used by the second computer; an average amount of time the second computer uses a central processing unit of the second computer). As per claim 29, Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer teach the invention according to claim 22 above. Diard further teaches wherein the set of processing units comprises a graphics processing unit (GPU) (Diard, Col 10, lines 64-Col 11, line 6, the graphics driver might not know whether the GPUs have finished a particular frame and the GPUs may be rendering a frame that is several frames earlier in the command stream than a current frame in the graphics driver. Where the feedback array is written in a circular fashion, as in process 400 described above, selecting Q to be equal to B provides an average over the B most recently rendered frames. In some embodiments, a weighted average may be used, e.g., giving a larger weight to more recently-rendered frames). Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer, as applied to claim 25 above, and further in view of Biran et al. (US Pub. 2018/0039516 A1). Biran was cited in the previous Office Action. As per claim 26, Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer teach the invention according to claim 25 above. Desai further teaches a load balancer that assigns the subset of application clients (Desai, Fig. 3, 210 load balancing agent; [0003] load balancing can be determined based in part on an analysis of the current load on each server. Load in this instance can comprise at least the amount of resources allocated to executing user sessions. When a user requests access to an application or otherwise initiates the creation of a user session, the system can responsively determine which server has the least amount of load, and then establish the user session on that server; [0008] lines 1-7, determining the current load value can include evaluating any of the following: a number of page faults; an amount of memory used by the second computer; an average amount of time the second computer uses a central processing unit of the second computer; [0109] the load balancing agent 210 executes or otherwise assigns the user session to the selected server (Step 728)). Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer fail to specifically teach that the load balancer that assigns the subset of application clients comprises a processing thread.
However, Biran teaches that the load balancer that assigns the subset of application clients comprises a processing thread (Biran, [0042] lines 1-2, load balancer 27 executes a request processing thread). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer with Biran because Biran's teaching of a load balancer executing a request processing thread (as comprising a processing thread) would have provided Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer's system with the advantage and capability to use the load balancer to process the request processing thread, improving processing speed and efficiency.

Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer, as applied to claim 25 above, and further in view of MALDANER (US Pub. 2011/0276695 A1). MALDANER was cited in the previous Office Action.

As per claim 27, Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer teach the invention according to claim 25 above. Desai further teaches a load balancer that assigns the subset of application clients (Desai, Fig. 3, 210 load balancing agent; [0003] load balancing can be determined based in part on an analysis of the current load on each server. Load in this instance can comprise at least the amount of resources allocated to executing user sessions. When a user requests access to an application or otherwise initiates the creation of a user session, the system can responsively determine which server has the least amount of load, and then establish the user session on that server; [0008] lines 1-7, determining the current load value can include evaluating any of the following: a number of page faults; an amount of memory used by the second computer; an average amount of time the second computer uses a central processing unit of the second computer; [0109] the load balancing agent 210 executes or otherwise assigns the user session to the selected server (Step 728)). Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer fail to specifically teach that the load balancer maintains a table of the set of application clients.

However, MALDANER teaches that the load balancer maintains a table of the set of application clients (MALDANER, [0108] last two lines: the load balancer 255 may maintain a metrics table 420 for each service, device or application; [0114] lines 1-10, The managed objects or variables provided via the network management protocol may provide any type and form of metrics or operational characteristics of the service, server or device to be used by the appliance for load balancing, or any other function of the load balancer 255. In one embodiment, the device provided metrics 420 may include any of the metrics 410 collected by the load balancer 255 as described above. In another embodiment, the device provided metrics 420 may include any type and form of information on any resource usage of the managed device, service or system. In one embodiment, the metrics 410 include CPU, memory and/or disk usage of the device and/or service 270. In other embodiments, the metrics 420 may include information on a number of connections, sessions or clients of the service 270 (as a load balancer with a metrics table maintaining a table of the set of application clients)).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer with MALDANER because MALDANER's teaching of a load balancer having a metrics table that includes information about the clients of a service would have provided Diard, Bernat, Dutta, Takemoto, Lowry, Desai and Shraer's system with the advantage and capability to easily determine and manage the different information regarding the clients' services, improving system performance and efficiency.

Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Diard, Bernat, Dutta, Takemoto, Lowry, Desai, Shraer and MALDANER, as applied to claim 27 above, and further in view of Kawai et al. (US Pub. 2006/0271700 A1) and Nakamura et al. (US Pub. 2012/0016994 A1). Kawai and Nakamura were cited in the previous Office Action.

As per claim 28, Diard, Bernat, Dutta, Takemoto, Lowry, Desai, Shraer and MALDANER teach the invention according to claim 27 above. Lowry teaches computing the video frame transformations (Lowry, [0041] lines 3-9, Embodiments provide that a NVIDIA® K6000 (NVIDIA® is a registered trademark of NVIDIA) may be used as the GPU. Embodiments provide this as the first stage in the video pipeline (VP). In the GPU (14), video frames may be debayered, as well as other necessary video transformations, such as motion compensation, white balance, black level correction, etc.). Diard, Bernat, Dutta, Takemoto, Lowry, Desai, Shraer and MALDANER fail to specifically teach wherein the table of the set of application clients includes a processing engine assigned to each client and an average time taken by the client to compute.

However, Kawai teaches wherein the table of the set of application clients includes a processing engine assigned to each client and a time taken by the client to compute (Kawai, Fig. 6, 112 user information table, users with assigned different services (as processing engines) and delay time; also see claim 3, the record medium with the load distribution program recorded thereon according to claim 2, wherein the delay time determination means refers to a delay time management table where communication delay time taken between the client connection server which provides an Internet connection service to the client and each data center is set in advance and determines the processing delay time taken between the client and each data center). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Diard, Bernat, Dutta, Takemoto, Lowry, Desai, Shraer and MALDANER with Kawai because Kawai's teaching of a table including the client and server assignments and the communication delay times would have provided Diard, Bernat, Dutta, Takemoto, Lowry, Desai, Shraer and MALDANER's system with the advantage and capability to easily determine and manage the different client and server assignments/associations for processing the workload, improving system performance and efficiency. Diard, Bernat, Dutta, Takemoto, Lowry, Desai, Shraer, MALDANER and Kawai fail to specifically teach that the delay time is an average time taken by the client to compute.

However, Nakamura teaches that the delay time is an average time taken by the client to compute (Nakamura, Fig.
3, processing time management table, distributed node ID, average processing time; [0021] lines 1-10, the client 200 has: a CPU 201, a memory 202; a storage medium 203; and a communication interface 210, and lying on the memory 202 are: an information request program 204 making a request of an information processing program of the distributed node for information processing; a node management table 206 holding network information of all distributed nodes connectable from the client; a processing time management table 207 measuring and holding time required for the information processing in each distributed node). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Diard, Bernat, Dutta, Takemoto, Lowry, Desai, Shraer, MALDANER and Kawai with Nakamura because Nakamura's teaching of average processing time would have provided Diard, Bernat, Dutta, Takemoto, Lowry, Desai, Shraer, MALDANER and Kawai's system with the advantage and capability to easily determine and manage the different average processing times associated with different nodes, improving system performance and efficiency.

Claims 30-32 are rejected under 35 U.S.C. 103 as being unpatentable over Bernat et al. (US Pub. 2020/0409748 A1) in view of Dutta (US Patent 9,313,134 B2) and further in view of Shraer et al. (US Pub. 2017/0353536 A1), Diard (US Patent 7,075,541 B2), Lowry et al. (US Pub. 2018/0262684 A1), and Takemoto et al. (US Patent 5,347,622). Bernat, Dutta, Shraer, Diard and Takemoto were cited in the previous Office Action.

As per claim 30, Bernat teaches the invention substantially as claimed, including: a non-transitory computer readable storage medium storing thereon executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to (Bernat, [0029] lines 1-12, The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors… A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory)): obtain first performance data associated with an accelerator, the first performance data related to a set of application clients assigned to the accelerator to execute operations corresponding to the set of application clients (Bernat, [0088] lines 8-16, offloading accelerator kernel tasks to the FPGA 1619, 1629, 1685. More particularly, a request going from a compute sled to a target accelerator device generally traverses one or more hops, such as the NIC 1618, 1628, 1684, prior to reaching the target accelerator device. In an embodiment, the NIC 1618, 1628, 1684 may include inline processing logic to identify one or more accelerator tasks (as application client) of a workload that can be performed by the FPGA 1619, 1629, 1685, respectively; [0089] lines 1-8, the orchestrator server 1616 includes a kernel analysis and decision logic unit 1617.
The kernel analysis and decision logic unit 1617 may be embodied as any device or circuitry to obtain telemetry data indicative of resource usage and power consumption of the accelerator sleds; Fig. 23, 2302 monitor power consumption of a source accelerator sled having one or more accelerator devices executing a workload, 2304 monitor power consumption of the accelerator sled relative to one or more power consumption thresholds specified in a policy); determine that the first performance data exceeds a threshold (Bernat, Fig. 23, 2306 power consumption threshold exceeded? [0111] In block 2306, the orchestrator server 1616 determines whether a power consumption threshold is exceeded. If not, then the method 2300 returns to block 2302. Otherwise, the orchestrator server 1616 determines whether accelerator resources in the system 1300 are available for a scale-out operation); as a result of the determination that the first performance data exceeds the threshold, assign a subset of application clients of the set of application clients to a processor to cause second performance data related to the set of application clients to be below the threshold, and cause the processor to execute at least a portion of the operations corresponding to the subset of application clients to relieve load from the accelerator (Bernat, Fig. 23, 2312 Migrate the workload from one or more of the accelerator devices of the source accelerator sled to one or more accelerator devices of the target accelerator sled; [0112] lines 1-7, if accelerator resources are available, then in block 2310, the orchestrator server 1616 scales-out the workload to one or more accelerator devices on a target accelerator sled. For instance, to do so, in block 2312, the orchestrator server 1616 may migrate the workload from one or more of the accelerator devices of the source accelerator sled to one or more accelerator devices of the target accelerator sled (as a second processor performing the task/application client to relieve load from the source accelerator; the second performance data (after migration) related to the set of application clients is therefore below the threshold, i.e., the purpose of the migration is to ensure the metric is below the threshold); also see Fig. 23, 2306 power consumption threshold exceeded? [0111] In block 2306, the orchestrator server 1616 determines whether a power consumption threshold is exceeded. If not, then the method 2300 returns to block 2302. Otherwise, the orchestrator server 1616 determines whether accelerator resources in the system 1300 are available for a scale-out operation).

Bernat fails to specifically teach that the operations are video frame transformations, wherein the accelerator is selected based, at least in part, on a mapping between the video frame transformations and hardware accelerators, and executing at least a portion of the video frame transformations. However, Dutta teaches that, when the operations are executed, the operations are transformations (Dutta, Col 1, lines 56-60, An example method for leveraging hardware accelerators for scalable distributed streams in a network environment is provided and includes allocating a plurality of hardware accelerators to a corresponding plurality of bolts of a distributed stream in a network; Col 2, lines 11-14, the "bolt" implements processing logic to process (e.g., run functions, filter tuples, (as transformations) perform stream aggregations, talk to databases, etc.)
the data elements in the stream); wherein the accelerator is selected based, at least in part, on a mapping between the transformations and hardware accelerators, execute at least a portion of the transformations (Dutta, Fig. 1, 16(1)-(N) hardware accelerators, 20(1)-(N) Bolt (i.e., which includes the process of filtering tuples (as transformations)); Fig. 2, 44 input data to hardware accelerator for execution to generate output data 46; Col 4, lines 11-25, broker 14 may receive capability information from bolts 20(1)-20(N) and hardware accelerators 16(1)-16(N) and map hardware accelerators 16(1)-16(N) to corresponding bolts 20(1)-20(N). The capability information from bolts 20(1)-20(N) may include respective locations in distributed streams 17(1)-17(M) and identities; the capability information from hardware accelerators 16(1)-16(N) may include respective network locations (e.g., Internet Protocol (IP) address), and capabilities (e.g., RegEx processor, graphics processor, etc.). The mapping may be formatted into any suitable table, spreadsheet, memory mapping, etc. as suitable and based on particular needs. According to various embodiments, the mapping may be used to route the data elements of distributed streams 17(1)-17(M) to appropriate hardware accelerators 16(1)-16(N) for stream processing; Col 8, lines 23-27, distributed stream 17 may be processed by a set of computing devices called worker nodes 54. According to various embodiments, worker nodes 54 may include hardware accelerators 16(1)-16(N) and/or other computing devices). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat with Dutta because Dutta's teaching of mapping the accelerators to the corresponding transformation operations for processing would have provided Bernat's system with the advantage and capability to easily assign the operations to the corresponding hardware accelerators based on the mapping, improving system performance and processing speed.

Although Bernat and Dutta teach assigning a subset of clients of the set of clients to a processor, Bernat and Dutta fail to specifically teach that, when a subset of clients of the set of clients is assigned, it is assigned to a set of processors. However, Shraer teaches that, when a subset of clients of the set of clients is assigned, it is assigned to a set of processors (Shraer, [0016] lines 3-9, each set of partitions 132 may include multiple partitions...All of the partitions of all of the data sets 132 form the data set for the application job that is executed by the application system; [0033] lines 8-12, move operations can be considered for worker computers that are in a top subset of worker computers with high load measures, e.g., the top x %, or all worker computers with loads above a threshold load measure; [0062] lines 10-12, a partition may be migrated from one worker computer to two or more other worker computers (as a set of computers/hardware accelerators)).
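Taken together, the passages just cited describe a three-part mechanism: a capability mapping that routes operations to matching accelerators (Dutta), a threshold check on the source device (Bernat), and migration of a subset of work to two or more targets (Shraer). The following minimal sketch illustrates that combination only; the capability map, threshold, and half-split policy are assumptions, not drawn from any of the references.

    # Illustrative sketch of the three mechanisms the rejection combines
    # for claim 30; all names, values, and policies here are assumed.
    CAPABILITY_MAP = {                        # Dutta: operation type ->
        "video_transform": ["gpu0", "gpu1"],  # capable accelerators
        "regex": ["fpga0"],                   # (assumed identifiers)
    }

    def route(op_type):
        # Dutta: the mapping routes each operation to an accelerator
        # advertising the matching capability.
        return CAPABILITY_MAP[op_type][0]

    def scale_out(source_clients, target_devices, power_w, threshold_w):
        # Bernat: act only when the monitored metric exceeds its threshold.
        if power_w <= threshold_w:
            return
        # Shraer: migrate a subset of the work from the overloaded source
        # to two or more target devices.
        subset = source_clients[: len(source_clients) // 2]
        for i, client in enumerate(subset):
            target_devices[i % len(target_devices)].append(client)
            source_clients.remove(client)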
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat and Dutta with Shraer because Shraer's teaching of assigning/migrating the operation/partition to two or more other worker computers for processing would have provided Bernat and Dutta's system with the advantage and capability to rebalance the load among different computers/accelerators and utilize resources efficiently, improving system efficiency and performance.

Bernat, Dutta and Shraer fail to specifically teach: that the determination that the first performance data exceeds a threshold is based, at least in part, on a type of the video frame transformations; that the execution is of at least a portion of the video frame transformations applied to a batch of video frames to modify one or more individual frames in the batch of video frames, wherein the video frame transformations correspond to the subset of application clients to relieve load from the VIC; and reassigning the subset of application clients to the VIC to cause the second performance data to remain below the threshold, wherein the reassignment causes the VIC to execute at least the portion of the video frame transformations.

However, Diard teaches that the determination that the first performance data exceeds a threshold is based, at least in part, on a type of the frame transformations (Diard, Col 1, lines 31-38, The typical GPU is a highly complex integrated circuit device optimized to perform graphics computations (e.g., matrix transformations, scan-conversion and/or other rasterization techniques, texture blending, etc.) and write the results to the graphics memory. The GPU is a "slave" processor that operates in response to commands received from a driver program executing on a "master" processor; Col 3, lines 31-35, The load coefficient may be, e.g., an average of the recorded numeric values that can be compared to an arithmetic mean of the numeric values of the processor identifiers in order to determine whether an imbalance exists; Col 11, lines 18-27, determined whether the load coefficient exceeds a "high" threshold. The high threshold is preselected and may be exactly 0.5 or a somewhat higher value (e.g., 0.55 or 0.6). If the load coefficient exceeds the high threshold, then the loads are adjusted at step 512 by moving the boundary line P in FIG. 2 down by a preset amount (e.g., one line, five lines, ten lines). This reduces the fraction of the display area that is rendered by GPU-1, which will tend to reduce the load on GPU-1 and increase the load on GPU-0; Col 16, lines 8-23, In some multi-card embodiments used to render scenes in which foreground regions (most often but not always at the bottom of the display area) are consistently more complex than background regions, a performance advantage can be gained by assigning GPU 914a to process the background region of the scene and assigning GPU 914b to process the foreground region. For example, in FIG. 2, suppose that the foreground appears toward the bottom of display area 200. In that case, GPU 914a would be assigned to render top region 202 while GPU 914b would be assigned to render bottom region 204. The higher complexity of the foreground (bottom) region tends to increase the rendering time of GPU 914b.
In response, the load-balancing processes described herein will tend to move the boundary line P toward the bottom of the display area; Col 2, lines 66-67, The graphics processors are instructed to render a number of frames, wherein the first and second graphics process; Col 5, lines 37-40, lighting transformations, coordinate transformations, scan-conversion of geometric primitives to rasterized data, shading computations, shadow rendering, texture blending [Examiner noted: as a type of the transformations (i.e., rendering scenes in which foreground regions are more complex than the background regions (as different types of transformations))]).

Diard also teaches that the execution is of at least a portion of the frame transformations applied to a batch of frames to modify one or more individual frames in the batch of frames, wherein the frame transformations correspond to the subset of application clients to relieve load from the GPU (Diard, Col 1, lines 58-60, For example, each GPU may be instructed to render pixel data for a different portion of the displayable image, such as a number of lines of a raster-based display (as frame transformations corresponding to the subset of application clients assigned to a GPU for processing; see specification [0057], "a client (e.g., application, processes, or other components"); Col 13, lines 12-24, different feedback data may be used instead of or in addition to the GPU identifiers described above. For example, instead of providing one feedback array in system memory, with both GPUs writing feedback data to the same location for a given frame, each GPU may write to a corresponding entry of a different feedback array, and the feedback data may include timing information, e.g., a timestamp indicating when each GPU finished a particular frame. In this embodiment, the graphics driver is configured to use the timing information to determine whether one GPU is consistently using more time per frame than another and adjust the clip rectangles accordingly to balance the load; Col 11, lines 18-27, at step 510 it is determined whether the load coefficient exceeds a "high" threshold...then the loads are adjusted at step 512 by moving the boundary line P in FIG. 2 down by a preset amount (e.g., one line, five lines, ten lines). This reduces (as relieves load) the fraction of the display area that is rendered by GPU-1, which will tend to reduce the load on GPU-1 and increase the load on GPU-0 (as assigning, to a processing unit distinct from the original GPU, a subset of the application clients to cause time used by the set of processing units to be below the threshold); Col 2, lines 66-67, The graphics processors are instructed to render a number of frames, wherein the first and second graphics process; Col 5, lines 37-40, lighting transformations, coordinate transformations, scan-conversion of geometric primitives to rasterized data, shading computations, shadow rendering, texture blending); and reassigning the subset of application clients to the GPU to cause the second performance data to remain below the threshold, wherein the reassignment causes the GPU to execute at least the portion of the frame transformations (Diard, Abstract, lines 9-12, re-partitioned to increase a size of the portion assigned to the less heavily loaded processor and to decrease a size of the portion assigned to the more heavily loaded processor; Fig. 2, 202, 204, P and P'; Fig. 5, 514 is average below low threshold?
YES to 516, move boundary line up by preselected amount; Col 8, lines 55-67, rendering commands 308 and associated rendering data for the next frame F1…At this point, the clip rectangles for each GPU may be modified by the graphics driver program based on the feedback data received in response to the various write notifier commands (e.g., commands 306, 310). For example, where the display area is divided as shown in FIG. 2, the value of P may be modified (e.g., to P') in response to feedback data: if the GPU that processes top portion 202 tends to finish its frames first, the value of P is increased, and if the GPU that processes bottom portion 204 tends to finish first, the value of P is decreased. Specific embodiments of re-partitioning a display area in response to feedback data are described below; Col 11, lines 18-27, at step 510 it is determined whether the load coefficient exceeds a "high" threshold… at step 514, it is determined whether the load coefficient is less than a "low" threshold. The low threshold is predefined and may be exactly 0.5 or a somewhat lower value (e.g., 0.45 or 0.4). If the load coefficient is below the low threshold, then the loads are adjusted at step 516 by moving the boundary line P in FIG. 2 up by a preset amount (e.g., one line, five lines, ten lines)).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta and Shraer with Diard because Diard's teaching of dynamically adjusting the processing load among the GPUs based on the feedback data would have provided Bernat, Dutta and Shraer's system with the advantage and capability to efficiently utilize the resources based on the load, improving processing speed and system efficiency.

Bernat, Dutta, Shraer and Diard fail to specifically teach that the frame transformations are video frame transformations applied to a batch of video frames to modify one or more individual frames in the batch of video frames. However, Lowry teaches video frame transformations applied to a batch of video frames to modify one or more individual frames in the batch of video frames (Lowry, [0041] lines 3-9, Embodiments provide that a NVIDIA® K6000 (NVIDIA® is a registered trademark of NVIDIA) may be used as the GPU. Embodiments provide this as the first stage in the video pipeline (VP). In the GPU (14), video frames may be debayered, as well as other necessary video transformations, such as motion compensation, white balance, black level correction, etc.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Shraer and Diard with Lowry because Lowry's teaching of video frame transformations would have provided Bernat, Dutta, Shraer and Diard's system with the advantage and capability to process video and image frame transformations (i.e., augmented reality), improving system performance and efficiency (see Lowry, [0004]).

Bernat, Dutta, Shraer, Diard and Lowry fail to specifically teach that the accelerator/GPU is a video image compositor (VIC). However, Takemoto teaches that the accelerator/GPU is a video image compositor (VIC) (Takemoto, Col 2, lines 17-20, A first plurality of crosspoint switches connect the plurality of digital video signal inputs to the key processing subsystem and to the video image compositor).
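Diard's feedback scheme, quoted at length above, is compact enough to sketch: each GPU writes its identifier (0 or 1) to a feedback array as it finishes a frame, the driver averages the most recent entries (optionally weighting recent frames more heavily), and the dividing line P moves by a preset number of scan lines whenever the average crosses the high or low threshold. The numeric values below follow the examples Diard gives; everything else is an assumption, not Diard's code.

    # Illustrative sketch of Diard's load-coefficient balancing; the
    # thresholds and step size follow Diard's quoted examples, all else
    # is assumed for illustration.
    HIGH, LOW, STEP = 0.55, 0.45, 5

    def load_coefficient(feedback, weights=None):
        # Average the GPU identifiers (0 or 1) written to the feedback
        # array over the B most recent frames; a weighted average may
        # favor the more recently rendered frames.
        if weights is None:
            weights = [1.0] * len(feedback)
        return sum(f * w for f, w in zip(feedback, weights)) / sum(weights)

    def adjust_boundary(p, feedback):
        # p is the dividing line P, counted in scan lines from the top.
        coeff = load_coefficient(feedback)
        if coeff > HIGH:      # GPU-1 (bottom region) is more loaded:
            return p + STEP   # move P down, shrinking the bottom region
        if coeff < LOW:       # GPU-0 (top region) is more loaded:
            return p - STEP   # move P up, shrinking the top region
        return p              # within thresholds: considered balanced

Note, with respect to the framerate-based thresholds of claims 24 and 31, that the averaging window is expressed in rendered frames (Diard's example is roughly 20 frames, with about 30 frames rendered per second), which ties the rebalancing decision to the framerate of processing.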
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Shraer, Diard and Lowry with Takemoto because Takemoto's teaching of a video image compositor would have provided Bernat, Dutta, Shraer, Diard and Lowry's system with the advantage and capability to process digital video images, improving system performance and efficiency.

As per claim 31, Bernat, Dutta, Shraer, Diard, Lowry and Takemoto teach the invention according to claim 30 above. Diard further teaches wherein the threshold is determined based, at least in part, on one or more framerates associated with the set of application clients (Diard, Col 10, lines 64-Col 11, line 6, the graphics driver might not know whether the GPUs have finished a particular frame and the GPUs may be rendering a frame (as task) that is several frames earlier in the command stream than a current frame in the graphics driver. Where the feedback array is written in a circular fashion, as in process 400 described above, selecting Q to be equal to B provides an average over the B most recently rendered frames. In some embodiments, a weighted average may be used, e.g., giving a larger weight to more recently-rendered frames; Col 11, lines 7-17, The load coefficient is used to determine whether an adjustment to the clip rectangles for the GPUs needs to be made. If the GPUs are equally loaded, the likelihood of either GPU finishing a frame first is about 50%, and the average value over a suitable number of frames (e.g., 20) will be about 0.5 if identifier values of 0 and 1 are used. An average value in excess of 0.5 indicates that GPU-1 (which renders the bottom portion of the image) is more heavily loaded than GPU-0, and an average value below 0.5 indicates that GPU-0 (which renders the top portion of the image) is more heavily loaded than GPU-1; also see Col 12, lines 30-31, 30 frames are rendered per second; Col 12, lines 36-46, Correspondingly, the high threshold and low threshold may have any values, and the two threshold values may be equal (e.g., both equal to 0.5), so long as the high threshold is not less than the low threshold. Both thresholds are advantageously set to values near or equal to the arithmetic mean of the two identifiers; an optimal selection of thresholds in a particular system may be affected by considerations such as the frequency of load rebalancing and any overhead associated with changing the clip rectangles assigned to each GPU. The threshold comparison is advantageously defined such that there is some condition for which the load is considered balanced).

As per claim 32, Bernat, Dutta, Shraer, Diard, Lowry and Takemoto teach the invention according to claim 30 above. Bernat further teaches wherein the set of second processors further comprises a field-programmable gate array (FPGA) (Bernat, Fig. 16, 1620 FPGA).

Claim 33 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Shraer, Diard, Lowry and Takemoto, as applied to claim 30 above, and further in view of MALDANER (US Pub. 2011/0276695 A1). MALDANER was cited in the previous Office Action.

As per claim 33, Bernat, Dutta, Shraer, Diard, Lowry and Takemoto teach the invention according to claim 30 above. Bernat, Dutta, Shraer, Diard, Lowry and Takemoto fail to specifically teach maintaining a table of a set of application clients of which each application client is a member.
However, MALDANER teaches maintaining a table of a set of application clients of which each application client is a member (MALDANER, [0108] last two lines: the load balancer 255 may maintain a metrics table 420 for each service, device or application; [0114] lines 1-10, The managed objects or variables provided via the network management protocol may provide any type and form of metrics or operational characteristics of the service, server or device to be used by the appliance for load balancing, or any other function of the load balancer 255. In one embodiment, the device provided metrics 420 may include any of the metrics 410 collected by the load balancer 255 as described above. In another embodiment, the device provided metrics 420 may include any type and form of information on any resource usage of the managed device, service or system. In one embodiment, the metrics 410 include CPU, memory and/or disk usage of the device and/or service 270. In other embodiments, the metrics 420 may include information on a number of connections, sessions or clients of the service 270 (as a load balancer with a metrics table maintaining a table of application clients)). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Shraer, Diard, Lowry and Takemoto with MALDANER because MALDANER's teaching of a load balancer having a metrics table that includes information about the clients of a service would have provided Bernat, Dutta, Shraer, Diard, Lowry and Takemoto's system with the advantage and capability to easily determine and manage the different information regarding the clients' services, improving system performance and efficiency.

Claim 34 is rejected under 35 U.S.C. 103 as being unpatentable over Bernat, Dutta, Shraer, Diard, Lowry, Takemoto and MALDANER, as applied to claim 33 above, and further in view of Kawai et al. (US Pub. 2006/0271700 A1) and Nakamura et al. (US Pub. 2012/0016994 A1). Kawai and Nakamura were cited in the previous Office Action.

As per claim 34, Bernat, Dutta, Shraer, Diard, Lowry, Takemoto and MALDANER teach the invention according to claim 33 above. Bernat teaches computing operations for the subset of application clients (Bernat, [0088] lines 8-16, offloading accelerator kernel tasks to the FPGA 1619, 1629, 1685. More particularly, a request going from a compute sled to a target accelerator device generally traverses one or more hops, such as the NIC 1618, 1628, 1684, prior to reaching the target accelerator device. In an embodiment, the NIC 1618, 1628, 1684 may include inline processing logic to identify one or more accelerator tasks of a workload that can be performed by the FPGA 1619, 1629, 1685, respectively). In addition, Lowry teaches that the operations are video frame transformations (Lowry, [0041] lines 3-9, Embodiments provide that a NVIDIA® K6000 (NVIDIA® is a registered trademark of NVIDIA) may be used as the GPU. Embodiments provide this as the first stage in the video pipeline (VP). In the GPU (14), video frames may be debayered, as well as other necessary video transformations, such as motion compensation, white balance, black level correction, etc.).
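Read together, MALDANER's per-service metrics table (cited above for claims 27 and 33) and the Kawai and Nakamura tables (cited above for claim 28) describe one small data structure: a table keyed by client that records the assigned processing engine and a running average of compute time. The following is a minimal sketch with assumed field names, not drawn from any single reference.

    # Illustrative client table combining the cited teachings: an
    # assigned engine per client (Kawai) and an average compute time
    # (Nakamura), maintained by the load balancer (MALDANER).
    # All field names are assumptions.
    client_table = {}

    def record(client_id, engine, elapsed_ms):
        row = client_table.setdefault(
            client_id, {"engine": engine, "avg_ms": 0.0, "samples": 0})
        row["engine"] = engine          # processing engine assigned now
        row["samples"] += 1
        # Incremental running average of the time taken to compute.
        row["avg_ms"] += (elapsed_ms - row["avg_ms"]) / row["samples"]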
Bernat, Dutta, Shraer, Diard, Lowry, Takemoto and MALDANER fail to specifically teach wherein the table of the set of application clients includes information indicating that the set of processors is assigned to the subset of application clients, and an average time taken by the set of processors to compute. However, Kawai teaches wherein the table of the set of application clients includes information indicating that the set of processors is assigned to the subset of application clients and a delay time taken by the set of processors to compute (Kawai, Fig. 6, 112 user information table, users with assigned different services (as including the set of second processors) and delay time; also see claim 3, the record medium with the load distribution program recorded thereon according to claim 2, wherein the delay time determination means refers to a delay time management table where communication delay time taken between the client connection server which provides an Internet connection service to the client and each data center is set in advance and determines the processing delay time taken between the client and each data center). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Shraer, Diard, Lowry, Takemoto and MALDANER with Kawai because Kawai's teaching of a table including the client and server assignments and the communication delay times would have provided Bernat, Dutta, Shraer, Diard, Lowry, Takemoto and MALDANER's system with the advantage and capability to easily determine and manage the different client and server assignments/associations for processing the workload, improving system performance and efficiency.

Bernat, Dutta, Shraer, Diard, Lowry, Takemoto, MALDANER and Kawai fail to specifically teach that the delay time is an average time taken by the set of second processors to compute. However, Nakamura teaches that the delay time is an average time taken by the set of second processors to compute (Nakamura, Fig. 3, processing time management table, distributed node ID, average processing time; [0021] lines 1-10, the client 200 has: a CPU 201, a memory 202; a storage medium 203; and a communication interface 210, and lying on the memory 202 are: an information request program 204 making a request of an information processing program of the distributed node for information processing; a node management table 206 holding network information of all distributed nodes connectable from the client; a processing time management table 207 measuring and holding time required for the information processing in each distributed node). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Bernat, Dutta, Shraer, Diard, Lowry, Takemoto, MALDANER and Kawai with Nakamura because Nakamura's teaching of average processing time would have provided Bernat, Dutta, Shraer, Diard, Lowry, Takemoto, MALDANER and Kawai's system with the advantage and capability to easily determine and manage the different average processing times associated with different nodes, improving system performance and efficiency.

Response to Arguments

Applicant's arguments with respect to independent claims 1, 22 and 30 under the 35 U.S.C.
§ 103 rejections have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. In the remarks, Applicant argues in substance:

(a) Applicant respectfully submits that claim 1 is not directed to a mental process because claim 1 recites limitations that cannot be practically performed by the human mind. Furthermore, without conceding to the appropriateness of the rejection, claim 1 is amended to recite, in part, "execute the video frame transformations to a batch of video frames to modify one or more individual frames in the batch of video frames," which Applicant submits cannot be practically performed by the human mind.

(b) The specification explains that distributing work to available hardware accelerators may "overcome manual tuning and/or configuration of transformations, reduce bottlenecks, and improve stream density." Specification at [0054]. The claims recite "assign[ing] the set of clients to cause the first hardware accelerator to execute the video frame transformations," "assign a subset of clients of the set of clients to a set of second hardware accelerators to cause the metric associated with each of the set of clients to be below the threshold in response to the determination," and "as a result of assigning the subset of clients, reassign at least one of the subset of clients to the first hardware accelerator to cause the metric associated with each of the set of clients to remain below the threshold." Thus, claim 1 may address one or more technical problems identified in the specification. For at least these reasons, the claims are at least integrated into a practical application and are not abstract under 35 U.S.C. § 101.

(c) Claim 3: the rationale for the combination of Fawcett with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight. The rationale provided is essentially that Fawcett fills a gap in the combination of Bernat, Dutta, Diard and Shraer, identified by using Applicant's own application as a guide, and improperly asserts that Bernat, Dutta, Diard and Shraer are improved by providing the features used to fill that gap. In other words, the only motivation to combine these references is found in Applicant's own application and not in the cited references, which is impermissible hindsight.

(d) Claim 4 depends from claim 1 described above. Accordingly, Applicant respectfully submits that claim 4 is allowable over Bernat, Dutta, Diard and Shraer at least for depending from an allowable independent claim. Furthermore, Takemoto fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Takemoto with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(e) Claim 5 depends from claim 1 described above. Accordingly, Applicant respectfully submits that claim 5 is allowable over Bernat, Dutta, Diard and Shraer at least for depending from an allowable independent claim. Furthermore, Da Silva fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Da Silva with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(f) Claims 6 and 7 depend from claim 1 described above. Accordingly, Applicant respectfully submits that claims 6 and 7 are allowable at least for depending from an allowable independent claim.
Furthermore, Ben Zeev fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Ben Zeev with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(g) Claims 8 and 10 depend from claim 1 described above. Accordingly, Applicant respectfully submits that claims 8 and 10 are allowable at least for depending from an allowable independent claim. Furthermore, Ueda fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Ueda with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(h) Claim 11 depends from claim 1 described above. Accordingly, Applicant respectfully submits that claim 11 is allowable at least for depending from an allowable independent claim. Furthermore, Izenberg fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Izenberg with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(i) Claim 12 depends from claim 1 described above. Accordingly, Applicant respectfully submits that claim 12 is allowable at least for depending from an allowable independent claim. Furthermore, Bernat '748 fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Bernat '748 with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(j) Claim 14 depends from claim 1 described above. Accordingly, Applicant respectfully submits that claim 14 is allowable at least for depending from an allowable independent claim. Furthermore, Bonebakker fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Bonebakker with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(k) Claim 16 depends from claim 1 described above. Accordingly, Applicant respectfully submits that claim 16 is allowable at least for depending from an allowable independent claim. Furthermore, Johlekar fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Johlekar with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(l) Claims 17 and 18 depend from claim 1 described above. Accordingly, Applicant respectfully submits that claims 17 and 18 are allowable at least for depending from an allowable independent claim. Furthermore, Alvelda fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Alvelda with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(m) Claim 19 depends from claim 1 described above. Accordingly, Applicant respectfully submits that claim 19 is allowable at least for depending from an allowable independent claim. Furthermore, Galluzzi fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Galluzzi with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(n) Claim 20 depends from claim 1 described above. Accordingly, Applicant respectfully submits that claim 20 is allowable at least for depending from an allowable independent claim. Furthermore, Wu fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above.
Additionally, the rationale for the combination of Wu with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(o) Claim 21 depends from claim 1 described above. Accordingly, Applicant respectfully submits that claim 21 is allowable at least for depending from an allowable independent claim. Furthermore, Saulters fails to cure the deficiencies in Bernat, Dutta, Diard and Shraer discussed above. Additionally, the rationale for the combination of Saulters with Bernat, Dutta, Diard and Shraer constitutes impermissible hindsight.

(p) Claims 23-25 and 29 each depend from claim 22 described above. Accordingly, Applicant respectfully submits that claims 23-25 and 29 are allowable at least for depending from an allowable independent claim.

(q) Claim 26 depends from claim 22 described above. Accordingly, Applicant respectfully submits that claim 26 is allowable at least for depending from an allowable independent claim. Furthermore, Biran fails to cure the deficiencies in Diard, Bernat, Dutta, Takemoto, Desai and Shraer discussed above. Additionally, the rationale for the combination of Biran with Diard, Bernat, Dutta, Takemoto, Desai and Shraer constitutes impermissible hindsight.

(r) Claim 27 depends from claim 22 described above. Accordingly, Applicant respectfully submits that claim 27 is allowable at least for depending from an allowable independent claim. Furthermore, Maldaner fails to cure the deficiencies in Diard, Bernat, Dutta, Takemoto, Desai and Shraer discussed above. Additionally, the rationale for the combination of Maldaner with Diard, Bernat, Dutta, Takemoto, Desai and Shraer constitutes impermissible hindsight.

(s) Claim 28 depends from claim 22 described above. Accordingly, Applicant respectfully submits that claim 28 is allowable at least for depending from an allowable independent claim. Furthermore, Kawai and Nakamura fail to cure the deficiencies in Diard, Bernat, Dutta, Takemoto, Desai and Shraer discussed above. Additionally, the rationale for the combination of Kawai and Nakamura with Diard, Bernat, Dutta, Takemoto, Desai and Shraer constitutes impermissible hindsight.

(t) Claims 31 and 32 each depend from claim 30 described above. Accordingly, Applicant respectfully submits that claims 31 and 32 are allowable at least for depending from an allowable independent claim.

(u) Claim 33 depends from claim 30 described above. Accordingly, Applicant respectfully submits that claim 33 is allowable at least for depending from an allowable independent claim. Furthermore, Maldaner fails to cure the deficiencies in Bernat '748, Dutta, Shraer, Diard, and Takemoto discussed above. Additionally, the rationale for the combination of Maldaner with Diard, Bernat, Dutta, Takemoto, Desai and Shraer constitutes impermissible hindsight.

(v) Claim 34 depends from claim 30 described above. Accordingly, Applicant respectfully submits that claim 34 is allowable at least for depending from an allowable independent claim. Furthermore, Maldaner and Kawai fail to cure the deficiencies in Bernat '748, Dutta, Shraer, Diard, and Takemoto discussed above. Additionally, the rationale for the combination of Maldaner and Kawai with Diard, Bernat, Dutta, Takemoto, Desai and Shraer constitutes impermissible hindsight.
Examiner respectfully disagrees with Applicant's arguments for the following reasons:

As to point (a): In response to Applicant's argument that "claim 1 is amended to recite, in part, 'execute the video frame transformations to a batch of video frames to modify one or more individual frames in the batch of video frames,' which Applicant submits cannot be practically performed by the human mind," Examiner points out that this particular limitation has been evaluated under Step 2A, Prong 2 and Step 2B. That is, the "execution" step merely applies the judicial exception or abstract idea (see MPEP 2106.05(f)). The claim does not define any particular machine to "cause" this "video frame transformation" other than a generic machine such as the "hardware accelerator," and provides no details whatsoever on how the claimed function (i.e., the video frame transformations) will occur; reciting "modify one or more individual frames in the batch of video frames" is not enough to provide such details, as it merely states that the video frames are transformed and modified, and the claim likewise does not provide how the "modifying" will occur. Therefore, Applicant's argument has not been found to be persuasive. Please refer to the § 101 rejection above.

As to point (b): In response to Applicant's argument that "Thus, claim 1 may address one or more technical problems identified in the specification. For at least these reasons, the claims are at least integrated into a practical application and are not abstract under 35 U.S.C. § 101," Examiner respectfully disagrees. First, the claimed invention merely recites the steps of "identify," "assign," "generate a determination," "assign," and "reassign." These limitations can be performed in the human mind (please refer to the § 101 rejection above). Second, Applicant merely cites specification [0054] to show that the claim is integrated into a practical application and provides an improvement. However, MPEP 2106.05(a) discloses that "It is important to note, the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements. See the discussion of Diamond v. Diehr, 450 U.S. 175, 187 and 191-92, 209 USPQ 1, 10 (1981) in subsection II, below. In addition, the improvement can be provided by the additional element(s) in combination with the recited judicial exception." Here, the claim does NOT provide any additional limitations other than the generic computing components/functions that perform the abstract idea (i.e., nothing "causes" the "video frame transformation" other than a generic machine such as the "hardware accelerator," and there are no details whatsoever on how the claimed function (i.e., the video frame transformations) will occur; "modify one or more individual frames in the batch of video frames" is not enough to provide such details: how is this "modification" performed?). Therefore, Applicant's argument has not been found to be persuasive. Please refer to the § 101 rejection above.

As to points (c) to (o), (q) to (s), (u) and (v): In response to Applicant's argument that the Examiner's conclusion of obviousness is based upon improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning.
But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper. See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).

As to points (p) and (t): Examiner respectfully disagrees. Applicant has not established that the base claims are allowable, and therefore the dependent claims are not allowable by virtue of their dependence. Thus, Applicant's argument is not persuasive.

For the reasons above, Applicant's arguments have not been found to be persuasive, and therefore the rejections are maintained.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUJIA XU, whose telephone number is (571) 272-0954. The examiner can normally be reached M-F 9:30-5:30 EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Aimee J Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ZUJIA XU/
Examiner, Art Unit 2195

Prosecution Timeline

May 26, 2021
Application Filed
Dec 11, 2023
Non-Final Rejection — §101, §103
Mar 05, 2024
Examiner Interview Summary
Mar 05, 2024
Applicant Interview (Telephonic)
Mar 12, 2024
Response Filed
Jun 06, 2024
Final Rejection — §101, §103
Jun 24, 2024
Interview Requested
Jul 31, 2024
Response after Non-Final Action
Aug 15, 2024
Response after Non-Final Action
Aug 15, 2024
Applicant Interview (Telephonic)
Sep 12, 2024
Request for Continued Examination
Sep 16, 2024
Response after Non-Final Action
Sep 26, 2024
Non-Final Rejection — §101, §103
Nov 29, 2024
Interview Requested
Dec 09, 2024
Examiner Interview Summary
Dec 09, 2024
Applicant Interview (Telephonic)
Dec 30, 2024
Response Filed
Apr 30, 2025
Final Rejection — §101, §103
Jun 23, 2025
Applicant Interview (Telephonic)
Jun 23, 2025
Examiner Interview Summary
Jun 24, 2025
Response after Non-Final Action
Jul 14, 2025
Request for Continued Examination
Jul 19, 2025
Response after Non-Final Action
Sep 05, 2025
Non-Final Rejection — §101, §103
Nov 18, 2025
Examiner Interview Summary
Nov 18, 2025
Applicant Interview (Telephonic)
Dec 10, 2025
Response Filed
Mar 26, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602249
Hardware Resource Allocation System for Allocating Resources to Threads
2y 5m to grant Granted Apr 14, 2026
Patent 12541397
THREAD MANAGEMENT
2y 5m to grant Granted Feb 03, 2026
Patent 12504983
SUPERVISORY DEVICE WITH DEPLOYED INDEPENDENT APPLICATION CONTAINERS FOR AUTOMATION CONTROL PROGRAMS
2y 5m to grant Granted Dec 23, 2025
Patent 12498971
COMPUTING TASK SCHEDULING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM
2y 5m to grant Granted Dec 16, 2025
Patent 12436805
COMPUTER SYSTEM WITH PROCESSING CIRCUIT THAT WRITES DATA TO BE PROCESSED BY PROGRAM CODE EXECUTED ON PROCESSOR INTO EMBEDDED MEMORY INSIDE PROCESSOR
2y 5m to grant Granted Oct 07, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

7-8
Expected OA Rounds
68%
Grant Probability
99%
With Interview (+81.5%)
3y 6m
Median Time to Grant
High
PTA Risk
Based on 169 resolved cases by this examiner. Grant probability derived from career allow rate.
