Prosecution Insights
Last updated: May 29, 2026
Application No. 17/448,546

SYSTEM AND METHOD FOR RESOURCE ALLOCATION AND SCHEDULING

Final Rejection §103 §112
Filed
Sep 23, 2021
Priority
Sep 23, 2020 — CN 202011006550.6 +1 more
Examiner
LIN, HSING CHUN
Art Unit
2195
Tech Center
2100 — Computer Architecture & Software
Assignee
Shanghai United Imaging Metahealthcare Co. Ltd.
OA Round
4 (Final)
Grant Probability
60% (Moderate)
OA Rounds
5-6
Est. Remaining
0m
With Interview
99%

Examiner Intelligence

Grants 60% of resolved cases
60%
Career Allowance Rate
65 granted / 109 resolved
+4.6% vs TC avg
Strong +80% interview lift
+80.0%
Interview Lift
resolved cases with interview
Typical timeline
3y 4m
Avg Prosecution
20 currently pending
Career history
147
Total Applications
across all art units

Statute-Specific Performance

§101
2.5%
-37.5% vs TC avg
§103
87.1%
+47.1% vs TC avg
§102
3.2%
-36.8% vs TC avg
§112
6.6%
-33.4% vs TC avg
Comparisons use a Tech Center average estimate. Based on career data from 109 resolved cases.

Office Action

DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-3, 5-11, 15-16, 19-21, and 23-26 are pending in this application.
Response to Arguments
Applicant’s arguments regarding the rejections of claims 1-20 under 35 U.S.C. 112b have been fully considered and are persuasive. The rejections have been withdrawn. However, new 35 U.S.C. 112b rejections are applied to claims 1-3, 5-11, 15-16, 19-21, and 23-26 based on the amendments. Applicant's arguments regarding the 35 U.S.C. 103 rejections of claims 1-3, 5-11, 15-16, 19-21, and 23-26 have been fully considered but they are moot in light of the references being applied in the current rejection or are unpersuasive.
Regarding the 35 U.S.C. 103 rejection, the applicant argues the following in the remarks:
(a) The prior art fails to teach amended claims 1 and 20.
(b) Regarding claim 11, McGrath does not involve comparing the communication distance of the "far edge" device with that of the "on-premise layer 1030." Specifically, McGrath merely discloses that since far edge devices may become compute limited or may not be power efficient as needed to perform a given task, the "on-premise layer 1030" is the next potential tier of a low-latency network edge architecture that provides low latency. It can be seen that the "on-premise layer 1030" in McGrath is merely a backup layer when the "far edge" devices are unable to perform a given task, which is completely silent about the latency magnitude relationship provided by the "far edge" devices and the "on-premise layer 1030".
(c) New claim 24 is similar to claim 18 which applied the Sun reference and the Sun reference does not teach the limitations of new claim 24.
(d) The dependent claims 2-3, 5-10, 15-16, 19, 21, 23, and 25-26 are allowable since they are dependent on claims 1 and 11.
Examiner has thoroughly considered Applicant's arguments, but respectfully finds them unpersuasive for at least the following reasons:
As to point (a), the arguments are moot in light of the references applied in the current rejection.
As to point (b), the examiner respectfully disagrees. An endpoint device used by a user, which is considered a far edge device, performs a task with a lower latency compared to on-premise computing, which can include an on-premise rack. McGrath recites in [0107] and [0108] that the endpoint provides the lowest latency possible whereas on-premise computing provides a next tier of low-latency architecture, so the endpoint provides a shorter latency compared to the on-premise computing. This concept can be illustrated in an example where a user is running a task locally on an office computer, which means that the task is being run on an endpoint device and would have the lowest latency since it does not need to network with a remote device. If the user has to run the task on a rack that is on premises in an office building, that would have a higher latency compared to the task being run locally on the office computer.
As to point (c), the arguments are moot in light of the references applied in the current rejection.
As to point (d), the examiner respectfully disagrees. Applicant's arguments regarding dependent claims fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the dependent claims define a patentable invention without specifically pointing out how the language of the dependent claims patentably distinguishes them from the references.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-3, 5-11, 15-16, 19-21, and 23-26 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
As per claims 1 and 20 (line numbers refer to claim 1):
Lines 9-10 recite “the idle state refers to a state in which a container is not processing a task” but it is unclear which container “a container” refers to (the claim recites a plurality of containers and one or more target containers).
Lines 14-15 recite “a respective target task from a message queue” and it is unclear if this refers to “a respective target task from a message queue” in line 12. If so, lines 14-15 can be amended to “the respective target task from the message queue”.
As per claim 11:
Line 20 recites “an edge node” but it is unclear if this refers to “an edge node” in line 19. Additionally, it is unclear if “an edge node” in lines 19 and 20 are part of the at least the portion of the plurality of edge nodes.
Claims 2-3, 5-10, 15-16, 19, 21, and 23-26 are dependent claims of claims 1 and 11 and fail to resolve the deficiencies of claims 1 and 11, so they are rejected for the same reasons.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-3, 5, 6, 8, 9, 20, 21, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over He (US 20210208951 A1), in view of He (CN110109649A hereinafter He2), in view of Kim et al. (KR20190143248A hereinafter Kim), and further in view of McQuighan et al. (US 20190155660 A1 hereinafter McQuighan). The claim mappings of He2 are made with a translation of CN110109649A. The claim mappings of Kim are made with a translation of KR20190143248A. He, He2, and Kim were cited in a previous office action. As per claim 1, He teaches the invention substantially as claimed including a method implemented on a processing apparatus, wherein the processing apparatus is configured to perform operations comprising: allocating virtual graphic processing unit (VGPU) resources for a plurality of containers on the processing apparatus, wherein each of the plurality of containers is allocated with a corresponding virtual graphic processing unit (VGPU) resource and is associated with an operation or a service (Figs. 1 and 4; [0071] As shown in FIG. 
5, an apparatus 500 for sharing a GPU of the present embodiment; [0069] When Fake-GPU1, FakeGPU2, FakeGPU3 are allocated to different containers; [0041] identify a plurality of available GPUs and allocate to different containers based on different virtual GPU information; [0038] a physical GPU virtualizes 3 virtual GPUs, named as Fake GPU1, Fake GPU2, and Fake GPU3. After Fake GPU1 is mounted to container A; [0057] training tasks running in the target container; [0076] the apparatus 500 for sharing a GPU may further include: a process isolation unit, configured to control the target physical GPU to isolate model training tasks from different containers through different processes, in response to the target physical GPU being simultaneously mounted to at least two containers; [0068] ⑦ containerd calls nvidia-container to mount physical card-Physical GPU0. As of this step, programs inside the container may call the dynamic library libnvidia-container for GPU acceleration; The nvidia-container is for high performing tasks.); identifying one or more target containers from the plurality of containers ([0007] receive a GPU use request initiated by a target container; a virtual GPU determination unit, determine a target virtual GPU based on the GPU use request; where the target virtual GPU is at least one of all virtual GPUs; [0071] The request receiving unit 501 is configured to receive a GPU use request initiated by a target container; [0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task; [0024] containers running on a containerized cloud platform); for each of the one or more target containers, causing the each of the one or more target containers to obtain a respective target task that includes at least one task ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the 
executing body based on a GPU acceleration demand required by a user issued task, to indicate that the container needs to occupy a certain GPU to implement GPU acceleration; [0022] A user may interact with the server 105 through the network 104 using the terminal devices 101, 102, 103, to receive or send messages and the like; claim 5 controlling the target physical GPU to isolate model training tasks from different containers through different processes, in response to the target physical GPU being simultaneously mounted to at least two containers; [0024] receiving a GPU use request initiated by a target container from the terminal devices 101, 102, and 103 through the network; A user sends tasks so that means a target task is obtained.), wherein to obtain a respective target task, the processing apparatus is configured to perform operations including: causing the each of the one or more target containers to identify a respective tag of a corresponding requested volume of VGPU resource corresponding to each of the at least one task; causing the each of the one or more target containers to identify the respective target task from the at least one task based on the each of the one or more target containers and the respective tag corresponding to the each of the at least one task ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type. The demand quantity may refer to the number of GPU when candidate GPUs all have the same video memory, or may also refer to a video memory demand when the candidate GPUs have different video memories. 
The demand type may include classification methods such as video memory type, video memory manufacturer, and batch, in order to select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements; [0069] Fake-GPU1, FakeGPU2, FakeGPU3 are allocated to different containers); and causing the each of the one or more target containers to process the respective target task ([0047] select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container; claim 5 controlling the target physical GPU to isolate model training tasks from different containers through different processes, in response to the target physical GPU being simultaneously mounted to at least two containers), wherein to identify the respective target task from the at least one task, the processing apparatus is configured to perform operations ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type…select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements.). 
He fails to teach wherein the one or more target containers are in an idle state and the idle state refers to a state in which a container is not processing a task; causing the each of the one or more target containers to obtain a respective target task from a message queue that includes at least one task, wherein to obtain a respective target task from a message queue; identify the respective target task from the at least one task based on a volume of VGPU resource allocated to the each of the one or more target containers and the respective tag corresponding to the each of the at least one task; and the processing apparatus is configured to perform operations including: marking the each of the at least one task with a respective matching status tag, wherein the respective matching status tag indicates whether the respective tag corresponding to the each of the at least one task matches a capacity of a target container of the one or more target containers successfully, and the respective matching status tag includes a respective match failure tag indicating a failure of the matching to a target container of the one or more target containers; identifying the respective match failure tag each time a current target container of the one or more target containers identifies a corresponding target task from the at least one task: and in response to determining that a volume of VGPU resource allocated to the current target container is smaller than or equal to a corresponding requested volume of VGPU resource corresponding to the target task having the match failure tag, omitting the target task having the match failure tag by the current target container. 
However, He2 teaches wherein the one or more target containers are in an idle state and the idle state refers to a state in which a container is not processing a task ([0014] selecting a basic container with the highest degree of match and/or the longest idle time as the candidate container; [0034] a container pool configured to manage containers; [0125] a matching idle container can be selected from the container pool; [0125] When the service component of the service container is uninstalled, it is put back into the container pool as an idle class library container.); causing the each of the one or more target containers to obtain a respective target task from a message queue that includes at least one task, wherein to obtain a respective target task from a message queue ([0138] the application request needs to be queued first, and then the container control device is notified to load a new service container instance. When the service container instance is available, the queued request is forwarded to the container for processing; [0125] When a new service container needs to be created, a matching idle container can be selected from the container pool to quickly load and run the service container.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined He with the teachings of He2 to reduce computing operations (see He2 [0089] selecting the container with the longest idle time as the candidate container can avoid loading and unloading the base container image and reduce the computing operations of the system.). 
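For illustration only, the He2 passages cited above (idle containers held in a pool, requests queued until a container becomes available) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not code from He2 or from the application: the names Container, pick_idle_container, and dispatch are hypothetical, and choosing by longest idle time alone is an assumption drawn from the language quoted from [0014].

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class Container:
    # Hypothetical model: idle_since is a timestamp while idle, None while busy.
    name: str
    idle_since: Optional[float]


def pick_idle_container(pool: List[Container], now: float) -> Optional[Container]:
    """Return the idle container with the longest idle time, or None if all are busy."""
    idle = [c for c in pool if c.idle_since is not None]
    if not idle:
        return None
    return max(idle, key=lambda c: now - c.idle_since)


def dispatch(queue: Deque[str], pool: List[Container], now: float) -> List[Tuple[str, str]]:
    """Forward queued requests to idle containers until none remain idle."""
    assignments = []
    while queue:
        container = pick_idle_container(pool, now)
        if container is None:
            break  # remaining requests stay queued
        request = queue.popleft()
        container.idle_since = None  # container is now processing a task
        assignments.append((request, container.name))
    return assignments
```

In this sketch a request that cannot be served simply remains in the queue, which mirrors the queued-request behavior quoted from [0138] only loosely.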
He and He2 fail to teach identify the respective target task from the at least one task based on a volume of VGPU resource allocated to the each of the one or more target containers and the respective tag corresponding to the each of the at least one task; the processing apparatus is configured to perform operations including: marking the each of the at least one task with a respective matching status tag, wherein the respective matching status tag indicates whether the respective tag corresponding to the each of the at least one task matches a capacity of a target container of the one or more target containers successfully, and the respective matching status tag includes a respective match failure tag indicating a failure of the matching to a target container of the one or more target containers; identifying the respective match failure tag each time a current target container of the one or more target containers identifies a corresponding target task from the at least one task: and in response to determining that a volume of VGPU resource allocated to the current target container is smaller than or equal to a corresponding requested volume of VGPU resource corresponding to the target task having the match failure tag, omitting the target task having the match failure tag by the current target container. However, Kim teaches identify the respective target task from the at least one task based on a volume of VGPU resource allocated to the each of the one or more target containers and the respective tag corresponding to the each of the at least one task ([0061] Containers should be scheduled based on their maximum available GPU memory; [0062] When a user program calls the memory allocation API, the wrapper module sends memory size information to the scheduler through a UNIX socket prepared in the container. The scheduler tracks all memory allocation calls in the container. So the scheduler can know in real time how much free memory is allowed for that container. 
When there is enough memory to allocate, the scheduler sends a message to the wrapper module. After the actual allocation is done by calling the CUDA API through the wrapper module, the allocated address is sent to the scheduler along with the memory size; [0038-0039] When a user program calls a memory allocation API, the CUDA wrapper API module sends memory size information to the GPU memory scheduler through a UNIX socket prepared in the container, and the GPU memory scheduler tracks all memory allocation calls in the container. If a running container does not have enough GPU memory, the GPU memory scheduler will be suspended until the requested memory size becomes available; [0015] share volumes with the container; [0005] providing fully virtualized GPUs in containers; [0011] each user program is completely isolated when using ConVGPU; [0048] The GPU memory scheduler checks the GPU memory limit of each container. When a container uses GPU memory to the limit, the scheduler rejects allocation calls). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined He and He2 with the teachings of Kim to prevent failure (see Kim [0008] The technical problem to be achieved by the present invention is to provide the most practical method and system for implementing a GPU in a container-based virtualization environment using NVIDIA Docker. Additionally, we propose a method and system to prevent program failure or deadlock by considering GPU sharing between multiple containers.). 
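As a rough illustration of the Kim passages quoted above (a scheduler that tracks every allocation call per container and rejects calls once the container's limit is reached), consider the following Python sketch. It is not Kim's implementation: the class GpuMemoryScheduler and its method names are hypothetical, sizes are plain integers, and the sketch models only rejection, not the suspension behavior mentioned in [0038-0039].

```python
class GpuMemoryScheduler:
    """Hypothetical tracker of per-container GPU memory use against a fixed limit."""

    def __init__(self):
        self._limit = {}  # container name -> memory limit
        self._used = {}   # container name -> memory currently allocated

    def register(self, container, limit):
        """Record a container and the maximum GPU memory it may use."""
        self._limit[container] = limit
        self._used[container] = 0

    def free_memory(self, container):
        """Memory the container may still allocate."""
        return self._limit[container] - self._used[container]

    def request(self, container, size):
        """Grant the allocation if it fits under the limit, otherwise reject it."""
        if size > self.free_memory(container):
            return False
        self._used[container] += size
        return True

    def release(self, container, size):
        """Return previously allocated memory to the container's budget."""
        self._used[container] = max(0, self._used[container] - size)
```

Because every grant and release passes through one object, the free memory of any container is known at each call, which is the property the quoted [0062] passage relies on.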
He, He2, and Kim fail to teach the processing apparatus is configured to perform operations including: marking the each of the at least one task with a respective matching status tag, wherein the respective matching status tag indicates whether the respective tag corresponding to the each of the at least one task matches a capacity of a target container of the one or more target containers successfully, and the respective matching status tag includes a respective match failure tag indicating a failure of the matching to a target container of the one or more target containers; identifying the respective match failure tag each time a current target container of the one or more target containers identifies a corresponding target task from the at least one task: and in response to determining that a volume of VGPU resource allocated to the current target container is smaller than or equal to a corresponding requested volume of VGPU resource corresponding to the target task having the match failure tag, omitting the target task having the match failure tag by the current target container. 
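To make the match failure tag limitation recited above easier to follow, here is a minimal Python sketch of one possible reading. It is an illustration only, not the applicant's implementation and not a teaching of any cited reference. The names Task and pick_task are hypothetical, and treating "matches a capacity" as "requested volume does not exceed allocated volume" is an assumption; the omission test reproduces the claim's "smaller than or equal to" wording literally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    requested_vgpu: int         # tag: requested volume of VGPU resource
    match_failed: bool = False  # matching status tag (True = match failure tag)


def pick_task(queue: List[Task], container_vgpu: int) -> Optional[Task]:
    """Let a container with container_vgpu allocated choose a task from the queue."""
    for task in queue:
        # Omit a task carrying a match failure tag when the container's allocated
        # volume is smaller than or equal to the task's requested volume.
        if task.match_failed and container_vgpu <= task.requested_vgpu:
            continue
        # Assumed matching rule: the request fits in the container's allocation.
        if task.requested_vgpu <= container_vgpu:
            queue.remove(task)
            return task
        task.match_failed = True  # mark the task with a match failure tag
    return None
```

Under these assumptions, a task that failed once is skipped by any later container of equal or smaller allocation and is reconsidered only by a larger one.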
However, McQuighan teaches the processing apparatus is configured to perform operations including: marking the each of the at least one task with a respective matching status tag, wherein the respective matching status tag indicates whether the respective tag corresponding to the each of the at least one task matches a capacity of a target container of the one or more target containers successfully, and the respective matching status tag includes a respective match failure tag indicating a failure of the matching to a target container of the one or more target containers; identifying the respective match failure tag each time a current target container of the one or more target containers identifies a corresponding target task from the at least one task: and in response to determining that a volume of VGPU resource allocated to the current target container is smaller than or equal to a corresponding requested volume of VGPU resource corresponding to the target task having the match failure tag, omitting the target task having the match failure tag by the current target container (Fig. 6, elements 606 and 608; [0078] Returning to FIG. 6, as a result of the scheduling of the API request, the process 600 further includes determining if one or more GPUs operably connected to the scheduled virtual machine contain enough available memory to load and/or execute the API request (606). 
If the GPUs being connected to or accessible by the scheduled virtual machine maintain the required available memory, the process 600 further includes assigning the API request to a slot or container of the virtual machine (608); [0037] determines that the current slot on a specific VM to which the API request was allocated does not actually have enough GPU memory available to run the request without a failure, partial failure, error, etc., the API server may transfer, transmit, and/or assign the request to a different VM; [0020] Example embodiments presented herein may also refer to CPUs and GPUs generally, where such units may be a virtual CPU and/or a virtual GPU; [0080] rejecting the API request or failing the API request (614). In response to a rejected or failed API request, the process 600 includes reporting the failed or rejected API request to the user (616); [0038] As soon as a VM acquires too much memory, new requests, such as API request #2 (211b) will be rejected and a failure response 213 may be returned to the user; [0049] Once a VM is selected, the processing server 306 further determines a slot on the chosen VM to which to assign the request. In common parlance this may be referred to as receiving work (e.g., job, request, message, etc.) into a slot, pod, or other container.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined He, He2, and Kim with the teachings of McQuighan to prevent a request from being executed where there aren’t enough resources (see McQuighan [0037] determines that the current slot on a specific VM to which the API request was allocated does not actually have enough GPU memory available to run the request without a failure, partial failure, error, etc., the API server may transfer, transmit, and/or assign the request to a different VM). As per claim 2, He, He2, Kim, and McQuighan teach the method of claim 1. 
He teaches wherein the processing apparatus includes at least one cloud server cluster (Fig. 1; [0024] The server 105 may provide various services through various built-in applications. Take a GPU acceleration application that may provide GPU acceleration services for containers running on a containerized cloud platform). As per claim 3, He, He2, Kim, and McQuighan teach the method of claim 1. He teaches wherein the processing apparatus is further configured to perform operations including: receiving a processing request from a terminal device, the processing request including at least one task; for each of the at least one task of the processing request, determining a requested volume of a VGPU resource corresponding to the task; and marking the each of the at least one task of the processing request according to at least the requested volume of the VGPU resource ([0022] A user may interact with the server 105 through the network 104 using the terminal devices 101, 102, 103, to receive or send messages and the like; [0024] receiving a GPU use request initiated by a target container from the terminal devices 101, 102, and 103 through the network 104; then, determining a target virtual GPU based on the GPU use request; [0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task, to indicate that the container needs to occupy a certain GPU to implement GPU acceleration; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type. 
The demand quantity may refer to the number of GPU when candidate GPUs all have the same video memory, or may also refer to a video memory demand when the candidate GPUs have different video memories; [0032] Specifically, the GPU use request may include a variety of information, such as user identity information, container affiliation information, container number, business information corresponding to container, business information run by container, business type, and GPU demand applied for.). Additionally, He2 teaches adding the at least one task of the processing request to the message queue ([0138] the application request needs to be queued first, and then the container control device is notified to load a new service container instance. When the service container instance is available, the queued request is forwarded to the container for processing;). As per claim 5, He, He2, Kim, and McQuighan teach the method of claim 1. He teaches wherein to identify the respective target task from the at least one task, the processing apparatus is further configured to perform operations ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task, to indicate that the container needs to occupy a certain GPU to implement GPU acceleration; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type. The demand quantity may refer to the number of GPU when candidate GPUs all have the same video memory, or may also refer to a video memory demand when the candidate GPUs have different video memories. 
The demand type may include classification methods such as video memory type, video memory manufacturer, and batch, in order to select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements). Additionally, He2 teaches a current task in the message queue ([0138] the application request needs to be queued first, and then the container control device is notified to load a new service container instance. When the service container instance is available, the queued request is forwarded to the container for processing). Additionally, Kim teaches wherein to identify the respective target task from the at least one task, the processing apparatus is further configured to perform operations including: determining whether a requested volume of VGPU resource corresponding to a current task matches a respective capacity of the each of the one or more target containers; and in response to determining that the requested volume of the VGPU resource corresponding to the current task matches the respective capacity of the each of the one or more target containers, designating the current task as the target task ([0061] Containers should be scheduled based on their maximum available GPU memory; [0062] When a user program calls the memory allocation API, the wrapper module sends memory size information to the scheduler through a UNIX socket prepared in the container. The scheduler tracks all memory allocation calls in the container. So the scheduler can know in real time how much free memory is allowed for that container. When there is enough memory to allocate, the scheduler sends a message to the wrapper module. 
After the actual allocation is done by calling the CUDA API through the wrapper module, the allocated address is sent to the scheduler along with the memory size; [0038-0039] When a user program calls a memory allocation API, the CUDA wrapper API module sends memory size information to the GPU memory scheduler through a UNIX socket prepared in the container, and the GPU memory scheduler tracks all memory allocation calls in the container. If a running container does not have enough GPU memory, the GPU memory scheduler will be suspended until the requested memory size becomes available; [0015] share volumes with the container; [0005] providing fully virtualized GPUs in containers; [0011] each user program is completely isolated when using ConVGPU; [0048] The GPU memory scheduler checks the GPU memory limit of each container. When a container uses GPU memory to the limit, the scheduler rejects allocation calls). As per claim 6, He, He2, Kim, and McQuighan teach the method of claim 5. He2 teaches wherein the processing apparatus is further configured to perform operations including: in response to determining that the current task does not match the respective capacity of the each of the one or more target containers, putting the current task back into the message queue ([0138] If all containers of the application have reached the capacity limit and no new container instances can be added, the requests of the application will be queued.); and determining a subsequent task in the message queue matches the respective capacity of the each of the one or more target containers ([0138] When the service container instance is available, the queued request is forwarded to the container for processing; [0139] if the cluster resources are insufficient when 100 requests arrive, for example, only 5 containers can be loaded for the application at most, then 50 of the remaining 97 requests will be evenly distributed to 5 newly loaded containers (10 requests per container), 3 
requests will be assigned to each of the existing containers, and the remaining 38 requests will be queued.). Additionally, Kim teaches determining that the requested volume of the VGPU resource corresponding to the current task does not match the respective capacity of the each of the one or more target containers; and determining whether a requested volume of VGPU resource corresponding to a subsequent task in the message queue matches the respective capacity of the each of the one or more target containers ([0061] Containers should be scheduled based on their maximum available GPU memory; [0062] When a user program calls the memory allocation API, the wrapper module sends memory size information to the scheduler through a UNIX socket prepared in the container. The scheduler tracks all memory allocation calls in the container. So the scheduler can know in real time how much free memory is allowed for that container. When there is enough memory to allocate, the scheduler sends a message to the wrapper module. After the actual allocation is done by calling the CUDA API through the wrapper module, the allocated address is sent to the scheduler along with the memory size; [0038-0039] When a user program calls a memory allocation API, the CUDA wrapper API module sends memory size information to the GPU memory scheduler through a UNIX socket prepared in the container, and the GPU memory scheduler tracks all memory allocation calls in the container. If a running container does not have enough GPU memory, the GPU memory scheduler will be suspended until the requested memory size becomes available; [0015] share volumes with the container; [0005] providing fully virtualized GPUs in containers; [0011] each user program is completely isolated when using ConVGPU; [0048] The GPU memory scheduler checks the GPU memory limit of each container. When a container uses GPU memory to the limit, the scheduler rejects allocation calls). 
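For illustration only, the queue handling mapped above for claims 5 and 6 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from He, He2, Kim, McQuighan, or the application: the names `Task`, `matches`, and `pick_target_task` are hypothetical, "matches" is treated as "smaller than or equal to" (the reading recited in claim 8), and a non-matching task is assumed to be put back at the tail of the queue, since the claim language does not specify a position.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    requested_vgpu: int  # requested volume of VGPU resource (hypothetical unit)


def matches(task: Task, capacity: int) -> bool:
    # Assumption: a task "matches" when its request fits within the capacity.
    return task.requested_vgpu <= capacity


def pick_target_task(queue: deque, capacity: int) -> Optional[Task]:
    """Scan the message queue once and return the first matching task.

    Non-matching tasks are put back into the queue (at the tail, by
    assumption) and the subsequent task is examined next.
    """
    for _ in range(len(queue)):
        current = queue.popleft()
        if matches(current, capacity):
            return current  # designate the current task as the target task
        queue.append(current)  # put the current task back into the queue
    return None  # no queued task fits this container


message_queue = deque([Task("a", 8), Task("b", 2), Task("c", 4)])
target = pick_target_task(message_queue, capacity=4)
```

In this sketch, task "a" exceeds the capacity and is requeued, after which the subsequent task "b" is designated as the target task.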
As per claim 8, He, He2, Kim, and McQuighan teach the method of claim 1. He teaches wherein to identify the respective target task from the at least one task, the processing apparatus is further configured to perform operations ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task, to indicate that the container needs to occupy a certain GPU to implement GPU acceleration; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type. The demand quantity may refer to the number of GPU when candidate GPUs all have the same video memory, or may also refer to a video memory demand when the candidate GPUs have different video memories. The demand type may include classification methods such as video memory type, video memory manufacturer, and batch, in order to select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements). 
Additionally, Kim teaches wherein to identify the respective target task from the at least one task, the processing apparatus is further configured to perform operations including: determining whether a requested volume of VGPU resource corresponding to a current task is smaller than or equal to a respective capacity of the each of the one or more target containers; in response to determining that the requested volume of the VGPU resource corresponding to the current task is smaller than or equal to the respective capacity of the each of the one or more target containers, determining that the requested volume of the VGPU resource corresponding to the current task matches the respective capacity of the each of the one or more target containers; and in response to determining that the requested volume of the VGPU resource corresponding to the current task is larger than the respective capacity of the each of the one or more target containers, determining that the requested volume of the VGPU resource corresponding to the current task does not match the respective capacity of the each of the one or more target containers ([0061] Containers should be scheduled based on their maximum available GPU memory; [0062] When a user program calls the memory allocation API, the wrapper module sends memory size information to the scheduler through a UNIX socket prepared in the container. The scheduler tracks all memory allocation calls in the container. So the scheduler can know in real time how much free memory is allowed for that container. When there is enough memory to allocate, the scheduler sends a message to the wrapper module. 
After the actual allocation is done by calling the CUDA API through the wrapper module, the allocated address is sent to the scheduler along with the memory size; [0038-0039] When a user program calls a memory allocation API, the CUDA wrapper API module sends memory size information to the GPU memory scheduler through a UNIX socket prepared in the container, and the GPU memory scheduler tracks all memory allocation calls in the container. If a running container does not have enough GPU memory, the GPU memory scheduler will be suspended until the requested memory size becomes available; [0015] share volumes with the container; [0005] providing fully virtualized GPUs in containers; [0011] each user program is completely isolated when using ConVGPU; [0048] The GPU memory scheduler checks the GPU memory limit of each container. When a container uses GPU memory to the limit, the scheduler rejects allocation calls). As per claim 9, He, He2, Kim, and McQuighan teach the method of claim 1. He2 teaches wherein a capacity of a first container of the plurality of containers is different from a capacity of a second container of the plurality of containers ([0080] if these containers are located on different hosts, a container on a host with the lowest load is preferentially selected as a candidate container; [0089] the basic container with the highest matching degree and/or the longest idle time is selected as a candidate container; [0010] the base container is a container loaded with a base container image; the library container is a container loaded with a base container image and a public library; the service container is a candidate container selected from containers loaded with a base container image, a public library, a private library, and a service component; the dormant container is a base container snapshot that saves the running status of the base container). As per claim 20, it is a system claim of claim 1, so it is rejected for the same reasons as claim 1. 
As per claim 21, He, He2, Kim, and McQuighan teach the method of claim 5. He teaches wherein the requested volume of the VGPU resource corresponding to the current task is determined based on an acquired tag marking the requested volume of the VGPU resource corresponding to the current task ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type…select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements.). As per claim 26, He, He2, Kim, and McQuighan teach the method of claim 1. He teaches wherein the corresponding requested volume of VGPU resource corresponding to the each of the at least one task is determined based on a volume of data relating to the each of the at least one task ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type…select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements), the data relating to the at least one task including one or more images, a count of images to be processed, a size of each of the one or more images, and a processing algorithm corresponding to the each of the one or more images ([0032] Specifically, the GPU use request may include a variety of information, such as user 
identity information, container affiliation information, container number, business information corresponding to container, business information run by container, business type, and GPU demand applied for. Here, the GPU demand includes video memory capacity, video memory level, video memory type).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over He, He2, Kim, and McQuighan, as applied to claim 6 above, in view of Vembu et al. (US 20180293185 A1 hereinafter Vembu). Vembu was cited in a previous office action.
As per claim 7, He, He2, Kim, and McQuighan teach the method of claim 6. McQuighan teaches wherein the each of the at least one task has a priority level, the at least one task being arranged in an order in the message queue ([0041] the new API requests 211a-b are put in a queue. This information is put into the scheduler 212 and queued requests may be prioritized into a score). He, He2, Kim, and McQuighan fail to teach the at least one task being arranged in an order in the message queue according to the priority level of the each of the at least one task. However, Vembu teaches the at least one task being arranged in an order in the message queue according to the priority level of the each of the at least one task ([0194] At 2101, priorities associated with tasks/threads are identified. At 2102, the tasks are submitted into priority-based task queues.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined He, He2, Kim, and McQuighan with the teachings of Vembu to allow the highest priority operations to be performed first (see Vembu [0194] Any arbitration which is performed at the front end of a graphics pipeline stage or individual functional unit may then consult the priority of each of the operations waiting to be processed and may schedule the operations in accordance with the priorities).
Claim 10 is rejected under 35 U.S.C.
103 as being unpatentable over He, He2, Kim, and McQuighan, as applied to claim 1 above, in view of Woo (US 20210279157 A1). Woo was cited in a previous office action. As per claim 10, He, He2, Kim, and McQuighan teach the method of claim 1. He2 teaches wherein the processing apparatus is further configured to perform operations including: putting a task processed by the first container back into the message queue; the at least one task in the message queue ([0104] In one embodiment, the service container can report its own load changes. The reporting method can be a single report (once for each report processed), a periodic report (once every n seconds) or a batch report (once for each n requests processed). After the service container processes the last request, if the service container does not receive a new request within m seconds (m is a positive number), the local container resource manager is notified to uninstall the application software package in the container, and the service container uninstall event is reported to the service routing (or metadata database); [0138] the application request needs to be queued first). Additionally, Kim teaches in response to determining that the at least one task does not match a respective capacity of the each of the one or more target containers, resetting the each of the one or more target containers ([0038-0039] When a user program calls a memory allocation API, the CUDA wrapper API module sends memory size information to the GPU memory scheduler through a UNIX socket prepared in the container, and the GPU memory scheduler tracks all memory allocation calls in the container. 
If a running container does not have enough GPU memory, the GPU memory scheduler will be suspended until the requested memory size becomes available, and any memory allocation requested by that container will be suspended until the scheduler allocates more GPU memory to the container; [0061] Containers should be scheduled based on their maximum available GPU memory; [0048] The GPU memory scheduler checks the GPU memory limit of each container. When a container uses GPU memory to the limit, the scheduler rejects allocation calls). He, He2, Kim, and McQuighan fail to teach setting a renewed first container according to a mirrored first container if a first container collapses. However, Woo teaches setting a renewed first container according to a mirrored first container if a first container collapses ([0113] provides a replication control to restart/recover a container abnormally terminated). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined He, He2, Kim, and McQuighan with the teachings of Woo to recover containers abnormally terminated (see Woo [0113] provides a replication control to restart/recover a container abnormally terminated).
Claims 11, 15, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Dilley et al. (US 10791168 B1 hereinafter Dilley) in view of McGrath et al. (US 20200296155 A1 hereinafter McGrath). Dilley and McGrath were cited in a prior office action.
As per claim 11, Dilley teaches a method implemented on a processing apparatus, wherein the processing apparatus is configured to perform operations comprising: identifying, from a plurality of edge nodes that are communicated with a terminal device, a target edge node, the target edge node including one or more target containers (Col. 22 lines 43-45 The workload placement manager 506 determines one or more edges at which to place workload; Col.
10 lines 45-47 Each edge includes at least one proxy server instance 414 to manage network connections between external endpoints 110; Col. 10 lines 52-53 one or more containers in the edge; Col. 8 lines 58-61 each edge node 122 includes processor hardware that runs an operating system, container and cluster scheduling and management software, and tenant workloads; Col. 7 lines 54-60 code of an example workload is packaged in one or more containers (sometimes referred to as a ‘containerized application’) in which the code within each container configures a different node 122 to provide a different micro-service. An example workload includes one or more code packages that implement workload functions to be executed at an edge data center 106; Col. 34 lines 30-32 the address resolution service identifies a target edge location with the lowest distance to the endpoint); transmitting at least one task to the target edge node for processing (Col. 22 lines 43-45 The workload placement manager 506 determines one or more edges at which to place workload; Col. 22 lines 48-56 The workload placement manager 506 sends placement requests, which indicate the one or more edges at which to place instances of the workload together with the configuration specification, to the workload message server 508. To load the workload to selected edges, the workload message server 508 sends workload placement commands over the workload placement network 108 to edge message clients 533 at the one or more edges 106 at which a workload is to be placed.); and receiving a processing result of the at least one task from the target edge node, wherein the processing result of the at least one task is determined by (Col. 33 lines 16-17 where the workload instance processes the message and returns a response message; Col. 
22 lines 43-45 The workload placement manager 506 determines one or more edges at which to place workload;): causing the one or more target containers to obtain and process the at least one task (Col. 10 lines 52-53 The L4 proxy functionality runs in one or more containers in the edge, much like other workloads; Col. 21 lines 35-36 direct messages to code containers and their workloads within the edge; Col. 7 lines 54-60 For instance, code of an example workload is packaged in one or more containers (sometimes referred to as a ‘containerized application’) in which the code within each container configures a different node 122 to provide a different micro-service. An example workload includes one or more code packages that implement workload functions to be executed at an edge data center 106.); and selecting a processing approach for the at least one task based on a type of each of the at least one task, wherein the identifying, from the plurality of edge nodes that are communicated with the terminal device, the target edge node includes (Col. 5 lines 23-37 Tenants specify application performance requirements via the administrative UI for their individual tenant applications, which can include geographic location, network communication latency, time of use, application sizing and resource usage preferences, for example. Different tenant applications typically have different performance requirements. The performance requirements of some tenant applications change over time. From time to time, for example, a tenant may adjust the performance requirements of a tenant application. Based upon tenant-specified performance requirements, the orchestration manager 104 orchestrates placement of tenant applications at edge data centers 106 steers external endpoint requests to edges where requested tenant applications are placed, and schedules execution of tenant applications at the edges; Col. 
10 lines 45-47 Each edge includes at least one proxy server instance 414 to manage network connections between external endpoints 110): obtaining node information of the plurality of edge nodes; determining a communication distance between each of at least a portion of the plurality of edge nodes and the terminal device based on the node information, wherein the communication distance refers to a distance between the terminal device and an edge node or a network delay time between the terminal device and an edge node; identifying a first edge node from the at least a portion of the plurality of edge nodes based on the determined communication distances between the at least a portion of the plurality of edge nodes and the terminal device (Col. 16 lines 37-41 Another example traffic steering manager 510 selects an edge using information within the orchestration manager, which collects information indicative of geographic locations of edges and external endpoints and indicative of network communication latency; Col. 14 lines 54-62 Some criteria that an example workload placement system uses to determine suitable locations include…Endpoint to edge latency preferences: for tenant ingress workloads that have a specified preferred maximum round trip time between endpoint and edge to optimize interactive application performance; Col. 6 line 66-Col. 7 line 3 A determination as to which edge 106 is most appropriate to handle a request at a given moment in time is based at least in part upon external endpoint location, the time it takes for data to travel from the external endpoint to a given edge (“latency”); Col. 34 lines 30-32 the address resolution service identifies a target edge location with the lowest distance to the endpoint); transmitting a first request regarding the target edge node to the first edge node (Col. 
16 lines 23-29 The traffic steering manager 510 receives a request from an external endpoint to access a workload, and in response to the request, the traffic steering manager 510 selects an edge 106 hosting the requested workload to service the request based upon one or more of multiple factors including geographic location of the edge, tenant-specified latency preferences); in response to receiving, from the first edge node, a first response indicating that the first edge node is capable of processing the at least one task, designating the first edge node as the target edge node; in response to determining that the first edge node transmits an indication that the first edge node is incapable of processing the at least one task to a cloud server, receiving, from the cloud server, a second response including an identification of a second edge node allocated by the cloud server, wherein the second edge node is capable of processing the at least one task, and determining the target edge node based on the second response (Figs. 1, 5, 9C; Col. 29 line 63-Col. 30 line 29 Decision module 946 determines whether an edge from the group of candidate edges for the workload is within the RTT SLO for this group of endpoints. Recall that the grouped external endpoints 110 are all within a short distance of each other. In response to a determination at decision module 946 that a currently selected edge meets the RTT SLO, decision module 948 determines whether the currently selected edge has sufficient available resources (e.g., CPU cycles, memory storage) to accommodate the endpoint traffic. In response to a determination at decision module 948 that a currently selected edge has sufficient resource availability to serve the given workload, module 950 adds the currently selected edge to a set of edges, NewEdges. 
In response to a determination at decision module 948 that a currently selected edge does not have sufficient resource availability to serve the given workload, decision module 952 determines whether there are more external endpoints 110 in the group of external endpoints that have not yet been evaluated. If there are additional external endpoints to evaluate, then following decision module 952, control flows back to module 944 and a next in order group of external endpoints is selected. Also, control flows to decision module 952 in response to a determination at decision module 948 that that a currently selected edge does not have sufficient resource availability to serve the given workload. In response to a determination at decision module 952 that there are no additional edges in the group to be evaluated, control flows to module 954, which requests placement of the workload at the edges within the NewEdges set. Note that some endpoints may not be able to be served within RTT SLO as a result of this control flow, as in the case where no edge exists within the RTT SLO. Thus, the SLO is a service level objective rather than a service level agreement or assurance; Col. 29 lines 35-38 the workload placement manager 506 adds that additional edge to an EdgeSet for the workload and requests, via the workload message server 508; Col. 15 lines 20-23 Communicate a set of edges for a given workload to the workload command message server 508, to initiate deployment of that workload's code to the intended edges; Col. 20 lines 14-19 Instruct a cluster scheduling system 542 to execute the workload within an edge, according to the configuration specification associated with the workload. Report cluster scheduler issues and events such as workload launch or failure, for example, back to the orchestration manager 104 via the edge message client 533; Col. 18 lines 22-25 Referring to FIG. 
5, the edge message client 533 maintains a connection with the workload message server 508 through which it receives control instructions and transmits workload status information back to the message server 508; Col. 29 lines 32-38 the workload placement manager 506 determines whether there is a different edge with sufficient available resources to accommodate the external endpoint traffic. If so, the workload placement manager 506 adds that additional edge to an EdgeSet for the workload and requests, via the workload message server 508). Dilley fails to teach a second request indicating that the first edge node is incapable of processing the at least one task, and a first communication distance between the first edge node and the terminal device is shorter than a second communication distance between the second edge node and the terminal device. However, McGrath teaches a second request indicating that the first edge node is incapable of processing the at least one task, and a first communication distance between the first edge node and the terminal device is shorter than a second communication distance between the second edge node and the terminal device ([0107] Endpoint devices being used by the end user or accessible via the nearby local layer 1020 may be considered as the “far edge” devices. Devices in this scenario may provide the lowest latency possible. However, at some point, far edge devices may become compute limited or may not be power efficient as needed to perform a given task. For instance, at some point of network traffic load, AR/VR use cases will experience severe degradation (even to the point of providing a worse performance than executing the workload only at the far edge on the device itself); [0108] On premise computing at the on-premise layer 1030 is a next potential tier of a low-latency network edge architecture. 
On premise refers to a location (typically within the customer premises) that may be able to host a certain amount of compute (from a small form factor rack to multiple racks); [0057] Examples of latency, resulting from network communication distance and processing time constraints, may range from less than a millisecond (ms) when among the endpoint layer 200, under 5 ms at the edge devices layer 210). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Dilley with the teachings of McGrath to determine an edge that can most efficiently handle a workload (see McGrath [0117] In further examples, advanced forms of workload mapping may be used in an edge computing system to map specific forms of compute activities to specific locations and systems (or types of systems and location capabilities, to more efficiently bring the workload data to available compute resources).).
As per claim 15, Dilley and McGrath teach the method of claim 11. Dilley teaches wherein the identifying, from the plurality of edge nodes that are communicated with the terminal device, the target edge node further includes: causing the terminal device to transmit a third request regarding the target edge node to the cloud server directly in response to determining that resources of the terminal device are insufficient (Col. 17 lines 28-30 sends a request to the message server 508 indicating the set of edges 106 intended to run a given tenant workload; Col. 22 lines 43-45 The workload placement manager 506 determines one or more edges at which to place workload; Col. 10 lines 45-47 Each edge includes at least one proxy server instance 414 to manage network connections between external endpoints 110; Col.
31 lines 16-17 A content delivery network (CDN) provides edge delivery of static content to end users by caching that content on CDN edge server locations following a first request for that content, and serving it from a local cache upon subsequent requests; Col. 20 lines 14-19 Instruct a cluster scheduling system 542 to execute the workload within an edge, according to the configuration specification associated with the workload. Report cluster scheduler issues and events such as workload launch or failure, for example, back to the orchestration manager 104 via the edge message client 533; Col. 18 lines 22-25 Referring to FIG. 5, the edge message client 533 maintains a connection with the workload message server 508 through which it receives control instructions and transmits workload status information back to the message server 508; Col. 30 lines 9-12 a determination at decision module 948 that a currently selected edge does not have sufficient resource availability to serve the given workload); receiving, from the cloud server, a third response including an identification of a third edge node, the third edge node being capable of processing the at least one task, and the third edge node corresponding to a shortest communication distance among communication distances between edge nodes allocated by the cloud server and the terminal device; and determining the target edge node based on the third response (Col. 17 lines 28-32 sends a request to the message server 508 indicating the set of edges 106 intended to run a given tenant workload. The message server 508 instructs each indicated edges 106 to run the workload (via its corresponding edge message client 533); Col. 30 lines 6-8 determination at decision module 948 that a currently selected edge has sufficient resource availability to serve the given workload; Col. 34 lines 30-32 the address resolution service identifies a target edge location with the lowest distance to the endpoint; Col. 
31 lines 16-17 determines whether a currently selected edge C has resources available for use by the workload W.). As per claim 23, Dilley and McGrath teach the method of claim 11. Dilley teaches wherein the processing approach includes at least one of scheduling resources to perform corresponding operations, allocating and scheduling resources to perform corresponding operations, and postponing the processing of the at least one task (Col. 22 lines 63-67 a corresponding configuration specification and instructing the edge message client 533 to execute the tenant workload according to settings in the configuration specification, such as settings as to scheduling and edge resource allocation; Col. 25 lines 37-41 In the event of a workload failure, for example, the workload placement manager 506 can take action to select a different edge, for example one nearby to a failed location, to schedule the intended tenant workload; Col. 31 lines 52-57 If a workload placement to an edge fails, then workload placement manager 506 takes the failed edge out of the candidate edge set for a prescribed removal time interval, for example one to five days. Future edge refinement will consider other edges further down the consistent stride order in place of the failed edge.). Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Dilley and McGrath, as applied to claim 11 above, in view of Khalid et al. (US 20210027415 A1 hereinafter Khalid). Khalid was cited in a prior office action. As per claim 16, Dilley and McGrath teach the method of claim 11. Dilley and McGrath fail to teach wherein the one or more target containers correspond to virtual graphic processing unit (VGPU) resources, the target edge node being configured to process an image rendering task in parallel with the terminal device. 
However, Khalid teaches wherein the one or more target containers correspond to virtual graphic processing unit (VGPU) resources, the target edge node being configured to process an image rendering task in parallel with the terminal device ([0043] The vGPUs 215 in container 219; [0015] The MEC virtualization system may receive, for example, video frames from the end device, process/render/encode graphics; claim 1 wherein the service request is for multi-access edge compute (MEC)-based virtual graphic processing unit (vGPU) services; [0042] end device 180 may generate primary rendering data 308. Primary rendering data may include, for example, object tracking, AR object rendering, model rendering; [0036] For example, container engine 240 may provision containers 219 (e.g., groups of vGPUs 215 executing functions 217) for a group of parallel functions that are used simultaneously to provide rendering data for the customer applications; [0041] a container 219 in MEC cluster 210; [0069] a user of end device 180 may launch application 185, which may cause application 185 to request MEC-based support services for XR from a preconfigured cloud platform in external network 160; [0036] container engine 240 may provision containers 219 (e.g., groups of vGPUs 215 executing functions 217) for a group of parallel functions that are used simultaneously to provide rendering data for the customer applications). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Dilley and McGrath with the teachings of Khalid to optimally allocate resources (see Khalid [0038] optimally allocate resources at each MEC cluster 210 and between MEC clusters 210.).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Dilley and McGrath, as applied to claim 11 above, in view of Song (CN110196753A). Song was cited in a prior office action.
As per claim 19, Dilley and McGrath teach the method of claim 11. Dilley teaches wherein the cloud server includes one or more target containers (Col. 8 lines 46-47 The example edge node 122 includes a hardware layer 302 that includes a server system; Col. 8 lines 58-62 each edge node 122 includes processor hardware that runs an operating system, container and cluster scheduling and management software, and tenant workloads. The container management system layer 304 manages the execution of software containers). Dilley and McGrath fail to teach the one or more target containers corresponding to VGPU resources. However, Song teaches the one or more target containers corresponding to VGPU resources ([0008] The embodiments of the present invention provide a container-based graphics processor (GPU) virtualization method, device and readable medium, which are used to implement container-based GPU virtualization to support various models of GPU computing cards.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Dilley and McGrath with the teachings of Song to improve GPU resource utilization (see Song [0158] achieve multi-container sharing of GPU resources, thereby improving the efficiency of GPU resource utilization.). Claims 24 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Dilley and McGrath, as applied to claim 11 above, in view of Eberlein et al. (US 20210182108 A1 hereinafter Eberlein). As per claim 24, Dilley and McGrath teach the method of claim 11. Dilley teaches wherein the identifying, from the plurality of edge nodes that are communicated with the terminal device, the target edge node further includes: in response to determining that there is no edge node capable of processing the at least one task, causing the cloud server to transmit a fourth response to the terminal device (Col. 
26 lines 65-67 Edges are removed from the edge set that do not have sufficient resource capacity to execute the workload; Col. 27 lines 8-16 In response to a determination at decision module 908 that no edges remain in the EdgeSet, then control flows to a placement failed module 910, which records a failed placement status for the workload. The recorded failed placement status is viewable at the display UI at the administrative module. An example module 910 also sends a failed placement alert message over the communication network to the workload owner if such a notification mechanism has been configured; Col. 4 lines 37-39 Example external endpoints include a web browser with a graphical user interface, which is typically used by a human end user). Dilley and McGrath fail to teach the fourth response instructing the terminal device to transmit the at least one task to the cloud server directly for processing; causing the cloud server to process the at least one task, wherein to process the at least one task, the processing apparatus is configured to perform operations including: establishing a correspondence relationship between the type of the each of the at least one task and the processing approach, wherein the correspondence relationship between the type of the each of the at least one task and the processing approach is at least determined by training a primary relationship between task types and processing approaches using a machine learning method, and the type of the each of the at least one task includes data calculation tasks, image reconstruction tasks, image rendering tasks, and model training tasks; and selecting the processing approach for processing the at least one task according to the correspondence relationship and the type of the each of the at least one task; and receiving a processing result of the at least one task from the cloud server. 
However, Eberlein teaches the fourth response instructing the terminal device to transmit the at least one task to the cloud server directly for processing; causing the cloud server to process the at least one task, wherein to process the at least one task, the processing apparatus is configured to perform operations including: establishing a correspondence relationship between the type of the each of the at least one task and the processing approach, wherein the correspondence relationship between the type of the each of the at least one task and the processing approach is at least determined by training a primary relationship between task types and processing approaches using a machine learning method, and the type of the each of the at least one task includes data calculation tasks, image reconstruction tasks, image rendering tasks, and model training tasks; and selecting the processing approach for processing the at least one task according to the correspondence relationship and the type of the each of the at least one task; and receiving a processing result of the at least one task from the cloud server ([0120] The Computer 602 can receive requests over Network 630 (for example, from a client software application executing on another Computer 602) and respond to the received requests by processing the received requests using a software application or a combination of software applications. 
In addition, requests can also be sent to the Computer 602 from internal users (for example, from a command console or by another internal access method), external or third-parties, or other entities, individuals, systems, or computers; [0118] The Computer 602 can serve in a role in a distributed computing system as, for example, a client, network component, a server… Computer 602 can be configured to operate within an environment, or a combination of environments, including cloud-computing; [0052] A workload profile is computed by the Resource Consumption Prediction 308 for the software processes using workload data or a machine-learning prediction model is trained to be able to assign an expected workload profile to a new software process (for example, with a given process—name such as “PDF—rendering”). Based on the computed workload profile or the machine-learning prediction model for the new software process, the new software process is scheduled in the available managed landscape; [0060] In some implementations, Load statistics DB 306 includes data 404. Data 404 can include an identifier of the running task (Run-Id), a Task-ID (the task/software process type, matching to a definition in the Process Repository 314), Parameter-Values for the parameters of the task defined in the Process Repository 314, and the measured Load-Statistics providing data on particular tasks/software processes. Data 402 is updated by the Load monitor 302 and used by Resource Consumption Prediction 308 to train the machine-learning prediction model described in FIG. 3. In some implementations, data 406 is maintained by the Process Repository 314 and can include a Task-ID and associated Parameters. Data 406 is used to provide particular data associated with each type of software process associated with a Task-ID (such as, Cache-size and work-processes associated with ABAP). When a software process is to be scheduled, it can be parametrized. 
For example, for rendering a PDF, the size of the document is a parameter; [0053] A workload management system (WMS) (considered to be an aggregate of the Resource Consumption Prediction 308, scheduler 310, To be scheduled processes 312, Process Repository 314, and Landscape Directory 316) has knowledge about the Managed Landscape 304 and available hardware configurations (for example, refer to FIG. 4, 404). Additionally, the WMS has knowledge about the current workload of the available managed landscape. The WMS can fit the new software process to the managed landscape according to an algorithm (for example, using “best fit” or “equal distribution”/“next fit”). For a “first fit” algorithm, this enhanced approach would take into account all workload resource types and find the first where all workload resource types fit. For a “best fit” algorithm, the enhanced approach would take into account and compute an average of the fit-assessment and take a highest average. Since RAM is typically the most critical resource, it can be over weighted in computing an average to prioritize a resource with best fitting RAM; [0061] In some implementations, three process and task types are defined: [0062] a) Long-running process (for example, with state)—such as, an in-memory DB and application server. [0063] b) Scheduled task related to a “long running processes”—such as, a backup of a DB, provisioning of a tenant to a DB, deployment of an application/update package to an application server. [0064] c) Scheduled tasks—such as, PDF rendering or machine-learning prediction model training.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Dilley and McGrath with the teachings of Eberlein to optimize the distribution of workloads (see Eberlein [0005] The subject matter described in this specification can be implemented to realize one or more of the following advantages. 
First, distribution of heterogeneous workloads can be optimized). As per claim 25, Dilley, McGrath, and Eberlein teach the method of claim 24. Eberlein teaches wherein the correspondence relationship between the type of the each of the at least one task and the processing approach is further determined by ([0060] In some implementations, Load statistics DB 306 includes data 404. Data 404 can include an identifier of the running task (Run-Id), a Task-ID (the task/software process type, matching to a definition in the Process Repository 314), Parameter-Values for the parameters of the task defined in the Process Repository 314, and the measured Load-Statistics providing data on particular tasks/software processes. Data 402 is updated by the Load monitor 302 and used by Resource Consumption Prediction 308 to train the machine-learning prediction model described in FIG. 3. In some implementations, data 406 is maintained by the Process Repository 314 and can include a Task-ID and associated Parameters. Data 406 is used to provide particular data associated with each type of software process associated with a Task-ID (such as, Cache-size and work-processes associated with ABAP). When a software process is to be scheduled, it can be parametrized. For example, for rendering a PDF, the size of the document is a parameter; [0053] A workload management system (WMS) (considered to be an aggregate of the Resource Consumption Prediction 308, scheduler 310, To be scheduled processes 312, Process Repository 314, and Landscape Directory 316) has knowledge about the Managed Landscape 304 and available hardware configurations (for example, refer to FIG. 4, 404). Additionally, the WMS has knowledge about the current workload of the available managed landscape. The WMS can fit the new software process to the managed landscape according to an algorithm (for example, using “best fit” or “equal distribution”/“next fit”). 
For a “first fit” algorithm, this enhanced approach would take into account all workload resource types and find the first where all workload resource types fit. For a “best fit” algorithm, the enhanced approach would take into account and compute an average of the fit-assessment and take a highest average. Since RAM is typically the most critical resource, it can be over weighted in computing an average to prioritize a resource with best fitting RAM;): receiving a manual setting from a user, analyzing historical records, and analyzing configurations of the target edge node, the historical records are analyzed based on historical performances obtained using the machine learning method, and the historical performances include resource occupancies, processing speeds, and transmission durations ([0033] In some implementations, a scheduler or other described systems/processes (for example, refer to FIG. 3) can provide various user interfaces (such as, a graphical user interface (GUI)) to permit configuration of any value described in this disclosure. For example, a “density slider” could be implemented to permit a system operator to adjust a density higher (that is, a more aggressive setting) at a planning time for a workload to more densely pack software processes together on computing equipment; [0110] For a software process similar to the new software process and prior to receiving the request to schedule the new software process, resource usage data is read every time interval delta-t from a managed landscape and stored per process, as stored data into a load statistics database. The stored data is read from the load statistics database; [0111] a machine-learning prediction model to predict workload consumption for the new software process. 
In some implementations, the machine-learning prediction model is continuously updated using data from the load statistics database; [0089] In some implementations, an optional extension (control circuit) can be configured to measure resource utilization, especially an overload situation and process termination and add these data sets to the Load Statistics DB 306 as additional data points for use in optimizing distribution of heterogeneous software process workloads. For example, an overload situation might occur when performance drops due to insufficient computing memory. In this case, paging can occur. Another example can include a software processes taking too long to complete because of insufficient CPU resources or saturated IO resources; [0042] For example, when measuring resource demand of a software process, average (av 202) workload usage parameters can be calculated during runtime of the software process. If the software process is running “long” (for example, 1 h or 8 h), an average per time-slice (that is of 1 or 8 hours) can be calculated; [0149] a client computer having a graphical user interface or a Web browser through which a user can interact with). Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. 
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to HSING CHUN LIN whose telephone number is (571)272-8522. The examiner can normally be reached Mon - Fri 9AM-5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /H.L./Examiner, Art Unit 2195 /Aimee Li/Supervisory Patent Examiner, Art Unit 2195

Prosecution Timeline

Mar 26, 2024: Non-Final Rejection mailed (§103, §112)
Jun 26, 2024: Response Filed
Oct 28, 2024: Final Rejection mailed (§103, §112)
Jan 15, 2025: Request for Continued Examination
Jan 21, 2025: Response after Non-Final Action
May 08, 2025: Non-Final Rejection mailed (§103, §112)
Aug 08, 2025: Response Filed
Nov 19, 2025: Final Rejection mailed (§103, §112) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12554523
REDUCING DEPLOYMENT TIME FOR CONTAINER CLONES IN COMPUTING ENVIRONMENTS
3y 8m to grant Granted Feb 17, 2026
Patent 12547458
PLATFORM FRAMEWORK ORCHESTRATION AND DISCOVERY
4y 7m to grant Granted Feb 10, 2026
Patent 12468573
ADAPTIVE RESOURCE PROVISIONING FOR A MULTI-TENANT DISTRIBUTED EVENT DATA STORE
2y 11m to grant Granted Nov 11, 2025
Patent 12461785
GRAPHIC-BLOCKCHAIN-ORIENTATED SHARDING STORAGE APPARATUS AND METHOD THEREOF
3y 4m to grant Granted Nov 04, 2025
Patent 12443425
ISOLATED ACCELERATOR MANAGEMENT INTERMEDIARIES FOR VIRTUALIZATION HOSTS
3y 10m to grant Granted Oct 14, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 60%
With Interview (+80.0%): 99%
Median Time to Grant: 3y 4m (~0m remaining)
PTA Risk: High
Based on 109 resolved cases by this examiner. Grant probability derived from career allowance rate.
