Prosecution Insights
Last updated: April 19, 2026
Application No. 17/448,546

SYSTEM AND METHOD FOR RESOURCE ALLOCATION AND SCHEDULING

Final Rejection — §103, §112
Filed: Sep 23, 2021
Examiner: LIN, HSING CHUN
Art Unit: 2195
Tech Center: 2100 — Computer Architecture & Software
Assignee: Shanghai United Imaging Metahealthcare Co. Ltd.
OA Round: 4 (Final)

Grant Probability: 59% (Moderate)
Expected OA Rounds: 5-6
Median Time to Grant: 3y 4m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 59% (64 granted of 108 resolved; +4.3% vs TC avg)
Interview Lift: +79.8% (strong; grant rate among resolved cases with an interview vs. without)
Avg Prosecution: 3y 4m (typical timeline; 37 applications currently pending)
Total Applications: 145 (career history, across all art units)

Statute-Specific Performance

§101: 17.1% (-22.9% vs TC avg)
§103: 35.8% (-4.2% vs TC avg)
§102: 6.5% (-33.5% vs TC avg)
§112: 34.0% (-6.0% vs TC avg)

Tech Center average values are estimates. Based on career data from 108 resolved cases.

Office Action

Rejections: §103, §112
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-3, 5-11, 15-16, 19-21, and 23-26 are pending in this application.

Response to Arguments

Applicant's arguments regarding the rejections of claims 1-20 under 35 U.S.C. 112(b) have been fully considered and are persuasive. The rejections have been withdrawn. However, new 35 U.S.C. 112(b) rejections are applied to claims 1-3, 5-11, 15-16, 19-21, and 23-26 based on the amendments.

Applicant's arguments regarding the 35 U.S.C. 103 rejections of claims 1-3, 5-11, 15-16, 19-21, and 23-26 have been fully considered but they are moot in light of the references being applied in the current rejection or are unpersuasive. Regarding the 35 U.S.C. 103 rejection, Applicant argues the following in the remarks:

(a) The prior art fails to teach amended claims 1 and 20.

(b) Regarding claim 11, McGrath does not involve comparing the communication distance of the "far edge" device with that of the "on-premise layer 1030." Specifically, McGrath merely discloses that since far edge devices may become compute limited or may not be power efficient as needed to perform a given task, the "on-premise layer 1030" is the next potential tier of a low-latency network edge architecture that provides low latency. It can be seen that the "on-premise layer 1030" in McGrath is merely a backup layer when the "far edge" devices are unable to perform a given task, which is completely silent about the latency magnitude relationship provided by the "far edge" devices and the "on-premise layer 1030".

(c) New claim 24 is similar to claim 18, which applied the Sun reference, and the Sun reference does not teach the limitations of new claim 24.

(d) The dependent claims 2-3, 5-10, 15-16, 19, 21, 23, and 25-26 are allowable since they depend on claims 1 and 11.

Examiner has thoroughly considered Applicant's arguments, but respectfully finds them unpersuasive for at least the following reasons.

As to point (a), the arguments are moot in light of the references applied in the current rejection.

As to point (b), the examiner respectfully disagrees. An endpoint device used by a user, which is considered a far edge device, performs a task with a lower latency compared to on-premise computing, which can include an on-premise rack. McGrath recites in [0107] and [0108] that the endpoint provides the lowest latency possible whereas on-premise computing provides a next tier of low-latency architecture, so the endpoint provides a shorter latency compared to the on-premise computing. This concept can be illustrated in an example where a user is running a task locally on an office computer, which means that the task is being run on an endpoint device; that would have the lowest latency since it does not need to network with a remote device. If the user has to run the task on a rack that is on premises in an office building, that would have a higher latency compared to the task being run locally on the office computer.

As to point (c), the arguments are moot in light of the references applied in the current rejection.

As to point (d), the examiner respectfully disagrees. Applicant's arguments regarding the dependent claims fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the dependent claims define a patentable invention without specifically pointing out how the language of the dependent claims patentably distinguishes them from the references.
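For illustration only, the examiner's point (b) reasoning reduces to picking the lowest-latency compute tier. A minimal Python sketch, with hypothetical tier names and latency figures (McGrath gives no numeric values):

```python
# Hypothetical latency tiers echoing the examiner's example: a task run on
# the local endpoint needs no network hop, so it sits at the lowest-latency
# tier; an on-premise rack is the next tier up. Figures are illustrative.
TIER_LATENCY_MS = {
    "endpoint (far edge)": 0.1,  # runs locally, no network round trip
    "on-premise rack": 2.0,      # one hop across the office network
    "regional cloud": 25.0,      # WAN round trip
}

def lowest_latency_tier(tiers: dict) -> str:
    """Return the tier that can serve a task with the shortest latency."""
    return min(tiers, key=tiers.get)

print(lowest_latency_tier(TIER_LATENCY_MS))  # -> endpoint (far edge)
```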
Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-3, 5-11, 15-16, 19-21, and 23-26 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

As per claims 1 and 20 (line numbers refer to claim 1): Lines 9-10 recite "the idle state refers to a state in which a container is not processing a task," but it is unclear which container "a container" refers to (the claim recites a plurality of containers and one or more target containers). Lines 14-15 recite "a respective target task from a message queue," and it is unclear if this refers to "a respective target task from a message queue" in line 12. If so, lines 14-15 can be amended to "the respective target task from the message queue".

As per claim 11: Line 20 recites "an edge node," but it is unclear if this refers to "an edge node" in line 19. Additionally, it is unclear if "an edge node" in lines 19 and 20 is part of the at least the portion of the plurality of edge nodes.

Claims 2-3, 5-10, 15-16, 19, 21, and 23-26 depend from claims 1 and 11 and fail to resolve the deficiencies of claims 1 and 11, so they are rejected for the same reasons.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, 6, 8, 9, 20, 21, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over He (US 20210208951 A1), in view of He (CN110109649A, hereinafter He2), in view of Kim et al. (KR20190143248A, hereinafter Kim), and further in view of McQuighan et al. (US 20190155660 A1, hereinafter McQuighan). The claim mappings of He2 are made with a translation of CN110109649A. The claim mappings of Kim are made with a translation of KR20190143248A. He, He2, and Kim were cited in a previous office action.
As per claim 1, He teaches the invention substantially as claimed including a method implemented on a processing apparatus, wherein the processing apparatus is configured to perform operations comprising: allocating virtual graphic processing unit (VGPU) resources for a plurality of containers on the processing apparatus, wherein each of the plurality of containers is allocated with a corresponding virtual graphic processing unit (VGPU) resource and is associated with an operation or a service (Figs. 1 and 4; [0071] As shown in FIG. 5, an apparatus 500 for sharing a GPU of the present embodiment; [0069] When Fake-GPU1, FakeGPU2, FakeGPU3 are allocated to different containers; [0041] identify a plurality of available GPUs and allocate to different containers based on different virtual GPU information; [0038] a physical GPU virtualizes 3 virtual GPUs, named as Fake GPU1, Fake GPU2, and Fake GPU3. After Fake GPU1 is mounted to container A; [0057] training tasks running in the target container; [0076] the apparatus 500 for sharing a GPU may further include: a process isolation unit, configured to control the target physical GPU to isolate model training tasks from different containers through different processes, in response to the target physical GPU being simultaneously mounted to at least two containers; [0068] {circle around (7)} containerd calls nvidia-container to mount physical card-Physical GPU0. As of this step, programs inside the container may call the dynamic library libnvidia-container for GPU acceleration; The nvidia-container is for high performing tasks.); identifying one or more target containers from the plurality of containers ([0007] receive a GPU use request initiated by a target container; a virtual GPU determination unit, determine a target virtual GPU based on the GPU use request; where the target virtual GPU is at least one of all virtual GPUs; [0071] The request receiving unit 501 is configured to receive a GPU use request initiated by a target container; [0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task; [0024] containers running on a containerized cloud platform); for each of the one or more target containers, causing the each of the one or more target containers to obtain a respective target task that includes at least one task ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task, to indicate that the container needs to occupy a certain GPU to implement GPU acceleration; [0022] A user may interact with the server 105 through the network 104 using the terminal devices 101, 102, 103, to receive or send messages and the like; claim 5 controlling the target physical GPU to isolate model training tasks from different containers through different processes, in response to the target physical GPU being simultaneously mounted to at least two containers; [0024] receiving a GPU use request initiated by a target container from the terminal devices 101, 102, and 103 through the network; A user sends tasks so that means a target task is obtained.), wherein to obtain a respective target task, the processing apparatus is configured to perform operations including: causing the each of the one or more target containers to identify a respective tag of a corresponding requested volume of VGPU resource corresponding to each 
of the at least one task; causing the each of the one or more target containers to identify the respective target task from the at least one task based on the each of the one or more target containers and the respective tag corresponding to the each of the at least one task ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type. The demand quantity may refer to the number of GPU when candidate GPUs all have the same video memory, or may also refer to a video memory demand when the candidate GPUs have different video memories. The demand type may include classification methods such as video memory type, video memory manufacturer, and batch, in order to select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements; [0069] Fake-GPU1, FakeGPU2, FakeGPU3 are allocated to different containers); and causing the each of the one or more target containers to process the respective target task ([0047] select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container; claim 5 controlling the target physical GPU to isolate model training tasks from different containers through different processes, in response to the target physical GPU being simultaneously mounted to at least two containers), wherein to identify the respective target task from the at least one task, the processing apparatus is configured to perform operations ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type…select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements.). 
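As a rough sketch of the claim 1 architecture being mapped above (the data model, names, and figures are hypothetical; neither the claim nor He prescribes any particular implementation):

```python
# Hypothetical data model for the claim 1 scheme: each container holds an
# allocated VGPU slice, each task is tagged with its requested VGPU volume,
# and the idle containers are the "target containers" that go pick up work.
from dataclasses import dataclass

@dataclass
class Container:
    name: str
    vgpu_allocated_gb: float   # volume of VGPU resource allocated
    busy: bool = False         # False corresponds to the claimed idle state

@dataclass
class Task:
    name: str
    requested_vgpu_gb: float   # the tag: requested volume of VGPU resource

containers = [Container("A", 4.0), Container("B", 8.0, busy=True)]
tasks = [Task("t1", 2.0), Task("t2", 6.0)]

# identifying one or more target containers from the plurality of containers
targets = [c for c in containers if not c.busy]
print([c.name for c in targets])   # -> ['A']
```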
He fails to teach wherein the one or more target containers are in an idle state and the idle state refers to a state in which a container is not processing a task; causing the each of the one or more target containers to obtain a respective target task from a message queue that includes at least one task, wherein to obtain a respective target task from a message queue; identify the respective target task from the at least one task based on a volume of VGPU resource allocated to the each of the one or more target containers and the respective tag corresponding to the each of the at least one task; and the processing apparatus is configured to perform operations including: marking the each of the at least one task with a respective matching status tag, wherein the respective matching status tag indicates whether the respective tag corresponding to the each of the at least one task matches a capacity of a target container of the one or more target containers successfully, and the respective matching status tag includes a respective match failure tag indicating a failure of the matching to a target container of the one or more target containers; identifying the respective match failure tag each time a current target container of the one or more target containers identifies a corresponding target task from the at least one task; and in response to determining that a volume of VGPU resource allocated to the current target container is smaller than or equal to a corresponding requested volume of VGPU resource corresponding to the target task having the match failure tag, omitting the target task having the match failure tag by the current target container.

However, He2 teaches wherein the one or more target containers are in an idle state and the idle state refers to a state in which a container is not processing a task ([0014] selecting a basic container with the highest degree of match and/or the longest idle time as the candidate container; [0034] a container pool configured to manage containers; [0125] a matching idle container can be selected from the container pool; [0125] When the service component of the service container is uninstalled, it is put back into the container pool as an idle class library container.); causing the each of the one or more target containers to obtain a respective target task from a message queue that includes at least one task, wherein to obtain a respective target task from a message queue ([0138] the application request needs to be queued first, and then the container control device is notified to load a new service container instance. When the service container instance is available, the queued request is forwarded to the container for processing; [0125] When a new service container needs to be created, a matching idle container can be selected from the container pool to quickly load and run the service container.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined He with the teachings of He2 to reduce computing operations (see He2 [0089] selecting the container with the longest idle time as the candidate container can avoid loading and unloading the base container image and reduce the computing operations of the system.).
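He2's queue-and-idle-container teaching corresponds to a pull model in which only idle containers take the next queued task. A minimal self-contained sketch under that assumption (all names hypothetical):

```python
# Minimal pull model, assuming a simple FIFO message queue: only containers
# in the idle state obtain tasks; busy containers leave the queue untouched.
from collections import deque

message_queue = deque(["task-1", "task-2", "task-3"])
containers = {"A": "idle", "B": "busy", "C": "idle"}

for name, state in containers.items():
    if state == "idle" and message_queue:
        task = message_queue.popleft()   # obtain a respective target task
        print(f"container {name} processes {task}")
# container B is skipped: it is not in the idle state
```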
He and He2 fail to teach identify the respective target task from the at least one task based on a volume of VGPU resource allocated to the each of the one or more target containers and the respective tag corresponding to the each of the at least one task; the processing apparatus is configured to perform operations including: marking the each of the at least one task with a respective matching status tag, wherein the respective matching status tag indicates whether the respective tag corresponding to the each of the at least one task matches a capacity of a target container of the one or more target containers successfully, and the respective matching status tag includes a respective match failure tag indicating a failure of the matching to a target container of the one or more target containers; identifying the respective match failure tag each time a current target container of the one or more target containers identifies a corresponding target task from the at least one task; and in response to determining that a volume of VGPU resource allocated to the current target container is smaller than or equal to a corresponding requested volume of VGPU resource corresponding to the target task having the match failure tag, omitting the target task having the match failure tag by the current target container.

However, Kim teaches identify the respective target task from the at least one task based on a volume of VGPU resource allocated to the each of the one or more target containers and the respective tag corresponding to the each of the at least one task ([0061] Containers should be scheduled based on their maximum available GPU memory; [0062] When a user program calls the memory allocation API, the wrapper module sends memory size information to the scheduler through a UNIX socket prepared in the container. The scheduler tracks all memory allocation calls in the container. So the scheduler can know in real time how much free memory is allowed for that container. When there is enough memory to allocate, the scheduler sends a message to the wrapper module. After the actual allocation is done by calling the CUDA API through the wrapper module, the allocated address is sent to the scheduler along with the memory size; [0038-0039] When a user program calls a memory allocation API, the CUDA wrapper API module sends memory size information to the GPU memory scheduler through a UNIX socket prepared in the container, and the GPU memory scheduler tracks all memory allocation calls in the container. If a running container does not have enough GPU memory, the GPU memory scheduler will be suspended until the requested memory size becomes available; [0015] share volumes with the container; [0005] providing fully virtualized GPUs in containers; [0011] each user program is completely isolated when using ConVGPU; [0048] The GPU memory scheduler checks the GPU memory limit of each container. When a container uses GPU memory to the limit, the scheduler rejects allocation calls).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined He and He2 with the teachings of Kim to prevent failure (see Kim [0008] The technical problem to be achieved by the present invention is to provide the most practical method and system for implementing a GPU in a container-based virtualization environment using NVIDIA Docker. Additionally, we propose a method and system to prevent program failure or deadlock by considering GPU sharing between multiple containers.).

He, He2, and Kim fail to teach the processing apparatus is configured to perform operations including: marking the each of the at least one task with a respective matching status tag, wherein the respective matching status tag indicates whether the respective tag corresponding to the each of the at least one task matches a capacity of a target container of the one or more target containers successfully, and the respective matching status tag includes a respective match failure tag indicating a failure of the matching to a target container of the one or more target containers; identifying the respective match failure tag each time a current target container of the one or more target containers identifies a corresponding target task from the at least one task; and in response to determining that a volume of VGPU resource allocated to the current target container is smaller than or equal to a corresponding requested volume of VGPU resource corresponding to the target task having the match failure tag, omitting the target task having the match failure tag by the current target container.

However, McQuighan teaches the processing apparatus is configured to perform operations including: marking the each of the at least one task with a respective matching status tag, wherein the respective matching status tag indicates whether the respective tag corresponding to the each of the at least one task matches a capacity of a target container of the one or more target containers successfully, and the respective matching status tag includes a respective match failure tag indicating a failure of the matching to a target container of the one or more target containers; identifying the respective match failure tag each time a current target container of the one or more target containers identifies a corresponding target task from the at least one task; and in response to determining that a volume of VGPU resource allocated to the current target container is smaller than or equal to a corresponding requested volume of VGPU resource corresponding to the target task having the match failure tag, omitting the target task having the match failure tag by the current target container (Fig. 6, elements 606 and 608; [0078] Returning to FIG. 6, as a result of the scheduling of the API request, the process 600 further includes determining if one or more GPUs operably connected to the scheduled virtual machine contain enough available memory to load and/or execute the API request (606). If the GPUs being connected to or accessible by the scheduled virtual machine maintain the required available memory, the process 600 further includes assigning the API request to a slot or container of the virtual machine (608); [0037] determines that the current slot on a specific VM to which the API request was allocated does not actually have enough GPU memory available to run the request without a failure, partial failure, error, etc., the API server may transfer, transmit, and/or assign the request to a different VM; [0020] Example embodiments presented herein may also refer to CPUs and GPUs generally, where such units may be a virtual CPU and/or a virtual GPU; [0080] rejecting the API request or failing the API request (614). In response to a rejected or failed API request, the process 600 includes reporting the failed or rejected API request to the user (616); [0038] As soon as a VM acquires too much memory, new requests, such as API request #2 (211b) will be rejected and a failure response 213 may be returned to the user; [0049] Once a VM is selected, the processing server 306 further determines a slot on the chosen VM to which to assign the request. In common parlance this may be referred to as receiving work (e.g., job, request, message, etc.) into a slot, pod, or other container.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined He, He2, and Kim with the teachings of McQuighan to prevent a request from being executed where there are not enough resources (see McQuighan [0037] determines that the current slot on a specific VM to which the API request was allocated does not actually have enough GPU memory available to run the request without a failure, partial failure, error, etc., the API server may transfer, transmit, and/or assign the request to a different VM).

As per claim 2, He, He2, Kim, and McQuighan teach the method of claim 1. He teaches wherein the processing apparatus includes at least one cloud server cluster (Fig. 1; [0024] The server 105 may provide various services through various built-in applications. Take a GPU acceleration application that may provide GPU acceleration services for containers running on a containerized cloud platform).

As per claim 3, He, He2, Kim, and McQuighan teach the method of claim 1. He teaches wherein the processing apparatus is further configured to perform operations including: receiving a processing request from a terminal device, the processing request including at least one task; for each of the at least one task of the processing request, determining a requested volume of a VGPU resource corresponding to the task; and marking the each of the at least one task of the processing request according to at least the requested volume of the VGPU resource ([0022] A user may interact with the server 105 through the network 104 using the terminal devices 101, 102, 103, to receive or send messages and the like; [0024] receiving a GPU use request initiated by a target container from the terminal devices 101, 102, and 103 through the network 104; then, determining a target virtual GPU based on the GPU use request; [0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task, to indicate that the container needs to occupy a certain GPU to implement GPU acceleration; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type. The demand quantity may refer to the number of GPU when candidate GPUs all have the same video memory, or may also refer to a video memory demand when the candidate GPUs have different video memories; [0032] Specifically, the GPU use request may include a variety of information, such as user identity information, container affiliation information, container number, business information corresponding to container, business information run by container, business type, and GPU demand applied for.).
Additionally, He2 teaches adding the at least one task of the processing request to the message queue ([0138] the application request needs to be queued first, and then the container control device is notified to load a new service container instance. When the service container instance is available, the queued request is forwarded to the container for processing.).

As per claim 5, He, He2, Kim, and McQuighan teach the method of claim 1. He teaches wherein to identify the respective target task from the at least one task, the processing apparatus is further configured to perform operations ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task, to indicate that the container needs to occupy a certain GPU to implement GPU acceleration; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type. The demand quantity may refer to the number of GPU when candidate GPUs all have the same video memory, or may also refer to a video memory demand when the candidate GPUs have different video memories. The demand type may include classification methods such as video memory type, video memory manufacturer, and batch, in order to select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements). Additionally, He2 teaches a current task in the message queue ([0138] the application request needs to be queued first, and then the container control device is notified to load a new service container instance. When the service container instance is available, the queued request is forwarded to the container for processing). Additionally, Kim teaches wherein to identify the respective target task from the at least one task, the processing apparatus is further configured to perform operations including: determining whether a requested volume of VGPU resource corresponding to a current task matches a respective capacity of the each of the one or more target containers; and in response to determining that the requested volume of the VGPU resource corresponding to the current task matches the respective capacity of the each of the one or more target containers, designating the current task as the target task ([0061] Containers should be scheduled based on their maximum available GPU memory; [0062] When a user program calls the memory allocation API, the wrapper module sends memory size information to the scheduler through a UNIX socket prepared in the container. The scheduler tracks all memory allocation calls in the container. So the scheduler can know in real time how much free memory is allowed for that container. When there is enough memory to allocate, the scheduler sends a message to the wrapper module. After the actual allocation is done by calling the CUDA API through the wrapper module, the allocated address is sent to the scheduler along with the memory size; [0038-0039] When a user program calls a memory allocation API, the CUDA wrapper API module sends memory size information to the GPU memory scheduler through a UNIX socket prepared in the container, and the GPU memory scheduler tracks all memory allocation calls in the container. If a running container does not have enough GPU memory, the GPU memory scheduler will be suspended until the requested memory size becomes available; [0015] share volumes with the container; [0005] providing fully virtualized GPUs in containers; [0011] each user program is completely isolated when using ConVGPU; [0048] The GPU memory scheduler checks the GPU memory limit of each container. When a container uses GPU memory to the limit, the scheduler rejects allocation calls).

As per claim 6, He, He2, Kim, and McQuighan teach the method of claim 5. He2 teaches wherein the processing apparatus is further configured to perform operations including: in response to determining that the current task does not match the respective capacity of the each of the one or more target containers, putting the current task back into the message queue ([0138] If all containers of the application have reached the capacity limit and no new container instances can be added, the requests of the application will be queued.); and determining a subsequent task in the message queue matches the respective capacity of the each of the one or more target containers ([0138] When the service container instance is available, the queued request is forwarded to the container for processing; [0139] if the cluster resources are insufficient when 100 requests arrive, for example, only 5 containers can be loaded for the application at most, then 50 of the remaining 97 requests will be evenly distributed to 5 newly loaded containers (10 requests per container), 3 requests will be assigned to each of the existing containers, and the remaining 38 requests will be queued.). Additionally, Kim teaches determining that the requested volume of the VGPU resource corresponding to the current task does not match the respective capacity of the each of the one or more target containers; and determining whether a requested volume of VGPU resource corresponding to a subsequent task in the message queue matches the respective capacity of the each of the one or more target containers ([0061] Containers should be scheduled based on their maximum available GPU memory; [0062] When a user program calls the memory allocation API, the wrapper module sends memory size information to the scheduler through a UNIX socket prepared in the container. The scheduler tracks all memory allocation calls in the container. So the scheduler can know in real time how much free memory is allowed for that container. When there is enough memory to allocate, the scheduler sends a message to the wrapper module. After the actual allocation is done by calling the CUDA API through the wrapper module, the allocated address is sent to the scheduler along with the memory size; [0038-0039] When a user program calls a memory allocation API, the CUDA wrapper API module sends memory size information to the GPU memory scheduler through a UNIX socket prepared in the container, and the GPU memory scheduler tracks all memory allocation calls in the container. If a running container does not have enough GPU memory, the GPU memory scheduler will be suspended until the requested memory size becomes available; [0015] share volumes with the container; [0005] providing fully virtualized GPUs in containers; [0011] each user program is completely isolated when using ConVGPU; [0048] The GPU memory scheduler checks the GPU memory limit of each container. When a container uses GPU memory to the limit, the scheduler rejects allocation calls).
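The capacity test mapped for claims 5 and 6 (and claim 8 below), together with claim 1's match-failure tag, can be sketched as follows; the data structures and the FIFO requeue policy are illustrative assumptions, not the claimed method itself:

```python
# Sketch: a task matches a container when its requested VGPU volume is
# smaller than or equal to the container's capacity; a non-matching task is
# tagged with a match-failure tag and put back into the queue, and a task
# already tagged is omitted by any container whose capacity is smaller than
# or equal to the task's requested volume (it could not run it either).
from collections import deque
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    requested_gb: float
    match_failed: bool = False   # the match status / match failure tag

def next_matching_task(queue: deque, capacity_gb: float) -> Task | None:
    for _ in range(len(queue)):          # examine each queued task once
        task = queue.popleft()
        if task.match_failed and capacity_gb <= task.requested_gb:
            queue.append(task)           # omit the tagged task
            continue
        if task.requested_gb <= capacity_gb:
            return task                  # match: designate as target task
        task.match_failed = True         # tag the failed match...
        queue.append(task)               # ...and requeue the task
    return None

q = deque([Task("t1", 6.0), Task("t2", 2.0)])
print(next_matching_task(q, capacity_gb=4.0))  # t1 is tagged; t2 matches
```

For claim 7's variant, the queue would be ordered by task priority (for example with Python's heapq) rather than first-in, first-out.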
As per claim 8, He, He2, Kim, and McQuighan teach the method of claim 1. He teaches wherein to identify the respective target task from the at least one task, the processing apparatus is further configured to perform operations ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task, to indicate that the container needs to occupy a certain GPU to implement GPU acceleration; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type. The demand quantity may refer to the number of GPU when candidate GPUs all have the same video memory, or may also refer to a video memory demand when the candidate GPUs have different video memories. The demand type may include classification methods such as video memory type, video memory manufacturer, and batch, in order to select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements). Additionally, Kim teaches wherein to identify the respective target task from the at least one task, the processing apparatus is further configured to perform operations including: determining whether a requested volume of VGPU resource corresponding to a current task is smaller than or equal to a respective capacity of the each of the one or more target containers; in response to determining that the requested volume of the VGPU resource corresponding to the current task is smaller than or equal to the respective capacity of the each of the one or more target containers, determining that the requested volume of the VGPU resource corresponding to the current task matches the respective capacity of the each of the one or more target containers; and in response to determining that the requested volume of the VGPU resource corresponding to the current task is larger than the respective capacity of the each of the one or more target containers, determining that the requested volume of the VGPU resource corresponding to the current task does not match the respective capacity of the each of the one or more target containers ([0061] Containers should be scheduled based on their maximum available GPU memory; [0062] When a user program calls the memory allocation API, the wrapper module sends memory size information to the scheduler through a UNIX socket prepared in the container. The scheduler tracks all memory allocation calls in the container. So the scheduler can know in real time how much free memory is allowed for that container. When there is enough memory to allocate, the scheduler sends a message to the wrapper module. After the actual allocation is done by calling the CUDA API through the wrapper module, the allocated address is sent to the scheduler along with the memory size; [0038-0039] When a user program calls a memory allocation API, the CUDA wrapper API module sends memory size information to the GPU memory scheduler through a UNIX socket prepared in the container, and the GPU memory scheduler tracks all memory allocation calls in the container. If a running container does not have enough GPU memory, the GPU memory scheduler will be suspended until the requested memory size becomes available; [0015] share volumes with the container; [0005] providing fully virtualized GPUs in containers; [0011] each user program is completely isolated when using ConVGPU; [0048] The GPU memory scheduler checks the GPU memory limit of each container. When a container uses GPU memory to the limit, the scheduler rejects allocation calls).

As per claim 9, He, He2, Kim, and McQuighan teach the method of claim 1. He2 teaches wherein a capacity of a first container of the plurality of containers is different from a capacity of a second container of the plurality of containers ([0080] if these containers are located on different hosts, a container on a host with the lowest load is preferentially selected as a candidate container; [0089] the basic container with the highest matching degree and/or the longest idle time is selected as a candidate container; [0010] the base container is a container loaded with a base container image; the library container is a container loaded with a base container image and a public library; the service container is a candidate container selected from containers loaded with a base container image, a public library, a private library, and a service component; the dormant container is a base container snapshot that saves the running status of the base container).

As per claim 20, it is the system claim corresponding to claim 1, so it is rejected for the same reasons as claim 1.

As per claim 21, He, He2, Kim, and McQuighan teach the method of claim 5. He teaches wherein the requested volume of the VGPU resource corresponding to the current task is determined based on an acquired tag marking the requested volume of the VGPU resource corresponding to the current task ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type…select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements.).

As per claim 26, He, He2, Kim, and McQuighan teach the method of claim 1.
He teaches wherein the corresponding requested volume of VGPU resource corresponding to the each of the at least one task is determined based on a volume of data relating to the each of the at least one task ([0031] A certain container under the containerized cloud platform initiates the GPU use request to the executing body based on a GPU acceleration demand required by a user issued task; [0047] In step 302 and step 303, the executing body determines two requirements of the target container for the required GPU based on the GPU use request, respectively, which are the demand quantity and the demand type…select the most suitable target virtual GPU for GPU acceleration for tasks running in the target container through the above two requirements), the data relating to the at least one task including one or more images, a count of images to be processed, a size of each of the one or more images, and a processing algorithm corresponding to the each of the one or more images ([0032] Specifically, the GPU use request may include a variety of information, such as user identity information, container affiliation information, container number, business information corresponding to container, business information run by container, business type, and GPU demand applied for. Here, the GPU demand includes video memory capacity, video memory level, video memory type).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over He, He2, Kim, and McQuighan, as applied to claim 6 above, in view of Vembu et al. (US 20180293185 A1, hereinafter Vembu). Vembu was cited in a previous office action.

As per claim 7, He, He2, Kim, and McQuighan teach the method of claim 6. McQuighan teaches wherein the each of the at least one task has a priority level, the at least one task being arranged in an order in the message queue ([0041] the new API requests 211a-b are put in a queue. This information is put into the scheduler 212 and queued requests may be prioritized into a score). He, He2, Kim, and McQuighan fail to teach the at least one task being arranged in an order in the message queue according to the priority level of the each of the at least one task. However, Vembu teaches the at least one task being arranged in an order in the message queue according to the priority level of the each of the at least one task ([0194] At 2101, priorities associated with tasks/threads are identified. At 2102, the tasks are submitted into priority-based task queues.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined He, He2, Kim, and McQuighan with the teachings of Vembu to allow the highest priority operations to be performed first (see Vembu [0194] Any arbitration which is performed at the front end of a graphics pipeline stage or individual functional unit may then consult the priority of each of the operations waiting to be processed and may schedule the operations in accordance with the priorities).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over He, He2, Kim, and McQuighan, as applied to claim 1 above, in view of Woo (US 20210279157 A1). Woo was cited in a previous office action.

As per claim 10, He, He2, Kim, and McQuighan teach the method of claim 1.
He2 teaches wherein the processing apparatus is further configured to perform operations including: putting a task processed by the first container back into the message queue; the at least one task in the message queue ([0104] In one embodiment, the service container can report its own load changes. The reporting method can be a single report (once for each report processed), a periodic report (once every n seconds) or a batch report (once for each n requests processed). After the service container processes the last request, if the service container does not receive a new request within m seconds (m is a positive number), the local container resource manager is notified to uninstall the application software package in the container, and the service container uninstall event is reported to the service routing (or metadata database); [0138] the application request needs to be queued first). Additionally, Kim teaches in response to determining that the at least one task does not match a respective capacity of the each of the one or more target containers, resetting the each of the one or more target containers ([0038-0039] When a user program calls a memory allocation API, the CUDA wrapper API module sends memory size information to the GPU memory scheduler through a UNIX socket prepared in the container, and the GPU memory scheduler tracks all memory allocation calls in the container. If a running container does not have enough GPU memory, the GPU memory scheduler will be suspended until the requested memory size becomes available, and any memory allocation requested by that container will be suspended until the scheduler allocates more GPU memory to the container; [0061] Containers should be scheduled based on their maximum available GPU memory; [0048] The GPU memory scheduler checks the GPU memory limit of each container. When a container uses GPU memory to the limit, the scheduler rejects allocation calls). He, He2, Kim, and McQuighan fail to teach setting a renewed first container according to a mirrored first container if a first container collapses. However, Woo teaches setting a renewed first container according to a mirrored first container if a first container collapses ([0113] provides a replication control to restart/recover a container abnormally terminated). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined He, He2, Kim, and McQuighan with the teachings of Woo to recover containers that were abnormally terminated (see Woo [0113] provides a replication control to restart/recover a container abnormally terminated).

Claims 11, 15, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Dilley et al. (US 10791168 B1, hereinafter Dilley) in view of McGrath et al. (US 20200296155 A1, hereinafter McGrath). Dilley and McGrath were cited in a prior office action.

As per claim 11, Dilley teaches a method implemented on a processing apparatus, wherein the processing apparatus is configured to perform operations comprising: identifying, from a plurality of edge nodes that are communicated with a terminal device, a target edge node, the target edge node including one or more target containers (Col. 22 lines 43-45 The workload placement manager 506 determines one or more edges at which to place workload; Col. 10 lines 45-47 Each edge includes at least one proxy server instance 414 to manage network connections between external endpoints 110; Col.
10 lines 52-53 one or more containers in the edge; Col. 8 lines 58-61 each edge node 122 includes processor hardware that runs an operating system, container and cluster scheduling and management software, and tenant workloads; Col. 7 lines 54-60 code of an example workload is packaged in one or more containers (sometimes referred to as a ‘containerized application’) in which the code within each container configures a different node 122 to provide a different micro-service. An example workload includes one or more code packages that implement workload functions to be executed at an edge data center 106; Col. 34 lines 30-32 the address resolution service identifies a target edge location with the lowest distance to the endpoint); transmitting at least one task to the target edge node for processing (Col. 22 lines 43-45 The workload placement manager 506 determines one or more edges at which to place workload; Col. 22 lines 48-56 The workload placement manager 506 sends placement requests, which indicate the one or more edges at which to place instances of the workload together with the configuration specification, to the workload message server 508. To load the workload to selected edges, the workload message server 508 sends workload placement commands over the workload placement network 108 to edge message clients 533 at the one or more edges 106 at which a workload is to be placed.); and receiving a processing result of the at least one task from the target edge node, wherein the processing result of the at least one task is determined by (Col. 33 lines 16-17 where the workload instance processes the message and returns a response message; Col. 22 lines 43-45 The workload placement manager 506 determines one or more edges at which to place workload;): causing the one or more target containers to obtain and process the at least one task (Col. 10 lines 52-53 The L4 proxy functionality runs in one or more containers in the edge, much like other workloads; Col. 21 lines 35-36 direct messages to code containers and their workloads within the edge; Col. 7 lines 54-60 For instance, code of an example workload is packaged in one or more containers (sometimes referred to as a ‘containerized application’) in which the code within each container configures a different node 122 to provide a different micro-service. An example workload includes one or more code packages that implement workload functions to be executed at an edge data center 106.); and selecting a processing approach for the at least one task based on a type of each of the at least one task, wherein the identifying, from the plurality of edge nodes that are communicated with the terminal device, the target edge node includes (Col. 5 lines 23-37 Tenants specify application performance requirements via the administrative UI for their individual tenant applications, which can include geographic location, network communication latency, time of use, application sizing and resource usage preferences, for example. Different tenant applications typically have different performance requirements. The performance requirements of some tenant applications change over time. From time to time, for example, a tenant may adjust the performance requirements of a tenant application. 
Based upon tenant-specified performance requirements, the orchestration manager 104 orchestrates placement of tenant applications at edge data centers 106 steers external endpoint requests to edges where requested tenant applications are placed, and schedules execution of tenant applications at the edges; Col. 10 lines 45-47 Each edge includes at least one proxy server instance 414 to manage network connections between external endpoints 110): obtaining node information of the plurality of edge nodes; determining a communication distance between each of at least a portion of the plurality of edge nodes and the terminal device based on the node information, wherein the communication distance refers to a distance between the terminal device and an edge node or a network delay time between the terminal device and an edge node; identifying a first edge node from the at least a portion of the plurality of edge nodes based on the determined communication distances between the at least a portion of the plurality of edge nodes and the terminal device (C
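The selection step of claim 11, as mapped above, amounts to choosing the edge node with the shortest communication distance to the terminal device. A hedged sketch (hypothetical node data; the claim lets the communication distance be either a physical distance or a network delay time):

```python
# Sketch of claim 11's target-edge-node selection: pick the edge node with
# the shortest communication distance to the terminal device, here measured
# as network delay. Node names, delays, and containers are illustrative.
edge_nodes = {
    "edge-1": {"delay_ms": 12.0, "containers": ["c1", "c2"]},
    "edge-2": {"delay_ms": 4.5,  "containers": ["c3"]},
    "edge-3": {"delay_ms": 30.0, "containers": ["c4"]},
}

def pick_target_edge(nodes: dict) -> str:
    """Return the edge node with the lowest communication distance."""
    return min(nodes, key=lambda n: nodes[n]["delay_ms"])

target = pick_target_edge(edge_nodes)
print(target, edge_nodes[target]["containers"])  # -> edge-2 ['c3']
```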

Prosecution Timeline

Sep 23, 2021
Application Filed
Mar 21, 2024
Non-Final Rejection — §103, §112
Jun 26, 2024
Response Filed
Oct 22, 2024
Final Rejection — §103, §112
Jan 15, 2025
Request for Continued Examination
Jan 21, 2025
Response after Non-Final Action
May 02, 2025
Non-Final Rejection — §103, §112
Aug 08, 2025
Response Filed
Nov 14, 2025
Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12554523: REDUCING DEPLOYMENT TIME FOR CONTAINER CLONES IN COMPUTING ENVIRONMENTS
Granted Feb 17, 2026 (2y 5m to grant)

Patent 12547458: PLATFORM FRAMEWORK ORCHESTRATION AND DISCOVERY
Granted Feb 10, 2026 (2y 5m to grant)

Patent 12468573: ADAPTIVE RESOURCE PROVISIONING FOR A MULTI-TENANT DISTRIBUTED EVENT DATA STORE
Granted Nov 11, 2025 (2y 5m to grant)

Patent 12461785: GRAPHIC-BLOCKCHAIN-ORIENTATED SHARDING STORAGE APPARATUS AND METHOD THEREOF
Granted Nov 04, 2025 (2y 5m to grant)

Patent 12443425: ISOLATED ACCELERATOR MANAGEMENT INTERMEDIARIES FOR VIRTUALIZATION HOSTS
Granted Oct 14, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 59%
With Interview: 99% (+79.8%)
Median Time to Grant: 3y 4m
PTA Risk: High

Based on 108 resolved cases by this examiner. Grant probability derived from career allow rate.
