Last updated: May 29, 2026
Application No. 18/038,694
METHOD AND SYSTEM FOR MULTIPLE SERVICES TO SHARE SAME GPU, AND DEVICE AND MEDIUM

Final Rejection §103
Filed
May 24, 2023
Priority
Mar 12, 2021 — CN 202110271407.8 +1 more
Examiner
SUN, ANDREW NMN
Art Unit
2195
Tech Center
2100 — Computer Architecture & Software
Assignee
Shandong Yingxin Computer Technologies Co. Ltd.
OA Round
2 (Final)
This examiner grants 50% of cases after interview (+100.0% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 8 resolved cases, 2023–2026
Examiner Intelligence

SUN, ANDREW NMN
Grants 50% of resolved cases
Career Allowance Rate
4 granted / 8 resolved
-5.0% vs TC avg
Typical timeline
3y 5m
Avg Prosecution
18 currently pending
Statute-Specific Performance

§103
100.0%
+60.0% vs TC avg
Office Action

§103
DETAILED ACTION
Claims 1, 3-7, 9-13, 15-17, and 19-21 are pending.
Claims 2, 8, 14, and 18 are canceled.
Claims 1, 3-7, 9-13, 15-17, and 19-21 are rejected.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Most of the applicant’s arguments with respect to the 35 U.S.C. 103 rejections (Remarks pp. 8-13) are moot in view of the Examiner’s new grounds of rejection, which rely on new references added to address the associated limitations. However, the Examiner addresses the following arguments.

1. The applicant argues that “Yeh fails to disclose an independent ‘GPU service’ carrier and the feature of ‘associating the GPU services with the GPU Pods’ and ‘associating the Kubernetes Pods with the GPU Pods’ in amended claim 1 of the present application, and does not explicitly state that the user is the initiator of the request, that is, Yeh fails to disclose the interaction between the user and the GPU.” The applicant also argues that Yeh does not teach the feature of “dispatching the GPU Pods and the Kubernetes Pods for calculation”.
The Examiner respectfully disagrees with this statement. Claim 1 does not recite a “GPU service” carrier, and Yeh already explicitly states that a user can initiate a request to create GPU services, as well as the GPU services being associated with GPU Pods, and the Kubernetes Pods being associated with the GPU Pods (
Yeh teaches a request to create GPU services (services based on sharePod) that is sent by a user, stating, “Third and most importantly, KubeShare makes vGPU become a first class resource in Kubernetes where vGPU has unique identity (i.e., GPUID) and can be explicitly requested by users,” Yeh 4.6, and “vGPUs are resource objects managed by the KubeShare-DevMgr controller. The lifecycle of vGPU consists of four phases: creation, active, idle, and deletion. A vGPU is created when KubeShare-DevMgr receives a sharePod request containing an non-existent GPUID,” Yeh 4.4.
Here, a user can explicitly request to create a vGPU, and an associated sharePod will be created and associated with the vGPU.
Yeh teaches creating corresponding GPU services – in the form of sharePods – based on requests sent by users, stating “First, our GPU sharing mechanism and implementation only applies to the sharePod objects that request our GPU sharing service… For instance, in our extended Kubernetes cluster, users can either create a pod with non-sharable GPU through the native pod API or create a pod with shared GPU through our extended sharePod API. In contrast, other solutions based on scheduler extender force all the GPUs in a cluster to be controlled and scheduled by their extended mechanism,” Yeh 4.6.
Yeh teaches that each sharePod may have a specified number of GPU containers, mapped to GPU Pods, stating that “A pod represents a logical host that contains one or more containers which are always co-located, co-scheduled, and run in a shared context,” Yeh 2.1. Thus GPU services are associated with GPU Pods.

[Image: media_image1.png]

Note for the specification for the sharePod, “containers” may be specified.

Yeh also teaches associating Kubernetes Pods with GPU Pods, stating “The role of KubeShare is to create and manage sharePod, which is a custom resource kind we created in Kubernetes to represent the pod with ability to attach shared custom device on its containers. As shown in Script 1, the specification to create sharePod is called SharePodSpec, which contains the information of the original pod-Spec, the resource usage requirements of the GPU, the identifier of a GPU (GPUID), and the nodeName of the GPU,” Yeh 4.1.

[Image: media_image1.png]

Here, the sharePod (Kubernetes Pods) is created based on the specification as shown in “Script 1,” which provides the configuration for the GPU containers (GPU Pods), and the same specification associates a specific sharePod with specific container(s).).
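For orientation, the fields that the quoted passages attribute to SharePodSpec can be pictured as a small record. The following is a minimal sketch only, not Yeh's Script 1: the Python class, its field names, and the `validate` helper are hypothetical, and the value ranges follow the gpu_request, gpu_limit, and gpu_mem semantics quoted from Yeh 4.2 elsewhere in this action.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharePodSpec:
    """Hypothetical stand-in for the fields the quoted text lists for SharePodSpec."""

    pod_spec: dict                   # original pod spec, including its "containers" list
    gpu_request: float               # minimum guaranteed share of kernel execution time
    gpu_limit: float                 # maximum share of kernel execution time
    gpu_mem: float                   # maximum fraction of device memory
    gpu_id: Optional[str] = None     # GPUID of the vGPU (None is an assumption of this sketch)
    node_name: Optional[str] = None  # nodeName of the GPU

    def validate(self) -> None:
        """Check the constraints implied by the quoted field descriptions."""
        if not self.pod_spec.get("containers"):
            raise ValueError("pod_spec must list at least one container")
        if not 0.0 <= self.gpu_request <= self.gpu_limit <= 1.0:
            raise ValueError("require 0 <= gpu_request <= gpu_limit <= 1")
        if not 0.0 <= self.gpu_mem <= 1.0:
            raise ValueError("gpu_mem must be between 0 and 1")


spec = SharePodSpec(
    pod_spec={"containers": [{"name": "train"}]},
    gpu_request=0.3,
    gpu_limit=0.6,
    gpu_mem=0.5,
    gpu_id="vgpu-a",  # hypothetical identifier
)
spec.validate()
```

The single record carries both the pod's container list and the GPU identity, which is the sense in which one specification ties a sharePod to its containers.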

Yeh also teaches the feature of “dispatching the GPU Pods and the Kubernetes Pods for calculation” (

[Image: media_image2.png]

Fig. 6 shows the GPU utilization when GPU Pods (containers) and Kubernetes Pods (sharePods) are run based on computation jobs A-C. This is a calculation of the GPU utilization based on the GPU Pods and Kubernetes Pods.).

2. The applicant argues that reference Dong does not teach “a request of creating GPU services,” as “the ‘additional cloud resources’ are general-purpose computing resources (e.g., CPU, storage), which are different from the technical field and carrier of the exclusive ‘request of creating GPU services’ in the present application,” and that “the aforementioned distinctive technical features do not constitute ‘integration of separate elements’ as described in MPEP 2144.04.V.B. Those skilled in the art cannot routinely derive the technical solution of the present application based on the Yeh and Dong.”
The Examiner respectfully notes, however, that reference Yeh, which does teach a request of creating GPU services, is combined with Dong to teach generating new requests to create GPU services. Dong teaches generating new requests to allocate “cloud resources”, and after the combination of Yeh with Dong, the “cloud resources” are replaced with GPU services as specified by Yeh.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Yeh (KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud) in view of Chiu (US 20200242724 A1), Jiang (US 20180157536 A1), and Dong (US 10423456 B2).
Regarding Claim 1, Yeh teaches a method for sharing a same GPU by a plurality of services (
“KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud”), wherein the method comprises:
in response to receiving a request of creating GPU services (services based on sharePod) sent by a user, creating the corresponding GPU services according to the request, creating GPU Pods (GPU container) of a corresponding quantity according to the GPU services, and associating the GPU services with the GPU Pods (
Yeh teaches a request to create GPU services (services associated with a sharePod) that is sent by a user, stating, “Third and most importantly, KubeShare makes vGPU become a first class resource in Kubernetes where vGPU has unique identity (i.e., GPUID) and can be explicitly requested by users,” Yeh 4.6, and “vGPUs are resource objects managed by the KubeShare-DevMgr controller. The lifecycle of vGPU consists of four phases: creation, active, idle, and deletion. A vGPU is created when KubeShare-DevMgr receives a sharePod request containing an non-existent GPUID,” Yeh 4.4.
Here, a user can explicitly request to create a vGPU, and an associated sharePod will be created and associated with the vGPU.
The Examiner thanks the Applicant for clarification regarding the limitation “a request of creating GPU services”; however, no objection has been made regarding this limitation.
Yeh teaches creating corresponding GPU services – in the form of sharePods – based on requests sent by users, stating “First, our GPU sharing mechanism and implementation only applies to the sharePod objects that request our GPU sharing service… For instance, in our extended Kubernetes cluster, users can either create a pod with non-sharable GPU through the native pod API or create a pod with shared GPU through our extended sharePod API. In contrast, other solutions based on scheduler extender force all the GPUs in a cluster to be controlled and scheduled by their extended mechanism,” Yeh 4.6.
Yeh teaches that each sharePod may have a specified number of GPU containers, mapped to GPU Pods, stating that “A pod represents a logical host that contains one or more containers which are always co-located, co-scheduled, and run in a shared context,” Yeh 2.1.

[Image: media_image1.png]

Note for the specification for the sharePod, “containers” may be specified.
Yeh explains, “The role of KubeShare is to create and manage sharePod, which is a custom resource kind we created in Kubernetes to represent the pod with ability to attach shared custom device on its containers. As shown in Script 1, the specification to create sharePod is called SharePodSpec, which contains the information of the original pod-Spec, the resource usage requirements of the GPU, the identifier of a GPU (GPUID), and the nodeName of the GPU,” Yeh 4.1.
Therefore, GPU Pods (GPU containers) have corresponding quantity as specified in the sharePod specification for the sharePod that provides the GPU services.);
creating Kubernetes Pods according to a configuration of the GPU Pods (

[Image: media_image1.png]
), and associating the Kubernetes Pods with the GPU Pods (

[Image: media_image1.png]
) (
Yeh explains, “The role of KubeShare is to create and manage sharePod, which is a custom resource kind we created in Kubernetes to represent the pod with ability to attach shared custom device on its containers. As shown in Script 1, the specification to create sharePod is called SharePodSpec, which contains the information of the original pod-Spec, the resource usage requirements of the GPU, the identifier of a GPU (GPUID), and the nodeName of the GPU,” Yeh 4.1.
Here, the sharePod (Kubernetes Pods) is created based on the specification as shown in “Script 1,” which provides the configuration for the GPU containers (GPU Pods), and the same specification associates a specific sharePod with specific container(s).);
in response to receiving a calculating request (
Yeh discloses, “Here, we evaluate the GPU sharing overhead of KubeShare by comparing its end-to-end time delay on creating a user requested pod to the pod creation time of native Kubernetes,” Yeh 5.4, and
“To prove the correctness and the effectiveness of our device library implementation, we conducted an experiment by running three TensorFlow training jobs on a single GPU. Each job is running inside its own container. Job A arrived at time 0s with the request gpu_limit=0.6, and gpu_request=0.3. Job B arrived at time 200s with the request gpu_limit=0.6, and gpu_request=0.4. Finally, Job C arrived at time 400s with the request gpu_limit=0.5, and gpu_request=0.3,” Yeh 5.2.
The request for Job C is gpu_limit=0.5 and gpu_request=0.3, which is a specification of GPU time slice, as Yeh explains, “The resource requirements include the computing and memory usage demand on GPU. As detailed in Section 4.5, GPU memory is shared by space, and GPU computing capacity is shared by time slice. For instance, gpu_mem=0.5 means that a container can allocate at most 50% of the total device memory space, while gpu_request=0.5 means that a container should have at least 50% of the kernel execution time in a sliding window. Similar to how Kubernetes manages CPU resource, KubeShare supports elastic resource allocation on GPU as well. That means KubeShare will guarantee the minimum resource allocation of a container specified by gpu_request, and allow a container to utilize the residual capacity on GPU as long as its usage doesn’t exceed the value of gpu_limit,” Yeh 4.2.
Job A does not specify a requirement for memory; however, according to the teaching, Job A could have a memory requirement, e.g., gpu_limit=0.6, gpu_request=0.3, and gpu_mem=0.3.
The threshold could be mapped to “gpu_mem=0.5 means that a container can allocate at most 50% of the total device memory space.” Here, gpu_mem=0.3 would be compared with gpu_mem=0.5 to determine if the constraint is satisfied. In particular, Yeh states, “KubeShare treats GPUs as first class resources, thus it allows users to specify scheduling requirements and constraints on a GPU in the specification as shown in Script 1. The role of KubeShare is to ensure GPU resources are allocated and assigned to pods without violating these user requirements and constraints,” Yeh 4.2.
Similarly, the threshold could be mapped to “gpu_request=0.5”: the requested minimum share of kernel execution time needs to be smaller than 0.5. The threshold could also be mapped to the gpu_limit or to 1, because the gpu_request has to be less than the gpu_limit, which is less than or equal to 1.);


and in response to the specification of the GPU graphic memory or GPU time slice being less than a sum of the current residual resource amounts of the GPU Pods and the Kubernetes Pods (The residual sum is sufficient to satisfy the requirement of a job request, e.g., job C.), according to a current resource utilization rate (

[Image: media_image2.png]

“Therefore, the overall GPU utilization can be consistently maintained at 100% after time 200s,” Yeh 5.2.), dispatching the GPU Pods and the Kubernetes Pods for calculation (
Fig. 6 shows the GPU utilization when GPU Pods (containers) and Kubernetes Pods (sharePods) are run based on computation jobs A-C.).
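The residual-share arithmetic behind the Job A to Job C example can be worked through as follows. This is a sketch under stated assumptions rather than KubeShare's scheduler: the job parameters are those quoted from Yeh 5.2, while the function name and the admission rule (a job is admitted when its gpu_request does not exceed the remaining guaranteed share) are assumed for illustration. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Job parameters as quoted from Yeh 5.2: (arrival time in s, gpu_request, gpu_limit).
JOBS = {
    "A": (0, Fraction(3, 10), Fraction(6, 10)),
    "B": (200, Fraction(4, 10), Fraction(6, 10)),
    "C": (400, Fraction(3, 10), Fraction(5, 10)),
}


def admit_in_arrival_order(jobs, capacity=Fraction(1)):
    """Admit jobs in arrival order while their gpu_request fits the residual share.

    Assumption: admission uses `request <= residual`; the rule is illustrative only.
    Returns a list of (job name, residual share before the decision, admitted).
    """
    residual = capacity
    log = []
    for name, (_arrival, request, _limit) in sorted(jobs.items(), key=lambda kv: kv[1][0]):
        admitted = request <= residual
        log.append((name, residual, admitted))
        if admitted:
            residual -= request
    return log


for name, residual, admitted in admit_in_arrival_order(JOBS):
    print(name, residual, admitted)
```

Under these assumptions the three gpu_request values sum to exactly 1, so the guaranteed shares are fully committed once Job C is admitted at 400s.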
Yeh does not teach in response to receiving a calculating request sent by the user to the GPU services, according to the calculating request, determining a specification of a GPU graphic memory or GPU time slice required to be applied for, and comparing with a threshold specified by the GPU services, or in response to the specification of the GPU graphic memory or GPU time slice being less than the threshold specified by the GPU services, reading current residual resource amounts of the GPU Pods and the Kubernetes Pods, and comparing with the specification of the GPU graphic memory or GPU time slice.
However, Chiu teaches actions being performed in response to receiving a calculating request sent by the user to the GPU services, according to the calculating request (
Chiu discloses, “A method for accelerating graphics processing units (GPUs) receives a request for usage of GPU resource sent by a user, calculates a quantity of GPUs which are necessary…,” Abstract, and “The storage device 300 can further store a formula for calculating a usage of the GPU under user resource request,” ¶ 0017.
Here, a request for usage of GPU resources, sent by a user, results in a calculation of the number of GPUs required; thus said request is a calculating request.),
and actions being performed in response to the specification of the GPU graphic memory or GPU time slice being less than the threshold specified by the GPU services (
Chiu discloses, “When the usage of GPUs calculated as being required is greater than the first threshold but less than a second threshold, the arranging module 430 distributes the GPUs into groups,” ¶ 0036.
Here, usage of GPUs required is a specification of GPU graphic memory, and it is checked whether this usage requirement is less than a second threshold.).
Yeh and Chiu are both considered to be analogous to the claimed invention because they are in the same field of resource usage management. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh to incorporate the teachings of Chiu and provide in response to receiving a calculating request sent by the user to the GPU services, according to the calculating request, determining a specification of a GPU graphic memory or GPU time slice required to be applied for, and comparing with a threshold specified by the GPU services, and also provide actions being performed in response to the specification of the GPU graphic memory or GPU time slice being less than the threshold specified by the GPU services. Doing so would help ensure that the user can more efficiently calculate the amount of resources required for operations (Chiu discloses, “A method for accelerating graphics processing units (GPUs) receives a request for usage of GPU resource sent by a user, calculates a quantity of GPUs which are necessary…,” Abstract.).
Yeh in view of Chiu does not teach in response to the specification of the GPU graphic memory or GPU time slice being less than the threshold specified by the GPU services, reading current residual resource amounts of the GPU Pods and the Kubernetes Pods, and comparing with the specification of the GPU graphic memory or GPU time slice, or returning a calculation result to the user, wherein the method further comprises: in response to the specification of the GPU graphic memory or GPU time slice being not less than the threshold specified by the GPU services, according to the specification of the GPU graphic memory or GPU time slice, generating a new request of creating the GPU services.
However, Jiang teaches in response to the specification of the GPU graphic memory or GPU time slice being less than the threshold specified by the GPU services, reading current residual resource amounts of the GPU Pods and the Kubernetes Pods, and comparing with the specification of the GPU graphic memory or GPU time slice (
Jiang discloses, “In this embodiment, after receiving the first container creation request that carries the required resource capacity and the required image identifier, the management node selects, from the cluster of the working nodes, the at least two first working nodes whose unused resource capacity is greater than the required resource capacity. That is, the unused resource capacity of each working node in the cluster of the working nodes is compared with the required resource capacity. When the unused resource capacity of the working node is greater than the required resource capacity, it indicates that the unused resource capacity of the working node can meet the required resource capacity. Therefore, the working node is used as the first working node,” ¶ 0091.
Here, unused resource capacity (residual resource amounts) of nodes are measured (read), and then compared with the required resource capacity (specification of the GPU graphic memory).
After the combination of Yeh in view of Chiu, with Jiang, the unused resource capacity of Yeh’s pods is measured and then compared with the resources required.).
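The comparison described in the quoted Jiang passage amounts to a filter over nodes. The sketch below is illustrative only; the function name, the node names, and the capacity units are hypothetical and do not come from Jiang.

```python
def select_working_nodes(unused_capacity, required):
    """Return the nodes whose unused capacity is greater than the required capacity."""
    return [name for name, unused in unused_capacity.items() if unused > required]


# Hypothetical unused capacities, in arbitrary integer units.
unused = {"node-1": 8, "node-2": 2, "node-3": 6}
candidates = select_working_nodes(unused, required=4)
print(candidates)
```

A node whose unused capacity merely equals the requirement is excluded, matching the "greater than" wording of the quoted paragraph.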
Yeh in view of Chiu, and Jiang are both considered to be analogous to the claimed invention because they are in the same field of resource usage management. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu to incorporate the teachings of Jiang and provide in response to the specification of the GPU graphic memory or GPU time slice being less than the threshold specified by the GPU services, reading current residual resource amounts of the GPU Pods and the Kubernetes Pods, and comparing with the specification of the GPU graphic memory or GPU time slice. Doing so would help improve allocation of unused resources to pods/containers/devices that can use them (Jiang discloses, “When the unused resource capacity of the working node is greater than the required resource capacity, it indicates that the unused resource capacity of the working node can meet the required resource capacity. Therefore, the working node is used as the first working node,” ¶ 0091.).
Yeh in view of Chiu and Jiang does not teach returning a calculation result to the user, wherein the method further comprises: in response to the specification of the GPU graphic memory or GPU time slice being not less than the threshold specified by the GPU services, according to the specification of the GPU graphic memory or GPU time slice, generating a new request of creating the GPU services.
However, Dong teaches returning a calculation result to the user, wherein the method further comprises: in response to the specification of the GPU graphic memory or GPU time slice being not less than the threshold specified by the GPU services, according to the specification of the GPU graphic memory or GPU time slice, generating a new request of creating the GPU services (
Dong discloses, “Usage charge obtaining engine 121 may obtain charges for usage (also referred herein as “usage charge,” “costs,” and the like) of computing resources (e.g., central processing unit (CPU) resources, storage resources, network resources, etc.) by services associated with a user from a beginning of a first time period to a current time,” Col 3, Lines 40-45, “and in response to determining that the current usage rate exceeds the second resource utilization threshold value, generate a request to allocate additional cloud resources to the cloud user,” Col 12, Lines 59-63.
After the combination of Yeh in view of Chiu and Jiang, with Dong, Dong’s generated request to allocate additional cloud resources now allocates new GPU resources as specified by Yeh in view of Chiu and Jiang.).
Yeh in view of Chiu and Jiang, and Dong are both considered to be analogous to the claimed invention because they are in the same field of resource usage management. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu and Jiang to incorporate the teachings of Dong and provide returning a calculation result to the user, wherein the method further comprises: in response to the specification of the GPU graphic memory or GPU time slice being not less than the threshold specified by the GPU services, according to the specification of the GPU graphic memory or GPU time slice, generating a new request of creating the GPU services. Doing so would help alleviate the usage of the GPU if it is too high. (Dong discloses, “…request generating engine 124 may generate a request to allocate additional computing resources to the user (e.g., scale-out action),” Col 6, Lines 12-14.).
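Taken together, the limitations of Claim 1 discussed above describe a two-stage comparison. The following is a minimal sketch of that decision flow as recited in the claim language, not an implementation from the application or from any cited reference; the enum, the function name, and the integer units are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    DISPATCH = "dispatch the GPU Pods and Kubernetes Pods for calculation"
    INSUFFICIENT = "residual resources do not cover the specification"
    NEW_SERVICE = "generate a new request of creating the GPU services"


def decide(spec, service_threshold, gpu_pod_residual, k8s_pod_residual):
    """Apply the two comparisons recited in Claim 1 to a requested specification.

    spec: GPU graphic memory or GPU time slice to be applied for.
    service_threshold: threshold specified by the GPU services.
    """
    # Specification not less than the threshold: a new GPU service is requested.
    if spec >= service_threshold:
        return Decision.NEW_SERVICE
    # Otherwise compare against the sum of the current residual resource amounts.
    if spec < gpu_pod_residual + k8s_pod_residual:
        return Decision.DISPATCH
    return Decision.INSUFFICIENT


print(decide(300, 500, 200, 200))
```

The `INSUFFICIENT` branch is the case that dependent Claim 3 addresses; Claim 1 itself does not recite what happens there.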
Claims 9 and 10 are a computer device claim and a non-transitory computer-readable storage medium claim, respectively (Page 173 of Yeh), corresponding to method Claim 1.
In addition, Claim 9 recites “A computer device, wherein the computer device comprises: at least one processor; and a memory, wherein the memory stores a computer instruction that is executable in the processor, and the instruction, when executed by the processor, implements operations comprising…” (
Yeh teaches the use of a computer, stating, “While Kubernetes has the strength to support container management, the only computing resources that can be natively recognized and allocated by Kubernetes are the CPU and memory. To attach any other custom devices to a container, including GPU, high-performance NICs, FPGA, a device plugin [14, 22, 27] must be developed and installed following the framework defined by Kubernetes to perform vendor specific initialization and setup for the devices,” Page 173.). 
In addition, Claim 10 recites “A non-transitory computer-readable storage medium, the non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements operations comprising…” (
Yeh teaches the use of a computer, stating, “While Kubernetes has the strength to support container management, the only computing resources that can be natively recognized and allocated by Kubernetes are the CPU and memory. To attach any other custom devices to a container, including GPU, high-performance NICs, FPGA, a device plugin [14, 22, 27] must be developed and installed following the framework defined by Kubernetes to perform vendor specific initialization and setup for the devices,” Page 173.).
Therefore, Claims 9 and 10 are rejected for the same reasons set forth in the rejection of Claim 1, in addition to having their additional limitations being taught by Yeh.

Claims 3, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Yeh (KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud) in view of Chiu (US 20200242724 A1), Jiang (US 20180157536 A1), Dong (US 10423456 B2), and Hassan (US 20190327185 A1).
Regarding Claim 3, Yeh in view of Chiu, Jiang, and Dong teaches the method according to claim 1. Yeh in view of Chiu, Jiang, and Dong does not teach wherein the method further comprises: in response to the specification of the GPU graphic memory or GPU time slice being not less than the sum of the current residual resource amounts of the GPU Pods and the Kubernetes Pods, increasing a failure time quantity by one, and, every predetermined duration, determining again whether the specification of the GPU graphic memory or GPU time slice is less than the sum of the current residual resource amounts of the GPU Pods and the Kubernetes Pods.
However, Hassan teaches wherein the method further comprises:
in response to the specification of the GPU graphic memory or GPU time slice being not less than the sum of the current residual resource amounts of the GPU Pods and the Kubernetes Pods, increasing a failure time quantity by one (
Hassan discloses, “…some of the disclosed embodiments may then monitor the utilization 305 during a period of time, represented as period of time 320. These embodiments may count a number of times the utilization exceeds the second threshold 310b,” ¶ 0029.
Whenever the utilization exceeds the disclosed “threshold 310b”, the current number of times that threshold was exceeded is incremented by 1, wherein the number count is treated as failure time quantity.
After the combination of Yeh in view of Chiu, Jiang, and Dong, with Hassan, the threshold from the latter is replaced with the sum of the current residual resource amounts of the GPU Pods and the Kubernetes Pods from the former, as the sum of the current residual resource amounts also acts as a threshold.),
and, every predetermined duration, determining again whether the specification of the GPU graphic memory or GPU time slice is less than the sum of the current residual resource amounts of the GPU Pods and the Kubernetes Pods (
Hassan discloses, “some of the disclosed embodiments may then monitor the utilization 305 during a period of time, represented as period of time 320. These embodiments may count a number of times the utilization exceeds the second threshold 310b,” ¶ 0029.
After the disclosed “threshold 310b” is exceeded, the resource utilization is periodically monitored in order to determine or detect a subsequent instance of that threshold being exceeded.
After the combination of Yeh in view of Chiu, Jiang, and Dong, with Hassan, the threshold from the latter is replaced with the sum of the current residual resource amounts of the GPU Pods and the Kubernetes Pods from the former, as the sum of the current residual resource amounts also acts as a threshold.).
Yeh in view of Chiu, Jiang, and Dong, and Hassan are both considered to be analogous to the claimed invention because they are in the same field of resource usage management. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, and Dong to incorporate the teachings of Hassan and provide wherein the method further comprises: in response to the specification of the GPU graphic memory or GPU time slice being not less than the sum of the current residual resource amounts of the GPU Pods and the Kubernetes Pods, increasing a failure time quantity by one, and, every predetermined duration, determining again whether the specification of the GPU graphic memory or GPU time slice is less than the sum of the current residual resource amounts of the GPU Pods and the Kubernetes Pods. Doing so would help keep track of how many times the threshold was exceeded in order to determine whether to perform mitigation measures (Hassan discloses, “For example, in some aspects, process 500 may include determining, based on the second utilization spikes detected in block 530, an amount of time that the utilization of a first network exceeded the adjusted or second threshold (e.g. 310b or 410b), and rerouting established call traffic from the first network to the second network based in the utilization exceeding the second threshold for a defined percentage of the time within the window,” ¶ 0040.).
Claims 15 and 19 are a computer device claim and a non-transitory computer readable storage medium claim, respectively, corresponding to the method Claim 3. Therefore, Claims 15 and 19 are rejected for the same reasons set forth in the rejection of Claim 3.
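The limitation of Claim 3 (count a failure, then check again every predetermined duration) can be sketched as a bounded polling loop. This is an illustration of the claim language only, not code from the application or from Hassan; the function name, the `max_checks` bound, and the injectable `sleep` parameter are assumptions added so that the sketch terminates and can be exercised without waiting.

```python
import time


def recheck_until_fits(spec, read_residual_sum, interval_s, max_checks, sleep=time.sleep):
    """Poll the residual sum until the specification fits, counting failed checks.

    read_residual_sum: callable returning the current sum of residual resource amounts.
    Returns (fits, failure_count).
    """
    failures = 0
    for _ in range(max_checks):
        if spec < read_residual_sum():
            return True, failures
        failures += 1       # "increasing a failure time quantity by one"
        sleep(interval_s)   # wait the predetermined duration before checking again
    return False, failures


# Hypothetical residual readings: resources free up on the third check.
readings = iter([100, 100, 400])
fits, failures = recheck_until_fits(
    300, lambda: next(readings), interval_s=10, max_checks=5, sleep=lambda s: None
)
print(fits, failures)
```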

Claims 4, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yeh (KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud) in view of Chiu (US 20200242724 A1), Jiang (US 20180157536 A1), Dong (US 10423456 B2), Hassan (US 20190327185 A1), and Moyer (US 20210096873 A1).
Regarding Claim 4, Yeh in view of Chiu, Jiang, Dong, and Hassan teaches the method according to claim 3, wherein the method further comprises: determining whether the failure time quantity reaches a second threshold (
Hassan discloses, “For example, in some aspects, process 500 may include determining, based on the second utilization spikes detected in block 530, an amount of time that the utilization of a first network exceeded the adjusted or second threshold (e.g. 310b or 410b), and rerouting established call traffic from the first network to the second network based in the utilization exceeding the second threshold for a defined percentage of the time within the window,” ¶ 0040.
The claimed threshold is mapped to the threshold defined by the disclosed “defined percentage of the time within the window” that the separate threshold 310b was exceeded, which is used to determine a decision in order to mitigate resource usage overloading.).
Yeh in view of Chiu, Jiang, and Dong, and Hassan are both considered to be analogous to the claimed invention because they are in the same field of resource usage management. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, and Dong to incorporate the teachings of Hassan and provide wherein the method further comprises: determining whether the failure time quantity reaches a second threshold. Doing so would help allow for determination whether to perform mitigation measures (Hassan discloses, “For example, in some aspects, process 500 may include determining, based on the second utilization spikes detected in block 530, an amount of time that the utilization of a first network exceeded the adjusted or second threshold (e.g. 310b or 410b), and rerouting established call traffic from the first network to the second network based in the utilization exceeding the second threshold for a defined percentage of the time within the window,” ¶ 0040.).
Yeh in view of Chiu, Jiang, Dong, and Hassan does not teach in response to the failure time quantity reaching the second threshold, increasing a magnitude of the predetermined duration.
However, Moyer teaches in response to the failure time quantity reaching the second threshold, increasing a magnitude of the predetermined duration (
Moyer discloses, “When the time period elapses, the thread throttling unit 150 updates the severity level to a more restrictive level of throttling if the number of cache misses at the L2 cache 130 is still greater than the threshold number of cache misses… In some embodiments, the time period corresponding to the more restrictive level of throttling is greater than the current time period,” ¶ 0029.
Moyer’s increasing of a throttling time period is similar to Hassan’s rerouting traffic from one asset, e.g., network or GPU pod to another, as both approaches help mitigate resource usage overload. After the combination of Yeh in view of Chiu, Jiang, Dong, and Hassan, with Moyer, Hassan’s rerouting traffic approach is replaced with Moyer’s increasing throttling time approach.).
Yeh in view of Chiu, Jiang, Dong, and Hassan, and Moyer are both considered to be analogous to the claimed invention because they are in the same field of online computing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, Dong, and Hassan to incorporate the teachings of Moyer and provide in response to the failure time quantity reaching the second threshold, increasing a magnitude of the predetermined duration. Doing so would help mitigate stalling of the system due to resource usage constraints being exceeded (Moyer discloses, “To reduce the penalties of stalls in the computing system, one or more of software monitors and hardware monitors determine whether a particular source (e.g., thread, core, other) qualifies for throttling, or reduction of instruction processing, at a shared resource,” ¶ 0015.).
Claims 16 and 20 are a computer device claim and a non-transitory computer readable storage medium claim, respectively, corresponding to the method Claim 4. Therefore, Claims 16 and 20 are rejected for the same reasons set forth in the rejection of Claim 4.
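For illustration only, the Claim 4 limitation can be sketched as below. The sketch assumes that the "failure time quantity" is a count of failed attempts and that the "predetermined duration" is a wait time between attempts. The function name, the default threshold, and the growth factor are hypothetical and are not taken from the application or from the cited references.

```python
def next_wait_duration(failure_count, wait_seconds,
                       second_threshold=3, growth_factor=2.0):
    """Return the wait duration to use for the next attempt.

    Assumption: the duration is lengthened only once the failure count
    has reached the second threshold; otherwise it is left unchanged.
    """
    if failure_count >= second_threshold:
        # Threshold reached: increase the magnitude of the duration.
        return wait_seconds * growth_factor
    return wait_seconds
```

Under these assumptions, two failures leave a 1.0 second wait unchanged, while a third failure doubles it.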

Claims 5, 17, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Yeh (KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud) in view of Chiu (US 20200242724 A1), Jiang (US 20180157536 A1), Dong (US 10423456 B2), and Balasubramanian (US 20220129583 A1).
Regarding Claim 5, Yeh in view of Chiu, Jiang, and Dong teaches the method according to claim 1. Yeh in view of Chiu, Jiang, and Dong does not teach wherein according to the current resource utilization rate, dispatching the GPU Pods and the Kubernetes Pods for calculation comprises: allocating calculation tasks to each of the GPU Pods and the Kubernetes Pods, so that resource utilization rates of the GPU Pods and the Kubernetes Pods are equal in calculation.
However, Balasubramanian teaches wherein according to the current resource utilization rate, dispatching the GPU Pods and the Kubernetes Pods for calculation (see Claim 1 rejection analysis) comprises:
allocating calculation tasks to each of the GPU Pods and the Kubernetes Pods, so that resource utilization rates of the GPU Pods and the Kubernetes Pods are equal in calculation (
Balasubramanian discloses, “Kubernetes creates the runtime environment, requests needed resources, handles launching services, and provides services with Internet Protocol (IP) addresses. Kubernetes can also scale the containers across the cluster and monitor the copies of each microservice 220 that are up and running to ensure work is evenly distributed across the cluster,” ¶ 0079.
After the combination of Yeh in view of Chiu, Jiang, and Dong, with Balasubramanian, the containers of Balasubramanian are replaced with the GPU Pods and the Kubernetes Pods of Yeh in view of Chiu, Jiang, and Dong so that work is evenly distributed across the pods.).
Yeh in view of Chiu, Jiang, and Dong, and Balasubramanian are both considered to be analogous to the claimed invention because they are in the same field of resource usage management. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, and Dong to incorporate the teachings of Balasubramanian and provide wherein according to the current resource utilization rate, dispatching the GPU Pods and the Kubernetes Pods for calculation comprises: allocating calculation tasks to each of the GPU Pods and the Kubernetes Pods, so that resource utilization rates of the GPU Pods and the Kubernetes Pods are equal in calculation. Doing so would help ensure that none of the pods have excessive work compared to other pods.
Claims 17 and 21 are a computer device claim and a non-transitory computer readable storage medium claim, respectively, corresponding to the method Claim 5. Therefore, Claims 17 and 21 are rejected for the same reasons set forth in the rejection of Claim 5.
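As a non-authoritative sketch of the Claim 5 limitation, one way to drive pod utilization rates toward equality is a greedy policy that always hands the next task to the least-utilized pod. This policy is an assumption chosen for illustration; neither the application nor Balasubramanian is stated to use it. The pod names and the integer percentage units are hypothetical.

```python
import heapq


def balance_tasks(pod_utilization, task_costs):
    """Greedily assign each task to the currently least-utilized pod.

    pod_utilization: dict of pod name -> current utilization (percent).
    task_costs: utilization each task adds (percent), in arrival order.
    Returns (assignment, final_utilization).
    """
    heap = [(util, name) for name, util in pod_utilization.items()]
    heapq.heapify(heap)
    assignment = {name: [] for name in pod_utilization}
    for task_id, cost in enumerate(task_costs):
        util, name = heapq.heappop(heap)  # least-utilized pod
        assignment[name].append(task_id)
        heapq.heappush(heap, (util + cost, name))
    final_utilization = {name: util for util, name in heap}
    return assignment, final_utilization
```

With a GPU Pod at 20% and a Kubernetes Pod at 40%, tasks costing 20, 10, and 10 leave both pods at 50% under this policy.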

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Yeh (KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud) in view of Chiu (US 20200242724 A1), Jiang (US 20180157536 A1), Dong (US 10423456 B2), Venkatesh (US 20190042321 A1), and Zhang (WO 2021000830 A1).
Regarding Claim 6, Yeh in view of Chiu, Jiang, and Dong teaches the method according to claim 1. Yeh in view of Chiu, Jiang, and Dong does not teach wherein according to the current resource utilization rate, dispatching the GPU Pods and the Kubernetes Pods for calculation comprises: sorting the GPU Pods from a highest computing power to a lowest computing power, and allocating calculation tasks to the GPU Pods in order, to, after a resource utilization rate of a current GPU Pod reaches a third threshold, allocate remaining calculation tasks to a next one GPU Pod.
However, Venkatesh teaches wherein according to the current resource utilization rate, dispatching the GPU Pods and the Kubernetes Pods for calculation (see Claim 1 rejection analysis) comprises: allocating calculation tasks to the GPU Pods in order, to, after a resource utilization rate of a current GPU Pod reaches a third threshold, allocate remaining calculation tasks to a next one GPU Pod (
Venkatesh discloses, “FIG. 7 is a container bursting into private cloud from public cloud. According to one aspect of the invention, ECMS is running on the public cloud and the public cloud subscription resource utilization limit has been exceed… The ECMS identifies the container whose utilization is full, or which has reached a predefined limit, and images it and selects the corresponding scripts. One or more new containers are created on the VM in the private cloud and the scripts are applied to bring it to the one or more new containers to the desired state. The containers are registered with load balancer running in the public data center so it can distribute the load to the newly registered container(s),” ¶ 0050.
After the combination of Yeh in view of Chiu, Jiang, and Dong, with Venkatesh, the containers from Venkatesh are replaced with the GPU Pods from Yeh in view of Chiu, Jiang, and Dong.).
Yeh in view of Chiu, Jiang, and Dong, and Venkatesh are both considered to be analogous to the claimed invention because they are in the same field of resource usage management. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, and Dong to incorporate the teachings of Venkatesh and provide allocating calculation tasks to the GPU Pods in order, to, after a resource utilization rate of a current GPU Pod reaches a third threshold, allocate remaining calculation tasks to a next one GPU Pod. Doing so would help prevent the system from slowing down due to high resource usage (Venkatesh discloses, “Eventually, there is a point at which the infrastructure hosting the site runs out of resources. Running out of resources results in a slow response from the site and a loss of requests,” ¶ 0045.).
Yeh in view of Chiu, Jiang, Dong, and Venkatesh does not teach sorting the GPU Pods from a highest computing power to a lowest computing power.
However, Zhang teaches sorting the GPU Pods from a highest computing power to a lowest computing power, and selecting GPU deployment based on the sorting (
Zhang discloses, “It is also sufficient; therefore, the processing rates of the GPUs are sorted in descending order, and multiple GPUs arranged in the top N are selected as the target GPU,” Page 4.).
Yeh in view of Chiu, Jiang, Dong, and Venkatesh, and Zhang are both considered to be analogous to the claimed invention because they are in the same field of GPUs. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, Dong, and Venkatesh to incorporate the teachings of Zhang and provide wherein according to the current resource utilization rate, dispatching the GPU Pods and the Kubernetes Pods for calculation comprises: sorting the GPU Pods from a highest computing power to a lowest computing power. Doing so would help ensure that the pods with the highest computing power can be more easily selected to increase processing speed and enhance service level. (Zhang discloses, “It is also sufficient; therefore, the processing rates of the GPUs are sorted in descending order, and multiple GPUs arranged in the top N are selected as the target GPU,” Page 4).
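The Claim 6 limitation can be illustrated with the following minimal sketch, which sorts pods by computing power in descending order and moves to the next pod once the current pod's utilization reaches the third threshold. The field names, the percentage units, and the default threshold of 80 are assumptions made for the example and do not come from the application, Venkatesh, or Zhang.

```python
def allocate_by_power(pods, task_costs, third_threshold=80):
    """Fill GPU Pods in descending order of computing power.

    pods: list of dicts with hypothetical keys "name", "power",
          and "utilization" (percent).
    task_costs: utilization each task adds (percent).
    """
    ordered = sorted(pods, key=lambda p: p["power"], reverse=True)
    utilization = {p["name"]: p["utilization"] for p in ordered}
    assignment = {p["name"]: [] for p in ordered}
    index = 0
    for task_id, cost in enumerate(task_costs):
        # Skip forward past pods that have reached the threshold.
        while (index < len(ordered)
               and utilization[ordered[index]["name"]] >= third_threshold):
            index += 1
        if index == len(ordered):
            raise RuntimeError("all GPU Pods have reached the threshold")
        name = ordered[index]["name"]
        assignment[name].append(task_id)
        utilization[name] += cost
    return assignment
```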

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Yeh (KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud) in view of Chiu (US 20200242724 A1), Jiang (US 20180157536 A1), Dong (US 10423456 B2), Park (KR 102086757 B1), and Venkatesh (US 20190042321 A1).
Regarding Claim 7, Yeh in view of Chiu, Jiang, and Dong teaches the method according to claim 1. Yeh in view of Chiu, Jiang, and Dong does not teach wherein according to the current resource utilization rate, dispatching the GPU Pods and the Kubernetes Pods for calculation comprises: sorting the GPU Pods from a lowest current resource utilization rate to a highest current resource utilization rate, and allocating calculation tasks to the GPU Pods in order, to, after a resource utilization rate of a current GPU Pod reaches a third threshold, allocate remaining calculation tasks to a next one GPU Pod.
However, Park teaches sorting the GPU Pods from a lowest current resource utilization rate to a highest current resource utilization rate (
Park discloses, “As shown in FIG. 8, the scheduler 240 first sorts the entire GPU containers according to preset criteria. The criteria used herein may be any factors such as container creation time, priority, memory usage, memory requirements, and number of memory objects, depending on the purpose of the system administrator, such as maximizing utilization and minimizing average latency. In addition, it is also possible to apply not only an ascending and descending order but also a combination of various conditions as necessary,” Page 10.).
Yeh in view of Chiu, Jiang, and Dong, and Park are both considered to be analogous to the claimed invention because they are in the same field of resource usage management. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, and Dong to incorporate the teachings of Park and provide wherein according to the current resource utilization rate, dispatching the GPU Pods and the Kubernetes Pods for calculation comprises: sorting the GPU Pods from a lowest current resource utilization rate to a highest current resource utilization rate. Doing so would help allow for selecting the pods that maximize utilization and minimize latency (Park discloses, “The criteria used herein may be any factors such as container creation time, priority, memory usage, memory requirements, and number of memory objects, depending on the purpose of the system administrator, such as maximizing utilization and minimizing average latency,” Page 10).
Yeh in view of Chiu, Jiang, Dong, and Park does not teach allocating calculation tasks to the GPU Pods in order, to, after a resource utilization rate of a current GPU Pod reaches a third threshold, allocate remaining calculation tasks to a next one GPU Pod.
However, Venkatesh teaches allocating calculation tasks to the GPU Pods in order, to, after a resource utilization rate of a current GPU Pod reaches a third threshold, allocate remaining calculation tasks to a next one GPU Pod (
Venkatesh discloses, “FIG. 7 is a container bursting into private cloud from public cloud. According to one aspect of the invention, ECMS is running on the public cloud and the public cloud subscription resource utilization limit has been exceed… The ECMS identifies the container whose utilization is full, or which has reached a predefined limit, and images it and selects the corresponding scripts. One or more new containers are created on the VM in the private cloud and the scripts are applied to bring it to the one or more new containers to the desired state. The containers are registered with load balancer running in the public data center so it can distribute the load to the newly registered container(s),” ¶ 0050.
After the combination of Yeh in view of Chiu, Jiang, Dong, and Park with Venkatesh, the containers from Venkatesh are replaced with the GPU Pods from Yeh in view of Chiu, Jiang, Dong, and Park.).
Yeh in view of Chiu, Jiang, Dong, and Park, and Venkatesh are both considered to be analogous to the claimed invention because they are in the same field of resource usage management. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, Dong, and Park to incorporate the teachings of Venkatesh and provide allocating calculation tasks to the GPU Pods in order, to, after a resource utilization rate of a current GPU Pod reaches a third threshold, allocate remaining calculation tasks to a next one GPU Pod. Doing so would help prevent the system from slowing down due to high resource usage (Venkatesh discloses, “Eventually, there is a point at which the infrastructure hosting the site runs out of resources. Running out of resources results in a slow response from the site and a loss of requests,” ¶ 0045.).
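A rough sketch of the Claim 7 limitation follows, under the assumption that utilization is tracked as an integer percentage per pod. Pods are visited from the lowest to the highest current utilization, and each pod receives tasks until it reaches the third threshold. All names and the default threshold are hypothetical.

```python
def allocate_least_loaded_first(utilization, task_costs, third_threshold=80):
    """Fill GPU Pods in ascending order of current utilization.

    Returns (assignment, unassigned_task_ids).
    """
    order = sorted(utilization, key=utilization.get)  # lowest first
    load = dict(utilization)
    assignment = {name: [] for name in order}
    pending = list(enumerate(task_costs))
    for name in order:
        # Keep giving this pod tasks until it reaches the threshold.
        while pending and load[name] < third_threshold:
            task_id, cost = pending.pop(0)
            assignment[name].append(task_id)
            load[name] += cost
    return assignment, [task_id for task_id, _ in pending]
```

Tasks that remain after every pod has reached the threshold are returned unassigned rather than dropped; that handling is a design choice of the sketch, not something recited in the claim.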

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Yeh (KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud) in view of Chiu (US 20200242724 A1), Jiang (US 20180157536 A1), Dong (US 10423456 B2), and Jiang (US 20200334064 A1) (hereinafter referred to as Jiang 2).
Regarding Claim 11, Yeh in view of Chiu, Jiang, and Dong teaches the method according to claim 1, wherein the method further comprises: from the GPU services, receiving a request of applying for the GPU graphic memory and a request of applying for the GPU time slice (
Yeh discloses, “As detailed in Section 4.5, GPU memory is shared by space, and GPU computing capacity is shared by time slice,” Page 177, and “Third and most importantly, KubeShare makes vGPU become a first class resource in Kubernetes where vGPU has unique identity (i.e., GPUID) and can be explicitly requested by users,” Page 179.).
Yeh in view of Chiu, Jiang, and Dong does not teach according to a resource application quota of the GPU services, determining whether the request of applying is permitted; when the request of applying is not permitted, returning a failure to the GPU services; and when the request of applying is permitted, returning a success to the GPU services.
However, Jiang 2 teaches according to a resource application quota of the GPU services, determining whether the request of applying is permitted (
Jiang 2 discloses, “FIGS. 1-4 illustrate example systems and techniques for detecting excessive requests from a virtual function (VF) within a predetermined period of time and denying subsequent requests from that VF,” ¶ 0007.);
when the request of applying is not permitted, returning a failure to the GPU services (
Jiang 2 discloses, “FIGS. 1-4 illustrate example systems and techniques for detecting excessive requests from a virtual function (VF) within a predetermined period of time and denying subsequent requests from that VF until after the predetermined period of time has elapsed,” ¶ 0007.);
and when the request of applying is permitted, returning a success to the GPU services (
Jiang 2 discloses, “grant access to devices, such as GPUs, for servicing requests from other VFs,” ¶ 0007.).
Yeh in view of Chiu, Jiang, and Dong, and Jiang 2 are both considered to be analogous to the claimed invention because they are in the same field of resource usage management. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, and Dong to incorporate the teachings of Jiang 2 and provide according to a resource application quota of the GPU services, determining whether the request of applying is permitted; when the request of applying is not permitted, returning a failure to the GPU services; and when the request of applying is permitted, returning a success to the GPU services. Doing so would help provide greater security for the system by blocking malicious/excessive requests.
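For illustration, the quota check recited in Claim 11 might look like the sketch below. It assumes the quota covers both GPU graphic memory (in MiB) and GPU time slice (as a percentage), and that a request is permitted only if both stay within quota. The type name, field names, and return strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GpuAmount:
    memory_mib: int      # GPU graphic memory, in MiB
    time_slice_pct: int  # share of GPU time slice, in percent


def handle_application(quota, in_use, request):
    """Return "success" if the request fits within the quota, else "failure"."""
    fits_memory = in_use.memory_mib + request.memory_mib <= quota.memory_mib
    fits_time = in_use.time_slice_pct + request.time_slice_pct <= quota.time_slice_pct
    if not (fits_memory and fits_time):
        return "failure"
    # Permitted: record the grant against the service's usage.
    in_use.memory_mib += request.memory_mib
    in_use.time_slice_pct += request.time_slice_pct
    return "success"
```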

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Yeh (KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud) in view of Chiu (US 20200242724 A1), Jiang (US 20180157536 A1), Dong (US 10423456 B2), and Ng (US 20140173594 A1).
Regarding Claim 12, Yeh in view of Chiu, Jiang, and Dong teaches the method according to claim 1, wherein in response to receiving a request of creating GPU services sent by a user, creating the corresponding GPU services according to the request, creating GPU Pods of a corresponding quantity according to the GPU services comprises: in response to receiving a request of creating the GPU services, creating a GPU-service-customized resource; and creating the GPU Pods when the GPU-service-customized resource is detected (
Yeh discloses, “The role of KubeShare is to create and manage sharePod, which is a custom resource kind we created in Kubernetes to represent the pod with ability to attach shared custom device on its containers. As shown in Script 1, the specification to create sharePod is called SharePodSpec, which contains the information of the original pod-Spec, the resource usage requirements of the GPU, the identifier of a GPU (GPUID), and the nodeName of the GPU,” Page 176, and “KubeShare-Sched is the scheduler that decides the mapping between containers and vGPUs according to the current resource status and the resource requirements specified by client. KubeShare-Sched then generates the SharePodSpec with the GPUID value decided by a scheduling policy, and asks KubeShare-DevMgr to create the corresponding sharePod instance… KubeShare-DevMgr creates sharePod objects, and then initializes the device environment in containers upon receiving the SharePodSpec from KubeShare-Sched,” Page 176.
The request of creating the GPU service is mapped to the request that the KubeShare-Sched receives from the client in order to generate the SharePodSpec. 
The claimed “GPU-service-customized resource” is mapped to the disclosed “SharePodSpec”. This is a customized resource because it helps create pods that can have custom devices attached to them. This is supported by paragraph 33 of the specification of the present application, which states “The present application has the following advantageous technical effect. The present application uses the functions of Kubernetes such as customized resource and customized annotation to realize the registration and dispatching of virtual services, and realizes restriction on the application for the GPU graphic memory and the controlling on the occupation of the GPU time slice by means of CUDA hijack, thereby reasonably allocating the resource according to the calculating requests.”).
Yeh in view of Chiu, Jiang, and Dong does not teach wherein the request of creating the GPU services is a Hyper Text Transfer Protocol request.
However, Ng teaches wherein the request of creating the GPU services is a Hyper Text Transfer Protocol request (
Ng discloses, “/ServiceInstances POST N/A This is a request to create an instance of the specified service instance resulting in the creation of a service instance that can be queried above,” ¶ 0043.
POST is a standard request method used in HTTP and HTTPS.
After the combination of Yeh in view of Chiu, Jiang, and Dong, with Ng, the request from Ng is used to create an instance of a GPU service from Yeh in view of Chiu, Jiang, and Dong.).
Yeh in view of Chiu, Jiang, and Dong, and Ng are both considered to be analogous to the claimed invention because they are in the same field of online computing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, and Dong to incorporate the teachings of Ng and provide wherein the request of creating the GPU services is a Hyper Text Transfer Protocol request. Doing so would help provide an efficient/convenient way to create Pods using web-related technologies.
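The flow discussed for Claim 12 (an HTTP request leads to a customized resource, and GPU Pods are created once that resource is detected) can be sketched as follows. This is an assumption-laden illustration: the API group `example.com/v1`, the kind `GPUService`, the JSON field names, and the pod naming scheme are all hypothetical and are not taken from the application, Yeh, or Ng. Only the general `apiVersion`/`kind`/`metadata`/`spec` shape follows the usual Kubernetes object layout.

```python
import json


def build_gpu_service_resource(http_body: bytes) -> dict:
    """Turn the JSON body of an HTTP create request into a custom resource."""
    request = json.loads(http_body)
    return {
        "apiVersion": "example.com/v1",  # hypothetical API group
        "kind": "GPUService",            # hypothetical custom resource kind
        "metadata": {"name": request["name"]},
        "spec": {
            "replicas": request["replicas"],
            "gpuMemoryMiB": request["gpuMemoryMiB"],
        },
    }


def pods_for_resource(resource: dict) -> list:
    """Names of the GPU Pods a controller would create on detecting the resource."""
    name = resource["metadata"]["name"]
    return [f"{name}-gpu-pod-{i}" for i in range(resource["spec"]["replicas"])]
```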

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Yeh (KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud) in view of Chiu (US 20200242724 A1), Jiang (US 20180157536 A1), Dong (US 10423456 B2), and Kannan (US 20190311807 A1).
Regarding Claim 13, Yeh in view of Chiu, Jiang, and Dong teaches the method according to claim 1. Yeh in view of Chiu, Jiang, and Dong does not teach wherein according to the calculating request, determining a specification of a GPU graphic memory or GPU time slice required to be applied for comprises: according to the calculating request, send a HTTP request to a GPU-node proxy to apply for the GPU graphic memory or the GPU time slice.
However, Kannan teaches wherein according to the calculating request, determining a specification of a GPU graphic memory or GPU time slice required to be applied for (see Claim 1 rejection for analysis) comprises:
according to the calculating request, send a HTTP request to a GPU-node proxy to apply for the GPU graphic memory or the GPU time slice (
Kannan discloses, “The infrastructure 400 may be exposed to users via an application proxy (App Proxy) node 402, which may route requests to an administrative application (Admin App) 404 and/or a client application (Client App) 406, while restricting access to other internal resources. The App Proxy node 402 may be a Hypertext Transfer Protocol (HTTP) server, which may handle both serving web resources and acting as a proxy between the public internet and the Admin App 404 and the Client App 406… The infrastructure 400 may be built on top of a Docker container platform. Each of the major components in the infrastructure 400 may be packaged and deployed as a self-contained container, avoiding the need to keep development and production environments in synchronization in terms of tools, dependencies, etc.,” ¶ 0060.
The claimed “GPU-node proxy” is mapped to the disclosed “application proxy (App Proxy) node”.
After the combination of Yeh in view of Chiu, Jiang, and Dong, with Kannan, Kannan’s “application proxy node” is used to serve Yeh in view of Chiu, Jiang, and Dong’s GPU graphic memory/time slice to users.).
Yeh in view of Chiu, Jiang, and Dong, and Kannan are both considered to be analogous to the claimed invention because they are in the same field of container-based technologies. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh in view of Chiu, Jiang, and Dong to incorporate the teachings of Kannan and provide wherein according to the calculating request, determining a specification of a GPU graphic memory or GPU time slice required to be applied for comprises: according to the calculating request, send a HTTP request to a GPU-node proxy to apply for the GPU graphic memory or the GPU time slice. Doing so would help provide an efficient/convenient way to create Pods using web-related technologies.
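As a hedged illustration of the Claim 13 limitation, the sketch below builds (but does not send) an HTTP POST request to a GPU-node proxy using Python's standard `urllib.request` module. The proxy URL, the path, and the JSON field names are hypothetical; neither the application nor Kannan specifies them.

```python
import json
import urllib.request


def build_gpu_application(proxy_url, memory_mib, time_slice_pct):
    """Build an HTTP POST applying for GPU graphic memory and time slice."""
    payload = json.dumps({
        "gpuMemoryMiB": memory_mib,          # hypothetical field name
        "timeSlicePercent": time_slice_pct,  # hypothetical field name
    }).encode("utf-8")
    return urllib.request.Request(
        proxy_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually transmit it; the sketch stops short of that so it can be read without a running proxy.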

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Khalid et al. (US 20210027415 A1): System and Methods for Distributed GPU Using Multi-access Edge Compute Services
Bao et al. (US 20180373540 A1): Method For Graphical Processing Unit (GPU) Resource Sharing In Computing Cluster, Involves Scheduling Backlog Tasks And Shifting Tasks Within Set Of Tasks Among Respective Stages In Equivalence Stages According To Determined Cost
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW SUN whose telephone number is (571)272-6735. The examiner can normally be reached Monday-Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANDREW NMN SUN/Examiner, Art Unit 2195
/Aimee Li/Supervisory Patent Examiner, Art Unit 2195
Prosecution Timeline

May 24, 2023: Application Filed
Nov 19, 2025: Non-Final Rejection mailed (§103)
Jan 21, 2026: Response Filed
Apr 07, 2026: Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/643,258 (Patent 12632312): AUTOMATIC RESOURCE QUOTA CALCULATIONS BASED ON TENANT WORKLOADS. 4y 5m to grant; granted May 19, 2026.
17/568,804 (Patent 12625734): HIGH AVAILABILITY SCHEDULER EVENT TRACKING. 4y 4m to grant; granted May 12, 2026.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 50%
With Interview (+100.0%): 99%
Median Time to Grant: 3y 5m (~5m remaining)
PTA Risk: Moderate
Based on 8 resolved cases by this examiner. Grant probability derived from career allowance rate.