Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
Acknowledgment is made of the Information Disclosure Statements dated 10/13/2025 and 11/21/2025. All of the cited references have been considered.
Response to Arguments
Applicant’s arguments on pages 11-13 with respect to the rejection of claims 1-20 under 35 U.S.C. 103 have been fully considered but are moot because the arguments do not apply to the combination of references used in the current rejection. The newly cited Feldman reference has been incorporated below to teach the newly presented limitations.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7 and 10-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rafferty et al. (US20230168932A1), hereinafter Rafferty, in view of Feldman et al. (US20220237506A1), hereinafter Feldman.
Claim 1 is rejected over Rafferty and Feldman.
Regarding claim 1, Rafferty teaches a method of managing execution of a first inference model hosted by data processing systems, the method comprising: (“Described herein are methods, systems, computer readable media, etc. for training a machine learning model to monitor and/or predict usage of computing services and inference models to scale up or down computing resources for use by those computing services and/or inference models.”; [0023])
obtaining an inference frequency capability of the first inference model, the inference frequency capability indicating a rate of execution of the first inference model; (“an inference model is configured to receive a request and make an inference based on the request, and that inference is returned as a response to the request.”; [0037] and “These correlations may indicate, for example, a number of queries or requests an inference model (first inference model) is predicted to receive in a future time window.”; [0038]; Note: The number of queries or requests is the inference frequency capability.)
making a first determination regarding whether the inference frequency capability of the first inference model meets an inference frequency requirement of a downstream consumer during a future period of time; (“In particular, by using a trained machine learning model to monitor usage of computing services and/or resources (e.g., a first usage) to determine and predict when usage of an inference model (e.g., a second usage) will scale up or down, the system can predictively increase or decrease computing resources allocated to and/or used by an inference model.”; [0023])
in an instance of the first determination in which the inference frequency capability of the first inference model does not meet the inference frequency requirement of the downstream consumer:
obtaining an execution plan for the first inference model based on the inference frequency requirement of the downstream consumer; and (“In particular, by using a trained machine learning model to monitor usage of computing services and/or resources (e.g., a first usage) to determine and predict when usage of an inference model (e.g., a second usage) will scale up or down, the system can predictively increase or decrease computing resources allocated to and/or used by an inference model.”; [0023]; and “At an operation 262, a command is transmitted, in real-time, in response to the determination that the current usage data is indicative of the at least one first spike in the first usage of the at least one first computing service, where the command is transmitted to increase an amount of computing resources available for an execution of the inference model.”; [0040]; Note: See Figure 2 262 to see the execution plan where computing resources are increased for execution of the inference model.)
prior to the future period of time, modifying a deployment of the first inference model to the data processing systems based on the execution plan. (“FIG. 2 is a flowchart illustrating a process for training a machine learning model and using that machine learning model to monitor and/or predict usage of computing services and inference models to scale up computing resources for use by those computing services and/or inference models in accordance with one or more embodiments of the present disclosure.”; [0009]; and “FIG. 3 is a flowchart illustrating a process for monitoring and predicting usage of computing services and inference models to scale down computing resources for use by those computing services and/or inference models in accordance with one or more embodiments of the present disclosure.”; [0010]; Note: Scaling computing resources for the execution of the inference model is modifying the deployment of the first inference model.)
Rafferty does not appear to explicitly teach by executing at least one action selected from a group of actions consisting of:
deploying additional instances of the first inference model to the data processing systems,
deploying instances of a third inference model to the data processing systems where the third inference model is a different and separately trained inference model from the first inference model, and
terminating one or more existing instances of the first inference model that are hosted on the data processing systems.
However, Feldman teaches by executing at least one action selected from a group of actions consisting of:
deploying additional instances of the first inference model to the data processing systems, (“The routing manager 164 makes decisions to load, rebalance, delete, distribute, and replicate machine-learning models in the serving containers 128-152, based on the following information. The data model's hierarchy level (2) in the service discovery system 162 provides information about which serving containers are expected to host specific machine-learning models and which serving containers actually host the specified machine-learning models.”; [0031])
terminating one or more existing instances of the first inference model that are hosted on the data processing systems. (“Each of the serving containers 128-152 will keep its own list of actual machine-learning models and if this list does not match the list of expected machine-learning models that a serving container receives, the serving container will load or delete any machine-learning models from the serving container's local cache as needed, and then update its own list of actual machine-learning models accordingly.”; [0031])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the scaling of resources for inference models of Rafferty with the replication of machine learning models of Feldman to effectively rebalance models in serving containers (Feldman, [0031]). Rafferty and Feldman are analogous art because they both concern scaling the deployment of models.
Claim 2 is rejected over Rafferty and Feldman with the incorporation of claim 1.
Regarding claim 2, Rafferty teaches wherein the inference frequency capability of the first inference model is based on historical data indicating the rate of execution of the first inference model during a previous period of time or an analysis of the topology of the first inference model. (“FIG. 2 is a flowchart illustrating a process 250 for training a machine learning model and using that machine learning model to monitor and/or predict usage of computing services and inference models to scale up computing resources for use by those computing services and/or inference models in accordance with one or more embodiments of the present disclosure. At an operation 252, historical usage data associated with a plurality of computing services (e.g., first usage) provided by a plurality of distributed servers is received. The computing services may be one or more of a website provider service, an advertisement provider service, an in-store traffic monitoring service, a transaction or purchase tracking service, or a game console service. In such examples, usage patterns of those services may be determined by the machine learning model/algorithm to be indicative of usage spikes or decreases in the usage of a given inference model(s)”; [0036])
Claim 3 is rejected over Rafferty and Feldman with the incorporation of claim 1.
Regarding claim 3, Rafferty teaches obtaining data anticipating an event impacting execution of the first inference model; and (“At an operation 256, a machine learning model is trained based on the historical usage data to determine a correlation between a first usage of at least one first computing service of the plurality of computing services and a second usage of the inference model. That correlation may indicate that at least one first spike in the first usage of the at least one first computing service precedes at least one second spike in the second usage of the inference model.”; [0038]; Note: The first spike in the first usage of the computing service will impact execution of the inference model.)
obtaining the inference frequency requirement of the downstream consumer during the future period of time based on the data anticipating the event impacting the execution of the first inference model. (“In various embodiments, multiple correlations between usage data of the computing services and inference model may be made, where each correlation is representative of a different prediction for what usage of an inference model will look like based on usage of one or more computer services. In various embodiments, if usage data for multiple inference models is input, the machine learning model may also learn correlations between usage of inference models, so that scaling of resources for a first inference model may also be based on usage data of one or more second inference models.”; [0038])
Claim 4 is rejected over Rafferty and Feldman with the incorporation of claim 1.
Regarding claim 4, Rafferty teaches historical data indicating occurrences of events requiring a change in the inference frequency capability of the first inference model; (“At an operation 256, a machine learning model is trained based on the historical usage data to determine a correlation between a first usage of at least one first computing service of the plurality of computing services and a second usage of the inference model. That correlation may indicate that at least one first spike in the first usage of the at least one first computing service precedes at least one second spike in the second usage of the inference model.”; [0038]; Note: The first spike in the first usage of the computing service will impact execution of the inference model.)
current operational data of the data processing systems; and (“the method further includes receiving, by the one or more processors, in real-time, current usage data associated with the at least one first computing service of the plurality of computing services. The method further includes determining, by the one or more processors based on the current usage data and the correlation, in real-time, that the current usage data is indicative of the at least one first spike in the first usage of the at least one first computing service that precedes the at least one second spike in the second usage of the inference model.”; [0004])
a transmission from the downstream consumer indicating a change in operation of the downstream consumer. (“At an operation 262, a command is transmitted, in real-time, in response to the determination that the current usage data is indicative of the at least one first spike in the first usage of the at least one first computing service, where the command is transmitted to increase an amount of computing resources available for an execution of the inference model.”; [0040]; Note: See Figure 2 262 to see the execution plan.)
Claim 5 is rejected over Rafferty and Feldman with the incorporation of claim 1.
Regarding claim 5, Rafferty teaches feeding the data anticipating the event impacting the execution of the first inference model into a second inference model, the second inference model being trained to predict the inference frequency requirement of the downstream consumer during the future period of time. (“As also described herein, an inference model is configured to receive a request and make an inference based on the request, and that inference is returned as a response to the request. The inference model may be, for example, one or more of a credit checking service, a credit limit estimation service, a line of credit approval s . . .”; [0037] and “At an operation 256, a machine learning model is trained based on the historical usage data (second inference model) to determine a correlation between a first usage of at least one first computing service of the plurality of computing services and a second usage of the inference model. That correlation may indicate that at least one first spike in the first usage of the at least one first computing service precedes at least one second spike in the second usage of the inference model. In various embodiments, multiple correlations between usage data of the computing services and inference model may be made, where each correlation is representative of a different prediction for what usage of an inference model will look like based on usage of one or more computer services. In various embodiments, if usage data for multiple inference models is input, the machine learning model may also learn correlations between usage of inference models, so that scaling of resources for a first inference model may also be based on usage data of one or more second inference models. These correlations may indicate, for example, a number of queries or requests an inference model is predicted to receive in a future time window.”; [0038]; Note: The number of queries or requests is the inference frequency requirement.)
Claim 6 is rejected over Rafferty and Feldman with the incorporation of claim 1.
Regarding claim 6, Rafferty teaches wherein the execution plan indicates a change in the deployment of the first inference model to meet the inference frequency requirement of the downstream consumer during the future period of time. (“At an operation 256, a machine learning model is trained based on the historical usage data (second inference model) to determine a correlation between a first usage of at least one first computing service of the plurality of computing services and a second usage of the inference model. That correlation may indicate that at least one first spike in the first usage of the at least one first computing service precedes at least one second spike in the second usage of the inference model. In various embodiments, multiple correlations between usage data of the computing services and inference model may be made, where each correlation is representative of a different prediction for what usage of an inference model will look like based on usage of one or more computer services. In various embodiments, if usage data for multiple inference models is input, the machine learning model may also learn correlations between usage of inference models, so that scaling of resources for a first inference model may also be based on usage data of one or more second inference models. These correlations may indicate, for example, a number of queries or requests an inference model is predicted to receive in a future time window.”; [0038]; and “The method further includes transmitting, by the one or more processors in response to the determination that the current usage data is indicative of the at least one first spike in the first usage of the at least one first computing service, in real-time, at least one command to increase an amount of computing resources available for an execution of the inference model.”; [0004]; and Note: The number of queries or requests is the inference frequency requirement.)
Claim 7 is rejected over Rafferty and Feldman with the incorporation of claim 1.
Regarding claim 7, Rafferty teaches obtaining a quantity of instances of the first inference model required to meet the inference frequency requirement of the downstream consumer during the future period of time based on characteristics of the first inference model; (“if usage data for multiple inference models is input, the machine learning model may also learn correlations between usage of inference models, so that scaling of resources for a first inference model may also be based on usage data of one or more second inference models. These correlations may indicate, for example, a number of queries or requests an inference model is predicted to receive in a future time window.”; [0038])
making a second determination that the data processing systems have sufficient computing resource capacity to execute the quantity of instances of the first inference model; and (“In particular, by using a trained machine learning model to monitor usage of computing services and/or resources (e.g., a first usage) to determine and predict when usage of an inference model (e.g., a second usage) will scale up or down, the system can predictively increase or decrease computing resources allocated to and/or used by an inference model.”; [0023])
based on the second determination:
generating the execution plan specifying which of the data processing systems are to host each of the quantity of the instances of the first inference model. (“At an operation 262, a command is transmitted, in real-time, in response to the determination that the current usage data is indicative of the at least one first spike in the first usage of the at least one first computing service, where the command is transmitted to increase an amount of computing resources available for an execution of the inference model.”; [0040]; Note: See Figure 2 262 to see the execution plan.)
Claim 10 is rejected over Rafferty and Feldman with the incorporation of claim 1.
Regarding claim 10, Rafferty teaches a non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for managing execution of a first inference model hosted by data processing systems, the operations comprising: (“the present disclosure provides an exemplary technically improved non-transitory computer readable medium having instructions stored thereon that, upon execution by a computing device, cause the computing device to perform operations including receiving resource scheduling instructions configured to, upon execution by the computing device, increase an amount of computing resources available for an execution of an inference model.”; [0006])
The remainder of claim 10 is claim 1 in the form of a non-transitory machine-readable medium and is rejected for the same reasons as claim 1 stated above.
Dependent claim 11 is claim 2 in the form of a non-transitory machine-readable medium and is rejected for the same reasons as claim 2 stated above. For the rejection of the limitations specifically pertaining to the non-transitory machine-readable medium of claim 10, see the rejection of claim 10 above.
Dependent claim 12 is claim 3 in the form of a non-transitory machine-readable medium and is rejected for the same reasons as claim 3 stated above. For the rejection of the limitations specifically pertaining to the non-transitory machine-readable medium of claim 10, see the rejection of claim 10 above.
Dependent claim 13 is claim 4 in the form of a non-transitory machine-readable medium and is rejected for the same reasons as claim 4 stated above. For the rejection of the limitations specifically pertaining to the non-transitory machine-readable medium of claim 10, see the rejection of claim 10 above.
Dependent claim 14 is claim 5 in the form of a non-transitory machine-readable medium and is rejected for the same reasons as claim 5 stated above. For the rejection of the limitations specifically pertaining to the non-transitory machine-readable medium of claim 10, see the rejection of claim 10 above.
Dependent claim 15 is claim 6 in the form of a non-transitory machine-readable medium and is rejected for the same reasons as claim 6 stated above. For the rejection of the limitations specifically pertaining to the non-transitory machine-readable medium of claim 10, see the rejection of claim 10 above.
Claim 16 is rejected over Rafferty and Feldman.
Regarding claim 16, Rafferty teaches a data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for managing execution of a first inference model hosted by data processing systems, the operations comprising: (“the present disclosure provides an exemplary technically improved computer-based system that includes at least the following components of a memory and at least one processor coupled to the memory. The processor is configured to receive historical usage data associated with a plurality of computing services provided by a plurality of distributed servers. The processor is further configured to receive historical usage data associated with an inference model associated with at least one of the plurality of computing services.”; [0005])
The remainder of claim 16 is claim 1 in the form of a data processing system and is rejected for the same reasons as claim 1 stated above.
Dependent claim 17 is claim 2 in the form of a data processing system and is rejected for the same reasons as claim 2 stated above. For the rejection of the limitations specifically pertaining to the data processing system of claim 16, see the rejection of claim 16 above.
Dependent claim 18 is claim 3 in the form of a data processing system and is rejected for the same reasons as claim 3 stated above. For the rejection of the limitations specifically pertaining to the data processing system of claim 16, see the rejection of claim 16 above.
Dependent claim 19 is claim 4 in the form of a data processing system and is rejected for the same reasons as claim 4 stated above. For the rejection of the limitations specifically pertaining to the data processing system of claim 16, see the rejection of claim 16 above.
Dependent claim 20 is claim 5 in the form of a data processing system and is rejected for the same reasons as claim 5 stated above. For the rejection of the limitations specifically pertaining to the data processing system of claim 16, see the rejection of claim 16 above.
Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Rafferty and Feldman in view of Yang et al. (US20220269835A1), hereinafter Yang.
Claim 8 is rejected over Rafferty, Feldman and Yang with the incorporation of claim 1.
Regarding claim 8, Rafferty teaches obtaining a quantity of instances of the first inference model required to meet the inference frequency requirement of the downstream consumer during the future period of time based on characteristics of the first inference model; (“if usage data for multiple inference models is input, the machine learning model may also learn correlations between usage of inference models, so that scaling of resources for a first inference model may also be based on usage data of one or more second inference models. These correlations may indicate, for example, a number of queries or requests an inference model is predicted to receive in a future time window.”; [0038])
making a second determination that the data processing systems do not have sufficient computing resource capacity to execute the quantity of instances of the first inference model; and (“In particular, by using a trained machine learning model to monitor usage of computing services and/or resources (e.g., a first usage) to determine and predict when usage of an inference model (e.g., a second usage) will scale up or down, the system can predictively increase or decrease computing resources allocated to and/or used by an inference model.”; [0023] and “FIG. 3 is a flowchart illustrating a process for monitoring and predicting usage of computing services and inference models to scale down computing resources for use by those computing services and/or inference models in accordance with one or more embodiments of the present disclosure.”; [0010])
Rafferty does not teach based on the second determination:
obtaining a quantity of instances of the third inference model to be deployed to the data processing systems based on the inference frequency requirement of the downstream consumer during the future period of time; and
generating the execution plan specifying which of the data processing systems are to host each of the quantity of the instances of the third inference model.
However, Yang teaches based on the second determination:
obtaining a quantity of instances of the third inference model to be deployed to the data processing systems based on the inference frequency requirement of the downstream consumer during the future period of time; and (“FIG. 4 shows an embodiment of a machine learning model predicting a model performance. As a general matter, the resource prediction twin 420 may output a prediction of the resources (e.g., hardware, platform, cloud instances, and the like) required for running the machine learning model. Examples of items that the resource prediction twin 420 may output include (i) trade-off between estimated cost and runtime for a different number of GPUs and memory, (ii) cost and performance options for a certain job. Additionally or alternatively, the resource prediction twin may output the prediction of a resource based on logs, scripts, and collected data 430, or hardware data 440. In the case of 430, the resource prediction twin 420 may predict metrics. For example, the resource prediction twin 420 may predict how long each would take to complete a job if the user 410 wants to run the job on a GPU versus CPU. Conversely, in the case of 440, the resource prediction twin 420 may predict parameters. For example, the resource prediction twin may predict a model configuration that may satisfy target values under the constraints given a certain resource (e.g., memory less than 1 GB) limitation on the system.”; [0057])
generating the execution plan specifying which of the data processing systems are to host each of the quantity of the instances of the third inference model. (“The processor may be configured to select a deployable machine learning model having the evaluation score that meets a predetermined criterion from among the candidate machine learning models, virtually execute the deployable machine learning model on each of candidate hardware platforms according to the constraints, and generate an assessment report of the virtual performance metrics set of the deployable machine learning model executed on each of the candidate hardware platforms. The processor may be configured to select the suggested hardware platform meeting the predetermined criterion (hosting) from among the candidate hardware platforms, the suggested hardware platform probabilistically satisfying the targeted objective under the constraints when combined with the deployable machine learning model for execution.”; [0004])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the scaling of resources for inference models of Rafferty with the deployment constraints of Yang to satisfy competing goals without wasting time in running an actual machine learning model (Yang, [0058]). Rafferty and Yang are analogous art because they both concern scaling the deployment of models.
Claim 9 is rejected over Rafferty, Feldman and Yang with the incorporation of claim 1.
Regarding claim 9, Rafferty does not teach obtaining the third inference model, the third inference model being a lower complexity inference model than the first inference model and the data processing systems having capacity to host a sufficient quantity of instances of the third inference model to meet the inference frequency requirement of the downstream consumer during the future period of time; and
obtaining an inference frequency capability of the third inference model while hosted by the data processing systems.
However, Yang teaches obtaining the third inference model, the third inference model being a lower complexity inference model than the first inference model and the data processing systems having capacity to host a sufficient quantity of instances of the third inference model to meet the inference frequency requirement of the downstream consumer during the future period of time; and (“For example, the constraint adoption stage 320 may deal with how to re-build the machine learning model that was created at the model creation stage 310 with specific constraints (e.g., 50 hours given to execute the machine learning model on a specific hardware platform, machine learning model size less than 1 GB, $500 budget for a machine learning model training). There may be one or more constraints both in a training model (e.g., hardware constraints, a memory size, a CPU computing power, training data, and/or resource) and a production model (e.g., performance constraints, data constraints, and/or runtime environment constraints).”; [0054])
obtaining an inference frequency capability of the third inference model while hosted by the data processing systems. (“For example, the resource prediction twin 420 may predict how long each would take to complete a job if the user 410 wants to run the job on a GPU versus CPU.”; [0057]; Note: The performance metrics including how long each would take to complete a job are part of the inference frequency requirement. The constraints on the rebuilt third model scale it to a lower complexity inference model.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the scaling of resources for inference models of Rafferty with the deployment constraints of Yang to satisfy competing goals without wasting time in running an actual machine learning model (Yang, [0058]). Rafferty and Yang are analogous art because they both concern scaling the deployment of models.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID H TRAN whose telephone number is (703)756-1525. The examiner can normally be reached M-F 9:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DAVID H TRAN/
Examiner, Art Unit 2147

/VIKER A LAMARDO/
Supervisory Patent Examiner, Art Unit 2147