Prosecution Insights
Last updated: April 19, 2026
Application No. 18/623,195

RESOURCE-EFFICIENT FOUNDATION MODEL DEPLOYMENT ON CONSTRAINED EDGE DEVICES

Non-Final Office Action: rejections under §101, §102, and §103
Filed: Apr 01, 2024
Examiner: MARLOW, ALEXANDER G
Art Unit: 2658
Tech Center: 2600 (Communications)
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)

Grant Probability: 77% (Favorable)
Estimated OA Rounds: 1-2
Estimated Time to Grant: 2y 9m
Grant Probability with Interview: 97%

Examiner Intelligence

Career allow rate: 77% (59 granted of 77 resolved), +14.6% vs Tech Center average. Above average.
Interview lift: strong, +20.8% higher allowance among resolved cases with an interview than without.
Typical timeline: 2y 9m average prosecution; 9 applications currently pending.
Career history: 86 total applications across all art units.
Statute-Specific Performance

Statute | Rate  | vs TC avg
§101    | 16.0% | -24.0%
§102    | 15.0% | -25.0%
§103    | 50.3% | +10.3%
§112    |  9.8% | -30.2%

Tech Center averages are estimates. Based on career data from 77 resolved cases.

Office Action

Rejections under §101, §102, and §103.
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Introduction

This Office action is in response to communications filed 04/01/2024. Claims 1-25 are pending and have been examined.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 04/01/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Independent claims 1, 8, and 15 recite the limitations "receiving a text-based service request for an artificial intelligence (AI) model for an edge device;" "generating model and data descriptions using the text-based service request;" "generating an AI task capacity profile;" "and selecting a resource-optimal AI model for deployment on the edge device based on the AI task capacity profile". Claim 1 additionally recites "A computer-implemented method comprising:". Claim 8 additionally recites "A system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising:". Claim 15 additionally recites "a computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform operations comprising:".

The receiving, generating, and selecting limitations, as drafted, cover a mental process, as each step could be performed mentally or by hand with pen and paper. This judicial exception is not integrated into a practical application: the preambles quoted above merely direct that a generic computer perform the method and do not impose any meaningful limits on practicing the abstract idea. Claims 1, 8, and 15 contain no additional limitations, and they do not include additional elements sufficient to amount to significantly more than the judicial exception.
The addition of the generic computer components recited above with regard to claims 1, 8, and 15 amounts to no more than mere instructions to apply the exception using a generic computer component, which cannot provide an inventive concept. The claims, as drafted, are not patent eligible.

Dependent claims 2, 9, and 16 recite the additional limitation "wherein the text-based service request comprises a description of an AI task, a description of an AI model architecture, a description of an input to the AI model, a description of an output of the AI model, an example of a deployment scenario of the AI model, an example of a specific use-case for the AI model, an example of a re-use of the AI model, a list of performance requirements of the AI model, or a list of generative prompts to the AI model". This limitation covers a mental process, as it could be performed mentally or by hand with pen and paper. The judicial exception is not integrated into a practical application, and claims 2, 9, and 16 contain no additional limitations sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.

Dependent claims 3, 10, and 17 recite the additional limitations "wherein generating ('the operations to generate' in claims 10 and 17) the model and data descriptions using the text-based service request further comprises: providing the text-based service request to a pre-trained large language model as input;" "and generating the model and data descriptions using results received from the pre-trained large language model." These limitations cover a mental process, as they could be performed mentally or by hand with pen and paper; in particular, providing data to a model and doing something with the model's results is still an abstract idea (the use of the model itself is addressed below). These judicial exceptions are not integrated into a practical application: the limitations merely direct that a computer perform the method and do not impose any meaningful limits on practicing the abstract idea. Claims 3, 10, and 17 contain no additional limitations, and the generic computer components recited above amount to no more than mere instructions to apply the exception using a generic computer component, which cannot provide an inventive concept. The claims, as drafted, are not patent eligible.
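For illustration only, here is a minimal sketch of the claim 3/10/17 step of passing the text-based request to a pre-trained large language model and using its results. The function names and the canned JSON reply are hypothetical stand-ins, not from the application or any cited reference:

    import json

    def call_llm(prompt: str) -> str:
        # Placeholder for any pre-trained large language model; returns canned JSON.
        return ('{"model_description": "small CNN classifier", '
                '"data_description": "RGB images, 224x224"}')

    def generate_descriptions(service_request: str) -> dict:
        # Claimed step: hand the text-based request to the LLM, parse its results.
        prompt = ("From this service request, produce JSON with keys "
                  "model_description and data_description: " + service_request)
        return json.loads(call_llm(prompt))

    print(generate_descriptions("detect surface defects from a factory camera feed"))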
Dependent claims 4, 11, and 18 recite the additional limitations "wherein generating ('the operations to generate' in claims 11 and 18) the AI task capacity profile further comprises: retrieving a capacity profile of the edge device;" "identifying performance and resource parameters by comparing the capacity profile of the edge device to an AI model requirements mapping;" "and generating the AI task capacity profile using the performance and resource parameters." These limitations cover a mental process, as they could be performed mentally or by hand with pen and paper. These judicial exceptions are not integrated into a practical application, and claims 4, 11, and 18 contain no additional limitations sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.

Dependent claims 5, 12, and 19 recite the additional limitation "wherein the AI task capacity profile comprises a compatibility list that comprises hardware and software mismatches between a potential AI model and edge device or potential bottlenecks in memory, CPU, GPU, or software infrastructure of the edge device." This limitation covers a mental process, as it could be performed mentally or by hand with pen and paper. The judicial exception is not integrated into a practical application, and claims 5, 12, and 19 contain no additional limitations sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.

Dependent claims 6, 13, and 20 recite the additional limitations "wherein selecting ('the operations to select' in claims 13 and 20) the resource-optimal AI model for deployment on the edge device based on the AI task capacity profile further comprises: identifying an AI model family using the model and data descriptions and the AI task capacity profile;" "and selecting a model variant of the AI model family based on the AI task capacity profile and resources of the edge device." These limitations cover a mental process, as they could be performed mentally or by hand with pen and paper. These judicial exceptions are not integrated into a practical application, and claims 6, 13, and 20 contain no additional limitations sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.

Dependent claims 7, 14, and 21 recite the additional limitation "wherein the model variant is a compressed, pruned, or quantized AI model to correspond to resources of the edge device." This limitation covers a mental process, as it could be performed mentally or by hand with pen and paper. The judicial exception is not integrated into a practical application, and claims 7, 14, and 21 contain no additional limitations sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.
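For illustration only, a minimal sketch of the selection flow recited in claims 1 and 4-7: retrieve a device capacity profile, compare it to a model requirements mapping, and pick a variant. Every name, number, and the mapping itself is a hypothetical stand-in; none of this comes from the application:

    from dataclasses import dataclass

    @dataclass
    class CapacityProfile:
        # Assumed fields for the edge device's capacity profile (claim 4).
        ram_mb: int
        has_gpu: bool

    # Hypothetical "AI model requirements mapping" (claim 4): variant -> needs.
    REQUIREMENTS = {
        "fp32":      {"ram_mb": 4096, "needs_gpu": True},
        "pruned-50": {"ram_mb": 2048, "needs_gpu": False},  # pruned variant (claim 7)
        "int8":      {"ram_mb": 1024, "needs_gpu": False},  # quantized variant (claim 7)
    }

    def select_variant(device: CapacityProfile) -> str:
        # Compare the device profile to the requirements mapping and keep only
        # compatible variants; an empty result plays the role of the claim 5
        # compatibility list flagging a hardware mismatch.
        fitting = {name: req for name, req in REQUIREMENTS.items()
                   if req["ram_mb"] <= device.ram_mb
                   and (device.has_gpu or not req["needs_gpu"])}
        if not fitting:
            raise ValueError("no compatible variant for this device")
        # "Resource-optimal" here means the largest variant that still fits (claim 6).
        return max(fitting, key=lambda name: fitting[name]["ram_mb"])

    print(select_variant(CapacityProfile(ram_mb=2048, has_gpu=False)))  # -> pruned-50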
Independent claims 22 and 24 recite the limitations "receiving a service request for an artificial intelligence (AI) model for an edge device;" "generating model and data specifications by using automated generative translations of the service request;" "performing an AI task capacity profiling using the model and data specifications and a capacity profile of the edge device to identify a key performance parameter and a key resource parameter of the AI model;" "and selecting the AI model for deployment on the edge device based on the key performance parameter and the key resource parameter". Claim 22 additionally recites "A computer-implemented method comprising:". Claim 24 additionally recites "A system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising:". The recited limitations, as drafted, cover a mental process, as each step could be performed mentally or by hand with pen and paper. This judicial exception is not integrated into a practical application: the preambles quoted above merely direct that a generic computer perform the method and do not impose any meaningful limits on practicing the abstract idea. Claims 22 and 24 contain no additional limitations.

Dependent claims 23 and 25 recite the additional limitations "where generating ('the operations to generate' in claim 25) the model and data specifications by using the automated generative translations of the service request further comprise: providing the service request to a pre-trained large language model as input;" "and generating the model and data specifications using results generated by the pre-trained large language model." These limitations cover a mental process, as they could be performed mentally or by hand with pen and paper; providing data to a model and doing something with the model's results is still an abstract idea (the use of the model itself is addressed below). These judicial exceptions are not integrated into a practical application: the limitations merely direct that a computer perform the method and do not impose any meaningful limits on practicing the abstract idea. Claims 23 and 25 contain no additional limitations, and they do not include additional elements sufficient to amount to significantly more than the judicial exception.
The addition of the generic computer components recited above with regard to claims 23 and 25 amounts to no more than mere instructions to apply the exception using a generic computer component, which cannot provide an inventive concept. The claims, as drafted, are not patent eligible.

Examiner's note: although claims 15-21 recite a computer program product comprising a computer readable storage medium, the specification of the instant application explicitly states that, in the present disclosure, a computer readable storage medium should not be construed as a form of transitory signal (Para [0062]). For this reason, claims 15-21 are not rejected for claiming signals per se. Likewise, because the computer program product comprises a non-transitory CRM, claims 15-21 are not rejected for claiming software per se.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless - (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chen et al., "Foundation Model Based Native AI Framework in 6G with Cloud-Edge-End Collaboration", hereinafter Chen.

Regarding Claim 1: Chen teaches a computer-implemented method comprising: receiving a text-based service request for an artificial intelligence (AI) model for an edge device (Pg 3, Col 2, Para 2, Ln 1-6: the cloud identifies user intent by processing queries from the end-user device using the intent-aware PFM, then manages task orchestration and resource allocation for the edge or end-user device); generating model and data descriptions using the text-based service request (Pg 4, Col 1, Para 2, Ln 1-8: two prerequisites, namely the dataset and the fine-tuning scheme for the PFM; for the former, an expert knowledge library is constructed by aggregating non-private data from multiple clouds, and user requests are categorized by criteria such as the type/target of the task, processing workflow, and signal processing methodologies; Pg 4, Col 1, Para 3, Ln 1-5: the latter requires a meticulously designed fine-tuning method given the massive scale and resource requirements of most foundation models, so the native intent-aware PFM is implemented by fine-tuning the existing PFM with parameter-efficient methods, with examples at Pg 4, Col 1, Para 4 - Col 2, Para 3: prompt tuning, prefix tuning, LoRA, and adapter tuning); generating an AI task capacity profile (Pg 4, Col 2, Para 5, Ln 1-6: the intelligent edge pools the majority of its local resources, which are represented as part of the edge's status information and transmitted to the cloud along with requests from end-user devices); and selecting a resource-optimal AI model for deployment on the edge device based on the AI task capacity profile (Pg 4, Col 2, Para 5, Ln 1-15: the edge's pooled resources are sent to the cloud as status information, and resource scheduling and task orchestration for a cell have shifted from the edge server to the cloud-based intent-aware PFM; see also Pg 5, Fig 3, edge AI models toolkit, and Pg 5, Col 2, Para 1, Ln 1-15: AI models stored in the algorithm toolkits are orchestrated by the well-trained PFM to handle tasks across multiple edge/end-user devices and, through intent recognition and unified orchestration, are assigned to tasks that align with their capabilities).

Regarding Claim 2: Chen teaches the computer-implemented method of claim 1, wherein the text-based service request comprises the descriptions, examples, and lists recited in claim 2 and quoted in full in the §101 discussion above (Pg 5, Fig 3: Application 1, Application 2).

Regarding Claim 3: Chen teaches the computer-implemented method of claim 1, wherein generating the model and data descriptions using the text-based service request further comprises: providing the text-based service request to a pre-trained large language model as input (Pg 3, Col 2, Para 2, Ln 1-6: the cloud identifies user intent by processing queries from the end-user device using the intent-aware PFM); and generating the model and data descriptions using results received from the pre-trained large language model (Pg 3, Col 2, Para 2, Ln 1-6; Pg 4, Col 1, Para 3, Ln 1-5: the native intent-aware PFM is implemented by fine-tuning the existing PFM with parameter-efficient fine-tuning methods such as prompt tuning, prefix tuning, LoRA, and adapter tuning (Pg 4, Col 1, Para 4 - Col 2, Para 3); see also Pg 5, Fig 3: library of expert knowledge & task data -> PFM).
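For illustration only, a schematic paraphrase of the Chen workflow as the examiner characterizes it: an end-device query, cloud-side intent recognition by the PFM, and orchestration against the edge's status information. The functions, toolkit entries, and thresholds below are hypothetical stand-ins, not Chen's code:

    def parse_intent(query: str) -> str:
        # Stand-in for Chen's cloud-side intent-aware PFM; a real system runs a model.
        return "classification" if "classify" in query else "generation"

    # Hypothetical stand-in for the "edge AI models toolkit" of Chen Fig. 3.
    EDGE_TOOLKIT = {
        "classification": {"model": "small-cnn", "min_free_mb": 512},
        "generation":     {"model": "tiny-llm",  "min_free_mb": 2048},
    }

    def orchestrate(query: str, edge_status: dict) -> str:
        # Cloud-side orchestration keyed on recognized intent and edge status info.
        task = parse_intent(query)
        entry = EDGE_TOOLKIT[task]
        if edge_status["free_mb"] >= entry["min_free_mb"]:
            return "assign " + entry["model"] + " to the edge"
        return "run " + task + " in the cloud"  # edge lacks resources for this task

    print(orchestrate("classify this image", {"free_mb": 1024}))
    # -> assign small-cnn to the edge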
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Chen as applied to claim 1 above, and further in view of Xu et al., "Joint Foundation Model Caching and Inference of Generative AI Services for Edge Intelligence", hereinafter Xu.

Regarding Claim 4: Chen teaches the computer-implemented method of claim 1, wherein generating the AI task capacity profile further comprises retrieving a capacity profile of the edge device (Pg 4, Col 2, Para 5, Ln 1-15: the intelligent edge pools the majority of its local resources, which are represented as part of the edge's status information and transmitted to the cloud along with requests from end-user devices). Chen does not specifically teach identifying performance and resource parameters by comparing the capacity profile of the edge device to an AI model requirements mapping, and generating the AI task capacity profile using the performance and resource parameters. In the same field of AI edge computing, Xu teaches identifying performance and resource parameters by comparing the capacity profile of the edge device to an AI model requirements mapping (Pg 2, Col 2, Para 1, Ln 1-13: an edge intelligence system model in which a cloud data center, represented by 0, and a set of edge servers N = {1, 2, ..., N} serve generative AI services such as AIGC depending on different PFMs; Pg 3, Col 1, Para 1, Ln 1-16: a joint foundation model caching and inference framework in which edge servers make model caching and request offloading decisions to utilize existing edge computing resources, with a binary variable indicating whether model m of application i is cached at edge server n; Pg 3, Col 1, Para 2, Ln 1-5: generative AI service requests can be executed at edge servers if the required model components are loaded in GPU memory, with Gn denoting the GPU memory capacity of edge server n); and generating the AI task capacity profile using the performance and resource parameters (Pg 3, Col 1, Para 1, Ln 1-16). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Chen with the edge computing system of Xu, as it can help improve model performance (Pg 6, Col 1, Para 3, Ln 1-12).

Regarding Claim 5: Chen teaches the computer-implemented method of claim 1, but does not specifically teach wherein the AI task capacity profile comprises a compatibility list that comprises hardware and software mismatches between a potential AI model and edge device or potential bottlenecks in memory, CPU, GPU, or software infrastructure of the edge device. In the same field of AI edge computing, Xu teaches this limitation in the passages cited for claim 4 above (Pg 2, Col 2, Para 1, Ln 1-13; Pg 3, Col 1, Para 1, Ln 1-16; Pg 3, Col 1, Para 2, Ln 1-5). It would have been obvious, for the same reasons, to modify Chen with the edge computing system of Xu (Pg 6, Col 1, Para 3, Ln 1-12).
Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Chen as applied to claim 1 above, and further in view of Zawish et al., "Complexity-Driven Model Compression for Resource-Constrained Deep Learning on Edge", hereinafter Zawish.

Regarding Claim 6: Chen teaches the computer-implemented method of claim 1, wherein selecting the resource-optimal AI model for deployment on the edge device based on the AI task capacity profile further comprises identifying an AI model family using the model and data descriptions and the AI task capacity profile (Pg 5, Col 2, Para 1, Ln 1-15: AI models stored in the algorithm toolkits are orchestrated by the well-trained PFM and, through intent recognition and unified orchestration, assigned to tasks that align with their capabilities; Pg 4, Col 2, Para 5, Ln 1-6; Pg 4, Col 1, Para 2, Ln 1-8; Pg 4, Col 1, Para 3, Ln 1-5). Chen does not teach selecting a model variant of the AI model family based on the AI task capacity profile and resources of the edge device. In the same field of AI edge computing, Zawish teaches selecting a model variant based on the capacity profile and resources of the edge device (Pg 3892, Col 2, Para 2, Memory Aware Pruning, Ln 1-13: reducing the memory of a CNN is critical to achieving both computation and energy efficiency, and for deep models to run at the edge they must fit within the target device's RAM without disrupting the IoT application at runtime, with the memory-based complexity of each convolutional layer k calculated accordingly; Pg 3894, Col 1, Para 1, Ln 1-2: model M must be less than or equal to the desired complexity Cr). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Chen with the pruning methods of Zawish, as they can improve the computation and energy efficiency of the model (Pg 3892, Col 2, Para 2, Memory Aware Pruning, Ln 1-13).

Regarding Claim 7: The combination of Chen and Zawish teaches the computer-implemented method of claim 6, and Zawish further teaches wherein the model variant is a compressed, pruned, or quantized AI model to correspond to resources of the edge device, in the passages cited for claim 6 above (Pg 3892, Col 2, Para 2, Memory Aware Pruning, Ln 1-13; Pg 3894, Col 1, Para 1, Ln 1-2). It would have been obvious to combine for the same reasons.
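For illustration only, a rough sketch of the memory-budget constraint the examiner draws from Zawish (a model's memory complexity must not exceed the device budget, M <= Cr). The per-layer memory formula and the 10% pruning step are hypothetical simplifications, not Zawish's actual algorithm:

    def layer_memory_mb(out_ch: int, in_ch: int, k: int = 3,
                        bytes_per_param: int = 4) -> float:
        # Rough conv-layer weight memory: out_ch * in_ch * k * k parameters.
        return out_ch * in_ch * k * k * bytes_per_param / 1e6

    def prune_to_budget(channels: list, budget_mb: float) -> list:
        # Thin the widest layer in 10% steps until total memory <= budget (cf. M <= Cr).
        chans = list(channels)
        def total_mb() -> float:
            return sum(layer_memory_mb(chans[i], chans[i - 1])
                       for i in range(1, len(chans)))
        while total_mb() > budget_mb:
            widest = max(range(1, len(chans)), key=lambda i: chans[i])
            chans[widest] = max(1, int(chans[widest] * 0.9))
        return chans

    # Channel widths per layer (first entry = input channels), 0.5 MB RAM budget.
    print(prune_to_budget([3, 64, 128, 256], budget_mb=0.5))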
Claims 8-10 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, and further in view of Persia et al. (US 20220327442 A1), hereinafter Persia.

Regarding Claim 8: Chen teaches operations comprising receiving a text-based service request for an AI model for an edge device, generating model and data descriptions using the text-based service request, generating an AI task capacity profile, and selecting a resource-optimal AI model for deployment on the edge device based on the AI task capacity profile, as set forth for claim 1 above (Pg 3, Col 2, Para 2, Ln 1-6; Pg 4, Col 1, Paras 2-3; Pg 4, Col 2, Para 5; Pg 5, Fig 3; Pg 5, Col 2, Para 1). Chen does not explicitly teach a system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations. Persia teaches such a system (Para [0083], Ln 1-14: processor 620 is implemented in hardware, firmware, or a combination of hardware and software and, in some implementations, includes one or more processors capable of being programmed to perform a function; memory 630 includes a random access memory). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Chen with the computer components of Persia, as they provide an environment in which the system can be realized (Para [0082], Ln 1-13; Para [0083], Ln 1-14).

Regarding Claim 9: The combination of Chen and Persia teaches the system of claim 8, and Chen teaches the recited contents of the text-based service request for the reasons given for claim 2 above (Pg 5, Fig 3: Application 1, Application 2).

Regarding Claim 10: The combination of Chen and Persia teaches the system of claim 8, and Chen teaches the operations to generate the model and data descriptions, including providing the text-based service request to a pre-trained large language model as input and using its results, for the reasons given for claim 3 above (Pg 3, Col 2, Para 2, Ln 1-6; Pg 4, Col 1, Para 3, Ln 1-5; Pg 5, Fig 3).

Regarding Claim 15: Chen teaches the recited operations as set forth for claim 1 above. Chen does not explicitly teach a computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform operations. Persia teaches these components (Para [0083], Ln 1-14), and it would have been obvious to combine for the reasons given for claim 8 above (Para [0082], Ln 1-13; Para [0083], Ln 1-14).

Regarding Claims 16 and 17: Claims 16 and 17 contain similar limitations as claims 9 and 10, respectively, and are rejected for the same reasons.
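For illustration only, one hypothetical shape the claim 2/9/16 text-based service request might take; every field name and value below is invented for this sketch and is not from the application:

    service_request = {
        "task": "keyword spotting on device",                 # description of an AI task
        "architecture": "small conv net or distilled transformer",
        "inputs": "16 kHz mono audio in 1-second windows",
        "outputs": "label in {wake, other}",
        "deployment_scenario": "battery-powered microcontroller",
        "use_case_example": "hands-free assistant wake word",
        "reuse_example": "the same model retargeted to a second wake word",
        "performance_requirements": ["latency < 50 ms", "model size < 1 MB"],
        "generative_prompts": ["Suggest a model family suited to this task."],
    }
    print(len(service_request), "fields in the request")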
Claims 11-12 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chen and Persia as applied to claim 8 above, and further in view of Xu.

Regarding Claim 11: The combination of Chen and Persia teaches the system of claim 8, and Chen teaches the operations to retrieve a capacity profile of the edge device (Pg 4, Col 2, Para 5, Ln 1-15). The combination does not specifically teach identifying performance and resource parameters by comparing the capacity profile of the edge device to an AI model requirements mapping, and generating the AI task capacity profile using the performance and resource parameters. In the same field of AI edge computing, Xu teaches these elements for the reasons given for claim 4 above (Pg 2, Col 2, Para 1, Ln 1-13; Pg 3, Col 1, Para 1, Ln 1-16; Pg 3, Col 1, Para 2, Ln 1-5). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combination of Chen and Persia with the edge computing system of Xu, as it can help improve model performance (Pg 6, Col 1, Para 3, Ln 1-12).

Regarding Claim 12: The combination of Chen and Persia teaches the system of claim 8, but does not specifically teach the compatibility-list limitation. Xu teaches it for the reasons given for claim 5 above (Pg 2, Col 2, Para 1, Ln 1-13; Pg 3, Col 1, Paras 1-2), and it would have been obvious to combine for the same reasons (Pg 6, Col 1, Para 3, Ln 1-12).

Regarding Claims 18 and 19: Claims 18 and 19 contain similar limitations as claims 11 and 12, respectively, and are rejected for the same reasons.
Claims 13-14 and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chen and Persia as applied to claim 8 above, and further in view of Zawish.

Regarding Claim 13: The combination of Chen and Persia teaches the system of claim 8, and Chen teaches the operations to identify an AI model family using the model and data descriptions and the AI task capacity profile, for the reasons given for claim 6 above (Pg 5, Col 2, Para 1, Ln 1-15; Pg 4, Col 2, Para 5, Ln 1-6; Pg 4, Col 1, Paras 2-3). The combination does not teach selecting a model variant of the AI model family based on the AI task capacity profile and resources of the edge device. Zawish teaches this element for the reasons given for claim 6 above (Pg 3892, Col 2, Para 2, Memory Aware Pruning, Ln 1-13; Pg 3894, Col 1, Para 1, Ln 1-2). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combination of Chen and Persia with the pruning methods of Zawish, as they can improve the computation and energy efficiency of the model (Pg 3892, Col 2, Para 2, Memory Aware Pruning, Ln 1-13).

Regarding Claim 14: The combination of Chen, Persia, and Zawish teaches the system of claim 13, and Zawish teaches wherein the model variant is a compressed, pruned, or quantized AI model to correspond to resources of the edge device, for the reasons given for claim 7 above (Pg 3892, Col 2, Para 2, Memory Aware Pruning, Ln 1-13; Pg 3894, Col 1, Para 1, Ln 1-2).

Regarding Claims 20 and 21: Claims 20 and 21 contain similar limitations as claims 13 and 14, respectively, and are rejected for the same reasons.
Claims 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, and further in view of Xu.

Regarding Claim 22: Chen teaches a computer-implemented method comprising: receiving a service request for an artificial intelligence (AI) model for an edge device (Pg 3, Col 2, Para 2, Ln 1-6); and generating model and data specifications by using automated generative translations of the service request (Pg 4, Col 1, Para 2, Ln 1-8; Pg 4, Col 1, Para 3, Ln 1-5; examples at Pg 4, Col 1, Para 4 - Col 2, Para 3: prompt tuning, prefix tuning, LoRA, and adapter tuning). Chen does not teach performing an AI task capacity profiling using the model and data specifications and a capacity profile of the edge device to identify a key performance parameter and a key resource parameter of the AI model, or selecting the AI model for deployment on the edge device based on those parameters. In the same field of AI edge computing, Xu teaches performing an AI task capacity profiling to identify a key performance parameter and a key resource parameter of the AI model (Pg 2, Col 2, Para 1, Ln 1-13; Pg 3, Col 1, Para 1, Ln 1-16; Pg 3, Col 1, Para 2, Ln 1-5: the generative AI service requests of users can be executed at edge servers if the required model components are loaded in GPU memory, with Gn denoting the GPU memory capacity of edge server n; Pg 4, Col 2, Para 4, Ln 1 - Pg 5, Col 1, Para 1, Ln 7: when additional GPU memory is required for loading an uncached requested PFM, the LC algorithm counts the effective examples in context and removes the cached PFM with the fewest, and at each timeslot t the model caching decisions are obtained by solving the maximization problem of the number of effective examples for the cached models, represented as Eq. 13(a-c)); and selecting the AI model for deployment on the edge device based on the key performance parameter and the key resource parameter (Pg 3, Col 1, Para 2, Ln 1-5; Pg 4, Col 2, Para 4, Ln 1 - Pg 5, Col 1, Para 1, Ln 7). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Chen with the edge computing system of Xu, as it can help improve model performance (Pg 6, Col 1, Para 3, Ln 1-12).

Regarding Claim 23: The combination of Chen and Xu teaches the computer-implemented method of claim 22, and Chen teaches providing the service request to a pre-trained large language model as input and generating the model and data specifications using the results generated by that model, for the reasons given for claim 3 above (Pg 3, Col 2, Para 2, Ln 1-6; Pg 4, Col 1, Para 3, Ln 1-5; Pg 4, Col 1, Para 4 - Col 2, Para 3; Pg 5, Fig 3: library of expert knowledge & task data -> PFM).
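For illustration only, a toy sketch of the Xu-style eviction rule the examiner cites: when GPU memory is short, remove the cached PFM with the fewest effective examples in context. The data layout and numbers are hypothetical, and the scheme Xu describes solves a maximization problem (Eq. 13(a-c)) rather than this greedy loop:

    def load_model(cache: dict, gpu_free_mb: int, name: str, size_mb: int) -> int:
        # cache maps model name -> {"size_mb": ..., "examples": ...} (hypothetical layout).
        while gpu_free_mb < size_mb and cache:
            # Evict the cached PFM with the fewest effective examples in context.
            victim = min(cache, key=lambda m: cache[m]["examples"])
            gpu_free_mb += cache.pop(victim)["size_mb"]
        if gpu_free_mb < size_mb:
            raise MemoryError("model does not fit even with an empty cache")
        cache[name] = {"size_mb": size_mb, "examples": 0}
        return gpu_free_mb - size_mb

    cache = {"pfm-a": {"size_mb": 3000, "examples": 12},
             "pfm-b": {"size_mb": 2000, "examples": 3}}
    print(load_model(cache, gpu_free_mb=1000, name="pfm-c", size_mb=2500))  # -> 500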
Claims 24-25 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, further in view of Persia, and further in view of Xu.

Regarding Claim 24: Chen teaches receiving a service request for an AI model for an edge device and generating model and data specifications by using automated generative translations of the service request, as set forth for claim 22 above (Pg 3, Col 2, Para 2, Ln 1-6; Pg 4, Col 1, Paras 2-3; Pg 4, Col 1, Para 4 - Col 2, Para 3). Chen does not teach a system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations. Persia teaches such a system (Para [0083], Ln 1-14), and it would have been obvious to combine for the reasons given for claim 8 above (Para [0082], Ln 1-13; Para [0083], Ln 1-14). The combination of Chen and Persia does not teach performing an AI task capacity profiling using the model and data specifications and a capacity profile of the edge device to identify a key performance parameter and a key resource parameter of the AI model, or selecting the AI model for deployment based on those parameters. Xu teaches these elements for the reasons given for claim 22 above (Pg 2, Col 2, Para 1, Ln 1-13; Pg 3, Col 1, Paras 1-2; Pg 4, Col 2, Para 4, Ln 1 - Pg 5, Col 1, Para 1, Ln 7). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combination of Chen and Persia with the edge computing system of Xu, as it can help improve model performance (Pg 6, Col 1, Para 3, Ln 1-12).

Regarding Claim 25: The combination of Chen, Persia, and Xu teaches the system of claim 24, and Chen teaches the operations to generate the model and data specifications, including providing the service request to a pre-trained large language model as input and using the results it generates, for the reasons given for claim 23 above (Pg 3, Col 2, Para 2, Ln 1-6; Pg 4, Col 1, Para 3, Ln 1-5; Pg 5, Fig 3: library of expert knowledge & task data -> PFM).
Edge servers need to make model caching and request offloading decisions to utilize the existing edge computing resources for accommodating generative AI service requests of mobile users…….the binary variable indicating whether model m of application i is cached at edge server n. Pg 3, Col 1, Para 2, Ln 1-5, The generative AI service requests of users can be executed at edge servers if the required components of models are loaded at the GPU memories. Let Gn denote the capacity of GPU memory of edge server n); and generating the AI task capacity profile using the performance and resource parameters(Pg 3, Col 1, Para 1, Ln 1-16, To offer AI services based on PFMs, we propose a joint foundation model caching and inference framework. Edge servers need to make model caching and request offloading decisions to utilize the existing edge computing resources for accommodating generative AI service requests of mobile users). It would have been obvious for one skilled in the art, at the effective time of filling, to modify the combination of Chen and Persia with the Edge Computing system of Xu, as it can help improve model performance(Pg 6, Col 1, Para 3, Ln 1-12). Regarding Claim 12: The combination of Chen and Persia teaches the system of claim 8, but does not specifically teach wherein the AI task capacity profile comprises a compatibility list that comprises hardware and software mismatches between a potential AI model and edge device or potential bottlenecks in memory, CPU, GPU, or software infrastructure of the edge device. In the same field of AI Edge Computing, Xu teaches wherein the AI task capacity profile comprises a compatibility list that comprises hardware and software mismatches between a potential AI model and edge device or potential bottlenecks in memory, CPU, GPU, or software infrastructure of the edge device(Pg 2, Col 2, Para 1, Ln 1-13, we consider an edge intelligence system model……cloud data center and edge servers can serve generative AI services. The cloud data center is represented by 0 and the set of edge servers is represented by N = {1,2,...,N}. In this system, edge servers and the cloud center provide generic AI services such as AIGC, depending on different PFMs. Pg 3, Col 1, Para 1, Ln 1-16, To offer AI services based on PFMs, we propose a joint foundation model caching and inference framework. Edge servers need to make model caching and request offloading decisions to utilize the existing edge computing resources for accommodating generative AI service requests of mobile users…….the binary variable indicating whether model m of application i is cached at edge server n. Pg 3, Col 1, Para 2, Ln 1-5, The generative AI service requests of users can be executed at edge servers if the required components of models are loaded at the GPU memories. Let Gn denote the capacity of GPU memory of edge server n). It would have been obvious for one skilled in the art, at the effective time of filling, to modify the combination of Chen and Persia with the Edge Computing system of Xu, as it can help improve model performance(Pg 6, Col 1, Para 3, Ln 1-12). Regarding Claim 18: Claim 18 contains similar limitations as Claim 11, and is therefore rejected for the same reasons. Regarding Claim 19: Claim 19 contains similar limitations as Claim 12, and is therefore rejected for the same reasons. Claim(s) 13-14 and 20-21 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chen and Persia as applied to claim 8 above, and further in view of Zawish. 
Regarding Claim 13: The combination of Chen and Persia teaches the system of claim 8, and Chen teaches wherein the operations to select the resource-optimal AI model for deployment on the edge device based on the AI task capacity profile further comprise: identifying an AI model family using the model and data descriptions and the AI task capacity profile(Pg 5, Col 2, Para 1, Ln 1-15, AI models stored in the algorithm toolkits are orchestrated by the well trained PFM to handle tasks across multiple edge/end-user devices……Through intent recognition and unified orchestration, these models are assigned to tasks that align with their capabilities, enabling them to effectively leverage the relationships between tasks. Pg 4, Col 2, Para 5, Ln 1-6, intelligent edge within this framework pools the majority of its local resources. These resources are represented as part of the edge’s status information, which is then transmitted to the cloud along with requests from end-user devices. Pg 4, Col 1, Para 2, Ln 1-8, two prerequisites, namely, the dataset and fine-tuning scheme for the PFM. For the former, we endeavor to construct an expert knowledge library by aggregating non-private data from multiple clouds. To fully exploit the standardized characteristics of wireless communication processes, we categorize user requests based on various criteria, such as the type/target of the task, processing workflow, and signal processing methodologies. Pg 4, Col 1, Para 3, Ln 1-5, The latter requires a meticulously designed fine-tuning method due to the massive scale and resource requirements of the majority of foundation models. Thus, we implement the native intent-aware PFM by fine-tuning the existing PFM with parameter-efficient fine-tuning methods); The combination of Chen and Persia does not teach and selecting a model variant of the AI model family based on the AI task capacity profile and resources of the edge device. In the same field of AI Edge Computing, Zawish teaches and selecting a model variant of the AI model family based on the AI task capacity profile and resources of the edge device(Pg 3892, Col 2, Para 2, Memory Aware Pruning, Ln 1-13, reduction in memory of a CNN is critical when the aim is to achieve both computation and energy efficiency…..Moreover, in order for the deep models to run at the edge, they must fit within the target device’s RAM without disrupting the IoT application at the runtime. To achieve this, the memory-based complexity of each convolutional layer k can be calculated using. Pg 3894, Col 1, Para 1, Ln 1-2, model M must be either less than or equal to the desired complexityCr). It would have been obvious for one skilled in the art, at the effective time of filling, to modify the combination of Chen and Persia with the Pruning methods of Zawish, as it can improve computation and energy efficiency of the model(Pg 3892, Col 2, Para 2, Memory Aware Pruning, Ln 1-13). Regarding Claim 14: The combination of Chen, Persia and Zawish teaches the system of claim 13, but does not teach wherein the model variant is a compressed, pruned, or quantized AI model to correspond to resources of the edge device. 
In the same field of AI Edge Computing, Zawish teaches wherein the model variant is a compressed, pruned, or quantized AI model to correspond to resources of the edge device(Pg 3892, Col 2, Para 2, Memory Aware Pruning, Ln 1-13, reduction in memory of a CNN is critical when the aim is to achieve both computation and energy efficiency…..Moreover, in order for the deep models to run at the edge, they must fit within the target device’s RAM without disrupting the IoT application at the runtime. To achieve this, the memory-based complexity of each convolutional layer k can be calculated using. Pg 3894, Col 1, Para 1, Ln 1-2, model M must be either less than or equal to the desired complexityCr). It would have been obvious for one skilled in the art, at the effective time of filling, to modify the combination of Chen, Persia and Zawish with the Pruning methods of Zawish, as it can improve computation and energy efficiency of the model(Pg 3892, Col 2, Para 2, Memory Aware Pruning, Ln 1-13). Regarding Claim 20: Claim 20 contains similar limitations as Claim 13, and is therefore rejected for the same reasons. Regarding Claim 21: Claim 21 contains similar limitations as Claim 14, and is therefore rejected for the same reasons. Claim(s) 22-23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chen, and further in view of Xu. Regarding Claim 22: Chen teaches a computer-implemented method comprising: receiving a service request for an artificial intelligence (AI) model for an edge device(Pg 3, Col 2, Para 2, Ln 1-6, The cloud identifies user intent by processing queries from the end-user device using the intent-aware PFM. It then manages task orchestration and resource allocation for the edge or end-user device); generating model and data specifications by using automated generative translations of the service request(Pg 4, Col 1, Para 2, Ln 1-8, two prerequisites, namely, the dataset and fine-tuning scheme for the PFM. For the former, we endeavor to construct an expert knowledge library by aggregating non-private data from multiple clouds. To fully exploit the standardized characteristics of wireless communication processes, we categorize user requests based on various criteria, such as the type/target of the task, processing workflow, and signal processing methodologies. Pg 4, Col 1, Para 3, Ln 1-5, The latter requires a meticulously designed fine-tuning method due to the massive scale and resource requirements of the majority of foundation models. Thus, we implement the native intent-aware PFM by fine-tuning the existing PFM with parameter-efficient fine-tuning methods. Examples available in Pg 4, Col 1, Para 4 - Col 2 Para 3: Prompt tuning, Prefix Tuning, LoRA and Adaptor tuning); Chen does not teach performing an AI task capacity profiling using the model and data specifications and a capacity profile of the edge device to identify a key performance parameter and a key resource parameter of the AI model; and selecting the AI model for deployment on the edge device based on the key performance parameter and the key resource parameter. In the same field of AI Edge Computing, Xu teaches performing an AI task capacity profiling using the model and data specifications and a capacity profile of the edge device to identify a key performance parameter and a key resource parameter of the AI model(Pg 2, Col 2, Para 1, Ln 1-13, we consider an edge intelligence system model……cloud data center and edge servers can serve generative AI services. 
The cloud data center is represented by 0 and the set of edge servers is represented by N = {1,2,...,N}. In this system, edge servers and the cloud center provide generic AI services such as AIGC, depending on different PFMs. Pg 3, Col 1, Para 1, Ln 1-16, To offer AI services based on PFMs, we propose a joint foundation model caching and inference framework. Edge servers need to make model caching and request offloading decisions to utilize the existing edge computing resources for accommodating generative AI service requests of mobile users…….the binary variable indicating whether model m of application i is cached at edge server n. Pg 3, Col 1, Para 2, Ln 1-5, The generative AI service requests of users can be executed at edge servers if the required components of models are loaded at the GPU memories. Let Gn denote the capacity of GPU memory of edge server n. Pg 4, Col 2, Para 4, Ln 1 – Pg 5, Col 1, Para 1, Ln 7, When additional GPU memory is required for loading an uncached requested PFM, the LC algorithm counts the number of examples in context, calculates them, and removes the cached PFM with the fewest effective examples in context. Therefore, at each timeslot t, the model caching decisions can be obtained by solving the maximization problem of the number of effective examples for the cached models, which can be represented as(eq 13(a-c)).); and selecting the AI model for deployment on the edge device based on the key performance parameter and the key resource parameter(. Pg 3, Col 1, Para 2, Ln 1-5, The generative AI service requests of users can be executed at edge servers if the required components of models are loaded at the GPU memories. Pg 4, Col 2, Para 4, Ln 1 – Pg 5, Col 1, Para 1, Ln 7, When additional GPU memory is required for loading an uncached requested PFM, the LC algorithm counts the number of examples in context, calculates them, and removes the cached PFM with the fewest effective examples in context. Therefore, at each timeslot t, the model caching decisions can be obtained by solving the maximization problem of the number of effective examples for the cached models, which can be represented as(eq 13(a-c)). It would have been obvious for one skilled in the art, at the effective time of filling, to modify Chen with the Edge Computing system of Xu, as it can help improve model performance(Pg 6, Col 1, Para 3, Ln 1-12). Regarding Claim 23: The combination of Chen and Xu teaches the computer-implemented method of claim 22, and Chen teaches where generating the model and data specifications by using the automated generative translations of the service request further comprise: providing the service request to a pre-trained large language model as input(Pg 3, Col 2, Para 2, Ln 1-6, The cloud identifies user intent by processing queries from the end-user device using the intent-aware PFM. It then manages task orchestration and resource allocation for the edge or end-user device); and generating the model and data specifications using results generated by the pre-trained large language model(Pg 3, Col 2, Para 2, Ln 1-6, The cloud identifies user intent by processing queries from the end-user device using the intent-aware PFM. It then manages task orchestration and resource allocation for the edge or end-user device. Pg 4, Col 1, Para 3, Ln 1-5, The latter requires a meticulously designed fine-tuning method due to the massive scale and resource requirements of the majority of foundation models. 
Thus, we implement the native intent-aware PFM by fine-tuning the existing PFM with parameter-efficient fine-tuning methods. Examples available in Pg 4, Col 1, Para 4 - Col 2 Para 3: Prompt tuning, Prefix Tuning, LoRA and Adaptor tuning. Also See Pg 5 Fig 3, library of expert knowledge & task data -> PFM). Claim(s) 24-25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chen, and further in view of Persia and further in view of Xu. Regarding Claim 24: Chen teaches receiving a service request for an artificial intelligence (AI) model for an edge device(Pg 3, Col 2, Para 2, Ln 1-6, The cloud identifies user intent by processing queries from the end-user device using the intent-aware PFM. It then manages task orchestration and resource allocation for the edge or end-user device); generating model and data specifications by using automated generative translations of the service request(Pg 4, Col 1, Para 2, Ln 1-8, two prerequisites, namely, the dataset and fine-tuning scheme for the PFM. For the former, we endeavor to construct an expert knowledge library by aggregating non-private data from multiple clouds. To fully exploit the standardized characteristics of wireless communication processes, we categorize user requests based on various criteria, such as the type/target of the task, processing workflow, and signal processing methodologies. Pg 4, Col 1, Para 3, Ln 1-5, The latter requires a meticulously designed fine-tuning method due to the massive scale and resource requirements of the majority of foundation models. Thus, we implement the native intent-aware PFM by fine-tuning the existing PFM with parameter-efficient fine-tuning methods. Examples available in Pg 4, Col 1, Para 4 - Col 2 Para 3: Prompt tuning, Prefix Tuning, LoRA and Adaptor tuning); Chen does not teach a system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising. Persia teaches a system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising(Para [0083], Ln 1-14, Processor 620 is implemented in hardware, firmware, or a combination of hardware and software. In some implementations, processor 620 includes one or more processors capable of being programmed to perform a function. Memory 630 includes a random access memory): It would have been obvious for one skilled in the art, at the effective time of filing, to modify Chen with the computer components of Persia, as it provides an environment for the system to be realized(Para [0082], Ln 1-13, Para [0083], Ln 1-14). The combination of Chen and Persia does not teach performing an AI task capacity profiling using the model and data specifications and a capacity profile of the edge device to identify a key performance parameter and a key resource parameter of the AI model; and selecting the AI model for deployment on the edge device based on the key performance parameter and the key resource parameter. 
Claim(s) 24-25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Persia, and further in view of Xu.

Regarding Claim 24: Chen teaches receiving a service request for an artificial intelligence (AI) model for an edge device (Pg 3, Col 2, Para 2, Ln 1-6, The cloud identifies user intent by processing queries from the end-user device using the intent-aware PFM. It then manages task orchestration and resource allocation for the edge or end-user device); generating model and data specifications by using automated generative translations of the service request (Pg 4, Col 1, Para 2, Ln 1-8, two prerequisites, namely, the dataset and fine-tuning scheme for the PFM. For the former, we endeavor to construct an expert knowledge library by aggregating non-private data from multiple clouds. To fully exploit the standardized characteristics of wireless communication processes, we categorize user requests based on various criteria, such as the type/target of the task, processing workflow, and signal processing methodologies. Pg 4, Col 1, Para 3, Ln 1-5, The latter requires a meticulously designed fine-tuning method due to the massive scale and resource requirements of the majority of foundation models. Thus, we implement the native intent-aware PFM by fine-tuning the existing PFM with parameter-efficient fine-tuning methods. Examples available in Pg 4, Col 1, Para 4 - Col 2, Para 3: prompt tuning, prefix tuning, LoRA, and adapter tuning).

Chen does not teach a system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising. Persia teaches a system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising (Para [0083], Ln 1-14, Processor 620 is implemented in hardware, firmware, or a combination of hardware and software. In some implementations, processor 620 includes one or more processors capable of being programmed to perform a function. Memory 630 includes a random access memory). It would have been obvious to one skilled in the art, at the effective time of filing, to modify Chen with the computer components of Persia, as it provides an environment for the system to be realized (Para [0082], Ln 1-13, Para [0083], Ln 1-14).

The combination of Chen and Persia does not teach performing an AI task capacity profiling using the model and data specifications and a capacity profile of the edge device to identify a key performance parameter and a key resource parameter of the AI model; and selecting the AI model for deployment on the edge device based on the key performance parameter and the key resource parameter.

In the same field of AI edge computing, Xu teaches performing an AI task capacity profiling using the model and data specifications and a capacity profile of the edge device to identify a key performance parameter and a key resource parameter of the AI model (Pg 2, Col 2, Para 1, Ln 1-13, we consider an edge intelligence system model … cloud data center and edge servers can serve generative AI services. The cloud data center is represented by 0 and the set of edge servers is represented by N = {1,2,...,N}. In this system, edge servers and the cloud center provide generic AI services such as AIGC, depending on different PFMs. Pg 3, Col 1, Para 1, Ln 1-16, To offer AI services based on PFMs, we propose a joint foundation model caching and inference framework. Edge servers need to make model caching and request offloading decisions to utilize the existing edge computing resources for accommodating generative AI service requests of mobile users … the binary variable indicating whether model m of application i is cached at edge server n. Pg 3, Col 1, Para 2, Ln 1-5, The generative AI service requests of users can be executed at edge servers if the required components of models are loaded at the GPU memories. Let Gn denote the capacity of GPU memory of edge server n. Pg 4, Col 2, Para 4, Ln 1 – Pg 5, Col 1, Para 1, Ln 7, When additional GPU memory is required for loading an uncached requested PFM, the LC algorithm counts the number of examples in context, calculates them, and removes the cached PFM with the fewest effective examples in context. Therefore, at each timeslot t, the model caching decisions can be obtained by solving the maximization problem of the number of effective examples for the cached models, which can be represented as eq. 13(a)-(c).); and selecting the AI model for deployment on the edge device based on the key performance parameter and the key resource parameter (Pg 3, Col 1, Para 2, Ln 1-5, The generative AI service requests of users can be executed at edge servers if the required components of models are loaded at the GPU memories. Pg 4, Col 2, Para 4, Ln 1 – Pg 5, Col 1, Para 1, Ln 7, When additional GPU memory is required for loading an uncached requested PFM, the LC algorithm counts the number of examples in context, calculates them, and removes the cached PFM with the fewest effective examples in context. Therefore, at each timeslot t, the model caching decisions can be obtained by solving the maximization problem of the number of effective examples for the cached models, which can be represented as eq. 13(a)-(c)).

It would have been obvious to one skilled in the art, at the effective time of filing, to modify the combination of Chen and Persia with the edge computing system of Xu, as it can help improve model performance (Pg 6, Col 1, Para 3, Ln 1-12).
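As an annotation on the LC algorithm quoted above: a minimal sketch of the eviction step as the quoted passage describes it (evict the cached PFM with the fewest effective examples in context until the requested model fits in Gn). The data structure and names are illustrative assumptions, not Xu's implementation.

from dataclasses import dataclass
from typing import Dict

@dataclass
class CachedPFM:
    mem_gb: float            # GPU memory footprint while loaded
    effective_examples: int  # effective in-context examples accumulated

def load_with_lc(cache: Dict[str, CachedPFM], g_n: float,
                 name: str, mem_gb: float) -> bool:
    """Evict least-context PFMs until the requested PFM fits in capacity g_n."""
    used = sum(m.mem_gb for m in cache.values())
    while used + mem_gb > g_n and cache:
        # Remove the cached PFM with the fewest effective examples in context.
        victim = min(cache, key=lambda k: cache[k].effective_examples)
        used -= cache.pop(victim).mem_gb
    if used + mem_gb > g_n:
        return False  # the requested model cannot fit even with an empty cache
    cache[name] = CachedPFM(mem_gb, effective_examples=0)
    return True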
Regarding Claim 25: The combination of Chen, Persia and Xu teaches the system of claim 24, and Chen teaches where the operations to generate the model and data specifications by using the automated generative translations of the service request further comprise: providing the service request to a pre-trained large language model as input (Pg 3, Col 2, Para 2, Ln 1-6, The cloud identifies user intent by processing queries from the end-user device using the intent-aware PFM. It then manages task orchestration and resource allocation for the edge or end-user device); and generating the model and data specifications using results generated by the pre-trained large language model (Pg 3, Col 2, Para 2, Ln 1-6, The cloud identifies user intent by processing queries from the end-user device using the intent-aware PFM. It then manages task orchestration and resource allocation for the edge or end-user device. Pg 4, Col 1, Para 3, Ln 1-5, The latter requires a meticulously designed fine-tuning method due to the massive scale and resource requirements of the majority of foundation models. Thus, we implement the native intent-aware PFM by fine-tuning the existing PFM with parameter-efficient fine-tuning methods. Examples available in Pg 4, Col 1, Para 4 - Col 2, Para 3: prompt tuning, prefix tuning, LoRA, and adapter tuning. Also see Pg 5, Fig. 3, library of expert knowledge & task data -> PFM).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Griffin et al. (US 20250284884 A1): foundation models on edge devices with text-based query processing.
Hong et al. (US 20240202592 A1): model splitting between edge device and server, based on computing resources of edge devices.
Johnsson et al. (US 20240135247 A1): model selection for edge computing based on edge device computing resources.
Ferreira et al. (US 20240070518 A1): model quantization for edge devices based on edge device computing resources.
Munoz et al. (US 20210110140 A1): model selection based on service request and constraints of computing device.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER G MARLOW, whose telephone number is (571) 272-4536. The examiner can normally be reached Monday - Thursday, 10:00 am - 8:00 pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALEXANDER G MARLOW/
Assistant Examiner, Art Unit 2658

/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658

Prosecution Timeline

Apr 01, 2024
Application Filed
Feb 21, 2026
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12548568
INTERFACING WITH APPLICATIONS VIA DYNAMICALLY UPDATING NATURAL LANGUAGE PROCESSING
2y 5m to grant Granted Feb 10, 2026
Patent 12511483
NAME ENTITY RECOGNITION WITH DEEP LEARNING
2y 5m to grant Granted Dec 30, 2025
Patent 12488802
METHOD AND APPARATUS FOR ERROR RECOVERY IN PREDICTIVE CODING IN MULTICHANNEL AUDIO FRAMES
2y 5m to grant Granted Dec 02, 2025
Patent 12482460
METHOD AND SYSTEM OF ENVIRONMENT-SENSITIVE WAKE-ON-VOICE INITIATION USING ULTRASOUND
2y 5m to grant Granted Nov 25, 2025
Patent 12462799
VOICE CONTROL METHOD, SERVER APPARATUS, AND UTTERANCE OBJECT
2y 5m to grant Granted Nov 04, 2025
Based on this examiner's 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
77%
Grant Probability
97%
With Interview (+20.8%)
2y 9m
Median Time to Grant
Low
PTA Risk
Based on 77 resolved cases by this examiner. Grant probability derived from career allow rate.
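For reference, the with-interview figure appears to be the base grant probability plus the interview lift (an inference from the displayed, rounded values, not a documented formula): 0.77 + 0.208 = 0.978, shown as 97% once the rounding of the underlying allow rate is accounted for.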
