Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Regarding the independent claims, the limitations of determining a configuration, structuring weights based on the configuration, and generating an endpoint, as drafted, recite functions that, under their broadest reasonable interpretation, cover functions that could reasonably be performed in the mind, including with the aid of pen and paper, but for the recitation of generic computer components. That is, the limitations cited above, as drafted, are functions that, under their broadest reasonable interpretation, recite the abstract idea of a mental process. Thus, these limitations fall within the “Mental Processes” grouping of abstract ideas under Prong 1. Under Prong 2, this judicial exception is not integrated into a practical application. The claim recites the following additional limitations: container image, container registry, medium, processor. The additional elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer and/or mere computer components, MPEP 2106.05(f), and the steps of receiving, registering, and deploying do nothing more than add insignificant extra-solution activity to the judicial exception of merely gathering data.
Accordingly, the additional elements do not integrate the recited judicial exception into a practical application and the claim is therefore directed to the judicial exception. See MPEP 2106.05(g) (Ex. v. Consulting and updating an activity log, Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754). Under Step 2B, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of container image, container registry, medium, and processor amount to no more than mere instructions, or generic computer/computer components, to carry out the exception. Furthermore, as to the limitations directed to receiving and registering, the courts have identified mere data gathering as well-understood, routine, and conventional activity. See MPEP 2106.05(d) (Ex. iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93). The recitation of generic computer instructions and computer components to apply the judicial exception, and mere data gathering, do not amount to significantly more and thus cannot provide an inventive concept. Accordingly, the claims are not patent eligible under 35 USC 101. Regarding claims 2-8, 10-16, and 18-20, the limitations of splitting values, selecting a configuration, quantizing, determining, computing an expected price, comparing, and selecting comprising a determination are functions that can reasonably be performed in the human mind and are thus additional mental processes recited in the claims. The claims do not include any additional elements; thus, there is no limitation that needs to be analyzed under Prong 2 for practical application, or under Step 2B for significantly more.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4, 5, 9, 12, 13, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over He (Pub. No. US 2025/0021837) in view of Singh (Pub. No. US 2024/0095077) and further in view of Mariano (Pub. No. US 2023/0229938).
Claims 1, 9, 17: He teaches “a method of building a container for a client to run a trained large language model (LLM) comprising: receiving the trained LLM and a desired configuration ([0068] At 420, the inference platform can communicate with the training platform to perform the onboarding process. The inference platform can perform a one-time registration with a controller at the training platform during an initialization process. The inference platform can monitor for an update to the controller, parse the model information, and download the model at the inference platform. [0056] As noted above, a model can be trained and validated at a training platform using training datasets. Further, the model can be migrated from the training platform to an inference platform via an onboarding process. Once onboarded to the inference platform, the model can be processed to derive configuration values for each setting of a set of settings (i.e., desired configuration) of the model that optimizes hardware performance characteristics for the model at the inference platform.), the trained LLM including a set of weights ([0075], [0103] FIG.
5 is a flow diagram for an example model optimization process. The model optimization process can derive a configuration option (or configuration value) for each setting (Examiner notes, as evidenced by Singh, the settings/parameters of He are weights: [0626] “In at least one embodiment, model training 3414 may include retraining or updating an initial model 3804 (e.g., a pre-trained model) using new training data (e.g., new input data, such as customer dataset 3806, and/or new ground truth data associated with input data). In at least one embodiment, to retrain, or update, initial model 3804, output or loss layer(s) of initial model 3804 may be reset, or deleted, and/or replaced with an updated or new output or loss layer(s). In at least one embodiment, initial model 3804 may have previously fine tuned parameters (e.g., weights and/or biases) that remain from prior training, so training or retraining 3414 may not take as long or require as much processing as training a model from scratch.” Therefore, it would have been obvious to one of ordinary skill in the art that the settings/parameters of He are weights, as evidenced by Singh, for the purposes of design choice); selecting a hardware configuration based on the desired configuration ([0062] The inference platform can further perform model optimization by identifying a combination of model parameters that optimize hardware performance characteristics (e.g., a data processing latency, throughput (i.e., hardware configuration)) for the model. For instance, a searching-based heuristic process can be implemented to derive a combination of configuration options for settings of the model that optimize the hardware performance characteristics for the model. Optimizing the hardware performance characteristics for a model can increase model efficiency in processing volumes of data (e.g., access requests) obtained at the inference platform 304.)
; structuring the set of weights of the trained LLM based on the hardware configuration ([0035] The present embodiments relate to onboarding a model from a training platform to an inference platform and selecting parameters of the model to optimize performance of the model. [0076] At 510, a set of settings for optimization can be identified. For example, for a model, each of the set of settings can include settings of a model that, if modified, can impact hardware performance characteristics of the model. Example settings can include the number of instances running in each GPU or CPU, the batch size for each machine learning model instance, the layers of the model running in a different GPU or CPU, the inference accuracy for different operators, etc. As an illustrative example, a setting can include a maximum number of cached engines used for the model that impact a throughput in processing access requests at the inference platform. Increasing the number of cached engines can enable more throughputs at the same time, but may require more computation resources (e.g., more GPU or CPU cores and data bandwidth) which may influence other services or go beyond the computational resource limitations. The settings for a model can differ based on a type of model. Further, each setting can include multiple configuration options (e.g., multiple cached engine values for a setting comprising a maximum number of cached engines).)”. However, He may not explicitly teach the remaining limitations. Mariano teaches “generating a container image reflecting the hardware configuration ([0025] In some embodiments, model containers 108a-108n each comprise a software image (i.e., software code files, environment variables, libraries, other dependencies, and the like) and a data set (i.e., data files and/or a local database).
[0029] Server computing device 106 deploys (step 202) a software container (e.g., container 108a) that includes executable code for a machine learning (ML) model (i.e., the optimized model as taught by He) (e.g., classification model 110a), inputs to the ML model, and outputs to the ML model…. In some embodiments, server computing device 106 generates the software container using an image upon receiving instructions from a remote computing device.); registering the container image to a container registry ([0046] Docker registry 630 can use a docker image or read-only template that contains instructions on creating a model container that can run the docker platform. In some embodiments, Artifactory™ is used to store one or more docker images as binary artifacts.); generating the container from the container image to deploy the trained LLM in the container ([0029] In some embodiments, server computing device 106 generates the software container using an image upon receiving instructions from a remote computing device.); generating an application programming interface (API) endpoint for the container; and deploying the trained LLM in the API endpoint using the container ([0039] As can be appreciated, model container 108a can execute ML model 110a immediately upon receiving the request from client application 103 and provide the output to client application 103 when the model execution is complete. In some embodiments, model container 108a can execute ML model 110a asynchronously from receipt of the request from client application 103.), the trained LLM accessible through API calls ([0030] Once the model container 108a is deployed, server computing device 106 generates (step 204) a protocol buffer profile 109a from the model container 108a image.
As described above, protocol buffer profile 109a defines one or more Remote Procedure Call (RPC) functions for interactions between ML classification model 110a and a consuming client application (e.g., application 103) using the RPC server module 111a and RPC client module 109a. For example, protocol buffer profile 109a can map an RPC request function to an input API call for ML classification model 110a that includes one or more input parameters. Protocol buffer profile 109a can map an RPC response function to an API response call returned from ML classification model 110a that includes output parameters from execution of the model. In some embodiments, protocol buffer profile 109a is stored as a data set or file in model container 108a. [0031] RPC client module 103a of client application 103 establishes a connection to server computing device 106 and client application 103 calls the RPC client module 103a using an RPC request function (as defined in protocol buffer profile 109a) to submit a request to RPC server module 111a to access ML classification model 110a.)”. It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Mariano with the teachings of He and Singh in order to provide deployment options for an optimized model. The motivation for applying Mariano's teaching with the teachings of He and Singh is to provide a system that allows for accessibility of ML models post-deployment. He, Singh, and Mariano are analogous art directed towards software deployment. Together, He, Singh, and Mariano teach every limitation of the claimed invention. Since the teachings were analogous art known at the filing time of the invention, one of ordinary skill could have applied the teachings of Mariano with the teachings of He and Singh by known methods and gained expected results.
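For orientation only, the sequence of limitations mapped above for claims 1, 9, and 17 (receive the trained LLM and desired configuration, select a hardware configuration, structure the weights, build and register a container image, and expose an API endpoint) can be sketched as follows. Every name, class, and value in this sketch is a hypothetical illustration by the editor; none is drawn from He, Singh, or Mariano:

```python
from dataclasses import dataclass, field

@dataclass
class ContainerImage:
    name: str
    hw_config: dict

@dataclass
class Registry:
    images: dict = field(default_factory=dict)

    def register(self, image: ContainerImage) -> None:
        # registering the container image to a container registry
        self.images[image.name] = image

def select_hardware(desired_config: dict) -> dict:
    # selecting a hardware configuration based on the desired configuration
    return {"gpu": "A100" if desired_config.get("latency") == "low" else "T4"}

def structure_weights(weights: list, hw_config: dict) -> list:
    # structuring the set of weights for the selected hardware
    # (placeholder: a real system might shard or reorder the tensors)
    return list(weights)

def build_and_deploy(weights: list, desired_config: dict, registry: Registry) -> str:
    hw = select_hardware(desired_config)
    structure_weights(weights, hw)
    image = ContainerImage(name="llm-serving-v1", hw_config=hw)
    registry.register(image)
    # generating an API endpoint through which the deployed model is accessible
    return f"/v1/models/{image.name}/infer"

registry = Registry()
endpoint = build_and_deploy([0.1, 0.2], {"latency": "low"}, registry)
print(endpoint)  # /v1/models/llm-serving-v1/infer
```

The sketch mirrors the claim's ordering; it does not assert how any cited reference actually implements these steps.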
Claims 4, 12, 20: the combination teaches the claims, wherein He teaches “the method of claim 1, further comprising selecting a batching configuration for the trained LLM ([0024] Examples of settings for a model can include a maximum number of cached engines for the model, a minimum segment size for the model, a batch size during inference, a number of model instances for each device executing on the inference platform (e.g., a GPU, CPU), a number of machine learning operators and/or application layers running on different computing devices, etc.)”. The rationale applied to claim 1 is applied here.
Claims 5, 13: the combination teaches the claims, wherein He teaches “the method of claim 1, further comprising quantizing the trained LLM ([0024] The term “setting” generally refers to a configurable setting for the machine learning model. A given setting can include one or more configuration options (or “configuration values”) that can modify how a model is performed on a platform and that can affect a hardware performance of the model. Further, a combination of configuration options for a set of settings for a model can be identified (e.g., via an optimization process) that optimize the hardware performance of the model. Examples of settings for a model can include a maximum number of cached engines for the model, a minimum segment size for the model, a batch size during inference, a number of model instances for each device executing on the inference platform (e.g., a GPU, CPU), a number of machine learning operators and/or application layers running on different computing devices, etc.)”.
Claims 2, 10, 18 are rejected under 35 U.S.C. 103 as being unpatentable over He, Singh, and Mariano in further view of Kalkunte (Pub. No. US 2025/0141470).
Claims 2, 10, 18: the combination may not explicitly teach the limitation.
Kalkunte teaches “the method of claim 1, wherein structuring the set of weights of the trained LLM comprises splitting the weights using tensor parallelism ([0115] Each of the plurality of sub-feature maps 1125A-1125I may be processed in parallel in different sparse tensor compute units (e.g., the sparse tensor compute unit 1000).)”. It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Kalkunte with the teachings of He, Singh, and Mariano in order to provide evidence that the parameters of He are processed in parallel. The motivation for applying Kalkunte's teaching with the teachings of He, Singh, and Mariano is to provide a system that allows for design choice. He, Singh, Mariano, and Kalkunte are analogous art directed towards distributed computing. Together, He, Singh, Mariano, and Kalkunte teach every limitation of the claimed invention. Since the teachings were analogous art known at the filing time of the invention, one of ordinary skill could have applied the teachings of Kalkunte with the teachings of He, Singh, and Mariano by known methods and gained expected results.
Claims 3, 11, 19 are rejected under 35 U.S.C. 103 as being unpatentable over He, Singh, and Mariano in further view of AGRAWAL (Pub. No. US 2025/0085981).
Claims 3, 11, 19: the combination may not explicitly teach the limitation. AGRAWAL teaches “the method of claim 1, wherein selecting the hardware configuration comprises selecting the hardware configuration based on a queries per second (QPS) of the trained LLM ([0028] Applications Development/Architecture team 210 define dimensioning for an application and provide the dimensioning information to a Dimensioning Management Platform 220. Dimensioning Management Platform 220 captures basic dimensioning of all applications (POD configuration, capacity of one POD in terms of QPS, supported user count etc.).
Applications Development/Architecture team 210 provide the Dimensioning Management Platform 220 information such as Application Name 221, Number of CPUs 222, an Amount of Memory 223, Supported Queries Per Second (QPS) 224, a Maximum Simultaneous Users 225, a Maximum Disconnections 226, and additional dimensioning information 227. Basic dimensioning parameters and attributes described above are provided as examples, but additional or other information is possible as such information is dependent upon the application type.)”. It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of AGRAWAL with the teachings of He, Singh, and Mariano in order to provide evidence that the parameters of He include QPS. The motivation for applying AGRAWAL's teaching with the teachings of He, Singh, and Mariano is to provide a system that allows for design choice. He, Singh, Mariano, and AGRAWAL are analogous art directed towards distributed computing. Together, He, Singh, Mariano, and AGRAWAL teach every limitation of the claimed invention. Since the teachings were analogous art known at the filing time of the invention, one of ordinary skill could have applied the teachings of AGRAWAL with the teachings of He, Singh, and Mariano by known methods and gained expected results.
Claims 6, 7, 14, 15 are rejected under 35 U.S.C. 103 as being unpatentable over He, Singh, and Mariano in further view of Bhatnager (Pub. No. US 2025/0030759) and VENKATARAGHAVAN (Pub. No. US 2024/0244104).
Claims 6, 14: the combination may not explicitly teach the claims. Bhatnager teaches “the method of claim 1, wherein selecting the hardware configuration comprises: determining a particular hardware configuration in a hardware configuration table … ; computing an expected price per hour of the determined particular hardware configuration; and comparing the expected price per hour to a cost threshold ([0248] 4.
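As an aid to reading the claim 3/11/19 limitation, selecting a hardware configuration based on a queries-per-second (QPS) target can be sketched as below. The table entries, names, and capacities are invented for illustration; they do not come from AGRAWAL or the application under examination:

```python
# Hypothetical capacity table: each row lists a configuration and the maximum
# QPS it is assumed to sustain for the deployed model.
HARDWARE_TABLE = [
    {"name": "cpu-small", "max_qps": 50},
    {"name": "gpu-t4", "max_qps": 400},
    {"name": "gpu-a100", "max_qps": 2000},
]

def select_by_qps(required_qps: int) -> dict:
    # Return the first (smallest-capacity) configuration meeting the target,
    # i.e., selecting the hardware configuration based on the QPS of the model.
    for hw in HARDWARE_TABLE:
        if hw["max_qps"] >= required_qps:
            return hw
    raise ValueError("no configuration supports the requested QPS")

print(select_by_qps(300)["name"])  # gpu-t4
```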
The method of any of the preceding statements wherein the one or more application deployment preferences specify at least one of a threshold for a resource load percentage associated with the deployment of the application, a threshold for a total number of concurrent users of the application, or a threshold for a total resource cost associated with the deployment of the application. [0287] a cost of resources associated with the application (e.g., a dollar value of the resources used to deploy the application), a user access level associated with the application (e.g., restriction levels for users of the application), and/or any other suitable information and/or configuration parameter associated with the deployment of the application. Deployment data 610 may be generated by one or more compute resources within compute environments 606 and/or by application management system 602 itself before, during, or after a deployment of an application (e.g., application 616).)”. It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Bhatnager with the teachings of He, Singh, and Mariano in order to provide cost per hour calculations. The motivation for applying Bhatnager's teaching with the teachings of He, Singh, and Mariano is to provide a system that allows for design choice. He, Singh, Mariano, and Bhatnager are analogous art directed towards distributed computing. Together, He, Singh, Mariano, and Bhatnager teach every limitation of the claimed invention. Since the teachings were analogous art known at the filing time of the invention, one of ordinary skill could have applied the teachings of Bhatnager with the teachings of He, Singh, and Mariano by known methods and gained expected results. However, the combination may not explicitly teach the remaining limitation.
VENKATARAGHAVAN teaches “a hardware configuration table that has a highest throughput for a model type of the trained LLM ([0109] Furthermore, in this example embodiment, service metrics in service metric profile 1 may define a first preferred service performance requirement (e.g., service performance requirement for achieving low latency performance, etc.) for the first service and service metrics in service metric profile 2 may define a second preferred service performance requirement (e.g., service performance requirement for achieving low page load performance, etc.) for the first service. Alternatively, the service metrics in service metric profile 1 may define the most preferable service performance requirement (e.g., service performance requirement for achieving lowest possible latency performance, etc.) and the service metrics in service metric profile 2 may define the second most preferable service performance requirement (e.g., service performance requirement for achieving the second lowest possible latency performance, etc.). [Fig. 7] Type with highest throughput)”. It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of VENKATARAGHAVAN with the teachings of He, Singh, Mariano, and Bhatnager in order to provide evidence that the parameters of He include throughput. The motivation for applying VENKATARAGHAVAN's teaching with the teachings of He, Singh, Mariano, and Bhatnager is to provide a system that allows for design choice. He, Singh, Mariano, Bhatnager, and VENKATARAGHAVAN are analogous art directed towards distributed computing. Together, He, Singh, Mariano, Bhatnager, and VENKATARAGHAVAN teach every limitation of the claimed invention. Since the teachings were analogous art known at the filing time of the invention, one of ordinary skill could have applied the teachings of VENKATARAGHAVAN with the teachings of He, Singh, Mariano, and Bhatnager by known methods and gained expected results.
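For orientation, the claim 6/14 selection logic addressed above (determine the table entry with the highest throughput for the model type, compute its expected price per hour, and compare it to a cost threshold) can be sketched as below. All table rows, names, and dollar figures are invented for the example and are not taken from Bhatnager or VENKATARAGHAVAN:

```python
# Hypothetical hardware configuration table keyed by model type.
TABLE = [
    {"model_type": "llm", "name": "gpu-t4", "throughput": 400, "price_per_hour": 0.53},
    {"model_type": "llm", "name": "gpu-a100", "throughput": 2000, "price_per_hour": 3.06},
]

def select_config(model_type: str, cost_threshold: float):
    candidates = [r for r in TABLE if r["model_type"] == model_type]
    best = max(candidates, key=lambda r: r["throughput"])  # highest throughput
    expected_price = best["price_per_hour"]                # expected price per hour
    if expected_price > cost_threshold:                    # compare to cost threshold
        return None  # over budget; a real system might fall back or re-search
    return best["name"]

print(select_config("llm", 5.0))  # gpu-a100
print(select_config("llm", 1.0))  # None
```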
Claims 7, 15: the combination may not explicitly teach the claims. Bhatnager teaches “the method of claim 1, wherein selecting the hardware configuration comprises determining a particular hardware configuration in a hardware configuration table … of the trained LLM and has an expected price per hour that does not exceed a cost threshold ([0248] 4. The method of any of the preceding statements wherein the one or more application deployment preferences specify at least one of a threshold for a resource load percentage associated with the deployment of the application, a threshold for a total number of concurrent users of the application, or a threshold for a total resource cost associated with the deployment of the application. [0287] a cost of resources associated with the application (e.g., a dollar value of the resources used to deploy the application), a user access level associated with the application (e.g., restriction levels for users of the application), and/or any other suitable information and/or configuration parameter associated with the deployment of the application. Deployment data 610 may be generated by one or more compute resources within compute environments 606 and/or by application management system 602 itself before, during, or after a deployment of an application (e.g., application 616).)”. The rationale applied to claim 6 is applied here. However, the combination may not explicitly teach the remaining limitations. VENKATARAGHAVAN teaches “a hardware configuration table that has a lowest latency for a model type ([0109] Furthermore, in this example embodiment, service metrics in service metric profile 1 may define a first preferred service performance requirement (e.g., service performance requirement for achieving low latency performance, etc.)
for the first service and service metrics in service metric profile 2 may define a second preferred service performance requirement (e.g., service performance requirement for achieving low page load performance, etc.) for the first service. Alternatively, the service metrics in service metric profile 1 may define the most preferable service performance requirement (e.g., service performance requirement for achieving lowest possible latency performance, etc.) and the service metrics in service metric profile 2 may define the second most preferable service performance requirement (e.g., service performance requirement for achieving the second lowest possible latency performance, etc.). [Fig. 7] Type with highest throughput)”. The rationale applied to claim 6 is applied here.
Claims 8, 16 are rejected under 35 U.S.C. 103 as being unpatentable over He, Singh, Mariano, Bhatnager, and VENKATARAGHAVAN in view of Soceanu (Pub. No. US 2023/0421350).
Claims 8, 16: the combination may not explicitly teach the remaining limitations. Soceanu teaches “the method of claim 7, wherein determining the hardware configuration in the hardware configuration table that has the lowest latency for the model type of the trained LLM comprises simulating an expected latency of at least one hardware configuration ([0068] The optimizer's simulator estimates the time and memory usage for a given configuration option on a single CPU thread. For that, it relies on pre-benchmarked measures of the different FHE operations. To assess the accuracy of these estimations, we performed the following experiment on HE-friendly AlexNet using an encrypted model. We chose the four configuration options that achieved the lowest estimated latency when using local search (Section V) and compared the inference time and the encryption time of the input and the model between the simulation output and an actual run over encrypted data. Table V summarizes the results.
We observe that the simulator provides relatively accurate time estimations for all four configurations. The average estimated time deviation is −15.8%, −11.9%, and −7.2% for inference, model encryption, and batch input encryption, respectively. We note that the simulated storage matches the measured storage for all configurations.)”. It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Soceanu with the teachings of He, Singh, Mariano, Bhatnager, and VENKATARAGHAVAN in order to provide evidence that the parameters of He include simulation testing. The motivation for applying Soceanu's teaching with the teachings of He, Singh, Mariano, Bhatnager, and VENKATARAGHAVAN is to provide a system that allows for design choice. He, Singh, Mariano, Bhatnager, VENKATARAGHAVAN, and Soceanu are analogous art directed towards distributed computing. Together, He, Singh, Mariano, Bhatnager, VENKATARAGHAVAN, and Soceanu teach every limitation of the claimed invention. Since the teachings were analogous art known at the filing time of the invention, one of ordinary skill could have applied the teachings of Soceanu with the teachings of He, Singh, Mariano, Bhatnager, and VENKATARAGHAVAN by known methods and gained expected results.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WYNUEL S AQUINO whose telephone number is (571)272-7478. The examiner can normally be reached 9AM-5PM EST M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Lewis Bullock, can be reached at 571-272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WYNUEL S AQUINO/
Primary Examiner, Art Unit 2199