Last updated: May 29, 2026
Application No. 17/250,928
System and Method for Automated Precision Configuration for Deep Neural Networks

Status: Non-Final OA (§103)
Filed: Mar 29, 2021
Priority: Nov 19, 2018 (provisional 62/769,403, +1 more)
Examiner: TRIEU, EM N
Art Unit: 2128
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Deeplite Inc.
OA Round: 4 (Non-Final)
Interview: Optional. Interview lift (+6.4%) is below the 15.0% threshold; a written response is recommended. Based on 64 resolved cases, 2023–2026.

Examiner Intelligence

Examiner: TRIEU, EM N
Career allowance rate: 48% (31 granted / 64 resolved), -6.6% vs TC avg
Interview lift: +6.4% (moderate), measured on resolved cases with interview
Average prosecution: 4y 5m
Currently pending: 13

Statute-Specific Performance (based on career data from 64 resolved cases)

§101: 4.7% (-35.3% vs TC avg)
§103: 87.6% (+47.6% vs TC avg)
§102: 2.6% (-37.4% vs TC avg)
§112: 2.9% (-37.1% vs TC avg)
Office Action

§103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This office action is in response to the claims filed on 08/25/2025.
Claims 1-7, 9-17, 19, and 22 are presented for examination. Claims 8, 18, 20, and 21 were canceled.
Response to Arguments
In reference to applicant’s argument regarding rejections under 35 U.S.C. § 103:
Applicant’s Argument:
Applicant’s argument regarding the § 103 rejection is based on the claim amendment filed on 08/25/2025.
Examiner’s Response:
The applicant’s argument relies on the newly amended limitations filed on 08/25/2025. It has been fully considered but is moot in view of the new grounds of rejection presented below, which were necessitated by the amendment.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:


A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-7, 9-17, 19, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Ravi et al. (US Patent Application Publication 20200125956 A1), hereafter referred to as Ravi, in view of Chai et al. (US Patent Application Publication 20200134461 A1), hereafter referred to as Chai, further in view of Elthakeb et al. (ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks), hereafter referred to as Elthakeb, and further in view of Masoud et al. (US Patent Application Publication 20180144244), hereafter referred to as Masoud.
Regarding claim 1, Ravi teaches a method of automated precision configuration for deep neural networks, the method comprising: obtaining an input model and one or more constraints associated with an application, the application being deployed on a target device, the application configured to utilize a deep neural network (Ravi, Paragraph 0030, “Generally, the present disclosure is directed to an application development platform and associated software development kits (“SDKs”) that provide comprehensive services for generation, deployment, and management of machine-learned models used by computer applications such as, for example, mobile applications executed by a mobile computing device”; Paragraph 0042, “In some implementations, an input for training the compact machine-learned model can be data, one or more input functions, and/or one or more parameters defining the compact machine-learned model for training the compact machine-learned model”; Paragraph 0048, “According to another aspect of the present disclosure, the application development platform can improve prediction accuracy of the compact machine-learned model by jointly training the compact machine-learned model with a trainer model, which may in some instances, be a pre-trained machine-learned model”; Paragraph 0051, “The trainer or teacher model can be any type of model, including, as examples, feed forward neural networks, recurrent neural networks (e.g., long short-term memory networks), quasi-RNNs, convolutional neural networks … The compact or student model can be any type of model but is typically lighter weight than the trainer model. 
Example student models include feed forward neural networks, recurrent neural networks (e.g., long short-term memory networks), quasi-RNNs, convolutional neural networks.” Examiner’s note, the one or more parameters defining the compact machine-learning model, which can be any type of model, are considered analogous to constraints associated with an application, the application configured to utilize a deep neural network. The trainer model is considered analogous to an input model. The computer application using the machine learning model, which may be deployed on a mobile computing device, is considered analogous to the application being deployed on a target device.), learning an output model as a smaller configuration of the input intermediate representation model, the learning based on the intermediate representation model, the one or more constraints, a training data set, and a validation data set, the one or more constraints comprising precisions supported by the target device,
converting the input model into an intermediate representation model based on stored model frameworks and standardizing the intermediate representation model (Ravi, [Par.0099-0103], “As illustrated, the model manager 120 can provide a model compression service, a model conversion service, a model evaluation service and a model hosting service. The model compression service and/or model conversion service can enable the developer to compress and/or convert the models to optimize the models for use by a mobile device or in the mobile environment. For example, compressing the model can include performing quantization (e.g., scalar quantization, vector quantization weight sharing, product quantization, etc.), pruning (e.g., pruning by values, L1 regularization, etc.), low rank representation (e.g., circulatent matrix, Kronecker structures, SVD decompositions, etc.), distillation, and/or other compression techniques.[0100] Pruning reduces model size by removing weights or operations from the model that are least useful for predictions, including, for example, low-scoring weights. This can be very effective especially for on-device models involving sparse inputs. For example, certain on-device conversational models can be pruned further to achieve up-to 2X reduction in size with just 25% lower triggering rate while retaining 97% of the original prediction quality.[0101] Quantization techniques can improve inference speed by reducing the number of bits used for model weights and activations. For example, using 8-bit fixed point representation instead of floats can speed up the model inference, reduce power and reduce size by 4×. [0102] Thus, various compression tools can optionally be accessed and used to compress a learned or uploaded model. 
[0103] Converting the model can include converting the model from a standard version into a mobile-optimized version that is compatible with a lightweight machine learning library designed specifically for mobile and embedded devices, in one example, the platform can use a conversion tool known as TensorFlow Lite Optimizing Converter (“TOCO”) to convert a standard TensorFlow graph of a model into a TensorFlow Lite graph, where TensorFlow Lite is a lightweight machine learning library designed for mobile applications. Thus, various conversion tools can optionally be accessed and used to convert a learned or uploaded model into a mobile-optimized version.” Examiner’s note, the model is converted to a second (intermediate) representation model based on the lightweight machine learning library (stored model frameworks) designed for a particular application (standardizing).),
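For illustration only: the 4× size reduction quoted from Ravi at [0101] follows from storing each weight in 8 bits rather than in a 32-bit float. The sketch below assumes a simple symmetric linear quantizer; the function names are hypothetical and are not taken from Ravi or from the application.

```python
def quantize_int8(weights):
    """Map floats to signed 8-bit integers with one shared scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale for all-zero input
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit codes."""
    return [q * scale for q in quantized]


def size_ratio(float_bits=32, quant_bits=8):
    """Storage reduction factor when each weight shrinks from float_bits to quant_bits."""
    return float_bits / quant_bits
```

Each recovered weight differs from the original by at most half a quantization step, which is the accuracy cost traded for the smaller footprint.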
 learning an output model as a smaller configuration of the input model, the learning based on the input model, the one or more constraints, a training data set, and a validation data set, the one or more constraints comprising precisions supported by the target device (Ravi, [Paragraph 0032, “Further, after training of the model or, in some implementations, as part of the training process itself, the application development platform can enable the developer to compress and/or convert the models to optimize the models for use by a resource-constrained device (e.g., mobile or embedded device) or in a resource-constrained environment. For example, compressing the model can include performing quantization (e.g., scalar quantization, vector quantization/weight sharing, product quantization, etc.)”; Paragraph 0042, “In some implementations, an input for training the compact machine-learned model can be data, one or more input functions, and/or one or more parameters defining the compact machine-learned model for training the compact machine-learned model”; Paragraph 0048, “According to another aspect of the present disclosure, the application development platform can improve prediction accuracy of the compact machine-learned model by jointly training the compact machine-learned model with a trainer model, which may in some instances, be a pre-trained machine-learned model”; Paragraph 0052, “In some implementations, the training pipeline can receive one or more inputs from users. For instance, the training pipeline can receive training data along with corresponding input functions for training and input functions for evaluation”; Paragraph 0174, “The cloud storage database 124 can store machine-leaned models (including both third party models and first party models) and training data. 
The training data can be uploaded by developers and can include validation data.” Here, the training of the compact model includes quantization for a resource constrained device, parameters, a trainer model, training data, and validation data. This is considered analogous to learning an output model as a smaller configuration of the input model, the learning based on the input model, the one or more constraints, a training data set, and a validation data set, the one or more constraints comprising precisions supported by the target device. Ravi further teaches, at Paragraph 0144, that the learning is based on the parametrized representation of the hidden unit of the neural network (intermediate representation model).);
and deploying the output model for use in the application (Ravi, [Paragraph 0179], “the console can provide interfaces that respectively upload training data (e.g., to cloud storage buckets), optionally augment the training data, train and/or compress the model, and then deploy the model and monitor performance.” Examiner’s note, deploying the trained and compressed model is considered analogous to deploying the optimal configuration on the target device or process for use in the application.)
However, Ravi does not teach the one or more constraints comprising precisions supported by the target device and a cumulative bit-budget of the output model, the cumulative bit-budget being smaller than a cumulative bit-budget for the intermediate representation model, wherein the learning comprises repeatedly generating a set of precision configurations based on the precisions supported by the target device and the cumulative bit-budget; and wherein the learning comprises generating candidate output models with function-preserving transformations that represent the same function as the input model but use different parameterization.
On the other hand, Chai teaches the one or more constraints comprising precisions supported by the target device and a cumulative bit-budget of the output model, the cumulative bit-budget being smaller than a cumulative bit-budget for the intermediate representation model (Chai, (Paragraph 0005, “Because the precision-optimized weights may have lower precision than the fixed-precision weights, the low-precision methods of this disclosure may enable lower memory and computation requirements for DNN processing. In some examples, the low-precision methods may affect microprocessor design in the sense that microprocessors with lower capabilities or computing resources may be used with the low-precision methods and in the sense that microprocessors may be designed for efficient use of precision-optimized weights”; Paragraph 0037, “How DNN 106 uses memory 102 is important because memory 102 stores the parameters of DNN 106 (e.g., high-precision weights 114, low-precision weights 116).  … Thus, the computation and run-time memory footprint required for these modern DNNs in inference mode may exceed the power and memory size budgets for a typical mobile device”; Paragraph 0043, “There may be several advantageous outcomes from the techniques set forth in this disclosure. First, the memory size for DNN 106 may be optimized”; Paragraph 0084, “This is a generic method to train DNNs that may also take a specification of a target field device into account (e.g., available memory) and may guide the learning for that specific training data and device … BitNet DNNs may be beneficial to practitioners who deploy DNN-based machine learning solutions to real world applications, including but not limited to mobile platforms and smartphones. The techniques of this disclosure may enable powerful DNNs for resource constrained environments. 
Coupling in the simplicity of the basic DNN processing to reduce memory sizes and finding new forms of parallelisms may be advantageous.” Examiner’s note, determining precisions for a neural network and optimizing the memory size to prevent exceeding a memory size budget for a device is considered analogous to the one or more constraints comprising precisions supported by the target device and a cumulative bit-budget of the output model. Chai further teaches the cumulative bit-budget being smaller than a cumulative bit-budget for the intermediate representation model, as can be seen at [Par. 0112-0113].)
Ravi and Chai are considered to be analogous to each other as they are both in the field of machine learning. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to create the optimized model based on a precision constraint and a bit budget constraint. Someone would have been motivated to do this in order to create effective machine learning models in resource-constrained environments (Chai, Paragraph 0084, “This is a generic method to train DNNs that may also take a specification of a target field device into account (e.g., available memory) and may guide the learning for that specific training data and device … BitNet DNNs may be beneficial to practitioners who deploy DNN-based machine learning solutions to real world applications, including but not limited to mobile platforms and smartphones. The techniques of this disclosure may enable powerful DNNs for resource constrained environments”).
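For illustration only: one plausible reading of a "cumulative bit-budget" is the sum, over all layers, of parameter count times bitwidth. The sketch below works through that arithmetic under this assumption; the layer sizes, precisions, and function names are hypothetical and do not come from Chai or the application.

```python
def cumulative_bits(param_counts, bitwidths):
    """Total storage in bits: sum over layers of (parameters x bits per parameter)."""
    if len(param_counts) != len(bitwidths):
        raise ValueError("one bitwidth is required per layer")
    return sum(n * b for n, b in zip(param_counts, bitwidths))


def satisfies_constraints(param_counts, bitwidths, supported, budget_bits):
    """True if every layer uses a device-supported precision and the total fits the budget."""
    uses_supported = all(b in supported for b in bitwidths)
    return uses_supported and cumulative_bits(param_counts, bitwidths) <= budget_bits


# Hypothetical three-layer network with 32-bit weights as the baseline.
layers = [1000, 5000, 200]
baseline_bits = cumulative_bits(layers, [32, 32, 32])   # 6200 * 32
candidate_bits = cumulative_bits(layers, [8, 4, 16])    # 8000 + 20000 + 3200
```

Under this reading, a candidate configuration is admissible only when its total is smaller than the baseline total and every per-layer precision is one the device supports.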
However, neither Ravi nor Chai teaches wherein the learning comprises repeatedly generating a set of precision configurations based on the precisions supported by the target device and the cumulative bit-budget; and wherein the learning comprises generating candidate output models with function-preserving transformations that represent the same function as the input model but use different parameterization.
On the other hand, Elthakeb teaches wherein the learning comprises repeatedly generating a set of precision configurations based on the precisions supported by the target device and the cumulative bit-budget (Elthakeb, Page 2, “To solve this issue, different combinations of quantization bitwidths can be tested for each layer of a DNN”;
Page 2, “To determine the quantization level for each layer of a neural network, we train a Reinforcement Learning (RL) agent which explores the search space of quantization levels for each layer within a neural network. The agent utilizes a reward function which seeks to minimize the average bitwidth of the neural network layers while also minimizing the accuracy loss relative to full-precision accuracy.” Examiner’s note, in a potential combination of Ravi, Chai, and Elthakeb, testing multiple precisions for each layer of a neural network is considered analogous to repeatedly generating a set of precision configurations based on the precisions supported by the target device and the cumulative bit-budget.)
Ravi, Chai, and Elthakeb are considered to be analogous to each other as they are all in the field of machine learning.
 Therefore, it would have been obvious to one of ordinary skill in the art to repeatedly explore different precision configurations for a neural network. Someone would have been motivated to do this in order to efficiently quantize deep neural networks (Elthakeb, page 2, “By formulating quantization bitwidth as a hyperparameter in the optimization problem of selecting the bitwidth, we tackle this issue by leveraging a state-of-the-art policy gradient based Reinforcement Learning (RL) algorithm called Proximal Policy Optimization [10] (PPO), to efficiently explore a large design space of DNN Quantization”).
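For illustration only: the loop below sketches what "repeatedly generating a set of precision configurations" could look like when restricted to supported precisions and a bit budget. Elthakeb uses a reinforcement learning agent trained with PPO; this sketch swaps in plain random sampling to stay short, and the linear reward, the `evaluate` callback, and all names are assumptions rather than anything stated in the cited references.

```python
import random


def cumulative_bits(param_counts, bitwidths):
    """Total storage in bits for a per-layer bitwidth assignment."""
    return sum(n * b for n, b in zip(param_counts, bitwidths))


def reward(accuracy_ratio, bitwidths, max_bits):
    """Assumed reward: favor preserved accuracy and a low average bitwidth."""
    avg_bit_ratio = sum(bitwidths) / (len(bitwidths) * max_bits)
    return accuracy_ratio - avg_bit_ratio


def search(param_counts, supported, budget_bits, evaluate, steps=50, seed=0):
    """Sample per-layer precisions, skip over-budget ones, keep the best reward."""
    rng = random.Random(seed)
    best_config, best_reward = None, float("-inf")
    for _ in range(steps):
        config = [rng.choice(supported) for _ in param_counts]
        if cumulative_bits(param_counts, config) > budget_bits:
            continue  # violates the cumulative bit-budget constraint
        r = reward(evaluate(config), config, max(supported))
        if r > best_reward:
            best_config, best_reward = config, r
    return best_config, best_reward
```

Here `evaluate` stands in for whatever measures the quantized model's accuracy relative to full precision; a learned policy would replace the random choice of bitwidths.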
However, Ravi, Chai, and Elthakeb do not teach wherein the learning comprises generating candidate output models with function-preserving transformations that represent the same function as the input model but use different parameterization of the deep neural network.
On the other hand, Masoud teaches wherein the learning comprises generating candidate output models with function-preserving transformations that represent the same function as the input model but use different parameterization of the deep neural network (Masoud, [Par. 0039-0043], “[0039] FIG. 3 illustrates a system diagram depicting training and deployment operations for a deep learning neural network according to an example described herein. [0041] The updated model produced from the training process 324, which includes a new version of the algorithm weights 326 (weights N+1), then can be broadcast back to the distributed sites, automatically or as the result of a future update deployment.
[0042] In some examples, an existing version of an algorithm may be updated with a new set of weights, and this new set of weights then may be distributed to the respective clients for further use and testing. In other examples, an entirely new algorithm (e.g., with new neural network processing nodes for different processing actions) can be generated and distributed to the respective clients. In either case, the use of updated algorithm weights or an updated model algorithm may be further evaluated, tested, and observed in a product release evaluation 332 performed by a product release server 330. The result of the product release evaluation 332 may be a released, verified version of the algorithm 334 that incorporates one or multiple version improvements from the training process 324.” Examiner’s note, the model or the algorithm weight is updated.);	
Ravi, Chai, Elthakeb, and Masoud are considered to be analogous to each other as they all use a computer to generate a plurality of datasets.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of obtaining an input model and one or more constraints associated with an application, the application being deployed on a target device, the application configured to utilize a deep neural network, converting the input model into an intermediate representation model based on stored model frameworks and standardizing the intermediate representation model, as taught by Ravi, to include learning that comprises generating candidate output models with function-preserving transformations that represent the same function as the input model but use different parameterization of the deep neural network, as taught by Masoud. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the deep learning model (Masoud, [Par. 0028], “In response to the user interaction processing changes 120 and the user interaction processing acceptance 122, a set of model feedback 128 may be generated. This model feedback 128 may be used to provide improvement for the deep learning model 108, which is used to generate a subsequent version(s) of the processing algorithm 116. Respective versions of the model feedback 128 can be provided from a large number of computing systems, from distributed use cases, to generate new training for multiple features and layers of the deep learning model 108. The model feedback 128, when collected in combination with other distributed feedback from other client computer systems and other executions of the processing algorithm 116, may be input into the training process 104 and the verification process 106 at a later time. The generation of the feedback for the deep learning model 108 thus may occur in a distributed and recurring fashion.”).
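For illustration only: as the claim language puts it, a function-preserving transformation yields a model that represents the same function under a different parameterization. The sketch below shows one generic instance, rescaling a hidden unit across a ReLU, which relies on relu(a·z) = a·relu(z) for a > 0. It is an assumption chosen to illustrate the claim term and is not drawn from Masoud, the other references, or the application.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def forward(x, w1, b1, w2):
    """Two-layer network: output = w2 @ relu(w1 @ x + b1)."""
    hidden = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w1, b1)])
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def rescale_hidden_unit(w1, b1, w2, unit, alpha):
    """Scale one hidden unit's inputs by alpha and its outputs by 1/alpha (alpha > 0)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive for the ReLU identity to hold")
    new_w1 = [list(row) for row in w1]
    new_b1 = list(b1)
    new_w2 = [list(row) for row in w2]
    new_w1[unit] = [w * alpha for w in new_w1[unit]]
    new_b1[unit] *= alpha
    for row in new_w2:
        row[unit] /= alpha
    return new_w1, new_b1, new_w2
```

The rescaled network has different weights but produces the same outputs for every input, which is the property the claim term describes.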
Regarding claim 2, Ravi teaches wherein the smaller configuration is learned using a policy to generate models from the input model (Ravi, [Paragraph 0050], “Thus, in some implementations, the application development platform can include and implement a training pipeline to train a compact machine-learned model. The training pipeline can train the compact machine-learned model individually and/or jointly train the compact machine-learned model with a trainer model (e.g., pre-trained model).” Examiner’s note, a training pipeline to jointly train the compact machine learning model with a trainer model is considered analogous to a policy to generate models from the input model.)
Regarding claim 3, Ravi teaches  wherein the smaller configuration is learned using the policy to generate a quantized network, the method further comprising: fine tuning the quantized network with a knowledge distillation process (Ravi, [Paragraph 0032], “Further, after training of the model or, in some implementations, as part of the training process itself, the application development platform can enable the developer to compress and/or convert the models to optimize the models for use by a resource-constrained device (e.g., mobile or embedded device) or in a resource-constrained environment. For example, compressing the model can include performing quantization (e.g., scalar quantization, vector quantization/weight sharing, product quantization, etc.)” and Paragraph 0049, “The joint training enables the compact machine-learned model to learn from (and/or with) the trainer model, thereby improving the prediction accuracy of the compact machine-learned model. Thus, the joint training can follow a teacher-student joint training architecture” and Paragraph 0111, “Some of the joint training and distillation approaches provided by the present disclosure follow a teacher-student setup where the knowledge of the trainer model is utilized to learn an equivalent compact student model with minimal loss in accuracy.” Examiner’s note, the training of an equivalent, quantized compact student model from a trainer model is considered analogous to fine tuning the quantized network with a knowledge distillation process.)
evaluating the fine-tuned quantized network (Ravi, [Paragraph 0099-0105], “As illustrated, the model manager 120 can provide a model compression service, a model conversion service, a model evaluation service and a model hosting service. The model compression service and/or model conversion service can enable the developer to compress and/or convert the models to optimize the models for use by a mobile device or in the mobile environment. For example, compressing the model can include performing quantization (e.g., scalar quantization, vector quantization weight sharing, product quantization, etc.), pruning (e.g., pruning by values, L1 regularization, etc.), low rank representation (e.g., circulatent matrix, Kronecker structures, SVD decompositions, etc.), distillation, and/or other compression techniques. [0100] Pruning reduces model size by removing weights or operations from the model that are least useful for predictions, including, for example, low-scoring weights. This can be very effective especially for on-device models involving sparse inputs. For example, certain on-device conversational models can be pruned further to achieve up-to 2X reduction in size with just 25% lower triggering rate while retaining 97% of the original prediction quality.” Examiner’s note, under the broadest reasonable interpretation, the model is fine-tuned to achieve high prediction quality, which corresponds to the fine-tuned quantized network.)
applying a reward function (Ravi, [Paragraph 0057], “The training pipeline can jointly train the compact machine-learned model with the pre-trained machine-learned model until a joint training loss function indicates that a difference between an output of the compact machine-learned model and an expected output is less than a threshold value.” Examiner’s note, under the broadest reasonable interpretation of the claim, using the joint training loss function on the compact machine-learning model is considered analogous to applying a reward function.) 
and iterating for at least one additional quantized network and selecting the smaller configuration (Ravi, [Paragraph 0112], “In some implementations, the trainer model can also be jointly trained with multiple student models of different sizes. So instead of providing a single compressed model, the machine learning manager 122 can generate multiple on-device models at different sizes and inference speeds and the developer can select the model that is best suited for their application needs (e.g., provides the most appropriate tradeoff between size and performance).” Examiner’s note,  the training of multiple student models and the selection of one that best fits the application is considered analogous to iterating for at least one additional quantized network and selecting the smaller configuration.)
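For illustration only: the teacher-student setup quoted from Ravi is often implemented with a distillation loss that compares temperature-softened teacher and student outputs. The sketch below assumes that common soft-target cross-entropy form; the quoted passages do not specify Ravi's actual loss, and the function names and default temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature that softens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened outputs against the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
```

The loss is smallest when the student reproduces the teacher's output distribution, so minimizing it during fine-tuning pulls a quantized student toward the full-precision teacher.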
Regarding claim 4, Ravi teaches wherein selecting the smaller configuration comprises selecting a precision configuration that achieves a best reward as determined by the reward function, for the one or more constraints on the target device (Ravi, [Paragraph 0032], “Further, after training of the model or, in some implementations, as part of the training process itself, the application development platform can enable the developer to compress and/or convert the models to optimize the models for use by a resource-constrained device (e.g., mobile or embedded device) or in a resource-constrained environment. For example, compressing the model can include performing quantization (e.g., scalar quantization, vector quantization/weight sharing, product quantization, etc.)”; Paragraph 0057, “The training pipeline can jointly train the compact machine-learned model with the pre-trained machine-learned model until a joint training loss function indicates that a difference between an output of the compact machine-learned model and an expected output is less than a threshold value”; Paragraph 0112, “In some implementations, the trainer model can also be jointly trained with multiple student models of different sizes. So instead of providing a single compressed model, the machine learning manager 122 can generate multiple on-device models at different sizes and inference speeds and the developer can select the model that is best suited for their application needs (e.g., provides the most appropriate tradeoff between size and performance).” Examiner’s note, the loss calculated from the compact model is considered analogous to the reward. In at least one embodiment of Ravi, the student model out of multiple quantized student models with the lowest loss can be selected.
Under the broadest reasonable interpretation of the claims, this is considered analogous to selecting a precision configuration that achieves the best reward as determined by the reward function, for the constraints on the target device.)
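For illustration only: selecting "a precision configuration that achieves a best reward" under device constraints can be read as a filtered argmax over candidates. The sketch below assumes each candidate carries a size and a reward; the dictionary keys, values, and function name are hypothetical and not taken from Ravi or the application.

```python
def select_candidate(candidates, max_size_bytes):
    """Return the highest-reward candidate that fits the size limit, or None."""
    feasible = [c for c in candidates if c["size_bytes"] <= max_size_bytes]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["reward"])


# Hypothetical candidates produced by earlier search iterations.
candidates = [
    {"name": "int8-all", "size_bytes": 6200, "reward": 0.71},
    {"name": "mixed-4-8", "size_bytes": 3900, "reward": 0.78},
    {"name": "int4-all", "size_bytes": 3100, "reward": 0.64},
]
chosen = select_candidate(candidates, max_size_bytes=4000)
```

If the reward is defined as a negated loss, this reduces to picking the feasible candidate with the lowest loss, which matches the selection described in the examiner's note.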
Regarding claim 5, Elthakeb as modified in view of Ravi teaches wherein learning the smaller configuration comprises exploiting low precision weights using reinforcement learning to learn the smaller configuration across the deep neural network (Elthakeb , [Page 2], “To determine the quantization level for each layer of a neural network, we train a Reinforcement Learning (RL) agent which explores the search space of quantization levels for each layer within a neural network. The agent utilizes a reward function which seeks to minimize the average bitwidth of the neural network layers while also minimizing the accuracy loss relative to full-precision accuracy.” Examiner’s note, using reinforcement learning to find the minimal effective bitwidth of the neural network layers is considered analogous to exploiting low precision weights using reinforcement learning to learn the smaller configuration across the deep neural network.)
Ravi, Chai, and Elthakeb are considered to be analogous to each other as they are all in the field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art to use reinforcement learning to find the optimal low precision configuration of a neural network. Someone would have been motivated to do this in order to efficiently quantize deep neural networks (Elthakeb, [Page 2], “By formulating quantization bitwidth as a hyperparameter in the optimization problem of selecting the bitwidth, we tackle this issue by leveraging a state-of-the-art policy gradient based Reinforcement Learning (RL) algorithm called Proximal Policy Optimization [10] (PPO), to efficiently explore a large design space of DNN Quantization”).
Regarding claim 6, Elthakeb as modified in view of Ravi teaches wherein each layer comprises a different precision (Elthakeb, [Page 2], “To solve this issue, different combinations of quantization bitwidths can be tested for each layer of a DNN”; Page 4, “As shown in Figure 1, the agent steps through all layers one by one, determining the quantization level for the layer at each step.” Here, the quantization of each layer individually is considered analogous to each layer comprising a different precision.)
Ravi, Chai, and Elthakeb are considered to be analogous to each other as they are all in the field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art to determine quantization for each layer in a neural network. Someone would have been motivated to do this in order to avoid over-quantizing individual layers (Elthakeb, [page 2], “However, with each layer in a neural network playing different roles and having unique properties in terms of weight distribution, over-quantizing an important layer can result in unnecessary pressure on subsequent layers to maintain accuracy. Such pressure leads to longer re-training and fine tuning times, as well as potentially sub-optimal accuracy due to the sensitivity of layers not being accounted for. To solve this issue, different combinations of quantization bitwidths can be tested for each layer of a DNN”).
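As a hedged illustration of what "a different precision per layer" can mean in practice, the sketch below applies a separate bitwidth to each layer of a toy model. It assumes symmetric uniform quantization, which is one common scheme and is not stated in the cited passages. The helper names `quantize_uniform` and `quantize_per_layer`, the dictionary model layout, and the layer names are hypothetical.

```python
from typing import Dict, List


def quantize_uniform(weights: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantization (an assumed scheme), returned dequantized."""
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed representation")
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)
    # Number of positive levels available with `bits` signed bits.
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels
    # Snap each weight to the nearest representable level.
    return [round(w / scale) * scale for w in weights]


def quantize_per_layer(
    model: Dict[str, List[float]], bit_config: Dict[str, int]
) -> Dict[str, List[float]]:
    """Quantize every layer with its own bitwidth taken from bit_config."""
    return {
        name: quantize_uniform(weights, bit_config[name])
        for name, weights in model.items()
    }


if __name__ == "__main__":
    toy_model = {"conv1": [0.5, -0.25, 0.1], "fc": [1.0, -0.7, 0.3]}
    # Hypothetical configuration: a sensitive layer keeps 8 bits, another gets 2.
    print(quantize_per_layer(toy_model, {"conv1": 8, "fc": 2}))
```

The per-layer dictionary is the point of the example: a search procedure such as the RL agent described above would be choosing the values in `bit_config`.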
Regarding claim 7, Ravi teaches: wherein the one or more constraints comprise at least one of: accuracy, power, cost, supported precision, speed (Ravi, [Paragraph 0044], “Inference speed can be a flexible parameter in the input space (e.g., definable by the developer). The inference speed describes a computational efficiency of the compact machine-learned model to run on a wide range of computing devices (e.g., mobile devices, devices able to be worn, embedded devices, etc.).” Examiner’s note, inference speed is considered analogous to speed.).
Regarding claim 9, Ravi teaches wherein the application is an artificial intelligence-based application (Ravi, Paragraph 0030, “Generally, the present disclosure is directed to an application development platform and associated software development kits (“SDKs”) that provide comprehensive services for generation, deployment, and management of machine-learned models used by computer applications such as, for example, mobile applications executed by a mobile computing device.” Examiner’s note, the computer applications using machine-learned models are considered analogous to the application being an artificial intelligence-based application.)
Claim 10 is rejected for the same reasons as claim 1, since these claims recite the same limitations.
Additionally, Ravi further teaches a non-transitory computer readable medium comprising computer executable instructions for automated design space exploration for deep neural networks, the computer executable instructions comprising instructions (Ravi, Paragraph 0084, “The memory 114 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. The memory 114 can store data 116 and instructions 118 which are executed by the processor 112 to cause the user computing device 102 to perform operations.”).
Claim 11 is rejected for the same reasons as claim 1, since these claims recite the same limitations.
Additionally, Ravi further teaches a deep neural network optimization engine configured to perform automated design space exploration for deep neural networks, the engine comprising a processor and memory, the memory comprising computer executable instructions (Ravi, Paragraph 0084, “The application development computing system 102 can include one or more processors 112 and a memory 114. The one or more processors 112 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. The memory 114 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. The memory 114 can store data 116 and instructions 118 which are executed by the processor 112 to cause the user computing device 102 to perform operations.”).
Claim 12 is rejected for the same reasons as claim 2, since these claims recite the same limitations.
Claim 13 is rejected for the same reasons as claim 3, since these claims recite the same limitations.
Claim 14 is rejected for the same reasons as claim 4, since these claims recite the same limitations.
Claim 15 is rejected for the same reasons as claim 5, since these claims recite the same limitations.
Claim 16 is rejected for the same reasons as claim 6, since these claims recite the same limitations.
Claim 17 is rejected for the same reasons as claim 7, since these claims recite the same limitations.
Claim 19 is rejected for the same reasons as claim 9, since these claims recite the same limitations.
Regarding claim 22, Ravi teaches the deep neural network optimization engine of claim 11, wherein: the input model is converted into an intermediate representation model based on stored model frameworks and standardizing the intermediate representation model, and the intermediate representation model is used as the input model for the learning (Ravi, [Par. 0042, 0099-0103], “As illustrated, the model manager 120 can provide a model compression service, a model conversion service, a model evaluation service and a model hosting service. The model compression service and/or model conversion service can enable the developer to compress and/or convert the models to optimize the models for use by a mobile device or in the mobile environment. For example, compressing the model can include performing quantization (e.g., scalar quantization, vector quantization weight sharing, product quantization, etc.), pruning (e.g., pruning by values, L1 regularization, etc.), low rank representation (e.g., circulatent matrix, Kronecker structures, SVD decompositions, etc.), distillation, and/or other compression techniques. [0100] Pruning reduces model size by removing weights or operations from the model that are least useful for predictions, including, for example, low-scoring weights. This can be very effective especially for on-device models involving sparse inputs. For example, certain on-device conversational models can be pruned further to achieve up-to 2X reduction in size with just 25% lower triggering rate while retaining 97% of the original prediction quality. [0101] Quantization techniques can improve inference speed by reducing the number of bits used for model weights and activations. For example, using 8-bit fixed point representation instead of floats can speed up the model inference, reduce power and reduce size by 4×. [0102] Thus, various compression tools can optionally be accessed and used to compress a learned or uploaded model.
[0103] Converting the model can include converting the model from a standard version into a mobile-optimized version that is compatible with a lightweight machine learning library designed specifically for mobile and embedded devices, in one example, the platform can use a conversion tool known as TensorFlow Lite Optimizing Converter (“TOCO”) to convert a standard TensorFlow graph of a model into a TensorFlow Lite graph, where TensorFlow Lite is a lightweight machine learning library designed for mobile applications. Thus, various conversion tools can optionally be accessed and used to convert a learned or uploaded model into a mobile-optimized version.” Examiner’s note: the model is converted to the second (intermediate) representation model based on the lightweight machine learning library (stored model frameworks) designed for a particular application (standardizing).).
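The compression techniques named in the quoted Ravi paragraphs can be illustrated with a short sketch. It shows (a) pruning by magnitude, as one plausible reading of removing "low-scoring weights", and (b) the storage arithmetic behind the quoted 4× size reduction, assuming the "floats" are 32-bit. Both readings are assumptions, and the function names are hypothetical. This is not Ravi's implementation.

```python
from typing import List


def prune_by_magnitude(weights: List[float], fraction: float) -> List[float]:
    """Zero out the given fraction of weights with the smallest absolute value.

    Assumption: "low-scoring" is interpreted here as low magnitude.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    count = int(len(weights) * fraction)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:count])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def model_size_bytes(num_params: int, bits_per_param: int) -> int:
    """Storage needed for the parameters alone, rounded up to whole bytes."""
    return (num_params * bits_per_param + 7) // 8


if __name__ == "__main__":
    print(prune_by_magnitude([0.9, -0.05, 0.4, 0.01], 0.5))
    # Assumed 32-bit floats versus 8-bit fixed point for one million parameters.
    float_size = model_size_bytes(1_000_000, 32)
    int8_size = model_size_bytes(1_000_000, 8)
    print(float_size // int8_size)
```

The size ratio depends only on the bits per parameter, so the 4× figure follows directly from 32 / 8 under the 32-bit assumption.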
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU whose telephone number is (571)272-5747.  The examiner can normally be reached on Mon-Fri from 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.T./Examiner, Art Unit 2128 

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128
Prosecution Timeline

Jun 11, 2024: Response Filed
Oct 15, 2024: Final Rejection mailed (§103)
Feb 07, 2025: Request for Continued Examination
Feb 11, 2025: Response after Non-Final Action
Jul 02, 2025: Non-Final Rejection mailed (§103)
Aug 25, 2025: Response Filed
Oct 01, 2025: Final Rejection mailed (§103)
Dec 01, 2025: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/083,839 (Patent 12638549): METHOD OF TRAINING A MACHINE LEARNING SYSTEM FOR AN OBJECT RECOGNITION DEVICE; 5y 7m to grant; Granted May 26, 2026
17/091,837 (Patent 12572779): INTERFACE NEURAL NETWORK; 5y 4m to grant; Granted Mar 10, 2026
17/089,645 (Patent 12541705): SYSTEM AND METHOD FOR FACILITATING A MACHINE LEARNING MODEL REBUILD; 5y 3m to grant; Granted Feb 03, 2026
17/124,106 (Patent 12511531): INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM; 5y 0m to grant; Granted Dec 30, 2025
17/366,315 (Patent 12493804): METHOD OF BUILDING AND OPERATING DECODING STATUS AND PREDICTION SYSTEM; 4y 5m to grant; Granted Dec 09, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 48%
With Interview (+6.4%): 55%
Median Time to Grant: 4y 5m (~0m remaining)
PTA Risk: High
Based on 64 resolved cases by this examiner. Grant probability derived from career allowance rate.